Hello everyone,
I'm experiencing some problems with character encoding, specifically with a UTF-8 encoded HTML file containing accented characters such as à, è, ì, ò, ù and the like.
I'm using a Python script to produce an HTML 5 file, writing the data to disk as UTF-8 (passing encoding='utf-8' when opening the file for writing). The final HTML 5 file contains "<meta charset=utf-8>" as its encoding declaration. Then I produce a PDF file from that HTML file using Prince, but every single accented character gets printed as two different characters (clearly a sign of an encoding problem). Since the document is written in Italian, the problem is quite annoying.
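A minimal sketch of the setup described above, for anyone who wants to reproduce it. It is not the actual script: it assumes Python 3, and the file name test.html and the helper names are made up for illustration. It writes a small UTF-8 HTML file with the same meta declaration, checks the bytes on disk, and shows why one accented character turns into two when UTF-8 bytes are read as a single-byte encoding such as Latin-1.

```python
import os
import tempfile

# Hypothetical test document; the <meta> line matches the declaration quoted above.
HTML = (
    "<!DOCTYPE html>\n"
    "<html>\n<head>\n<meta charset=utf-8>\n<title>Prova</title>\n</head>\n"
    "<body>\n<p>à è ì ò ù</p>\n</body>\n</html>\n"
)


def write_html(path, text):
    """Write text as UTF-8; the encoding is an argument of open(), not write()."""
    # newline="\n" keeps line endings identical on Linux and Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


def as_misread_latin1(text):
    """Show how UTF-8 bytes look when wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# Hypothetical output location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "test.html")
write_html(path, HTML)

with open(path, "rb") as f:
    raw = f.read()

# The file on disk really is UTF-8: "à" is stored as the two bytes C3 A0.
print(b"\xc3\xa0" in raw)
# Misreading those two bytes as Latin-1 yields two characters instead of one.
print(len(as_misread_latin1("à")))
```

If the first check prints True, the file itself is valid UTF-8, which would suggest the bytes are being misinterpreted when the file is read rather than when it is written.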
Note that it's not a font problem: if I replace e.g. à with the corresponding HTML character reference in the HTML file, the à character gets printed correctly in the final PDF file produced by Prince.
How can I solve this issue? I'm running Prince 7.0b1 on Ubuntu Linux 9.04 and on Microsoft Windows 2000 SP4. Both environments give me the same problem.