UTF encoding prince: warning: no glyphs for character U+5143

stevenbristol
I am getting:

prince: warning: no glyphs for character U+5143, fallback to '?'
prince: warning: no glyphs for character U+5143, fallback to '?'
prince: warning: no glyphs for character U+30DC, fallback to '?'
prince: warning: no glyphs for character U+0634, fallback to '?'

when I run:

cat i.html | prince --input=xml - -o i.pdf -v --debug

on the following file data:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "THIS_SHOULD_BE_PROPER_BUT_THE_FORUM_WONT_LET_ME_SUBMIT_A_LINK">
<html xmlns="THIS_SHOULD_BE_PROPER_BUT_THE_FORUM_WONT_LET_ME_SUBMIT_A_LINK">
<head>
<meta http-equiv="Content-Type" content="text/xhtml; charset=UTF-8" />
</head>
<body>
<p>
<strong>line</strong>

<strong>1.0 x 元 1.00 = 元 1.00</strong>
</p>
<p> ボش
</p>
</body>
</html>



Can someone please help me figure out what to do to fix this?
stevenbristol
Here is the proper html for the file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
mikeday
Add a Chinese font to the font-family property. For example, on Linux you might try this:
body { font-family: AR PL KaitiM GB }

On Windows the standard Chinese font is usually MingLiU, but Prince has issues loading this font at the moment unless you reference it explicitly by filename, like this for example:
@font-face {
    font-family: MyMingLiU;
    src: url("\\Windows\\Fonts\\mingliu.ttf")
}

body { font-family: MyMingLiU }
stevenbristol
Thanks for the response Mike.

This is a very slimmed down page. The real page is dynamic and I won't have any idea what font the data should be using. How can I handle that situation?
mikeday
You can specify multiple fonts in the font-family property, eg.
body { font-family: Times New Roman, AR PL KaitiM GB }

Prince will try each one in turn, which allows text in different scripts to be intermixed.

Note that in the general case, you still have a problem: you really need to choose the font based on the language, as the same character can look different depending on whether you are using Simplified Chinese (Mainland), Traditional Chinese (Hong Kong, Taiwan), or Japanese.
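
One way to choose fonts by language is sketched below. It assumes the markup tags its content with xml:lang (or lang) attributes, for example <p xml:lang="ja">, and that the formatter honors the CSS :lang() selector. "MyMingLiU" is the @font-face alias from earlier in this thread, and "MyJapaneseFont" is a hypothetical placeholder, not a real font name.

```css
/* Default stack for text that carries no more specific language tag. */
body { font-family: "Times New Roman", "AR PL KaitiM GB" }

/* Simplified Chinese (Mainland). */
:lang(zh-CN) { font-family: "Times New Roman", "AR PL KaitiM GB" }

/* Traditional Chinese (Taiwan, Hong Kong); MyMingLiU must be
   declared with an @font-face rule first. */
:lang(zh-TW), :lang(zh-HK) { font-family: "Times New Roman", MyMingLiU }

/* Japanese; "MyJapaneseFont" is a placeholder for an installed font. */
:lang(ja) { font-family: "Times New Roman", "MyJapaneseFont" }
```

This only helps when the language of each piece of text is known and recorded in the markup; untagged text falls back to the body rule.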
stevenbristol
How does the web browser choose the correct font? Can Prince do the same thing?

I'm afraid that this forces me to look for a new alternative to pdf generation. I just can't think of a way to make this usable.
mikeday
The web browsers have a big list of fonts for each language and operating system which they choose from, taking into account user preferences. We might be able to add a mechanism like this to Prince, which could make life easier than choosing the fonts yourself.

Are you formatting arbitrary web pages, or are they generated by a system under your control?
stevenbristol
I'm pdf-ing a web page from my website, so I have complete control over it.
mikeday
If you know the languages you're using, and the system that you're running on, then it shouldn't be too difficult to choose appropriate fonts to make everything work.
stevenbristol
But the point is I don't know the languages. My app allows users to input in any language they'd like and the invoice prints in that language. It works great in the browser and so should also work in the pdf. The thought of forcing users to select their font because the pdf tool isn't as smart as the browser seems like a hack and I need to find a better solution.

I wonder if I can ask the browser what the default font is and pass that to the pdf?
mikeday
There is no single default font, but rather a list of fonts. Times New Roman is a standard default, which covers languages using the Latin, Greek, Cyrillic, Arabic and Hebrew scripts. Add three more fonts for Chinese, Japanese and Korean and you will be supporting a sizable subset of the world's languages. (Prince does not yet support Hindi, other Indic scripts or Thai, but these are on the roadmap).
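
A minimal sketch of such a list, with the caveat that the three CJK family names are hypothetical placeholders and should be replaced with fonts actually installed on the system (or declared through @font-face rules):

```css
body {
    font-family: "Times New Roman",  /* Latin, Greek, Cyrillic, Arabic, Hebrew */
                 "MyChineseFont",    /* placeholder: a Chinese font */
                 "MyJapaneseFont",   /* placeholder: a Japanese font */
                 "MyKoreanFont";     /* placeholder: a Korean font */
}
```

The fonts are tried in the order listed, so a character shared between Chinese and Japanese would be drawn from whichever of those fonts comes first.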
subimage
Mike - I'm getting the same thing on my princeXML installation with the Chinese Yuan character, and some others.

What's the full list of font-families I should be importing so I'm covered for the largest number of languages possible?
mikeday
Which operating system are you running on? For Linux, "AR PL KaitiM GB" is a nice simplified Chinese font, while "MingLiU" is the standard on Windows, although unfortunately at the moment it must be referenced via a @font-face rule due to limitations handling TrueType Collection fonts. Beyond Chinese, there are dozens of different fonts for dozens of different scripts.