Character ' in the document's file name
Hello, Mike
I noticed an interesting bug:
When the document's file name contains a ' character (for example, in French, "Administration d'AWP pour les postes Windows.html"), relative paths that refer to the parent directory (in a link or img) are not resolved.
Example:
<img src="../Figures/Figure 1.png" />
generates this log:
file:\C\.....\<main directory>\..\Figures\Figure 1.png can't open input file. Invalid argument.
When I remove the ' character, Prince displays the document correctly.
Best regards
Boris
I cannot reproduce this issue with Prince 8.0. Which operating system are you running on, and how are you calling Prince?
This happens if an image file name in the HTML contains a single quote. I have tried with Prince 7.1 and 8.1. I have tried all kinds of encodings, but none seem to work.
HTML:
*snip*
<img style="width:882pt; height:882pt;" src="/foobar/2005_0711_Test '05.jpg">
*snip*
OUTPUT:
prince: /foobar/2005_0711_Test '05.jpg: warning: can't open input file: No such file or directory
/opt/prince/bin/prince --version
Prince 8.1
Copyright 2002-2012 YesLogic Pty. Ltd.
CentOS 5.5 x86_64
Are you running Prince from the command-line? Is the HTML file local, or accessed over HTTP? Are you specifying --baseurl, or does the file contain a <base> element or xml:base attribute?
mikeday wrote:
Are you running Prince from the command-line?
Yes, command line.
mikeday wrote:
Is the HTML file local, or accessed over HTTP?
HTTP
mikeday wrote:
Are you specifying --baseurl, or does the file contain a <base> element or xml:base attribute?
no. Command looks like:
/opt/prince/bin/prince 'http://internal.foobar.com/path/to/render.php?x=1&y=2&etc' /path/to/output.pdf
Thanks, that helps us to track down the problem. It seems that the URL resolution in Prince expects ' (apostrophe) to be escaped as %27, then it will work. For example:
<img src="/foobar/2005_0711_Test%20%2705.jpg">
Here the space has been replaced by %20 and the apostrophe by %27. However, after checking the specification I think this is a bug in Prince. There are definitely characters that must be escaped when used in URLs, including double quotes, braces, and plus signs, but as far as I can tell apostrophe should be used unescaped. So we need to fix this on our end. In our defense, it seems that the URL specifications have changed over the years, as has browser behaviour, as illustrated by this discussion of a similar bug in Firefox.
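For anyone who needs the workaround on an affected version, here is a minimal Python sketch of the escaping described above. It uses the standard library's urllib.parse.quote. The helper name escape_src_path is hypothetical, and the sketch only illustrates percent-encoding of a path, not how Prince resolves URLs internally.

```python
from urllib.parse import quote


def escape_src_path(path: str) -> str:
    """Percent-encode a URL path while keeping '/' as the separator.

    quote() leaves ASCII letters, digits and "_.-~" untouched and escapes
    everything else, so a space becomes %20 and an apostrophe becomes %27.
    """
    return quote(path, safe="/")


# Path taken from the report above.
print(escape_src_path("/foobar/2005_0711_Test '05.jpg"))
```

The result could then be written into the src attribute when the HTML is generated.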
We have fixed this issue in Prince 8.1 rev 2, and updated packages are now available for Windows, MacOS X, and Ubuntu.
Has this really been fixed?
With the three files attached here, served via HTTP, it doesn't seem to work. I can reproduce the problem with either Prince 8.1-5 or 9.0 under Linux, invoking it like this:
$ prince http://localhost/prince/toc.html http://localhost/prince/messages_d%27erreur.html -o test.pdf
the link on the first page to the second page is broken.
- messages_d'erreur.html 0.3 kB
- style.css 0.1 kB
- toc.html 0.2 kB
The original issue was affecting URLs that contain an unescaped apostrophe. Your example contains an escaped apostrophe, and this is now failing, unfortunately. Another normalisation problem. Can you run it like this instead:
$ prince http://localhost/prince/toc.html http://localhost/prince/messages_d\'erreur.html -o test.pdf
Or use double quotes:
$ prince http://localhost/prince/toc.html "http://localhost/prince/messages_d'erreur.html" -o test.pdf
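If Prince is launched from a script rather than typed into a shell, the quoting question can be sidestepped by passing an argument list. The following is a rough Python sketch under that assumption: build_prince_argv is a hypothetical helper, the URLs are the ones from this thread, and only the -o option already shown above is used.

```python
import shlex


def build_prince_argv(inputs, output, prince="prince"):
    """Return an argument list of the form: prince <inputs...> -o <output>."""
    return [prince, *inputs, "-o", output]


argv = build_prince_argv(
    [
        "http://localhost/prince/toc.html",
        "http://localhost/prince/messages_d'erreur.html",
    ],
    "test.pdf",
)

# shlex.join shows an equivalent, safely quoted shell command line.
print(shlex.join(argv))

# Passing the list to subprocess.run(argv, check=True) would hand each URL
# to Prince unchanged, with no shell involved. It is not executed here.
```

This only controls what the shell does to the apostrophe. It does not change how Prince itself normalises the URL.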
I've tried several such workarounds, without success. When I do as you propose, I get another failure. There is a warning on stderr:
prince: style.css: warning: Couldn't resolve host 'style.css'
and, indeed, the CSS file hasn't been loaded, as is apparent from the resulting PDF.
That's strange, I can't reproduce that error; even if the links break due to the URL escaping, the style sheet is always loaded. And why would it be trying to resolve it as a hostname, when it's a relative URL? Very odd.
Can you run Prince with the --verbose flag and paste the output log here?
Here is the output log:
prince: Loading document...
prince: loading document: http://localhost/prince/toc.html
prince: loading HTML5 input: http://localhost/prince/toc.html
prince: loading document: http://localhost/prince/toc.html
prince: Applying style sheets...
prince: loading style sheet: http://localhost/prince/style.css
prince: loading document: http://localhost/prince/messages_d'erreur.html
prince: loading HTML5 input: http://localhost/prince/messages_d'erreur.html
prince: loading document: http://localhost/prince/messages_d'erreur.html
prince: Applying style sheets...
prince: loading style sheet: style.css
prince: style.css: warning: Couldn't resolve host 'style.css'
prince: Preparing document...
prince: Converting document...
prince: used font: Times New Roman, Regular
prince: used font: Times New Roman, Bold
prince: Resolving cross-references...
prince: Finished: success
Oh, BTW, that log was produced by 9.0. The error message looks different in 8.1:
prince: http:/localhost/prince/style.css: warning: can't open input file: No such file or directory
Notice that one of the slashes has mysteriously vanished.
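For comparison, standard relative-URL resolution of style.css against the second document's URL can be sketched with Python's urllib.parse.urljoin. This only shows the result one would expect from a conforming resolver. It says nothing about what Prince or libxml2 do internally.

```python
from urllib.parse import urljoin

# Base URL of the second document, apostrophe left unescaped as in the log.
base = "http://localhost/prince/messages_d'erreur.html"

# Resolving the relative reference replaces the last path segment.
resolved = urljoin(base, "style.css")
print(resolved)
```

The apostrophe in the base URL's last path segment plays no part in the resolution, so the style sheet should resolve to the same location as for toc.html.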
Thanks, which operating system is this running on?
Debian GNU/Linux 7.1. I'm running the generic static 64-bit binary.
Thanks, it seems this is actually an issue in libxml2 regarding apostrophes that was fixed in version 2.7.8. Unfortunately, the statically linked Prince binary is using version 2.7.3. We will have to fix this for the next release of Prince. In the meantime, perhaps you could try the dynamically linked binary?
Yes, running the dynamically linked binary with libxml2 version 2.8 fixes it. Thanks, Mike!
We have upgraded the libxml2 version used in all the statically linked Prince packages for Linux, so this problem should be fixed now in Prince 9 rev 2.
As of Prince 9 rev 4 it should be fixed across all operating systems, except those where Prince is dynamically linked with a libxml2 version prior to 2.7.8.
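For those on a dynamically linked build, a small Python sketch of the version check implied above might look like this. The function name libxml2_has_apostrophe_fix is hypothetical, and it assumes you already have the libxml2 version as a dotted string such as "2.7.3". How you obtain that string depends on your system.

```python
def libxml2_has_apostrophe_fix(version: str) -> bool:
    """Return True if a dotted libxml2 version string is 2.7.8 or later."""
    parts = tuple(int(p) for p in version.split("."))
    # Tuples compare element by element, so (2, 8) sorts after (2, 7, 8).
    return parts >= (2, 7, 8)


# Versions mentioned in this thread.
for v in ("2.7.3", "2.7.8", "2.8"):
    print(v, libxml2_has_apostrophe_fix(v))
```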