svg filenames with special characters

Stephan
30 Jan 2015

Hi Michael,

We have a problem with SVG image filenames that contain special characters when they are referenced from an HTML document.

All files are downloaded in advance with wget.
The URL of the SVG image contains, for example, '\', which wget encodes as '%5C', so '%5C' becomes part of the downloaded image filename.

Within the HTML document the '%' is in turn encoded as '%25', so the 'src' attribute contains '%255C' for the original '\'.

This all seems correct to me.

But the generated PDF document does not contain the image.

file system:

pic%5Ctest.svg

html document index.html:

<img src="pic%255Ctest.svg" />

If I rename the file removing all special characters and change the 'src' attribute within the html document accordingly, everything works fine.
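
The double encoding described above can be checked with a short Python sketch. It uses only the standard library; `filename_on_disk` and `src_value` are illustrative names, and nothing here is meant to reflect how Prince itself processes URLs.

```python
from urllib.parse import quote, unquote

# Name of the file as wget stored it on disk (taken from the example above).
filename_on_disk = "pic%5Ctest.svg"

# Percent-encode the filename for use in a 'src' attribute.
# The literal "%" becomes "%25"; letters, digits and "." are left unchanged.
src_value = quote(filename_on_disk)
print(src_value)  # pic%255Ctest.svg

# Decoding exactly once gives the on-disk name back.
assert unquote(src_value) == filename_on_disk
```

So a consumer that decodes the 'src' value once should look for pic%5Ctest.svg on disk, which matches the file that exists.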

I found out some strange things trying to solve the problem:

1. Starting Prince with --debug reports no error: prince: debug: loaded resource: pic%5Ctest.svg

2. With the special character '/' (%2F) there is no problem!

3. With PNG images there are also no problems with special characters.

Thanks in advance!

mikeday
30 Jan 2015

Which version of Prince are you using? URL processing has changed in recent alpha versions.

Stephan
30 Jan 2015

I tried with Prince 9.0 rev 4 and rev 5.

Stephan
30 Jan 2015

Could you provide alpha version RPMs for SUSE (SLES11, 64-bit)?
Alternatively for openSUSE 11 (64-bit), which also seems to work here.

mikeday
30 Jan 2015

Yes we can do this, but they probably will not be ready until later next week.

Stephan
30 Jan 2015

OK, but is URL processing now different for PNG and SVG, which could be the reason for the problem?

mikeday
1 Feb 2015

Actually I am having trouble reproducing this problem with Prince 9. I have created a file called "test.html" that contains this:

<img src="pic%255Ctest.png" />

The image filename on the filesystem is pic%5Ctest.png, and it works fine when I run "prince test.html". Is this similar to your experience?

Stephan
2 Feb 2015

As I wrote, the problem appears only with svg files.
Aside from that I used Prince in the same way.

mikeday
2 Feb 2015

Ah I see, trying it with an SVG file I can reproduce the problem with Prince 9, but the problem is fixed in the latest alpha version.

Stephan
2 Feb 2015

Excellent!

Another problem I noticed in this context:
The SVG file requires the '.svg' extension; otherwise the warning 'Unknown image format' is printed and the SVG is missing from the PDF.
Again: with PNG files the filename extension is not needed!

Is this also fixed in the alpha version?

mikeday
2 Feb 2015

At the moment Prince does not try to guess whether a file is SVG or not. If it is an HTTP URL, then it will check the content-type header returned by the server, and if it is a local file it will check whether it has an .svg extension.

For other image types, it will try loading the file in different ways, due to the high number of misidentified image files, e.g. PNG images saved with a .jpeg extension.

Are you using a different extension for your SVG images?

Stephan
2 Feb 2015

Because we download all image URLs with wget in advance, the filenames become the (CGI) URLs.

Local filename e.g.:

lbdoc?dbname=caedb&server=&dbpath=%2Fhome%2Fengin%2Fpicture

The CGI program exports the SVG from a database. This is why our SVG filenames currently don't have any extension. It would be costly to change that.
Perhaps you could consider letting Prince also check the file content for SVG. That would be very helpful for us!
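
To illustrate what such a content check could look like, here is a minimal Python sketch. It is only one possible approach and not how Prince does or will do it; the function name `looks_like_svg` is hypothetical. It treats a file as SVG if it parses as XML and its root element is `svg`.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def looks_like_svg(path):
    """Return True if the file parses as XML and its root element is <svg>."""
    try:
        with open(path, "rb") as f:
            # The first "start" event is the root element, so only the
            # beginning of the file has to be read.
            for _event, element in ET.iterparse(f, events=("start",)):
                return element.tag in ("{%s}svg" % SVG_NS, "svg")
    except (ET.ParseError, OSError):
        # Not well-formed XML (e.g. a PNG) or the file cannot be read.
        return False
    return False  # empty input produced no elements
```

A check like this would accept an extensionless file such as the lbdoc?... example above, as long as its content is SVG.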

Edited 6 Feb 2015 by Stephan

Stephan
5 Feb 2015

Any opinion on that?

mikeday
5 Feb 2015

Yes I think it makes sense, just thinking about the best way to do it.

Stephan
13 Feb 2015

So can we expect the SVG check in Prince 10, together with the fix for our other problem?

When will it approximately be released?

mikeday
13 Feb 2015

Hopefully we will have an updated build with the SVG check within a week, unless we are interrupted by unforeseen circumstances.

Stephan
13 Feb 2015

Thanks,
please remember to provide RPMs for SUSE (SLES11, 64-bit).
Alternatively for openSUSE 11 (64-bit), which also seems to work for us.

mikeday
19 Feb 2015

New builds are now available for 64-bit OpenSUSE 11 which include the SVG loading change. Please let me know how it goes.

Stephan
19 Feb 2015

Handling of filenames with special characters (%5C) and the SVG format test for files without an extension both work fine.

BUT: the character '?' in filenames is not accepted anymore.

Prince seems to cut filenames off at that character.

With

 <img  src="lbdoc?dbname=caedb&amp;server=&amp;dbpath=%252Fhome%252Fengin%252FENGIN&amp;image=P%252FSamplesProject%255CPD%252FSystem%252FLogic%252FAlarm_u_Trend%255CV3.24.3%255CDOC%255CImages%252Ftest%257Cimage%257Csvg&amp;lang=049" />

=>
Prince reports error message:
prince: lbdoc: warning: can't open input file: No such file or directory

If I change '?' to '_', everything works perfectly.

This will be a small correction hopefully.

I attached a tgz with all files.

Just try

prince index.html

to reproduce the bug.

docdir.tgz 73.5 kB

Edited 19 Feb 2015 by Stephan

mikeday
19 Feb 2015

I think this is not a bug, actually. Browsers have the same issue, unless you escape the ? in the URL as %3F.
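
The effect can be seen with a generic URL parser. The following Python sketch uses the standard library's `urlsplit`, which is an assumption standing in for URL parsing in general and not Prince's own parser; the shortened `src` value is illustrative.

```python
from urllib.parse import urlsplit, quote

# Shortened version of the 'src' value from the report above.
src = "lbdoc?dbname=caedb&server="

# An unescaped "?" ends the path and starts the query string.
parts = urlsplit(src)
print(parts.path)   # lbdoc
print(parts.query)  # dbname=caedb&server=

# Percent-encoding the whole filename keeps it together as a single path.
escaped = quote(src, safe="")
print(escaped)                 # lbdoc%3Fdbname%3Dcaedb%26server%3D
print(urlsplit(escaped).path)  # lbdoc%3Fdbname%3Dcaedb%26server%3D
```

The path component "lbdoc" is the same name that appears in the "can't open input file" warning quoted above.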

Stephan
19 Feb 2015

But in the original URL it is the separator between the path component and the query string, which is never escaped.
The resulting filename comes from the download with wget.
We had no problems with this in Prince 9.

mikeday
19 Feb 2015

Yes, in the original URL it is fine. But as a local file, I think it needs to be escaped, and Prince 9 was wrong.

Some people work around this limitation of wget using the --restrict-file-names option, as described here.

Edited 19 Feb 2015 by mikeday

Stephan
19 Feb 2015

'wget --restrict-file-names=unix' does not escape '?', as it is an allowed character.

Surely you don't want me to use wget --restrict-file-names=windows in our Linux environment?
(This replaces '?' with '@'.)

I think Prince 9 was right.

mikeday
20 Feb 2015

Yes, I was suggesting --restrict-file-names=windows, to avoid the ? character. The other approach would be to rewrite the image URLs inside your HTML document. You could use JavaScript regular expressions to do this.

The new Prince behaviour for URL parsing is more correct according to the specification, and also consistent with what web browsers do, so it is generally not possible to interpret the unescaped ? character without breaking other behaviour.
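
As a rough illustration of the rewriting approach, here is a sketch in Python rather than JavaScript, run as a preprocessing step before Prince. The function name `escape_question_marks` is hypothetical, the regular expression only handles double-quoted `src` attributes on `img` tags, and only `?` is escaped; a real document may need a proper HTML parser and more characters.

```python
import re

# Matches <img ... src="VALUE"> and captures the text around VALUE.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]*)(")', re.IGNORECASE)

def escape_question_marks(html):
    """Replace "?" with "%3F" in local (non-absolute) img src values."""
    def fix(match):
        prefix, value, suffix = match.groups()
        if "://" in value:
            # Leave absolute URLs such as http://... untouched.
            return match.group(0)
        return prefix + value.replace("?", "%3F") + suffix
    return IMG_SRC.sub(fix, html)

html = '<img src="lbdoc?dbname=caedb&amp;server=" />'
print(escape_question_marks(html))
# <img src="lbdoc%3Fdbname=caedb&amp;server=" />
```

With this approach the files on disk keep the names wget gave them; only the references in the HTML change.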

Stephan
25 Feb 2015

Could you then give us a list of the characters which are not allowed and must be escaped or replaced?

Does the "--restrict-file-names=windows" option ensure that Prince accepts the filename?

mikeday
25 Feb 2015

Prince follows the same rules for decoding URLs as browsers, so special characters like ? & # and so on need to be escaped.

I think using --restrict-file-names=windows should be sufficient, yes. Although really it would be helpful if wget had more convenient mechanisms for rewriting links to local files.
