Strange tagged PDF output from HTML documents with embedded SVG

David J Prokopetz
Here's an odd one: when converting an HTML document that contains an embedded SVG image to PDF with the command-line flag --pdf-profile="PDF/UA-1", the <Figure> tag corresponding to the SVG image in the resulting tag tree ends up with a text node inside it, containing the text "PathPathPathPathPath" or something similar.

The text "Path" appears to be repeated once for each element within the SVG file; curiously, the result is always "Path" regardless of whether the SVG in question contains any <path> elements. A minimal example (and a screenshot of the result) is attached; as you can see, the SVG file contains three elements, and in the resulting PDF, the <Figure> element in the tag tree contains the text node "PathPathPath".

This has actually been happening for the last several versions, but I've only just got around to trying to fix it, and I can't figure out how to stop it. Any help?
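To make the pattern concrete, here is a minimal Python sketch of the behaviour described above. The SVG is a hypothetical stand-in with three shapes, not the attached file, and the sketch assumes that each basic SVG shape ends up as one path object in the PDF, which would explain why the label is "Path" even when there are no <path> elements. The names `SHAPE_TAGS` and `expected_figure_label` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the attached SVG: three shapes, none of them a <path>.
SVG_SOURCE = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect x="10" y="10" width="30" height="30"/>
  <circle cx="70" cy="30" r="15"/>
  <line x1="10" y1="80" x2="90" y2="80" stroke="black"/>
</svg>"""

# Assumption: each of these shapes is drawn as one path object in the PDF.
SHAPE_TAGS = {"rect", "circle", "ellipse", "line", "polyline", "polygon", "path"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def expected_figure_label(svg_text):
    """Return 'Path' repeated once per shape element found in the SVG."""
    root = ET.fromstring(svg_text)
    shapes = [el for el in root.iter() if local_name(el.tag) in SHAPE_TAGS]
    return "Path" * len(shapes)


print(expected_figure_label(SVG_SOURCE))  # PathPathPath
```

This only models the observed label; it says nothing about how the converter actually builds the tag tree.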
  1. pdfua-svg-test-results.png (5.4 kB)
  2. pdfua-svg-test.html (0.3 kB)
  3. pdfua-svg-test.pdf (5.2 kB)
  4. pdfua-svg-test.svg (0.5 kB)
wangp
The "PathPathPath" is just how Acrobat shows that the content of the SVG image is part of the <Figure>. It's not related to SVG <path> elements or PDF/UA-1.
David J Prokopetz
My understanding of the PDF/UA spec is that it's only appropriate for a <Figure> tag to contain a child text node if the figure in question actually contains human-readable text. Indeed, if we add a <text> element to the SVG in question, the content of that element appears as a text node within the <Figure> tag as expected – but the garbage repetitions of the word "Path" are also present.

Edited by David J Prokopetz

wangp
As you know, every piece of content on the page (text or graphics) has to be accounted for in tagged PDF. If the SVG content were not marked as part of the <Figure>, it would have to be treated as some sort of artifact, which would be incorrect.
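As a rough illustration of that rule, the Python sketch below models a page as a list of content items, each of which must either belong to a structure element or be marked as an artifact. It is a simplified model only: `ContentItem` and `unaccounted` are hypothetical names, and no real PDF library is involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentItem:
    description: str
    struct_tag: Optional[str] = None  # structure element it belongs to, e.g. "Figure"
    is_artifact: bool = False         # True for decoration that is not real content


def unaccounted(items: List[ContentItem]) -> List[ContentItem]:
    """Return items that are neither tagged nor marked as artifacts."""
    return [i for i in items if i.struct_tag is None and not i.is_artifact]


page = [
    ContentItem("paragraph text", struct_tag="P"),
    ContentItem("SVG shape 1", struct_tag="Figure"),
    ContentItem("SVG shape 2", struct_tag="Figure"),
    ContentItem("SVG shape 3"),  # deliberately left untagged to show the failure case
]

for item in unaccounted(page):
    print("not accounted for:", item.description)
```

In this model, the only ways to clear the leftover shape are to attach it to the <Figure> or to flag it as an artifact, which is the choice described above.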

You can compare the behaviour of Acrobat on the PDF/UA Reference files, downloadable here:
https://pdfa.org/resource/pdfua-reference-suite/
  1. pdfua2.png (83.9 kB)
David J Prokopetz
Hm. Suffice it to say that I disagree with the interpretation of the spec which concludes that this is an adequate representation of the SVG content, but if that's what the reference documents say, then the problem clearly isn't on your end. I'm pestering the wrong party – apologies for that!