PDF Tagging

carl@media.org
I routinely use Adobe Acrobat to add tags to PDF files that come from HTML source. Usually, this is straightforward, but a bit tedious.

I have one source document, however, that is generating very weird behavior. Here is the HTML source:

https://law.resource.org/pub/eu/toys/en.petition.html

Here is the untagged PDF:

https://law.resource.org/pub/eu/toys/en.petition.pdf

When you run the Adobe "add tags" command, it adds a bunch of hyphens at the end of many, but not all, lines. A screenshot is attached.

I've never seen this behavior on other files, but I do get the same behavior on files using the same style sheet. I've tried upgrading Acrobat Pro to DC but get the same behavior.

Anybody else witness this behavior? Any clues as to something in my style sheet or any Prince/Adobe interaction that might be causing this?
  1. Screen Shot 2015-09-05 at 10.28.19 AM.png (80.9 kB)
    Screenshot of a page run through Adobe.
rpilkey
I've seen this, but I didn't figure out a solution. I think it's because Acrobat adds tag data using the "Times-Roman Type 1" font. You will see that the font gets added in the Properties window (CTRL-D).

If you run the pre-flight "List text using non-embedded fonts", each dash is listed.

But if you "embed all fonts", it doesn't fix it.

Maybe if you add tags on a machine that has a font of that name it will work.

This might add some information: https://forums.adobe.com/thread/722466

mikeday
If it's a font issue, it might be worth trying the latest build that uses the PostScript name for fonts and see if that helps at all.
carl@media.org
I downloaded your 20151013 alpha build. Same issue. The preflight "list non-embedded fonts" gives me errors for "Times-Roman 13.0 pt Type 1 not embedded Black (1.0) overprint: Off".
mikeday
Hmm, Times-Roman is one of the PDF built-in fonts, although it's rarely used these days. I have no idea why Acrobat is deciding to add hyphens here or there. Is it accurately figuring out possible hyphen locations by checking dictionary words?

Just another reason for us to hurry up and support tagged PDF directly in Prince.
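
For anyone triaging a long preflight report like the one quoted earlier in the thread, here is a rough Python sketch that pulls the font name out of a "not embedded" line and checks whether it is one of the 14 standard PDF fonts (Times-Roman is). The line format is an assumption based only on the single message quoted above, and parse_preflight_font and is_standard_14 are made-up helper names, not part of Acrobat or Prince. It also assumes the font name contains no spaces.

```python
import re

# The 14 standard ("built-in") PDF fonts, which a PDF may reference
# without embedding the font program.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})

# Assumed line shape, inferred from the one message quoted in this thread:
#   "<font name> <size> pt <font type> not embedded ..."
_PREFLIGHT_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<size>\d+(?:\.\d+)?)\s+pt\s+(?P<kind>.+?)\s+not embedded\b"
)


def parse_preflight_font(line):
    """Return (name, size, kind) for a 'not embedded' line, or None if it does not match."""
    m = _PREFLIGHT_RE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), float(m.group("size")), m.group("kind")


def is_standard_14(font_name):
    """True if font_name is one of the 14 standard PDF fonts."""
    return font_name in STANDARD_14


if __name__ == "__main__":
    msg = "Times-Roman 13.0 pt Type 1 not embedded Black (1.0) overprint: Off"
    parsed = parse_preflight_font(msg)
    if parsed is not None:
        name, size, kind = parsed
        print(name, size, kind, "standard-14:", is_standard_14(name))
```

If the flagged font turns out to be one of the standard 14, that at least fits the idea that Acrobat is adding the hyphens itself with a built-in font rather than using anything from the original document. It does not explain why the hyphens are added.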