This might be more of a how-do-I than a feature request, but I'm *reasonably* certain I'm not overlooking it this time: it would be handy to have a way to ignore HTML tags that have no semantic significance and exist only for styling purposes when generating tagged PDFs.
To provide an example, we've got a document where level two headers have very complex border art that requires a couple of nested spans to correctly apply it - something like this:
<h2><span><span>Introduction</span></span></h2>
(Note that the nested spans don't actually exist in the source HTML in this case; they're inserted via JavaScript immediately before PDF conversion, using the --script command-line argument, if that makes any difference.)
In a tagged PDF, this results in the following tag hierarchy:
<H2>
+--<Span>
   +--<Span>
      +--Introduction
That's not wrong, per se, but some of the pickier validators may object to a couple of semantically null Span tags floating around in there, when the document semantics would be more accurately reflected by:
<H2>
+--Introduction
I'm not sure what an appropriate mechanism for handling cases like this would look like. Possibly some sort of "none" or "exclude" value for the prince-pdf-tag-type CSS property?
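To make the idea concrete, here is a rough sketch of how that might look in a stylesheet. The "none" value is purely the hypothetical value proposed above, not something I've confirmed Prince supports, and the selector assumes the styling spans are the only spans inside level two headers:

```css
/* Hypothetical: "none" is the proposed value, not a confirmed Prince feature. */
/* Intent: emit no structure tag for styling-only spans, so their content */
/* is attached directly to the enclosing H2 in the tagged PDF. */
h2 span {
    prince-pdf-tag-type: none;
}
```

With something like this in place, the spans would still be rendered and styled as usual; only the tag tree would change.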