I have a document that contains the character \u0012 (decimal 18, the value reported in the error messages below). Prince's handling of this character across various DOCTYPEs, charsets, encodings, and document content is confusing: the messages on STDERR, the exit status, and whether a PDF file is generated all vary.
I'm invoking prince via the following command line:
prince --media=print --no-xinclude --ssl-blindly-trust-server --no-embed-fonts #{file_name} --output #{file_name}.pdf
Description of files:
example1 - xhtml.strict DOCTYPE; using charset=iso-8859-1; character encoded as 
example2 - xhtml.strict DOCTYPE; using charset=utf-8; character encoded as 
example3 - html DOCTYPE; using charset=utf-8; character encoded as 
example4 - html DOCTYPE; using charset=utf-8; raw character
example5 - no DOCTYPE; using charset=utf-8; raw character
example6 - no DOCTYPE; no charset declaration; raw character
example7 - xhtml.strict DOCTYPE; using charset=utf-8; raw character
example8 - xhtml.strict DOCTYPE; using charset=utf-8; using <style>; raw character
example9 - xhtml.strict DOCTYPE; using charset=utf-8; no <style>; character encoded as 
example10 - xhtml.strict DOCTYPE; using charset=utf-8; using <style>; character encoded as 
Result for each file:
example1-3, example9:
exitstatus 0
PDF generated with character stripped
STDERR:
prince: example1.html:7: error: htmlParseCharRef: invalid xmlChar value 18
example4-7:
exitstatus 0
PDF generated with character stripped
STDERR:
prince: example4.html:7: error: Invalid char in CDATA 0x12
example8, example10:
exitstatus 1
PDF not generated
STDERR:
prince: example8.html:20: error: PCDATA invalid Char value 18
prince: example8.html: error: could not load input file
prince: error: no input documents to process
Questions:
Does Prince support this character? If so, how?
Why are there error messages in every case, but in some cases a PDF file is still generated?
Why are the error messages different?
Why is the exitstatus different between the cases?
Why is the exitstatus success (0) for cases where there are error messages?
Is there a way to tell Prince to strip characters it can't handle?
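Whether or not Prince has such an option, one workaround is to clean the input before Prince sees it. The sketch below is an assumption, not a Prince feature: it removes raw C0 control characters other than tab, LF and CR, plus decimal or hex numeric character references whose code point falls outside the XML 1.0 Char production. It works on raw bytes, which is safe for ASCII-compatible encodings such as UTF-8 and ISO-8859-1. It is purely text-level, so it does not understand CDATA sections, comments, or script content. The function names and the `.clean.html` output name are hypothetical.

```ruby
# Code points allowed by the XML 1.0 "Char" production.
def xml_char?(cp)
  cp == 0x9 || cp == 0xA || cp == 0xD ||
    (0x20..0xD7FF).cover?(cp) ||
    (0xE000..0xFFFD).cover?(cp) ||
    (0x10000..0x10FFFF).cover?(cp)
end

# Raw C0 control bytes other than tab, LF and CR. These bytes have the same
# meaning in UTF-8 and ISO-8859-1, so the input can be processed as bytes.
RAW_CONTROL = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Decimal (&#NN;) or hexadecimal (&#xNN;) numeric character references.
CHAR_REF = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/

# Hypothetical helper: drop raw control characters, then drop any numeric
# character reference that points at a code point XML 1.0 does not allow.
def strip_invalid_xml_chars(text)
  text.gsub(RAW_CONTROL, "").gsub(CHAR_REF) do |ref|
    cp = $1 ? $1.to_i(16) : $2.to_i
    xml_char?(cp) ? ref : ""
  end
end

if __FILE__ == $PROGRAM_NAME
  ARGV.each do |file_name|
    # Hypothetical output name; pass this file to prince instead.
    out = file_name.sub(/\.html\z/, "") + ".clean.html"
    File.binwrite(out, strip_invalid_xml_chars(File.binread(file_name)))
  end
end
```

Removing the character rather than substituting a space matches what Prince itself did in the cases where it still produced a PDF.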
To run the tests in the attached .zip file:
Run: 'ruby run.rb'
Clean: 'sh clean.sh'
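The attached run.rb is not reproduced here, so the following is only an assumed sketch of a harness with the same shape. It runs the command line above through Ruby's Open3, deletes any stale PDF first, and records the three things that differ between the examples: exit status, whether the PDF exists, and every STDERR line containing ": error:". The names `run_prince` and `prince_errors` are hypothetical.

```ruby
require "open3"

# Collect every diagnostic line Prince wrote to STDERR that contains
# ": error:", e.g. "prince: example1.html:7: error: ...".
def prince_errors(stderr)
  stderr.each_line.map(&:chomp).select { |line| line.include?(": error:") }
end

# Run Prince with the options from this post and record the exit status,
# whether a PDF was produced, and the error lines.
def run_prince(file_name)
  pdf = "#{file_name}.pdf"
  # Remove a PDF left over from an earlier run so File.exist? is meaningful.
  File.delete(pdf) if File.exist?(pdf)

  cmd = ["prince", "--media=print", "--no-xinclude",
         "--ssl-blindly-trust-server", "--no-embed-fonts",
         file_name, "--output", pdf]
  _stdout, stderr, status = Open3.capture3(*cmd)

  { exit_status: status.exitstatus,
    pdf_generated: File.exist?(pdf),
    errors: prince_errors(stderr) }
end

if __FILE__ == $PROGRAM_NAME
  ARGV.each do |file_name|
    r = run_prince(file_name)
    # Flag runs that exited 0 but still reported errors.
    note = r[:exit_status] == 0 && r[:errors].any? ? " (exit 0 with errors)" : ""
    puts "#{file_name}: exit=#{r[:exit_status]} pdf=#{r[:pdf_generated]}#{note}"
    r[:errors].each { |e| puts "  #{e}" }
  end
end
```

Checking STDERR as well as the exit status is what separates example1-7 and example9 (exit 0, errors reported) from a genuinely clean run.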