Is it possible to kill Prince if an image (or any remote resource) fails to download? If not, what is the best way to check if a resource fails?
Our HTML to PDF conversion includes user-submitted images, some of which occasionally fail to download (or time out). If an image doesn't load then the PDF is "wrong" and the conversion needs to be retried. We don't want to show a broken PDF to our end users.
Ideally, the exit code would be non-zero. But it doesn't look like that's the case. Should I be parsing the output for warnings? If so, is there a recommended pattern I could grep for?
Currently converting this document with structured logging will give this output:
<img src="notfound.jpg">
<img src="http://example.com/notfound.jpg">
msg|wrn|notfound.jpg|can't open input file: No such file or directory
msg|wrn|http://example.com/notfound.jpg|Could not resolve host: example.com
fin|success
If the images are all remote then it may be sufficient to check for any warning message whose location refers to an HTTP/HTTPS URL.
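A rough sketch of that check in Python, assuming the structured log has already been captured as text and that warning lines always follow the `msg|wrn|<location>|<text>` shape shown in the output above. The function name `failed_remote_resources` is made up for illustration and is not part of Prince.

```python
from urllib.parse import urlparse


def failed_remote_resources(log_text):
    """Return (location, message) pairs for warnings on HTTP/HTTPS locations.

    Assumes lines shaped like: msg|wrn|<location>|<text>
    """
    failures = []
    for line in log_text.splitlines():
        # Split into at most four fields so the message text stays intact.
        parts = line.strip().split("|", 3)
        if len(parts) != 4 or parts[0] != "msg" or parts[1] != "wrn":
            continue
        location, text = parts[2], parts[3]
        # Only warnings attached to a remote URL count as download failures.
        if urlparse(location).scheme in ("http", "https"):
            failures.append((location, text))
    return failures
```

If the returned list is non-empty, the caller can treat the conversion as failed and retry, even though the log ends with `fin|success`. Warnings with a local path or no URL in the location field are skipped by this filter.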
Unfortunately unrecognized/ignored CSS prints out the same msg|wrn log. Our customers are able to enter a small snippet of CSS to change their background color and such. So I don't think this is an option for us. Is there any way to get the images to print
That should catch all of them, including "Unknown image format" which is possible when the server incorrectly redirects to an HTML "not found" page instead of returning 404.
The only other HTTP resources we are linking to are external fonts and CSS files. Which, now that I think about it, should also "fail" the PDF. So I think that will work! Thanks for the help.
After some more digging I don't think this will work for us.
We need the PDF generation to stop as soon as it encounters an error. Even with the HTTP timeout flag set, our server times out when a few images aren't available for download. For example, five images in a row time out, which adds up to 25 seconds and times out our server.
Unfortunately making one HTTP request wait for others is inherently risky, as a slow server could make you wait too long even if there is no error. The usual solution is to queue the job and run it in the background, so your server can give an immediate response and then update when the PDF is ready. Another option would be to download the customer images ahead of time, if that is possible.
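One way the "download ahead of time" idea could look, as a minimal sketch rather than anything Prince provides: fetch all the customer images in parallel under a single overall deadline, so five slow images cost roughly one timeout instead of five in a row. The names `fetch` and `prefetch_all`, the worker count, and the timeout values are all assumptions chosen for illustration. It needs Python 3.9+ for `cancel_futures`.

```python
import concurrent.futures
import urllib.request


def fetch(url, timeout):
    """Download one resource, raising on any network or HTTP error."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def prefetch_all(urls, fetcher=fetch, per_request_timeout=5.0, overall_timeout=8.0):
    """Fetch urls in parallel; return (results, failures) keyed by URL."""
    results, failures = {}, {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)
    futures = {pool.submit(fetcher, u, per_request_timeout): u for u in urls}
    try:
        for fut in concurrent.futures.as_completed(futures, timeout=overall_timeout):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:
                failures[url] = str(exc)
    except concurrent.futures.TimeoutError:
        # Overall deadline hit: anything still pending counts as failed.
        for url in futures.values():
            if url not in results and url not in failures:
                failures[url] = "overall deadline exceeded"
    finally:
        # Do not block the caller on downloads that are still running.
        pool.shutdown(wait=False, cancel_futures=True)
    return results, failures
```

If `failures` is non-empty the conversion can be skipped or retried before Prince is started at all; otherwise the downloaded bytes can be written to local files or a cache and the document pointed at those copies.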
Background jobs definitely make sense and are something we are exploring. We are trying to avoid this because our customers make a few tweaks and want to view their PDF quickly. They make a small batch of consecutive changes trying to get the output looking right before moving on and actually printing it. So quick response times are important, and we are worried that pushing to a background job would slow this down.
Another related question: is it possible to embed fonts directly into the PDF? We are running on Heroku where it isn't feasible to install fonts to the system. We are currently downloading them with the recommended @font-face declarations, but that makes a network request. Ideally we would embed them as you recommended with images using data objects.
It is complicated if customers can edit the CSS directly and include links to images on remote servers. Ideally they would enter images through the UI, so that you could download and cache them, eliminating any delay in producing the PDF.
You can embed fonts directly in a style sheet with data: URLs (base64 encoded) if it is not feasible to store them as local files.
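For what it's worth, generating such a rule can be scripted. The sketch below builds an @font-face rule with a base64 data: URL from raw font bytes. The helper name `font_face_rule`, the `font/ttf` MIME type, and the `format("truetype")` hint are assumptions for a TrueType file; adjust them for other font formats.

```python
import base64


def font_face_rule(family, font_bytes, fmt="truetype", mime="font/ttf"):
    """Build an @font-face rule that embeds the font as a base64 data: URL."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("data:{mime};base64,{encoded}") format("{fmt}");\n'
        "}\n"
    )
```

In practice `font_bytes` would come from reading the font file in binary mode, and the resulting rule would go into the style sheet passed to Prince, so no network request is needed for the font.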