I'm pretty new to both Prince and Lambda, but I'm noticing something quite weird. The very first time I run Prince on a new Lambda it works fine, but on repeated runs within the next few minutes it finishes almost immediately and produces an incomplete PDF. The file cuts off in the middle of a table row after about 100 pages (the document should be about 2,500 pages).
If I wait more than 7 minutes to make sure I'm getting a cold Lambda, it works again (once).
I'm using the latest Lambda package (Prince 14.2), and the Lambda's memory, storage, and timeout are all set to the maximum.
Sounds unusual. I would have expected the opposite, i.e. for it to run faster and keep working once the Lambda was warm.
Are you able to try smaller documents to see if it's a timeout issue? Or maybe remove parts of the document incrementally until it starts working again, and then we can debug from there?
It's definitely not a timeout issue: I can now reproduce it with much smaller documents, and it happens even on runs that take only a couple of seconds.
I'll keep fiddling with removing different parts of the document, but here's where I am so far:
- With a 160KB HTML file it works consistently, cold start or warm.
- With a 300KB HTML file (107-page PDF) I can reproduce the error repeatedly, but it's now intermittent: on warm starts it sometimes fails and sometimes works fine.
- Even with the 300KB HTML file, I can never get it to fail on a cold start.
- Almost the entire document is one HTML table, but splitting it into about 20 shorter tables doesn't seem to make a difference (it still sometimes works and sometimes doesn't).
- My Lambda is running Node.js 14 and Prince 14.2, using the basic package downloaded from the website and extracted directly into the Lambda directory (not a layer). It has 10GB of RAM, 10GB of ephemeral storage, and the maximum timeout (15 minutes).
More details below, but let me know if there's anything else I can do to help debug! Thanks.
- Jacob
Here is the entire Node script running on the Lambda; it just calls Prince with the filenames passed in at runtime.
I'm attaching the input HTML, one correct PDF, and one incomplete PDF. You can see that the incomplete PDF ends on page 60 in the middle of a table row and the table of contents entries that should link to pages at the end of the document say page 0.
Here are the prince logs for both those runs on Lambda.
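The attached script isn't reproduced in this thread. As a rough, hypothetical sketch of the "call Prince with the filenames" step (the function names and passing in the binary path are assumptions, not Jacob's actual code), awaiting the Prince process from Node could look like this. It only waits for Prince itself to exit; whatever file Prince reads has to be completely written before this is called.

```javascript
// Hypothetical sketch: run the Prince CLI from Node and wait for it to exit.
// The binary path is passed in (e.g. wherever the Lambda package was extracted);
// "prince <input> -o <output>" is Prince's basic command-line usage.
const { execFile } = require('child_process');
const { promisify } = require('util');

const execFileAsync = promisify(execFile);

// Argument list for a single HTML-to-PDF conversion.
function buildPrinceArgs(inputHtml, outputPdf) {
  return [inputHtml, '-o', outputPdf];
}

// Resolves with Prince's stdout/stderr once the process has exited;
// rejects if the binary can't be started or exits with a non-zero code.
async function runPrince(princeBin, inputHtml, outputPdf) {
  const { stdout, stderr } = await execFileAsync(
    princeBin,
    buildPrinceArgs(inputHtml, outputPdf),
    { maxBuffer: 10 * 1024 * 1024 } // leave room for long warning logs
  );
  return { stdout, stderr };
}
```

Using `execFile` rather than `exec` avoids going through a shell, so filenames are passed to Prince as-is.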
Quick additional note: removing the unsupported 'text-decoration-line' CSS property doesn't fix it. Also, the incomplete PDFs are always slightly different sizes and cut off in a different place every time. I noticed the same thing when testing larger documents: the first run would work, but the following ones would cut off at a random point after roughly 100-150 pages, never in the same place.
The PDFs themselves are not corrupted (they are complete, valid files rather than output that was cut off partway through writing), so the next thing I would check is that the input being fed into Prince is complete. I don't know Node or AWS Lambda at all, but perhaps when you download the input from S3 into the temporary file, you need to flush the pipe or something like that?
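One way to make that check explicit is to compare the size of the HTML file on disk with the number of bytes the download was supposed to contain, right before invoking Prince. A minimal sketch, assuming the expected size is available (for an S3 `GetObject` response, its `ContentLength` field is a likely source); the helper name is hypothetical:

```javascript
// Hypothetical diagnostic: refuse to run Prince on a partially written file.
const fs = require('fs');

// Resolves with the file size if it matches expectedBytes; otherwise rejects,
// turning a silently truncated PDF into a loud error.
async function assertCompleteFile(filePath, expectedBytes) {
  const { size } = await fs.promises.stat(filePath);
  if (size !== expectedBytes) {
    throw new Error(
      `${filePath} has ${size} bytes, expected ${expectedBytes}; ` +
      'the download has probably not finished writing yet'
    );
  }
  return size;
}
```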
It turns out my script wasn't waiting for the S3 download stream to finish writing the file, so Prince was being invoked with just part of the HTML file.
I hadn't seen your comment, @alfie, but thanks: that's what I started doing, console-logging a ton of stuff and trying to print file metadata. I actually couldn't get the file size to show up correctly and reliably (even now that it's working), but a bunch of my log messages were out of order, which is what eventually helped me pinpoint this.
I would guess the reason it was always about 100-150 pages is that that's how much HTML could be downloaded and written to the file in the few milliseconds before Prince started reading it.
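Forcing the script to wait for the file write stream to finish appears to have fixed everything:
```javascript
const { createWriteStream } = require('fs')

// Don't start Prince until the HTML file has been completely written
await new Promise((resolve, reject) => {
  download.Body.pipe(createWriteStream(html)).on('finish', resolve).on('error', reject)
})
```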
This is what I get for just skimming the Node docs on streams.
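For comparison, here is a sketch of the same fix using `stream.pipeline`. It is promisified with `util.promisify` because `stream/promises` only arrived in Node 15 and this Lambda runs Node 14. Unlike listening for `'finish'` on the write stream alone, `pipeline` also rejects if the download (source) stream errors, and it destroys both streams in that case. The helper name `saveStreamToFile` is hypothetical.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Write a readable stream (e.g. an S3 GetObject Body) to localPath and resolve
// only once the file has been completely written; rejects if either stream errors.
async function saveStreamToFile(readable, localPath) {
  await pipelineAsync(readable, fs.createWriteStream(localPath));
  return localPath;
}
```

With the names from the snippet above, usage would be `await saveStreamToFile(download.Body, html)` before starting Prince.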
Yeah, that has to be it... I knew cold starts took time to 'boot up', but do they also run slower?
Once the handler function is actually running, I would expect it to take the same amount of time to get from 'starting to download the file' to 'starting Prince', unless the whole thing runs a bit slower on a cold start or Prince needs to set some things up the first time it runs.
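One way to answer this empirically is to record elapsed time at each step and tag each invocation as cold or warm. Lambda runs module-scope code once per execution environment, so a module-level flag is only true for the first invocation after a cold start. The helper names and the `doWork` callback are hypothetical; collecting the timings and logging them once also sidesteps the out-of-order log messages mentioned earlier.

```javascript
// Module-scope code runs once per Lambda execution environment, so this flag
// is true only for the first invocation after a cold start.
let isColdStart = true;

// Records milliseconds elapsed since creation, per labelled step.
function createStepTimer() {
  const start = process.hrtime.bigint(); // monotonic clock
  const recorded = [];
  return {
    mark(label) {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      recorded.push({ label, elapsedMs });
      return elapsedMs;
    },
    report() {
      return recorded.slice();
    },
  };
}

// Hypothetical handler shape: doWork receives the timer and marks its own steps
// (e.g. 'download finished', 'prince started'); everything is logged in one line.
async function exampleHandler(doWork) {
  const cold = isColdStart;
  isColdStart = false;
  const timer = createStepTimer();
  timer.mark('handler start');
  await doWork(timer);
  timer.mark('done');
  const steps = timer.report();
  console.log(JSON.stringify({ cold, steps }));
  return { cold, steps };
}
```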