Very slow times when doing a simple test

amirraminfar
20 Jul 2015

Hi there,

We are doing a simple test of 1000 sample files that we expect to take around 15ms each or less. Our existing tool, which is written in Java, takes about 15ms, so we had assumed Prince would be faster. But our initial testing shows it is closer to 75ms. That's 5x slower, and we are banging our heads trying to figure out what is going on.

Attached is our test. Are there any performance boosts or flags that you recommend?

We are converting the same file 1000 times using the following command:

 time bash -c 'for ((n=0;n<1000;n++)); do prince test.html; done'

Attachment: test.html (0.6 kB)

Edited 20 Jul 2015 by amirraminfar

amirraminfar
20 Jul 2015

Hi there,

I have an update. I did the following:

mkdir test
for i in {1..1000}; do cp test.html test/test_$i.html; done
cd test
time prince test_*.html -o output.pdf

The above produces the PDF almost 75% faster.

Our code uses server mode from the Java API. It is crucial for our generation time to be around 10ms or less for us to keep using PrinceXML. The test above suggests that the slowness is probably in loading PrinceXML. Is it possible to start PrinceXML only once? We need this because we work on single reports in the pipeline. Perhaps a server mode that keeps the process idle until we tell it to shut down? If not, could we make a feature request?

Any help would be appreciated.

Amir Raminfar
Engineer Manager
Opower

Edited 20 Jul 2015 by amirraminfar

mikeday
21 Jul 2015

Using your loop example it takes about 30ms on my (rather old) desktop PC, and I can speed it up to 24ms by removing the XML junk at the top of the document and parsing it as HTML instead, to avoid checking the DTD catalog and other unnecessary operations.
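A minimal sketch of the change described above, assuming the "XML junk" is a single-line XML declaration such as `<?xml version="1.0" encoding="UTF-8"?>` (the attached test.html is not reproduced here). The helper names are hypothetical. The script drops that line and pipes the result to Prince with `--input=html`, so the document is parsed as HTML rather than XML; it relies on Prince reading its input from standard input when given `-` as the file name.

```shell
#!/usr/bin/env bash
# Sketch: drop a leading XML declaration and parse the document as HTML.
# Assumption: the declaration sits on its own line, e.g. <?xml version="1.0"?>.

# Print the file without any line that consists solely of an XML declaration.
# (In a basic regex, '?' is a literal character.)
strip_xml_decl() {
  sed -e '/^[[:space:]]*<?xml[^>]*?>[[:space:]]*$/d' "$1"
}

# Hypothetical wrapper: convert one file, forcing HTML parsing.
# Prince reads the document from stdin ("-") and writes the PDF to $2.
convert_as_html() {
  strip_xml_decl "$1" | prince --input=html - -o "$2"
}

# Example: convert_as_html test.html test.pdf
```

Because the document then arrives on stdin, any relative links in it would need a base URL; this sketch does not handle that.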

We will do some further investigation and see if there is any way to get faster than this.

amirraminfar
21 Jul 2015

Hi mikeday,

I am about to leave for the day. We can strip our XML as much as possible and do all kinds of optimization, but overall I am concerned that we may not have full control over the HTML, in which case it won't improve anything. I am working with my lead engineer, and he made a good point. Before, we were using a Java library called FlyingSaucer. FS is written in Java, so it was already loaded and running, as was the rest of our Java pipeline.

Now that PDF generation runs in a separate process, the cost of loading and unloading it adds up. The ultimate solution would be to process HTML all the way to the last step and then convert all the HTML files into one big PDF. We can't do that right now because we are on a tight deadline and there is no way to change all the moving parts of the pipeline at once.

Ideally, we would have some kind of long-running background process that we send data to and receive PDFs from. How difficult would it be to add that? Is it even possible?

Starting to get a little worried.

Help. I'll check email again tonight.

mikeday
21 Jul 2015

It may be possible, but we will need to check whether that is actually faster, or if there are any other steps we can take to reduce process startup time.

amirraminfar
21 Jul 2015

OK, thanks for looking into this. This is our team's number one priority right now. We process millions of records, so every ms counts.

Here is the time comparison on my MacBook Pro.

1000 separate executions:

$ time sh -c 'for i in {1..1000}; do prince test.html; done'
sh -c 'for i in {1..1000}; do prince test.html; done'  40.20s user 23.58s system 98% cpu 1:04.58 total

1000 pages with one execution:

$ time prince test_*.html -o output.pdf
prince test_*.html -o output.pdf  11.96s user 0.53s system 136% cpu 9.162 total

Almost 4x difference.
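Dividing the wall-clock totals above by the 1000 documents gives the per-document figure that is being compared with the 15ms target. A small helper (the name is hypothetical) that does the division:

```shell
#!/usr/bin/env bash
# Per-document wall-clock time in milliseconds.
# $1 = total seconds reported by `time`, $2 = number of documents converted.
per_doc_ms() {
  awk -v total="$1" -v n="$2" 'BEGIN { printf "%.1f\n", total * 1000 / n }'
}

per_doc_ms 64.58 1000   # 1000 separate executions  -> 64.6
per_doc_ms 9.162 1000   # one execution, 1000 files -> 9.2
```

By wall-clock time that is roughly 65ms versus 9ms per document, which suggests per-process start-up accounts for most of the difference. Note that the single run reported 136% CPU, so part of its work overlapped and its wall-clock figure slightly flatters it.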

mikeday
21 Jul 2015

We are considering two possibilities: running Prince as a server and communicating with it over HTTP, or changing the Java wrapper to prestart Prince processes before they are needed, to reduce startup time. We can try both; changing the Java wrapper will be quicker to experiment with.
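The second option, prestarting the converter before the document is ready, can be sketched in bash with a coprocess: the process is launched immediately and blocks reading its stdin pipe until the document is streamed in. This is an illustration of the idea, not the Java wrapper's behaviour. The function name is hypothetical, it needs bash 4.1 or later (`coproc` and `{fd}` redirections, so not the bash 3.2 shipped with macOS), and whether it actually saves time depends on an untested assumption: that Prince does its start-up work (fonts, licence) before it begins reading stdin.

```shell
#!/usr/bin/env bash
# Sketch of "prestart": launch the converter first, feed it the document later.
# Usage: convert_prestarted INPUT_FILE CMD [ARGS...]
# With Prince, CMD might be: prince - -o out.pdf   (assumes "-" means stdin)
convert_prestarted() {
  local input=$1 w pid
  shift
  # 1. Start the converter now; it waits on its stdin pipe for the document.
  coproc CONV { "$@"; }
  pid=$CONV_PID
  w=${CONV[1]}
  # ... the caller could prepare the next HTML document at this point ...
  # 2. Stream the document into the waiting process, then close the pipe
  #    so the converter sees end-of-file.
  cat "$input" >&"$w"
  exec {w}>&-
  # 3. Wait for the converter to finish writing its output.
  wait "$pid"
}

# Example: convert_prestarted test.html prince - -o test.pdf
```

A real wrapper would keep a small pool of such processes started in advance, so one is always waiting when the next document arrives.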

mikeday
21 Jul 2015

Is the primary issue a question of latency, or overall throughput?

e.g., are you trying to minimise the time a customer spends waiting for one document when they hit the print button, or the time it takes to convert millions of documents?

The distinction matters because we can use parallelism to increase throughput, but not to reduce latency.
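To make the throughput side concrete: running several conversions at once raises the number of documents finished per second, but each individual document still takes as long as before. A sketch using `xargs -P`, assuming one PDF per input (`prince file.html` writes `file.pdf` alongside it); the function name and job count are illustrative.

```shell
#!/usr/bin/env bash
# Convert every .html file in a directory, running up to $2 converters at once.
# Extra arguments replace the default converter command (prince), which is
# handy for testing; xargs appends each file name as the last argument.
convert_parallel() {
  local dir=$1 jobs=$2
  shift 2
  [ "$#" -gt 0 ] || set -- prince
  find "$dir" -maxdepth 1 -name '*.html' -print0 |
    xargs -0 -n 1 -P "$jobs" "$@"
}

# Example: up to 8 concurrent Prince processes over the test/ directory.
# convert_parallel test 8
```

Throughput only scales while there are idle CPU cores; the latency of any single document is unchanged.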

amirraminfar
21 Jul 2015

Our PDF generation is not tied to a button click or any web interaction; all of it is done in batches. Some clients have batches of up to 300,000 and others as small as 4,000. These batches are generated weekly, and we have SLAs that need to be met. That's why this requirement is so important to us.

We already use parallelism to increase throughput. However, I have to make a strong case for using Prince if Prince is 5x slower than FlyingSaucer. Assuming it is 5x slower, we would need to stand up 5x more boxes and rework our threading model. That might be hard to get approved by our architecture team.

For performance purposes, I have mostly been comparing linear times on a single box with a single thread. We assume multithreading will improve these numbers, and we do use it in production.

I like the solutions you suggested, and I am available anytime to try new ideas. Do you think prestarting Prince will give a significant improvement?

Thanks for all your help.

Amir Raminfar, Engineer Manager @ Opower

amirraminfar
21 Jul 2015

mikeday,

I talked to the lead engineer and forgot to mention a couple of things:

- We usually run up to 20 threads in production. As I mentioned, though, I was comparing apples to apples by using a single thread.
- How will the prestarting work? Right now we call Prince once per HTML file. Will preloading Prince help there? We basically hammer it, so I want to make sure your test cases are similar to ours.
- When do you think you will have something for us to play with?
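The capacity concern can be put in numbers using figures from this thread: the largest weekly batch (300,000), 20 threads, and the 15ms versus 75ms per-document times from the first post. The sketch below assumes perfectly linear scaling across threads, which is optimistic; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Rough batch duration in minutes, assuming perfectly linear scaling across
# threads (an optimistic assumption; real scaling will be lower).
# $1 = documents, $2 = ms per document, $3 = threads
batch_minutes() {
  awk -v docs="$1" -v ms="$2" -v threads="$3" \
    'BEGIN { printf "%.2f\n", docs * ms / threads / 1000 / 60 }'
}

batch_minutes 300000 15 20   # existing Java tool -> 3.75
batch_minutes 300000 75 20   # Prince, per loop   -> 18.75
```

With the same 20 threads, a 5x per-document cost makes the batch take 5x as long, or requires 5x the capacity to finish in the same time.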

Thanks again.

Edited 21 Jul 2015 by amirraminfar

pjrm
21 Jul 2015

As I understand it, the pre-starting idea was for latency rather than throughput -- hence Mike's question of what you're trying to optimize. The information that you have a batch of documents already available is also relevant. So the solution might well look something like your "prince test_*.html" test.

Can you provide a more realistic example of input, so that we can see what things are taking time? (Remember that this is a publicly accessible forum, of course.)

amirraminfar
22 Jul 2015

Hi pjrm,

We are really stuck here. Our latest test shows our old pipeline taking 8ms and Prince taking 75ms. I checked with legal, and in general sharing sample HTML won't be a problem; however, our product has not been released yet, which makes it very hard to share.

Are there any solutions you can think of? That is more than 9x slower, which is not very good.

Regarding doing something like prince *.html, I am going to start a new thread, because that feels like a bug in the Java API.

amirraminfar
22 Jul 2015

pjrm, what about using fork? I think most of the time is lost reading fonts, reading the license and applying stylesheets. If you could fork multiple processes after all of that is done, wouldn't that save time?
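The "initialise once, then fork" pattern being proposed can be illustrated in bash, where a background job (`&`) is a forked copy of the shell that inherits its state. This only demonstrates the pattern: Prince would have to implement it internally (for example with the POSIX `fork()` call), and `expensive_init` is a hypothetical stand-in for loading fonts, the license and default stylesheets, not a description of how Prince is structured.

```shell
#!/usr/bin/env bash
# Illustration of "initialise once, then fork": the parent does the expensive
# set-up a single time and every forked worker inherits the result.
INIT_RUNS=0

# Hypothetical stand-in for loading fonts, the license and stylesheets.
expensive_init() {
  INIT_RUNS=$((INIT_RUNS + 1))
  SHARED_STATE="fonts+license+stylesheets"
}

# Runs in a forked subshell: sees the parent's state without redoing init.
worker() {
  echo "doc $1 uses $SHARED_STATE (init runs: $INIT_RUNS)"
}

run_forked() {
  local n=$1 i
  expensive_init                 # paid once, in the parent
  for ((i = 1; i <= n; i++)); do
    worker "$i" &                # '&' forks a copy of the initialised shell
  done
  wait                           # wait for every forked worker to finish
}

# Example: run_forked 3   (three lines, each ending "(init runs: 1)")
```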

Edited 22 Jul 2015 by amirraminfar
