Large PDF Issue (PD4ML v3 Archived Forums › Troubleshooting)
- This topic has 4 replies, 3 voices, and was last updated May 25, 2012 18:11:29 by BAC.
February 3, 2012 at 12:59 #26674
I am using PD4ML within ASP.NET to output reports to PDF for my users. With very large reports (500+ pages), the product runs extremely slowly and sometimes crashes the web services on the server. If it runs to completion, it may take 25 minutes! In contrast, the same report run on the same server using Crystal Reports to output to PDF takes a few seconds. Any thoughts? Thanks!
February 3, 2012 at 13:57 #28857
I hope you compared equivalent functions of the two tools, i.e. HTML-to-PDF conversion.
The PDF output phase of PD4ML itself is quite quick: about 2-3% of the total conversion time.
But HTML rendering, the first phase of the conversion, is a resource-consuming task. PD4ML does not use native HTML rendering engines (like WebKit or Mozilla); it is 100% Java, and its .NET version is managed .NET code. Managed code has its benefits and, of course, disadvantages such as performance penalties.
For example, it instantiates a .NET object even for a standalone whitespace. Bearing in mind the general .NET overhead in CPU and RAM usage, we simply would not recommend converting documents that big.
If you definitely need to, I would recommend revising the document layout. First, make sure the document is not one huge table. PD4ML lays out all the pages in memory before it writes anything out. A cell on, let's say, page #350 that is a bit wider than the previous cells of the same column forces a re-layout of the previous 349 pages, because it changes the layout of the entire table.
So split big tables into smaller ones.
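A minimal sketch of that advice, assuming the rows are already available as rendered `<tr>` strings (in a JSP you would more likely close and reopen the `<table>` every N rows inside the loop). `TableSplitter` and `splitIntoTables` are hypothetical names, not part of PD4ML, and repeating the header row in each chunk is just a readability choice.

```java
import java.util.List;

// Hypothetical pre-processing helper (not a PD4ML API): turns one long table
// into a series of smaller tables of at most rowsPerTable rows each.
final class TableSplitter {

    static String splitIntoTables(String headerRow, List<String> rows, int rowsPerTable) {
        if (rowsPerTable <= 0) {
            throw new IllegalArgumentException("rowsPerTable must be positive");
        }
        StringBuilder html = new StringBuilder();
        for (int start = 0; start < rows.size(); start += rowsPerTable) {
            int end = Math.min(start + rowsPerTable, rows.size());
            html.append("<table>\n");
            // Repeat the header row so every chunk is readable on its own.
            if (headerRow != null) {
                html.append(headerRow).append('\n');
            }
            for (String row : rows.subList(start, end)) {
                html.append(row).append('\n');
            }
            html.append("</table>\n");
        }
        return html.toString();
    }
}
```

Going by the explanation above, each smaller table is laid out on its own, so a wide cell late in the document should no longer force earlier chunks to be re-laid out. The trade-off is that column widths may differ between chunks unless the columns are given explicit widths.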
Second, try to avoid nesting tables where possible. Each table cell is laid out 3 times (MIN, MAX, OPTIMAL). Each cell of a nested table is laid out 9 times. If the nesting level is 2, a cell is laid out 27 times, and so on.
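The numbers above grow geometrically: taking a cell of a top-level table as nesting level 0, a cell is laid out 3^(level+1) times. A tiny helper (hypothetical, for illustration only) makes the arithmetic concrete:

```java
// Hypothetical helper that reproduces the layout-pass arithmetic from the reply above.
final class NestedTableCost {

    /**
     * Layout passes for a single cell: a cell of a non-nested table (level 0) is laid out
     * 3 times (MIN, MAX, OPTIMAL), and each extra level of nesting multiplies that by 3.
     */
    static long layoutPassesPerCell(int nestingLevel) {
        if (nestingLevel < 0) {
            throw new IllegalArgumentException("nestingLevel must be >= 0");
        }
        long passes = 3;
        for (int i = 0; i < nestingLevel; i++) {
            passes *= 3;
        }
        return passes;
    }
}
```

`layoutPassesPerCell(0)`, `(1)` and `(2)` return 3, 9 and 27, matching the reply; if the pattern continues, a cell three levels deep would be laid out 81 times.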
May 23, 2012 at 15:07 #28858
We’ve been using the Java version of PD4ML for several years, and it seems that we’re hitting a wall somewhere above 1000 pages. For example, a 500-page PDF might take about 9 minutes, which we find acceptable; however, a 1700-page PDF might take several hours. We’re trying to resolve this issue, and any suggestions would be appreciated. Here are some statistics for this example:
1) We’re using JSP to produce HTML that is 16-17 MB in size
2) There are only a few small images (~100 KB in size)
3) The resulting PDF is about 7.5 MB
4) We’re calling PD4ML.render(java.net.URL url, java.io.OutputStream os)
5) PD4ML v3.80fx3
After upgrading to v3.80fx3, we implemented the PD4ProgressListener to try to profile the process. It turns out that all the extra time is spent in “step 92: layouting…”. We tried saving the HTML to a static file (i.e. Mozilla: Save As), then used PD4ML.render(java.io.InputStreamReader isr, java.io.OutputStream os, java.net.URL base) to create the PDF file, and it seemed to create the PDF in about 9 minutes (acceptable). We also confirmed that it takes about 4 minutes to produce and stream the HTML from the JSP. Do you have any idea why it would go from 9 minutes for the static HTML to several hours for the same HTML streamed via a URL?
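A minimal sketch of the comparison the poster describes, assuming Java 9+ (`InputStream.readAllBytes`) and a known page charset: fetch the generated page completely into memory first, then hand the renderer an `InputStreamReader` over that copy, with the page URL as the base. `BufferedRender` and the `ReaderRenderer` interface are hypothetical; the interface only stands in for the `PD4ML.render(InputStreamReader, OutputStream, URL)` call named in the post, so the sketch compiles without the library. Note that the thread later attributes the slowdown to a CSS rule rather than to streaming.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.Charset;

final class BufferedRender {

    /** Hypothetical stand-in for the PD4ML.render(InputStreamReader, OutputStream, URL) call. */
    @FunctionalInterface
    interface ReaderRenderer {
        void render(InputStreamReader html, OutputStream pdf, URL base) throws IOException;
    }

    /** Reads the entire HTML stream into memory, then closes it. */
    static byte[] snapshot(InputStream in) {
        try (InputStream src = in) {
            return src.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Downloads the generated page completely, then renders from the in-memory copy,
     * so the renderer never reads from the live HTTP stream.
     */
    static void renderFromSnapshot(URL page, Charset charset, OutputStream pdf,
                                   ReaderRenderer renderer) {
        byte[] html = snapshot(openPage(page));
        InputStreamReader reader = new InputStreamReader(new ByteArrayInputStream(html), charset);
        try {
            // The page URL doubles as the base for resolving relative image and CSS links.
            renderer.render(reader, pdf, page);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static InputStream openPage(URL page) {
        try {
            return page.openStream();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the library on the classpath, the `renderer` argument would be a lambda that forwards to the `render` call named in the post.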
May 23, 2012 at 15:22 #28859
A common reason for such problems is that the document layout is built as a single huge table. If that is your case, try to split it into smaller ones.
Also, a borderless table with a single cell is sometimes used to limit the horizontal width of content. If that is your case, replace it with a
May 25, 2012 at 18:11 #28860
It looks like we have found the problem. In our CSS, we had a “position: relative” style on a tag that wraps almost the entire document (~1700 pages). My guess is that after the document was basically laid out, this style caused PD4ML to go back and try to shift every character or element in the PDF. Once we removed this style, everything seemed to work fine, and PD4ML generated the PDF from the HTML in a reasonable amount of time.
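Since a single “position: relative” on a wrapper element turned out to be the culprit here, it may be worth scanning stylesheets (or HTML with inline styles) for that declaration before converting very large documents. A naive, line-based sketch; `PositionRelativeLint` is a hypothetical helper, and a regex scan ignores CSS comments, selectors and the cascade, so treat every hit only as a candidate to review by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical lint helper: flags lines that declare "position: relative".
final class PositionRelativeLint {

    // Property names and keywords in CSS are case-insensitive; allow optional spaces around ':'.
    private static final Pattern POSITION_RELATIVE =
            Pattern.compile("position\\s*:\\s*relative", Pattern.CASE_INSENSITIVE);

    /** Returns the 1-based numbers of the lines that contain the declaration. */
    static List<Integer> findPositionRelative(String cssOrHtml) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = cssOrHtml.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            if (POSITION_RELATIVE.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

The helper only reports locations rather than rewriting the CSS, because whether a given “position: relative” is needed for the layout is a judgment call the poster made by hand for their wrapper element.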
The forum ‘Troubleshooting’ is closed to new topics and replies.