HTML to PDF converter for Java and .NET


PD4ML Web: PD4ML API for PhantomJS converter


PD4ML Web is an alternative HTML to PDF conversion approach based on the PhantomJS (Qt + WebKit) runtime. It implements the well-known PD4ML Java API (with minor differences), so you can switch to another converter engine with minimal effort if your application hits one of the generic limitations of PD4ML Java.

The PhantomJS/WebKit HTML renderer of PD4ML Web has a number of advantages over regular PD4ML: it supports JavaScript, covers more of the HTML/CSS standards, and performs better when converting very large HTML documents. In general it is suitable for web site capture, a domain where PD4ML Java is not strong enough, primarily because it lacks JavaScript support.

PhantomJS is open source, BSD-licensed software; it can be compiled for a variety of platforms, including headless Linux/UNIX environments.

Unfortunately, the PD4ML Web/PhantomJS conversion approach also has some drawbacks. For instance, each conversion request forces the JVM to start a new PhantomJS process, which is expensive in terms of time and CPU usage. More pros and cons are summarized in the comparison table below.
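
Because every conversion starts its own PhantomJS process, an application that serves many requests may want to cap how many conversions run at the same time. The following is only a minimal sketch of that idea using a standard Java semaphore. ConversionThrottle is a hypothetical helper and not part of the PD4ML API, and the task passed to run() is a placeholder for the real conversion call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/** Hypothetical wrapper (not part of PD4ML): caps simultaneous conversions. */
final class ConversionThrottle {

    private final Semaphore permits;

    ConversionThrottle(int maxParallel) {
        // Fair semaphore: waiting requests are served in arrival order.
        this.permits = new Semaphore(maxParallel, true);
    }

    /** Runs the task once a permit is free; the permit is released even on failure. */
    <T> T run(Callable<T> conversion) throws Exception {
        permits.acquire();
        try {
            return conversion.call();
        } finally {
            permits.release();
        }
    }

    /** Number of conversions that could start right now. */
    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        ConversionThrottle throttle = new ConversionThrottle(2);
        // Placeholder task; a real one would invoke the PD4ML Web conversion here.
        String result = throttle.run(() -> "converted");
        System.out.println(result);
    }
}
```

The limit of two parallel conversions is an arbitrary value for illustration; a suitable number depends on the CPU and memory available to the PhantomJS processes.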


How to switch to PD4ML Web

The PD4ML Web API matches the regular PD4ML API, but its PD4ML class is located in the org.zefer.pd4ml.web package. The first step is to add the PD4ML Web JAR to the classpath and change the import directive from:

import org.zefer.pd4ml.PD4ML;

to:

import org.zefer.pd4ml.web.PD4ML;

The PD4ML Web class constructor expects the path to the PhantomJS executable (e.g. phantomjs.exe) as a parameter:

PD4ML pd4ml = new PD4ML( "tools/phantomjs.exe" );

That is it. The application should now generate PDFs using the alternative converter engine. After the change, PDF layouts will most likely differ slightly (or sometimes seriously) from the previous ones, and some API calls will have no effect. The comparison table below explains why.
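
Since the constructor needs a path to a platform-specific binary, it can help to resolve that path in one place and fail early if the file is missing. The sketch below assumes the binary is named phantomjs.exe on Windows and phantomjs elsewhere, and that it lives in a tools directory as in the example above. PhantomJsLocator is a hypothetical helper, not part of PD4ML Web; the constructor call is left as a comment so the sketch compiles without the PD4ML Web JAR.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical helper (not part of PD4ML Web): resolves the PhantomJS binary path. */
final class PhantomJsLocator {

    private PhantomJsLocator() {}

    /** Returns the assumed binary file name for the given "os.name" value. */
    static String executableName(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "phantomjs.exe" : "phantomjs";
    }

    /** Builds the path under toolsDir and fails early if the binary is missing. */
    static String resolve(File toolsDir, String osName) {
        File binary = new File(toolsDir, executableName(osName));
        if (!binary.isFile()) {
            throw new IllegalStateException("PhantomJS binary not found: " + binary.getPath());
        }
        return binary.getPath();
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        File expected = new File("tools", executableName(osName));
        System.out.println("Expected binary: " + expected.getPath());
        // With the PD4ML Web JAR on the classpath, the resolved path would be passed on:
        // PD4ML pd4ml = new PD4ML( PhantomJsLocator.resolve(new File("tools"), osName) );
    }
}
```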

PD4ML vs. PD4ML Web

The PhantomJS PDF generator provides only basic PDF output. PD4ML Web adds some of the missing features in a post-processing phase. The post-processing also fixes a number of PhantomJS PDF generation issues (such as corrupted font kerning). The table below summarizes the differences and comments on some of them.


| Feature | PD4ML | PD4ML Web | PD4ML Web comments |
|---|---|---|---|
| HTML 3 | * | * | In general PD4ML Web provides better HTML/CSS standards compliance, which makes it a good choice for tasks such as web site capture. |
| HTML 4 | * (with some limitations) | * | |
| HTML 5 | - (only selected tags supported) | * (with some limitations) | |
| CSS 1 | * | * | |
| CSS 2 | * (with some limitations) | * | |
| CSS 3 | - (some properties/selector types supported) | * (with some limitations) | |
| JavaScript/DOM | - | * | |
| SVG | * (with some limitations) | * (with some limitations) | |
| HTML to Image | * (PNG, multipage TIFF) | * (PNG) | More image types to be supported |
| HTML to RTF | * | - | To be supported soon |
| Table of contents | * | - | To be supported soon |
| Hyperlinks | * | - | |
| PDF bookmarks | * | - | |
| PDF headers / footers with images | * | * | |
| Variable height PDF headers / footers | * | - | No workaround |
| Secured PDF | * | - | To be supported soon |
| PDF metadata (author, title, keywords etc) | * | * | |
| Multicolumn PDF layout | * | - | |
| Conditional page breaks | * | - | To be supported soon |
| PDF/A output | * (with PD4ML DMS edition) | - | |
| PDF merge | * | * | |
| Dynamic page orientation and format change | * | - | No workaround |
| Output page range | * | - | To be supported soon |
| PDF forms | * | - | |
| Footnotes | * | ? | |
| PDF attachments | * | - | To be supported soon |
| Fit document to a single page | * | - | |
| Custom resource loaders | * | - | |
| PDF action handlers | * | - | To be supported soon |
| Conversion progress monitoring | * | - | |

"To be supported soon" marks features we are currently working on. "No workaround" marks issues for which we currently see no solution or workaround.
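
When an application has a fixed list of required features, the table can be turned into a simple lookup that suggests which engine fits. The sketch below encodes only a handful of rows, reading "*" as supported and "-" as not supported. EngineChooser and its enum names are hypothetical and not part of either API, and preferring regular PD4ML when both engines qualify is an arbitrary choice made for this illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical lookup (not part of PD4ML): picks an engine from required features. */
final class EngineChooser {

    enum Engine { PD4ML, PD4ML_WEB, NONE }

    /** A subset of the rows of the comparison table. */
    enum Feature { JAVASCRIPT, HTML_TO_RTF, HYPERLINKS, PDF_BOOKMARKS, PDF_METADATA, PDF_MERGE, PDF_FORMS }

    // Rows marked "*" in the PD4ML column.
    static final Set<Feature> PD4ML_FEATURES = EnumSet.of(
            Feature.HTML_TO_RTF, Feature.HYPERLINKS, Feature.PDF_BOOKMARKS,
            Feature.PDF_METADATA, Feature.PDF_MERGE, Feature.PDF_FORMS);

    // Rows marked "*" in the PD4ML Web column.
    static final Set<Feature> PD4ML_WEB_FEATURES = EnumSet.of(
            Feature.JAVASCRIPT, Feature.PDF_METADATA, Feature.PDF_MERGE);

    private EngineChooser() {}

    /** Returns the first engine that covers every required feature, or NONE. */
    static Engine choose(Set<Feature> required) {
        if (PD4ML_FEATURES.containsAll(required)) {
            return Engine.PD4ML;
        }
        if (PD4ML_WEB_FEATURES.containsAll(required)) {
            return Engine.PD4ML_WEB;
        }
        return Engine.NONE;
    }

    public static void main(String[] args) {
        System.out.println(choose(EnumSet.of(Feature.JAVASCRIPT, Feature.PDF_METADATA)));
        System.out.println(choose(EnumSet.of(Feature.HYPERLINKS, Feature.PDF_FORMS)));
        System.out.println(choose(EnumSet.of(Feature.JAVASCRIPT, Feature.HYPERLINKS)));
    }
}
```

A requirement set that mixes JavaScript with hyperlinks yields NONE, which reflects the trade-off shown in the table: at the time of writing neither engine covers both.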

Patched version of PhantomJS for PD4ML Web

You can always download a pre-built version of PhantomJS from the official site. However, an important patch had not been applied to the current official version at the time of writing. The patch disables automatic content scaling and makes a workaround for the "font kerning problem" possible. The PD4ML Web distribution includes patched PhantomJS binaries. If you prefer to use an "official" version of PhantomJS, set the patchedPhantomJS parameter of the PD4ML constructor to false:

PD4ML pd4ml = new PD4ML( "tools/phantomjs.exe", false );

In the future we plan to patch some more issues that PhantomJS inherits from WebKit/Qt, primarily the missing hyperlink support.
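
If the choice between the patched and the official binary differs between environments, the flag can be read from configuration instead of being hard-coded. This is a minimal sketch only: the system property name pd4ml.phantomjs.patched and the PhantomJsSettings class are invented for illustration, and the default of true assumes that the bundled patched binaries are in use. The constructor call is left as a comment so the sketch compiles without the PD4ML Web JAR.

```java
/** Hypothetical settings holder (not part of PD4ML Web). */
final class PhantomJsSettings {

    /** Hypothetical property name, chosen for this sketch only. */
    static final String PATCHED_PROPERTY = "pd4ml.phantomjs.patched";

    private PhantomJsSettings() {}

    /** Returns true unless the property is set to something other than "true". */
    static boolean usePatchedBinary() {
        return Boolean.parseBoolean(System.getProperty(PATCHED_PROPERTY, "true"));
    }

    public static void main(String[] args) {
        boolean patched = usePatchedBinary();
        System.out.println("patchedPhantomJS = " + patched);
        // With the PD4ML Web JAR on the classpath:
        // PD4ML pd4ml = new PD4ML( "tools/phantomjs.exe", patched );
    }
}
```

Running the JVM with -Dpd4ml.phantomjs.patched=false would then select the official binary without a code change.
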
Additional information
  1. PhantomJS official site
  2. PD4ML Web download
  3. PD4ML API reference


Copyright ©2004-21 zefer|org. All rights reserved.