Refactored PD4ML API

  1. New class package structure
    As of PD4ML v4, the public API classes have moved from the org.zefer package to the com.pd4ml package. The fully qualified name of the main converter class is now com.pd4ml.PD4ML. For backward compatibility we provide a wrapper class, org.zefer.PD4ML (with accompanying utility classes), which makes it possible to use the newest pd4ml.jar in applications compiled against PD4ML v3. The wrapper translates old API calls to new ones where possible.
  2. Separated source read and target write methods
    In the new API the conversion process is split into two phases: reading and parsing HTML with the pd4ml.readHTML(...) method, and writing the target format with pd4ml.writePDF(...), pd4ml.writeRTF(...) or pd4ml.renderAsImages(...). This approach lets you read a source document once and write several output types from it. It also makes it possible to analyze metrics of the parsed document (for example, the maximal width of the HTML content) and to choose the most suitable target paper format.
  3. Less dependency on Java AWT
    Unfortunately, for the time being it is not possible to completely avoid Java AWT classes in PD4ML: AWT is used to read font metrics, to write to BufferedImage, etc. The new version does reduce the AWT dependency in the public API: AWT classes are replaced with more suitable custom ones (for example, com.pd4ml.PageMargins instead of java.awt.Insets).
  4. Utility methods
    PD4ML v4 includes many features that were previously available only in command-line mode or in third-party tools. You can now index a font directory, analyze PDF documents, merge PDFs, remove selected pages, apply or reset PDF security settings, update metadata, etc. directly from your Java application.
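
The metrics-driven choice of a paper format mentioned in item 2 can be sketched in plain Java. The snippet below is a minimal illustration, not PD4ML code: PaperFormatChooser, its candidate list and the margin value are hypothetical, and it assumes the content width has already been obtained from the parsed document and converted to points.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: picks a paper format wide enough for measured content. */
class PaperFormatChooser {

    /** A paper format with its page width in points (1 pt = 1/72 inch). */
    static final class Format {
        final String name;
        final int widthPt;

        Format(String name, int widthPt) {
            this.name = name;
            this.widthPt = widthPt;
        }
    }

    // Candidates ordered from narrowest to widest (ISO 216 sizes rounded to points).
    static final List<Format> CANDIDATES = Arrays.asList(
            new Format("A5 portrait", 420),
            new Format("A4 portrait", 595),
            new Format("A4 landscape", 842),
            new Format("A3 landscape", 1191));

    /**
     * Returns the narrowest candidate whose printable width (page width minus
     * the left and right margins) can hold the content. Falls back to the
     * widest candidate if nothing fits.
     */
    static Format choose(int contentWidthPt, int sideMarginPt) {
        for (Format f : CANDIDATES) {
            if (f.widthPt - 2 * sideMarginPt >= contentWidthPt) {
                return f;
            }
        }
        return CANDIDATES.get(CANDIDATES.size() - 1);
    }

    public static void main(String[] args) {
        // The width would come from the parsed document metrics (value assumed here).
        System.out.println(choose(700, 36).name);
    }
}
```

With 700 pt of content and 36 pt side margins the sketch selects A4 landscape, because A4 portrait leaves only 523 pt of printable width. In a real application the selected format would be applied between the pd4ml.readHTML(...) call and the write call.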

New Features

  1. Page marks and decoration elements
    In addition to the <pd4ml:page.header> and <pd4ml:page.footer> tags known from previous versions, we added the new proprietary tags <pd4ml:page.background> and <pd4ml:watermark>. All of these tags let you define a page header, footer, background or watermark in HTML and specify the page scope it applies to (front page, even pages, odd pages or an explicitly specified page number range). The tags can be placed directly into the source HTML or applied with the corresponding PD4ML API methods.
  2. Plug-in interface for custom tags
    With the new PD4ML it is possible to register a custom HTML tag and assign a custom Java handler class to it. The handler receives the parsed tag attributes, the tag's outer HTML as a string, and a graphics context to print to (draw text, graphics primitives, etc.). A typical application of the interface is plugging in external SVG or MathML renderers.
  3. Web fonts support
    In addition to the regular way of embedding TTFs (taken from a local preconfigured font directory or from fonts.jar), PD4ML supports referencing Web fonts using the standard CSS syntax. If a web font is specified, PD4ML downloads it from the provided URL (either remote or local) and uses it in the conversion process. Currently the TTF, OTF (with TrueType font outlines) and WOFF font file types are supported. WOFF2 support is planned for a later release.
  4. Endnotes support
    In addition to footnotes, we implemented a proprietary <pd4ml:endnote> tag. If the tag is present in the source HTML, it is substituted with an endnote index, and all nested content is moved to the end of the document and listed under that index, similar to footnotes. The feature can be useful for creating a bibliography section, an index of graphs, etc.
  5. Print and/or screen targeted document watermarking
    You can now specify arbitrary HTML content as a watermark (applying transparency, angle and other properties to it). Additionally, the watermark can be targeted at a particular medium: for example, no watermark in screen view, but a watermarked print output.
  6. PDF/UA support
    The new PD4ML architecture has been designed with PDF/UA (the ISO 14289 international standard for accessible PDF technology) in mind. PD4ML can now output Tagged PDF that conforms to the PDF/UA and PDF/A-2a standards.
  7. Visual refinements
    The new version changes the look and feel of form widgets (now defined with scalable vector graphics), and adds support for rounded borders and partial support for gradient fills.
  8. HTML injection
    A new PD4ML API method allows you to virtually inject an arbitrary portion of HTML code right after the opening <body> tag or just before the closing </body> tag of a source document.
  9. Page Number Tag
    The new version adds support for a proprietary <pd4ml:page.number> tag. Without attributes, the tag is replaced with the total number of pages in the resulting document. With the "of" attribute, as in <pd4ml:page.number of="anchorName">, the tag is replaced with the number of the page that contains the referenced <a name="anchorName"> or an element with id="anchorName".
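
The endnote behavior described in item 4 can be illustrated with a small, self-contained Java sketch. It only mimics the described substitution on a plain string and is not how PD4ML implements it: the class EndnoteSketch, the "[n]" index style and the plain-text list at the end are assumptions made for the illustration, and nested endnotes are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: mimics the described endnote substitution on a string. */
class EndnoteSketch {

    // Matches one <pd4ml:endnote>...</pd4ml:endnote> element (non-greedy).
    private static final Pattern ENDNOTE =
            Pattern.compile("<pd4ml:endnote>(.*?)</pd4ml:endnote>", Pattern.DOTALL);

    static String moveEndnotesToEnd(String html) {
        List<String> notes = new ArrayList<>();
        Matcher m = ENDNOTE.matcher(html);
        StringBuffer body = new StringBuffer();
        while (m.find()) {
            notes.add(m.group(1));
            // Replace the tag with its 1-based index; the "[n]" style is assumed.
            m.appendReplacement(body, Matcher.quoteReplacement("[" + notes.size() + "]"));
        }
        m.appendTail(body);

        // Append the collected endnote contents, one indexed line per note.
        StringBuilder out = new StringBuilder(body);
        for (int i = 0; i < notes.size(); i++) {
            out.append("\n[").append(i + 1).append("] ").append(notes.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "Water boils at 100 C<pd4ml:endnote>At sea level.</pd4ml:endnote>"
                + " and freezes at 0 C<pd4ml:endnote>For pure water.</pd4ml:endnote>.";
        System.out.println(moveEndnotesToEnd(html));
    }
}
```

The sketch shows the two effects named above: each tag collapses to an index at its original position, and the nested content reappears at the end under the same index.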

More HTML5/CSS3 Support

  1. HTML5 tags support
    The new HTML rendering engine is optimized for HTML5. We do not claim full support of the HTML5 specification: some features are irrelevant for PDF/RTF conversion, and we have probably undervalued the usefulness of some others. But the most important tags and features are already there. Thanks to the new architecture, any missing feature can be added with small or moderate effort.
  2. Full HTML table tags support
    We completely refactored the table rendering subsystem and implemented sophisticated table page break logic. It now also supports all previously ignored table-specific tags, such as <caption>, <tbody>, <thead>, <col>, etc., and correctly implements fixed table layout (in addition to the default auto table layout). The table layout logic has also been reworked for better performance.
  3. Selected CSS at-rule directives support
    PD4ML v4 introduces support for the @font-face and @page CSS at-rules. They are intended, respectively, to download TTF/WOFF fonts and register them in the font cache, and to define a document-specific target page format (including margins).
  4. More CSS functions supported
    The improved CSS parser and cascading engine of PD4ML implements new CSS functions for color value computation (rgb(), rgba(), hsl(), hsla()), opacity control (alpha()) and general calculations (calc()). Support for more functions is planned.
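
As a rough illustration of items 3 and 4, the following Java sketch assembles a source document whose stylesheet uses the @font-face and @page at-rules together with a few of the listed CSS functions. It is a sketch under assumptions: the class StyledDocumentSketch, the font name and the font URL are placeholders, and the descriptors follow standard CSS syntax, which PD4ML is assumed to accept in this form.

```java
/** Illustration only: builds a small HTML document using the CSS features above. */
class StyledDocumentSketch {

    static String buildHtml(String fontFamily, String fontUrl) {
        String css =
              // Web font: downloaded from the given URL and registered under the family name.
              "@font-face { font-family: '" + fontFamily + "'; src: url('" + fontUrl + "'); }\n"
              // Document-specific page format, including margins.
            + "@page { size: A4; margin: 20mm; }\n"
            + "body { font-family: '" + fontFamily + "', sans-serif; }\n"
              // Color and calculation functions from the list above.
            + "h1 { color: hsl(210, 60%, 40%); }\n"
            + ".box { width: calc(100% - 40px); background: rgba(0, 0, 0, 0.1); }\n";
        return "<html><head><style>\n" + css + "</style></head>\n"
             + "<body><h1>Report</h1><div class=\"box\">Content</div></body></html>";
    }

    public static void main(String[] args) {
        // Font name and URL are placeholders, not real resources.
        System.out.println(buildHtml("ReportSans", "fonts/ReportSans.ttf"));
    }
}
```

The resulting string would then be passed to the converter as the source HTML. The stylesheet is deliberately placed in the document head rather than in the body.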

Performance Improvements

  1. New optimized single-pass HTML parser
    As a significant part of our effort to improve performance, we developed an optimized single-pass HTML parser. The parser implicitly normalizes non-well-formed HTML and builds a DOM-like document representation in RAM. A second parsing pass is triggered only if the source document overrides the document encoding or if the parser encounters a <style> section nested in <body>, as the style can potentially affect the already parsed part of the document.
  2. New resource cache
    All resource requests from PD4ML (to load images, stylesheets, fonts, etc.) are dispatched via the new cache engine. The cache stores frequently used items locally (in RAM or in the temp directory), tracks their expiration time (if specified by HTTP), and cleans up cache RAM if the cache exceeds a reasonable size. The cache engine also solves the JDK issue of flooding the temp directory with locked objects, created on each java.awt.Font.deriveFont(...) API call.
  3. New font engine
    The completely reimplemented PD4ML font engine efficiently looks up the most suitable font from the list of available ones, matching the requested font family, face and style as well as the font's capacity to render a given text string. The font engine can handle multiple font folders, deal with downloaded web fonts, and auto-index and use system fonts (optionally filtered by specified criteria).
  4. New PDF/Image output modules
    PD4ML implements new PDF and raster image output subsystems, optimized for better performance and a smaller memory footprint. The RTF output module is ported from the latest PD4ML v3 with minimal changes and still shows great results and performance.
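
The caching behavior outlined in item 2 (keeping frequently used items, honoring expiration times, trimming the cache when it grows too large) can be sketched with standard Java collections. This is a generic illustration and not PD4ML's cache engine: ResourceCacheSketch and all of its members are hypothetical, the size limit is counted in entries rather than bytes, and the caller passes the current time explicitly to keep the example deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: a size-bounded cache with per-item expiry. */
class ResourceCacheSketch {

    /** Cached bytes plus the moment they stop being valid. */
    private static final class CachedItem {
        final byte[] data;
        final long expiresAtMillis;

        CachedItem(byte[] data, long expiresAtMillis) {
            this.data = data;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final LinkedHashMap<String, CachedItem> items;

    ResourceCacheSketch(final int maxEntries) {
        // accessOrder = true keeps the least recently used item first,
        // so it is the one dropped when the limit is exceeded.
        this.items = new LinkedHashMap<String, CachedItem>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedItem> eldest) {
                return size() > maxEntries;
            }
        };
    }

    void put(String url, byte[] data, long expiresAtMillis) {
        items.put(url, new CachedItem(data, expiresAtMillis));
    }

    /** Returns the cached bytes, or null if the item is absent or has expired. */
    byte[] get(String url, long nowMillis) {
        CachedItem item = items.get(url);
        if (item == null) {
            return null;
        }
        if (nowMillis >= item.expiresAtMillis) {
            items.remove(url); // expired items are dropped on access
            return null;
        }
        return item.data;
    }

    int size() {
        return items.size();
    }
}
```

A caller would check get(...) first and download the resource only on a null result, storing it with the expiration time taken from the HTTP response when one is provided.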

Product distribution changes

  1. Apache Maven
    The PD4ML development process is based on the project object model (POM) concept of Apache Maven. Maven allows us to manage the project's build, testing, reporting and documentation from a central piece of information. From the customer's perspective, the major benefit of this Maven-centric approach is the instant availability of the newest versions or nightly snapshot builds in our public Maven software repository.
    Of course, customers can still obtain PD4ML from the usual software download area.
  2. Continuous delivery
    The new PD4ML development infrastructure is built on continuous delivery (CD) principles. The development process is organized to produce software in short cycles, ensuring that the software can be automatically and reliably released at any time.
  3. License API
    As of PD4ML v4, we build and deliver identical binaries for all license types. License type-specific features are switched on or off depending on the license activation code. The code can be passed directly to the PD4ML constructor as a string or provided to the PD4ML API as a pd4ml.lic file.