Refactored PD4ML API
New class package structure
Since PD4ML v4, the public API classes have moved from the org.zefer package to the com.pd4ml package. The fully qualified name of the main converter class is now com.pd4ml.PD4ML. For backward compatibility we created a wrapper, org.zefer.PD4ML (with accompanying utility classes), which makes it possible to use the newest pd4ml.jar in applications compiled against PD4ML v3. The wrapper class translates the old API calls to the new ones where possible.
Separated source read and target write methods
In the new API the conversion process is split into two phases: reading and parsing HTML with the pd4ml.readHTML(...) method, and writing the target format with the pd4ml.writePDF(...), pd4ml.writeRTF(...) and pd4ml.renderAsImages(...) methods. This approach makes it possible to read a source document only once and write multiple output types from it. It also makes it possible to analyze the metrics of the parsed document (for example, the maximum width of the HTML content) and to choose the most suitable target paper format.
Less dependency on Java AWT
Unfortunately, it is not yet possible to eliminate the use of Java AWT classes in PD4ML completely: AWT is still used to read font metrics, to write to a BufferedImage, and so on. The new version does, however, reduce the Java AWT dependency in the public API: AWT classes are replaced with more suitable custom ones (for example, com.pd4ml.PageMargins instead of java.awt.Insets).
Utility methods
PD4ML v4 includes many features that were previously available only in command-line mode or in third-party tools. You can now index a font directory, analyze PDF documents, merge PDFs, remove selected pages, apply or reset PDF security settings, update metadata and more, directly from your Java application.
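The benefit of the read/write split described under "Separated source read and target write methods" can be illustrated with the paper-format choice mentioned there. The following is only a sketch in plain Java: the PaperChooser class, its format table and the assumption that the content width is already known in points are all hypothetical. The PD4ML call that reports the width is not shown, because the text above does not name it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PD4ML: picks the smallest portrait ISO
// paper format whose printable width fits a measured content width.
public class PaperChooser {

    // ISO 216 portrait page widths in points (1 pt = 1/72 inch), rounded.
    private static final Map<String, Double> WIDTHS_PT = new LinkedHashMap<>();
    static {
        WIDTHS_PT.put("A5", 420.0);
        WIDTHS_PT.put("A4", 595.0);
        WIDTHS_PT.put("A3", 842.0);
        WIDTHS_PT.put("A2", 1191.0);
    }

    // contentWidthPt: widest content found after the read phase (assumed input).
    // horizontalMarginsPt: left plus right page margin.
    public static String choose(double contentWidthPt, double horizontalMarginsPt) {
        for (Map.Entry<String, Double> e : WIDTHS_PT.entrySet()) {
            if (contentWidthPt <= e.getValue() - horizontalMarginsPt) {
                return e.getKey();
            }
        }
        // Nothing fits: fall back to the largest format in the table.
        return "A2";
    }

    public static void main(String[] args) {
        System.out.println(choose(500.0, 72.0));
        System.out.println(choose(700.0, 72.0));
    }
}
```

In a real application the chosen format would be applied between the read phase and the write phase, so the source document is still parsed only once.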
New Features
Page marks and decoration elements
In addition to the <pd4ml:page.header> and <pd4ml:page.footer> tags known from previous versions, we added the new proprietary tags <pd4ml:page.background> and <pd4ml:watermark>. All of these tags let you define a page header, footer, background or watermark in HTML and specify the page scope it applies to (front page, even pages, odd pages or an explicitly specified page number range). The tags can be placed directly into the source HTML, or they can be applied with the corresponding PD4ML API methods.
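As a rough illustration of placing these tags directly into the source HTML, the sketch below builds a document string in plain Java. Only the tag names come from the text above. The closing-tag form and the PageDecorationSample class with its method are assumptions made for illustration, and the attributes that select the page scope are left out because they are not named here.

```java
// Hypothetical helper, not part of PD4ML: wraps body HTML with the
// proprietary header and footer tags named in the text.
public class PageDecorationSample {

    public static String decorate(String bodyHtml, String headerHtml, String footerHtml) {
        return "<html><body>\n"
             // Closing tags are assumed to mirror the opening tags.
             + "<pd4ml:page.header>" + headerHtml + "</pd4ml:page.header>\n"
             + "<pd4ml:page.footer>" + footerHtml + "</pd4ml:page.footer>\n"
             + bodyHtml + "\n"
             + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(decorate("<p>Report text</p>", "<b>ACME Report</b>", "<i>Confidential</i>"));
    }
}
```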
Plug-in interface for custom tags
With the new PD4ML it is possible to register a custom HTML tag and assign a custom Java handler class to it. The handler receives the parsed tag attributes, the tag's outer HTML as a string, and a graphics context to print to (to draw text, graphics primitives, etc.). A typical application of the interface is plugging in external SVG and MathML renderers.
Web fonts support
In addition to the regular way of embedding TTFs (taken from a local preconfigured font directory or from fonts.jar), PD4ML supports referencing Web fonts using the standard CSS syntax. If a Web font is specified, PD4ML downloads it from the provided URL (either remote or local) and uses it in the conversion process. Currently the TTF, OTF (with TrueType outlines) and WOFF font file types are supported. WOFF2 support is planned for a later release.
Endnotes support
In addition to footnotes support, we implemented a proprietary <pd4ml:endnote> tag. If the tag is present in the source HTML, it is replaced with an endnote index, and all of its nested content is moved to the end of the document and listed by index, similar to footnotes. The feature can be useful for creating a bibliography section, an index of graphs, and so on.
Print and/or screen targeted document watermarking
You can now specify arbitrary HTML content as a watermark and apply properties such as transparency and angle to it. Additionally, the watermark can be targeted at a particular medium: for example, no watermark in screen view, but watermarked print output.
PDF/UA support
The new PD4ML architecture was designed with PDF/UA (the international standard ISO 14289 for accessible PDF technology) in mind. PD4ML can now output Tagged PDF that conforms to the PDF/UA and PDF/A-2a standards.
Visual refinements
The new version changes the look and feel of form widgets (now defined with scalable vector graphics) and adds support for rounded borders and partial support for gradient fills.
HTML injection
A new PD4ML API method makes it possible to virtually inject an arbitrary portion of HTML code right after the opening <body> tag or just before the closing </body> tag of a source document.
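To make the effect of such an injection concrete, the sketch below performs the equivalent operation on an HTML string in plain Java. It is a model of the result only: the HtmlInjectionModel class and its methods are hypothetical, and the PD4ML API method itself is not used because the text above does not name it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical string-level model of the two injection points, not PD4ML code.
public class HtmlInjectionModel {

    private static final Pattern BODY_OPEN =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern BODY_CLOSE =
            Pattern.compile("</body\\s*>", Pattern.CASE_INSENSITIVE);

    // Inserts the snippet right after the first opening <body ...> tag.
    public static String injectAfterBodyOpen(String html, String snippet) {
        Matcher m = BODY_OPEN.matcher(html);
        if (!m.find()) {
            return snippet + html; // no <body> tag: prepend as a fallback
        }
        return html.substring(0, m.end()) + snippet + html.substring(m.end());
    }

    // Inserts the snippet just before the last closing </body> tag.
    public static String injectBeforeBodyClose(String html, String snippet) {
        Matcher m = BODY_CLOSE.matcher(html);
        int pos = -1;
        while (m.find()) {
            pos = m.start();
        }
        if (pos < 0) {
            return html + snippet; // no </body> tag: append as a fallback
        }
        return html.substring(0, pos) + snippet + html.substring(pos);
    }

    public static void main(String[] args) {
        String html = "<html><body class=\"x\"><p>Hi</p></body></html>";
        System.out.println(injectAfterBodyOpen(html, "<div>TOP</div>"));
        System.out.println(injectBeforeBodyClose(html, "<div>END</div>"));
    }
}
```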
Page Number Tag
The new version adds support for a proprietary <pd4ml:page.number> tag. The tag without attributes is replaced with the total number of pages in the resulting document. With the "of" attribute, as in <pd4ml:page.number of="anchorName">, the tag is replaced with the number of the page on which the referenced <a name="anchorName"> or the element with id="anchorName" is located.
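One natural use of the "of" attribute is a table of contents. The sketch below generates such entries in plain Java. The tag and its "of" attribute are taken from the text above, while the TocSample class, the surrounding markup and the anchor names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PD4ML: builds table-of-contents lines
// whose page numbers are left for the converter to fill in.
public class TocSample {

    // One entry: a link to the anchor plus the page-number tag for it.
    public static String tocEntry(String anchorName, String title) {
        return "<div><a href=\"#" + anchorName + "\">" + title + "</a> "
             + "<pd4ml:page.number of=\"" + anchorName + "\"></div>";
    }

    // chapters maps anchor names to titles, in document order.
    public static String toc(Map<String, String> chapters) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : chapters.entrySet()) {
            sb.append(tocEntry(e.getKey(), e.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> chapters = new LinkedHashMap<>();
        chapters.put("intro", "Introduction");
        chapters.put("setup", "Setup");
        System.out.print(toc(chapters));
    }
}
```

Each chapter heading in the body would then carry the matching anchor, for example an element with id="intro".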
More HTML5/CSS3 Support
HTML5 tags support
The new HTML rendering engine is optimized for HTML5. We do not claim full support of the HTML5 specification: some features are irrelevant for PDF/RTF conversion, and we have probably underestimated the usefulness of some others. The most important tags and features are already there, however. Thanks to the new architecture, any missing feature can be added with small or moderate effort.
Full HTML table tags support
We completely refactored the table rendering subsystem and implemented fairly sophisticated table page break logic. It now also supports all previously ignored table-specific tags, such as <caption>, <tbody>, <thead> and <col>, and correctly implements the fixed table layout (in addition to the default auto table layout). The table layout logic has been reworked with performance in mind.
Selected CSS at-rule directives support
PD4ML v4 introduces support for the @font-face and @page CSS at-rules. They are intended, respectively, to download TTF/WOFF fonts and register them in the font cache, and to define a document-specific target page format (including margins).
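For orientation, the sketch below assembles a small stylesheet that uses both at-rules in their standard CSS form and returns it as a Java string. The AtRuleSample class, the font family name, the font URL and the page values are placeholders. Which descriptors PD4ML honours inside each rule is not stated above, so the size and margin lines should be read as an assumption.

```java
// Hypothetical helper, not part of PD4ML: builds a stylesheet with the two
// at-rules mentioned in the text, using standard CSS syntax.
public class AtRuleSample {

    public static String stylesheet(String family, String fontUrl) {
        return "@font-face {\n"
             + "  font-family: \"" + family + "\";\n"
             + "  src: url(\"" + fontUrl + "\");\n"
             + "}\n"
             // Page format and margins; the values are examples only.
             + "@page {\n"
             + "  size: A4;\n"
             + "  margin: 20mm;\n"
             + "}\n"
             + "body { font-family: \"" + family + "\", sans-serif; }\n";
    }

    public static void main(String[] args) {
        // Placeholder family name and relative URL.
        System.out.print(stylesheet("ExampleSans", "fonts/example-sans.ttf"));
    }
}
```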
More CSS functions supported
The improved CSS parser and cascading engine of PD4ML implements new CSS functions for color value computation (rgb(), rgba(), hsl(), hsla()), opacity control (alpha()) and general calculations (calc()). More functions are to be supported soon.
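As an example of what color value computation involves, the sketch below converts an hsl() value to RGB channels following the algorithm published in the CSS Color Module Level 3 specification. It is a standalone illustration: the HslToRgb class is hypothetical and is not taken from the PD4ML code base.

```java
// Standalone illustration of hsl() to RGB conversion (CSS Color Level 3
// algorithm); not PD4ML code.
public class HslToRgb {

    // hueDeg: hue in degrees (any value, wrapped to 0..360);
    // s, l: saturation and lightness as fractions in [0, 1].
    public static int[] convert(double hueDeg, double s, double l) {
        double h = ((hueDeg % 360) + 360) % 360 / 360.0;
        double m2 = (l <= 0.5) ? l * (s + 1) : l + s - l * s;
        double m1 = l * 2 - m2;
        return new int[] {
            channel(m1, m2, h + 1.0 / 3), // red
            channel(m1, m2, h),           // green
            channel(m1, m2, h - 1.0 / 3)  // blue
        };
    }

    // Maps one hue offset to a channel value in 0..255.
    private static int channel(double m1, double m2, double h) {
        if (h < 0) h += 1;
        if (h > 1) h -= 1;
        double v;
        if (h * 6 < 1) {
            v = m1 + (m2 - m1) * h * 6;
        } else if (h * 2 < 1) {
            v = m2;
        } else if (h * 3 < 2) {
            v = m1 + (m2 - m1) * (2.0 / 3 - h) * 6;
        } else {
            v = m1;
        }
        return (int) Math.max(0, Math.min(255, Math.round(v * 255)));
    }

    public static void main(String[] args) {
        int[] rgb = convert(120, 1.0, 0.5); // hsl(120, 100%, 50%)
        System.out.println(rgb[0] + "," + rgb[1] + "," + rgb[2]);
    }
}
```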
Performance Improvements
New optimized single-pass HTML parser
As a significant part of our efforts to improve performance, we developed an optimized single-pass HTML parser. The parser implicitly normalizes HTML that is not well formed and builds a DOM-like document representation in RAM. A second parsing pass is triggered only if the source document overrides the document encoding, or if the parser encounters a <style> section nested in <body>, as the style can potentially affect the part of the document that has already been parsed.
New resource cache
All resource requests from PD4ML (to load images, stylesheets, fonts, etc.) are dispatched via the new cache engine. The cache stores frequently used items locally (in RAM or in a temp directory), tracks their expiration time (if specified by HTTP) and frees cache RAM if the cache exceeds a reasonable size. The cache engine also solves the JDK issue of the temp directory being flooded with locked objects created on each java.awt.Font.deriveFont(...) API call.
New font engine
The completely reimplemented PD4ML font engine efficiently looks up the most suitable font from the list of available ones, matching the requested font family, face and style as well as the font's ability to render a given text string. The font engine can handle multiple font folders, deal with downloaded Web fonts, and auto-index and use system fonts (optionally filtered by specified criteria).
New PDF/Image output modules
PD4ML implements new PDF and raster image output subsystems, optimized for better performance and a smaller memory footprint. The RTF output module is ported from the latest PD4ML v3 with minimal changes and still shows great results and performance.
Product distribution changes
Apache Maven
The PD4ML development process is based on the Apache Maven concept of a project object model (POM). Maven allows us to manage the project's build, testing, reporting and documentation from a central piece of information. From the customer's perspective, the major benefit of this Maven-centric approach is the instant availability of the newest versions and nightly snapshot builds in our public Maven software repository.
Of course, our customers can still obtain PD4ML from the usual software download area.
Continuous delivery
The new PD4ML development infrastructure is built on continuous delivery (CD) principles. The development process is organized to produce software in short cycles, ensuring that the software can be released automatically and reliably at any time.
License API
Since PD4ML v4 we build and deliver identical binaries for all license types. Features specific to a particular license type are switched on or off depending on the license activation code. The code can be passed directly to the PD4ML constructor as a string, or provided to the PD4ML API as a pd4ml.lic file.