
PD4ML: HTML-to-RTF conversion

The main purpose of PD4ML is to generate PDF documents from HTML templates. However, it can also output some other formats, including RTF.

The feature was initially developed as an option for producing editable document drafts (for remarks, annotations, etc.). It has since grown into a mature product.

To switch to RTF generation, use one of the following API calls:

pd4ml.outputFormat(PD4Constants.RTF);
// or optionally...
pd4ml.outputFormat(PD4Constants.RTF_WMF);

The equivalents in the JSP taglib:

<pd4ml:transform ... outputFormat="rtf"> ... </pd4ml:transform>

<pd4ml:transform ... outputFormat="rtfwmf"> ... </pd4ml:transform>
(In this case the transform tag automatically sets the corresponding Content-Type HTTP header to "application/rtf".)

In the command line tool:

java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd <URL> 1200 -out doc.rtf -outformat rtf

java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd <URL> 1200 -out doc.rtf -outformat rtfwmf

The only difference between RTF and RTF_WMF is in how images are embedded. In RTF mode, PD4ML embeds images into the RTF output "as is" (PNG, JPEG, etc.). In RTF_WMF mode, it converts all images to WMF format for compatibility with WordPad.exe. The drawback of this image compatibility is a significantly larger output file.
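
The choice between the two modes can be made programmatically when assembling the command line shown above. The following is only a minimal sketch: the class RtfCommandSketch, its methods and the wordPadCompatible flag are hypothetical names introduced for illustration, the example URL is a placeholder, and the reading of the positional value 1200 as an HTML width is an assumption. It only builds the argument list; actually running it requires pd4ml.jar in the working directory.

```java
import java.util.ArrayList;
import java.util.List;

public class RtfCommandSketch {

    // Hypothetical helper: picks the -outformat value.
    // "rtfwmf" converts images to WMF (WordPad-compatible, larger files);
    // "rtf" embeds images as is.
    static String outFormat(boolean wordPadCompatible) {
        return wordPadCompatible ? "rtfwmf" : "rtf";
    }

    // Builds the same argument list as the command lines above.
    // Assumption: the number after the URL is the HTML width.
    static List<String> buildCommand(String url, int htmlWidth,
                                     String outFile, boolean wordPadCompatible) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx512m");
        cmd.add("-Djava.awt.headless=true");
        cmd.add("-cp");
        cmd.add("./pd4ml.jar");
        cmd.add("Pd4Cmd");
        cmd.add(url);
        cmd.add(Integer.toString(htmlWidth));
        cmd.add("-out");
        cmd.add(outFile);
        cmd.add("-outformat");
        cmd.add(outFormat(wordPadCompatible));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "http://example.com/page.html", 1200, "doc.rtf", true);
        System.out.println(String.join(" ", cmd));
        // To launch the conversion (needs pd4ml.jar to be present):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing false as the last argument yields the plain "rtf" variant, which is preferable when output size matters more than WordPad compatibility.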

PD4ML can convert the following elements of the rendered HTML layout to RTF:

  • Page margins
  • Text styles and fonts
  • Text backgrounds
  • Text indentation
  • Tables (with correct table nesting). Column and row spans, table and cell backgrounds, and cell padding are supported. Border styles (color, width) are not supported for the time being.
  • Images
  • Hyperlinks (external and internal), image hyperlinks
  • Headers / footers. A separate header and footer can be defined for the title page.

Although the RTF format is quite old and standardized, only a few viewers implement all of its features. For example, on macOS, tables appear corrupted (as a set of text paragraphs) and images are not shown at all. MS Word is probably the most feature-rich RTF viewer/editor application.