General notes

The following simple rules help you write HTML that is compatible with PD4ML:

  • Place the <style> section only in the <head> area of the HTML document.
  • Avoid the XHTML-like syntax <tag/> for single tags, especially in the <head> area. (Irrelevant since PD4ML v3.x)
  • Do not use <span> tags, which are not part of the HTML 3.2 spec. Use <div> instead. (Irrelevant since PD4ML v3.x)
  • References to multiple style classes, such as class="style1 style2", are not supported. (Irrelevant since PD4ML v3.x)
  • The table cell attribute value valign="center" is invalid, although MS IE supports it. Use the correct setting valign="middle".
  • PD4ML does not support control over table border styles. (Irrelevant since PD4ML v3.x)
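
For legacy markup, the valign rule above can be applied automatically before the HTML is handed to PD4ML. The following is a minimal sketch and not part of the PD4ML API: the class name ValignFixer and its method are hypothetical, and the regular expression only covers simple quoted or unquoted attribute values.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PD4ML: rewrites the invalid
// valign="center" attribute value to the valid valign="middle".
final class ValignFixer {

    // Matches valign=center with optional single or double quotes, case-insensitively.
    private static final Pattern VALIGN_CENTER =
            Pattern.compile("(valign\\s*=\\s*)([\"']?)center\\b\\2", Pattern.CASE_INSENSITIVE);

    static String fix(String html) {
        // $1 keeps the attribute name and '=', $2 keeps the original quote style.
        return VALIGN_CENTER.matcher(html).replaceAll("$1$2middle$2");
    }

    public static void main(String[] args) {
        System.out.println(fix("<td valign=\"center\">cell</td>"));
    }
}
```

A regular expression is enough for this one attribute; for anything more involved, a real HTML parser would be the safer choice.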

Good luck!

PDF/A Output

A special version of PD4ML supports PDF/A output: PD4ML Volume DMS Edition (starting from v360). This version introduces a new PD4ML API call, generatePdfa(boolean enable).

In PDF/A output mode, PD4ML generates all required document metadata and disables features that the PDF/A format does not allow, for example document encryption. PDF/A also requires that all fonts used are embedded in the resulting PDF. If TTF embedding is not switched on, PD4ML in PDF/A mode implicitly enables it and embeds the default TTF fonts.

PDF/A generation mode requires that pd4ml_rc.jar is in the same directory as pd4ml.jar.
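
The directory requirement can be verified before rendering starts. This is a minimal sketch using only java.io.File; the class PdfaJarCheck and the default path lib/pd4ml.jar are assumptions made for illustration, not part of PD4ML.

```java
import java.io.File;

// Hypothetical pre-flight check, not part of PD4ML: verifies that
// pd4ml_rc.jar sits in the same directory as pd4ml.jar.
final class PdfaJarCheck {

    static boolean resourceJarPresent(File pd4mlJar) {
        File dir = pd4mlJar.getAbsoluteFile().getParentFile();
        return dir != null && new File(dir, "pd4ml_rc.jar").isFile();
    }

    public static void main(String[] args) {
        // The default path is an assumption; pass the real location of pd4ml.jar.
        File jar = new File(args.length > 0 ? args[0] : "lib/pd4ml.jar");
        System.out.println("pd4ml_rc.jar found: " + resourceJarPresent(jar));
    }
}
```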

PDF/A generation errors and warnings can be obtained as follows (after the pd4ml.render() call):

PD4ML.StatusMessage[] msgs =
    (PD4ML.StatusMessage[]) pd4ml.getLastRenderInfo(PD4Constants.PD4ML_PDFA_STATUS);

for (int i = 0; i < msgs.length; i++) {
    System.out.println((msgs[i].isError() ? "ERROR: " : "WARNING: ") + msgs[i].getMessage());
}

Table of contents

PD4ML can insert a table of contents (TOC) into the generated PDF, built from the hierarchy of <H1> to <H6> headings.

The special tag <pd4ml:toc> is intended for that. A TOC can be inserted at any position in the document. Only one TOC object is allowed. The <pd4ml:toc> tag is ignored in multi-source PDF generation scenarios (when an array of URLs is passed for conversion to PDF).

<pd4ml:toc> can be parameterized with the pncorr attribute. Its numeric value is added to the page numbers shown in the TOC, which lets you correct them.
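
The effect of pncorr is plain addition, which the following sketch spells out. It is an illustration only, not PD4ML code: the class TocPageCorrection, its methods and the scenario (a PDF that will be appended after a two-page preface) are assumptions.

```java
// Hypothetical illustration, not PD4ML code: shows how a pncorr value
// shifts the page numbers printed in the TOC.
final class TocPageCorrection {

    // Builds the TOC tag with the given correction value.
    static String tocTag(int pncorr) {
        return "<pd4ml:toc pncorr=\"" + pncorr + "\">";
    }

    // The pncorr value is added to the page number computed for the TOC entry.
    static int displayedPage(int computedPage, int pncorr) {
        return computedPage + pncorr;
    }

    public static void main(String[] args) {
        // Assumed scenario: the generated PDF follows a two-page preface,
        // so every TOC entry should be listed two pages later.
        int pncorr = 2;
        System.out.println(tocTag(pncorr));
        System.out.println("page 5 is listed as " + displayedPage(5, pncorr));
    }
}
```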

Internally, the TOC is represented by an HTML table like the one below:

<table class="ptoc-table" cellspacing="0">
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_1">
<div class="ptoc1-style-left">Chapter 1<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_1">
<div class="ptoc1-style-right">1</div></a></td>
</tr>
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_2">
<div class="ptoc2-style-left">Chapter 1.1<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_2">
<div class="ptoc2-style-right">2</div></a></td>
</tr>
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_3">
<div class="ptoc2-style-left">Chapter 1.2<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_3">
<div class="ptoc2-style-right">2</div></a></td>
</tr>
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_4">
<div class="ptoc3-style-left">Chapter 1.2.1<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_4">
<div class="ptoc3-style-right">3</div></a></td>
</tr>
</table>

The following style sheet is applied to the TOC by default and can be overridden:

.ptoc1-style-left { margin-left: 0 }
.ptoc2-style-left { margin-left: 16 }
.ptoc3-style-left { margin-left: 32 }
.ptoc4-style-left { margin-left: 48 }
.ptoc5-style-left { margin-left: 64 }
.ptoc6-style-left { margin-left: 80 }
.ptoc-table { border: 0; width: 100% }
.ptoc-left-col { width: 99%; padding-right: 0 }
.ptoc-right-col { text-align: right; padding-left: 0; vertical-align: bottom }
.ptoc-link { text-decoration: none; color: black }
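
One way to override these defaults is to restate the affected rules in the document's own <style> section, placed in <head> as recommended in the general notes. The sketch below only assembles such a document as a Java string. It assumes that rules in the document take precedence over the built-in defaults; the class name and the values 24 and navy are arbitrary examples.

```java
// Hypothetical sketch: assembles an HTML document whose <style> section
// restates two of the default TOC rules with different values.
final class TocStyleOverride {

    static String buildHtml() {
        return "<html>\n"
             + "<head>\n"
             + "<style>\n"
             + ".ptoc2-style-left { margin-left: 24 }\n"              // wider second-level indent
             + ".ptoc-link { text-decoration: none; color: navy }\n"  // colored TOC links
             + "</style>\n"
             + "</head>\n"
             + "<body>\n"
             + "<pd4ml:toc>\n"
             + "<h1>Chapter 1</h1>\n"
             + "<h2>Chapter 1.1</h2>\n"
             + "</body>\n"
             + "</html>\n";
    }

    public static void main(String[] args) {
        System.out.print(buildHtml());
    }
}
```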

 

Specifying watermark image from Java API

...
PD4ML html = new PD4ML();
...
PD4PageMark header = new PD4PageMark();
header.setWatermark( "images/logo.gif", new Rectangle(10,10,120,25), 50 );
html.setPageHeader( header );
...

Defining header/footer in JSP

...
<pd4ml:header areaHeight="50">
<font face="Arial" size="2">Header test.<br>
$[page] of $[total]</font>
</pd4ml:header>
...

Defining header/footer from Java API

Defining headers and footers with HTML is straightforward, but the following specifics should be taken into account.

  • An HTML header or footer definition overrides the pageNumberTemplate and titleTemplate properties and all other associated settings.
  • The ${page}, ${total} and ${title} placeholders are populated as before with the current page number, the total number of pages and the document title.
  • The base URL of the header/footer HTML is the same as that of the main HTML source.

...
PD4ML html = new PD4ML();
...
PD4PageMark header = new PD4PageMark();
header.setAreaHeight( 150 );
header.setHtmlTemplate( "<font face=\"Arial\" size=\"10\">" + 
       "<b>Header test</b><br>" + "${page} of ${total}</font>" );
html.setPageHeader( header );
...
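
To make the placeholder behaviour concrete, the sketch below fills ${page}, ${total} and ${title} in a template string. PD4ML performs this substitution internally and its implementation may differ; the class PlaceholderDemo and the sample values are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${...} placeholder substitution;
// PD4ML does this internally and may implement it differently.
final class PlaceholderDemo {

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{(page|total|title)\\}");

    static String fill(String template, int page, int total, String title) {
        Map<String, String> values = Map.of(
                "page", String.valueOf(page),
                "total", String.valueOf(total),
                "title", title);
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' or '\' inside the title.
            m.appendReplacement(out, Matcher.quoteReplacement(values.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<b>Header test</b><br>${page} of ${total}";
        System.out.println(fill(template, 3, 12, "Report"));
    }
}
```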

Known issues

  • Bidirectional scripts are supported with some restrictions. PD4ML cannot yet shape glyphs correctly for text in page header/footer areas or in form elements (such as button captions).

  • To use PD4ML with bidirectional scripts on a UNIX platform, the referenced TTF fonts must be installed correctly so that Java can find them. See the document: http://java.sun.com/j2se/1.3/docs/guide/intl/addingfonts.html#adding

  • On UNIX-derived platforms, the TTF font Wingdings is currently not supported.

Many TTF fonts define only the plain style. For example, the NSimSun font has no predefined bold or italic glyphs.

“If the font in a document uses a bold or italic style, but there is no font data for that style, the host operating system will synthesize the style.” 

That works for native applications such as MS Internet Explorer, which output text directly, but unfortunately a synthesized font is not accessible to a Java application as binary font data. PD4ML needs a .ttf file or an equivalent so that it can parse it, extract the glyph definitions for the characters used, and embed them in the resulting PDF.

In theory, a possible solution is to use a third-party TTF management tool to synthesize the italic style of NSimSun, save it as NSimSunItalic.ttf, and register it in pd4fonts.properties:

NSimSun\ Italic=NSimSunItalic.ttf

Or simply choose a similar italic font and register it as above.

The same can be done for “NSimSun\ Bold\ Italic”.
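
The backslash in these keys matters: in the Java properties format an unescaped space ends the key. The sketch below demonstrates this with the standard java.util.Properties parser. Whether PD4ML reads pd4fonts.properties in exactly this way is an assumption, and the file name NSimSunBoldItalic.ttf is a hypothetical example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustration with the standard java.util.Properties parser; that PD4ML
// reads pd4fonts.properties the same way is an assumption.
final class FontMappingDemo {

    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // NSimSunBoldItalic.ttf is a hypothetical file name.
        String text = "NSimSun\\ Italic=NSimSunItalic.ttf\n"
                    + "NSimSun\\ Bold\\ Italic=NSimSunBoldItalic.ttf\n";
        Properties props = parse(text);
        // The backslash escapes the space, so the key is "NSimSun Italic".
        System.out.println(props.getProperty("NSimSun Italic"));
        System.out.println(props.getProperty("NSimSun Bold Italic"));
    }
}
```

Without the backslash, the parser would read the key as "NSimSun" and treat the rest of the line as the value, so the italic font would never be found under its full name.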

Embedding fonts to PDF from JSP

<%@ taglib uri="/WEB-INF/tlds/pd4ml.tld" prefix="pd4ml" %><%@page
contentType="text/html; charset=UTF-8"%><pd4ml:transform
screenWidth="400"
pageFormat="A5"
pageOrientation="landscape"
pageInsets="15,15,15,15,points"
inline="true">

<!--
TODO:
1. adjust fonts directory below.
2. make sure that the directory contains pd4fonts.properties file
3. for more info see PD4ML.useTTF( String, boolean ) Java API docs
-->
<pd4ml:usettf from="/windows/fonts"/>


<html>
<head>
<title>PD4ML embedded fonts test</title>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
</head>
<body>
<font face="Tahoma">Hello, World!</font><br>
<font face="Verdana">Привет, Мир!</font><br>
<font face="SimSun">我招呼您, 世界!</font><br>
</body>
</html>
</pd4ml:transform>

Note: An explicit and correct charset setting such as <%@page contentType="text/html; charset=UTF-8"%> is mandatory for JSP.

Embedding fonts to PDF from Java API

...
File tempFile = File.createTempFile( "pd4ml", ".pdf" );
java.io.FileOutputStream fos = new java.io.FileOutputStream( tempFile );

PD4ML pd4ml = new PD4ML();
pd4ml.setHtmlWidth( 800 );
pd4ml.useAdobeFontMetrics( true );

pd4ml.useTTF( "/windows/fonts", true );

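pd4ml.render( "http://www.yahoo.com", fos );
fos.close(); // make sure the PDF is fully written before a viewer opens it

String pdfFile = tempFile.getAbsolutePath();
// Pass the command as an array: the executable path contains spaces,
// so a single command string would be split into the wrong tokens.
String[] viewcmd = {
    "\\Program Files\\Adobe\\Acrobat 6.0\\Reader\\AcroRd32.exe", pdfFile };

Runtime.getRuntime().exec( viewcmd );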

...