1.Getting started

2.Obtaining PD4ML

PD4ML is available in two variants:

  • PD4ML Standard – offers a basic Java API for HTML-to-PDF conversion and includes the PD4ML JSP tag library.
  • PD4ML Professional – adds more features to PD4ML Standard.

The PD4ML Standard package contains the PD4ML library pd4ml.jar (or pd4ml_demo.jar) in the lib/ directory, the PD4ML JSP custom tag library (pd4ml_tl.jar or pd4ml_tl_demo.jar), the tag library descriptor (pd4ml.tld), the PD4ML Browser/Converter (part of the main PD4ML library) and the relevant documentation.
The PD4ML Pro package is similar in content to PD4ML Web, but includes a more fully featured version of pd4ml.jar/pd4ml_demo.jar.

Both versions rely on the open source CSS Parser library, available from the original project site (http://cssparser.sourceforge.net/) as well as from the PD4ML Software download area. We recommend using the CSS Parser version patched by our team (http://pd4ml.com/cssparser-0.9.4.patched.src.2010.zip), which supports underscores in CSS properties, resolves a number of minor bugs and provides more informative error messages.

3.Running PD4ML Converter as a standalone application

No special installation procedure is needed to run PD4ML Converter as a standalone application. Simply copy the PD4ML library and the CSS parser (ss_css2.jar) to your working directory and make sure that your JRE environment (JAVA_HOME, CLASSPATH) is properly configured.

 

 

3.1.Run an evaluation copy of PD4ML as a GUI application

D:\tools>java -jar pd4ml_demo.jar <params>

Note: older versions require the following syntax:
D:\tools>java -Xbootclasspath/a:ss_css2.jar -jar pd4ml_demo.jar <params>
or
D:\tools>java -cp pd4ml_demo.jar org.zefer.pd4ml.tools.PD4Browser

3.2.Run a commercial copy of PD4ML as a GUI application

D:\tools>java -jar pd4ml.jar <params>

Note: older versions require the following syntax:
D:\tools>java -Xbootclasspath/a:ss_css2.jar -jar pd4ml.jar <params>
or
D:\tools>java -cp pd4ml.jar org.zefer.pd4ml.tools.PD4Browser

 

Note: The application automatically creates and updates the pd4browser.properties file, which holds the options specified in its options dialog.

3.3.Running PD4ML as a command line converter tool

To start the application in command-line mode, simply append two parameters (source URL and output file name) to the command used above.

D:\tools>java -jar pd4ml.jar http://localhost:80/ test.pdf 
or
D:\tools>java -jar pd4ml.jar file:d:/test.html test.pdf

 

Note: HTML-to-PDF conversion parameters cannot be passed on the command line. The application takes them from pd4browser.properties (if it exists), which the application creates automatically.

4.PD4ML Java API Installation

Installation of the PD4ML API is straightforward: the downloaded pd4ml.jar (or pd4ml_demo.jar in the case of the evaluation version) and ss_css2.jar should be added to the classpath of your project.

PD4ML is intended to be used with JDK 1.3.1 and above.

5.PD4ML JSP taglib and PD4ML Web Installation

As a starting point for developing your PDF-enabled Web application, you can use the example Web application supplied with the PD4ML Pro and PD4ML Web distributions.

Copy the taglib/ directory and all its contents into your working directory.

Then copy pd4ml.jar (or pd4ml_demo.jar) to the WEB-INF/lib directory of the webapp. Make sure that pd4ml_tl.jar (or pd4ml_tl_demo.jar) and ss_css2.jar are already there.

Now deploy the application to your JSP runtime engine. The procedure usually differs between engines from different vendors. Check the documentation for your version.

The following are possible steps to deploy the application to a Tomcat 4.1.30 server.

  • Create a pd4ml.xml file in [tomcat_root_dir]/webapps/ with content like the following:

    <Context path="/pd4ml" docBase="D:/tools/web" debug="0" privileged="true" reloadable="true">

      <ResourceLink name="users" global="UserDatabase" type="org.apache.catalina.UserDatabase" />

    </Context>

  • Change the docBase attribute to match your web application location.
  • Restart Tomcat.
  • Access the application with a URL like http://localhost:8080/pd4ml/ (change host and port to match your Tomcat installation). The taglib/index.jsp page is activated by default and an example PDF should be generated. See the "Using PD4ML custom tags in JSP" section of this document for more info.

6.Adding PD4ML API calls to your Java application

6.1.Converting HTML addressed by URL to PDF

import java.awt.Dimension;
import java.awt.Insets;
import java.io.File;
import java.io.IOException;

import org.zefer.pd4ml.PD4ML;                               // (1)
import org.zefer.pd4ml.PD4Constants;

   ...

   protected Dimension format = PD4Constants.A4;            // (2)
   protected boolean landscapeValue = false;
   protected int topValue = 10;
   protected int leftValue = 10;
   protected int rightValue = 10;
   protected int bottomValue = 10;
   protected String unitsValue = "mm";
   protected String proxyHost = "";
   protected int proxyPort = 0;

   protected int userSpaceWidth = 780;                      // (3)

   ...

   private void runConverter(String urlstring, File output) throws IOException {

      if (urlstring.length() > 0) {
         // default to plain HTTP when no supported scheme is given
         if (!urlstring.startsWith("http://") && !urlstring.startsWith("file:")) {
            urlstring = "http://" + urlstring;
         }

         java.io.FileOutputStream fos = new java.io.FileOutputStream(output);      // (4)

         if ( proxyHost != null && proxyHost.length() != 0 && proxyPort != 0 ) {   // (5)
            System.getProperties().setProperty("proxySet", "true");
            System.getProperties().setProperty("proxyHost", proxyHost);
            System.getProperties().setProperty("proxyPort", "" + proxyPort);
         }

         PD4ML pd4ml = new PD4ML();                          // (6)

         try {                                               // (7)
            pd4ml.setPageSize( landscapeValue ? pd4ml.changePageOrientation( format ) : format );
         } catch (Exception e) {
            e.printStackTrace();
         }

         if ( unitsValue.equals("mm") ) {
            pd4ml.setPageInsetsMM( new Insets(topValue, leftValue, bottomValue, rightValue) );
         } else {
            pd4ml.setPageInsets( new Insets(topValue, leftValue, bottomValue, rightValue) );
         }

         pd4ml.setHtmlWidth( userSpaceWidth );

         pd4ml.render( urlstring, fos );                     // (8)
         fos.close();   // release the output file
      }
   }

   ...

Comments:
1. Import the PD4ML converter class.
2. Define HTML-to-PDF conversion parameter values if needed. See the API reference for more info.
3. Specify the user space width. It is analogous to the horizontal size of a Web browser window. As common web-browsing experience suggests, changing this size can affect how the HTML document is laid out: arrangement of HTML elements, vertical size, etc. See the API reference for more info.
4. Prepare the output stream for PDF generation.
5. Specify proxy settings if the source HTML document is behind a firewall.
6. Instantiate the PD4ML converter.
7. Pass the HTML-to-PDF conversion parameters to it.
8. Perform the HTML-to-PDF translation. Note: using a URL is not mandatory. PD4ML can read the source HTML from an input stream. See the API reference for more info.
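
The example above switches between setPageInsetsMM() and setPageInsets() depending on unitsValue. As a rough illustration of how the two unit systems relate, the sketch below converts millimetres to typographical points using the standard definitions (1 inch = 25.4 mm = 72 pt). It assumes that setPageInsets() expects points, as the "points" unit of the pageInsets tag attribute suggests. PageUnits and its methods are hypothetical helpers, not part of the PD4ML API, and PD4ML's own rounding may differ.

```java
// Hypothetical helper for illustration only; not part of the PD4ML API.
class PageUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** Converts millimetres to typographical points. */
    static double mmToPoints(double mm) {
        return mm * POINTS_PER_INCH / MM_PER_INCH;
    }

    /** Converts millimetres to whole points, rounding to the nearest point. */
    static int mmToWholePoints(double mm) {
        return (int) Math.round(mmToPoints(mm));
    }

    public static void main(String[] args) {
        // 10 mm is roughly 28.35 pt
        System.out.println("10 mm is about " + mmToWholePoints(10) + " pt");
    }
}
```

A 10 mm inset therefore corresponds to roughly 28 points, which is a handy sanity check when switching between the two methods.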

6.2.Converting HTML obtained from input stream to PDF

import java.awt.Dimension;
import java.awt.Insets;
import java.io.File;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URL;

import org.zefer.pd4ml.PD4ML;

...

File f = new File("D:/tools/test.pdf");
java.io.FileOutputStream fos = new java.io.FileOutputStream(f);
OutputStream sos = System.out;

File fz = new File("D:/tools/yahoo.htm");
java.io.FileInputStream fis = new java.io.FileInputStream(fz);
// the reader encoding must match the actual encoding of the HTML file
InputStreamReader isr = new InputStreamReader( fis, "UTF-8" );

PD4ML html = new PD4ML();
html.setPageSize( new Dimension(450, 450) );
html.setPageInsets( new Insets(20, 50, 10, 10) );
html.setHtmlWidth( 750 );
html.enableImgSplit( false );

// base URL for the document;
// alternatively base can be specified with <base href="..."> tag
URL base = new URL( "file:D:/tools/" );

html.render( isr, fos, base );
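
When the HTML is already held in memory rather than in a file, the same InputStreamReader can be built over a byte array. The following is a minimal sketch using only the standard Java library; HtmlSource, utf8Reader and readAll are hypothetical names introduced for illustration. With PD4ML on the classpath, the resulting reader could presumably be passed to render( isr, fos, base ) as in the listing above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the PD4ML API.
class HtmlSource {
    /** Wraps an in-memory HTML string in a reader with an explicit UTF-8 charset. */
    static InputStreamReader utf8Reader(String html) {
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        return new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
    }

    /** Reads the whole stream back, to show that the round trip preserves the text. */
    static String readAll(InputStreamReader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        InputStreamReader isr = utf8Reader("<html><body>caf\u00e9</body></html>");
        // Here the reader is only read back; a converter would consume it instead.
        System.out.println(readAll(isr));
    }
}
```

Encoding and decoding with the same explicitly named charset avoids depending on the platform default encoding.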

7.Protecting PDF documents

A PDF document can be encrypted to protect its contents from unauthorized access. PD4ML supports the PDF access permissions concept and allows a password to be specified for a document.

If any passwords or access restrictions are specified with PD4ML.setPermissions(), the document is encrypted, and the permissions and the information required to validate the passwords are stored in the resulting document.

If a user attempts to open an encrypted document that has a password, the viewer application should prompt for a password. Correctly supplying either password allows the user to open the document, decrypt it, and display it on the screen.

If the document is encrypted with a password set to “empty”, no password is requested; the viewer application can simply open, decrypt, and display the document. Whether additional operations are allowed on a decrypted document depends on any access restrictions that were specified when the document was created.

The possible restrictions:

  • Modifying the document's contents
  • Copying or otherwise extracting text and graphics from the document
  • Adding or modifying text annotations
  • Printing the document

See the PD4ML API reference (PD4Constants.Allow*) for others.
The PDF document produced by PD4ML can be protected with 40-bit or 128-bit encryption.

...
String password = "empty";
boolean strongEncryption = true;
int permissions = PD4Constants.AllowPrint | PD4Constants.AllowCopy;
 
pd4ml.setPermissions( password, permissions, strongEncryption );
...
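
The permissions argument above is a bit mask built with the | operator. The sketch below illustrates how such a mask can be combined and tested. Its constants are illustrative only: they follow the permission bit positions of the PDF specification (bit 3 = print, 4 = modify, 5 = copy, 6 = annotate) and are not taken from PD4Constants, whose actual Allow* values may differ. PermissionFlags and allows() are hypothetical names.

```java
// Illustrative constants only. They mirror the permission bit positions of the
// PDF specification; the real PD4Constants.Allow* values may be different.
class PermissionFlags {
    static final int ALLOW_PRINT    = 1 << 2;  // bit 3
    static final int ALLOW_MODIFY   = 1 << 3;  // bit 4
    static final int ALLOW_COPY     = 1 << 4;  // bit 5
    static final int ALLOW_ANNOTATE = 1 << 5;  // bit 6

    /** Returns true if the given flag is set in the permission mask. */
    static boolean allows(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        int permissions = ALLOW_PRINT | ALLOW_COPY;
        System.out.println("print allowed:  " + allows(permissions, ALLOW_PRINT));
        System.out.println("modify allowed: " + allows(permissions, ALLOW_MODIFY));
    }
}
```

Any operation whose flag is not OR-ed into the mask stays restricted, which is why the example grants printing and copying but not modification.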

8.Converting HTML headings or named anchors to PDF bookmarks

PD4ML supports two different methods to generate PDF bookmarks (also known as outlines):

  1. Converting the HTML heading structure to a corresponding bookmark structure.
  2. Listing HTML named anchors (HTML destinations) as bookmarks.

By default PD4ML does not generate document bookmarks. To enable generation, call the API method generateOutlines(boolean) before the render(…) call. It accepts one of two parameter values: true – to use the heading structure for bookmarks; false – to use named anchors.

(In the PD4ML taglib the generation process is controlled by the outline attribute of <pd4ml:transform>. It can be set to “none”, “headings” or “anchors”.)

  • What do “named anchors” mean? Named anchors (or destinations) are defined in HTML code like the following:

    <a name="destination">label</a>

    Note: do not use nested HTML tags in the label definition.
  • What do “headings” mean? Headings are the HTML tags <h1> … </h1> to <h6> … </h6>, whose hierarchy PD4ML can use to generate a tree-like bookmark structure.
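
To make the idea of a heading hierarchy concrete, here is a small sketch that extracts <h1> to <h6> headings from an HTML string and prints them as an indented outline. It is not PD4ML's parser: it uses a simple regular expression that only handles well-formed, non-nested headings, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of a heading hierarchy; not how PD4ML parses HTML.
class HeadingOutline {
    // Matches <h1>..</h1> to <h6>..</h6>; the back-reference \1 requires the
    // closing tag to have the same level as the opening tag.
    private static final Pattern HEADING = Pattern.compile(
            "<h([1-6])[^>]*>(.*?)</h\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one line per heading, indented by two spaces per nesting level. */
    static List<String> outline(String html) {
        List<String> lines = new ArrayList<String>();
        Matcher m = HEADING.matcher(html);
        while (m.find()) {
            int level = Integer.parseInt(m.group(1));
            // Drop tags nested in the heading text and trim surrounding white space
            String title = m.group(2).replaceAll("<[^>]*>", "").trim();
            StringBuilder sb = new StringBuilder();
            for (int i = 1; i < level; i++) {
                sb.append("  ");
            }
            lines.add(sb.append(title).toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<h1>Chapter 1</h1><p>text</p><h2>Chapter 1.1</h2><h2>Chapter 1.2</h2>";
        for (String line : outline(html)) {
            System.out.println(line);
        }
    }
}
```

Each heading level maps to one level of nesting, which is the same tree shape a PDF viewer shows in its bookmarks pane.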

9.Inserting page breaks

PD4ML introduces a special HTML tag, <pd4ml:page.break>, which the conversion engine interprets as a page break command. In JSP the tag should have an XHTML-like closing slash: <pd4ml:page.break/>

10.1.Using PD4ML custom tags in JSP

10.1.1.Surrounding of HTML/JSP content with tags

Note: some combinations of MS Internet Explorer and Adobe Acrobat Reader plugin versions are buggy. Instead of the PDF generation result, MS IE displays a blank page. Check our online Support/HelpDesk for a possible workaround.


1 <%@ taglib uri="http://pd4ml.com/tlds/pd4ml/2.6" prefix="pd4ml" %><%@page
contentType="text/html; charset=ISO8859_1"%><pd4ml:transform
screenWidth="400"
pageFormat="A5"
pageOrientation="landscape"
pageInsets="100,100,100,100,points"
enableImageSplit="false">

<html>
<head>
<title>pd4ml test</title>
<style type="text/css">
body {
color: red;
background-color: #FFFFFF;
font-family: Tahoma, "Sans-Serif";
font-size: 10pt;
}
</style>
</head>
<body>
2 <img src="images/logos.gif" width="125" height="74">
<p>
Hello, World!
3 <pd4ml:page.break/>
<table width="100%" style="background-color: #f4f4f4; color: #000000">
<tr>
<td>
Hello, New Page!
</td>
</tr>
</table>
</body>
</html>
4 </pd4ml:transform>

 

Comments:
1. PD4ML JSP taglib declaration and opening transform tag. JSP content surrounded by the <pd4ml:transform> and </pd4ml:transform> tags is passed to the PD4ML converter.
2. The image should be referenced with a relative path. Absolute URLs like src="http://myserver:80/path/to/img.gif" are allowed as well, but src="/path/to/img.gif" is not.
3. The directive forces the PD4ML converter to insert a page break into the output PDF.
4. Closing transform tag. Any content that appears after the tag is ignored.
5. There is a CSS bug in JDKs older than v1.5b2. To avoid it, use lowercase CSS class names. (Irrelevant since PD4ML v3.x)

10.1.2.Defining PDF document footer (or header) with JSP custom tag

The <pd4ml:header> and <pd4ml:footer> JSP tags, as well as the inline, fileName and interpolateImages attributes of the <pd4ml:transform> tag, are available since v1.0.5.


<%@ taglib uri="http://pd4ml.com/tlds/pd4ml/2.6" prefix="pd4ml" %><%@page
contentType="text/html; charset=ISO8859_1"%><pd4ml:transform
screenWidth="400"
pageFormat="A5"
pageOrientation="landscape"
pageInsets="15,15,15,15,points"
enableImageSplit="false"
inline="true"
fileName="footer.pdf"
interpolateImages="false">

<pd4ml:footer
1 titleTemplate="title: $[title]"
2 pageNumberTemplate="page $[page]"
titleAlignment="left"
pageNumberAlignment="right"
color="#008000"
3 initialPageNumber="1"
4 pagesToSkip="1"
fontSize="14"
5 areaHeight="18"/>

<html>
<head>
<title>pd4ml header/footer test</title>
<style type="text/css">
body {
color: #000000;
background-color: #FFFFFF;
font-family: Tahoma, "Sans-Serif";
font-size: 10pt;
}
</style>
</head>
<body>
<img src="images/logos.gif" width="125" height="74">
<p>
Hello, World!
<pd4ml:page.break/>
<table width="100%" style="background-color: #f4f4f4; color: #000000">
<tr>
<td>
Hello, New Page!
</td>
</tr>
</table>
</body>
</html>
</pd4ml:transform>

Comments:
1. Title template definition. A string that can optionally contain the placeholders $[title] for the title value taken from the HTML's <title> tag and $[page] for the page counter value.
2. Page number template definition. A string with the placeholder $[page] for the page counter value.
3. The attribute initializes the internal page counter with the given value.
4. The attribute specifies the number of pages (here 1) that should not contain footer information.
5. Footer area height in points.

PD4ML also accepts syntax like ${var}, but that notation has a special meaning in the most recent Java Servlet API versions. To avoid notation conflicts, PD4ML additionally supports $[var] placeholders since v3.x.
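
The following sketch illustrates how $[...] placeholders in a template can be expanded by plain string substitution. It only demonstrates the notation; it is not PD4ML's implementation, and FooterTemplate and expand() are hypothetical names. The $[total] placeholder is included because it appears in a later example of this document.

```java
// Illustration of the $[...] placeholder notation; not PD4ML's implementation.
class FooterTemplate {
    /** Replaces $[title], $[page] and $[total] with the given values. */
    static String expand(String template, String title, int page, int total) {
        return template
                .replace("$[title]", title)
                .replace("$[page]", String.valueOf(page))
                .replace("$[total]", String.valueOf(total));
    }

    public static void main(String[] args) {
        System.out.println(expand("title: $[title]", "pd4ml header/footer test", 1, 3));
        System.out.println(expand("page $[page] of $[total]", "pd4ml header/footer test", 2, 3));
    }
}
```

Because the square-bracket form is not interpreted by the JSP container, the template reaches the tag exactly as written.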

10.1.4.Temporary saving generated PDF to hard drive

With the <pd4ml:savefile> tag you can store the just-generated PDF on the hard drive and either redirect the user’s browser to read the PDF as a static resource or redirect the request to another URL for PDF post-processing.

Note: the tag should be nested in <pd4ml:transform> and have no body.

Usage 1.

<pd4ml:savefile
uri="/WEB/savefile/saved/"
dir="D:/spool/generated_pdfs"
redirect="pdf"
debug="false"/>

 

The tag above forces PD4ML to save the generated PDF to D:/spool/generated_pdfs with an autogenerated name.

It is expected that the local directory D:/spool/generated_pdfs corresponds to the URL http://yourserver.com/WEB/savefile/saved/ (as given in the “uri” attribute).

After generation PD4ML sends the client’s browser a redirect command with a URL like this:

http://yourserver.com/WEB/savefile/saved/generated_name.pdf

Usage 2.


<pd4ml:savefile
dir="D:/spool/generated_pdfs"
redirect="/mywebapp/send_pdf_by_email.jsp"
debug="false"/>

The tag above forces PD4ML to save the generated PDF to D:/spool/generated_pdfs with an autogenerated name.

After that it forwards to /mywebapp/send_pdf_by_email.jsp with the parameter filename=<pdfname>.

So send_pdf_by_email.jsp can read the file name

String fileName = request.getParameter("filename");

build the full path

String path = "D:/spool/generated_pdfs" + "/" + fileName;

read the just-generated PDF file and perform post-processing or other actions (like sending it by email).
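
Because the file name arrives as a request parameter, it may be worth validating it before building the path. The sketch below is one possible way to do that with the standard Java library; SpoolFiles and resolve() are hypothetical names, and the set of rejected characters is an assumption that should be adapted to the file names your application actually generates.

```java
import java.io.File;

// Hypothetical helper for illustration only; not part of the PD4ML API.
class SpoolFiles {
    /**
     * Resolves a file name received as a request parameter against the spool
     * directory, rejecting names that could point outside of it.
     */
    static File resolve(String spoolDir, String fileName) {
        if (fileName == null || fileName.length() == 0
                || fileName.indexOf('/') >= 0
                || fileName.indexOf('\\') >= 0
                || fileName.indexOf("..") >= 0) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return new File(spoolDir, fileName);
    }

    public static void main(String[] args) {
        File pdf = resolve("D:/spool/generated_pdfs", "report.pdf");
        System.out.println(pdf.getName());
    }
}
```

In send_pdf_by_email.jsp the call would take the place of the plain string concatenation, with the parameter value passed as fileName.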

In both cases above you can predefine the PDF file name with the “name” attribute. If a file with that name already exists in D:/spool/generated_pdfs, then an autoincremented numeric value is appended to the new file name.

10.2.Using PD4ML custom tags with ColdFusion

The described integration method was tested with ColdFusion MX 6.1 enterprise and development editions running under JRun4.

Making PD4ML available in a ColdFusion web application.

Copy pd4ml.jar, pd4ml_tl.jar and pd4ml.tld (or pd4ml_demo.jar, pd4ml_tl_demo.jar and pd4ml.tld) to the WEB-INF/lib directory of your CF-enabled application. By default it is ${jrun4}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/lib. Restart the CF runtime!

Creating a PD4ML-enabled .cfm page

<cfimport taglib="/WEB-INF/lib/pd4ml.tld" prefix="pd4ml"><pd4ml:transform
screenWidth="400"
pageFormat="A5"
pageOrientation="landscape"
pageInsets="15,15,15,15,points"
enableImageSplit="false"
inline="true"
fileName="myreport.pdf"
encoding="ISO8859_1"
interpolateImages="false">

<pd4ml:header
titleTemplate="$[title]"
pageNumberTemplate="page $[page]"
titleAlignment="left"
pageNumberAlignment="right"
color="##008000"
initialPageNumber="1"
pagesToSkip="1"
fontSize="14"
areaHeight="18"/>
<html>
<head>
<title>Hello, CF!</title>
<style>
td {
font-family: "Sans-Serif";
font-size: 12;
}
</style>
</head>
<body>

<img src="images/logos.gif" width="125" height="74">
<p>
<font face="tahoma">Hello, CF!</font>

<p>
<cfoutput>
Today's date is #DateFormat(Now(), "dd.mm.yyyy")#
</cfoutput>
<p>

<table border="0" width="300">

<cfloop index = "LoopCount" from = "1" to = "25">
<cfset x = variables.LoopCount mod 2 >

<tr>
<cfoutput>
<cfif variables.LoopCount IS "1">
<td bgcolor="##d0d0d0">Euros</td>
<td bgcolor="##d0d0d0">Dollars</td>
<td bgcolor="##d0d0d0">Pounds</td>
<cfelseif variables.x IS NOT "1">
<td>#LoopCount * 10# &##128;</td>
<td>#LoopCount * 10# $</td>
<td>#LoopCount / 10 * 2# &pound;</td>
<cfelse>
<td bgcolor="##e7e7e7">#LoopCount * 10# &##128;</td>
<td bgcolor="##e7e7e7">#LoopCount + 17# $</td>
<td bgcolor="##e7e7e7">#LoopCount / 10 * 2# &pound;</td>
</cfif>
</cfoutput>
</tr>

</cfloop>

</table>

</body>
</html>
</pd4ml:transform>

Comments:
1. There must be no white space characters before the <pd4ml:transform> tag. If there are any, they appear in the resulting PDF file and corrupt it.
2. ‘#’ characters have a special meaning in ColdFusion. The character in the PD4ML header color definition is escaped by doubling it (‘##’).

Troubleshooting

Error: “The type for attribute authorName of tag transform could not be determined”
Solution: The JRun engine needs to be restarted. The newly installed PD4ML tag library is not yet known to the runtime.

Error: MS Internet Explorer 6.0 and Acrobat Reader 6.0.1 (or a newer version) show a blank screen instead of the resulting PDF. The problem does not occur with other browsers and Acrobat Reader versions.
Solution: Rename your .cfm page to .pdf and add the corresponding file extension mapping to WEB-INF/web.xml. Access the page using the new name ending in .pdf (instead of .cfm).
The extension mapping code to be added:

<servlet-mapping>
<servlet-name>CfmServlet</servlet-name>
<url-pattern>*.pdf</url-pattern>
</servlet-mapping>

Error: <pd4ml:page.break/> causes an error.
Solution: It seems JRun4 does not allow dots in custom JSP tag names. To fix the issue, open WEB-INF/lib/pd4ml.tld in a text editor and duplicate (copy-paste) the page.break tag description section. In the copied section replace the page.break tag name with page_break. Then replace the <pd4ml:page.break/> tags in your .cfm code with <pd4ml:page_break/>.

10.3.Using PD4ML custom tags with Struts or with any other J2EE UI frameworks, if JSP taglib integration is problematic

A peculiarity of the Struts UI framework is that in some cases it takes control and opens the output stream before the PD4ML-enabled JSP page is loaded. PD4ML, on the other hand, needs exclusive control over the output stream: it outputs binary data, and any other output writer can corrupt it.

To solve the problem, PD4ML offers the following solution.

(Since PD4ML 1.2.8 / Pro 2.1.4 there is an alternative to the approach below. The just-generated PDF can be temporarily stored on the hard drive and the current HTTP request can be forwarded to the new static PDF. See the <pd4ml:savefile/> description for more details.)

PD4ML transformer JSP pages need to be deployed to a separate web application outside of the Struts context. They can also stay in the same web application, provided the PD4ML-enabled JSP pages are outside the control of the Struts dispatching servlet (this can be achieved with web.xml settings).

If you want to separate PD4ML transformation from your application business logic, that is possible as well: run a servlet runtime (like Tomcat) on a separate machine and deploy PD4ML to it.

In our example the separate web application is named separate_web_app.

Create a PD4ML transformer JSP page, transformer.jsp, in separate_web_app like the following.

<%@ taglib uri="/WEB-INF/tlds/pd4ml.tld" prefix="pd4ml" %><%@page
contentType="text/html; charset=ISO8859_1"%><pd4ml:transform
screenWidth="400"
pageFormat="A5"
pageOrientation="landscape"
pageInsets="15,15,15,15,points"
enableImageSplit="false"
inline="true"
fileName="myreport.pdf"
interpolateImages="false"

url="http://mainserver/struts/report.jsp">

<pd4ml:footer
titleTemplate="$[title]"
pageNumberTemplate="page $[page] of $[total]"
titleAlignment="left"
pageNumberAlignment="right"
fontSize="12"
areaHeight="14"/>

</pd4ml:transform>

 

Replace the url attribute value of <pd4ml:transform> with the actual URL of the Struts (JSP) page that should be converted to PDF.

The HTML content of transformer.jsp is ignored completely, but as usual there must be no white space characters before the <pd4ml:transform> tag.

At this point you can access transformer.jsp and force PDF conversion of the specified source. In most cases, however, session ID (JSESSIONID) propagation is important for the proper functioning of Struts (or other JSP-based frameworks).

If the url attribute of <pd4ml:transform> is set, PD4ML takes its value and appends a JSESSIONID parameter to it. The parameter value is taken from the HTTP request variable pd4session.

So a request for PDF generation from the Struts context can look like this:

<%
response.sendRedirect( "/separate_web_app/transformer.jsp?pd4session=" +
session.getId() );
%>
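
The redirect target above is assembled by string concatenation. A small helper can keep that in one place and URL-encode the session ID, which is harmless for typical IDs and guards against unexpected characters. This is a sketch with hypothetical names (TransformerRedirect, buildUrl), using only the standard Java library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; not part of the PD4ML API.
class TransformerRedirect {
    /** Builds the transformer URL with the session ID passed as pd4session. */
    static String buildUrl(String transformerPath, String sessionId) {
        try {
            return transformerPath + "?pd4session=" + URLEncoder.encode(sessionId, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every Java runtime
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("/separate_web_app/transformer.jsp", "A1B2C3.node1"));
    }
}
```

In the JSP scriptlet, the result would be passed to response.sendRedirect() with session.getId() as the second argument.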

As an alternative, the <pd4ml:savefile> tag allows you to save the just-generated PDF to the server’s local drive and to redirect the initial PDF generation HTTP request to the new static PDF file.

 

 

11.Using professional version features

11.1.TTF embedding

The Professional version of PD4ML adds to the Standard version the ability to use TTF and OpenType fonts and to embed them into the resulting PDF documents. Unlike the Standard version, the Professional version is not limited to the Latin-1 character set for PDF text. Virtually any national script supported by Java, or a mix of scripts, can be converted from HTML to PDF. It is important that the TTF fonts the source HTML refers to contain all needed glyphs.

The Pro version of PD4ML is UNICODE based, but that does not mean that all input HTML documents must be UNICODE or UTF encoded. The main requirement is that the source HTML documents (or JSPs) specify their encoding explicitly and that the encoding matches the document content.

11.1.1.System requirements

  • PD4ML Pro does not support Java runtimes older than v1.4.2. (Update: since v3.x PD4ML Pro supports JDK 1.3.1)
  • Referenced TTF fonts should be UNICODE fonts.

11.1.2.Configuring fonts directory

PD4ML takes TTF fonts from a directory passed to the PD4ML runtime as a parameter. The directory should contain the TTF font files and a special mapping file, pd4fonts.properties, which maps font names to particular font files located in the directory. The format of a configuration file entry is as follows:

Font\ Name=fontfile.ttf

Spaces in font names are escaped with ‘\’
Example:

Bookman\ Old\ Style\ Bold\ Italic=BOOKOSBI.TTF
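
The escaping shown above matches the standard Java properties syntax, where a backslash before a space keeps the space as part of the key. Assuming pd4fonts.properties follows that syntax (the file extension suggests so, but this document does not state it), the sketch below builds a mapping line and reads it back with java.util.Properties. FontMapping, entry() and parse() are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the PD4ML API.
class FontMapping {
    /** Builds one mapping line, escaping spaces in the font name with '\'. */
    static String entry(String fontName, String fontFile) {
        return fontName.replace(" ", "\\ ") + "=" + fontFile;
    }

    /** Parses mapping lines the way java.util.Properties reads them. */
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String line = entry("Bookman Old Style Bold Italic", "BOOKOSBI.TTF");
        System.out.println(line);
        // The escaped key is read back with plain spaces
        System.out.println(parse(line).getProperty("Bookman Old Style Bold Italic"));
    }
}
```

Without the backslashes, a properties parser would end the key at the first space and treat the remainder of the line as the value.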

The config file can be generated automatically with the following command line:

java -jar pd4ml.jar -configure.fonts /fonts/directory

(please replace ‘/fonts/directory’ by the path to the fonts directory on your system)

If you use the Windows platform, one option is to generate the file directly in the %WINDOWS%/fonts directory.
For UNIX-derived systems we recommend copying all needed TTF fonts to the $JAVA_HOME/jre/lib/fonts directory and generating the pd4fonts.properties file there. See more about adding TTF fonts to Java here: http://java.sun.com/j2se/1.3/docs/guide/intl/addingfonts.html#adding.

After the config file has been generated, edit it (/fonts/directory/pd4fonts.properties) manually to remove any font entries that are not allowed to be embedded. You may use the special Win32 tool mentioned in the document http://www.microsoft.com/typography/TrueTypeEmbedding.mspx to determine whether a particular font is protected by copyright against embedding.

 

11.1.3.Embedding fonts to PDF from Java API

...
File tempFile = File.createTempFile( "pd4ml", ".pdf" );
java.io.FileOutputStream fos = new java.io.FileOutputStream( tempFile );

PD4ML pd4ml = new PD4ML();
pd4ml.setHtmlWidth( 800 );
pd4ml.useAdobeFontMetrics( true );

// fonts directory; it should contain pd4fonts.properties
pd4ml.useTTF( "/windows/fonts", true );

pd4ml.render( "http://www.yahoo.com", fos );
fos.close();

// open the result in Acrobat Reader; the array form of exec() passes the
// viewer path, which contains spaces, as a single argument
String viewer = "\\Program Files\\Adobe\\Acrobat 6.0\\Reader\\AcroRd32.exe";
String pdfFile = tempFile.getAbsolutePath();

Runtime.getRuntime().exec( new String[] { viewer, pdfFile } );

...

11.1.4.Embedding fonts to PDF from JSP

<%@ taglib uri="/WEB-INF/tlds/pd4ml.tld" prefix="pd4ml" %><%@page
contentType="text/html; charset=UTF-8"%><pd4ml:transform
screenWidth="400"
pageFormat="A5"
pageOrientation="landscape"
pageInsets="15,15,15,15,points"
inline="true">

<!--
TODO:
1. adjust fonts directory below.
2. make sure that the directory contains pd4fonts.properties file
3. for more info see PD4ML.useTTF( String, boolean ) Java API docs
-->
<pd4ml:usettf from="/windows/fonts"/>


<html>
<head>
<title>PD4ML embedded fonts test</title>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
</head>
<body>
<font face="Tahoma">Hello, World!</font><br>
<font face="Verdana">Привет, Мир!</font><br>
<font face="SimSun">我招呼您, 世界!</font><br>
</body>
</html>
</pd4ml:transform>

Note: An explicit and correct charset setting like <%@page contentType="text/html; charset=UTF-8"%> is mandatory for JSP.

11.1.5.Known issues

  • Bidirectional scripts are supported with some restrictions. PD4ML cannot yet shape glyphs correctly for text in page header/footer areas or in form elements (like button captions).

  • To use PD4ML with bidirectional scripts on a UNIX platform, you should correctly install the referenced TTFs so that they are known to Java. See the document: http://java.sun.com/j2se/1.3/docs/guide/intl/addingfonts.html#adding

  • On UNIX-derived platforms the TTF font Wingdings is not supported for the time being.

A lot of TTF fonts have only plain style defined. For example NSimSun font has no bold or italic glyphs pre-defined.

“If the font in a document uses a bold or italic style, but there is no font data for that style, the host operating system will synthesize the style.” 

That works for native applications like MS Internet Explorer for direct text output, but unfortunately a synthesized font is not accessible from Java application as binary font data. PD4ML needs a .ttf file or an equivalent to parse, extract glyph definitions for used characters and to embed them to the resulting PDF.

In theory, a possible solution is to use a third-party TTF management tool to synthesize an italic style of NSimSun, save it as NSimSunItalic.ttf and register it in pd4fonts.properties:

NSimSun\ Italic=NSimSunItalic.ttf

Or simply choose a similar italic font, and register it as above.

The same can be done for “NSimSun\ Bold\ Italic”

11.2.PDF headers and footers in HTML format

11.3.Watermark images

Watermark images are bound to page header/footer objects. To add a watermark image to your PDF, you need to add a header or a footer object (even without content) and specify an image URL, its bounds and its opacity.

Since a PDF page can have both header and footer objects, it is possible to define two watermark images per page (one associated with the header, the other with the footer).

Image bounds values are set in typographical points.

Opacity range is from 0 to 100.

 

11.3.1.Specifying watermark image from Java API

...
PD4ML html = new PD4ML();
...
PD4PageMark header = new PD4PageMark();
header.setWatermark( "images/logo.gif", new Rectangle(10,10,120,25), 50 );
html.setPageHeader( header );
...

11.3.2.Specifying watermark image in JSP

...
<pd4ml:header
watermarkUrl="watermark.gif"
watermarkOpacity="50"
watermarkBounds="10,10,100,30" />
...

11.4.Table of contents

PD4ML allows a table of contents (TOC), built from the <H1> – <H6> heading hierarchy, to be inserted into the generated PDFs.

A special tag, <pd4ml:toc>, is intended for that. The TOC can be inserted at any position in the document. Only one TOC object is allowed. The <pd4ml:toc> tag is ignored in multi-source PDF generation scenarios (when an array of URLs is passed to be converted to PDF).

<pd4ml:toc> can be parameterized with the pncorr attribute. The numeric value given in the attribute corrects the page numbers in the TOC: the pncorr value is added to the TOC’s page numbers.

Internally the TOC is represented by an HTML table like the one below:

<table class="ptoc-table" cellspacing="0">
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_1">
<div class="ptoc1-style-left">Chapter 1<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_1">
<div class="ptoc1-style-right">1</div></a></td>
</tr>
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_2">
<div class="ptoc2-style-left">Chapter 1.1<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_2">
<div class="ptoc2-style-right">2</div></a></td>
</tr>
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_3">
<div class="ptoc2-style-left">Chapter 1.2<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_3">
<div class="ptoc2-style-right">2</div></a></td>
</tr>
<tr>
<td class="ptoc-left-col"><a class="ptoc-link" href="#ptoc_4">
<div class="ptoc3-style-left">Chapter 1.2.1<pd4ml-dots></div></a></td>
<td class="ptoc-right-col"><a class="ptoc-link" href="#ptoc_4">
<div class="ptoc3-style-right">3</div></a></td>
</tr>
</table>

The following style sheet is applied to the TOC by default and can be overridden:

.ptoc1-style-left { margin-left: 0 }
.ptoc2-style-left { margin-left: 16 }
.ptoc3-style-left { margin-left: 32 }
.ptoc4-style-left { margin-left: 48 }
.ptoc5-style-left { margin-left: 64 }
.ptoc6-style-left { margin-left: 80 }
.ptoc-table { border: 0; width: 100% }
.ptoc-left-col { width: 99%; padding-right: 0 }
.ptoc-right-col { text-align: right; padding-left: 0; vertical-align: bottom }
.ptoc-link { text-decoration: none; color: black }
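
The default rules indent each TOC level by 16 units more than the previous one. As an illustration of how a replacement set of rules with a different indentation step could be produced, the sketch below generates the six .ptocN-style-left rules. TocIndentStyles and its methods are hypothetical names, and where the overriding rules have to be placed (for example in the document's <style> section) is an assumption that should be checked against the PD4ML documentation.

```java
// Hypothetical helper for illustration only; not part of the PD4ML API.
class TocIndentStyles {
    /** Builds the rule for one heading level; level 1 gets no indentation. */
    static String rule(int level, int step) {
        return ".ptoc" + level + "-style-left { margin-left: " + (level - 1) * step + " }";
    }

    /** Builds the rules for heading levels 1 to 6, one rule per line. */
    static String stylesheet(int step) {
        StringBuilder sb = new StringBuilder();
        for (int level = 1; level <= 6; level++) {
            sb.append(rule(level, step)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A step of 16 reproduces the default indentation listed above
        System.out.print(stylesheet(16));
    }
}
```

Calling stylesheet() with a smaller step yields a more compact TOC, with a larger one a more spread-out TOC.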

 

12.PDF/A

There is a special version of PD4ML with PDF/A output support: PD4ML Volume DMS Edition (starting from v360). The version introduces a new PD4ML API call, generatePdfa(boolean enable).

In PDF/A output mode PD4ML generates all needed document metadata and disables some features that are not allowed by the PDF/A format, for example document encryption. PDF/A also requires that all used fonts be embedded into the resulting PDF. If TTF embedding is not switched on, PD4ML in PDF/A mode implicitly enables it and embeds default TTF fonts.

PDF/A generation mode requires pd4ml_rc.jar to be in the same directory as pd4ml.jar.

PDF/A generation errors or warnings can be obtained this way (after the pd4ml.render() call):

PD4ML.StatusMessage[] msgs =
(PD4ML.StatusMessage[]) pd4ml.getLastRenderInfo(PD4Constants.PD4ML_PDFA_STATUS);

for ( int i = 0; i < msgs.length; i++ ) {
System.out.println( (msgs[i].isError() ? "ERROR: " : "WARNING: ") + msgs[i].getMessage());
}

13.General notes

There is a set of simple rules that can help you author HTML code compatible with PD4ML:

  • Put <style> sections only in the <head> area of the HTML.
  • Avoid XHTML-like syntax <tag/> for single tags, especially in the <head> area. (Irrelevant since PD4ML v3.x)
  • Do not use <span> tags, which do not belong to the HTML 3.2 spec. Use <div> instead. (Irrelevant since PD4ML v3.x)
  • References to multiple style classes, like class="style1 style2", are not supported. (Irrelevant since PD4ML v3.x)
  • The valign="center" table cell attribute value is invalid, although MS IE supports it. Use the correct setting valign="middle".
  • PD4ML does not support table border style control. (Irrelevant since PD4ML v3.x)

Good luck!
