Pd4Cmd is a Java command-line tool built on top of the PD4ML HTML-to-PDF/RTF/IMG converter library. The tool gives access to virtually all PD4ML API functionality and makes it possible to use the PD4ML converter as a standalone application or as part of non-Java environments and applications.
Usage 1: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar '<HTML/XML url>' <htmlWidth> [pageFormatName|WxH] [-xsl <notesdefault|url>] [-pdfa] [-permissions <NUMBER>] [-bookmarks <HEADINGS|ANCHORS>] [-dumphtml] [-orientation <PORTRAIT|LANDSCAPE>] [-insets <T,L,B,R,><mm|pt>] [-bgcolor <#RGB>] [-bgimage '<url>'] [-ttf <ttf_fonts_dir>] [-ttfrefsonly] [-addstyle <CSS code>] [-pdfforms] [-multicolumn <nr,gap>] [-adjustwidth] [-fitapage] [-fitandcenterpage] [-nohyperlinks] [-author <author name>] [-title <title override>] [-cookie <name> <value>] [-param <name> <value>] [-header '<header HTML code>'] [-footer '<footer HTML code>'] [-pagerange <page>] [-encoding <HTML encoding>] [-outformat <pdf|pdfa|pdfua|rtf|rtfwmf|png8|png24|tiff>] [-out <output_file_path>] [-password <password>] [-merge <path> <after|before>] [-threads <pool_size>] [-log <level>] [-debug <level>] [-usetmpfiles] [-watermark imageurl,x,y,width,height,opacity]

Usage 2: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar -tools '<PDF url>' [-readpassword <password>] [-permissions <NUMBER>] [-author <author name>] [-title <title override>] [-out <output_file_path>] [-password <password>] [-merge <path> <after|before>] [-mergepassword <password>]

Usage 3: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar -tools '<PDF url>' [-readpassword <password>] [-printpermissions] [-printauthor] [-printtitle] [-printpagenum]

Usage 4: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar -configure.fonts <fontdir> [pd4fonts.properties location]

Usage 5: java -Xmx512m -jar pd4ml*.jar -gui [HTML url]
HTML conversion to PDF, RTF or a raster image
HTML-to-PDF conversion with the absolute minimum of parameters
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar "http://pd4ml.com" 1200
  • The command line overrides the default Java heap size limit with -Xmx512m, which sets it to 512 MB.
  • On UNIX platforms, -Djava.awt.headless=true allows the application to run on servers without a graphical environment or from remote ssh/telnet sessions.
  • "http://pd4ml.com" and 1200 are the HTML source URL and the htmlWidth (virtual "browser" frame width) parameters.
    Note: if needed, enclose the URL in double quotes on Win32 and in single quotes on UNIX.
  • The default PDF document format is A4 / PORTRAIT.
  • In the example, the 1200px width of the rendered document is mapped to the 595pt width of the A4 page format. As long as the output file path is omitted, the output is sent to STDOUT and can be piped to another application.
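The width mapping described in the last bullet can be illustrated with a little arithmetic. The following is only a sketch in Python: the function name html_to_pdf_scale is hypothetical, the page widths are copied from the predefined page format list, and page insets are ignored (an assumption that follows the simplified description above, not a statement about PD4ML internals).

```python
# Page widths in typographical points, copied from the predefined page format list.
PAGE_WIDTH_PT = {"A4": 595, "LETTER": 612}


def html_to_pdf_scale(html_width_px, page_format="A4"):
    """Return the points-per-pixel factor implied by mapping htmlWidth to the page width.

    Assumption: page insets are ignored, as in the simplified description.
    """
    return PAGE_WIDTH_PT[page_format] / html_width_px


scale = html_to_pdf_scale(1200)      # 595 / 1200
print(round(scale, 4))               # roughly 0.4958
print(round(100 * scale, 1))         # a 100px wide block spans roughly 49.6pt
```

A larger htmlWidth therefore gives a smaller scale factor, so the same HTML content appears smaller on the page.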
Customized HTML-to-PDF conversion
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar "http://pd4ml.com" 1200 LETTER -bookmarks HEADINGS -pdfforms -debug -out pd4ml.pdf
  • In this example the generated PDF is written to a file defined with the -out parameter. That makes it possible to use STDOUT for debug output (-debug parameter).
  • The example also forces PD4ML to produce PDF outlines (bookmarks) from the <h1>-<h6> hierarchy of the document (-bookmarks HEADINGS) and to convert HTML forms to interactive PDF forms (-pdfforms).
PDF processing (merging, updating)
PDF page removal
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -pagerange 2-3,5+ -out c:/docs/newdoc.pdf
The call extracts a selected range of document pages and saves them as a new document.
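To make the effect of a range such as 2-3,5+ concrete, here is a rough Python sketch of which pages would be kept. It is not PD4ML code: select_pages is a hypothetical helper, and it assumes that comma-separated numeric rules are combined as a union and that a bare number selects a single page. The "even" and "odd" keywords are deliberately not handled (they raise ValueError here), because how they combine with numeric rules is not spelled out.

```python
def select_pages(spec, total_pages):
    """Return the sorted 1-based page numbers kept by a numeric page range spec."""
    kept = set()
    for token in spec.split(","):
        token = token.strip()
        if token.endswith("+"):          # "5+": page 5 to the end of the document
            first, last = int(token[:-1]), total_pages
        elif "-" in token:               # "2-3": pages 2 to 3 inclusive
            low, high = token.split("-", 1)
            first, last = int(low), int(high)
        else:                            # "4": a single page (assumed)
            first = last = int(token)
        # Clamp to the pages that actually exist.
        kept.update(range(max(first, 1), min(last, total_pages) + 1))
    return sorted(kept)


print(select_pages("2-3,5+", 7))   # [2, 3, 5, 6, 7]
```

For a seven-page document the call above drops pages 1 and 4, which is why the operation works as a page removal.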
PDF documents merge
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -merge file:c:/docs/tomerge.pdf after -out c:/docs/newdoc.pdf
Note: the -pagerange option is not available when merging PDF documents.
PDF permissions update
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -permissions 28 -out c:/docs/newdoc.pdf
-permissions 28 is a sum of permissions: AllowDegradedPrint = 4, AllowModify = 8 and AllowCopy = 16. See API reference for more details.
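The permission number can be computed rather than added up by hand. The sketch below is illustrative Python, not part of PD4ML: the PERMISSIONS table copies the values listed for the -permissions parameter, and permissions_number is a hypothetical helper name.

```python
# Permission values as listed for the -permissions parameter.
PERMISSIONS = {
    "AllowDegradedPrint": 4,
    "AllowModify": 8,
    "AllowCopy": 16,
    "AllowAnnotate": 32,
    "AllowFillingForms": 256,
    "AllowContentExtraction": 512,
    "AllowAssembly": 1024,
    "AllowPrint": 2052,
}


def permissions_number(*names):
    """Combine permission names into the NUMBER expected by -permissions."""
    number = 0
    for name in names:
        # Bitwise OR instead of + so that a shared bit is not counted twice.
        number |= PERMISSIONS[name]
    return number


print(permissions_number("AllowDegradedPrint", "AllowModify", "AllowCopy"))  # 28
print(permissions_number("AllowPrint", "AllowCopy"))                         # 2068
```

For permissions with distinct bits the result equals the plain sum. The OR matters only when AllowPrint and AllowDegradedPrint are combined: both include the value 4, and adding them would give 2056, which sets the AllowModify bit instead.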
Reading of document meta information
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -printpermissions -printauthor -printtitle -printpagenum
The call prints basic PDF info to STDOUT: document permissions (as a hex number), document author, document title and number of document pages (as a decimal number).
Indexing of TTF fonts
java -Xmx512m -jar pd4ml.jar -configure.fonts <fontdir> [pd4fonts.properties location]

The command line syntax allows you to define document header or footer code in HTML. The HTML code may include reserved characters that have a special meaning in the command shell of the hosting platform.

First, make sure that space characters do not corrupt the entire command line.

On the Win32 platform the code must be enclosed in double quotes; on UNIX-derived platforms (including macOS) it must be enclosed in single quotes.

If the HTML code itself contains double or single quotes, they have to be substituted with safer equivalents: on Win32, for example, double quotes can be replaced with single quotes. Or, if possible, remove them completely: PD4ML accepts attribute values without quotes (in contrast to XML requirements).

On Win32 we have not managed to build a command line with an HTML snippet that contains both a space and a double quote. Please let us know if there is a solution.

On UNIX you may escape a single quote with the '\'' sequence (single quote, backslash, then two single quotes), which looks strange, but works.
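The '\'' technique is easy to automate when a command line is generated by a script. The following Python sketch is only an illustration: sh_single_quote is a hypothetical helper, and the footer snippet is made up for the example. The standard shlex module is used to check that a POSIX-style parser recovers the original text.

```python
import shlex


def sh_single_quote(text):
    """Quote text for a POSIX shell, using the '\\'' sequence for embedded single quotes."""
    return "'" + text.replace("'", "'\\''") + "'"


footer = "<div align='right'>$[page] of $[total]</div>"
quoted = sh_single_quote(footer)
print(quoted)

# shlex.split follows POSIX shell quoting rules, so it should give the original text back.
assert shlex.split(quoted) == [footer]
```

Each embedded single quote closes the current quoted string, adds a backslash-escaped quote, and reopens the quoted string, so characters such as '<', '>' and '$' never appear outside quotes.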

On all platforms you should be careful with the following characters in command lines: '<', '>', '%', ';', '&', '|', '^'

Win32 addition to the list: '(', ')', ','

UNIX addition to the list: '{', '}', '[', ']', '$', '#', '*', '?', '`'

Make sure these characters are part of a text enclosed in appropriate quotes, or are escaped.

If you run Pd4Cmd from another environment, such as Perl or PHP, with an exec() or system() call, look for an exec method or function that accepts the command line as an array of parameters. In that case no special character escaping is needed. Methods that take a single parameter (representing the entire command line) require the escaping and may even add issues of their own.
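As an illustration of the array-of-parameters approach, here is a minimal Python sketch using the standard subprocess module. The helper names pd4cmd_args and convert, the jar location and the header snippet are assumptions made for the example; only the Pd4Cmd parameters themselves are taken from this page.

```python
import subprocess


def pd4cmd_args(url, html_width, out_path, header_html=None, jar="pd4ml.jar"):
    """Build the Pd4Cmd command line as a list, one element per parameter."""
    args = ["java", "-Djava.awt.headless=true", "-Xmx512m", "-jar", jar,
            url, str(html_width)]
    if header_html is not None:
        # Passed as a single list element, so no shell quoting is applied by hand.
        args += ["-header", header_html]
    args += ["-out", out_path]
    return args


def convert(url, html_width, out_path, header_html=None):
    """Run Pd4Cmd without a shell; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(pd4cmd_args(url, html_width, out_path, header_html), check=True)


args = pd4cmd_args("http://pd4ml.com", 1200, "pd4ml.pdf",
                   header_html='<div align="right">$[page] of $[total]</div>')
print(args)
```

Because no shell is involved when a list is passed, characters such as '<', '>', '$' and quotes in the header are not interpreted by a shell. Calling convert() requires Java and pd4ml.jar to be available in the working directory.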

Command-line parameters
Pd4Cmd parameter Description
-gui Launches GUI viewer/converter tool. v4.0.1
'<url>'  (mandatory) URL of HTML source.
  • Supported protocols: file, http and https (https may not work under some JDKs)
  • If needed, enclose the URL in single quotes on UNIX-derived platforms and in double quotes on Windows.
  • Due to specifics of Java, the file protocol requires fewer slashes than usual when addressing absolute paths on Windows: "file:c:/path/file.html"
Examples: 'http://pd4ml.com' 'http://host/doc.htm;jsessionid=873465837' 'file:c:/path/file.htm' 'file:docs/doc1.htm' (relative to the current directory) (on Windows platform use double quotes)
<htmlWidth>  (mandatory) Width of "virtual browser" frame. Base for relative width calculations.
-xsl <stylesheet URI> Forces an XSL transformation of the input document (XML is expected in that case) with the XSL stylesheet whose URL is given as a parameter. The default XSL transformer is Xalan.
-usetmpfiles Forces the tool to extract Base64-encoded Notes attachments and to store them as temporary files instead of passing the bulky data to Xalan
-dumphtml If an XSL transformation is requested, the parameter forces a dump of the transformation result (HTML) to STDOUT
pageFormatName|WxH Target page format. Either one of the predefined names or WIDTHxHEIGHT dimensions given in typographical points. Default value: A4. Predefined page formats:
  • A0 - 2384x3370 points
  • A1 - 1684x2384 points
  • A2 - 1190x1684 points
  • A3 - 842x1190 points
  • A4 - 595x842 points
  • A5 - 421x595 points
  • A6 - 297x421 points
  • A7 - 210x297 points
  • A8 - 148x210 points
  • A9 - 105x148 points
  • A10 - 74x105 points
  • HALFLETTER - 396x612 points
  • ISOB0 - 2836x4008 points
  • ISOB1 - 2004x2836 points
  • ISOB2 - 1418x2004 points
  • ISOB3 - 1002x1418 points
  • ISOB4 - 709x1002 points
  • ISOB5 - 501x709 points
  • LEDGER - 1224x792 points
  • LEGAL - 612x1008 points
  • LETTER - 612x792 points
  • NOTE - 540x720 points
  • TABLOID - 792x1224 points
Examples: A3 400x400
-addstyle <CSS code> Applies additional styles to the source document. Multiple occurrences of the parameter in the Pd4Cmd command line are allowed. Example: -addstyle 'TH {background-color: tomato} TR {page-break-inside: avoid}' (on Windows platform use double quotes)
-adjustwidth Sets htmlWidth to the rightmost edge of the HTML block content. With this parameter PD4ML first builds the HTML layout with the given htmlWidth to determine the rightmost edge of the rendered content, and then uses that value for the PDF mapping (in other words, it virtually cuts off any blank area on the right side). Notes:
  • To use the parameter efficiently, it is important to set the htmlWidth value greater than the expected maximal right edge offset.
  • If the source document has HTML objects whose width is set to 100%, then the parameter is meaningless.
  • As htmlWidth affects the HTML-to-PDF scale factor, using the parameter makes font and object sizes in the resulting PDF inconsistent from document to document.
-author <author name> Defines the document author in PDF properties. Example: -author 'Max Mustermann' (on Windows platform use double quotes)
-bgcolor '<#RGB>' Defines the background color for PDF pages. Examples: -bgcolor '#FFFCFE' -bgcolor 0xFFFCFE (on Windows platform use double quotes)
-bgimage '<url>' Defines a background image for PDF pages. The image will be stretched to cover the entire page, so it makes sense to choose images with dimensions proportional to the target page format. Examples: -bgimage 'http://pd4ml.com/i/blank.jpg' -bgimage 'file:/resources/images/blank.jpg' (on Windows platform use double quotes)
-bookmarks <HEADINGS|ANCHORS> Forces to generate PDF bookmarks (aka outlines).
  • If set to ANCHORS, PD4ML creates PDF bookmarks from <a name="destination">Label</a> tags. If such a tag is empty (Label is not defined), it uses the destination string as the visible label.
  • If set to HEADINGS, then PD4ML creates a PDF bookmark tree structure derived from the <H1>-<H6> hierarchy.
Examples: -bookmarks HEADINGS -bookmarks ANCHORS
-cookie <name> <value> Defines a cookie to be sent with the source HTML HTTP request (and all subsequent resource requests). Multiple occurrences of the parameter in the Pd4Cmd command line are allowed. Example: -cookie JSESSIONID '9034657927465;path=/' (on Windows platform use double quotes)
-debug Enables PD4ML debug output to STDOUT. The parameter has no effect if the -out parameter is omitted.
-encoding <HTML encoding> Document encoding override
-fitapage Forces PD4ML to downscale entire HTML layout if needed to fit a single PDF page vertically
-footer '<footer HTML code>' PRO Defines the PDF page footer in HTML. $[page], $[total] and $[title] placeholders are supported. Example: -footer '<div width=100% align=right>$[page] of $[total]</div>' (on Windows platform use double quotes)
-header '<header HTML code>' PRO Defines the PDF page header in HTML. $[page], $[total] and $[title] placeholders are supported. Example: -header '<div width=100% align=right>$[page] of $[total]</div>' (on Windows platform use double quotes)
-insets <T,L,B,R,><mm|pt> Defines page margins (Top,Left,Bottom,Right). Defaults: 10,10,10,10,mm Examples: -insets 10,20,10,10,mm -insets 20,40,20,20,pt
-merge <path> <after|before> PRO Merges the conversion result with an existing PDF document. after - append the existing document to the conversion result, before - prepend the document
-multicolumn <nr,gap> PRO Outputs a multicolumn PDF document. nr - number of columns, gap - column padding
-nohyperlinks Disables conversion of external HTML hyperlinks into PDF hyperlinks
-noimagesplit Disables image splitting at page breaks. By default the splitting is enabled. If the parameter is set, then PD4ML tries to place page breaks so that images are protected. If an image height (in screen pixels) is bigger than the computed page height (in screen pixels), then the image will be split regardless of the option. Similar behavior may be achieved with the IMG {page-break-inside: avoid} CSS style
-orientation <PORTRAIT|LANDSCAPE> LANDSCAPE rotates the target page format (default is A4) by 90°. Examples: -orientation PORTRAIT -orientation LANDSCAPE
-out <output_file_path> Defines target file path/name. Pd4Cmd must have permissions to write the file. Examples: -out c:\tmp\out.pdf -out /tmp/out.pdf
-outformat <pdf|pdfa|pdfua|rtf|rtfwmf> PRO Specifies output file format. pdfa duplicates -pdfa parameter. rtf forces PD4ML to output RTF instead of PDF. rtfwmf outputs RTF and converts images to WMF file format for a better viewer compatibility.
-pagerange <page> Limits the scope of generated pages. Examples: "2+" - skip the first page, "1-2" - output only the first and the second pages, "even" or "odd" - output only even or only odd pages. The rules may be combined: "3-7,odd" Example: -pagerange '2-3,7+' (on Windows platform use double quotes)
-param <name> <value> Sets a key/value pair to dynamically substitute placeholders in the HTML template (like $[key]). Key names "page", "total" and "title" are reserved for PDF headers and footers. Also allows passing PD4ML tweaking parameters. Multiple occurrences of the parameter in the Pd4Cmd command line are allowed. Examples: -param date 'Feb 18, 2010' -param pd4ml.basic.authentication usr:pwd (on Windows platform use double quotes)
-password <password> Protects the resulting document with a password. Example: -password geheim
-pdfa DMS Forces PD4ML to output PDF compliant with the PDF/A specification. The PDF/A specification requires all used fonts to be embedded in the resulting document, so the parameter cannot guarantee that the resulting document is PDF/A if, for example, TTF embedding (-ttf) is disabled or not configured. Place pd4ml_rc.jar in the same directory as pd4ml.jar: it will help to avoid most of the font embedding problems.
-pdfforms Forces PD4ML to convert HTML forms into interactive PDF forms
-permissions <NUMBER> Defines document access permissions. NUMBER is a sum of permission values:
  • AllowAnnotate - (bit 6, value = 32)
  • AllowAssembly - (bit 11, value = 1024)
  • AllowContentExtraction - (bit 10, value = 512)
  • AllowCopy - (bit 5, value = 16)
  • AllowDegradedPrint - (bit 3, value = 4)
  • AllowFillingForms - (bit 9, value = 256)
  • AllowModify - (bit 4, value = 8)
  • AllowPrint - (bit 12 + bit 3, value = 2052)
Example: -permissions 2068 allows copying and printing the resulting document
-protectpud Makes PD4ML output PDF objects respecting dimensions/font sizes given in "in", "pt", "cm" etc. By default the physical sizes are converted to pixel equivalents (using 72dpi) and scaled up or down with the entire document layout. Use the feature carefully: when it is switched on, there is no single HTML-to-PDF scale factor for all HTML objects. The resulting PDF layout may appear visually corrupted.
-smarttablesplit Inserts page breaks between table rows to make the table portions fit the PDF page height. If the table has a header (the first rows with <th> cells only), it replicates the header rows in each table section. Similar behavior (excluding the header replication) may be achieved with the TR, TABLE {page-break-inside: avoid} CSS style
-title <title override> Defines (or overrides) the document title Example: -title 'New title' (on Windows platform use double quotes)
-ttf <ttf_fonts_dir> PRO Specifies TTF fonts directory. Examples: -ttf c:\windows\fonts -ttf fonts/   (relative to the current dir)
-threads <pool_size> Enables asynchronous resource loading (graphics, attachments etc.) if the thread pool size is neither 0 nor 1. By default the asynchronous loader is disabled. pool_size - loader thread pool size. If the value is negative, the pool size is not limited: new threads are created as needed, but previously constructed threads are reused when they are available.
-tools Switches Pd4Cmd to tools mode. In this mode it expects PDF rather than HTML as input, and some HTML conversion-specific features have no effect. Examples: -tools file:/docs/test.pdf -tools file:c:\docs\test.pdf -tools file:c:/docs/test.pdf -tools c:\docs\test.pdf -tools http://pdfcloud.com/test.pdf
-readpassword <password> Specifies the input PDF document password in case the document is password protected (Tools mode) Example: -readpassword segretto
-mergepassword <password> Specifies the merged PDF document password in case the document is password protected (Tools mode) Example: -mergepassword segretto
-printpermissions Reads the PDF document permissions and prints their numeric value in hex form to STDOUT
-printpagenum Reads and prints the number of PDF pages to STDOUT
-printauthor Reads and prints PDF document author
-printtitle Reads and prints PDF document title
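
The pageFormatName|WxH and -orientation entries of the table can be combined into a small lookup, for instance to pre-compute the page size a script is about to request. This is only a Python sketch under assumptions: page_size is a hypothetical helper, PAGE_FORMATS holds just a few of the predefined formats from the table, and LANDSCAPE is modelled as a simple swap of width and height.

```python
# A few of the predefined page formats (width, height in points) from the table.
PAGE_FORMATS = {
    "A3": (842, 1190),
    "A4": (595, 842),
    "A5": (421, 595),
    "LEGAL": (612, 1008),
    "LETTER": (612, 792),
}


def page_size(page_format="A4", orientation="PORTRAIT"):
    """Resolve a format name or a WIDTHxHEIGHT string to (width, height) in points."""
    name = page_format.upper()
    if name in PAGE_FORMATS:
        width, height = PAGE_FORMATS[name]
    else:
        # Custom dimensions such as "400x400".
        w, h = page_format.lower().split("x")
        width, height = int(w), int(h)
    if orientation.upper() == "LANDSCAPE":
        # Assumption: rotating by 90 degrees swaps width and height.
        width, height = height, width
    return width, height


print(page_size("A3"))                    # (842, 1190)
print(page_size("400x400"))               # (400, 400)
print(page_size("LETTER", "LANDSCAPE"))   # (792, 612)
```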