Usage 1: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar '<HTML/XML url>' <htmlWidth> [pageFormatName|WxH] [-xsl <notesdefault|url>] [-pdfa] [-permissions <NUMBER>] [-bookmarks <HEADINGS|ANCHORS>] [-dumphtml] [-orientation <PORTRAIT|LANDSCAPE>] [-insets <T,L,B,R,><mm|pt>] [-bgcolor <#RGB>] [-bgimage '<url>'] [-ttf <ttf_fonts_dir>] [-ttfrefsonly] [-addstyle <CSS code>] [-pdfforms] [-multicolumn <nr,gap>] [-adjustwidth] [-fitapage] [-fitandcenterpage] [-nohyperlinks] [-author <author name>] [-title <title override>] [-cookie <name> <value>] [-param <name> <value>] [-header '<header HTML code>'] [-footer '<footer HTML code>'] [-pagerange <page>] [-encoding <HTML encoding>] [-outformat <pdf|pdfa|pdfua|rtf|rtfwmf|png8|png24|tiff>] [-out <output_file_path>] [-password <password>] [-merge <path> <after|before>] [-threads <pool_size>] [-log <level>] [-debug <level>] [-usetmpfiles] [-watermark imageurl,x,y,width,height,opacity]
Usage 2: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar -tools '<PDF url>' [-readpassword <password>] [-permissions <NUMBER>] [-author <author name>] [-title <title override>] [-out <output_file_path>] [-password <password>] [-merge <path> <after|before>] [-mergepassword <password>]
Usage 3: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar -tools '<PDF url>' [-readpassword <password>] [-printpermissions] [-printauthor] [-printtitle] [-printpagenum]
Usage 4: java -Xmx512m [-Djava.awt.headless=true] -jar pd4ml*.jar -configure.fonts <fontdir> [ location]
Usage 5: java -Xmx512m -jar pd4ml*.jar -gui [HTML url]
HTML conversion to PDF, RTF or a raster image
HTML-to-PDF conversion with the absolute minimum of parameters
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar "" 1200
- The command line overrides the default Java heap size limit with -Xmx512m, setting it to 512 MB.
- On UNIX platforms, -Djava.awt.headless=true allows the application to run on servers without graphics support or from remote ssh/telnet sessions.
- The arguments "" 1200 are the HTML source URL and the htmlWidth (virtual "browser" frame width) parameters. Note: if needed, enclose the URL in double quotes on Win32 and in single quotes on UNIX.
- The default PDF document format: A4 / PORTRAIT
- In the example, the 1200px width of the rendered document is mapped to the 595pt width of the A4 page format. If no output file path is given, the output is sent to STDOUT and can be piped to another application.
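The mapping above is plain arithmetic. The following Java sketch (a hypothetical helper class, not part of PD4ML) derives the A4 size in points from its 210 x 297 mm dimensions, assuming 1 inch = 25.4 mm = 72 pt, and computes how many points one pixel of the 1200px virtual frame occupies. For simplicity it ignores page margins (-insets).

```java
// Hypothetical helper, not part of PD4ML: page-size and scale arithmetic.
// Assumes 1 inch = 25.4 mm = 72 typographic points; page margins are ignored.
public class PageScale {

    static final double POINTS_PER_MM = 72.0 / 25.4;

    /** Converts millimetres to typographic points. */
    static double mmToPt(double mm) {
        return mm * POINTS_PER_MM;
    }

    /** Points of PDF width occupied by one pixel of the virtual browser frame. */
    static double ptPerPx(double pageWidthPt, int htmlWidthPx) {
        return pageWidthPt / htmlWidthPx;
    }

    public static void main(String[] args) {
        long widthPt = Math.round(mmToPt(210));   // A4 width: 210 mm
        long heightPt = Math.round(mmToPt(297));  // A4 height: 297 mm
        System.out.println("A4 as WxH: " + widthPt + "x" + heightPt);

        double scale = ptPerPx(widthPt, 1200);
        System.out.printf("1 px of a 1200 px frame = %.4f pt%n", scale);
    }
}
```

The rounded result, 595x842, is also the form the WxH page format argument takes, since that argument is given in typographic points.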
Customized HTML-to-PDF conversion
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar "" 1200 LETTER -bookmarks HEADINGS -pdfforms -debug -out pd4ml.pdf
- In the example, the generated PDF is written to the file given with the -out parameter. This makes STDOUT available for debug output (-debug parameter).
- The example also makes PD4ML generate PDF outlines (bookmarks) from the heading hierarchy of the document (-bookmarks HEADINGS) and convert HTML forms into interactive PDF forms (-pdfforms).
PDF processing (merging, updating)
PDF page removal
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -pagerange 2-3,5+ -out c:/docs/newdoc.pdf
The call extracts a selected range of document pages and saves them as a new document.
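To predict which pages a -pagerange value selects, the following Java sketch expands expressions like 2-3,5+ for a document with a known page count. It is an illustration under assumptions, not PD4ML's parser: comma-separated parts are treated as a union, N+ as page N through the last page, and a bare number as a single page. The even/odd keywords are left out because the text does not say how they combine with ranges.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative sketch, not PD4ML code. ASSUMPTIONS: "," means union,
// "N+" means page N to the last page, a bare "N" means that single page.
// The "even"/"odd" keywords are intentionally not handled.
public class PageRange {

    static SortedSet<Integer> expand(String spec, int totalPages) {
        SortedSet<Integer> pages = new TreeSet<>();
        for (String raw : spec.split(",")) {
            String part = raw.trim();
            int from;
            int to;
            if (part.endsWith("+")) {             // "5+" -> 5..last page
                from = Integer.parseInt(part.substring(0, part.length() - 1));
                to = totalPages;
            } else if (part.contains("-")) {      // "2-3" -> pages 2 and 3
                String[] ends = part.split("-", 2);
                from = Integer.parseInt(ends[0].trim());
                to = Integer.parseInt(ends[1].trim());
            } else {                              // "4" -> page 4 only (assumed)
                from = to = Integer.parseInt(part);
            }
            // Clamp to the pages that actually exist in the document.
            for (int p = Math.max(from, 1); p <= Math.min(to, totalPages); p++) {
                pages.add(p);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // For a 7-page test.pdf this prints [2, 3, 5, 6, 7]
        System.out.println(expand("2-3,5+", 7));
    }
}
```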
PDF documents merge
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -merge file:c:/docs/tomerge.pdf after -out c:/docs/newdoc.pdf
Note: option is not available for a PDF merge.
PDF permissions update
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -permissions 28 -out c:/docs/newdoc.pdf
-permissions 28 is the sum of the permissions AllowDegradedPrint = 4, AllowModify = 8 and AllowCopy = 16. See the API reference for more details.
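Because each permission is a distinct power of two, adding the values is the same as combining bits. A small Java sketch (the class and constant names are illustrative, taken from this section rather than from the PD4ML API) builds the value 28 and, conversely, splits a number into its bit values. Splitting 2068, the copy-and-print value given in the parameter table, yields 4, 16 and 2048; 2048 is a permission value not listed in this section.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: constant names follow the text of this section,
// not necessarily the PD4ML API.
public class PermissionSum {

    static final int ALLOW_DEGRADED_PRINT = 4;
    static final int ALLOW_MODIFY = 8;
    static final int ALLOW_COPY = 16;

    /** Each permission is a distinct power of two, so OR-ing equals adding. */
    static int combine(int... flags) {
        int result = 0;
        for (int flag : flags) {
            result |= flag;
        }
        return result;
    }

    /** Splits a non-negative permission number into its powers of two. */
    static List<Integer> bits(int value) {
        List<Integer> parts = new ArrayList<>();
        // bit > 0 stops the loop once the shift overflows past 2^30.
        for (int bit = 1; bit > 0 && bit <= value; bit <<= 1) {
            if ((value & bit) != 0) {
                parts.add(bit);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        int permissions = combine(ALLOW_DEGRADED_PRINT, ALLOW_MODIFY, ALLOW_COPY);
        System.out.println("-permissions " + permissions);  // -permissions 28
        System.out.println("2068 = " + bits(2068));         // 2068 = [4, 16, 2048]
    }
}
```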
Reading of document meta information
java -Djava.awt.headless=true -Xmx512m -jar pd4ml.jar -tools file:c:/docs/test.pdf -printpermissions -printauthor -printtitle -printpagenum
The call prints basic PDF info to STDOUT: document permissions (as a hex number), document author, document title and number of document pages (as a decimal number).
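A script that calls -printpermissions has to turn the printed hex number back into permission flags. The Java sketch below does that for the three values listed in the previous section. The exact output format is an assumption: the sketch accepts the number with or without a 0x prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: interpret the hex number printed by -printpermissions.
// ASSUMPTION: the printed form may or may not carry a "0x" prefix.
public class PermissionDecoder {

    // The three permission values quoted in the "PDF permissions update" section.
    static final Map<String, Integer> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("AllowDegradedPrint", 4);
        KNOWN.put("AllowModify", 8);
        KNOWN.put("AllowCopy", 16);
    }

    /** Parses a hex string such as "1c" or "0x1C" (surrounding spaces allowed). */
    static int parseHex(String printed) {
        String s = printed.trim();
        if (s.startsWith("0x") || s.startsWith("0X")) {
            s = s.substring(2);
        }
        // Unsigned parsing also accepts eight-digit values with the top bit set.
        return Integer.parseUnsignedInt(s, 16);
    }

    /** Maps each known permission name to whether its bit is set. */
    static Map<String, Boolean> decode(int permissions) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : KNOWN.entrySet()) {
            result.put(e.getKey(), (permissions & e.getValue()) != 0);
        }
        return result;
    }

    public static void main(String[] args) {
        String printed = args.length > 0 ? args[0] : "1c";
        int value = parseHex(printed);
        System.out.println(value + " -> " + decode(value));
    }
}
```

Run with the argument 1c, it prints 28 -> {AllowDegradedPrint=true, AllowModify=true, AllowCopy=true}.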
Indexing of TTF fonts
java -Xmx512m -jar pd4ml.jar -configure.fonts <fontdir> [ location]
The command-line syntax allows you to define the document header or footer in HTML. The HTML code may include reserved characters that have a special meaning in the command shell of the hosting platform.
First, make sure that space characters do not break up the command line.
On Win32 platforms the code must be enclosed in double quotes; on UNIX-derived platforms (including macOS) it must be enclosed in single quotes.
If the HTML code itself contains double or single quotes, they have to be replaced with safer equivalents: on Win32, for example, double quotes can be replaced with single quotes. Or, where possible, remove them completely: PD4ML accepts attribute values without quotes (unlike XML).
On Win32 we have not managed to build a command line with an HTML snippet that contains both a space and a double quote. Please let us know if you find a solution.
On UNIX you may escape a single quote with the '\'' sequence (single quote, backslash, single quote, single quote), which looks strange but works.
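The '\'' rule can be applied mechanically. This Java helper (hypothetical, for scripts that assemble a UNIX command line as a single string) wraps a snippet in single quotes and rewrites each embedded single quote.

```java
// Hypothetical helper: quote an HTML snippet for a UNIX shell command line.
public class ShellQuote {

    /** Encloses text in single quotes; each embedded ' becomes '\'' . */
    static String quote(String text) {
        // '\'' closes the quoted string, adds an escaped quote, and reopens it.
        return "'" + text.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        String footer = "<div align='right'>$[page] of $[total]</div>";
        System.out.println("-footer " + quote(footer));
    }
}
```

For the sample footer it prints -footer '<div align='\''right'\''>$[page] of $[total]</div>'; inside single quotes the shell leaves $, [ and ] alone.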
On all platforms you should be careful with the following characters in command lines:
'<', '>', '%', ';', '&', '|', '^'
Win32 addition to the list:
'(', ')', ','
UNIX addition to the list:
'{', '}', '[', ']', '$', '#', '*', '?', '`'
Make sure these characters are either part of text enclosed in the appropriate quotes, or escaped.
If you run Pd4Cmd from a foreign environment, such as Perl or PHP, with an exec() call, look for an exec method or function that accepts the command line as an array of parameters. In that case, special characters do not need escaping. Methods that take the entire command line as a single parameter require escaping and may even introduce issues of their own.
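In Java the same advice means using ProcessBuilder with a list of arguments rather than one command string. The sketch below passes a footer containing spaces, double quotes and $ characters as a single argument; the input URL, output file and jar location are placeholders. On UNIX-like systems the arguments reach the JVM unchanged. On Windows the JVM still has to join them into one command line for the operating system, so embedded double quotes may still need care there.

```java
import java.io.IOException;
import java.util.List;

// Sketch: launch Pd4Cmd without a shell. Paths and URLs are placeholders.
public class RunPd4Cmd {

    static List<String> command(String htmlUrl, String footerHtml, String outPath) {
        return List.of(
                "java", "-Djava.awt.headless=true", "-Xmx512m",
                "-jar", "pd4ml.jar",
                htmlUrl, "1200",
                "-footer", footerHtml,   // one argument: no shell quoting needed
                "-out", outPath);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = command(
                "file:docs/doc1.htm",
                "<div width=\"100%\" align=\"right\">$[page] of $[total]</div>",
                "pd4ml.pdf");
        System.out.println(cmd);
        // Only start the process when explicitly asked, so the sketch runs without pd4ml.jar.
        if (args.length > 0 && args[0].equals("--run")) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.out.println("exit code: " + p.waitFor());
        }
    }
}
```

Run it with --run to actually launch the conversion; without it, the sketch only prints the argument list.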
Command-line parameters
Pd4Cmd parameter | Description |
-gui |
Launches GUI viewer/converter tool. v4.0.1 |
'<url>' |
(mandatory) URL of HTML source.
'file:docs/doc1.htm' (relative to the current directory)
(on Windows platform use double quotes) |
<htmlWidth> |
(mandatory) Width of "virtual browser" frame. Base for relative width calculations. |
-xsl <stylesheet URI> |
Performs an XSL transformation of the input document (which must be XML in this case) using the XSL stylesheet whose URL is given as the parameter. The default XSL transformer is Xalan |
-usetmpfiles | Makes the tool extract Base64-encoded Notes attachments and store them as temporary files instead of passing the bulky data to Xalan |
-dumphtml |
If an XSL transformation is requested, dumps the transformation result (HTML) to STDOUT |
pageFormatName|WxH |
Target page format: either one of the predefined names or WIDTHxHEIGHT dimensions in typographic points. Default value: A4
Predefined page formats:
400x400 |
-addstyle <CSS code> |
Applies additional styles to the source document. The parameter may occur multiple times in a Pd4Cmd command line. Example:
-addstyle 'TH {background-color: tomato} TR {page-break-inside: avoid}'
(on Windows platform use double quotes) |
-adjustwidth |
Sets htmlWidth to the rightmost edge of the HTML block content. PD4ML first lays out the HTML with the given htmlWidth to determine the rightmost edge of the rendered content, then uses that value for the PDF mapping (in other words, it virtually cuts off any blank area on the right side). |
-author <author name> |
Defines the document author in the PDF properties. Example:
-author 'Max Mustermann'
(on Windows platform use double quotes) |
-bgcolor '<#RGB>' |
Defines the background color for PDF pages. Examples:
-bgcolor '#FFFCFE'
-bgcolor 0xFFFCFE
(on Windows platform use double quotes) |
-bgimage '<url>' |
Defines a background image for PDF pages. The image is stretched to cover the entire page, so it makes sense to choose an image whose dimensions are proportional to the target page format.
Examples:
-bgimage ''
-bgimage 'file:/resources/images/blank.jpg' (on Windows platform use double quotes) |
-bookmarks <HEADINGS|ANCHORS> |
Generates PDF bookmarks (also known as outlines). Examples:
-bookmarks HEADINGS
-bookmarks ANCHORS |
-cookie <name> <value> |
Defines a cookie to be sent with the HTTP request for the source HTML (and with all subsequent resource requests). The parameter may occur multiple times in a Pd4Cmd command line. Example:
-cookie JSESSIONID '9034657927465;path=/'
(on Windows platform use double quotes) |
-debug |
Enables PD4ML debug output to STDOUT. The parameter has no effect if the -out parameter is omitted. |
-encoding <HTML encoding> |
Document encoding override |
-fitapage |
Makes PD4ML downscale the entire HTML layout, if needed, so that it fits vertically on a single PDF page |
-footer '<footer HTML code>' |
PRO Defines the PDF page footer in HTML. $[page], $[total] and $[title] placeholders are supported. Example:
-footer '<div width=100% align=right>$[page] of $[total]</div>'
(on Windows platform use double quotes) |
-header '<header HTML code>' |
PRO Defines the PDF page header in HTML. $[page], $[total] and $[title] placeholders are supported. Example:
-header '<div width=100% align=right>$[page] of $[total]</div>'
(on Windows platform use double quotes) |
-insets <T,L,B,R,><mm|pt> |
Defines page margins (Top,Left,Bottom,Right).
Defaults: 10,10,10,10,mm
-insets 10,20,10,10,mm
-insets 20,40,20,20,pt |
-merge <path> <after|before> |
PRO Merges the conversion result with an existing PDF document: after appends the existing document to the conversion result, before prepends it |
-multicolumn <nr,gap> |
PRO Outputs a multi-column PDF document: nr is the number of columns, gap is the column padding |
-nohyperlinks |
Disables the conversion of external HTML hyperlinks into PDF hyperlinks |
-noimagesplit |
Disables image splitting at page breaks (splitting is enabled by default). If the parameter is set, PD4ML tries to place page breaks so that images are not cut. If an image is taller (in screen pixels) than the computed page height (in screen pixels), it is split regardless of this option. Similar behavior may be achieved with the IMG {page-break-inside: avoid} CSS style |
-orientation <PORTRAIT|LANDSCAPE> |
LANDSCAPE rotates the target page format (default A4) by 90°. Examples:
-orientation PORTRAIT
-orientation LANDSCAPE |
-out <output_file_path> |
Defines the target file path/name. Pd4Cmd must have permission to write the file. Examples:
-out c:\tmp\out.pdf
-out /tmp/out.pdf |
-outformat <pdf|pdfa|pdfua|rtf|rtfwmf> |
PRO Specifies the output file format. pdfa is equivalent to the -pdfa parameter. rtf makes PD4ML output RTF instead of PDF. rtfwmf outputs RTF and converts images to the WMF format for better viewer compatibility. |
-pagerange <page> |
Limits the scope of generated pages. Examples: "2+" skips the first page, "1-2" outputs only the first and second pages, "even" or "odd" output only even or odd pages. The rules may be combined: "3-7,odd"
-pagerange '2-3,7+'
(on Windows platform use double quotes) |
-param <name> <value> |
Sets a key/value pair used to dynamically substitute placeholders (like $[key]) in an HTML template. The key names "page", "total" and "title" are reserved for PDF headers and footers. The parameter also allows passing PD4ML tweaking parameters. It may occur multiple times in a Pd4Cmd command line. Examples:
-param date 'Feb 18, 2010'
-param pd4ml.basic.authentication usr:pwd
(on Windows platform use double quotes) |
-password <password> |
Protects the resulting document with a password.
-password geheim |
-pdfa |
DMS Makes PD4ML output PDF compliant with the PDF/A specification. PDF/A requires all used fonts to be embedded in the resulting document, so the parameter cannot guarantee that the result is PDF/A if, for example, TTF embedding (-ttf) is disabled or not configured. Placing pd4ml_rc.jar in the same directory as pd4ml.jar helps avoid most font embedding problems. |
-pdfforms |
Forces PD4ML to convert HTML forms into interactive PDF forms |
-permissions <NUMBER> |
Defines document access permissions. NUMBER is a sum of permission values:
-permissions 2068 allows copying and printing the resulting document |
-protectpud |
Makes PD4ML output PDF objects respecting dimensions/font sizes given in "in", "pt", "cm" etc. By default, physical sizes are converted to pixel equivalents (at 72 dpi) and scaled up or down with the entire document layout. Use the feature carefully: when it is switched on, there is no single HTML-to-PDF scale factor for all HTML objects, and the resulting PDF layout may appear visually corrupted. |
-smarttablesplit |
Inserts page breaks between table rows so that the table portions fit the PDF page height. If the table has a header (the first rows containing only <th> cells), the header row is replicated in each table section. Similar behavior (excluding the header replication) may be achieved with the TR, TABLE {page-break-inside: avoid} CSS style |
-title <title override> |
Defines (or overrides) the document title
-title 'New title'
(on Windows platform use double quotes) |
-ttf <ttf_fonts_dir> |
PRO Specifies TTF fonts directory.
-ttf c:\windows\fonts
-ttf fonts/ (relative to the current dir) |
-threads <pool_size> |
Enables asynchronous resource loading (graphics, attachments, etc.) if the thread pool size is neither 0 nor 1. By default the asynchronous loader is disabled. pool_size is the loader thread pool size. If the value is negative, the pool size is not limited: new threads are created as needed, but previously constructed threads are reused when they are available. |
-tools |
Switches Pd4Cmd to tools mode. In this mode it expects PDF rather than HTML as input, and HTML conversion-specific features have no effect. Examples:
-tools file:/docs/test.pdf
-tools file:c:\docs\test.pdf
-tools file:c:/docs/test.pdf
-tools c:\docs\test.pdf
-tools |
-readpassword <password> |
Specifies the password of the input PDF document, if the document is password-protected (tools mode)
-readpassword segretto |
-mergepassword <password> |
Specifies the password of the merged PDF document, if the document is password-protected (tools mode)
-mergepassword segretto |
-printpermissions |
Reads and prints the numeric value of the PDF document permissions in hex form to STDOUT |
-printpagenum |
Reads and prints the number of PDF pages to STDOUT |
-printauthor |
Reads and prints PDF document author |
-printtitle |
Reads and prints PDF document title |
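The -threads rules in the table above map naturally onto java.util.concurrent executors; the description of negative values matches the behavior of a cached thread pool. The following sketch only mirrors the documented semantics and is an assumption about how such a loader could be set up, not PD4ML's actual code.

```java
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the documented -threads semantics; not PD4ML's loader code.
public class ThreadsOption {

    static Optional<ExecutorService> loaderFor(int poolSize) {
        if (poolSize == 0 || poolSize == 1) {
            return Optional.empty();                                 // synchronous loading (default)
        }
        if (poolSize < 0) {
            return Optional.of(Executors.newCachedThreadPool());     // unbounded, reuses idle threads
        }
        return Optional.of(Executors.newFixedThreadPool(poolSize));  // at most poolSize threads
    }

    public static void main(String[] args) {
        for (int size : new int[] {0, 1, 4, -1}) {
            Optional<ExecutorService> loader = loaderFor(size);
            System.out.println("-threads " + size + " -> "
                    + loader.map(e -> "asynchronous").orElse("synchronous"));
            loader.ifPresent(ExecutorService::shutdown);             // release pool resources
        }
    }
}
```

It reports synchronous loading for 0 and 1 and asynchronous loading for 4 and -1.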