Instructions for adapting your PD4ML-v3-enabled application to the latest PD4ML API
CSS Flexible Box Layout support added
PD4ML v4.0.6 introduces flexbox layout support.
Children of a flex container can “flex” their sizes, either growing to fill unused space or shrinking to avoid overflowing the parent. Both horizontal and vertical alignment of the children are supported.
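The following is a minimal sketch of input HTML with a flex container. It builds the HTML as a Java string, the same way the PDF/UA example further down does. The class name FlexboxSample is hypothetical. The CSS properties used (display: flex, align-items, flex-grow, flex-shrink) are standard flexbox properties, but you should check the PD4ML documentation for which of them a given PD4ML build honours.

```java
/**
 * Builds sample input HTML with a flex container (FlexboxSample is a hypothetical name).
 * The HTML would be passed to PD4ML like any other input document.
 */
final class FlexboxSample {

    static String flexRowHtml() {
        return "<html><head><style>\n"
                // flex container: children laid out in one row and centered vertically
                + ".row { display: flex; align-items: center; height: 120px; border: 1px solid #999; }\n"
                // this child grows to fill the unused horizontal space
                + ".grow { flex-grow: 1; }\n"
                // these children keep their natural width and never shrink
                + ".fixed { flex-shrink: 0; padding: 0 10px; }\n"
                + "</style></head><body>\n"
                + "<div class=\"row\">\n"
                + "  <div class=\"fixed\">Left</div>\n"
                + "  <div class=\"grow\">Middle: takes up the remaining width</div>\n"
                + "  <div class=\"fixed\">Right</div>\n"
                + "</div>\n"
                + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(flexRowHtml());
    }
}
```

You would then feed the returned string to PD4ML, for example through readHTML(), like any other input document.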
Automatic Font Kerning Support
PD4ML v4.0.4 introduces optional font kerning support. Kerning is the addition or reduction of space between two characters (glyphs) of a proportional font. As a rule, rendered text looks much more pleasing when kerning is applied.
PD4ML uses kerning pair information from TTF/OTF fonts (if available). For the standard built-in Type1 fonts, the kerning information is known and included in PD4ML itself.
Due to its importance, the feature has also been implemented in our older development branch, v3.11.0.
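The effect of kerning comes down to simple arithmetic. The width of a text run is the sum of the glyph advance widths. For each adjacent glyph pair listed in the font's kerning table, a (usually negative) adjustment is then added. All widths and kerning values in this sketch are hypothetical numbers in 1/1000 em units, not data from a real font. KerningDemo is an illustrative name, not PD4ML code.

```java
import java.util.Map;

/**
 * Illustrative only: how a kerning table adjusts the width of a text run.
 * Widths and kerning values are hypothetical, in 1/1000 em units.
 */
final class KerningDemo {

    // hypothetical advance widths of a few glyphs
    static final Map<Character, Integer> WIDTHS = Map.of('A', 667, 'V', 667, 'T', 611, 'o', 500);

    // hypothetical kerning pairs: negative values pull the second glyph closer to the first
    static final Map<String, Integer> KERNING = Map.of("AV", -80, "To", -70);

    /** Total advance width of the text, optionally applying pair kerning. */
    static int textWidth(String text, boolean applyKerning) {
        int width = 0;
        for (int i = 0; i < text.length(); i++) {
            // unknown glyphs get a default width of 500 units
            width += WIDTHS.getOrDefault(text.charAt(i), 500);
            if (applyKerning && i > 0) {
                // adjustment for the pair formed by the previous and the current glyph
                width += KERNING.getOrDefault(text.substring(i - 1, i + 1), 0);
            }
        }
        return width;
    }

    public static void main(String[] args) {
        System.out.println("AV without kerning: " + textWidth("AV", false)); // 1334
        System.out.println("AV with kerning:    " + textWidth("AV", true));  // 1254
    }
}
```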
Creating high quality accessible PDF/UA documents
One of the major reasons to migrate to the PD4ML v4 API is its support for the PDF/UA (Universal Accessibility) standard.
The purpose of the PDF/UA standard is to let people with disabilities access PDF information efficiently, without assistance from others, and to receive the same value from the content as everyone else. PDF/UA does not add any new features to the PDF file format. Instead, it makes mandatory some aspects that are optional in a regular PDF.
Many of the requirements, such as font embedding, were implicitly fulfilled even by older versions of PD4ML. The major ones, however, are covered only starting from PD4ML v4, thanks to the new product architecture, which was designed with PDF/UA conformance in mind.
PDF/UA requirements
- A PDF/UA document must be tagged: it must include information about the nesting and relationships of different types of structure elements. This is probably the most tedious requirement to fulfill if you create PDF/UA from scratch with WYSIWYG tools, or post-process PDFs manually to achieve conformance. With PD4ML the task is trivial: the document structure comes from the input HTML and is transformed into PDF tags automatically.
- The document structure elements must have alternative descriptions. For that, PD4ML uses the content of the TITLE and ALT attributes of HTML tags. It is also a good idea to specify the LANG attribute if the content is not in English.
- A PDF/UA document must be supplied with XML metadata. PD4ML generates it for you from the known document and environment information. Make sure your input HTML document specifies <title>, <meta name="description" content="document subject">, <meta name="author" content="author name">, <meta name="keywords" content="comma-delimited list of keywords">, etc.
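PD4ML builds the XML metadata and the tags from the input HTML, so it can help to check that HTML before conversion. Below is a hypothetical pre-flight helper that reports which of the items listed above are missing. PdfUaPreflight is not part of PD4ML. It is a regex-based sanity check that assumes reasonably well-formed markup, not an HTML parser or a PDF/UA validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical pre-flight helper (not part of PD4ML): lists document-level items
 * recommended for PDF/UA that are missing from an HTML string. Regex-based sanity check only.
 */
final class PdfUaPreflight {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // each entry: human-readable name, pattern that must be present
    private static final String[][] CHECKS = {
        {"<title>", "<title>\\s*[^<\\s][^<]*</title>"},
        {"meta description", "<meta\\s[^>]*name\\s*=\\s*[\"']description[\"'][^>]*>"},
        {"meta author", "<meta\\s[^>]*name\\s*=\\s*[\"']author[\"'][^>]*>"},
        {"meta keywords", "<meta\\s[^>]*name\\s*=\\s*[\"']keywords[\"'][^>]*>"},
        {"lang attribute", "<(html|body)\\s[^>]*lang\\s*=\\s*[\"'][^\"']+[\"']"},
    };

    static List<String> missingItems(String html) {
        List<String> missing = new ArrayList<>();
        for (String[] check : CHECKS) {
            if (!Pattern.compile(check[1], FLAGS).matcher(html).find()) {
                missing.add(check[0]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>PDF/UA Test</title>"
                + "<meta name=\"author\" content=\"Max Mustermann\" /></head>"
                + "<body lang=\"de-DE\">Prüfung</body></html>";
        System.out.println("Missing: " + missingItems(html)); // Missing: [meta description, meta keywords]
    }
}
```

An empty result only means the HTML declares these items. Whether their content is meaningful is still up to you and the validator.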
Enabling PDF/UA output
First, you need to obtain a PD4ML UA license with a license code. See the documentation.
Without a license code (or with a code for a PD4ML license type that does not enable the PDF/UA feature), PD4ML still generates PDF/UA documents, but watermarks them with an “Evaluation” banner.
The next step is straightforward: invoke the writePDF() method with the PDFUA parameter.
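import com.pd4ml.Constants;
import com.pd4ml.PD4ML;

import java.awt.Desktop;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

public class PdfUaTest {
    public static void main(String[] args) throws Exception {
        PD4ML pd4ml = new PD4ML();

        // important! PDF/UA requires TTF fonts to be embedded
        // "arial,times,courier" is a comma-delimited list of font file name patterns to use
        pd4ml.useTTF("c:/Windows/Fonts/", "arial,times,courier");

        String html = "<html>\n"
                + "<head>\n"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n"
                + "<title>PDF/UA Test</title>\n"
                + "<meta name=\"description\" content=\"Document Subject\" />\n"
                + "<meta name=\"author\" content=\"Max Mustermann\" />\n"
                + "</head>\n"
                + "<body lang=\"de-DE\">\n"
                + "<div title=\"PDF/UA test content\">Prüfung auf PDF/UA-Konformität</div>\n"
                + "</body>\n"
                + "</html>";

        // encode as UTF-8 (matching the declared charset) rather than the platform default,
        // so the German umlauts survive on any system
        ByteArrayInputStream bais = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));

        // read and parse HTML
        pd4ml.readHTML(bais);

        File pdf = File.createTempFile("result", ".pdf");
        // render and write the result as PDF/UA; try-with-resources closes the file
        try (FileOutputStream fos = new FileOutputStream(pdf)) {
            pd4ml.writePDF(fos, Constants.PDFUA);
        }

        // open the just-generated PDF with a default PDF viewer
        Desktop.getDesktop().open(pdf);
    }
}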
If you run PD4ML as a standalone command line tool, you can force it to output PDF/UA with the -outformat pdfua parameter.
In both cases, do not forget to point PD4ML to a folder with indexed TTF/OTF fonts, using either the pd4ml.useTTF(fontDir) API call or the -ttf <ttf_fonts_dir> command line parameter.
PDF/UA validation
After a PDF document is generated, it is a good idea to validate it for PDF/UA conformance. PD4ML does its best, but some aspects cannot be fully automated (e.g. providing alternative descriptions).
There are many PDF and PDF/UA validator tools. Our choice is the free PDF Accessibility Checker (PAC 3).
The PAC 3 tool creates detailed validation reports. It lets you browse the document structure down to the problematic elements and highlights them in the rendered content. However, its error messages are not always clear. With some practice and accumulated validation experience, their meaning becomes less confusing.
Alternatively, you may use the Preflight tool included in Adobe Acrobat Pro, or various online validators (e.g. 3-HEIGHTS™ PDF VALIDATOR ONLINE TOOL). These tools are also great, but compared to PAC 3 they focus mostly on technical and syntax aspects of the PDF code rather than on accessibility issues. Those technical aspects should be our concern, not yours.
Most typical validation errors
- Despite the wording of the message, the real cause of this validation error is a missing <thead> section in a table. Add the missing <thead> section to suppress the message. PD4ML creates an implicit <thead> section if the leading table rows contain only <th> cells.
- The message is clear: in PDF/UA you are not allowed to skip heading levels, for example by nesting an <h4> heading directly under an <h2>. This happens quite often in real-life HTML documents.
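You can catch both errors in the input HTML before conversion. The following sketch flags headings that skip a level. It also flags tables that have neither a <thead> nor a first row made only of <th> cells; per the text above, that is the condition under which PD4ML creates an implicit header. HtmlUaLint is a hypothetical helper, not part of PD4ML or PAC 3. It relies on regular expressions, so it assumes non-nested tables and reasonably clean markup. The validator remains the authority.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of PD4ML or PAC 3): finds two common causes of
 * PDF/UA validation errors in input HTML. Regex-based; assumes non-nested tables.
 */
final class HtmlUaLint {

    private static final Pattern HEADING = Pattern.compile("<h([1-6])\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern TABLE =
            Pattern.compile("<table\\b.*?</table>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern ROW =
            Pattern.compile("<tr\\b.*?</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Reports headings that skip a level after the first heading, e.g. an <h4> right after an <h2>. */
    static List<String> skippedHeadingLevels(String html) {
        List<String> problems = new ArrayList<>();
        int previous = 0; // level of the last heading seen; 0 = none yet
        Matcher m = HEADING.matcher(html);
        while (m.find()) {
            int level = Integer.parseInt(m.group(1));
            if (previous > 0 && level > previous + 1) {
                problems.add("<h" + level + "> follows <h" + previous + ">");
            }
            previous = level;
        }
        return problems;
    }

    /** Counts tables with no <thead> whose first row is not made of <th> cells only. */
    static int tablesWithoutHeader(String html) {
        int count = 0;
        Matcher table = TABLE.matcher(html);
        while (table.find()) {
            String t = table.group().toLowerCase(Locale.ROOT);
            if (t.contains("<thead")) {
                continue; // explicit header section present
            }
            Matcher row = ROW.matcher(t);
            // a leading row of <th> cells only would give an implicit header
            boolean implicitHeader = row.find() && row.group().contains("<th") && !row.group().contains("<td");
            if (!implicitHeader) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<h1>Report</h1><h2>Scope</h2><h4>Details</h4>"
                + "<table><tr><td>a</td><td>b</td></tr></table>"
                + "<table><tr><th>Name</th><th>Value</th></tr><tr><td>x</td><td>1</td></tr></table>";
        System.out.println(skippedHeadingLevels(html)); // [<h4> follows <h2>]
        System.out.println(tablesWithoutHeader(html));  // 1
    }
}
```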
RTF Output Performance
The recent PD4ML builds (both v3 and v4) refactor the RTF output logic to improve conversion performance for bulky input HTML documents.
Now, even in extreme situations, RTF output performance is comparable to PDF output and very often better.
How to configure PDF fonts
PD4ML PRO, DMS and UA allow you to use the full Unicode character range of custom TTF/OTF fonts in PDF.
The way PD4ML implements TTF embedding may look complicated at first glance. In practice it is not, and there are reasons why TTF usage is not as transparent as in regular Java applications.
In Java you can easily instantiate a java.awt.Font object for any font face name, obtain its font metrics and use it for text output. For PDF generation, however, PD4ML needs access not only to the java.awt.Font object but also to the corresponding physical .ttf file, in order to parse it and extract a subset of the used glyphs. Unfortunately, Java offers no way to locate the TTF file behind a particular java.awt.Font object.
The most straightforward solution was to use a font face -> font file mapping file. PD4ML's default file name for it is pd4fonts.properties.
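The problem is one-directional. Java can create a Font from a file with Font.createFont(), but it cannot go back from a Font to its file. A face-to-file index built while loading the files bridges that gap. The sketch below shows the idea only; FontIndexSketch is a hypothetical name. It does not reproduce the format of pd4fonts.properties, which PD4ML generates itself. Note that Java may reject some OTF files (for example those with CFF outlines), and the sketch simply skips them.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative face name -> font file index (the idea behind pd4fonts.properties,
 * not its actual format). FontIndexSketch is a hypothetical name.
 */
final class FontIndexSketch {

    /** Maps the face name of every loadable TTF/OTF file in dir to that file. */
    static Map<String, File> indexFonts(File dir) {
        Map<String, File> index = new TreeMap<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return index; // not a directory, or not readable
        }
        for (File f : files) {
            String name = f.getName().toLowerCase(Locale.ROOT);
            if (!(name.endsWith(".ttf") || name.endsWith(".otf"))) {
                continue;
            }
            try {
                // file -> Font is possible; Font -> file is what Java lacks
                Font font = Font.createFont(Font.TRUETYPE_FONT, f);
                index.put(font.getFontName(Locale.ROOT), f);
            } catch (FontFormatException | IOException e) {
                // skip files Java cannot parse
            }
        }
        return index;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        indexFonts(dir).forEach((face, file) -> System.out.println(face + " -> " + file.getAbsolutePath()));
    }
}
```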
Below are the available options for creating and using the mapping file, or for avoiding its creation altogether.
Creation of pd4fonts.properties for a selected set of fonts
- create a fonts/ directory (e.g. /path/to/my/fonts/) and copy the desired TTF files into it.
- run pd4fonts.properties generation command
java -Xmx512m -jar pd4ml.jar -configure.fonts /path/to/my/fonts/
As a result it should produce /path/to/my/fonts/pd4fonts.properties.
Now you can refer to it from a Java application:
pd4ml.useTTF("/path/to/my/fonts/");
// or, equivalently:
pd4ml.useTTF("/path/to/my/fonts/pd4fonts.properties");
Creation of pd4fonts.properties for system fonts
In the example above, the pd4fonts.properties file is written to the same folder as the TTF files. If you run the command to index system fonts, it fails in most cases because it has no write permission for the system font folder.
A solution is to write pd4fonts.properties to another location:
- run pd4fonts.properties generation command
java -Xmx512m -jar pd4ml.jar -configure.fonts c:/windows/fonts/ c:/path/to/my/config
As a result it should produce c:/path/to/my/config/pd4fonts.properties with an internal reference to the original font folder c:/windows/fonts/.
Now you can refer to it from a Java application:
pd4ml.useTTF("c:/path/to/my/config");
// or, equivalently:
pd4ml.useTTF("c:/path/to/my/config/pd4fonts.properties");
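If a Java build or installation script has to (re)generate the font index, it can launch the indexing command with ProcessBuilder. This is a sketch with several assumptions: the class name FontConfigGenerator, the location of pd4ml.jar, and the java launcher being on the PATH. Only the `java -Xmx512m -jar pd4ml.jar -configure.fonts <fonts dir> [<config dir>]` form is taken from this page.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that runs the pd4fonts.properties generation step from Java.
 * The pd4ml.jar location and a "java" launcher on the PATH are assumptions.
 */
final class FontConfigGenerator {

    /** Builds the command line; configDir may be null to write next to the fonts. */
    static List<String> command(String pd4mlJar, String fontDir, String configDir) {
        List<String> cmd = new ArrayList<>(List.of(
                "java", "-Xmx512m", "-jar", pd4mlJar, "-configure.fonts", fontDir));
        if (configDir != null) {
            cmd.add(configDir); // separate, writable output folder (e.g. for system fonts)
        }
        return cmd;
    }

    /** Runs the command and reports whether pd4fonts.properties was produced. */
    static boolean generate(String pd4mlJar, String fontDir, String configDir)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(pd4mlJar, fontDir, configDir))
                .inheritIO() // show the tool's output on this console
                .start();
        int exit = p.waitFor();
        File out = new File(configDir != null ? configDir : fontDir, "pd4fonts.properties");
        return exit == 0 && out.isFile();
    }

    public static void main(String[] args) throws Exception {
        // system fonts are indexed into a separate, writable config folder
        boolean ok = generate("pd4ml.jar", "c:/windows/fonts/", "c:/path/to/my/config");
        System.out.println(ok ? "pd4fonts.properties created" : "font indexing failed");
    }
}
```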
Creation of pd4fonts.properties on the fly
Set the generateFontMappingFileIfMissing parameter of useTTF() to true:
pd4ml.useTTF("/path/to/my/fonts/", true);
Creation of in-memory font mapping on the fly
This method is typically used when no preconfigured fonts directory is available and using the system fonts directory seems a good option. An obvious drawback of this approach is the potentially long time needed to index a large number of system fonts.
PD4ML allows you to reduce the indexing effort by limiting the scope of used fonts: the fontFileNameFilter parameter can be set to a comma-delimited list of font name patterns:
pd4ml.useTTF("c:/windows/fonts/", "arial,times,courier");
The above code forces PD4ML to index only fonts whose names contain arial, times or courier.
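The filtering rule described above can be sketched as a simple predicate. This page does not say whether PD4ML matches the patterns case-insensitively or trims surrounding spaces, so both are assumptions of this sketch. FontFileFilter is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Sketch of a comma-delimited font name filter such as "arial,times,courier":
 * a name is accepted if it contains any of the patterns.
 * Case-insensitivity and trimming are assumptions, not documented PD4ML behavior.
 */
final class FontFileFilter {

    static boolean accepts(String fileName, String filter) {
        String name = fileName.toLowerCase(Locale.ROOT);
        return Arrays.stream(filter.split(","))
                .map(p -> p.trim().toLowerCase(Locale.ROOT))
                .filter(p -> !p.isEmpty()) // ignore empty entries such as in "arial,,times"
                .anyMatch(name::contains);
    }

    public static void main(String[] args) {
        String filter = "arial,times,courier";
        for (String f : new String[] {"arial.ttf", "ARIALBD.TTF", "times.ttf", "verdana.ttf"}) {
            System.out.println(f + " -> " + accepts(f, filter));
        }
    }
}
```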
Creation of a JAR file with fonts
As a rule, in Web application contexts you are not allowed to refer to local file system resources, which makes the above methods unusable. PD4ML's solution is to pack the fonts into a JAR file and deploy it with the Web application resources.
- create a fonts/ directory (e.g. /path/to/my/fonts/) and copy the desired TTF files into it.
- run pd4fonts.properties generation command
java -Xmx512m -jar pd4ml.jar -configure.fonts /path/to/my/fonts/
which produces /path/to/my/fonts/pd4fonts.properties
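- pack the folder into a JAR with fonts/ at the JAR root (-C makes the jar tool store the entries relative to /path/to/my/)
jar cvf fonts.jar -C /path/to/my fonts/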
After deployment, you can refer to it from a Java application:
pd4ml.useTTF("java:fonts/");
The “java:fonts/” URL addresses the fonts/ folder within the JAR.
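A missing or mis-packed fonts JAR is easy to overlook in a Web application. A start-up check can look for fonts/pd4fonts.properties as a class-path resource, assuming that java:fonts/ resolves against the class path as the text above implies. FontJarCheck is a hypothetical helper, not a PD4ML API.

```java
import java.net.URL;

/**
 * Hypothetical start-up check (not a PD4ML API): verifies that fonts/pd4fonts.properties
 * is visible on the class path, i.e. that the fonts JAR was deployed with fonts/ at its root.
 */
final class FontJarCheck {

    static URL findFontMapping(ClassLoader loader) {
        // class-path resource names use '/' separators and no leading slash
        return loader.getResource("fonts/pd4fonts.properties");
    }

    public static void main(String[] args) {
        URL mapping = findFontMapping(Thread.currentThread().getContextClassLoader());
        if (mapping == null) {
            System.err.println("fonts/pd4fonts.properties not found on the class path");
        } else {
            System.out.println("Font mapping found at " + mapping);
        }
    }
}
```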
Web fonts
The @font-face CSS at-rule adds a custom font to the list of available ones. The font can be loaded either from a remote server or from a font installed locally on the user's own computer.
The approach requires no API calls. All configuration is to be done in HTML/CSS sources.
@font-face {
    font-family: "Consolas";
    src: url("java:/html/rc/FiraMono-Regular.ttf") format("truetype");
}

@font-face {
    font-family: 'Open Sans';
    font-style: normal;
    font-weight: 400;
    src: url(https://fonts.gstatic.com/s/lato/v11/qIIYRU-oROkIk8vfvxw6QvesZW2xOQ-xsNqO47m55DA.woff) format('woff');
}
Font kerning
Kerning is the addition or reduction of space between two characters (glyphs) of a proportional font. As a rule, rendered text looks much more pleasing when kerning is applied.
Font kerning can be enabled with PD4ML API call:
pd4ml.applyKerning(true);
If you run PD4ML as a standalone command line tool, you can force it to apply kerning with the -kerning parameter.
CSS Transform Property Supported
PD4ML v4.0.3 implements the CSS transform property, e.g. to rotate, scale or skew HTML objects.
Here is a list of supported transform functions:
- translate(x, y)
- translatex(x)
- translatey(y)
- skew(x-angle,y-angle)
- skewx(x-angle)
- skewy(y-angle)
- scale(x,y)
- scalex(x)
- scaley(y)
- matrix(n,n,n,n,n,n)
- rotate(angle)
- rotatez(angle)
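For reference, each of these functions corresponds to a 2D affine matrix. The sketch below maps them onto java.awt.geom.AffineTransform to show the math. It is not PD4ML's implementation. It takes angles in degrees and ignores transform-origin: CSS transforms around the element's center by default, while this sketch transforms around (0, 0). CssTransforms is a hypothetical name.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;
import java.util.Locale;

/**
 * Illustrative mapping of the listed CSS transform functions to AffineTransform
 * (y axis pointing down, as in CSS). Angles in degrees; transform-origin ignored.
 */
final class CssTransforms {

    static AffineTransform of(String function, double... a) {
        switch (function.toLowerCase(Locale.ROOT)) {
            case "translate":  return AffineTransform.getTranslateInstance(a[0], a.length > 1 ? a[1] : 0);
            case "translatex": return AffineTransform.getTranslateInstance(a[0], 0);
            case "translatey": return AffineTransform.getTranslateInstance(0, a[0]);
            // scale(s) with one argument scales both axes
            case "scale":      return AffineTransform.getScaleInstance(a[0], a.length > 1 ? a[1] : a[0]);
            case "scalex":     return AffineTransform.getScaleInstance(a[0], 1);
            case "scaley":     return AffineTransform.getScaleInstance(1, a[0]);
            // skew(ax, ay): x' = x + tan(ax) * y, y' = tan(ay) * x + y
            case "skew":       return AffineTransform.getShearInstance(tan(a[0]), a.length > 1 ? tan(a[1]) : 0);
            case "skewx":      return AffineTransform.getShearInstance(tan(a[0]), 0);
            case "skewy":      return AffineTransform.getShearInstance(0, tan(a[0]));
            // positive angles turn clockwise on a y-down page, as in CSS
            case "rotate":
            case "rotatez":    return AffineTransform.getRotateInstance(Math.toRadians(a[0]));
            // matrix(a, b, c, d, e, f) uses the same argument order as this constructor
            case "matrix":     return new AffineTransform(a[0], a[1], a[2], a[3], a[4], a[5]);
            default: throw new IllegalArgumentException("unsupported transform: " + function);
        }
    }

    private static double tan(double degrees) {
        return Math.tan(Math.toRadians(degrees));
    }

    public static void main(String[] args) {
        Point2D p = of("rotate", 90).transform(new Point2D.Double(10, 0), null);
        System.out.printf("rotate(90deg) maps (10, 0) to (%.3f, %.3f)%n", p.getX(), p.getY());
    }
}
```

For example, of("rotate", 90) maps the point (10, 0) to approximately (0, 10), and of("matrix", 1, 0, 0, 1, 10, 20) is a pure translation by (10, 20).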