PD4ML: Multiple HTML source documents
If you need to produce a single PDF document from multiple HTML files, the
straightforward approach - merging the HTML files into a single one - will
not work, because the closing </body> or </html> tag of the first document
signals to the HTML parser that the rest of the input must be ignored.
A possible solution is to preprocess the merged document and remove
all occurrences of </body> and </html> before it is
passed to PD4ML. PD4ML's HTML normalizer
module should correctly fix the resulting inconsistency and auto-close the tags,
even though they have also been removed from the last chained document.
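
The preprocessing step described above could be sketched roughly as follows.
This is only an illustration in plain Java, not part of the PD4ML API: the
class name HtmlMerger and the method merge() are hypothetical, and the
regex-based matching is an assumption that does not account for closing tags
that appear inside comments or scripts.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PD4ML: concatenates HTML documents
// after stripping the closing tags that would stop the parser early.
final class HtmlMerger {

    // Matches </body> and </html> in any letter case, allowing
    // whitespace inside the tag (for example "</BODY >").
    private static final Pattern CLOSING_TAGS =
            Pattern.compile("</\\s*(?:body|html)\\s*>", Pattern.CASE_INSENSITIVE);

    private HtmlMerger() {
    }

    // Returns the documents joined in order, each followed by a newline,
    // with every </body> and </html> occurrence removed.
    static String merge(List<String> documents) {
        StringBuilder merged = new StringBuilder();
        for (String html : documents) {
            merged.append(CLOSING_TAGS.matcher(html).replaceAll(""))
                  .append('\n');
        }
        return merged.toString();
    }
}
```

The merged string can then be handed to PD4ML as a single HTML source. Note
that the opening <html> and <body> tags of the later documents are left in
place here; the sketch relies on the normalizer to cope with them.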
As the recommended solution, PD4ML defines special versions of the render() method that accept multiple HTML
documents for conversion into a single PDF: render(URL,...) and
render(StringReader,...). This approach gives a more predictable
result (there is no intermixing of CSS styles, which can happen when documents are merged),
but it has a limitation: each source HTML document starts a new PDF page; there is
no way to continue a half-blank page with the next document.