    # Bug report: duplicate font subset prefixes cause Adobe Acrobat to abort page rendering

    ## Summary

    PD4ML emits multiple **distinct** embedded font subset programs that share the
    **same 6-character subset prefix and BaseFont name** (e.g. CIPRXO+ArialMT
    applied to six different font programs in a single one-page document). This
    violates ISO 32000-1 §9.6.4, which requires the subset tag to be unique per
    subset.

    Lenient viewers (Apple Preview / Quartz, MuPDF) render the affected documents
    correctly. Adobe Acrobat does not: it renders content up to the first font
    whose name collides with an incompatible cached program, then silently stops
    drawing the remainder of the page. The text is present and extractable in the
    PDF; Acrobat simply does not paint it.

    ## Environment

    Taken from the PD4ML diagnostic comments embedded in the page content stream:

    - PD4ML version: **4.1.0** (the build dated 2026-02-22)
    - JDK version: 21.0.11
    - OS version: Mac OS X 26.5.1
    - File encoding: UTF-8
    - Output: single-page A4 PDF (595 x 842), PDF 1.7
    - Fonts embedded as CIDFontType2 (Type0 / Identity-H), TrueType FontFile2

    The document uses two watermarks injected as HTML, per the diagnostics:

    setWatermark: 345,595.0,1.0,1.0,90.0,2.0,true,true,1+   (injectHtml 502)
    setWatermark: 345,0.0,841.0,1.0,-90.0,2.0,true,true,1+  (injectHtml 503)
    

    ## Observed behaviour

    A certificate template (heading, a few centred paragraphs, a bulleted list,
    two italic lines) is converted to PDF.

    - **Apple Preview / Quartz:** renders the full page correctly.
    - **MuPDF (PDF rasterizer):** renders the full page correctly; all 17 text
      spans extract correctly.
    - **Adobe Acrobat:** renders only the heading and the following bold lines,
      then stops. Everything from the first italic line onward is blank. No error
      dialog is shown for this particular file. (An earlier variant of the same
      template did raise an Acrobat error dialog and truncated at the same point.)

    ## Root cause (verified from the PDF bytes)

    The page content stream is well-formed and balanced (19 q / 19 Q,
    17 BT / 17 ET) and contains all text. The defect is in the embedded fonts.
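
The balance check above can be reproduced with a few lines of Java. This is a minimal sketch, not the tool used for this report: it assumes the content stream has already been decompressed into a string, it only compares operator counts (it does not verify nesting order), and it ignores `%` comments and inline images. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative helper: counts operator tokens in a decoded PDF content stream. */
class ContentStreamBalance {

    /** Counts whitespace-delimited tokens that sit outside (literal strings). */
    static Map<String, Integer> countTokens(String stream) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder token = new StringBuilder();
        int depth = 0; // nesting depth of literal-string parentheses
        for (int i = 0; i < stream.length(); i++) {
            char c = stream.charAt(i);
            if (depth > 0) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                }
                continue;
            }
            if (c == '(') {
                flush(token, counts);
                depth = 1;
            } else if (Character.isWhitespace(c)) {
                flush(token, counts);
            } else {
                token.append(c);
            }
        }
        flush(token, counts);
        return counts;
    }

    private static void flush(StringBuilder token, Map<String, Integer> counts) {
        if (token.length() > 0) {
            counts.merge(token.toString(), 1, Integer::sum);
            token.setLength(0);
        }
    }

    /** True when q/Q and BT/ET occur the same number of times. */
    static boolean isBalanced(String stream) {
        Map<String, Integer> c = countTokens(stream);
        int q = c.getOrDefault("q", 0);
        int bigQ = c.getOrDefault("Q", 0);
        int bt = c.getOrDefault("BT", 0);
        int et = c.getOrDefault("ET", 0);
        return q == bigQ && bt == et;
    }

    public static void main(String[] args) {
        // Tiny hand-written stream, not taken from the failing PDF.
        String sample = "q BT /f1 12 Tf (Hello) Tj ET Q";
        System.out.println(isBalanced(sample));
    }
}
```

Skipping literal strings matters because text such as `(a q b)` would otherwise be miscounted as a `q` operator.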

    Ten font resources (/f1 through /f10) are present. Each is a **distinct font
    object** with its own **FontFile2 program** (distinct object numbers and,
    except for objects 55 and 18, distinct byte lengths), yet they collide on
    BaseFont name:

    | BaseFont | Used by | FontFile2 objects (sizes in bytes) |
    |---|---|---|
    | CIPRXO+ArialMT | f1, f5, f6, f7, f8, f10 | 55 (19768), 18 (19768), 15 (28368), 43 (31340), 39 (21748), 59 (25148) |
    | EGOKFG+Arial-BoldMT | f2, f3 | 32 (24356), 34 (30624) |
    | QDIRYT+Arial-ItalicMT | f4, f9 | 40 (16436), 31 (17420) |

    So a single subset prefix (CIPRXO) is reused for six different subset
    programs, EGOKFG for two, and QDIRYT for two. Per ISO 32000-1 §9.6.4 the
    6-letter tag must uniquely identify the subset; here it does not.
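
The collision test behind the table can be expressed as a small grouping step. The sketch below is illustrative only: it assumes the BaseFont names and FontFile2 object numbers have already been extracted from the PDF with some PDF tool, and it hard-codes the values from the table. `FontEntry`, `findCollisions` and `sampleFromReport` are hypothetical names, not part of PD4ML or any PDF library.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative check: flags BaseFont names backed by more than one FontFile2 object. */
class SubsetTagCollisions {

    /** One embedded font: its BaseFont name and the object number of its FontFile2 stream. */
    record FontEntry(String baseFont, int fontFileObj) {}

    /** Returns BaseFont -> FontFile2 object numbers, keeping only names with 2+ distinct programs. */
    static Map<String, Set<Integer>> findCollisions(List<FontEntry> fonts) {
        Map<String, Set<Integer>> byName = new LinkedHashMap<>();
        for (FontEntry f : fonts) {
            byName.computeIfAbsent(f.baseFont(), k -> new LinkedHashSet<>()).add(f.fontFileObj());
        }
        // A name used by several resources that share ONE program is not a collision.
        byName.values().removeIf(programs -> programs.size() < 2);
        return byName;
    }

    /** The ten fonts from the table in this report. */
    static List<FontEntry> sampleFromReport() {
        return List.of(
            new FontEntry("CIPRXO+ArialMT", 55),
            new FontEntry("CIPRXO+ArialMT", 18),
            new FontEntry("CIPRXO+ArialMT", 15),
            new FontEntry("CIPRXO+ArialMT", 43),
            new FontEntry("CIPRXO+ArialMT", 39),
            new FontEntry("CIPRXO+ArialMT", 59),
            new FontEntry("EGOKFG+Arial-BoldMT", 32),
            new FontEntry("EGOKFG+Arial-BoldMT", 34),
            new FontEntry("QDIRYT+Arial-ItalicMT", 40),
            new FontEntry("QDIRYT+Arial-ItalicMT", 31));
    }

    public static void main(String[] args) {
        findCollisions(sampleFromReport())
            .forEach((name, objs) -> System.out.println(name + " -> " + objs));
    }
}
```

A document that follows §9.6.4 would yield an empty map here.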

    Acrobat appears to key its embedded-font cache on the BaseFont name. When a
    second resource references a name it has already cached, Acrobat seems to
    reuse the first program, whose CIDToGIDMap and glyph set do not match the
    second resource, and then aborts rendering of the remaining content on the
    page.
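
To make the hypothesis concrete, here is a toy model of a cache keyed on BaseFont name. It is purely an illustration of the suspected failure mode: Acrobat's real implementation is not known, and `NameKeyedFontCache` and `FontProgram` are invented for this sketch. The object numbers and lengths are the two italic programs from the table.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a font cache that uses the BaseFont name as its only key. */
class NameKeyedFontCache {

    /** Stand-in for an embedded font program: FontFile2 object number and length. */
    record FontProgram(int objectNumber, int lengthBytes) {}

    private final Map<String, FontProgram> cache = new HashMap<>();

    /** Returns the program served for this name: the first one registered wins. */
    FontProgram resolve(String baseFont, FontProgram embedded) {
        return cache.computeIfAbsent(baseFont, k -> embedded);
    }

    public static void main(String[] args) {
        NameKeyedFontCache cache = new NameKeyedFontCache();
        FontProgram first = new FontProgram(40, 16436);
        FontProgram second = new FontProgram(31, 17420);
        cache.resolve("QDIRYT+Arial-ItalicMT", first);
        // The second resource asks for its own program but gets the cached one back.
        FontProgram served = cache.resolve("QDIRYT+Arial-ItalicMT", second);
        System.out.println(served.equals(second)); // false: wrong program served
    }
}
```

In this model the second resource's glyph IDs are looked up in a program that was subset for different glyphs, which is consistent with the italic text failing to paint.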

    The truncation begins, empirically, at the first use of the italic face
    (QDIRYT+Arial-ItalicMT).

    ## Likely trigger

    The collision correlates with the **watermark / injectHtml** usage. Those
    appear to run as separate render passes, each of which independently subsets
    the fonts it needs while reusing the same per-face prefix. The result is
    multiple distinct subset programs sharing one tag. (PD4ML 4.0.9fx5 / 3.11.4fx5
    fixed a related “possible name clash of embedded fonts by PDF document merge”;
    the watermark-overlay path may not be covered by that fix.)

    This is **reproducible on the current 4.1.0 build (2026-02-22)**.

    ## Confirmation of the cause

    Post-processing the generated PDF to assign a **unique** subset prefix to each
    embedded font (rewriting /BaseFont on the Type0 dictionary and its descendant
    CIDFont, and /FontName on the FontDescriptor) makes Acrobat render the entire
    page correctly. This was verified two ways:

    1. Renaming **all** subsets to unique prefixes — full render in Acrobat.
    2. Renaming **only** the two italic subsets to unique prefixes, leaving the
    CIPRXO (x6) and EGOKFG (x2) collisions in place — also a full render in
    Acrobat. This isolates the italic-face collision as the fatal one in this
    document, although the underlying defect (non-unique subset tags) applies to
    all three groups.

    No change to the font programs themselves is required; only the names need to
    be made unique.
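
The renaming step only needs a source of unique six-letter tags. The sketch below shows one way to compute the new names; it is not the exact script used for the verification above. It produces the strings that would be written to /BaseFont and /FontName, while the actual PDF rewriting (done with a separate PDF tool) is left out. `SubsetTagRenamer` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper: computes a unique subset-tagged name per embedded font program. */
class SubsetTagRenamer {

    /** Encodes a non-negative counter as six uppercase letters (base 26, A = 0). */
    static String tagFor(int n) {
        char[] tag = new char[6];
        for (int i = 5; i >= 0; i--) {
            tag[i] = (char) ('A' + n % 26);
            n /= 26;
        }
        return new String(tag);
    }

    /** Strips an existing "XXXXXX+" subset prefix, if present. */
    static String stripTag(String baseFont) {
        return baseFont.matches("[A-Z]{6}\\+.+") ? baseFont.substring(7) : baseFont;
    }

    /** One input name per embedded font program; each output gets its own tag. */
    static List<String> assignUniqueTags(List<String> baseFonts) {
        List<String> renamed = new ArrayList<>();
        int counter = 0;
        for (String name : baseFonts) {
            renamed.add(tagFor(counter++) + "+" + stripTag(name));
        }
        return renamed;
    }

    public static void main(String[] args) {
        // The two colliding italic subsets from this report.
        List<String> italic = List.of("QDIRYT+Arial-ItalicMT", "QDIRYT+Arial-ItalicMT");
        System.out.println(assignUniqueTags(italic));
    }
}
```

A sequential counter is used here only because it makes uniqueness obvious; the specification does not prescribe how the letters are chosen.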

    ## Expected behaviour

    Every distinct embedded font subset should receive a unique 6-character subset
    prefix (and correspondingly unique /BaseFont and /FontName), per
    ISO 32000-1 §9.6.4, regardless of how many render passes (including watermark /
    injectHtml passes) contribute fonts to the page.
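
On the generating side, one way to get this behaviour is a single tag registry shared by every render pass of a document. PD4ML's internal subsetting code is not visible from the outside, so the following is only a sketch of the expected behaviour; `SubsetTagRegistry`, `nextTag` and `qualify` are hypothetical names and do not correspond to any PD4ML API.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Illustrative per-document registry that never issues the same subset tag twice. */
class SubsetTagRegistry {

    private final Set<String> issued = new HashSet<>();
    private final Random random;

    SubsetTagRegistry(long seed) {
        this.random = new Random(seed);
    }

    /** Returns a six-uppercase-letter tag not yet issued by this registry. */
    synchronized String nextTag() {
        String tag;
        do {
            StringBuilder sb = new StringBuilder(6);
            for (int i = 0; i < 6; i++) {
                sb.append((char) ('A' + random.nextInt(26)));
            }
            tag = sb.toString();
        } while (!issued.add(tag)); // retry on the rare duplicate
        return tag;
    }

    /** Builds a subset name such as "ABCDEF+ArialMT" for a new font subset. */
    String qualify(String postScriptName) {
        return nextTag() + "+" + postScriptName;
    }

    public static void main(String[] args) {
        SubsetTagRegistry registry = new SubsetTagRegistry(42L);
        // Main pass and a watermark pass both subset ArialMT; the names differ.
        System.out.println(registry.qualify("ArialMT"));
        System.out.println(registry.qualify("ArialMT"));
    }
}
```

The point of the design is that the main pass and each watermark / injectHtml pass draw from the same registry instead of deriving the prefix from the face name alone.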

    ## Reproduction

    Minimal repro: a single-page HTML template containing bold heading text, normal
    body text, and at least two italic text runs, converted with two injected HTML
    watermarks. Open the result in Adobe Acrobat (not Preview).

    A sample failing PDF is attached. Its /Font resources exhibit the collisions
    listed above.

    ## Severity / impact

    High for any workflow whose output is consumed in Adobe Acrobat. The document
    looks correct in Preview and most browsers, so the defect is easy to ship
    unnoticed and only surfaces for end users opening the file in Acrobat, where the
    bulk of the page content silently disappears.
