Forum Replies Created

Viewing 2 posts - 4,306 through 4,307 (of 4,307 total)
  • in reply to: Technical questions / Troubleshooting
    #41003

    We have a use case where the right side of the text in a table is getting clipped. It appears that PD4ML cannot determine the width of each TTF font character. We have evaluated several thousand TTF fonts that ship with the Linux distro (google_noto_sans*.ttf and google_noto_serif*.ttf, to name a few), but have not found an acceptable set (normal, bold, italic, bold italic) that can be used. It seems that only a few condensed fonts render correctly, and those are not acceptable for business documents. We have been using PD4ML PRO v3 with embedded TTFs successfully for many years, and only recently ran into this use case. We upgraded to v4 hoping it would resolve the issue, but have not had any success.

    Also, I understand that OTF files should work as well; however, we have not been able to get them to work. Converting the OTFs to TTFs is not an option because it would likely violate the open source license.

    Attached is an example HTML file.

    in reply to: Technical questions / Troubleshooting
    #41137

    # Bug report: duplicate font subset prefixes cause Adobe Acrobat to abort page rendering

    ## Summary

    PD4ML emits multiple **distinct** embedded font subset programs that share the
    **same 6-character subset prefix and BaseFont name** (e.g. CIPRXO+ArialMT
    applied to six different font programs in a single one-page document). This
    violates ISO 32000-1 §9.6.4, which requires the subset tag to be unique per
    subset.

    Lenient viewers (Apple Preview / Quartz, MuPDF) render the affected documents
    correctly. Adobe Acrobat does not: it renders content up to the first font
    whose name collides with an incompatible cached program, then silently stops
    drawing the remainder of the page. The text is present and extractable in the
    PDF; Acrobat simply does not paint it.

    ## Environment

    Taken from the PD4ML diagnostic comments embedded in the page content stream:

    - PD4ML version: **4.1.0** (the build dated 2026-02-22)
    - JDK version: 21.0.11
    - OS version: Mac OS X 26.5.1
    - File encoding: UTF-8
    - Output: single-page A4 PDF (595 x 842), PDF 1.7
    - Fonts embedded as CIDFontType2 (Type0 / Identity-H), TrueType FontFile2

    The document uses two watermarks injected as HTML, per the diagnostics:

    setWatermark: 345,595.0,1.0,1.0,90.0,2.0,true,true,1+   (injectHtml 502)
    setWatermark: 345,0.0,841.0,1.0,-90.0,2.0,true,true,1+  (injectHtml 503)
    

    ## Observed behaviour

    A certificate template (heading, a few centred paragraphs, a bulleted list,
    two italic lines) is converted to PDF.

    - **Apple Preview / Quartz:** renders the full page correctly.
    - **MuPDF (pdf rasterizer):** renders the full page correctly; all 17 text
    spans extract correctly.
    - **Adobe Acrobat:** renders only the heading and the following bold lines,
    then stops. Everything from the first italic line onward is blank. No error
    dialog is shown for this particular file. (An earlier variant of the same
    template did raise an Acrobat error dialog and truncated at the same point.)

    ## Root cause (verified from the PDF bytes)

    The page content stream is well-formed and balanced (19 q / 19 Q,
    17 BT / 17 ET) and contains all text. The defect is in the embedded fonts.
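
The balance check mentioned above can be approximated with a small tokenizer over the decompressed content stream. This is a minimal sketch, not the tool used for this report: `count_operators` is a hypothetical helper name, it assumes the stream has already been inflated, it does not handle inline images (BI / ID / EI), and it counts keywords such as `true` as operators. Equal counts are a necessary condition for balance, not proof of correct nesting.

```python
import re
from collections import Counter

WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"
NUMBER = re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)")


def count_operators(data: bytes) -> Counter:
    """Count operator tokens in a decompressed PDF content stream (naive)."""
    counts = Counter()
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("%"):  # comment: skip to end of line
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif c == ord("("):  # literal string, may nest and contain escapes
            depth, i = 1, i + 1
            while i < n and depth:
                if data[i] == ord("\\"):
                    i += 1  # skip the escaped byte as well
                elif data[i] == ord("("):
                    depth += 1
                elif data[i] == ord(")"):
                    depth -= 1
                i += 1
        elif data[i:i + 2] == b"<<":  # dictionary open, not a hex string
            i += 2
        elif c == ord("<"):  # hex string: skip to the closing '>'
            end = data.find(b">", i)
            i = n if end < 0 else end + 1
        elif c == ord("/"):  # name object
            i += 1
            while i < n and data[i] not in WHITESPACE and data[i] not in DELIMITERS:
                i += 1
        elif c in DELIMITERS:  # [ ] { } and the bytes of '>>'
            i += 1
        else:  # number or operator
            start = i
            while i < n and data[i] not in WHITESPACE and data[i] not in DELIMITERS:
                i += 1
            token = data[start:i]
            if not NUMBER.fullmatch(token):
                counts[token.decode("latin-1")] += 1
    return counts


# Made-up sample stream, not taken from the failing PDF.
sample = b"q BT /f1 12 Tf (Hello q BT \\) world) Tj ET Q % q comment\nq <51> Tj Q"
ops = count_operators(sample)
balanced = ops["q"] == ops["Q"] and ops["BT"] == ops["ET"]
```

Note that the `q` and `BT` inside the string literal and the comment are not counted, which is why a plain substring count would give misleading numbers.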

    Ten font resources (/f1../f10) are present. Each is a **distinct font
    object** with its own **FontFile2 program** (distinct object numbers and,
    apart from objects 55 and 18, distinct byte lengths), yet they collide on
    BaseFont name:

    | BaseFont | Used by | FontFile2 objects (sizes in bytes) |
    |---|---|---|
    | CIPRXO+ArialMT | f1, f5, f6, f7, f8, f10 | 55 (19768), 18 (19768), 15 (28368), 43 (31340), 39 (21748), 59 (25148) |
    | EGOKFG+Arial-BoldMT | f2, f3 | 32 (24356), 34 (30624) |
    | QDIRYT+Arial-ItalicMT | f4, f9 | 40 (16436), 31 (17420) |

    So a single subset prefix (CIPRXO) is reused for six different subset
    programs, EGOKFG for two, and QDIRYT for two. Per ISO 32000-1 §9.6.4 the
    6-letter tag must uniquely identify the subset; here it does not.
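
The collision check can be sketched in a few lines of Python. This is an illustration under assumptions: `find_tag_collisions` and `REPORTED_FONTS` are hypothetical names, and the input is simply the BaseFont / FontFile2 object pairs transcribed from the table above, not data read from the PDF by this code.

```python
import re
from collections import defaultdict

# A subset name is six uppercase letters, a plus sign, then the font name.
SUBSET_NAME = re.compile(r"([A-Z]{6})\+(.+)")


def find_tag_collisions(fonts):
    """Return {tag: [FontFile2 object numbers]} for tags shared by
    more than one distinct font program.

    fonts: iterable of (base_font, font_file_object_number) pairs.
    """
    programs_by_tag = defaultdict(set)
    for base_font, obj_num in fonts:
        match = SUBSET_NAME.fullmatch(base_font)
        if match:  # ignore fonts that are not subsets
            programs_by_tag[match.group(1)].add(obj_num)
    return {
        tag: sorted(objs)
        for tag, objs in programs_by_tag.items()
        if len(objs) > 1
    }


# Values transcribed from the table in this report.
REPORTED_FONTS = [
    ("CIPRXO+ArialMT", 55), ("CIPRXO+ArialMT", 18), ("CIPRXO+ArialMT", 15),
    ("CIPRXO+ArialMT", 43), ("CIPRXO+ArialMT", 39), ("CIPRXO+ArialMT", 59),
    ("EGOKFG+Arial-BoldMT", 32), ("EGOKFG+Arial-BoldMT", 34),
    ("QDIRYT+Arial-ItalicMT", 40), ("QDIRYT+Arial-ItalicMT", 31),
]

collisions = find_tag_collisions(REPORTED_FONTS)
for tag, objs in collisions.items():
    print(f"{tag}: {len(objs)} distinct font programs -> objects {objs}")
```

A conforming file would produce an empty dictionary here, since each tag would map to exactly one font program.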

    Acrobat appears to key its embedded-font cache on the BaseFont name. When a
    second resource references a name it has already cached, Acrobat reuses the
    first program (with a different CIDToGIDMap and glyph set) and aborts rendering
    of the remaining content on the page.

    The truncation begins, empirically, at the first use of the italic face
    (QDIRYT+Arial-ItalicMT).

    ## Likely trigger

    The collision correlates with the **watermark / injectHtml** usage. Those
    appear to run as separate render passes, each of which independently subsets
    the fonts it needs while reusing the same per-face prefix. The result is
    multiple distinct subset programs sharing one tag. (PD4ML 4.0.9fx5 / 3.11.4fx5
    fixed a related “possible name clash of embedded fonts by PDF document merge”;
    the watermark-overlay path may not be covered by that fix.)

    This is **reproducible on the current 4.1.0 build (2026-02-22)**.

    ## Confirmation of the cause

    Post-processing the generated PDF to assign a **unique** subset prefix to each
    embedded font (rewriting /BaseFont on the Type0 dictionary and its descendant
    CIDFont, and /FontName on the FontDescriptor) makes Acrobat render the entire
    page correctly. This was verified two ways:

    1. Renaming **all** subsets to unique prefixes: full render in Acrobat.
    2. Renaming **only** the two italic subsets to unique prefixes, leaving the
    CIPRXO (x6) and EGOKFG (x2) collisions in place: also a full render in
    Acrobat. This isolates the italic-face collision as the fatal one in this
    document, although the underlying defect (non-unique subset tags) applies to
    all three groups.

    No change to the font programs themselves is required; only the names need to
    be made unique.
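
The renaming step can be sketched as follows. This is not the script used for the verification above, only a minimal illustration of the tag assignment: `assign_unique_tags` is a hypothetical helper, keying fonts by FontFile2 object number is an arbitrary choice for the example, and the replacement tags (`AAAAAA`, `AAAAAB`, ...) are arbitrary. Writing the names back into the PDF is not shown; as noted above, each new name has to go into /BaseFont on the Type0 dictionary and its descendant CIDFont, and into /FontName on the FontDescriptor.

```python
import itertools
import re
import string


def assign_unique_tags(fonts):
    """Map each font key to a BaseFont name whose subset tag is unique.

    fonts: iterable of (key, base_font) pairs. The first font seen with a
    given tag keeps it; later fonts with the same tag get a fresh one.
    """
    used = set()
    # Lazily enumerate AAAAAA, AAAAAB, ... as replacement candidates.
    candidates = (
        "".join(letters)
        for letters in itertools.product(string.ascii_uppercase, repeat=6)
    )
    renamed = {}
    for key, base_font in fonts:
        tag, plus, face = base_font.partition("+")
        if not plus or not re.fullmatch(r"[A-Z]{6}", tag):
            renamed[key] = base_font  # not a subset name, leave untouched
            continue
        if tag in used:
            tag = next(t for t in candidates if t not in used)
        used.add(tag)
        renamed[key] = f"{tag}+{face}"
    return renamed


# (FontFile2 object number, BaseFont) pairs transcribed from the table.
reported = [
    (55, "CIPRXO+ArialMT"), (18, "CIPRXO+ArialMT"), (15, "CIPRXO+ArialMT"),
    (43, "CIPRXO+ArialMT"), (39, "CIPRXO+ArialMT"), (59, "CIPRXO+ArialMT"),
    (32, "EGOKFG+Arial-BoldMT"), (34, "EGOKFG+Arial-BoldMT"),
    (40, "QDIRYT+Arial-ItalicMT"), (31, "QDIRYT+Arial-ItalicMT"),
]

new_names = assign_unique_tags(reported)
```

Keeping the first occurrence of each tag unchanged minimizes the number of objects that have to be rewritten; any scheme works as long as no two distinct subsets end up with the same tag.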

    ## Expected behaviour

    Every distinct embedded font subset should receive a unique 6-character subset
    prefix (and correspondingly unique /BaseFont and /FontName), per
    ISO 32000-1 §9.6.4, regardless of how many render passes (including watermark /
    injectHtml passes) contribute fonts to the page.

    ## Reproduction

    Minimal repro: a single-page HTML template containing bold heading text, normal
    body text, and at least two italic text runs, converted with two injected HTML
    watermarks. Open the result in Adobe Acrobat (not Preview).

    A sample failing PDF is attached. Its /Font resources exhibit the collisions
    listed above.

    ## Severity / impact

    High for any workflow whose output is consumed in Adobe Acrobat. The document
    looks correct in Preview and most browsers, so the defect is easy to ship
    unnoticed and only surfaces for end users opening the file in Acrobat, where the
    bulk of the page content silently disappears.
