#29620

There is a minor difference in the diagnostics data of the two PDFs (embedded in the first PDF content object):

[language=java:2h90ify6]% PD4ML version: 394 Pro DMS
% JDK version: 1.7.0_51
% OS version: Linux 3.2.0-76-generic-pae
% File encoding: UTF-8
% insets: java.awt.Insets[top=0,left=0,bottom=0,right=0]
% size: java.awt.Dimension[width=595,height=842]
% ttf: java:
% pro version[/language:2h90ify6]

[language=java:2h90ify6]% PD4ML version: 394 Pro DMS
% JDK version: 1.7.0_71
% OS version: Windows 7 6.1
% File encoding: UTF-8
% insets: java.awt.Insets[top=0,left=0,bottom=0,right=0]
% size: java.awt.Dimension[width=595,height=842]
% ttf: java:
% pro version[/language:2h90ify6]

This difference also explains the shift in the object offset table (XREF).
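
As a rough illustration of why the offsets move: the diagnostics block is written as plain comment lines, so any difference in its length pushes every later object by the same number of bytes. The sketch below is only arithmetic on the `JDK version` and `OS version` lines quoted above. The class and method names are hypothetical (not part of PD4ML), and it assumes UTF-8 text and the same single-byte line terminator in both files.

```java
import java.nio.charset.StandardCharsets;

public class DiagnosticsDelta {

    // Byte length of one comment line as stored in the file,
    // assuming UTF-8 and a single '\n' terminator (an assumption).
    static int lineBytes(String line) {
        return (line + "\n").getBytes(StandardCharsets.UTF_8).length;
    }

    // Difference in total byte length between two diagnostics blocks.
    static int byteDelta(String[] first, String[] second) {
        int a = 0;
        int b = 0;
        for (String s : first) {
            a += lineBytes(s);
        }
        for (String s : second) {
            b += lineBytes(s);
        }
        return a - b;
    }

    public static void main(String[] args) {
        // Only the lines that differ between the two dumps are listed here.
        String[] linux = {
            "% JDK version: 1.7.0_51",
            "% OS version: Linux 3.2.0-76-generic-pae"
        };
        String[] windows = {
            "% JDK version: 1.7.0_71",
            "% OS version: Windows 7 6.1"
        };
        System.out.println("Offset shift in bytes: " + byteDelta(linux, windows));
    }
}
```

The JDK lines have the same length, so under these assumptions the whole shift comes from the OS version string.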

The documents should also differ in their DOCIDs, the unique document identifiers.

A better way to compare the documents would probably be to use the PD4Document API to collect metadata (e.g. the number of pages) and to extract the content of particular pages.

[language=java:2h90ify6]// "password" is a variable holding the document password, defined elsewhere
PD4Document doc1 = new PD4Document(new URL("file:o:/work/testdata/testsuite/pdf/doc5enc.pdf"), password);
int pagenum = doc1.getNumberOfPages();
String content = doc1.getPageContent(1);
String outlines = doc1.dumpOutlines();
String author = doc1.getAuthor();[/language:2h90ify6]
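
Once those values are collected for both documents, the comparison itself is plain Java. The following is a minimal sketch, not part of PD4ML: `Snapshot` and `diff` are hypothetical names, and the snapshot is assumed to be filled from the `PD4Document` calls shown above (page count, per-page content, outlines dump, author). Pages are reported by their 1-based position in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class PdfSnapshotCompare {

    // Plain holder for values read from one document (hypothetical type).
    static class Snapshot {
        final int pageCount;
        final List<String> pages;
        final String outlines;
        final String author;

        Snapshot(int pageCount, List<String> pages, String outlines, String author) {
            this.pageCount = pageCount;
            this.pages = pages;
            this.outlines = outlines;
            this.author = author;
        }
    }

    // Returns a human-readable list of differences; empty if none were found.
    static List<String> diff(Snapshot a, Snapshot b) {
        List<String> out = new ArrayList<String>();
        if (a.pageCount != b.pageCount) {
            out.add("page count: " + a.pageCount + " vs " + b.pageCount);
        }
        if (!Objects.equals(a.author, b.author)) {
            out.add("author differs");
        }
        if (!Objects.equals(a.outlines, b.outlines)) {
            out.add("outlines differ");
        }
        // Compare only the pages both documents have.
        int common = Math.min(a.pages.size(), b.pages.size());
        for (int i = 0; i < common; i++) {
            if (!Objects.equals(a.pages.get(i), b.pages.get(i))) {
                out.add("page " + (i + 1) + " content differs");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative values only; real ones would come from the two PDFs.
        Snapshot linux = new Snapshot(2, Arrays.asList("intro", "body"), "", "tester");
        Snapshot windows = new Snapshot(2, Arrays.asList("intro", "body!"), "", "tester");
        for (String line : diff(linux, windows)) {
            System.out.println(line);
        }
    }
}
```

Comparing at this level ignores the diagnostics comments, XREF offsets and DOCIDs, which are expected to vary between runs.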