Images with # in URL Don’t Render
- This topic has 4 replies, 2 voices, and was last updated Dec 04, 2013
05:44:21 by Thunderforge.
October 20, 2013 at 18:32#26875
I’ve discovered that images with a # symbol in the URL don’t render in PD4ML, while they do render in web browsers. This happens on both Mac OS X and Windows. You can test this with the following:
1. Place an image in a folder with a # sign in it. For instance, “~/Desktop/#Test/Test.jpg” (Mac OS X) or “Users/[Name]/Desktop/#Test/Test.jpg” (Windows).
2. Run the following Java code:
import java.io.File;
import java.io.FileOutputStream;
import java.io.StringReader;
import org.zefer.pd4ml.PD4ML;

PD4ML pd4ml = new PD4ML();
String portraitPath = System.getProperty("user.home") + File.separator + "Desktop" + File.separator + "#Test" + File.separator + "Test.jpg";
portraitPath = new File(portraitPath).toURI().toString();
String outputHTML = "<html><body><img src=\"" + portraitPath + "\" /></body></html>";
FileOutputStream fos = new FileOutputStream(new File(System.getProperty("user.home") + "/Desktop/Test.pdf"));
pd4ml.render(new StringReader(outputHTML), fos);
fos.close();
The result will be an empty PDF. If you change the directory from “#Test” to “Test”, both in the code and in the file system, the portrait will appear properly.
October 22, 2013 at 11:09#29398
PD4ML loads document resources (even from a local file system) using the URL/URLConnection Java classes. If an image is referenced using a file system path, the path is implicitly converted to a URL by adding a “file:” protocol prefix.
The ‘#’ symbol has a special meaning in URLs. In your particular case, the ‘#’ and the path info following it are simply ignored.
It is important to understand that, from the perspective of the HTML specification, your HTML code is erroneous. The specification explicitly requires a URL (not just a path) as the SRC attribute value:
valid non-empty URL potentially surrounded by spaces referencing a non-interactive, optionally animated, image resource that is neither paged nor scripted.
The major web browsers also accept paths, to work around a “popular” mistake by web authors. (However, with MS FrontPage it is even impossible to embed an image from a directory with ‘#’ in its name.)
We could implement support for ‘#’ in image paths, but renaming the directory to a web-friendly name is probably a better idea.
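To illustrate the point about ‘#’ being treated as a fragment delimiter, here is a minimal sketch of how java.net.URL splits such a string. It is not PD4ML's actual loading code; the class name FragmentDemo, its helper methods, and the /tmp paths are made up for the example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class FragmentDemo {
    // Path component as java.net.URL sees it (not percent-decoded).
    static String pathOf(String spec) throws MalformedURLException {
        URL url = URI.create(spec).toURL();
        return url.getPath();
    }

    // Fragment ("ref") component, or null when the URL has none.
    static String refOf(String spec) throws MalformedURLException {
        URL url = URI.create(spec).toURL();
        return url.getRef();
    }

    public static void main(String[] args) throws MalformedURLException {
        String raw = "file:/tmp/#Test/Test.jpg";       // hypothetical path, literal '#'
        String escaped = "file:/tmp/%23Test/Test.jpg"; // same path with '#' escaped
        System.out.println(pathOf(raw) + " | " + refOf(raw));
        System.out.println(pathOf(escaped) + " | " + refOf(escaped));
    }
}
```

With the literal ‘#’, everything after it ends up in the fragment and the path stops at the parent directory, which is consistent with the image not being found. With %23 the whole string stays in the path and there is no fragment.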
October 23, 2013 at 01:26#29399
I understand that a # in a URL denotes an anchor. However, the File.toURI() method escapes the # symbol and renders it as %23 (e.g. file:/Users/Will/Desktop/%23Test/Test.jpg). Because it is escaped, shouldn’t it be interpreted as a literal #, rather than a symbolic #?
After all, the Internet standards-track protocol for the URI says the following in section 2.4.3:
The character “#” is excluded because it is used to delimit a URI from a fragment identifier in URI references…[Several more examples of excluded characters are given]…Data corresponding to excluded characters must be escaped in order to be properly represented within a URI.
The # symbol is an excluded character. When returned from File.toURI(), it is escaped. Therefore, the protocol indicates that it meets the requirements to be properly represented.
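The escaping argument can be checked with a short round trip through the standard library. This is only a sketch: the class name ToUriDemo and its helpers are hypothetical, and the relative “#Test/Test.jpg” path does not need to exist on disk.

```java
import java.io.File;
import java.net.URI;

public class ToUriDemo {
    // File.toURI() percent-escapes characters that are not legal in a URI path.
    static URI toUri(File file) {
        return file.toURI();
    }

    // new File(URI) decodes the escaped path back into a file system path.
    static File fromUri(URI uri) {
        return new File(uri);
    }

    public static void main(String[] args) {
        File original = new File("#Test", "Test.jpg").getAbsoluteFile();
        URI uri = toUri(original);
        System.out.println("URI:        " + uri);               // '#' appears as %23
        System.out.println("Fragment:   " + uri.getFragment()); // no fragment is produced
        System.out.println("Round trip: " + fromUri(uri).equals(original));
    }
}
```

A consumer that decodes %23 only after splitting off the fragment, as new File(URI) does here, recovers the literal # in the directory name.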
Even if that’s not reason enough, Java Swing is able to handle this, as in the following code:
import java.io.File;
import java.net.URL;
import javax.swing.ImageIcon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;

//Setup paths
String portraitPath = System.getProperty("user.home") + File.separator + "Desktop" + File.separator + "#Test" + File.separator + "Test.jpg";
String uriString = new File(portraitPath).toURI().toString(); //Among other things, # becomes %23

//Setup view
JFrame frame = new JFrame("Test");
JPanel panel = new JPanel();

//Add label containing the image, rendered from a URL with an escaped # sign
JLabel label = new JLabel(new ImageIcon(new URL(uriString)));
panel.add(label);

//Finalize everything
frame.setContentPane(panel);
frame.pack();
frame.setVisible(true); //The image is successfully displayed!
Since Java Swing is able to display an image with an escaped # in the URL, it seems that PD4ML, being a Java library, ought to also be able to display an image with an escaped # in the URL.
December 3, 2013 at 19:05#29400
v385fx3 solves the issue.
December 4, 2013 at 05:44#29401
Thanks, the new version works great! I appreciate you addressing this.