Encoding problem (æøå)
This topic has 7 replies, 3 voices, and was last updated Sep 15, 2011 14:26:37 by Anonymous.
September 14, 2011 at 15:51 #26606
Hi.
I am rendering a URL and have problems with characters like æ, ø, å, which are shown as ‘?’.
How do I solve the character problem?
My HTML page is declared with the meta tag:
The PD4ML debug output looks like this:
version: PD4ML 371b9 (eval)
using content encoding from HTTP header: iso-8859-1
new parse attempt with: UTF8
done in 240ms.
In the PD4ML code I use the following:
```java
// ...
PD4ML html = new PD4ML();
html.overrideDocumentEncoding("iso-8859-1");
html.enableDebugInfo();

response.setContentType("application/pdf");
response.setHeader("Content-disposition","inline; filename="+fileName+".pdf");

InputStreamReader isr = new InputStreamReader(connection.getInputStream());
ByteArrayOutputStream baos = new ByteArrayOutputStream();
html.render( url, baos );

byte[] result = baos.toByteArray();
response.setContentLength(result.length);

ServletOutputStream sos = response.getOutputStream();
sos.write( result );
```
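The debug line "using content encoding from HTTP header: iso-8859-1" suggests PD4ML took the encoding from the server's response, most likely from the charset parameter of the Content-Type header. PD4ML's own detection code is not shown in this thread, so the following is only a hypothetical, standard-library sketch of how such a parameter can be read; the names ContentTypeCharset and charsetOf are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=iso-8859-1", or the fallback if there is no
     * charset parameter or the name is unknown to this JVM.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type and are separated by ';'
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Allow a quoted value: charset="ISO-8859-1"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=iso-8859-1", StandardCharsets.UTF_8)); // ISO-8859-1
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));                      // UTF-8
    }
}
```

Returning a fallback instead of throwing mirrors the usual practice of tolerating sloppy headers; a real client would also have to decide what wins when the header and an in-document meta tag disagree.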
September 14, 2011 at 15:56 #28656
By default, PD4ML supports only the ISO-8859-1 charset. To output international characters, you need to use the TTF embedding feature of PD4ML Pro.
September 14, 2011 at 16:08 #28657
Thanks.
But characters like æøå are supported by ISO-8859-1.
And why does the debug output say that it makes a new parse attempt with UTF8?
using content encoding from HTTP header: iso-8859-1
new parse attempt with: UTF8
Please see:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
September 14, 2011 at 16:31 #28658
> new parse attempt with: UTF8
This is caused by the parser encountering an encoding directive in the document. In your code, overrideDocumentEncoding() has no effect, as it works only for the render(URL) methods.
```java
/**
 * sets default encoding for URL-addressed HTML documents
 * @param encoding name
 * @since v3.7.1
 */
public void overrideDocumentEncoding( String encoding ) {
    // ...
}
```
As long as you pass the HTML source as an InputStreamReader, PD4ML uses the reader's charset. You may try the InputStreamReader constructor that takes a Charset parameter. A better approach, though, would be to call
render(StringReader isr, OutputStream os, URL base, String encoding)
> But characters like æøå are supported by ISO-8859-1.
Obviously, the problem is caused by an attempt to interpret an ISO-8859-1 document as UTF-8. First we need to make sure it is parsed as ISO-8859-1. Please try the above suggestions.
September 14, 2011 at 17:27 #28659
Thanks, I will try your approach.
But how do we convert the website URL to a java.io.StringReader if we are not going to use java.net.URLConnection?
And what is the java.net.URL base for?
Can you please give some tips?
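The first question can be answered with the standard library alone. The sketch below is an illustration under stated assumptions, not PD4ML code: it reads the page bytes and decodes them with an explicitly chosen charset before wrapping the text in a StringReader. Note that URL.openStream() is documented as shorthand for openConnection().getInputStream(), so a URLConnection is still involved under the hood. The names UrlToStringReader, toReader and fetch are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.Charset;

public class UrlToStringReader {

    /** Reads the whole stream and decodes it with an explicit charset (never the platform default). */
    static StringReader toReader(InputStream in, Charset charset) throws IOException {
        byte[] bytes = in.readAllBytes(); // requires Java 9+
        return new StringReader(new String(bytes, charset));
    }

    /**
     * Fetches the page behind a URL. URL.openStream() is shorthand for
     * openConnection().getInputStream(), so a URLConnection is used internally.
     */
    static StringReader fetch(URL url, Charset charset) throws IOException {
        try (InputStream in = url.openStream()) {
            return toReader(in, charset);
        }
    }
}
```

The resulting reader, the page URL as base and "iso-8859-1" as encoding would then be the arguments to the render(StringReader isr, OutputStream os, URL base, String encoding) call suggested earlier in the thread; that call needs the PD4ML library and is left out here.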
September 14, 2011 at 17:41 #28660
> And what is the java.net.URL base for?
base is used to resolve relative paths. An InputStreamReader or StringReader simply provides no information for resolving them.
Well, in your sample code it is not quite clear where the connection variable comes from. If your source document can be addressed by a URL, it is a good idea to use one of the render(URL) methods. In that case, html.overrideDocumentEncoding("iso-8859-1") should work as expected.
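To illustrate the "relative paths" point: a page may reference images or stylesheets with relative links, and those can only be turned into absolute addresses against a base URL, which a reader does not carry. The sketch below uses only java.net and follows the URI resolution rules implemented by java.net.URI (RFC 2396); it is not PD4ML's implementation, and BaseUrlDemo and resolve are hypothetical names. The base is the servlet URL from this thread without its query string.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class BaseUrlDemo {

    /** Resolves a (possibly relative) reference found in the HTML against the document's base URL. */
    static URL resolve(URL base, String reference) {
        try {
            return base.toURI().resolve(reference).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + reference, e);
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        URL base = URI.create("http://www.cateringportal.dk/rekvisition/servlet/URLOrder").toURL();
        // A relative link is resolved against the base's directory
        System.out.println(resolve(base, "images/logo.gif")); // http://www.cateringportal.dk/rekvisition/servlet/images/logo.gif
        // A root-relative link keeps only the scheme and host of the base
        System.out.println(resolve(base, "/css/style.css"));  // http://www.cateringportal.dk/css/style.css
    }
}
```

The first result also hints at why the redirect mattered for relative links: resolved against the servlet's address, "images/logo.gif" points under /servlet/, not under the directory of the page the servlet redirects to.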
September 14, 2011 at 18:02 #28661
Sorry, my mistake.
We are not using java.net.URLConnection. We only use:
html.render( url, baos );
Here is how we get the URL:
java.net.URL url = new java.net.URL("http://www.cateringportal.dk/rekvisition/servlet/URLOrder?orderID=279&uid=uJ5FOxxyXCgMvU42ygSRqdBlmWLPGzVw7JiAqOsTPnhv8MirBa");
September 15, 2011 at 14:26 #28662
Problem is solved.
The URL I pointed to was a servlet that redirected to the JSP page (HTML page). This caused special characters like æøå to be shown as ???.
Now I point the URL directly to the JSP page (HTML page), and the characters are shown perfectly.
Thanks for support.
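The thread does not establish exactly why going through the redirecting servlet changed the output, but the earlier reply diagnosed the symptom as an ISO-8859-1 document being interpreted as UTF-8. The standard-library sketch below shows what that mismatch does to æøå. It does not reproduce PD4ML's internals, and whether the resulting replacement character ends up displayed as '?' depends on the renderer and font. MisdecodeDemo and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // The three Danish letters, written as escapes so the source file encoding does not matter
        String danish = "\u00e6\u00f8\u00e5";

        // ISO-8859-1 stores each letter in one byte (0xE6, 0xF8, 0xE5); UTF-8 needs two bytes each
        System.out.println(danish.getBytes(StandardCharsets.ISO_8859_1).length); // 3
        System.out.println(danish.getBytes(StandardCharsets.UTF_8).length);      // 6

        // Matching charsets round-trip cleanly
        System.out.println(roundTrip(danish, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1).equals(danish)); // true

        // ISO-8859-1 bytes read as UTF-8 are malformed, so the decoder substitutes U+FFFD
        String garbled = roundTrip(danish, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(garbled.indexOf('\uFFFD') >= 0); // true
    }
}
```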
The forum ‘HTML/CSS rendering issues’ is closed to new topics and replies.