Hi,
I have a problem when I try to generate a PDF from a URL. I checked the site's HTML and found something like this:
Code:
<script>...</script>
<td height="700">
<noscript><td height="300"></noscript>
The first td is wrong and should be deleted. The script adds a td dynamically. The problem is that I cannot delete it. When I try to generate a PDF file, everything is rendered only up to this code fragment.
Is it possible to configure pd4ml so that it ignores invalid HTML, or is there any other solution?
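
One workaround I could imagine is to download the HTML myself, strip the stray tag, and pass the cleaned markup to the converter instead of the URL. Below is a minimal sketch of the cleanup step only, using plain java.util.regex. The class HtmlCleanup and the method removeStrayTd are hypothetical names of mine, not part of pd4ml, and the pattern assumes the stray td always sits directly in front of the noscript block, as in the fragment above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of pd4ml: it only pre-processes the markup.
final class HtmlCleanup {

    // Matches an opening <td ...> tag that is followed, apart from
    // whitespace, directly by an opening <noscript> tag.
    private static final Pattern STRAY_TD = Pattern.compile(
            "<td\\b[^>]*>\\s*(?=<noscript\\b)",
            Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every such stray <td> (and the whitespace
    // after it) removed; all other markup is left untouched.
    static String removeStrayTd(String html) {
        return STRAY_TD.matcher(html).replaceAll("");
    }
}
```

The td inside the noscript element is kept, because it is followed by the closing noscript tag rather than an opening one. A regex is fragile for general HTML, so this would only be a stopgap for this one page.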