[GeoNetwork-devel] PDFPrint fails with exception [SEC=UNCLASSIFIED]

Hi Kevin,

I’ve run into UTF-8 encoding issues in the past. My first port of call is usually to check that the data really is valid UTF-8. I’ve attached a utility that simply dumps details of any non-ASCII characters to the console, which should give you a starting point. Once the potential problem characters have been found, I generally use TextPad to view the file in hex. Not all programs behave correctly when ‘special’ characters are present; Windows Notepad, for example, mishandles them and also ignores XML encoding declarations.
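A rough idea of what such a dump can look like, as a hypothetical Java sketch. This is not the attached UnSanity.java (its source isn't reproduced here), and the class and method names are made up. Reading the file as ISO-8859-1 maps every byte to exactly one char, so the report shows raw byte values with line/column positions that you can then look up in a hex view:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical "dump the non-ASCII characters" helper.
 * NOT the UnSanity.java attached to this thread; names are illustrative.
 */
final class NonAsciiDump {
    private NonAsciiDump() {}

    /** Reports every code point above U+007F with its 1-based line and column. */
    static List<String> scan(String text) {
        List<String> report = new ArrayList<>();
        int line = 1;
        int col = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\n') {          // new line: reset the column counter
                line++;
                col = 0;
                continue;
            }
            col++;
            if (cp > 0x7F) {           // anything outside 7-bit ASCII
                report.add(String.format("line %d col %d: U+%04X", line, col, cp));
            }
        }
        return report;
    }

    /**
     * Reads a file as ISO-8859-1 so that each byte becomes exactly one char;
     * the report then shows raw byte values rather than decoder substitutions.
     */
    static List<String> scanFileBytes(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return scan(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

A correctly encoded UTF-8 file will also show up here, as two to four entries per non-ASCII character (one per byte), so this locates suspicious bytes rather than validating the file.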

You may also want to check the database settings and their support for UTF-8. I think MySQL requires additional settings in the JDBC connection string.
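For the MySQL Connector/J 5.x drivers of that era, the settings usually meant are the useUnicode and characterEncoding connection properties. A minimal sketch that builds such a URL; the helper class is hypothetical and host, port and database name are placeholders:

```java
/**
 * Hypothetical helper: builds a MySQL Connector/J URL asking the driver to
 * exchange text with the server as UTF-8. useUnicode and characterEncoding
 * are Connector/J connection properties; host/port/database are placeholders.
 */
final class MySqlUtf8Url {
    private MySqlUtf8Url() {}

    static String build(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }
}
```

Usage would be along the lines of `DriverManager.getConnection(MySqlUtf8Url.build("localhost", 3306, "geonetwork"), user, password)`. The tables themselves also need a UTF-8 character set; for Oracle the equivalent question is the database character set (NLS_CHARACTERSET) rather than a URL parameter.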

Cheers,

Steve

-----Original Message-----
From: Kevin Gunn [mailto:k.gunn@anonymised.com]
Sent: Wednesday, 22 October 2008 4:20
To: geonetwork-devel@lists.sourceforge.net
Subject: [GeoNetwork-devel] PDFPrint fails with exception

Hi,

The latest BlueNetMEST’s PDFPrint fails with an exception for me. Is there a known solution for this?

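C:\_work\geonetwork\BlueNet_MEST_SVN\web\geonetwork\/xsl/portal-present-fop.xsl

2008-10-22 13:53:44,755 ERROR [jeeves.service] - -> (C) message : org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.
2008-10-22 13:53:44,755 ERROR [jeeves.service] - -> (C) exception : XPathException
2008-10-22 13:53:44,755 DEBUG [jeeves.service] - Raised exception while executing service

<error id="error">
  <message>org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.</message>
  <class>XPathException</class>
  <stack>
    <at class="net.sf.saxon.event.Sender" file="Sender.java" line="362" method="sendSAXSource" />
    <at class="net.sf.saxon.event.Sender" file="Sender.java" line="184" method="send" />
    <at class="net.sf.saxon.event.Sender" file="Sender.java" line="49" method="send" />
    <at class="net.sf.saxon.Controller" file="Controller.java" line="1550" method="transform" />
    <at class="jeeves.utils.Xml" file="Xml.java" line="265" method="transformFOP" />
    <at class="jeeves.server.dispatchers.ServiceManager" file="ServiceManager.java" line="580" method="dispatchOutput" />
    <at class="jeeves.server.dispatchers.ServiceManager" file="ServiceManager.java" line="383" method="dispatch" />
    <at class="jeeves.server.JeevesEngine" file="JeevesEngine.java" line="621" method="dispatch" />
    <at class="jeeves.server.sources.http.JeevesServlet" file="JeevesServlet.java" line="163" method="execute" />
    <at class="jeeves.server.sources.http.JeevesServlet" file="JeevesServlet.java" line="88" method="doGet" />
  </stack>
  <request>
    <language>en</language>
    <service>pdf.search</service>
  </request>
  <response>
    <summary count="1" type="local">
      <keywords />
      <categories />
      <sources>
        <source count="1" name="87aa46b0-a57f-4f33-8087-effe4c4dfcc5" />
      </sources>
    </summary>
  </response>
</error>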

Thx,

Kevin Gunn

Software Engineer

Australian Institute of Marine Science

Ph: (07) 47534305

Fax: (07) 4772 5852

E-mail: k.gunn@anonymised.com

------------------------------------------------------------------------
The information contained in this communication is for the use of the
individual or entity to whom it is addressed, and may contain
information which is the subject of legal privilege and/or copyright.
If you have received this communication in error, please notify the
sender by return E-Mail and delete the transmission, together with any
attachments, from your system. Thank you.
------------------------------------------------------------------------

 
 
--  
------------------------------------------------------------------------
The information contained in this communication is for the use of the 
individual or entity to whom it is addressed, and may contain 
information which is the subject of legal privilege and/or copyright.  
 
If you have received this communication in error, please notify the 
sender by return email and delete the transmission, together with any 
attachments, from your system. Thank you.
------------------------------------------------------------------------
(attachments)

UnSanity.java (1.07 KB)

Hi Steve,

Thx for the response. I don’t feed it a document; it takes the entire Jeeves request and puts it through the XSLT FOP transformation. I’ll try to track down exactly where in the XML it’s having problems. It could be DB related, as the metadata sub-XML sections come from the DB. The records I’m testing with are straight copies of the default ISO19139.mcp template with a new title. We’re using Oracle as the DB, with the default driver that ships with the latest GN libs.

I’ll whack a little method into Xml.java to check the characters in the XML being transformed; perhaps something like this could be added to the current implementation so that it fails gracefully.
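One way such a check could look, as a hypothetical sketch (Utf8Check and firstMalformedOffset are illustrative names, not existing Jeeves code). A strict java.nio UTF-8 decoder set to REPORT stops at the first malformed sequence and leaves the buffer position at its first byte, which gives an offset to put in a friendly error message instead of a bare SAXParseException:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-check to run on the bytes before the FOP transformation. */
final class Utf8Check {
    private Utf8Check() {}

    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if the input is valid. */
    static int firstMalformedOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024); // decoded chars are discarded
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error, the input position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isOverflow()) {
                out.clear(); // output buffer full: discard and keep decoding
                continue;
            }
            return -1; // underflow with endOfInput=true: every byte decoded cleanly
        }
    }
}
```

For example, "<a>Español</a>" encoded as windows-1252 gives offset 7 (the 0xF1 byte), while the same text encoded as UTF-8 gives -1.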

Are you guys using this latest source as your GeoNetwork production version?

Cheers,

Kevin


From: Stephen.Davies@anonymised.com [mailto:Stephen.Davies@anonymised.com]
Sent: Wednesday, 22 October 2008 15:44 PM
To: geonetwork-devel@lists.sourceforge.net
Subject: Re: [GeoNetwork-devel] PDFPrint fails with exception[SEC=UNCLASSIFIED]


Kevin,

Before doing that, could you check whether there is anything more specific in jetty/logs/output.log? Saxon occasionally puts more information about the problem in there, including line/column numbers.
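If the log still lacks a location, the parser exception itself carries one: SAXParseException exposes getLineNumber() and getColumnNumber(). A small stdlib-only sketch for re-parsing a dumped request (ParseLocator is a hypothetical name):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ParseLocator {
    private ParseLocator() {}

    /** Parses the bytes with SAX; returns null if well-formed, else a line/column description. */
    static String locateError(byte[] xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows, so the location arrives here.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | IOException e) {
            // Some parser versions surface bad encodings as an IOException without a location.
            return "no location available: " + e;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For "<a>\n<b></a>" it reports line 2. Whether an encoding error arrives as a SAXParseException with a location or as a plain IOException depends on the parser version, hence the second catch.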

Cheers and thanks,
Simon



_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Hi,

I'm running the beast in Tomcat 5.5.27, so I'll check its logs again. I'll
also add the Saxon path to the log config to see whether that shows any
more detail.

Cheers,
Kevin

-----Original Message-----
From: Simon Pigot [mailto:Simon.Pigot@anonymised.com]
Sent: Thursday, 23 October 2008 10:04 AM
To: Kevin Gunn
Cc: Stephen.Davies@anonymised.com; geonetwork-devel@lists.sourceforge.net
Subject: Re: [GeoNetwork-devel] PDFPrint fails with exception[SEC=UNCLASSIFIED]


Hi,

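The problem is that characters in some language names, a copyright-type
symbol (decimal 174, the registered sign ®) and some of the space
characters (decimal 160, non-breaking spaces) are coming up as not UTF-8
compliant. Attached is the XML file that produces these errors.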

2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 191 col: 15 char (decimal): 241
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 204 col: 15 char (decimal): 231
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 238 col: 24 char (decimal): 160
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 238 col: 28 char (decimal): 160
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 254 col: 91 char (decimal): 174
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 256 col: 70 char (decimal): 174
2008-10-23 14:45:35,818 WARN [jeeves] - Non UTF-8 char found at Line: 5935 col: 64 char (decimal): 146
2008-10-23 14:45:35,818 WARN [jeeves] - Non UTF-8 char found at Line: 6436 col: 85 char (decimal): 150
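Assuming those decimal values are raw single bytes from a Windows-1252 (cp1252) encoded source, which is an assumption since the checker's code isn't shown, they decode as ñ (241), ç (231), a non-breaking space (160), ® (174), a right single quotation mark (146) and an en dash (150). The last two are printable only in cp1252, not in ISO-8859-1, which hints at text pasted from Windows applications. A hypothetical sketch that performs the lookup:

```java
import java.nio.charset.Charset;

/**
 * Interprets the "char (decimal)" values from the warnings above,
 * ASSUMING they are raw bytes from a Windows-1252 (cp1252) encoded file.
 */
final class FlaggedChars {
    private FlaggedChars() {}

    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes one byte as cp1252 and returns "decimal -> U+XXXX". */
    static String describe(int value) {
        String decoded = new String(new byte[] {(byte) value}, CP1252);
        return value + " -> U+" + String.format("%04X", decoded.codePointAt(0));
    }

    /** True if a UTF-8 decoder would treat this byte as the lead byte of a 4-byte sequence (11110xxx). */
    static boolean isFourByteLead(int value) {
        return (value & 0xF8) == 0xF0;
    }
}
```

This may also explain why the parser message mentions a 4-byte sequence: 0xF1 (the first flagged value, line 191) has the bit pattern 11110xxx, so a UTF-8 decoder treats it as the lead byte of a 4-byte sequence. If the next byte is an ordinary ASCII letter (as in a name like "Español"), decoding fails on byte 2, consistent with "Invalid byte 2 of 4-byte UTF-8 sequence".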

Once these characters are removed, the PDF is produced, and it doesn't
seem to use any of the affected elements anyway. I worked around the
problem by substituting '?' for the non-UTF-8 characters, which isn't the
most elegant fix. We could follow the stylesheet through to see which
elements FOP actually needs, and perhaps copy just those into a new
element before the transformation, but even those could contain non-UTF-8
characters. So at some point either the offending characters get stripped
or swapped, or a different encoding is used. Does Saxon let us use
another encoding? I'll track down some doco on Saxon and have a read.
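The '?' substitution can be done by the decoder itself, and if the bytes really are cp1252, a transcoding step keeps the characters instead of discarding them. A hypothetical sketch of both; Utf8Repair and its methods are illustrative names, and the cp1252 assumption is not established in this thread:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Repair {
    private Utf8Repair() {}

    /** Roughly the '?' workaround: decode as UTF-8, swapping each malformed sequence for '?'. Lossy. */
    static String replaceMalformed(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith("?");
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected: with REPLACE, errors are substituted rather than thrown.
            throw new IllegalStateException(e);
        }
    }

    /** Keeps the characters, ASSUMING the source bytes really are cp1252. */
    static byte[] transcodeCp1252ToUtf8(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

Decoding the cp1252 bytes of "Española" with replaceMalformed gives "Espa?ola"; transcodeCp1252ToUtf8 gives UTF-8 bytes that decode back to "Española". On the encoding question: a JAXP parser honours the encoding named in the XML declaration, so correctly labelled bytes in another encoding should also parse; the failure here appears to come from bytes that disagree with their UTF-8 label.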

Cheers,
Kevin


(attachments)

PDFPrint_XML.xml (598 KB)

Hi Kevin,

That all works on Linux, so I'm not sure those characters were the issue. Instead, I think the problem was the way the XML was prepared for transformation in the transformFOP method in jeeves/src/jeeves/utils/Xml.java: it was being written to a string using Xml.getString, which worked fine on Linux but produced non-UTF-8 on Windows. Switching the input to a JDOMSource seems to have fixed it, and it now appears to work on both Linux and Windows.
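The thread doesn't show the old transformFOP code, so the sketch below ASSUMES the failure came from turning the Xml.getString result back into bytes with the platform default charset (typically Cp1252 on Windows; on Linux it depends on the locale) while the XML still declared or defaulted to UTF-8. To stay stdlib-only it uses a StringReader as a stand-in for org.jdom.transform.JDOMSource; the point is the same: hand the parser characters or a tree, not re-encoded bytes. RoundTripDemo is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class RoundTripDemo {
    private RoundTripDemo() {}

    private static DocumentBuilder newBuilder() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors throw; nothing printed to stderr
            return builder;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Fragile path: String -> bytes in some charset -> parser, which trusts the XML declaration. */
    static boolean parsesViaBytes(String xml, Charset charset) {
        try {
            newBuilder().parse(new ByteArrayInputStream(xml.getBytes(charset)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    /** Character path: no byte-encoding step, so nothing can disagree with the declaration. */
    static boolean parsesViaCharacters(String xml) {
        try {
            newBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

With `<?xml version="1.0" encoding="UTF-8"?><a>Español</a>`, parsesViaBytes returns false for windows-1252 bytes (0xF1 is read as the start of a 4-byte UTF-8 sequence) and true for UTF-8 bytes, while parsesViaCharacters returns true. A tree-based Source such as JDOMSource avoids the byte step in the same way.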

Francois, I'm not sure whether the trunk is affected in the same way?

Cheers,
Simon

Kevin Gunn wrote:

Hi,

The issue is with some language names, copyright chars, and also some of
the space characters are coming up as not UTF-8 compliant. Attached is
the XML file for these errors.

2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 191 col: 15 char (decimal): 241
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 204 col: 15 char (decimal): 231
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 238 col: 24 char (decimal): 160
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 238 col: 28 char (decimal): 160
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 254 col: 91 char (decimal): 174
2008-10-23 14:45:35,803 WARN [jeeves] - Non UTF-8 char found at Line: 256 col: 70 char (decimal): 174
2008-10-23 14:45:35,818 WARN [jeeves] - Non UTF-8 char found at Line: 5935 col: 64 char (decimal): 146
2008-10-23 14:45:35,818 WARN [jeeves] - Non UTF-8 char found at Line: 6436 col: 85 char (decimal): 150
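These decimal values are consistent with the original error if each character was written out as a single byte. That is an assumption, but it fits: in ISO-8859-1/windows-1252, 241 is ñ and 231 is ç, both plausible in language names. A UTF-8 decoder reads 0xF1 (241) as the lead byte of a 4-byte sequence. The next byte is an ordinary ASCII letter, so the decoder rejects it as "byte 2 of 4-byte UTF-8 sequence". Here is a hypothetical helper that classifies bytes the same way:

```java
// Hypothetical helper: classifies bytes the way a strict UTF-8 decoder does.
class Utf8ByteRules {

    // Length of the sequence a lead byte announces, or -1 if the byte
    // cannot start a sequence at all.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;                      // treat the byte as unsigned
        if (b <= 0x7F) return 1;                  // 0xxxxxxx: plain ASCII
        if (b >= 0xC2 && b <= 0xDF) return 2;     // 110xxxxx
        if (b >= 0xE0 && b <= 0xEF) return 3;     // 1110xxxx
        if (b >= 0xF0 && b <= 0xF4) return 4;     // 11110xxx, up to U+10FFFF
        return -1; // 0x80-0xBF are continuation bytes; 0xC0, 0xC1, 0xF5-0xFF never appear
    }

    // Every byte after the lead must look like 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        int[] reported = {241, 231, 160, 174, 146, 150}; // values from the warnings
        for (int value : reported) {
            System.out.println(value + " -> " + sequenceLength((byte) value));
        }
    }
}
```

Under that assumption, 241 and 231 announce 4-byte and 3-byte sequences that a following ASCII letter cannot complete. The values 160, 174, 146 and 150 fall in the continuation-byte range and are invalid on their own.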

Once these were removed, the PDF was produced, and it doesn't seem to use any of the affected elements. I fixed this by substituting '?' for the non-UTF-8 chars. This isn't the most elegant fix. We could follow through the stylesheet to see which elements FOP needs and perhaps strip those out into a new element before transformation, but even those could contain non-UTF-8 chars. So at some point either the non-UTF-8 chars get stripped/swapped, or a double-byte encoding is used. Does Saxon let us use another encoding? I'll track down some doco on Saxon and have a read.
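On the encoding question, a conforming XML parser takes the encoding from the document's XML declaration. Every parser must support UTF-8 and UTF-16; other encodings are optional. What matters is that the declared and actual encodings match. If a stopgap is still wanted, the JDK's CharsetDecoder can do the substitution instead of hand-rolled code. The following is a minimal sketch with a hypothetical class name. It decodes bytes as UTF-8 and puts U+FFFD, the Unicode replacement character, wherever a sequence is malformed. Like the '?' fix, it discards the original character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stopgap: lenient UTF-8 decoding instead of manual '?' swaps.
class LenientUtf8 {

    static String decodeReplacing(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)       // bad sequences -> U+FFFD
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares the exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] mixed = {'E', 's', 'p', 'a', (byte) 0xF1, 'o', 'l'}; // ñ as one Latin-1 byte
        System.out.println(decodeReplacing(mixed));
    }
}
```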

Cheers,
Kevin

-----Original Message-----
From: Simon Pigot [mailto:Simon.Pigot@anonymised.com]
Sent: Thursday, 23 October 2008 10:04 AM
To: Kevin Gunn
Cc: Stephen.Davies@anonymised.com; geonetwork-devel@lists.sourceforge.net
Subject: Re: [GeoNetwork-devel] PDFPrint fails with exception [SEC=UNCLASSIFIED]

Kevin,

Before doing that, could you check whether there is anything more specific in jetty/logs/output.log? Saxon occasionally puts more information about the problem, including line/column numbers, in there.

Cheers and thanks,
Simon

Kevin Gunn wrote:
  

Hi Steve,

Thx for the response. I don't feed it any doc; it takes the entire Jeeves request and puts it through the XSLT FOP transformation. I'll try to track down exactly where in the XML it's having issues. It could be DB related, as the md sub-XML sections come from the DB. The records I'm testing with are straight copies of the default ISO19139.mcp template with a new title. We're using Oracle as the DB, with the default driver that comes with the latest GN libs.

I'll whack a little method into Xml.java to check the chars in the XML being transformed; perhaps something like this could be added to the current impl so that it fails nicely.
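Here is one way such a check could look. It is a sketch only: the method names are hypothetical, and nothing like this is confirmed to exist in Jeeves. It decodes the bytes strictly as UTF-8. Instead of letting the failure surface deep inside Saxon, it throws with the byte offset and value of the first bad sequence.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical pre-flight check to run on the bytes handed to the transformer.
class Utf8Check {

    // Returns the offset of the first malformed UTF-8 sequence, or -1 if the
    // whole array decodes cleanly.
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error, the malformed input starts at the buffer's current position.
        return result.isError() ? in.position() : -1;
    }

    // Fails with a readable message instead of a SAXParseException later on.
    static void requireUtf8(byte[] bytes) {
        int offset = firstInvalidOffset(bytes);
        if (offset >= 0) {
            throw new IllegalArgumentException("Not valid UTF-8 at byte offset "
                + offset + " (value " + (bytes[offset] & 0xFF) + ")");
        }
    }
}
```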

Are you guys using this latest source as your Geonetwork production version?

Cheers,

Kevin

------------------------------------------------------------------------
  

*From:* Stephen.Davies@anonymised.com [mailto:Stephen.Davies@anonymised.com]
*Sent:* Wednesday, 22 October 2008 15:44 PM
*To:* geonetwork-devel@lists.sourceforge.net
*Subject:* Re: [GeoNetwork-devel] PDFPrint fails with exception[SEC=UNCLASSIFIED]

Hi Kevin,

I've encountered UTF encoding issues in the past. Generally my first port of call is to check that the data is actually valid UTF-8. I've attached a utility that simply dumps details of non-ASCII characters to the console; this will give you a starting point. Once the potential problem characters have been found, I generally use TextPad to view the file in hex. Not all programs behave correctly when 'special' characters exist: Windows Notepad, for example, does not (nor does it handle XML encoding declarations).
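The attached utility isn't reproduced here. The following is a minimal stand-in based on the same idea, with hypothetical names. It walks the text and reports the line, column and decimal code point of every character above 127. The output has the same shape as the "Non UTF-8 char found" warnings in Kevin's later message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a "dump non-ASCII characters" utility.
class NonAsciiDump {

    static List<String> report(String text) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int col = 1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // handles surrogate pairs
            if (cp == '\n') {
                line++;
                col = 1;
            } else {
                if (cp > 127) {
                    hits.add("Line: " + line + " col: " + col + " char (decimal): " + cp);
                }
                col++;
            }
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        report("<lang>ab</lang>\n<lang>Espa\u00f1ol</lang>").forEach(System.out::println);
    }
}
```

To see raw byte values in a file rather than decoded characters, one option is to read the file as ISO-8859-1. That charset maps every byte 0-255 to the character with the same value.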

You may also like to consider the database settings and support for UTF-8. I think MySQL requires additional settings in the JDBC connect string.
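For MySQL with the Connector/J driver, the usual settings are the useUnicode and characterEncoding connection properties. The following is a minimal sketch with a hypothetical host and database name. Check the property names against the Connector/J version in use.

```java
// Hypothetical helper; host, port and database name are placeholders.
class MySqlJdbcUrl {

    static String utf8Url(String host, int port, String database) {
        // useUnicode/characterEncoding ask Connector/J to exchange text as UTF-8.
        return "jdbc:mysql://" + host + ":" + port + "/" + database
            + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(utf8Url("localhost", 3306, "geonetwork"));
    }
}
```

These properties only help if the database and table character sets can also hold the data.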

Cheers,

Steve

-----Original Message-----
*From:* Kevin Gunn [mailto:k.gunn@anonymised.com]
*Sent:* Wednesday, 22 October 2008 4:20
*To:* geonetwork-devel@lists.sourceforge.net
*Subject:* [GeoNetwork-devel] PDFPrint fails with exception

Hi,

The latest BlueNetMEST's PDFPrint fails on exception for me. Is there a known solution for this?

C:\_work\geonetwork\BlueNet_MEST_SVN\web\geonetwork\/xsl/portal-present-fop.xsl

2008-10-22 13:53:44,755 ERROR [jeeves.service] - -> (C) message : org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.

2008-10-22 13:53:44,755 ERROR [jeeves.service] - -> (C) exception : XPathException

2008-10-22 13:53:44,755 DEBUG [jeeves.service] - Raised exception while executing service

<error id="error">
  <message>org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.</message>
  <class>XPathException</class>
  <stack>
    <at class="net.sf.saxon.event.Sender" file="Sender.java" line="362" method="sendSAXSource" />
    <at class="net.sf.saxon.event.Sender" file="Sender.java" line="184" method="send" />
    <at class="net.sf.saxon.event.Sender" file="Sender.java" line="49" method="send" />
    <at class="net.sf.saxon.Controller" file="Controller.java" line="1550" method="transform" />
    <at class="jeeves.utils.Xml" file="Xml.java" line="265" method="transformFOP" />
    <at class="jeeves.server.dispatchers.ServiceManager" file="ServiceManager.java" line="580" method="dispatchOutput" />
    <at class="jeeves.server.dispatchers.ServiceManager" file="ServiceManager.java" line="383" method="dispatch" />
    <at class="jeeves.server.JeevesEngine" file="JeevesEngine.java" line="621" method="dispatch" />
    <at class="jeeves.server.sources.http.JeevesServlet" file="JeevesServlet.java" line="163" method="execute" />
    <at class="jeeves.server.sources.http.JeevesServlet" file="JeevesServlet.java" line="88" method="doGet" />
  </stack>
  <request>
    <language>en</language>
    <service>pdf.search</service>
  </request>
  <response>
    <summary count="1" type="local">
      <keywords />
      <categories />
      <sources>
        <source count="1" name="87aa46b0-a57f-4f33-8087-effe4c4dfcc5" />
      </sources>
    </summary>
  </response>
</error>

Thx,

Kevin Gunn

Software Engineer

Australian Institute of Marine Science

Ph: (07) 47534305

Fax: (07) 4772 5852

E-mail: k.gunn@anonymised.com


_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
geonetwork-devel List Signup and Options
GeoNetwork OpenSource is maintained at GeoNetwork - Geographic Metadata Catalog download | SourceForge.net


Hi Simon,

On Thu, 2008-10-23 at 18:20 +1100, Simon Pigot wrote:

> jeeves/src/jeeves/utils/Xml.java - it was being written to a string
> using Xml.getString which worked fine on Linux but produced non-UTF-8
> on Windoze - switching the input to a JDOMSource seems to have fixed it
> and it now appears to work on both Linux and Windoze.

So instead of using a StreamSource, we could use:

    Source src = new JDOMSource(xml);

or maybe a StreamSource with getBytes and an explicit Charset, but that is more complex.

> Francois, not sure if the trunk is affected in the same way?

Yep I think so.

Do you plan to fix it, or shall I test it and do it then?
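JDOMSource needs JDOM on the classpath, so this sketch sticks to JDK classes to contrast the two StreamSource variants mentioned above:

- Option A hands the transformer characters (a StringReader), so no bytes are decoded. This makes it the closest JDK-only analogue to passing a JDOMSource.
- Option B uses getBytes with an explicit UTF-8 Charset, so the bytes match the encoding="UTF-8" declaration whatever the platform default is.

An identity transformer stands in for the FOP stylesheet. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical comparison of two ways to feed serialized XML to a transformer.
class SourceOptions {

    // Option A: pass characters, so the parser never decodes bytes.
    static Source fromReader(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    // Option B: pass bytes, but encode them explicitly as UTF-8 so they
    // match the declaration regardless of the platform default charset.
    static Source fromUtf8Bytes(String xml) {
        return new StreamSource(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Identity transform standing in for the real stylesheet.
    static String copy(Source src) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(src, new StreamResult(out));
        return out.toString();
    }
}
```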

Ciao.
Francois