[GeoNetwork-devel] Escaping ampersands in XSLT-generated URLs

I'm finding quite a few places in XSLT scripts that
generate URLs containing un-escaped ampersands.

For example, graphover-show.xsl contains the fragment:

...get?access=public&id={/root/response/id}&fname=...

This ends up in the browser as:

...get?access=public&id=...&fname=...

but it _should_ be:

...get?access=public&amp;id=...&amp;fname=...

One way to proceed would be to change the XSLT script to say:

...get?access=public&id={/root/response/id}&fname=...

but is this the best way?

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082

Hi Richard,
Does the below example generate an error for you?
Ciao,
Jeroen

On May 27, 2008, at 4:51 AM, Software Improvements gn-devel wrote:

_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Jeroen Ticheler wrote:

Hi Richard,
Does the below example generate an error for you?

I note that my previous e-mail looks very
strange in Nabble because every occurrence
of the sequence of characters
"ampersand a m p semicolon" appears
in the browser only as an ampersand. I really
did write just an ampersand in some places and
"ampersand a m p semicolon" in others! So for
the rest of this e-mail please mentally
replace each occurrence of "ampersand" with a
real ampersand.

I attach an example page generated via the graphover-show.xsl
script. I don't know how this example will get munged
by the mailing list or Nabble - the important line
reads "access=publicampersandid=10ampersandfname"
except with real ampersands.

If you put this into the W3C validator
(http://validator.w3.org/) you get
a number of errors related to the 'general entity "id"'
and 'general entity "fname"'.

The validator (correctly) parses
"access=publicampersandid=10"
as though it were "access=publicampersandid;=10",
i.e., by assuming a missing semicolon,
and then (correctly) rejects this as invalid HTML
because there is indeed no 'general entity "id"'.
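
The validator's complaint can be approximated with a small check. The following is a minimal Python sketch, not the validator's actual algorithm: it flags every ampersand that does not start a complete character or entity reference. The function name find_bare_ampersands, the regular expression, and the sample href values are assumptions made for illustration.

```python
import re
from typing import List

# An ampersand counts as "escaped" only when it starts a complete reference:
# a named entity (&amp;), a decimal (&#38;) or a hex (&#x26;) character
# reference, each terminated by a semicolon.
_BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def find_bare_ampersands(markup: str) -> List[int]:
    """Return the offsets of ampersands that do not start a complete reference."""
    return [match.start() for match in _BARE_AMPERSAND.finditer(markup)]


# Hypothetical sample lines modelled on the URL discussed in this thread.
broken = '<a href="get?access=public&id=10&fname=basins.zip">download</a>'
fixed = '<a href="get?access=public&amp;id=10&amp;fname=basins.zip">download</a>'

print(find_bare_ampersands(broken))  # offsets of the two raw ampersands
print(find_bare_ampersands(fixed))   # empty list: every ampersand is escaped
```

A check like this only finds candidates; a real validator also knows which entity names are defined, which is why it reports 'general entity "id"' rather than just a stray ampersand.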

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082

(attachments)

broken.html (237 Bytes)

I wrote:

I'm finding quite a few places in XSLT scripts that
generate URLs containing un-escaped ampersands.

I am pleased (and greatly relieved) to report
that:

1. It is a very, very longstanding error in the
    HTML output method of Xalan-J:
    http://issues.apache.org/jira/browse/XALANJ-611
2. The error has been fixed in Xalan-J version 2.7.1.
3. After replacing serializer.jar
    and xalan.jar with the 2.7.1 versions, the
    problem is fixed!
4. Saxon 9 also does the right thing.

But:
5. Someone should check the effect of upgrading
    to 2.7.1 on the escapeXMLEntities template
    defined in utils.xsl and used in metadata.xsl -
    some of this escaping
    may need to be modified, or indeed may no
    longer be necessary. I can do this checking
    if someone can give me a search query (either
    on the example data or as a remote search)
    that would trigger that bit of the code.

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082

I wrote:

5. Someone should check the effect of upgrading
   to 2.7.1 on the escapeXMLEntities template
   defined in utils.xsl and used in metadata.xsl -
   some of this escaping
   may need to be modified, or indeed may no
   longer be necessary. I can do this checking
   if someone can give me a search query (either
   on the example data or as a remote search)
   that would trigger that bit of the code.

OK, I had a look for myself. The "Hydrological basins"
sample record has an ampersand in its metadata. Switching
Xalan from 2.7.0 to 2.7.1 does _not_ alter the
display and editing of this metadata. (Phew.)

On looking at the code I can see why - the error in
Xalan that is fixed in version 2.7.1 is to do with ampersands
in attribute values, and the escapeXMLEntities template deals
with text nodes, and that worked correctly anyway.
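
The distinction between attribute values and text nodes can be illustrated outside Xalan. This is a minimal sketch using Python's standard xml.etree.ElementTree, not Xalan-J or GeoNetwork code, and the element name and sample values are made up for illustration. It shows that a serializer is expected to escape an ampersand in both positions, and that parsing the result gives back the raw values.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with an ampersand in an attribute value and in a text node.
link = ET.Element("a", {"href": "get?access=public&id=10&fname=basins.zip"})
link.text = "Basins & rivers"

# In memory, both values hold raw ampersands.
print(link.get("href"))
print(link.text)

# On output, the serializer escapes the ampersand in both places.
serialized = ET.tostring(link, encoding="unicode")
print(serialized)

# Parsing the serialized form recovers the original raw values.
round_trip = ET.fromstring(serialized)
assert round_trip.get("href") == link.get("href")
assert round_trip.text == link.text
```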

But this did show up what might be considered
inconsistencies (or indeed, errors) in the presentation/editing
of metadata.

Try the following three things with the Hydrological
basins metadata while logged in as the "admin" user:

1. Bring up the "Default" view of the metadata,
    and press the "Edit" button.
    In the "OnLine resource" labelled
    "Hydrological basins in Africa (Shapefile Format)",
    observe the URL contains:
    id=10 ampersand fname=basins.zip ampersand access=private
    (If you use the browser's "View Source", you can see that
    the ampersands are escaped in the value attribute of the
    corresponding input element in the form's HTML because
    of the escapeXMLEntities template,
    but the browser renders them as raw ampersands.)
2. Press "Cancel", then press "XML view".
    Copy and paste the metadata from the browser into
    a file, and run your favourite XML parser on the file.
    Ouch - it is not well-formed XML because there are
    two unescaped ampersands!
    Each ampersand should have been displayed in the browser as
    ampersand amp semicolon.
3. Go back to the browser and press the
    "Edit" button. (You're now editing the metadata as "raw"
    XML.) Search for "fname". Aha, the ampersands have
    been correctly escaped in this view (because of the
    escapeXMLEntities template).
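
Step 2 can be reproduced without a browser. A minimal sketch with Python's standard-library parser follows; the linkage element name and the shortened URL are stand-ins for illustration, not the actual GeoNetwork record.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Raw ampersands inside element content, as pasted from the "XML view".
raw = "<linkage>get?id=10&fname=basins.zip&access=private</linkage>"
# The same content with each ampersand escaped.
escaped = "<linkage>get?id=10&amp;fname=basins.zip&amp;access=private</linkage>"

print(is_well_formed(raw))
print(is_well_formed(escaped))
```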

Conclusions:
1. The "XML view" of the metadata can produce XML that
    is not well-formed. That is surely an error, yes?
2. Editing in the Default view is different from editing
    in the XML view - in the default view, ampersands are
    not escaped; in the XML view, they are. An
    inconsistency, but perhaps a reasonable one. But
    this could be a "gotcha" for users.

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082