[GeoNetwork-devel] Saving UUID's

Hi list,

When performing a GetRecordById request, I found out that GN converts the (ISO 19139) UUID to lower case when saving it to the Metadata table.

If a metadata record has the UUID 0C12204F-5626-4A2E-94F4-514424F093A1 in its ISO 19139-encoded XML, it is saved as

0c12204f-5626-4a2e-94f4-514424f093a1

Why is this done?

This seems to cause problems. For example, when I request the original record using a CSW request like this:

<GetRecordById xmlns="http://www.opengis.net/cat/csw/2.0.2"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         outputSchema="http://www.isotc211.org/2005/gmd"
         service="CSW"
         version="2.0.2"
         xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
   <Id>0C12204F-5626-4A2E-94F4-514424F093A1</Id>
</GetRecordById>

The record is not found.

Best regards,
Thijs

Hi Thijs

2009/9/28 Thijs Brentjens <lists@anonymised.com>:

> Why is this done?

That was the case before 2.4, but it should actually use a WhitespaceAnalyzer
instead of a StandardAnalyzer for this field (and some others, like
operatesOn).

See SearchManager:
_analyzer.addAnalyzer("_uuid", new WhitespaceAnalyzer());
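A WhitespaceAnalyzer, as suggested here, splits text only on whitespace and does not change case, so a hyphenated UUID would be indexed as one term exactly as written, and query terms would then have to match its case. The plain-Java sketch below imitates that behaviour with a hypothetical `whitespaceTokens` helper; it is an illustration under that assumption, not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class WhitespaceTokenizerSketch {

    // Hypothetical stand-in for a whitespace analyzer: split on runs of
    // whitespace only, keep the original case, drop empty tokens.
    static List<String> whitespaceTokens(String text) {
        return Arrays.stream(text.split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String uuid = "0C12204F-5626-4A2E-94F4-514424F093A1";
        List<String> terms = whitespaceTokens(uuid);

        // The hyphenated UUID stays a single term with its upper-case letters intact,
        System.out.println(terms);
        // so a lower-case query term does not match it.
        System.out.println(terms.contains(uuid.toLowerCase(Locale.ROOT)));
    }
}
```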

Which version are you using?

Francois


------------------------------------------------------------------------------
Come build with us! The BlackBerry® Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9-12, 2009. Register now!
http://p.sf.net/sfu/devconf
_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

It seems to me that saving the UUID in lowercase is “correct”, or at least insignificant, as UUID fields are (case-insensitive) hexadecimal numbers.

See RFC 4122 (http://www.ietf.org/rfc/rfc4122.txt) : “The hexadecimal values “a” through “f” are output as lower case characters and are case insensitive on input.”
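The RFC 4122 rule quoted above matches how the JDK's `java.util.UUID` class behaves: `UUID.fromString` accepts upper- or lower-case hex digits, and `toString` prints lower case. A minimal sketch with a hypothetical `canonical` helper; it assumes the identifier is a well-formed RFC 4122 UUID, since `UUID.fromString` throws an `IllegalArgumentException` otherwise.

```java
import java.util.UUID;

public class UuidCaseSketch {

    // Hypothetical helper: parse a UUID written in either case and return the
    // canonical text form, which java.util.UUID prints in lower case.
    // Throws IllegalArgumentException if the input is not a well-formed UUID.
    static String canonical(String text) {
        return UUID.fromString(text).toString();
    }

    public static void main(String[] args) {
        String upper = "0C12204F-5626-4A2E-94F4-514424F093A1";
        String lower = "0c12204f-5626-4a2e-94f4-514424f093a1";

        System.out.println(canonical(upper)); // 0c12204f-5626-4a2e-94f4-514424f093a1
        // Both spellings denote the same 128-bit value.
        System.out.println(UUID.fromString(upper).equals(UUID.fromString(lower))); // true
    }
}
```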

The X.667 recommendation (which CSW 2.0.2, OGC 07-006r1, refers to as the specification for generating UUIDs) is hard to consult, as it’s downloading at zero speed for me. Has anyone had better luck? I’m trying here: http://www.itu.int/rec/T-REC-X.667-200409-S/en.

If we can’t retrieve records without lowercasing the UUID in the request, this seems to be a bug in GN’s search mechanism.

If we really were using StandardAnalyzer at both index time and search time, this could not happen, because it includes a LowerCaseFilter.
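The failure described here comes from a mismatch between index-time and query-time processing, not from the lowercasing itself. The toy sketch below makes that concrete in plain Java: a `HashMap` stands in for the Lucene index, and `buildIndex`, `search`, `LOWER` and `NONE` are hypothetical names, not GeoNetwork or Lucene code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class AnalyzerSymmetrySketch {

    // Hypothetical normalizers: lower-case the term, or leave it untouched.
    static final UnaryOperator<String> LOWER = s -> s.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> NONE = s -> s;

    // Toy "index": maps the index-time-normalized UUID to a record id.
    static Map<String, Integer> buildIndex(String uuid, int recordId,
                                           UnaryOperator<String> indexNormalizer) {
        Map<String, Integer> index = new HashMap<>();
        index.put(indexNormalizer.apply(uuid), recordId);
        return index;
    }

    // Toy lookup: normalize the query term, then look it up exactly.
    static Integer search(Map<String, Integer> index, String query,
                          UnaryOperator<String> queryNormalizer) {
        return index.get(queryNormalizer.apply(query));
    }

    public static void main(String[] args) {
        String uuid = "0C12204F-5626-4A2E-94F4-514424F093A1";
        Map<String, Integer> index = buildIndex(uuid, 42, LOWER);

        // Lower-cased at index time but not at query time: no hit.
        System.out.println(search(index, uuid, NONE)); // null
        // Same normalization on both sides: the record is found.
        System.out.println(search(index, uuid, LOWER)); // 42
    }
}
```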

Kind regards
Heikki Doeleman

On Mon, Sep 28, 2009 at 12:04 PM, Francois Prunayre <fx.prunayre@anonymised.com> wrote:


hi,

Following a discussion with Francois, I’ve changed SearchManager so that it now uses StandardAnalyzer for indexing UUIDs.

However, a closer look at the code in LuceneSearcher shows that no Lucene analyzer is ever applied to search terms. It does some lowercasing here and there, which is probably why many searches do end up being case-insensitive.

It’s normal practice in Lucene applications to apply the same analyzers to search terms as to indexed terms, so that they match. When integrating the NGR LuceneSearcher (with the performance fix), I’ll see how to have search terms processed by the correct analyzers. Since that code is quite different from the current code in 2.4.x and trunk, it makes more sense to do it there.
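One way to guarantee that index terms and search terms are processed alike is to route both through a single per-field registry, in the spirit of the `_analyzer.addAnalyzer("_uuid", ...)` call in SearchManager. The sketch below is plain Java under that assumption; `PerFieldNormalizerSketch`, `addNormalizer` and `normalize` are hypothetical names and do not reproduce Lucene's or GeoNetwork's API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical per-field normalizer registry, loosely modelled on the
// addAnalyzer("_uuid", ...) call in SearchManager. Not Lucene code: the point
// is only that ONE registry is consulted both when indexing and when parsing
// a query, so both sides produce identical terms.
public class PerFieldNormalizerSketch {

    private final Map<String, UnaryOperator<String>> perField = new HashMap<>();
    private final UnaryOperator<String> defaultNormalizer;

    PerFieldNormalizerSketch(UnaryOperator<String> defaultNormalizer) {
        this.defaultNormalizer = defaultNormalizer;
    }

    void addNormalizer(String field, UnaryOperator<String> normalizer) {
        perField.put(field, normalizer);
    }

    // Use the field-specific normalizer if one is registered, else the default.
    String normalize(String field, String value) {
        return perField.getOrDefault(field, defaultNormalizer).apply(value);
    }

    public static void main(String[] args) {
        PerFieldNormalizerSketch registry = new PerFieldNormalizerSketch(s -> s);
        registry.addNormalizer("_uuid", s -> s.toLowerCase(Locale.ROOT));

        String indexedTerm = registry.normalize("_uuid", "0C12204F-5626-4A2E-94F4-514424F093A1");
        String queryTerm = registry.normalize("_uuid", "0c12204F-5626-4a2e-94F4-514424F093A1");
        System.out.println(indexedTerm.equals(queryTerm)); // true
    }
}
```

Fields without a registered normalizer fall back to the default, so the lowercasing stays confined to the fields that need it.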

Kind regards
Heikki Doeleman

On Mon, Sep 28, 2009 at 12:24 PM, heikki <tropicano@anonymised.com> wrote:


Hi,

I was able to access that URL, and section 6.5.4 of the document contains the following statement:

"
Software generating the hexadecimal representation of a UUID shall not use upper case letters.
NOTE - It is recommended that the hexadecimal representation used in all human-readable formats be restricted to lower-case letters. Software processing this representation is, however, required to accept both upper and lower case letters as specified in 6.5.2.
"

So it appears that GN should store UUIDs in lower-case letters but allow users to enter upper-case letters. I expect no-one would actually want to do the latter, because software can generate UUIDs without anyone having to type anything, and UUIDs don't look very human-friendly to me. ;-)
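The X.667 rule (generate lower case, accept both cases on input) suggests canonicalizing identifiers before they are stored or looked up. A minimal plain-Java sketch, assuming identifiers in the 8-4-4-4-12 hex layout are UUIDs and that any other identifier should be left untouched; `toStoredForm` is a hypothetical helper, not part of GeoNetwork.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class UuidStorageSketch {

    // 8-4-4-4-12 hex digits in either case (X.667 requires accepting both).
    private static final Pattern UUID_TEXT = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // Hypothetical helper: lower-case identifiers that look like UUIDs before
    // storing or looking them up; leave any other identifier as it is.
    static String toStoredForm(String id) {
        return UUID_TEXT.matcher(id).matches()
                ? id.toLowerCase(Locale.ROOT)
                : id;
    }

    public static void main(String[] args) {
        System.out.println(toStoredForm("0C12204F-5626-4A2E-94F4-514424F093A1"));
        System.out.println(toStoredForm("my-local-id-ABC"));
    }
}
```

Applying the same helper to both the stored value and the incoming GetRecordById `Id` would keep the two in agreement.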

John

-----Original Message-----
From: heikki [mailto:tropicano@anonymised.com]
Sent: Monday, 28 September 2009 10:03 PM
To: Francois Prunayre
Cc: Devel geonetwork-devel@lists.sourceforge.net
Subject: Re: [GeoNetwork-devel] Saving UUID's


hi John,

Yes, that’s my conclusion too. When I integrate a search performance fix, sometime soon, I’ll make sure this is handled correctly as well (i.e. apply a Lucene LowerCaseFilter, and nothing else, to UUIDs at both index and query time).

Using StandardAnalyzer as described in my earlier message is not quite the right way to deal with it: although it includes a LowerCaseFilter, it leads to odd differences in how strings are tokenized (split up), depending on whether or not they contain any decimal numbers.
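The proposed treatment (lower-case the UUID and do nothing else) can be contrasted with an analyzer that splits identifiers. The plain-Java sketch below uses hypothetical helper names; `splitAndLowerCase` is a deliberately simplified stand-in that splits on hyphens, not a reproduction of StandardAnalyzer's actual tokenization rules. In Lucene terms, `keywordLowerCase` corresponds roughly to a KeywordTokenizer followed by a LowerCaseFilter, which is an assumption about one possible wiring rather than something settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class UuidTokenizationSketch {

    // Hypothetical: keep the whole value as ONE term and only lower-case it.
    static List<String> keywordLowerCase(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value.toLowerCase(Locale.ROOT));
        return terms;
    }

    // Hypothetical, simplified splitting analyzer (NOT StandardAnalyzer):
    // break on hyphens and lower-case each part.
    static List<String> splitAndLowerCase(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.split("-")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String uuid = "0C12204F-5626-4A2E-94F4-514424F093A1";
        System.out.println(keywordLowerCase(uuid));  // one term
        System.out.println(splitAndLowerCase(uuid)); // five terms
    }
}
```

With splitting, the identifier becomes several independent terms, so a query for a fragment such as `5626` could match the record; a single lower-cased term keeps exact-identifier semantics while still being case-insensitive.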

Kind regards
Heikki Doeleman

On Wed, Oct 14, 2009 at 7:16 AM, <John.Hockaday@anonymised.com> wrote:
