[GeoNetwork-users] CSW GetRecords order

I'm trying to get a list of records from a GeoNetwork CSW, but I'm
getting some weird behaviour. A GetRecords request for the first 10
records gives me different responses each time. How am I supposed to
page through to get all the records if the ordering changes between
requests?! (See below) This doesn't happen with 5 other CSW servers
I've been working with, where the order is the same every time.

Of course I could increase maxRecords to get them all in one
request, but that is not a general solution for servers with large
numbers of records. The default and normal CSW way seems to be to get
10 at a time, and as far as I can tell this is a bug in GeoNetwork.

David

$ curl -s 'http://scotgovsdi.edina.ac.uk/srv/en/csw?request=GetRecords&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startposition=1' | grep dc:identifier
      <dc:identifier>e4b8e08f-7314-4072-8fda-4b483ac51f6d</dc:identifier>
      <dc:identifier>74af90a1-2871-45ef-92f1-c81c90a6bfd3</dc:identifier>
      <dc:identifier>41999672-9d40-4269-b4cb-3befc98fea3c</dc:identifier>
      <dc:identifier>d9f77d5c-6fc1-4307-ac12-49a4f6dd4696</dc:identifier>
      <dc:identifier>ce258347-51f6-4b29-a9c6-170b468bd463</dc:identifier>
      <dc:identifier>1106ed62-501b-4298-8718-d76e63e46ab1</dc:identifier>
      <dc:identifier>fc784162-31cb-429b-a3c2-7ef99b466c62</dc:identifier>
      <dc:identifier>c29059d5-48ee-4392-92bd-92c9b517ed13</dc:identifier>
      <dc:identifier>e8544752-8d8e-4be4-8fad-68e7e70a90b8</dc:identifier>
      <dc:identifier>3d64078a-3342-4386-99ba-0abb27e9dbaa</dc:identifier>
$ curl -s 'http://scotgovsdi.edina.ac.uk/srv/en/csw?request=GetRecords&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startposition=1' | grep dc:identifier
      <dc:identifier>41999672-9d40-4269-b4cb-3befc98fea3c</dc:identifier>
      <dc:identifier>1106ed62-501b-4298-8718-d76e63e46ab1</dc:identifier>
      <dc:identifier>78a01dba-d88d-4d19-a257-c96ab51155be</dc:identifier>
      <dc:identifier>9d977a73-7884-4870-ae76-afccf8e6fae8</dc:identifier>
      <dc:identifier>3d64078a-3342-4386-99ba-0abb27e9dbaa</dc:identifier>
      <dc:identifier>b8130dd7-014d-40ae-ac82-878f48f86938</dc:identifier>
      <dc:identifier>e4b8e08f-7314-4072-8fda-4b483ac51f6d</dc:identifier>
      <dc:identifier>74af90a1-2871-45ef-92f1-c81c90a6bfd3</dc:identifier>
      <dc:identifier>d9f77d5c-6fc1-4307-ac12-49a4f6dd4696</dc:identifier>
      <dc:identifier>5c68fbb4-d220-4e07-97e8-230933e03774</dc:identifier>

Hi David

I have checked the code: if no sort criteria are provided in the query,
GeoNetwork sorts the results by Lucene relevance anyway. This can explain
the results you're getting, as relevance is affected by metadata content
and searches.

I'm not an expert in the CSW specification, but according to it: *Default
action is to present the records in the order in which they are retrieved*

So it would probably be better for GeoNetwork to return the records in the
order in which they are stored if no sort criteria are given. You can open
an issue on GitHub to track this.
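
For illustration, CSW 2.0.2 defines a sortBy KVP parameter for GetRecords
(a property name followed by :A for ascending or :D for descending), so a
client can ask for an explicit order instead of relying on the default. The
sketch below only builds such a URL. The helper name csw_sorted_url is
hypothetical, service and version are added because the specification
requires them, and whether a given GeoNetwork version honours sortBy on GET
requests, or accepts dc:identifier as a sort key, is an assumption to check
against the server.

```shell
#!/bin/sh
# Hypothetical helper: build a GetRecords URL with an explicit sort order.
# $1 = CSW endpoint, $2 = startPosition, $3 = sort key (e.g. dc:identifier:A)
csw_sorted_url() {
  printf '%s?service=CSW&version=2.0.2&request=GetRecords' "$1"
  # %% prints a literal percent sign, giving the encoded colon %3A.
  printf '&constraintLanguage=CQL_TEXT&typeNames=csw%%3ARecord&resultType=results'
  printf '&startPosition=%s&maxRecords=10&sortBy=%s\n' "$2" "$3"
}

# Usage (not run here):
#   curl -s "$(csw_sorted_url http://scotgovsdi.edina.ac.uk/srv/en/csw 1 dc:identifier:A)" | grep dc:identifier
```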

Also, regarding your curl queries: if you start a search and want to
paginate, you need to grab the JSESSIONID from the first request and reuse
it for the following requests. This makes GeoNetwork reuse the searcher,
and the pagination should be fine. In any case, if you want to make
multiple calls to GeoNetwork services, you need to reuse the JSESSIONID;
otherwise each request will be handled as an independent session.
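
A minimal sketch of this with curl, assuming the session cookie is the only
state that has to be carried over: a cookie jar file stores the JSESSIONID
set by the first response and sends it back on later requests. The function
name csw_page and the jar file are hypothetical; the query parameters are
the ones from the original requests.

```shell
#!/bin/sh
# Fetch one GetRecords page, sharing the session through a curl cookie jar.
# $1 = CSW endpoint, $2 = startPosition, $3 = cookie jar file
csw_page() {
  # -b sends cookies already in the jar (JSESSIONID after the first call);
  # -c writes any cookies the server sets back to the same file.
  curl -s -b "$3" -c "$3" \
    "$1?request=GetRecords&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startPosition=$2"
}

# Usage (not run here): first and second page within the same session.
#   jar=$(mktemp)
#   csw_page http://scotgovsdi.edina.ac.uk/srv/en/csw 1  "$jar" | grep dc:identifier
#   csw_page http://scotgovsdi.edina.ac.uk/srv/en/csw 11 "$jar" | grep dc:identifier
```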

Regards,

Jose García

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

--
GeoCat Bridge for ArcGIS allows instant publishing of data and metadata on
GeoServer and GeoNetwork. Visit http://geocat.net for details.
_________________________
Jose García
GeoCat bv
Veenderweg 13
6721 WD Bennekom
The Netherlands
http://GeoCat.net

Jose,

Many thanks for this - I'll give these a try, but it seems a bit
strange to have to use non-core CSW features just to be able to get all
the records.

So I've created a ticket for consideration, as you suggested:
https://github.com/geonetwork/core-geonetwork/issues/577

David

Hi David

Sure, I understand the issue.

This should only happen with relevance sorting (the default in
GeoNetwork), I think, since as far as I know relevance is changed by other
searches. I guess the best option is not to sort by relevance by default
(at least for CSW server requests).

GeoNetwork creates a session, and if you do a search, the searcher is
stored in the session. That way the pagination should be consistent for a
search.
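
As a sketch of what session-based paging could look like on the client
side, the loop below keeps one cookie jar for the whole search and follows
the nextRecord attribute that CSW 2.0.2 puts on csw:SearchResults (0 means
no further records). The function names are hypothetical, grep -o is assumed
to be available, and the loop also stops on an empty page or a nextRecord
that does not advance, in case a server deviates from the specification.

```shell
#!/bin/sh
# Extract the nextRecord attribute of csw:SearchResults from a response on stdin.
# Prints nothing if the attribute is absent.
csw_next_record() {
  sed -n 's/.*nextRecord="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Page through all records within one session, printing the identifiers.
# $1 = CSW endpoint
csw_all_identifiers() {
  jar=$(mktemp) || return 1
  start=1
  while :; do
    page=$(curl -s -b "$jar" -c "$jar" \
      "$1?request=GetRecords&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startPosition=$start") || break
    # Stop if the page contains no identifiers at all.
    printf '%s\n' "$page" | grep -o '<dc:identifier>[^<]*</dc:identifier>' || break
    next=$(printf '%s\n' "$page" | csw_next_record)
    # Stop on a missing, zero, or non-advancing nextRecord.
    [ -n "$next" ] && [ "$next" -gt "$start" ] || break
    start=$next
  done
  rm -f "$jar"
}

# Usage (not run here):
#   csw_all_identifiers http://scotgovsdi.edina.ac.uk/srv/en/csw
```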

Regards,
Jose García
