[GeoNetwork-devel] Problems using the OAI-PMH Harvester - resumptionToken format not understood

Hi All,

We are currently having some problems harvesting from an OAI-PMH server
based at the Australian Antarctic Division
(http://services.aad.gov.au/oai/provider)

When harvesting iso-mcp records from this server we get the following
error:

    <error id="operation-aborted">
      <message>Raised exception when searching</message>
      <class>OperationAbortedEx</class>
      <stack>
        <at class="org.fao.geonet.kernel.harvest.harvester.oaipmh.Harvester" file="Harvester.java" line="170" method="search" />
        <at class="org.fao.geonet.kernel.harvest.harvester.oaipmh.Harvester" file="Harvester.java" line="103" method="harvest" />
        <at class="org.fao.geonet.kernel.harvest.harvester.oaipmh.OaiPmhHarvester" file="OaiPmhHarvester.java" line="217" method="doHarvest" />
        <at class="org.fao.geonet.kernel.harvest.harvester.AbstractHarvester$HarvestWithIndexProcessor" file="AbstractHarvester.java" line="371" method="process" />
        <at class="org.fao.geonet.kernel.MetadataIndexerProcessor" file="MetadataIndexerProcessor.java" line="39" method="processWithFastIndexing" />
        <at class="org.fao.geonet.kernel.harvest.harvester.AbstractHarvester" file="AbstractHarvester.java" line="398" method="harvest" />
        <at class="org.fao.geonet.kernel.harvest.harvester.Executor" file="Executor.java" line="87" method="run" />
      </stack>
      <object>BadResumptionTokenException: code=badResumptionToken, message=The 'resumptionToken' argument is unrecognizable</object>
    </error>

The resumption token being returned is:

    <resumptionToken completeListSize="1147" cursor="0">0/300/1147/iso-mcp/null/null/null</resumptionToken>

However, the OAI-PMH harvester seems to be expecting a resumptionToken
in a different format:

  private void parseToken(String strToken) throws BadResumptionTokenException {

    // The token must consist of exactly six SEPARATOR-delimited fields:
    // set, prefix, from, until, randomid, pos
    String[] temp = strToken.split(SEPARATOR);

    if (temp.length != 6)
      throw new BadResumptionTokenException("unknown resumptionToken format: " + strToken);

    set = temp[0];
    prefix = temp[1];
    from = temp[2];
    until = temp[3];
    randomid = temp[4];

    pos = Integer.parseInt(temp[5]);
  }

Where the separator is '-'.
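To see why the AAD token is rejected, the sketch below applies the same split-and-count check as parseToken() to the token quoted above. TokenCheck and fieldCount are illustrative names, not GeoNetwork code; Pattern.quote makes the separator literal, which for '-' and '/' matches what String.split does anyway.

```java
import java.util.regex.Pattern;

class TokenCheck {
    // Splits a token on a literal separator and counts the resulting fields,
    // which is what parseToken() checks (it requires exactly six).
    static int fieldCount(String token, String separator) {
        return token.split(Pattern.quote(separator)).length;
    }

    public static void main(String[] args) {
        // Token returned by http://services.aad.gov.au/oai/provider
        String aadToken = "0/300/1147/iso-mcp/null/null/null";

        // '-' separator: 2 fields, so parseToken() would throw.
        System.out.println(fieldCount(aadToken, "-")); // 2

        // '/' separator: 7 fields, still not 6.
        System.out.println(fieldCount(aadToken, "/")); // 7
    }
}
```

With '-' the token yields 2 fields, and even with a '/' separator it yields 7, so any fixed layout assumed by the harvester will reject tokens from providers that are not GeoNetwork.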

Looking at the OAI-PMH spec at
http://www.openarchives.org/OAI/openarchivesprotocol.html
I can't see any reference to how the resumptionToken should be
formatted.

Harvesting from this server in previous versions worked fine because the
harvester did not rely on a specific format for the resumption token.

Can someone clarify why harvesting from this server no longer works?
Is a specific format for the resumptionToken required and if so where is
this mandated?

Please note that I'm running the harvester in BlueNetMEST 1.4.2, but
the code is now the same in GeoNetwork trunk.

Thanks,

--
Craig Jones
eMII Infrastructure Programmer
IMOS e-Marine Information Infrastructure Facility (eMII)
Ph: +61 3 6226 8567

Hi Craig,

I would suggest having a look at this commit [1] and at the associated ticket [2] for more details about the OAI-PMH changes.
However, I think Timo Pröscholdt would be better placed than I am to explain this particular change to the expected resumptionToken format.

HTH,

Cheers,

Mathieu

[1] http://geonetwork.svn.sourceforge.net/viewvc/geonetwork?view=revision&revision=6221
[2] http://trac.osgeo.org/geonetwork/ticket/242

On Wed, Dec 15, 2010 at 8:21 AM, Craig Jones <jonescc@anonymised.com> wrote:





GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Hi all,

I had the same problem in connecting to an external OAI-PMH service.

The commit Mathieu refers to imposes a structure on the
resumptionToken format in GeoNetwork, whereas
   http://www.openarchives.org/OAI/openarchivesprotocol.html#FlowControl
states that
    "The format of the resumptionToken is not defined by the OAI-PMH
    and should be considered opaque by the harvester."

The problem here is that the same class (ResumptionToken) is used both by the
OAI server, where the token has a defined structure so that GeoNetwork
can work out where the previous request left off, and by the OAI client,
where the token should be free-form but is forced into that same
structure. This only works if there are GeoNetwork instances at both
ends of the connection.
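One way to follow the spec on the client side, sketched here under the assumption that the harvester gets its own token type separate from the provider's ResumptionToken, is to keep the token as an opaque string and send it back unchanged. OpaqueResumptionToken and its methods are hypothetical names. The follow-up request carries only verb and resumptionToken, because OAI-PMH defines resumptionToken as an exclusive argument.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical client-side holder for an OAI-PMH resumptionToken.
 * The harvester never parses the value; it only stores it and
 * echoes it back verbatim on the next request.
 */
final class OpaqueResumptionToken {
    private final String value;

    OpaqueResumptionToken(String value) {
        this.value = value;
    }

    /** An empty (or missing) token marks the last page of a list response. */
    boolean isFinal() {
        return value == null || value.trim().isEmpty();
    }

    /** Builds the follow-up request: only the verb and the URL-encoded token. */
    String nextRequestUrl(String baseUrl, String verb) {
        return baseUrl + "?verb=" + verb
                + "&resumptionToken=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

For the AAD token, nextRequestUrl("http://services.aad.gov.au/oai/provider", "ListRecords") yields ...?verb=ListRecords&resumptionToken=0%2F300%2F1147%2Fiso-mcp%2Fnull%2Fnull%2Fnull; the harvester never needs to know what the slashes mean. The URLEncoder.encode(String, Charset) overload assumes Java 10 or later.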

I have a patch for this problem, but I need to test it a little more.

   Cheers,
   Emanuele

On Wednesday, 15 December 2010 at 09:17:50, Mathieu Coudert wrote:


-------------------------------------------------------
Ing. Emanuele Tajariol
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://twitter.com/geosolutions_it
http://it.linkedin.com/in/etajariol
-------------------------------------------------------

Hi,

the issue of the OAI resumption token is trickier than one might expect,
as it touches on the issue of caching and performance.

On the one hand, one could choose an entirely random token, plus a numeric suffix used for pagination. The server then has to store at least a reference mapping the random token to what the client originally requested, in order to support pagination through large result sets.
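A minimal sketch of this first, stateful option, assuming a simple in-memory map on the server. RandomTokenStore, Query and the "id/pos" token layout are hypothetical and only illustrate the idea described above.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side store for opaque "randomid/pos" tokens.
final class RandomTokenStore {
    // What the client originally requested (simplified to two parameters).
    static final class Query {
        final String set;
        final String prefix;

        Query(String set, String prefix) {
            this.set = set;
            this.prefix = prefix;
        }
    }

    // Server-side state: random id -> original request.
    private final Map<String, Query> cache = new ConcurrentHashMap<>();

    // Starts a new search and returns the token for its first page.
    String start(Query query) {
        String id = UUID.randomUUID().toString();
        cache.put(id, query);
        return id + "/0";
    }

    // Token for a later page of the same search: same id, new position suffix.
    static String next(String token, int pos) {
        return token.substring(0, slash(token)) + "/" + pos;
    }

    // Recovers the original request; fails if the server does not hold the id.
    Query resolve(String token) {
        Query q = cache.get(token.substring(0, slash(token)));
        if (q == null) throw new IllegalArgumentException("badResumptionToken: " + token);
        return q;
    }

    static int position(String token) {
        return Integer.parseInt(token.substring(slash(token) + 1));
    }

    int size() {
        return cache.size();
    }

    private static int slash(String token) {
        int i = token.lastIndexOf('/');
        if (i < 0) throw new IllegalArgumentException("badResumptionToken: " + token);
        return i;
    }
}
```

Calling start() twice with the same Query creates two cache entries, which is the redundancy that a structured token avoids.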

On the other hand, one would probably not like this, as it involves a lot of redundancy: two clients harvesting the same records would need two result sets. By encoding some structure into the token, such as the metadata format and other information on what the client requested, one can avoid this duplication. In fact, this makes a cache entirely unnecessary, since the state is stored entirely on the client.
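The stateless alternative can be sketched as follows, assuming the request state is just set, metadata prefix, from, until and position. StatelessToken is a hypothetical name, and Base64url is only one convenient way to keep the token URL-safe and opaque to clients; fields are assumed never to contain a newline.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stateless token: the request state is packed into the token itself.
final class StatelessToken {
    final String set, prefix, from, until;
    final int pos;

    StatelessToken(String set, String prefix, String from, String until, int pos) {
        this.set = set;
        this.prefix = prefix;
        this.from = from;
        this.until = until;
        this.pos = pos;
    }

    // Joins the fields with '\n' and Base64url-encodes the result.
    String encode() {
        String raw = String.join("\n", set, prefix, from, until, Integer.toString(pos));
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(); no server-side lookup is needed.
    static StatelessToken decode(String token) {
        String raw = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        String[] f = raw.split("\n", -1); // -1 keeps empty from/until fields
        if (f.length != 5) throw new IllegalArgumentException("badResumptionToken: " + token);
        return new StatelessToken(f[0], f[1], f[2], f[3], Integer.parseInt(f[4]));
    }

    // State for the next page of the same request.
    StatelessToken advance(int pageSize) {
        return new StatelessToken(set, prefix, from, until, pos + pageSize);
    }
}
```

Because encoding is deterministic, two clients issuing the same request receive identical tokens, and the server can resume a listing without any lookup. The trade-off is that the token grows with the amount of state it carries.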

As for the implementation: for speed I opted for a cache, but to optimize it I chose to store only one reference per search (two clients harvesting the same records share a single cache entry).
The question then is what to use as the separator. I stupidly opted for "-" (taken from jOAI), forgetting that one would want to use "-" in categories, which are called sets in OAI terminology.
A patch switched this to "/", which seems more reasonable.
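The design described here (a structured token plus a cache holding one random id per distinct search) and the separator problem can be sketched as below. SharedSearchTokens is a hypothetical class: it only follows the six-field order (set, prefix, from, until, randomid, pos) of the parseToken() excerpt quoted earlier in the thread, and it takes the separator as a parameter so both choices can be compared.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical six-field token (set, prefix, from, until, randomid, pos)
// backed by a cache that holds one random id per distinct search.
final class SharedSearchTokens {
    private final Map<String, String> idBySearch = new ConcurrentHashMap<>();
    private final String sep;

    SharedSearchTokens(String sep) {
        this.sep = sep;
    }

    // Identical searches get the same id, so they share a single cache entry.
    String token(String set, String prefix, String from, String until, int pos) {
        String key = String.join(sep, set, prefix, from, until);
        // UUID hyphens are stripped so the id cannot contain '-' or '/'.
        String id = idBySearch.computeIfAbsent(key,
                k -> UUID.randomUUID().toString().replace("-", ""));
        return String.join(sep, key, id, Integer.toString(pos));
    }

    // Mirrors the parseToken() check: exactly six fields are required.
    String[] parse(String token) {
        String[] f = token.split(Pattern.quote(sep), -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("unknown resumptionToken format: " + token);
        }
        return f;
    }

    int entries() {
        return idBySearch.size();
    }
}
```

With '/' a token for the set my-set parses back into six fields, and repeated requests for the same search reuse one cache entry; with '-' the hyphen inside the set name produces an extra field and parsing fails.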

As for the OAI harvester component, which is separate from the OAI provider code: I did not work on it, and there are no hard-coded references to the token format there, at least none that I know of.

I should say that it needs a lot of attention, though, as discussed in my email
http://osgeo-org.1803224.n2.nabble.com/OAI-PMH-support-for-deletions-td5693638.html

It goes without saying that this applies to the OAI provider, too.

I have heard that a large European weather service has contracted out some OAI improvements in a spin-off, and I hope the changes will eventually be merged into the main trunk.

best
Timo

On 14.02.2011 22:44, Emanuele Tajariol wrote:


--
www.xenophily.org - attraction to foreign peoples, cultures, or customs

Hi Emanuele,

I came to the same conclusion (later confirmed by Timo) and applied a
fix for this issue to BlueNetMEST late last year.

This fix was applied to trunk in revision 7189 three weeks ago.

Cheers,

--
Craig Jones
eMII Infrastructure Programmer
IMOS e-Marine Information Infrastructure Facility (eMII)
Ph: +61 3 6226 8567

On Mon, 2011-02-14 at 22:44 +0100, Emanuele Tajariol wrote:


Hi Craig,

Thank you for the pointer.
I applied the patch to the 2.6.x branch and it works fine there as well.

   Cheers,
   Emanuele

On Tuesday, 15 February 2011 at 01:22:39, Craig Jones wrote:

