Hi,
the issue of the OAI resumption token is trickier than one might expect,
as it touches on the issue of caching and performance.
On the one hand, one could choose an entirely random token, plus a numeric suffix used for pagination. The server then has to store at least a reference mapping the random token to what the client originally requested, in order to support pagination over big result sets.
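To make this first option concrete, here is a rough sketch of a random token backed by server-side state. The class, its fields, and the ":" between the random id and the position are all made up for illustration; this is not GeoNetwork code.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not GeoNetwork code: opaque tokens backed by server-side state.
class OpaqueTokenStore {

    // What the client originally requested (field names are illustrative).
    static final class Request {
        final String set, prefix, from, until;

        Request(String set, String prefix, String from, String until) {
            this.set = set;
            this.prefix = prefix;
            this.from = from;
            this.until = until;
        }
    }

    private final Map<String, Request> requests = new ConcurrentHashMap<>();

    // Remember the request under a random id and return "<id>:<next position>".
    String issue(Request request, int nextPos) {
        String id = UUID.randomUUID().toString();
        requests.put(id, request);
        return id + ":" + nextPos;
    }

    // Returns the stored request, or null if the token is unknown.
    Request lookup(String token) {
        int cut = token.lastIndexOf(':');
        return cut < 0 ? null : requests.get(token.substring(0, cut));
    }

    // Returns the position suffix; throws NumberFormatException if it is malformed.
    static int position(String token) {
        return Integer.parseInt(token.substring(token.lastIndexOf(':') + 1));
    }
}
```

Every harvest adds one entry to the map, so a real server would also need some expiry policy, which is left out here.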
On the other hand, one would probably not like this, as it creates a lot of redundancy: two clients harvesting the same records would need two result sets. By encoding some structure into the token, such as the metadata format and other information on what the client requested, one can avoid this duplication. In fact, by doing this, a cache becomes entirely unnecessary, since the state is stored entirely on the client.
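As an illustration of the fully stateless variant, the sketch below packs the request and the position into the token and reads them back. The field layout and the use of an empty string for an absent argument are assumptions of this example, not the format GeoNetwork uses.

```java
// Hypothetical sketch: all paging state travels inside the token, nothing is stored on the server.
class StatelessToken {
    static final String SEPARATOR = "/";

    final String set, prefix, from, until;
    final int pos;

    StatelessToken(String set, String prefix, String from, String until, int pos) {
        this.set = set;
        this.prefix = prefix;
        this.from = from;
        this.until = until;
        this.pos = pos;
    }

    // Absent optional arguments (null) are written as empty fields.
    String encode() {
        return String.join(SEPARATOR,
                orEmpty(set), prefix, orEmpty(from), orEmpty(until), String.valueOf(pos));
    }

    // Throws IllegalArgumentException if the token does not have exactly five fields
    // or if the position is not a number.
    static StatelessToken decode(String token) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        String[] f = token.split(SEPARATOR, -1);
        if (f.length != 5) {
            throw new IllegalArgumentException("unrecognizable token: " + token);
        }
        return new StatelessToken(orNull(f[0]), f[1], orNull(f[2]), orNull(f[3]),
                Integer.parseInt(f[4]));
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    private static String orNull(String s) {
        return s.isEmpty() ? null : s;
    }
}
```

For example, `new StatelessToken(null, "oai_dc", "2010-01-01", null, 200).encode()` gives `/oai_dc/2010-01-01//200`, and `decode` turns that string back into the same five values. The price is that the server has to re-run the search for every page.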
As for the implementation: for speed I opted for an implementation with a cache, but as an optimization I chose to store only one reference per search (two clients harvesting the same records share a single cache entry).
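The idea of one cache entry per search can be sketched roughly as follows. This is a simplified illustration with invented names, not the actual GeoNetwork cache; the point is only that the key depends on what was asked, not on who asked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: search results are cached per request parameters, so two
// clients asking for the same thing share one entry.
class SharedSearchCache {
    // Key: (set, prefix, from, until) as a list, which compares by value.
    private final Map<List<String>, List<String>> results = new ConcurrentHashMap<>();

    // Returns the cached record ids for this search, running the search only on a miss.
    List<String> ids(String set, String prefix, String from, String until,
                     Supplier<List<String>> search) {
        List<String> key = Arrays.asList(set, prefix, from, until);
        return results.computeIfAbsent(key, k -> search.get());
    }

    int size() {
        return results.size();
    }
}
```

Two harvests with the same set, prefix and date range hit the same entry, and the `search` supplier (a stand-in for the real index query) runs only once for them.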
The question is thus what one should use as a separator. I stupidly opted for "-" (taken from joai), forgetting that one would like to use "-" in categories, called sets in OAI terminology.
A patch switched this to "/", which seems more fair.
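A small sketch of why the separator matters, using the six-field layout (set, prefix, from, until, random id, position) and a made-up set name "climate-data":

```java
import java.util.regex.Pattern;

// Hypothetical sketch: counts how many fields a parser sees for a given separator.
class SeparatorDemo {

    // Join the six token fields with the given separator.
    static String join(String sep, String set, String prefix, String from,
                       String until, String randomId, int pos) {
        return String.join(sep, set, prefix, from, until, randomId, String.valueOf(pos));
    }

    // Split on the literal separator; limit -1 keeps empty trailing fields.
    static int fieldCount(String token, String sep) {
        return token.split(Pattern.quote(sep), -1).length;
    }
}
```

With "-" the token `climate-data-oai_dc-null-null-ab12-100` splits into 7 fields instead of 6, so a parser that checks the field count rejects it. With "/" the token `climate-data/oai_dc/null/null/ab12/100` splits into exactly 6. Dates written as YYYY-MM-DD in from/until would trip over "-" in the same way.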
As to the OAI harvester component, which is separate from the OAI provider code, I did not work on this and there are no hardcoded references, at least as far as I know.
I should say that it requires a lot of attention though, as discussed in my email
http://osgeo-org.1803224.n2.nabble.com/OAI-PMH-support-for-deletions-td5693638.html
It goes without saying that this applies to the OAI provider, too.
I have heard that a big European weather service has contracted out some improvements to OAI in a spinoff, and I hope that the changes will eventually be merged into the main trunk.
best
Timo
On 14.02.2011 22:44, Emanuele Tajariol wrote:
Hi all,
I had the same problem in connecting to an external OAI-PMH service.
The commit Mathieu is referring to imposes a structure on the
resumptionToken format in GeoNetwork, while in
http://www.openarchives.org/OAI/openarchivesprotocol.html#FlowControl
it's stated that
"The format of the resumptionToken is not defined by the OAI-PMH
and should be considered opaque by the harvester."
The problem here is that the same class (ResumptionToken) is used for both the
server OAI service -- where the token has a defined structure so that
GeoNetwork can tell at which point the last request left off --
and the OAI client -- where the token should be free-form but is forced into the
defined structure. It only works if you have GeoNetwork instances at both
ends of the connection.
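To illustrate what opaque handling on the client side could look like, here is a minimal sketch. The class and method names are invented for this example; it is neither the existing ResumptionToken class nor an actual patch. The harvester never parses the token: it only checks whether one was returned and sends it back verbatim, URL-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: a harvester that treats the resumptionToken as an opaque string.
class OpaqueTokenClient {

    // An absent or empty resumptionToken means the list is complete.
    static boolean hasMore(String resumptionToken) {
        return resumptionToken != null && !resumptionToken.trim().isEmpty();
    }

    // Build the follow-up request: only the verb and the untouched token are sent.
    static String nextRequestUrl(String baseUrl, String verb, String resumptionToken) {
        try {
            return baseUrl + "?verb=" + verb
                    + "&resumptionToken=" + URLEncoder.encode(resumptionToken, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }
}
```

With the token from Craig's report, `nextRequestUrl("http://services.aad.gov.au/oai/provider", "ListRecords", "0/300/1147/iso-mcp/null/null/null")` ends in `resumptionToken=0%2F300%2F1147%2Fiso-mcp%2Fnull%2Fnull%2Fnull`, whatever separator or layout the remote server chose.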
I have a patch for this problem but I have to test it a little more.
Cheers,
Emanuele
At 09:17:50 on Wednesday 15 December 2010, Mathieu Coudert wrote:
Hi Craig,
I suggest you have a look at this commit detail [1] and at the
associated ticket [2] for more details about the OAI-PMH changes.
However, I think Timo Pröscholdt would be better placed than I am to explain
to you this particular change to the expected resumptionToken format.
HTH,
Cheers,
Mathieu
[1] http://geonetwork.svn.sourceforge.net/viewvc/geonetwork?view=revision&revision=6221
[2] http://trac.osgeo.org/geonetwork/ticket/242
On Wed, Dec 15, 2010 at 8:21 AM, Craig Jones<jonescc@anonymised.com> wrote:
Hi All,
We are currently having some problems harvesting from an OAI-PMH server
based at the Australian Antarctic Division
(http://services.aad.gov.au/oai/provider)
When harvesting iso-mcp records from this server we get the following
error:
<error id="operation-aborted">
<message>Raised exception when searching</message>
<class>OperationAbortedEx</class>
<stack>
<at
class="org.fao.geonet.kernel.harvest.harvester.oaipmh.Harvester"
file="Harvester.java" line="170" method="search" />
<at
class="org.fao.geonet.kernel.harvest.harvester.oaipmh.Harvester"
file="Harvester.java" line="103" method="harvest" />
<at
class="org.fao.geonet.kernel.harvest.harvester.oaipmh.OaiPmhHarvester"
file="OaiPmhHarvester.java" line="217" method="doHarvest" />
<at
class="org.fao.geonet.kernel.harvest.harvester.AbstractHarvester$HarvestWithIndexProcessor"
file="AbstractHarvester.java" line="371" method="process" />
<at class="org.fao.geonet.kernel.MetadataIndexerProcessor"
file="MetadataIndexerProcessor.java" line="39"
method="processWithFastIndexing" />
<at
class="org.fao.geonet.kernel.harvest.harvester.AbstractHarvester"
file="AbstractHarvester.java" line="398" method="harvest" />
<at class="org.fao.geonet.kernel.harvest.harvester.Executor"
file="Executor.java" line="87" method="run" />
</stack>
<object>BadResumptionTokenException: code=badResumptionToken,
message=The 'resumptionToken' argument is unrecognizable</object>
</error>
The resumption token being returned is:
<resumptionToken completeListSize="1147"
cursor="0">0/300/1147/iso-mcp/null/null/null</resumptionToken>
However, the OAI-PMH harvester seems to be expecting a resumptionToken
in a different format:
private void parseToken(String strToken) throws BadResumptionTokenException {
    // Split on SEPARATOR and require exactly six fields.
    String[] temp = strToken.split(SEPARATOR);
    if (temp.length != 6)
        throw new BadResumptionTokenException("unknown resumptionToken format: " + strToken);
    set = temp[0];
    prefix = temp[1];
    from = temp[2];
    until = temp[3];
    randomid = temp[4];
    pos = Integer.parseInt(temp[5]);
}
Where the separator is '-'.
Looking at the OAI-PMH spec at
http://www.openarchives.org/OAI/openarchivesprotocol.html
I can't see any reference to how the resumptionToken should be
formatted.
Harvesting from this server in previous versions worked fine because the
harvester did not rely on a specific format for the resumption token.
Can someone clarify why harvesting from this server no longer works?
Is a specific format for the resumptionToken required and if so where is
this mandated?
Please note that I'm running the harvester in BlueNetMEST 1.4.2, but
the code is now the same in GeoNetwork trunk.
Thanks,
--
Craig Jones
eMII Infrastructure Programmer
IMOS e-Marine Information Infrastructure Facility (eMII)
Ph: +61 3 6226 8567
-------------------------------------------------------
Ing. Emanuele Tajariol
Senior Software Engineer
GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584962313
fax: +39 0584962313
http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://twitter.com/geosolutions_it
http://it.linkedin.com/in/etajariol
-------------------------------------------------------
--
www.xenophily.org - attraction to foreign peoples, cultures, or customs