[GeoNetwork-devel] harvesting from multiple urls with a single harvest entry

Is it possible to tell the harvester to harvest from multiple URLs with a
single harvest entry? Could I, for example, use the new metadata relation
feature to specify child records linked to a single parent? However, I can't
find any information on how the new feature relates to the harvester (maybe
it involved no harvester changes?).
--
View this message in context: http://osgeo-org.1803224.n2.nabble.com/harvesting-from-multiple-urls-with-a-single-harvest-entry-tp5680847p5680847.html
Sent from the GeoNetwork developer mailing list archive at Nabble.com.

I was thinking about this further (and googling), and I'm wondering: do I need
to define a new harvester type to allow this?

Can someone please reply and at least tell me whether what I want to do is
currently impossible with the latest GN version?

hi,

if I understand you correctly, you want to define a single harvester that has a list of URLs (rather than a single one) to harvest from?

This is currently not possible in GeoNetwork. You need to define a separate harvester for each URL you want to harvest from.

Kind regards
Heikki Doeleman

On Fri, Oct 29, 2010 at 2:39 AM, sway <taniajacob@anonymised.com> wrote:

Can someone please reply and at least tell me whether what I want to do is
currently impossible with the latest GN version?

GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

hi,

if I understand you correctly, you want to define a single harvester that has a list of URLs (rather than a single one) to harvest from?

Not so. We have modified the WebDAV harvester client so that it harvests not only from an FTP-like directory of XML metadata files but also from a sitemap or a set of links to XML files listed on an HTML page. This is not yet in trunk, but it has been a useful feature and we'd like to contribute it.
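To make the link-list mode described above more concrete, here is a minimal Python sketch of the link-collection step only: it gathers absolute URLs of `.xml` files linked from an HTML page, or the `<loc>` entries of a sitemaps.org sitemap. It is not the modified WebDAV harvester's code, which is not shown in this thread. The function names `xml_links_from_html` and `xml_links_from_sitemap` are hypothetical, and treating "path ends in `.xml`" as the test for a metadata record is an assumption.

```python
"""Hypothetical sketch (not the modified WebDAV harvester's code): collect
links to XML metadata records from an HTML page or a sitemap."""
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class _XmlLinkParser(HTMLParser):
    """Collects absolute hrefs of <a> elements whose path ends in .xml."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag != "a" or not href:
            return
        url = urljoin(self.base_url, href)
        # Assumption: a metadata record is any link whose path ends in .xml
        # (query string and fragment are ignored for this check).
        if urlsplit(url).path.lower().endswith(".xml"):
            self.links.append(url)


def xml_links_from_html(html_text, base_url):
    """Return the absolute URLs of .xml files linked from an HTML page."""
    parser = _XmlLinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


def xml_links_from_sitemap(sitemap_text):
    """Return the <loc> URLs listed in a sitemaps.org sitemap."""
    root = ET.fromstring(sitemap_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

Fetching and importing each returned record is deliberately left out, since the thread does not say how the modified harvester does that.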

Doug.

-- 
Douglas D. Nebert
Senior Advisor for Geospatial Technology, System-of-Systems Architect
FGDC Secretariat T:703 648 4151 F:703 648-5755 C:703 459-5860

The Z3950 harvester can also harvest from a number of servers, and there are some enhancements for trunk that report on this, which I plan to commit shortly. I think the reason an answer wasn't given earlier was that the original question didn't name a particular harvesting method? :-)

Cheers,
Simon
________________________________________
From: Douglas Nebert [ddnebert@anonymised.com]
Sent: Monday, 1 November 2010 4:10 AM
To: geonetwork-devel@lists.sourceforge.net
Cc: Wenwen Li; hwu8@anonymised.com
Subject: Re: [GeoNetwork-devel] harvesting from multiple urls with a single harvest entry

On 10/29/10 4:39 AM, heikki wrote:
hi,

if I understand you correctly, you want to define a single harvester that has a list of URLs (rather than a single one) to harvest from?

Not so. We have modified the WebDAV harvester client so that it harvests not only from an FTP-like directory of XML metadata files but also from a sitemap or a set of links to XML files listed on an HTML page. This is not yet in trunk, but it has been a useful feature and we'd like to contribute it.

Doug.

Thanks for the answers. I was afraid I wasn't expressing my question properly
as I am new to the GN world.

Apart from harvesting from multiple URLs with a single harvester entry, I am
also wondering whether it would be possible (with any of the not-yet-committed
patches you have described) to tell from the harvested records themselves
that the records harvested from each specified URL are "linked" (that is,
linked through a single harvester entry, not via some other mechanism such as
similar keywords)?

Also, for an OGC harvester entry, will it be possible to set a different WxS
type for each URL (again, I am referring to the not-yet-committed code you
describe)?

On 10/31/10 8:10 PM, sway wrote:

Thanks for the answers. I was afraid I wasn't expressing my question properly
as I am new to the GN world.

Apart from harvesting from multiple URLs with a single harvester entry, I am
also wondering whether it would be possible (with any of the not-yet-committed
patches you have described) to tell from the harvested records themselves
that the records harvested from each specified URL are "linked" (that is,
linked through a single harvester entry, not via some other mechanism such as
similar keywords)?

Also, for an OGC harvester entry, will it be possible to set a different WxS
type for each URL (again, I am referring to the not-yet-committed code you
describe)?

An example deployment was to index an HTML page containing GetCapabilities links to hundreds of WMS or WFS service endpoints, and then invoke the WxS indexer to create ISO metadata for each of them.
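A rough Python sketch of the discovery step in this deployment, which also bears on the per-URL WxS type question: it scans an HTML page for GetCapabilities links and groups them by the OGC `SERVICE` parameter, so each endpoint could in principle be handed to a WMS or WFS indexer separately. The function `capabilities_links` is hypothetical and is not how GeoNetwork's OGC harvester works; it assumes every relevant link states `SERVICE` and `REQUEST` in its query string.

```python
"""Hypothetical sketch (not GeoNetwork code): find OGC GetCapabilities links
on an HTML page and group them by service type (WMS, WFS, ...)."""
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collects the absolute href of every <a> element on the page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(urljoin(self.base_url, href))


def capabilities_links(html_text, base_url, services=("WMS", "WFS")):
    """Return {service: [GetCapabilities URL, ...]} for the requested services.

    Assumes each link states SERVICE and REQUEST in its query string.
    """
    parser = _LinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    found = {s.upper(): [] for s in services}
    for url in parser.hrefs:
        # OGC KVP parameter names are case-insensitive: normalise the keys.
        query = parse_qs(urlparse(url).query)
        params = {key.upper(): values[0] for key, values in query.items()}
        # Values are compared case-insensitively only to be lenient.
        if params.get("REQUEST", "").lower() != "getcapabilities":
            continue
        service = params.get("SERVICE", "").upper()
        if service in found:
            found[service].append(url)
    return found
```

OGC key-value encoding treats parameter names as case-insensitive, which is why the keys are upper-cased; the lenient comparison of values is a convenience for hand-written links, not something the OGC specifications require.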

Doug.

--
Douglas D. Nebert
Senior Advisor for Geospatial Technology, System-of-Systems Architect
FGDC Secretariat T:703 648 4151 F:703 648-5755 C:703 459-5860