[GeoNetwork-users] CSW Metadata Harvesting

Dear community,

We want to connect to a CSW and harvest it with GeoNetwork 2.8.0. In the harvesting management, we have added a "Catalogue Service for the Web ISO Profile 2.0" node with the following service URL:

http://www.geomis.sachsen.de/soapServices/CSWStartup?SERVICE=CSW&REQUEST=GetCapabilities&Version=2.0.2

After activating and running the process, it takes some time, but the only response is the timeout error message posted below.

Could somebody perhaps try the URL above and tell me if it works for you? Or can you give us a URL that you know will work with GeoNetwork? This would really help us, so we could test if it is a problem with our system or something else.

Kind regards

Thomas

Error: Die Wartezeit für die Verbindung ist abgelaufen (German for "The connection timed out")

Class: ConnectException

Stack:
at: java.net.PlainSocketImpl file: PlainSocketImpl.java line: -2 method: socketConnect
at: java.net.AbstractPlainSocketImpl file: AbstractPlainSocketImpl.java line: 339 method: doConnect
at: java.net.AbstractPlainSocketImpl file: AbstractPlainSocketImpl.java line: 200 method: connectToAddress
at: java.net.AbstractPlainSocketImpl file: AbstractPlainSocketImpl.java line: 182 method: connect
at: java.net.SocksSocketImpl file: SocksSocketImpl.java line: 391 method: connect
at: java.net.Socket file: Socket.java line: 579 method: connect
at: java.net.Socket file: Socket.java line: 528 method: connect
at: java.net.Socket file: Socket.java line: 425 method: <init>
at: java.net.Socket file: Socket.java line: 280 method: <init>
at: org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory file: DefaultProtocolSocketFactory.java line: 79 method: createSocket
at: jeeves.utils.XmlRequest file: XmlRequest.java line: 338 method: doExecute
at: jeeves.utils.XmlRequest file: XmlRequest.java line: 257 method: execute
at: org.fao.geonet.kernel.harvest.harvester.csw.Harvester file: Harvester.java line: 131 method: retrieveCapabilities
at: org.fao.geonet.kernel.harvest.harvester.csw.Harvester file: Harvester.java line: 85 method: harvest
at: org.fao.geonet.kernel.harvest.harvester.csw.CswHarvester file: CswHarvester.java line: 228 method: doHarvest
at: org.fao.geonet.kernel.harvest.harvester.AbstractHarvester$HarvestWithIndexProcessor file: AbstractHarvester.java line: 399 method: process
at: org.fao.geonet.kernel.harvest.harvester.AbstractHarvester file: AbstractHarvester.java line: 429 method: harvest
at: org.fao.geonet.kernel.harvest.harvester.HarvesterJob file: HarvesterJob.java line: 29 method: execute

Dear Thomas,

The URL you provided works fine for me (GeoNetwork 2.10.1).

The issue seems to be with your connection. Do you perhaps have a firewall
active that blocks certain ports?

Searching around I found these pages that may be of help to you:

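https://groups.google.com/forum/#!msg/geonode-users/EEob3uQoPVE/ERnlIZ9q02UJ
http://www.linux-forum.de/wget-fehlgeschlagen-die-wartezeit-fuer-die-verbindung-ist-abgelaufen-erneuter-vers-2009527.html
(the second one probably not, now that I look closer, though I guess it does deal with a refused connection)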

To be sure, you could try
http://www.nationaalgeoregister.nl/geonetwork/srv/eng/csw?request=GetCapabilities&version=2.0.2&service=CSW
which will definitely work with GeoNetwork.
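
One way to narrow this down is to check, from the machine that runs GeoNetwork, whether a plain TCP connection to the CSW host can be opened at all. The following is a minimal Python sketch and not something from the thread: the function names `host_and_port` and `can_connect` and the 10-second timeout are assumptions made for illustration. It connects directly, without any HTTP proxy, so a failure here while a browser on the same network works would hint at a proxy or firewall in between.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Return the (host, port) pair a client would connect to for this URL."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(url, timeout=10.0):
    """Try a direct TCP connection; return (True, None) or (False, reason)."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # timeouts, refused connections and DNS failures
        return False, str(exc)
```

Calling `can_connect(...)` with each of the two service URLs above on the GeoNetwork server shows whether either host is reachable directly.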

Kind regards,
Jan

_______________________________________________
GeoNetwork-users mailing list

GeoNetwork-users@anonymised.com

https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/CSW-Metadata-Harvesting-tp5072288p5072449.html
Sent from the GeoNetwork users mailing list archive at Nabble.com.

Dear Jan,

We have found a proxy option. But now we get a new error message which we
cannot interpret:

Error: Raised exception when searching: Error on line 1: White spaces are
required between publicId and systemId.

Class: OperationAbortedEx
Stack:
at: org.fao.geonet.kernel.harvest.harvester.csw.Harvester file: Harvester.java line: 508 method: doSearch
at: org.fao.geonet.kernel.harvest.harvester.csw.Harvester file: Harvester.java line: 214 method: search
at: org.fao.geonet.kernel.harvest.harvester.csw.Harvester file: Harvester.java line: 95 method: harvest
at: org.fao.geonet.kernel.harvest.harvester.csw.CswHarvester file: CswHarvester.java line: 228 method: doHarvest
at: org.fao.geonet.kernel.harvest.harvester.AbstractHarvester$HarvestWithIndexProcessor file: AbstractHarvester.java line: 399 method: process
at: org.fao.geonet.kernel.harvest.harvester.AbstractHarvester file: AbstractHarvester.java line: 429 method: harvest
at: org.fao.geonet.kernel.harvest.harvester.HarvesterJob file: HarvesterJob.java line: 29 method: execute
at: org.quartz.core.JobRunShell file: JobRunShell.java line: 213 method: run
at: org.quartz.simpl.SimpleThreadPool$WorkerThread file: SimpleThreadPool.java line: 557 method: run

Any idea what it means? There are of course no white spaces in the URL.
Does a more detailed error log file exist?

Kind regards
Thomas


Dear Thomas,

The only thing I can think of is a malformed XML file. If you are running
GeoNetwork with Jetty, there are log files under ../geonetwork/jetty/logs/
You might need to adjust the log settings. You can do this by configuring
../geonetwork/web/geonetwork/WEB-INF/log4j.cfg (as seen on
http://apps.who.int/geonetwork/docs/apa.html). Since the errors seem to be
concentrated in the harvester, maybe you can change its level from WARN to INFO?

You did try to harvest that same URL, right? It has just finished
here (took an hour, 2536 records, is that right?) without any exceptions.

(Note: I haven't encountered this error myself, I'm mostly just
brainstorming here)

Kind regards,
Jan


Dear Jan,

Yes, we used the same URL
(http://www.geomis.sachsen.de/soapServices/CSWStartup?SERVICE=CSW&REQUEST=GetCapabilities&Version=2.0.2)
with the following options in the Harvesting Management (screenshots). But it
stopped after just a few minutes.

<http://osgeo-org.1560.x6.nabble.com/file/n5072656/screen_harvesting.png>
<http://osgeo-org.1560.x6.nabble.com/file/n5072656/screen_harvesting2.png>

I can only think of a system setting which might be wrong, although we
didn't change the default ones from the installation. But we will try to
find out more about the "publicId and systemId" error.

Kind regards
Thomas


Dear community,

Unfortunately, we still haven't found a solution, so I wanted to post the
complete error message.

We have now used Jetty and a fresh "out of the box" installation of the
current version 2.10.1. Perhaps somebody has an idea of how to interpret the
error message "White spaces are required between publicId and systemId".

If you need to know more of our system configuration, please let me know.

Kind regards
Thomas

==========================================================
2013-08-20 10:39:47,263 INFO [jeeves.request] - HTML Request (from 172.24.52.61) : /geonetwork/srv/eng/xml.harvesting.run
2013-08-20 10:39:47,264 INFO [jeeves.service] - Dispatching : xml.harvesting.run
2013-08-20 10:39:47,268 INFO [jeeves.service] - -> dispatching to output for : xml.harvesting.run
2013-08-20 10:39:47,268 INFO [jeeves.service] - -> writing xml for : xml.harvesting.run
2013-08-20 10:39:47,269 INFO [jeeves.service] - -> output ended for : xml.harvesting.run
2013-08-20 10:39:47,269 INFO [jeeves.service] - -> dispatch ended for : xml.harvesting.run
2013-08-20 10:39:48,261 WARN [geonetwork.harvester] - Raised exception when searching : org.jdom.input.JDOMParseException: Error on line 1: White spaces are required between publicId and systemId.
2013-08-20 10:39:48,357 WARN [geonetwork.harvester] - Raised exception when searching : org.jdom.input.JDOMParseException: Error on line 1: White spaces are required between publicId and systemId.
2013-08-20 10:39:48,358 WARN [geonetwork.harvester] - Raised exception while harvesting from : harvest1 (CswHarvester)
2013-08-20 10:39:48,358 WARN [geonetwork.harvester] - (C) Class : OperationAbortedEx
2013-08-20 10:39:48,358 WARN [geonetwork.harvester] - (C) Message : Raised exception when searching: Error on line 1: White spaces are required between publicId and systemId.
2013-08-20 10:39:49,904 INFO [jeeves.request] -
2013-08-20 10:39:49,904 INFO [jeeves.request] - HTML Request (from 172.24.52.61) : /geonetwork/srv/eng/xml.harvesting.get
2013-08-20 10:39:49,905 INFO [jeeves.service] - Dispatching : xml.harvesting.get
2013-08-20 10:39:49,908 INFO [jeeves.service] - -> dispatching to output for : xml.harvesting.get
2013-08-20 10:39:49,908 INFO [jeeves.service] - -> writing xml for : xml.harvesting.get
2013-08-20 10:39:49,909 INFO [jeeves.service] - -> output ended for : xml.harvesting.get
2013-08-20 10:39:49,909 INFO [jeeves.service] - -> dispatch ended for : xml.harvesting.get


Hello,

Over here we haven't been successful in harvesting CSW either.

According to our tests, it is only possible to harvest CSWs from other
GeoNetwork installations. Whenever we try to harvest a CSW from another
software product like TerraCatalog, NOKIS or NUMIS, it does not work at all.

I have not tried to harvest from WMS, WFS or others so far. In my opinion,
something is completely wrong with the harvesting module in GeoNetwork 2.10
/ 2.10.1.

2.8.0 had problems with proxy server configuration, so e.g. BKG did not
manage to harvest with that installation at all. They got an update and now
the Geodatenkatalog.de CSW works on 2.8.whatever, at least that's the last
thing I heard about it. No information about testing 2.10 from them...

With harvesting not working and the bug when creating new metadata (the uuid for
fileIdentifier is not saved when clicking on "save"), we cannot use
GeoNetwork 2.10.x at all for now and still stick to 2.6.0 / 2.6.4.

Greetings,
Anja


Hi Thomas,
Did you try Jan's suggestion of increasing the log level?
log4j.logger.geonetwork.harvester = DEBUG
That way you could get more details.

I can successfully harvest the URL you provided. So the issue is probably
that the server where your catalog is installed can't access the URL.
I suspect that the "White spaces are required between publicId and
systemId" error is caused by a page returned by a proxy or firewall, which
is not a valid XML response.
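
To illustrate the suspicion above, here is a minimal Python sketch (not from the thread) that tells a CSW answer apart from an HTML page such as a proxy might return. The function name `looks_like_csw_response` and both sample documents are hypothetical. An HTML doctype that carries a public identifier but no system identifier is a plausible trigger for the parser message quoted above, although the thread does not confirm what the server actually received.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"

# Hypothetical error page, as a proxy or firewall might send instead of XML.
PROXY_ERROR_PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'
    "<html><body><h1>Access denied</h1><hr></body></html>"
)

# Hypothetical, heavily shortened CSW capabilities document.
CAPABILITIES = (
    '<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'version="2.0.2"/>'
)


def looks_like_csw_response(body):
    """True if body is well-formed XML whose root is in the CSW 2.0.2 namespace."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not well-formed XML at all, e.g. an HTML page
    return root.tag.startswith("{" + CSW_NS + "}")
```

Saving the raw response that the GeoNetwork server receives (for example with a command-line HTTP client using the same proxy settings) and passing it to such a check would show whether the harvester is being handed XML or something else.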

Cheers

Francois


Hi Anja,

From here, I've been testing CSW servers like ESRI, MDWeb and
geocatalogue.fr without any problem.

Could you provide URLs for "TerraCatalog, NOKIS or NUMIS" for some testing?

Thanks.

Francois


Hello Francois,

good to hear from you :-)

Here are the URLs to the CSW:

TerraCatalog:
http://gdi.diepholz.de/soapServices/CSWStartup?REQUEST=GetCapabilities&SERVICE=CSW
http://geoportal.braunschweig.de/soapServices/CSWStartup?SERVICE=CSW&REQUEST=GetCapabilities

NOKIS (Preludio is the software):
http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/GetCapabilitiesServant$Get?Service=CSW&Request=GetCapabilities&Version=2.0.2

NUMIS (Ingrid-Software, or also known as PortalU):
http://numis.niedersachsen.de/202/csw/provider/ni_mu?REQUEST=GetCapabilities&SERVICE=CSW

And there is another thing: GeoNetwork cannot deal with non-OGC parameters
after the question mark of a CSW URL. For example, some catalogues use a
non-OGC parameter to restrict harvesting to part of the metadata; without
this additional parameter in the URL, all the metadata in the catalogue is
returned. GeoNetwork harvests all of the metadata every time, because it
cannot "see" the additional parameter.
Please ask if my point is not clear.
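
The point about non-OGC parameters can be sketched as follows. This is a minimal Python illustration, not GeoNetwork code: the parameter name `partner`, the host `example.org` and the function names are hypothetical, and it is an assumption that a harvester would need to carry such extra parameters over to its follow-up requests for the restriction to take effect.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Standard key-value parameters of a CSW GetCapabilities request.
OGC_KEYS = {"service", "request", "version"}


def split_csw_url(url):
    """Split a CSW URL into (base, OGC parameters, non-OGC parameters)."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    ogc, extra = {}, {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        target = ogc if key.lower() in OGC_KEYS else extra
        target[key] = value
    return base, ogc, extra


def carry_over(endpoint, extra):
    """Append the non-OGC parameters to another endpoint of the same service."""
    if not extra:
        return endpoint
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(extra)
```

With `http://example.org/csw?SERVICE=CSW&REQUEST=GetCapabilities&partner=abc`, the extra part is `{"partner": "abc"}`; dropping it on later requests would return the whole catalogue, which matches the behaviour described above.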

Thank you for helping!

Greetings,
Anja


Dear Francois, dear Thomas,

We have the same error message with our installation. I have just looked it
up in the log files. My system administrator assured me that the proxy
configuration on our side is fine, and I think he is right. We still have a
wrong configuration concerning proxies in the map viewer, but this wrong /
missing configuration cannot influence the harvesting mechanism.

The tests with GeoNetwork CSWs from somewhere on the internet worked out
fine. Metadata could be harvested.
The tests with CSWs from other software somewhere on the internet went wrong,
saying "no route to host" in the GUI and "White spaces are required
between publicId and systemId" in the catalina.out log file.

Is GeoNetwork able to recognise other GeoNetwork instances over the internet?
Perhaps it uses slightly different routines to address the CSW. Or the trouble
comes from the various <onlineResource> entries for certain request types in
the GetCapabilities document. GeoNetwork usually has only one URL for all of
the requests. Other CSWs often use different URLs for different requests.
That's the main difference between GeoNetwork and other CSW software.

Greetings,
Anja


Dear Anja, dear Francois,

I think my colleague has tried the "log4j.logger.geonetwork.harvester =
DEBUG" option, but I will ask him again.

Next week we will monitor our network traffic to find out if any packets at
all leave the system. We use a virtual machine on openSUSE. We will also
try a new, external installation on Windows. I suspect it's
something on our systems, since Jan and Francois have been able to harvest
the URL successfully. Proxy/firewall, ...?

@Francois: How long did it take for you to harvest the URL?
@Anja: Could you perhaps explain your point about the different URLs for
different requests further? Maybe with an example URL from a GeoNetwork
CSW and one from another software's CSW.

Kind regards
Thomas


Hi

For a custom project we faced a similar issue when using a proxy and noticed
that it was caused by changes from http://trac.osgeo.org/geonetwork/ticket/861

We fixed it in CatalogRequest.java (
https://github.com/geonetwork/core-geonetwork/blob/2.10.x/web/src/main/java/org/fao/geonet/csw/common/requests/CatalogRequest.java)
by replacing the line:

httpMethod.setPath(address);

with

httpMethod.setPath(path);
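
The thread does not show what `address` and `path` hold in CatalogRequest.java. Assuming, from the names alone, that `address` is the full service URL and `path` only its path component, the difference between the two values can be sketched in Python as below. The function name `split_address` is hypothetical, and this is only an illustration of the URL parts, not the GeoNetwork implementation.

```python
from urllib.parse import urlsplit


def split_address(address):
    """Split a full service URL into the parts an HTTP client handles separately."""
    parts = urlsplit(address)
    return {
        "host": parts.hostname,
        "port": parts.port or (443 if parts.scheme == "https" else 80),
        "path": parts.path or "/",  # what a setPath-style call would expect (assumption)
        "query": parts.query,
    }
```

For the Sachsen URL from the start of the thread this yields the host `www.geomis.sachsen.de`, port 80 and the path `/soapServices/CSWStartup`, with the query string kept separately.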

I'll check and commit this asap. Apologies, I was not following the
discussion and only noticed this now.

Regards,
Jose García


Hello Thomas,

Please have a look at the NOKIS GetCapabilities document:
http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/GetCapabilitiesServant$Get?Service=CSW&Request=GetCapabilities&Version=2.0.2

There you find the section for <ows:Operation name="GetRecords">. This
request has to be posted to the URL
http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/GetRecords$Post$XML

You also find the section for <ows:Operation name="GetCapabilities">. The
GetCapabilities request has to be sent to the URL
http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/GetCapabilitiesServant$Get
if you use your browser (HTTP GET) and to the URL
http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/GetCapabilitiesServant$Post$XML
if you use a Firefox add-on like "Poster" (HTTP POST).

The request <ows:Operation name="DescribeRecord"> has to be sent to the
following URLs:
<ows:Get
xlink:href="http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/DescribeRecordServant$Get"
xlink:type="simple"/><ows:Post
xlink:href="http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/DescribeRecord$Post$XML"
xlink:type="simple">

So you have different URLs to address for the various requests. In addition
to that, you have to decide whether you POST a request or whether you use
the HTTP GET method via a browser URL. If one of the given
URLs does not work, automated communication with the CSW might become
impossible.
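
The per-operation URLs described above can be read out of a capabilities document mechanically. The following Python sketch uses a shortened, hypothetical sample document (the `example.org` URLs and the function name `operation_endpoints` are assumptions) with the same element layout as a CSW 2.0.2 GetCapabilities response.

```python
import xml.etree.ElementTree as ET

NS = {"ows": "http://www.opengis.net/ows"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, shortened capabilities document with per-operation URLs.
SAMPLE = """<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ows="http://www.opengis.net/ows"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="2.0.2">
  <ows:OperationsMetadata>
    <ows:Operation name="GetCapabilities">
      <ows:DCP><ows:HTTP>
        <ows:Get xlink:href="http://example.org/csw/GetCapabilities$Get"/>
        <ows:Post xlink:href="http://example.org/csw/GetCapabilities$Post"/>
      </ows:HTTP></ows:DCP>
    </ows:Operation>
    <ows:Operation name="GetRecords">
      <ows:DCP><ows:HTTP>
        <ows:Post xlink:href="http://example.org/csw/GetRecords$Post"/>
      </ows:HTTP></ows:DCP>
    </ows:Operation>
  </ows:OperationsMetadata>
</csw:Capabilities>"""


def operation_endpoints(capabilities_xml):
    """Map each operation name to the GET/POST URLs advertised for it."""
    root = ET.fromstring(capabilities_xml)
    endpoints = {}
    for op in root.findall("ows:OperationsMetadata/ows:Operation", NS):
        urls = {}
        for method in ("Get", "Post"):
            node = op.find(f"ows:DCP/ows:HTTP/ows:{method}", NS)
            if node is not None:
                urls[method] = node.get(XLINK_HREF)
        endpoints[op.get("name")] = urls
    return endpoints
```

A client that only remembers the URL it was configured with, rather than the URLs listed per operation, would send GetRecords to the wrong address for a service laid out like this sample.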

In GeoNetwork the whole communication works much "easier" (seen from the
human point of view). There is one URL for all of the requests, and it does
not matter whether a request is sent to the CSW via HTTP POST or HTTP GET.
Please have a look at the GetCapabilities document from ZGB Braunschweig:
http://maps.zgb.de:80/metadaten/srv/de/csw?Request=GetCapabilities&Service=CSW

There is only the URL http://maps.zgb.de:80/metadaten/srv/de/csw. It can be
used for any request and any method (HTTP POST / HTTP GET).

So a GeoNetwork GetCapabilities document is an "easy to understand" one,
for humans and for machines. With a document like the one from NOKIS,
you have to look up the URL in the GetCapabilities document for each request
and for each method. Man and machine may have a problem with that, and it is
also difficult for the CSW to handle because there are so many URLs which
have to work properly.

I hope, things have become clearer now.

Greetings,
Anja


Anja, about Nokis node, some comments below

2013/8/23 Anja <moonbeam@anonymised.com>

Hello Thomas,

please have a look at the NOKIS GetCapabilities-Document:

http://nokis.niedersachsen.de/NOKIS/servants/de/disy/preludio2/service/cat/csw/v_2_0_2/GetCapabilitiesServant$Get?Service=CSW&Request=GetCapabilities&Version=2.0.2

It looks like the ows:Parameter in the NOKIS capabilities should be an ows:Constraint:

<ows:Parameter name="SupportedQueryables">
  <ows:Value>Abstract</ows:Value>...

See Table 23 in the CSW spec, e.g.:
<ows:Constraint name="SupportedISOQueryables">
  <ows:Value>Language</ows:Value>
  <ows:Value>AlternateTitle</ows:Value>

Also, GeoNetwork looks at SupportedISOQueryables and AdditionalQueryables,
but not at SupportedQueryables, to set CSW filters. Not sure whether we
should add that?

Next, this type of queryable,
<ows:Value>{http://purl.org/dc/elements/1.1/}coverage</ows:Value>,
generates an error in the harvester configuration.

So probably some work is required to support such services.
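
For readers unfamiliar with the notation, the value above is a qualified name written as {namespace}localname. The following Python sketch only illustrates how such a value can be split into its two parts; it is not how GeoNetwork handles queryables, and the function name split_queryable is made up for this example.

```python
import re

# '{namespace}local' form; plain names such as 'AlternateTitle' do not match.
QUALIFIED = re.compile(r"^\{(?P<ns>[^}]*)\}(?P<local>.+)$")


def split_queryable(value):
    """Split '{namespace}local' into (namespace, local).

    Plain names without a namespace part are returned as (None, name).
    """
    value = value.strip()
    match = QUALIFIED.match(value)
    if match:
        return match.group("ns"), match.group("local")
    return None, value


print(split_queryable("{http://purl.org/dc/elements/1.1/}coverage"))
print(split_queryable("AlternateTitle"))
```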

Cheers.

Francois


_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

We have finally managed to get past the previous error message by
reinstalling GeoNetwork with Tomcat and setting the proxies. Now it harvests
the GetCapabilities document, but nothing more. Just one entry.

Here is the log file:

2013-09-04 13:52:56,665 INFO [jeeves.request] - ==========================================================
2013-09-04 13:52:56,666 INFO [jeeves.request] - HTML Request (from 172.24.52.142) : /geonetwork/srv/eng/xml.harvesting.run
2013-09-04 13:52:56,668 INFO [jeeves.service] - Dispatching : xml.harvesting.run
2013-09-04 13:52:56,700 INFO [jeeves.service] - -> dispatching to output for : xml.harvesting.run
2013-09-04 13:52:56,701 INFO [jeeves.service] - -> writing xml for : xml.harvesting.run
2013-09-04 13:52:56,703 INFO [jeeves.service] - -> output ended for : xml.harvesting.run
2013-09-04 13:52:56,704 INFO [jeeves.service] - -> dispatch ended for : xml.harvesting.run
2013-09-04 13:52:56,729 DEBUG [geonetwork.harvester] - AbstractHarvester login: ownerId = 1
2013-09-04 13:52:56,752 INFO [geonetwork.harvester] - Started harvesting from node : try1 (OgcWxSHarvester)
2013-09-04 13:52:56,753 INFO [geonetwork.harvester] - Retrieving remote metadata information for : try1
2013-09-04 13:52:56,757 DEBUG [geonetwork.harvester] - GetCapabilities document: http://www.geomis.sachsen.de/soapServices/CSWStartup?SERVICE=CSW&REQUEST=GetCapabilities&Version=2.0.2&SERVICE=CSW&VERSION=2.0.2&REQUEST=GetCapabilities
2013-09-04 13:52:57,624 DEBUG [geonetwork.harvester] - - Removing old metadata before update with id: 2
2013-09-04 13:52:57,627 DEBUG [geonetwork.harvester] - - Removing thumbnail for layer metadata: 2
2013-09-04 13:52:57,825 DEBUG [geonetwork.harvester] - - XSLT transformation using /usr/share/tomcat/webapps/geonetwork/WEB-INF/data/config/schema_plugins/iso19139//convert//OGCWxSGetCapabilitiesto19119//OGCCSWGetCapabilities-to-ISO19119_ISO19139.xsl
2013-09-04 13:52:57,831 INFO [geonetwork.harvester] - - Adding metadata for services with 526bde595b17d9c4cf75b4dc1b14ffc630b89114
2013-09-04 13:52:58,063 INFO [geonetwork.harvester] - Ended harvesting from node : try1 (OgcWxSHarvester)
2013-09-04 13:52:58,256 INFO [jeeves.request] - ==========================================================

I have uploaded the generated xml metadata file. It looks like it is
generated directly from the GetCapabilities.

Download: Harvested-XML
<http://www.fileconvoy.com/dfl.php?id=g832e5e46ba32fbeb999364292222898a674746783>

Any ideas of how we can tell GeoNetwork to harvest the whole CSW, not just
one file?

Kind regards
Thomas

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/CSW-Metadata-Harvesting-tp5072288p5076005.html
Sent from the GeoNetwork users mailing list archive at Nabble.com.

Hi Thomas,

I'm not sure if this helps. The other day I think I faced a similar issue to the one you mentioned.

When I set up my harvesting job in GeoNetwork, I first chose the "OGC Web services (ie WMS, WFS, etc.)" harvester, and that returned only one entry, i.e. the capabilities document. Jose Garcia pointed out that I should be using the "Catalogue Services for the Web ISO profile 2.0" CSW harvester instead.

I did that and managed to harvest all metadata from my harvest source.
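
One way to check which harvester actually ran is to look at the "Started harvesting from node" lines in the GeoNetwork log; the log posted earlier in this thread names OgcWxSHarvester there. Below is a small Python sketch that pulls the node name and harvester name out of such lines. It assumes the log lines are not wrapped, and the function name harvesters_in_log is just for illustration.

```python
import re

# Matches e.g. "Started harvesting from node : try1 (OgcWxSHarvester)"
NODE_LINE = re.compile(
    r"Started harvesting from node : (?P<node>.+?) \((?P<harvester>\w+)\)"
)

# Two lines copied from the log posted in this thread.
LOG = """\
2013-09-04 13:52:56,752 INFO [geonetwork.harvester] - Started harvesting from node : try1 (OgcWxSHarvester)
2013-09-04 13:52:58,063 INFO [geonetwork.harvester] - Ended harvesting from node : try1 (OgcWxSHarvester)
"""


def harvesters_in_log(log_text):
    """Map each harvested node name to the harvester name logged for it."""
    return {
        m.group("node"): m.group("harvester")
        for m in NODE_LINE.finditer(log_text)
    }


result = harvesters_in_log(LOG)
print(result)  # {'try1': 'OgcWxSHarvester'}
```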

Cheers
Richard Goh


Hi Richard,

yes, we did use the "Catalogue Services for the Web ISO profile 2.0"
harvester, but only the GetCapabilities document was returned.

Kind regards
Thomas

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/CSW-Metadata-Harvesting-tp5072288p5076158.html
Sent from the GeoNetwork users mailing list archive at Nabble.com.

Hi Thomas,

I tried the following CSW endpoint and I managed to harvest your metadata into my local GeoNetwork instance:

http://www.geomis.sachsen.de/soapServices/CSWStartup?SERVICE=CSW&REQUEST=GetCapabilities&Version=2.0.2

The version of GeoNetwork I used is release 2.10.1-0 (from \WEB-INF\server.prop).

It took quite a while to harvest 2543 metadata records into my GeoNetwork instance from the above endpoint.
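
As a rough illustration of why a full harvest takes a while, a CSW client normally fetches records in pages with repeated GetRecords requests. The Python sketch below builds a CSW 2.0.2 GetRecords request body and computes the start positions needed for 2543 records. The page size of 100 and the function names are assumptions for this example and do not reflect what GeoNetwork itself does internally.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
ET.register_namespace("csw", CSW)


def build_get_records(start_position, max_records):
    """Build a minimal CSW 2.0.2 GetRecords request body for one page."""
    root = ET.Element("{%s}GetRecords" % CSW, {
        "service": "CSW",
        "version": "2.0.2",
        "resultType": "results",
        "startPosition": str(start_position),
        "maxRecords": str(max_records),
    })
    query = ET.SubElement(root, "{%s}Query" % CSW, {"typeNames": "csw:Record"})
    ET.SubElement(query, "{%s}ElementSetName" % CSW).text = "full"
    return ET.tostring(root, encoding="unicode")


def page_starts(total, page_size):
    """Return the 1-based startPosition of every page needed for `total` records."""
    return list(range(1, total + 1, page_size))


starts = page_starts(2543, 100)   # page size 100 is a made-up value
print(len(starts), starts[0], starts[-1])
print(build_get_records(starts[0], 100))
```

With these assumed numbers the client would send 26 requests, the last one starting at position 2501.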

Cheers
Richard Goh
