[GeoNetwork-devel] Systematically move harvested items to local node

Hi all, I’m setting up a GeoNetwork instance that is intended to be populated with metadata for meteorological simulations contained in a THREDDS catalog.

I would like to run the harvester daily, but only for the ‘new’ records. The old ones will be enriched in GeoNetwork and should not be overwritten.

I was planning to add a button in the harvester settings that calls the function ‘assignHarvestedRecordToLocalNode’ after running the harvester. I inserted a checkbox that points to a new harvesting option under the ‘schedule’ block.

thredds-assignHarvestedRecordToLocalNode

Data will be automatically moved to the local node after each harvesting run.

thredds-assignHarvestedRecordToLocalNodeHelp

The option saveDataToNode is added to the specific thredds.js file in the catalog harvest template (/var/lib/tomcat8/webapps/geonetwork/catalog/templates/admin/harvest/type) and in the xsl file (/var/lib/tomcat8/webapps/geonetwork/xsl/xml/harvesting/thredds.xsl). But I think I am missing something: the option is never saved so I should probably modify the kernel.

I also tried to modify the runHarvester function, but that approach relies on the option being saved to the backend too:

$scope.runHarvester = function() {
  return $http.get('admin.harvester.run?_content_type=json&id=' +
      $scope.harvesterSelected['@id'])
      .success(function(data) {
        $scope.$parent.loadHarvesters().then(function() {
          refreshSelectedHarvester();
        }).then(function() {
          // the function has to be called; a bare reference does nothing
          $scope.assignHarvestedRecordToLocalNode();
        });
      });
};

However, there is probably an easier and smarter way to do this (e.g. adding something that automatically moves the harvested records to the local node, based on the button that is currently available). Any suggestions?

Many thanks,
Chiara

Chiara Scaini

Hi Chiara

You need to update the backend code:

  1. https://github.com/geonetwork/core-geonetwork/blob/3.4.x/harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/thredds/ThreddsParams.java: handle the new harvester property

  2. https://github.com/geonetwork/core-geonetwork/blob/3.4.x/harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/thredds/ThreddsHarvester.java#L127: update the storeNodeExtra method

  3. https://github.com/geonetwork/core-geonetwork/blob/3.4.x/harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/thredds/Harvester.java: update the code to handle the new property and set the metadata source
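
As a rough illustration of why steps 1 and 2 matter, here is a minimal, self-contained sketch of an option that is read from the submitted settings and written back when the harvester is stored. It does not use the real GeoNetwork classes: ThreddsOptionsSketch, fromSettings, toSettings and the plain Map used for settings are all hypothetical stand-ins. Only the option name saveDataToNode comes from this thread. If either half of the round trip is missing, the checkbox value is dropped on save.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a harvester parameter class; not GeoNetwork code. */
final class ThreddsOptionsSketch {
    // Key name taken from the option discussed in this thread.
    static final String SAVE_DATA_TO_NODE = "saveDataToNode";

    final boolean saveDataToNode;

    ThreddsOptionsSketch(boolean saveDataToNode) {
        this.saveDataToNode = saveDataToNode;
    }

    /** Step 1 (assumed shape): read the option sent by the UI, defaulting to false. */
    static ThreddsOptionsSketch fromSettings(Map<String, String> settings) {
        String raw = settings.getOrDefault(SAVE_DATA_TO_NODE, "false");
        return new ThreddsOptionsSketch(Boolean.parseBoolean(raw));
    }

    /** Step 2 (assumed shape): write the option back so it survives save and reload. */
    Map<String, String> toSettings() {
        Map<String, String> settings = new HashMap<>();
        settings.put(SAVE_DATA_TO_NODE, Boolean.toString(saveDataToNode));
        return settings;
    }
}
```

In the real harvester the equivalent logic would live in ThreddsParams and in storeNodeExtra, using whatever settings API those classes already use for the existing options.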

I’m not sure about your change in $scope.runHarvester; I don’t think it is required. The only UI/JS changes needed should be to add the new property to the JS code, update the save/retrieve methods, and add the checkbox to the UI.

It may also be worth checking a different approach that was added to the OGC WxS harvester to handle something similar, although I’m not sure whether it makes sense for the THREDDS harvester as well: https://github.com/geonetwork/core-geonetwork/pull/2774

Regards,
Jose García

GeoNetwork-devel mailing list
GeoNetwork-devel@…537…sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Vriendelijke groeten / Kind regards,

Jose García


Veenderweg 13
6721 WD Bennekom
The Netherlands
T: +31 (0)318 416664

Please consider the environment before printing this email.

Hi, thanks for the quick reply! I’ll look into that and let you know how it goes.
Thanks also for the interesting link on the OGC WxS harvester, I’ll take that into account too.

Have a nice day,
Chiara

Chiara Scaini

Hi again. I was checking the code, but since I installed the application from a .war file, I only see the .class and js files. I could modify those and re-compress everything into a war file, but I was thinking that maybe there’s an easier way… Here’s my use case:

I am using a ‘permanent’ THREDDS catalog populated from the filesystem, and an ‘auxiliary’ catalog that contains symbolic links to the new data added daily. When the new data are added, the old symbolic links in the ‘auxiliary’ catalog are deleted. Unfortunately, running the harvester again the next day deletes the corresponding records in GeoNetwork. Is there a way to prevent this?

Would it still happen if I disable the ‘ignore harvesting attribute’ option? I could set the ‘harvesting attribute’ to ‘True’ for records in the auxiliary catalog and, since the old ones are not in the auxiliary thredds catalog anymore, they would not be harvested.

Note that, since I remove the entries from the auxiliary catalog, I modify the thredds path in the Geonetwork metadata using a python script based on CSW services to point to the permanent catalog where they still exist.
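
To make the deletion problem concrete, here is a small self-contained sketch of the clean-up behaviour as it appears from the outside. It assumes that a harvesting run removes records that are still flagged as harvested but no longer appear in the remote catalog, and that records reassigned to the local node are no longer flagged as harvested. HarvestCleanupSketch and recordsToDelete are hypothetical names, and this is a model of the observed behaviour, not the GeoNetwork implementation.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of harvest clean-up; names are illustrative, not GeoNetwork APIs. */
final class HarvestCleanupSketch {

    /**
     * @param localRecords record identifier mapped to true if still flagged as harvested
     * @param remoteIds    identifiers found in the remote catalog during this run
     * @return identifiers that a clean-up pass would delete under the stated assumption
     */
    static Set<String> recordsToDelete(Map<String, Boolean> localRecords,
                                       Set<String> remoteIds) {
        Set<String> toDelete = new LinkedHashSet<>();
        for (Map.Entry<String, Boolean> entry : localRecords.entrySet()) {
            boolean harvested = Boolean.TRUE.equals(entry.getValue());
            // Only harvested records that vanished from the remote catalog are removed.
            if (harvested && !remoteIds.contains(entry.getKey())) {
                toDelete.add(entry.getKey());
            }
        }
        return toDelete;
    }
}
```

Under that assumption, yesterday's records survive the next run only if they are no longer flagged as harvested, which is what moving them to the local node is meant to achieve.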

Thanks,
Chiara

Chiara Scaini

Hi Jose, I decided to try the kernel change in order to systematically move the data to the local node. I added a checkbox to the THREDDS harvesting template, and params.saveDataToNode is the boolean option. I’m now modifying the source code based on your suggestions, but I would like to double-check that it makes sense. I’ve been looking at this commit: https://github.com/geonetwork/core-geonetwork/commit/649fcc4e83489d3798820749076da09e8adc5379 and tried to do the same in the kernel. Here’s what I modified in kernel/harvest/harvester/thredds/Harvester.java. Do you see anything obviously wrong?

private void saveMetadata(Element md, String uuid, String uri) throws Exception {

    //--- strip the catalog namespace as it is not required
    md.removeNamespaceDeclaration(invCatalogNS);

    String schema = dataMan.autodetectSchema(md, null); // should be iso19139
    if (schema == null) {
        log.warning("Skipping metadata with unknown schema.");
        result.unknownSchema++;
    }

    log.info(" - Adding metadata with " + uuid + " schema is set to " + schema + "\n XML is " + Xml.getString(md));

    deleteExistingMetadata(uri);

    //
    // insert metadata
    //
    AbstractMetadata metadata = new Metadata();
    metadata.setUuid(uuid);
    metadata.getDataInfo().
        setSchemaId(schema).
        setRoot(md.getQualifiedName()).
        setType(MetadataType.METADATA);

    if (params. ) {
        log.info("Moving data to local node...");
        metadata.getSourceInfo().
            setSourceId(context.getNodeId()).
            setOwner(getOwner()).
            setGroupOwner(Integer.valueOf(params.getOwnerIdGroup()));
    } else {
        metadata.getSourceInfo().
            setSourceId(params.getUuid()).
            setOwner(getOwner()).
            setGroupOwner(Integer.valueOf(params.getOwnerIdGroup()));
    }

    // move to local node if option is selected - TODO call function
    if (params.saveDataToNode) {
        log.info("Moving data to local node...");
        metadata.getHarvestInfo().
            setHarvested(false).
            setUuid(null).
            setUri(null);
    } else {
        metadata.getHarvestInfo().
            setHarvested(true).
            setUuid(params.getUuid()).
            setUri(uri);
    }

    addCategories(metadata, params.getCategories(), localCateg, context, log, null, false);
    metadata = dataMan.insertMetadata(context, metadata, md, true, false, false, UpdateDatestamp.NO, false, false);

    String id = String.valueOf(metadata.getId());

    addPrivileges(id, params.getPrivileges(), localGroups, dataMan, context, log);

    dataMan.indexMetadata(id, true, null);

    dataMan.flush();
}
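
To check the branching on its own, without a GeoNetwork build, the decision made in the method above can be isolated in a small sketch. ProvenanceSketch and decide are hypothetical names, and the fields only mirror the setters used above (setSourceId, setHarvested, setUuid, setUri). The local source identifier is simply passed in; which value the local node really expects there is not something this sketch settles.

```java
/** Hypothetical value object mirroring the source and harvest fields set above. */
final class ProvenanceSketch {
    final String sourceId;
    final boolean harvested;
    final String harvestUuid;
    final String harvestUri;

    private ProvenanceSketch(String sourceId, boolean harvested,
                             String harvestUuid, String harvestUri) {
        this.sourceId = sourceId;
        this.harvested = harvested;
        this.harvestUuid = harvestUuid;
        this.harvestUri = harvestUri;
    }

    /**
     * A single flag drives both decisions, so the source info and the
     * harvest info can never disagree with each other.
     */
    static ProvenanceSketch decide(boolean saveDataToNode, String localSourceId,
                                   String harvesterUuid, String uri) {
        if (saveDataToNode) {
            // Record is treated as local: no harvester UUID or URI is kept.
            return new ProvenanceSketch(localSourceId, false, null, null);
        }
        // Record stays attached to the harvester that created it.
        return new ProvenanceSketch(harvesterUuid, true, harvesterUuid, uri);
    }
}
```

One design point this highlights: the posted method tests the option in two separate if blocks, whereas deciding once keeps the two groups of fields consistent.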

Also, since I’m deploying with tomcat, I was hoping to rebuild only the kernel sources, using javac and re-compiling only the kernel. However, there are some dependencies that don’t allow me to do that. Here’s an excerpt:

./HarvestValidationEnum.java:26: error: package jeeves.server.context does not exist
import jeeves.server.context.ServiceContext;
^
./HarvestValidationEnum.java:28: error: package org.jdom does not exist
import org.jdom.Element;
^
./HarvestValidationEnum.java:81: error: cannot find symbol
public abstract void validate(DataManager dataMan, ServiceContext context, Element xml) throws Exception;
^
symbol: class DataManager
location: class HarvestValidationEnum
./HarvestValidationEnum.java:81: error: cannot find symbol
public abstract void validate(DataManager dataMan, ServiceContext context, Element xml) throws Exception;
^
symbol: class ServiceContext
location: class HarvestValidationEnum
./HarvestValidationEnum.java:81: error: cannot find symbol
public abstract void validate(DataManager dataMan, ServiceContext context, Element xml) throws Exception;
^
symbol: class Element
location: class HarvestValidationEnum
./harvest/AbstractAligner.java:26: error: package org.apache.commons.lang does not exist
import org.apache.commons.lang.StringUtils;
^
./harvest/harvester/AbstractParams.java:26: error: package com.google.common.collect does not exist
import com.google.common.collect.Maps;
^
./harvest/harvester/AbstractParams.java:28: error: package com.vividsolutions.jts.util does not exist
import com.vividsolutions.jts.util.Assert;
^
./harvest/harvester/AbstractParams.java:30: error: package org.apache.commons.lang does not exist
import org.apache.commons.lang.StringUtils;
^
./harvest/harvester/AbstractParams.java:31: error: cannot find symbol
import org.fao.geonet.Util;
^
symbol: class Util
location: package org.fao.geonet
./harvest/harvester/AbstractParams.java:32: error: package org.fao.geonet.constants does not exist
import org.fao.geonet.constants.Geonet;
^
./harvest/harvester/AbstractParams.java:33: error: package org.fao.geonet.domain does not exist
import org.fao.geonet.domain.Localized;
^

How should I proceed? Should I rebuild everything from source and then re-apply my other changes (e.g. the templates and other files in the catalog folder)? Or do you know at which level I should be able to rebuild the sources?

Many thanks,
Chiara

Chiara Scaini