While installing GN 4.4.6 on Ubuntu Server 24.04 under Tomcat 9, I made a mistake and decided to do a complete re-install of GeoNetwork.
However, traces of the previous install apparently break the sync with Elasticsearch: the records counter is blank, and re-indexing from the GN admin tools does not help. In short, how can I best clear previous GN data out of Elasticsearch?
This is what I have done:
1. Undeploy GeoNetwork from Tomcat. The GN webapp folder disappears.
2. Erase the H2 database (which, incidentally, is called gn.mv.db nowadays).
3. Delete the PostgreSQL database I use for GeoNetwork.
4. Delete gn-records, gn-features and gn-searchlogs from Elasticsearch (are there more?).
5. Restart Tomcat, add geonetwork.war, restart Elasticsearch.
6. Using the default H2 database, harvest some 1400 records from a GN source.
7. All checks OK, BUT the records counter is blank and searches return no results, although an empty search shows all records.
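Step 4 can be scripted. A minimal sketch, assuming Elasticsearch listens on http://localhost:9200 without authentication (as in the curl sessions later in this thread) and that GeoNetwork uses its default gn- index names; the function names are hypothetical helpers, not part of GeoNetwork:

```shell
#!/bin/sh
# Sketch: remove GeoNetwork's indices from a local Elasticsearch 7.x node.
# Assumptions: ES answers on $ES_URL without authentication, and GeoNetwork
# uses its default "gn-" index names (gn-records, gn-features, gn-searchlogs).
# Stop Tomcat first so GeoNetwork cannot recreate them midway.
ES_URL="${ES_URL:-http://localhost:9200}"

# Keep only GeoNetwork index names from a list of names read on stdin.
gn_filter_indices() {
  grep -E '^gn-' || true
}

# List all index names, keep the gn-* ones, and delete them one by one.
gn_delete_indices() {
  curl -s "$ES_URL/_cat/indices?h=index" | gn_filter_indices |
    while read -r idx; do
      echo "Deleting $idx"
      curl -s -X DELETE "$ES_URL/$idx"
      echo
    done
}
```

Run gn_delete_indices with Tomcat stopped, then start Tomcat again; GeoNetwork should recreate the indices from its index configuration at startup. Deleting by explicit name rather than by wildcard keeps the system .geoip_databases index out of harm's way.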
Hi,
Have you tried recreating the index from the admin tools menu? Maybe the previous index configuration was not correctly removed.
Apart from that, the steps you followed should be more than enough to start from a clean state. Could you share a more detailed error log from Elasticsearch?
Thank you for your attention. I have tried all the GN indexing tools; none of them solved the issue.
The Elasticsearch log has one recurring error, reported as follows.
I am not sure how it can be solved, or whether it is even related.
[2024-12-02T13:45:17,244][ERROR][o.e.i.g.GeoIpDownloader ] [threetest] exception during geoip databases update
org.elasticsearch.ElasticsearchException: not all primary shards of [.geoip_databases] index are active
at org.elasticsearch.ingest.geoip.GeoIpDownloader.updateDatabases(GeoIpDownloader.java:137) ~[ingest-geoip-7.17.15.jar:7.17.15]
at org.elasticsearch.ingest.geoip.GeoIpDownloader.runDownloader(GeoIpDownloader.java:284) [ingest-geoip-7.17.15.jar:7.17.15]
at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:100) [ingest-geoip-7.17.15.jar:7.17.15]
at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:46) [ingest-geoip-7.17.15.jar:7.17.15]
at org.elasticsearch.persistent.NodePersistentTasksExecutor$1.doRun(NodePersistentTasksExecutor.java:42) [elasticsearch-7.17.15.jar:7.17.15]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777) [elasticsearch-7.17.15.jar:7.17.15]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.17.15.jar:7.17.15]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.lang.Thread.run(Thread.java:1583) [?:?]
That doesn’t look like an error that would prevent Elasticsearch from responding to search requests.
Do you see any errors in the output of the Tomcat/GeoNetwork process?
One more thing that could help: show us what the failing requests look like in your browser dev tools (opened with F12), specifically the requests targeting the /srv/api/search/records/_search endpoint, with the HTTP status code and the response body (if any).
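The same request can also be replayed from a shell to capture the status code and body. A minimal sketch under assumptions: GN_URL is a hypothetical base URL, and the endpoint is assumed to accept an Elasticsearch query body as JSON, as the web UI sends it. If your instance answers 403, it probably expects the XSRF token the UI sends; in that case use "Copy as cURL" on the failing request in dev tools instead. The helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: replay a search against GeoNetwork's records search endpoint and show
# the HTTP status code and response body, as the dev tools network tab would.
# Assumptions: GN_URL is a hypothetical base URL; the endpoint accepts an
# Elasticsearch query body as JSON.
GN_URL="${GN_URL:-http://localhost:8080/geonetwork}"

# Request body: no hits, a single terms aggregation on the given field.
gn_agg_body() {
  printf '{"size":0,"aggs":{"probe":{"terms":{"field":"%s"}}}}' "$1"
}

# POST a JSON body ($1) and print "HTTP <code>" followed by the raw body.
gn_search_probe() {
  body_file=$(mktemp)
  code=$(curl -s -o "$body_file" -w '%{http_code}' \
    -X POST "$GN_URL/srv/api/search/records/_search" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    --data "$1" || true)
  echo "HTTP $code"
  cat "$body_file"
  echo
  rm -f "$body_file"
}
```

For example, gn_search_probe '{"query":{"match_all":{}},"size":1}' versus gn_search_probe "$(gn_agg_body cl_spatialRepresentationType.key)", the field that comes up later in this thread. A 200 for the plain query and an error for the aggregation would match the symptom described here: an empty search works while counters and facets, which rely on aggregations, do not.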
I agree. As a matter of fact, that error seems to clear up later on. The missing record count is just one example; the whole landing page is broken. Feel free to check: geonet.se/geonetwork
I have checked all kinds of logs, but this is interesting. It is a snippet from geonetwork.log covering the only harvest run on this instance:
2024-12-02T07:39:49,876 INFO [geonetwork.harvester] - End of alignment for : Geodataportalen SRV
2024-12-02T07:39:49,895 INFO [geonetwork.harvester] - Aligning source logos from for : Geodataportalen SRV
2024-12-02T07:39:50,114 INFO [geonetwork.harvester] - Ended harvesting from node : Geodataportalen SRV (GeonetHarvester)
2024-12-02T07:44:12,595 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-02T07:44:12,597 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-02T07:44:12,597 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-02T07:44:27,303 ERROR [geonetwork.index] - Error during querying index. ErrorCause: {"index_uuid":"_na_","index":"gn-records","resource.type":"index_or_alias","resource.id":"gn-records","type":"index_not_found_exception","reason":"no such index [gn-records]","root_cause":[{"index_uuid":"_na_","index":"gn-records","resource.type":"index_or_alias","resource.id":"gn-records","type":"index_not_found_exception","reason":"no such index [gn-records]"}]}
2024-12-02T07:45:42,772 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-02T07:45:42,773 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-02T07:45:42,773 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
I think it boils down to this: how can I trigger Elasticsearch to create the first-run indexes?
Feel free to check anything you like on the GN instance above.
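Whether the records index exists at all can be checked with Elasticsearch's HEAD index API, which returns 200 for an existing index and 404 for a missing one. A sketch, assuming the same unauthenticated local node; gn_index_exists is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: check whether an index exists, using the HEAD index API
# (200 = exists, 404 = missing). Assumption: ES answers on $ES_URL
# without authentication, as in this thread.
ES_URL="${ES_URL:-http://localhost:9200}"

gn_index_exists() {
  # curl prints "000" via -w if the node cannot be reached at all.
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 -I "$ES_URL/$1" || true)
  [ "$status" = "200" ]
}
```

Usage: gn_index_exists gn-records && echo present || echo missing. In this thread, the indices reappeared once the old index data was cleaned out and Tomcat was started again.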
I see two different things here:
- The mention of "no such index [gn-records]" in the log extract from your post: it looks like the records index simply wasn’t created successfully yet;
- The http://geonet.se/geonetwork page shows the following error:
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.TransactionException: JDBC begin transaction failed:
It looks like there’s also an issue with your setup, probably related to the database.
Can you access anything on your instance? Or is the Hibernate error on the home page new?
Sorry. I think a scheduled harvester caused this. Restarting the webapp (nothing else) got it running again, but with the same crippled search page.
I failed to mention that the setup ran very well for several days using a PostgreSQL database. I had some 10000 harvested records before it broke for no apparent reason. I will try to find the Elasticsearch initialisation script in the code.
2024-12-03T00:00:00,029 INFO [geonetwork.harvester] - Starting harvesting of Geodataportalen SRV
2024-12-03T00:00:00,128 INFO [geonetwork.harvester] - Started harvesting from node : Geodataportalen SRV (GeonetHarvester)
2024-12-03T00:00:00,132 INFO [geonetwork.harvester] - Retrieving information from : https://www.geodata.se/geodataportalen
2024-12-03T00:00:00,602 INFO [geonetwork.harvester] - Searching on Geodataportalen SRV. From 1 to 2.
2024-12-03T00:00:00,766 INFO [geonetwork.harvester] - Client didn't respond with page size so using page size of 100
2024-12-03T00:00:00,768 INFO [geonetwork.harvester] - Processing search with these parameters org.fao.geonet.kernel.harvest.harvester.geonet.Search@69776299[from=1401,to=1500,freeText=,title=,abstrac=,keywords=,digital=false,hardcopy=false,sourceUuid=,sourceName=,anyField=,anyValue=]
2024-12-03T00:00:00,768 INFO [geonetwork.harvester] - Searching on : Geodataportalen SRV
2024-12-03T00:00:00,769 INFO [geonetwork.harvester] - Searching on Geodataportalen SRV. From 1 to 100.
...
2024-12-03T00:00:02,724 INFO [geonetwork.harvester] - Searching on Geodataportalen SRV. From 1301 to 1400.
2024-12-03T00:00:02,854 INFO [geonetwork.harvester] - Total records processed from this search :1400
2024-12-03T00:00:02,863 INFO [geonetwork.harvester] - Start of alignment for : Geodataportalen SRV
2024-12-03T00:02:33,070 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-03T00:02:33,080 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-03T00:02:33,080 ERROR [geonetwork.domain] - Error parsing ISO DateTimes '201912-02-08T12:00:00'. Error is: null
java.lang.NullPointerException: null
2024-12-03T00:04:53,423 INFO [geonetwork.harvester] - End of alignment for : Geodataportalen SRV
2024-12-03T00:04:53,440 INFO [geonetwork.harvester] - Aligning source logos from for : Geodataportalen SRV
2024-12-03T00:04:53,547 INFO [geonetwork.harvester] - Ended harvesting from node : Geodataportalen SRV (GeonetHarvester)
2024-12-03T06:00:00,187 ERROR [org.springframework.transaction.interceptor.TransactionInterceptor] - Application exception overridden by rollback exception
javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not extract ResultSet
at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
at org.hibernate.internal.SessionImpl.find(SessionImpl.java:3448) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
at org.hibernate.internal.SessionImpl.find(SessionImpl.java:3380) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
at jdk.internal.reflect.GeneratedMethodAccessor149.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.springframework.orm.jpa.SharedEntityManagerCreator$SharedEntityManagerInvocationHandler.invoke(SharedEntityManagerCreator.java:316) ~[spring-orm-5.3.33.jar:5.3.33]
at com.sun.proxy.$Proxy182.find(Unknown Source) ~[?:?]
at org.springframework.data.jpa.repository.support.SimpleJpaRepository.findById(SimpleJpaRepository.java:335) ~[spring-data-jpa-2.7.18.jar:2.7.18]
at jdk.internal.reflect.GeneratedMethodAccessor148.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
Can you try resetting the UI configuration to the default one?
I had not changed it, but creating a new UI from srv did not help.
Another attempt at cleaning all Elasticsearch indexes, followed by starting Tomcat and restarting ES, seemed to help: the indexes were created, and a harvest of 1400 records seems to have populated the index. However, GN still cannot read it.
root@threetest:~# curl -X GET http://localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .geoip_databases 7LhwpPXhReyHmjjR87WJcw 1 0 37 37 35.2mb 35.2mb
yellow open gn-features 0r9HPnZqQyuqBWysi-OHOA 1 1 0 0 227b 227b
yellow open gn-records wW-fu2x5T2q2YHQ8nyyAMg 1 1 0 0 227b 227b
yellow open gn-searchlogs QiPLJopeQ2aPfUEwbGwaoA 1 1 0 0 227b 227b
root@threetest:~# curl -X GET http://localhost:9200/_cluster/health
{"cluster_name":"elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":8,"active_shards":8,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":3,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":72.72727272727273}
root@threetest:~# curl -X GET http://localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .geoip_databases 7LhwpPXhReyHmjjR87WJcw 1 0 37 37 35.2mb 35.2mb
yellow open gn-features 0r9HPnZqQyuqBWysi-OHOA 1 1 0 0 227b 227b
yellow open gn-records wW-fu2x5T2q2YHQ8nyyAMg 1 1 12536 43600 217.6mb 217.6mb
yellow open gn-searchlogs QiPLJopeQ2aPfUEwbGwaoA 1 1 0 0 227b 227b
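The _cat/indices?v output is easier to compare between runs when reduced to the GeoNetwork indices. A small awk filter, assuming the default column order shown in the header above (health status index uuid pri rep docs.count ...); gn_index_summary is a hypothetical helper. Note that on a single-node cluster, yellow is expected for indices with one replica (rep 1), because Elasticsearch never places a replica on the same node as its primary; that matches the 3 unassigned shards in the cluster health output and is probably unrelated to the search problem.

```shell
#!/bin/sh
# Sketch: summarise GeoNetwork indices from `_cat/indices?v` output on stdin.
# Assumes the default columns: health status index uuid pri rep docs.count
# docs.deleted store.size pri.store.size (the first line is the header).
gn_index_summary() {
  awk 'NR > 1 && $3 ~ /^gn-/ { print $3, $1, "docs=" $7, "replicas=" $6 }'
}
```

Usage: curl -s 'http://localhost:9200/_cat/indices?v' | gn_index_summary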
OK, so I’m 99% sure that what’s going on is that the gn-records index is not configured properly. On your platform, search requests that fail always do so with a reason similar to this one:
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default.
Please use a keyword field instead. Alternatively, set fielddata=true on [cl_spatialRepresentationType.key] in order to load field data by uninverting the inverted index.
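The error says Elasticsearch treats the field as text. One way to confirm this on the live index is the get field mapping API (GET /<index>/_mapping/field/<field>). A sketch, assuming the unauthenticated local node from this thread and that jq is not installed; the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: ask Elasticsearch how a field is actually mapped in gn-records.
# Assumptions: ES answers on $ES_URL without authentication; the response is
# compact single-line JSON (the default), so no jq is needed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the first "type" in a field-mapping response read on stdin.
# The first one is the field's own type; later ones belong to sub-fields.
gn_first_type() {
  grep -o '"type":"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

gn_field_type() {
  curl -s "$ES_URL/gn-records/_mapping/field/$1" | gn_first_type
}
```

gn_field_type cl_spatialRepresentationType.key should print keyword if GeoNetwork's mapping was applied. If it prints text, the field was most likely mapped dynamically: Elasticsearch 7 maps an unknown string as text with a keyword sub-field, which is why only the first "type" is kept.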
The field cl_spatialRepresentationType.key has its type set to “keyword” in the records index configuration that ships with GeoNetwork, which leads me to believe that this configuration was not applied properly. This can be explained by the fact that, depending on the configuration of your instance, the index configuration might be read from outside the webapp folder, and thus might not be up to date. I suggest that you review the paths of your current GeoNetwork installation at /geonetwork/srv/eng/admin.console#/dashboard/information and check whether an old records.json file remains in the “index configuration folder” path.
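To look for a stale records.json outside the webapp, a find over the Tomcat tree is enough. A sketch; the /opt/tomcat default and the */config/index/* pattern are assumptions based on the paths quoted in this thread, and gn_find_records_json is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: list every records.json inside a config/index folder below a root
# directory, so a leftover copy outside the webapp stands out.
# Assumption: index configuration folders end in ".../config/index/", as in
# this thread; the /opt/tomcat default root is also an assumption.
gn_find_records_json() {
  find "${1:-/opt/tomcat}" -type f -name records.json \
    -path '*/config/index/*' 2>/dev/null | sort
}
```

More than one hit means it is worth comparing each file against the index configuration folder shown on the Information page; ls -l on the results shows their dates.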
Good luck!
OK. So these are the first lines of records.json in the failing installation:
"settings": {
"index": {
"max_result_window": 15000,
"max_inner_result_window": 200,
"query.default_field": "any.default",
"mapping.total_fields.limit": 4000,
while the file in the source code (and the war package) has this:
"settings": {
"index": {
"max_result_window": ${es.index.max_result_window.limit},
"max_inner_result_window": ${es.index.max_inner_result_window.limit},
"query.default_field": "any.default",
"mapping.total_fields.limit": ${es.index.mapping.total_fields.limit},
Other differences further down (found with the Notepad++ Compare plugin) look like this:
"ignore_above": 2000
instead of
"ignore_above": ${es.index.ignore_above}
So why are there numbers instead of variables? I have most certainly not edited this file.
The path is /opt/tomcat/webapps/geonetwork/WEB-INF/data/config/index
This is expected: records.json is parametrized and goes through the Maven properties substitution process.
Could you make sure that this file is actually the one GeoNetwork uses, as I indicated above (admin menu > Information)?
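Once deployed, a substituted records.json should contain only concrete values, as the failing installation's copy does. A quick check that no ${...} placeholder survived, as a hypothetical helper; it says nothing about whether GeoNetwork actually reads this particular file:

```shell
#!/bin/sh
# Sketch: confirm a deployed index configuration file has no leftover
# Maven-style ${...} placeholders. Prints offending lines with numbers and
# returns 0 only if none are found.
gn_check_substituted() {
  if grep -n -F '${' "$1"; then
    echo "Unresolved placeholders in $1" >&2
    return 1
  fi
  return 0
}
```

Usage: gn_check_substituted /opt/tomcat/webapps/geonetwork/WEB-INF/data/config/index/records.json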
Thank you for your patience. The Information page gives this path to the index files: /opt/tomcat/webapps/geonetwork/WEB-INF/data/config/index, and this folder contains three JSON files: features.json (4 Mar 2024), records.json (24 Oct 2024) and searchlogs.json (4 Mar 2024).
Since yesterday I have cleared out GN and made a virgin re-installation. All is well until I load some records: loading the ISO19115 templates and sample records broke the system again. At this point I am using just the H2 database.
Feel free to explore it yourself: My GeoNetwork catalogue (default admin credentials).
This is geonetwork.log from the virgin run:
2024-12-04T10:46:46,423 ERROR [org.apache.activemq.broker.BrokerService] - Temporary Store limit is 51200 mb, whilst the temporary data directory: /opt/tomcat/activemq-data/localhost/tmp_storage only has 13039 mb of usable space
2024-12-04T10:46:54,622 WARN [org.apache.commons.dbcp2.BasicDataSource] - The requested JMX name [jdbcDataSource] was not valid and will be ignored.
2024-12-04T10:46:56,267 WARN [geonetwork.databasemigration] - Unable to retrieve the current GeoNetwork version from the database. If this is an initial run of the software, then the database will be auto-populated. Else check that the database is properly configured
2024-12-04T10:46:56,487 WARN [org.springframework.orm.jpa.persistenceunit.DefaultPersistenceUnitManager] - Found explicit default persistence unit with name 'default' in persistence.xml - overriding local default persistence unit settings ('packagesToScan'/'mappingResources')
2024-12-04T10:47:09,090 WARN [geonetwork.databasemigration] - Unable to retrieve the current GeoNetwork version from the database. If this is an initial run of the software, then the database will be auto-populated. Else check that the database is properly configured
2024-12-04T10:47:09,213 ERROR [geonetwork.settings] - Requested setting with name: system/feedback/languages not found. Add it to the settings table.
2024-12-04T10:47:09,245 ERROR [geonetwork.settings] - Requested setting with name: system/feedback/translationFollowsText not found. Add it to the settings table.
2024-12-04T10:47:09,253 ERROR [geonetwork.settings] - Requested setting with name: system/server/timeZone not found. Add it to the settings table.
2024-12-04T10:47:19,552 ERROR [geonetwork.settings] - Requested setting with name: system/platform/version not found. Add it to the settings table.
2024-12-04T10:47:19,557 ERROR [geonetwork.settings] - Requested setting with name: system/server/protocol not found. Add it to the settings table.
2024-12-04T10:47:19,560 ERROR [geonetwork.settings] - Requested setting with name: system/server/host not found. Add it to the settings table.
2024-12-04T10:47:19,563 ERROR [geonetwork.settings] - Requested setting with name: system/server/protocol not found. Add it to the settings table.
2024-12-04T10:47:19,566 ERROR [geonetwork.settings] - Requested setting with name: system/server/port not found. Add it to the settings table.
2024-12-04T10:47:21,240 WARN [org.apache.commons.dbcp2.BasicDataSource] - The requested JMX name [jdbcDataSource] was not valid and will be ignored.
2024-12-04T10:47:21,772 WARN [org.elasticsearch.client.RestClient] - request [GET http://localhost:9200/_cluster/health] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
2024-12-04T10:47:25,078 WARN [org.elasticsearch.client.RestClient] - request [GET http://localhost:9200/_cluster/health] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
2024-12-04T10:47:26,462 INFO [geonetwork.encryptor] - Password database encryptor initialized - Keep the file /opt/tomcat/webapps/geonetwork/WEB-INF/data/config/encryptor.properties safe and make a backup. When upgrading to a newer version of GeoNetwork the file must be restored, otherwise GeoNetwork will not be able to decrypt passwords already stored in the database.
And this is the Elasticsearch status after the initial loading of the sample records:
curl -X GET 'http://localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .geoip_databases 7LhwpPXhReyHmjjR87WJcw 1 0 37 37 35.2mb 35.2mb
yellow open gn-features bXmSmn-iRQOyohhk-E4uCQ 1 1 0 0 227b 227b
yellow open gn-records Znmotq28Tpe_tUPGdyJVDw 1 1 15 0 454.9kb 454.9kb
yellow open gn-searchlogs 6vn6X2HhTp-7qAQSFCD1gA 1 1 0 0 227b 227b
Thank you for the information. I could reproduce the issue on your instance and locally. It happens when the GEMET INSPIRE thesaurus is added to GeoNetwork on a fresh install. I'm not 100% sure why (it might deserve its own bug report), but when I start a fresh GN instance without this thesaurus, everything works fine. You might want to add this thesaurus only if you have records relying on it.
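To see from the shell whether an INSPIRE thesaurus is already installed, you can list thesaurus files by name. A sketch under assumptions: thesauri are stored as .rdf files below the data directory's config/codelist folder (the default path below is an assumption, so check your own instance), and matching "inspire" in the file name is only a heuristic; the thesaurus management page in the admin console remains the authoritative view. gn_find_inspire_thesauri is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: list thesaurus files whose name mentions INSPIRE (case-insensitive).
# Assumption: thesauri live as .rdf files below the codelist folder of the
# GeoNetwork data directory; the default path below is hypothetical.
gn_find_inspire_thesauri() {
  dir="${1:-/opt/tomcat/webapps/geonetwork/WEB-INF/data/config/codelist}"
  find "$dir" -type f -name '*.rdf' 2>/dev/null | grep -i 'inspire' | sort
}
```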
Thank you, Olivia! I was chasing ghosts here.
I apologize for not mentioning the INSPIRE themes; that seemed like a trivial detail in this case. Now I will have to figure out when this regression occurred and select an older GN4 release.
I am building a reincarnation of a geodata catalog I have been running for years, which harvests metadata from some 20 Swedish data catalogs and map services, mostly open sources.
They are very often INSPIRE coded, so I want to present them as INSPIRE themes and EEA topics. See this site for example: https://sdi.eea.europa.eu/catalogue/srv
It seems to run GN 4.4.4-SNAPSHOT, so maybe I will go for 4.4.4 instead.
Will you file a bug report? It is your find, after all.
Regards, Mats.E
I’ll give it a bit more time to see if I can find a reason for this.
Thank you so far. I will monitor the issue on GitHub.
I think the INSPIRE framework is an integral part of geospatial metadata, so this matter needs to be resolved.
This is also demonstrated by the extensive coverage of INSPIRE in the GN manual: Configuring for the INSPIRE Directive - GeoNetwork opensource (EN)