Hello developers,
We are a small team that has been implementing a GeoNetwork-based catalog. We have made some relatively minor user interface enhancements, for example supporting more ISO element subtemplates and some additional thesaurus handling details, and we plan to contribute this code to the project. We anticipate further development work, and it is possible that we will have a large number of metadata records in the future.
We are experiencing performance issues with our GeoNetwork system. For example, it takes 8 to 15 seconds to switch from view to edit for an ISO 19115 record, most batch imports or batch changes time out and fail, and GeoNetwork has occasionally deadlocked.
I assume the problem is in our local configuration, since I see no similar issues reported at other installations, some of which have larger data volumes than we have now. I have tried to include all possibly relevant details, including statistics about our metadata records, in case our content differs significantly from other installations in a way that could explain the performance.
Detailed Symptoms
Here are some of the issues:
· The metadata edit page takes a long time (8-15 seconds) to display content, even with the -Penv-prod build option.
About 6 seconds to respond to
geonetwork/srv/api/records/395914/editor?currTab=default&starteditingsession=yes&_random=5952
and 2.9 seconds for
geonetwork/srv/api/records/45755f41-6c0d-4272-91b7-091afaafb14b/related
· Saving a record sometimes takes a long time before displaying the “general failure” message.
· Importing more than about 10 ISO 19139 files at once times out, but the thread keeps running. (We increased the timeout limit on the nginx server, which allowed us to import more records at once, but it does not fix the long processing time.)
· Selecting multiple metadata records from the “Contribute” view and then doing a batch update of permissions was also timing out, and it still takes substantial time.
· Occasional thread deadlocks triggered by user activity. This has occurred 3 or 4 times and we have not been able to reproduce it. The deadlocks occurred in the environment without PostGIS, running GeoNetwork 3.2.2.
· CSW GetRecords requests for fewer than about 85 records take a few seconds; for 100 records the request takes 15 minutes.
· Rebuilding the Lucene index (from the admin console) takes up to 10 minutes for 750 records. That was with a single thread; there was a substantial improvement when we added more Lucene threads. What is the recommended number of Lucene threads?
Data Details
· There are about 30 users. It is unlikely that more than 10 have ever been logged in at once; typical concurrent usage is around 3.
· There are 952 metadata records in the database. Almost all are ISO 19139. INSPIRE features are not used.
· There are 590 records with metadata.root = ‘gmd:MD_Metadata’ and 86 with metadata.root = ‘MD_Metadata’ (counts gathered with a query like the first sketch after this list). Is it normal to have the value both with and without the namespace?
· There are 73 records with no (null) value for metadata.root; 68 of these appear to contain gmd:MD_Metadata in the data – this seems strange.
· There are 282 subtemplates, mostly CI_ResponsibleParty, CI_Citation, CI_Contact, and CI_OnlineResource.
· Subtemplates may include xlinks to other subtemplates.
· My guess is that each full metadata record would access about 10 subtemplates. We have not computed this statistic yet.
· SELECT data FROM metadata takes about 10 seconds to run remotely with the pgAdmin III user interface.
· metadata.data sizes, in characters – total 28302876, average 29729, minimum 538, maximum 97046 (computed with a query like the second sketch after this list).
· There are 27 “external” thesauri loaded from RDF files (about 7 MB total in RDF format).
· Several thesauri have around 3000 entries; most have 50 entries or fewer. Most of the thesauri are hierarchical, like the Global Change Master Directory Science keywords. (It seems that the code processing the thesauri runs in parallel when the edit page loads.)
· Most of the metadata records include keywords from multiple thesauri. (We have not computed this statistic yet).
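For reference, the metadata.root counts above came from a simple aggregate on the metadata table, roughly like the following (a minimal sketch; it should work on a standard GeoNetwork 3.x schema):

-- tallies metadata.root values, including the NULL group
SELECT root, count(*) AS record_count
FROM metadata
GROUP BY root
ORDER BY record_count DESC;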
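The metadata.data size statistics were computed the same way, roughly like this (assuming metadata.data is a text column, as in our schema):

-- character counts of the stored XML in metadata.data
SELECT sum(length(data)) AS total_chars,
       round(avg(length(data))) AS avg_chars,
       min(length(data)) AS min_chars,
       max(length(data)) AS max_chars
FROM metadata;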
Technical Environment
There is some version variability between the development, staging, and production VMs.
· GeoNetwork 3.4.0 with customizations. The deadlocks occurred using GeoNetwork 3.2.2 (with customizations); we have not yet made the 3.4 version public, so we do not know whether they still occur.
· PostgreSQL 9.4.14 with PostGIS 2.2.5
· Tomcat 8.5.6, 8.0.26
· JVM – Oracle 1.8.0_111-b14, 1.8.0_151-b12
· CentOS 6.9
· Nginx 1.12
· Tomcat reports a max memory of 7282 MB and a maximum thread count of 200 (even during the deadlocks I have not seen more than 50 threads active). We can provide other values on request.
· JNDI – 10 threads
Analysis So Far
· We installed and examined PostgreSQL statistics in a test environment (copied from production). I did not identify any SQL that ran for a long time, although the number of database records is probably too small to measure true performance. Query analysis for retrieving metadata and the associated operationallowed records showed sequential reads of the tables even when we added indexes in an attempt to improve performance (see the sketch after this list). The PostgreSQL documentation says that for small tables a sequential read of the single storage “page” is more efficient than random reads of specific records on the page.
· We have screen captures and files from the admin console status page and from the Tomcat manager during at least one of the deadlocks.
· Loading the edit page from the display page: the web browser network console shows an API call that runs for a long time and seems to block other processing.
· Overall VM CPU utilization and memory usage seem to be reasonable.
· It appears that when the user interface times out (large imports, etc.), the database updates are rolled back even if the Tomcat thread continues to run.
· Using an earlier (3.x) version of GeoNetwork, we ran an informal test comparing edit page load times for an ISO 19139 file with xlinks against the same file with the xlinks already resolved. This did not show a difference in duration, but the test may not have contained enough xlinks (and nested xlinks) or a large enough file size to properly test performance against file content.
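For the query analysis mentioned above, we ran EXPLAIN ANALYZE against the metadata and operationallowed tables. A rough sketch of the kind of check we used (the exact predicate is only an illustration; we just wanted to see whether the planner chose an index or a sequential scan):

EXPLAIN ANALYZE
SELECT m.id, m.data
FROM metadata m
JOIN operationallowed oa ON oa.metadataid = m.id
WHERE oa.operationid = 0;  -- 0 = the view operation in our installation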
Questions:
· Are our number of records, metadata.data sizes, thesauri, and subtemplates within the range of existing GeoNetwork installations?
· Are there any obvious causes for the slow performance?
· Do you have any suggestions for what we should try to change?
· What other information could help resolve the issue?
I am sorry this is such a long message. We have been trying to resolve this for several weeks and have found many symptoms but few solutions.
Thank you for any help or guidance.
Jeff
Jeff Campbell
Geospatial Catalog Project Leader