Currently, I am adding a large volume of datasets (about 1-2 million
metadata records) into GNK (2.6.0). I tried the latest version, but it has
the same issue. At the beginning of the harvesting process (I already stored
the records on local disk), it works well. As it goes on, it starts to give
me error messages, but I can see that data can still be added to the
database. However, query speed becomes extremely slow: a keyword search can
take around 15 seconds. So, I have a few questions related to this issue.
1. Because most of the data is from the NASA Earth observation data center,
some of the attributes cannot be recognized by the GNK parser. Is this the
reason for the bad performance?
2. Is it because I saved millions of records in one local folder? (I already
changed the JVM configuration.)
3. If I understand correctly, you use Lucene for text indexing and PostGIS
for spatial indexing (optional). Why don't you use spatial Lucene?
> Currently, I am adding a large volume of datasets (about 1-2 million
> metadata records) into GNK (2.6.0). I tried the latest version, but it has
> the same issue.
Anyway, you should always use the latest version. Version 2.6.0 is really old and there have been many bug fixes and improvements since then.
> At the beginning of the harvesting process (I already stored the records
> on local disk), it works well. As it goes on, it starts to give me error
> messages, but I can see that data can still be added to the database.
What error message do you get? Maybe that clarifies what is happening.
> However, query speed becomes extremely slow: a keyword search can take
> around 15 seconds. So, I have a few questions related to this issue.
Me too
Did you check memory, CPU, and disk access?
Is the query running while the harvester is running?
> Because most of the data is from the NASA Earth observation data center,
> some of the attributes cannot be recognized by the GNK parser. Is this the
> reason for the bad performance?
Probably not, but without seeing the error message…
> Is it because I saved millions of records in one local folder? (I already
> changed the JVM configuration.)
Not necessarily, if your hardware is good enough. Knowing your JVM configuration would also help here.
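For reference, heap settings along these lines are a common starting point when harvesting large volumes. This is only a sketch: the exact values depend on your hardware and data volume, and CATALINA_OPTS assumes a Tomcat deployment (use JAVA_OPTS or your servlet container's equivalent otherwise).

```shell
# Illustrative JVM memory settings for a Tomcat-hosted GeoNetwork instance.
# -Xms: initial heap size; -Xmx: maximum heap size.
# Values here are examples, not tuned recommendations.
export CATALINA_OPTS="-Xms2g -Xmx4g"
```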
> If I understand correctly, you use Lucene for text indexing and PostGIS
> for spatial indexing (optional). Why don't you use spatial Lucene?
I am probably not the best person to answer this, but: we haven't had sponsors to migrate to Lucene spatial yet (or, better, Solr), and when GeoNetwork was first developed, no spatial extension was available for Lucene.
Now I am going to try the latest version of GNK and see if it solves my
problem.
Additionally, I have a question about configuring the newest version here.
Now I am able to connect GNK with PostgreSQL, but I want to use PostGIS for
my spatial index since I have a large volume of datasets. I have already
created the PostGIS extension in the PostgreSQL database, but I don't think
PostGIS is being used (C:\Program
Files\geonetwork\web\geonetwork\WEB-INF\data\index), because the spatial
index file keeps getting larger as I add data. Could you help me out with
this?
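One way to check whether the spatial index has actually moved into PostGIS is to query the database directly. This is only a sketch: the database name `geonetwork` and the table name `spatialindex` are assumptions about your setup, so adjust them to match your installation.

```shell
# Verify the PostGIS extension is installed in the GeoNetwork database
# (assumes the database is named "geonetwork")
psql -d geonetwork -c "SELECT PostGIS_full_version();"

# If GeoNetwork is writing its spatial index to PostGIS, a spatial index
# table (assumed here to be named "spatialindex") should exist and its row
# count should grow as records are harvested
psql -d geonetwork -c "SELECT count(*) FROM spatialindex;"
```

If the second query fails or the count stays at zero while the files under WEB-INF\data\index keep growing, that would suggest GeoNetwork is still using its file-based spatial index rather than PostGIS.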