Hi Kai,
We seem to be experiencing a similar issue to you.
We are using GeoNetwork 2.8 to provide a CSW for a collection of approximately 150,000 metadata records. We've been testing 2.10 with a view to upgrading from 2.8 and have found a problem with the number of records returned by CSW searches that use a spatial filter. A bounding box filter covering the full global extent returns around 12,000 of the expected 150,000 records. The discrepancy shrinks as the bounding box gets smaller, i.e. when the bounding box covers fewer records, a larger percentage of them is returned.
The issue only occurs with CSW searches - equivalent searches using the GN GUI appear to be OK. The issue does not occur in 2.8.
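One way to compare the counts outside the GUI is to ask the CSW for hits only. Below is a minimal sketch, using only the Python standard library, that builds a CSW 2.0.2 GetRecords request with resultType="hits" and an ogc:BBOX filter on ows:BoundingBox, then reads numberOfRecordsMatched from the response. The helper names, the lon/lat axis order and the choice of ows:BoundingBox as the queryable are assumptions for illustration, not details of our deployment.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"

# CSW 2.0.2 GetRecords body asking only for the hit count (resultType="hits"),
# filtered by an OGC BBOX on ows:BoundingBox. The lon/lat axis order is an assumption.
GETRECORDS_HITS_TEMPLATE = """<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ows="http://www.opengis.net/ows"
    service="CSW" version="2.0.2" resultType="hits">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:BBOX>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Envelope>
            <gml:lowerCorner>{minx} {miny}</gml:lowerCorner>
            <gml:upperCorner>{maxx} {maxy}</gml:upperCorner>
          </gml:Envelope>
        </ogc:BBOX>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>"""


def build_bbox_hits_request(minx, miny, maxx, maxy):
    """Fill the template with the bounding box corners (hypothetical helper)."""
    return GETRECORDS_HITS_TEMPLATE.format(minx=minx, miny=miny, maxx=maxx, maxy=maxy)


def post_csw(endpoint, body):
    """POST an XML request body to a CSW endpoint and return the response text."""
    request = urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def hits_from_response(xml_text):
    """Return (numberOfRecordsMatched, numberOfRecordsReturned) from a GetRecords response."""
    root = ET.fromstring(xml_text)
    results = root.find(f"{{{CSW_NS}}}SearchResults")
    if results is None:
        raise ValueError("response has no csw:SearchResults element")
    return (int(results.get("numberOfRecordsMatched")),
            int(results.get("numberOfRecordsReturned")))
```

For example, `hits_from_response(post_csw(url, build_bbox_hits_request(-180, -90, 180, 90)))` would give the matched count for the global extent, where `url` is your CSW endpoint (something like `http://localhost:8080/geonetwork/srv/eng/csw`, a placeholder here). Repeating the call with progressively smaller boxes shows how the shortfall changes with the extent.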
Other than GN versions, our test environment is essentially identical to our production environment:
- OpenJDK 64-Bit 1.6.0_24
- Tomcat 6.0.24
- PostgreSQL 9.1.9 / PostGIS 2.0.3 (test and prod GN databases are on the same postgres server)
- JNDI container managed connection pool
Cheers,
Aaron Sedgmen
Geoscience Australia
-----Original Message-----
From: Kai Liu [mailto:kliu.gis@anonymised.com]
Sent: Monday, 24 June 2013 8:52 AM
To: Francois Prunayre
Cc: Geonetwork-Users@anonymised.com
Subject: Re: [GeoNetwork-users] search capability with more than 100 K records
Hi Francois,
Thanks for your response. The Lucene index is 271 MB, which I think should hold more than 100K documents. I tried to inspect it with Luke but failed, because Luke does not currently support the Lucene41 index format. Two kinds of problems occur when building the index:
1.
ERROR [geonetwork.index] - Failed to convert gml to jts object:
<gml:Polygon xmlns:gml="http://www.opengis.net/gml"
    xmlns="http://www.isotc211.org/2005/gmd"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:srv="http://www.isotc211.org/2005/srv"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:gsr="http://www.isotc211.org/2005/gsr"
    xmlns:advmis="http://www.geodatenzentrum.de/advmis"
    xmlns:gts="http://www.isotc211.org/2005/gts"
    xmlns:gmx="http://www.isotc211.org/2005/gmx"
    xmlns:geonet="http://www.fao.org/geonetwork"
    gml:id="ID-4686615730496844990" srsName="not defined">
<gml:exterior>
<gml:LinearRing>
<gml:pos>7.0332 51.6311 0.0</gml:pos>
<gml:pos>6.9877 51.6229 0.0</gml:pos>
<gml:pos>6.983 51.6233 0.0</gml:pos>
<gml:pos>6.9319 51.6449 0.0</gml:pos>
<gml:pos>6.9015 51.6354 0.0</gml:pos>
<gml:pos>6.9009 51.6752 0.0</gml:pos>
<gml:pos>6.941 51.7151 0.0</gml:pos>
<gml:pos>6.9104 51.7459 0.0</gml:pos>
<gml:pos>6.9158 51.7784 0.0</gml:pos>
<gml:pos>6.9249 51.7772 0.0</gml:pos>
<gml:pos>6.9561 51.772 0.0</gml:pos>
<gml:pos>6.9778 51.7989 0.0</gml:pos>
<gml:pos>7.0202 51.8002 0.0</gml:pos>
<gml:pos>7.03 51.784 0.0</gml:pos>
<gml:pos>7.0747 51.7776 0.0</gml:pos>
<gml:pos>7.0892 51.7252 0.0</gml:pos>
<gml:pos>7.0546 51.7107 0.0</gml:pos>
<gml:pos>7.0531 51.6959 0.0</gml:pos>
<gml:pos>7.023 51.6699 0.0</gml:pos>
<gml:pos>7.0446 51.6436 0.0</gml:pos>
<gml:pos>7.0332 51.6311 0.0</gml:pos>
<gml:pos>7.0332 51.6311 0.0</gml:pos>
<gml:pos>7.0332 51.6311 0.0</gml:pos>
</gml:LinearRing>
</gml:exterior>
</gml:Polygon>
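To look for polygons like this one across many records, a small check can summarise each gml:Polygon before (re)indexing. This is a minimal sketch, assuming the geometry is available as XML text and uses gml:pos elements (gml:posList is not handled). The function names are hypothetical, and the checks are heuristics chosen for illustration: the log does not say which property made the GML-to-JTS conversion fail.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def inspect_gml_polygon(xml_text):
    """Summarise the coordinates and CRS of a gml:Polygon given as XML text.

    All gml:pos elements are treated as one ring, which is adequate for a
    polygon with a single exterior ring such as the one in the log.
    """
    root = ET.fromstring(xml_text)
    positions = [
        tuple(float(value) for value in pos.text.split())
        for pos in root.iter(f"{{{GML_NS}}}pos")
    ]
    return {
        "srsName": root.get("srsName"),
        "points": len(positions),
        # Distinct coordinate dimensions seen (2 for "x y", 3 for "x y z").
        "dimensions": sorted({len(p) for p in positions}),
        "closed": bool(positions) and positions[0] == positions[-1],
        # Number of positions identical to the position just before them.
        "consecutive_duplicates": sum(
            1 for a, b in zip(positions, positions[1:]) if a == b
        ),
    }


def suspicious_features(report):
    """List features worth a closer look (heuristics, not confirmed causes)."""
    notes = []
    if report["srsName"] in (None, "", "not defined"):
        notes.append("srsName is missing or not a CRS identifier")
    if report["dimensions"] != [2]:
        notes.append("positions are not all 2D")
    if not report["closed"]:
        notes.append("ring is not closed")
    if report["consecutive_duplicates"]:
        notes.append("ring repeats consecutive positions")
    if report["points"] < 4:
        notes.append("ring has fewer than 4 positions")
    return notes
```

Run on the polygon above, this flags the "not defined" srsName, the 3D positions and the repeated closing position; whether GeoNetwork rejects the geometry for any of these reasons is not established here.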
2.
The metadata document index with id=96533 is corrupt/invalid - ignoring it.
Error: null
java.lang.NullPointerException
    at org.fao.geonet.kernel.search.index.LuceneIndexLanguageTracker.addDocument(LuceneIndexLanguageTracker.java:210)
    at org.fao.geonet.kernel.search.index.LuceneIndexWriterFactory.addDocument(LuceneIndexWriterFactory.java:32)
    at org.fao.geonet.kernel.search.SearchManager.index(SearchManager.java:690)
    at org.fao.geonet.kernel.DataManager.indexMetadata(DataManager.java:581)
    at org.fao.geonet.kernel.DataManager$IndexMetadataTask.run(DataManager.java:399)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
I think these errors occur when the metadata is in an invalid format.
However, the indexing process should skip only those invalid records, not 90K of them. It's very strange!
Thanks and best regards,
Kai Liu
On Fri, Jun 21, 2013 at 3:14 AM, Francois Prunayre <fx.prunayre@anonymised.com> wrote:
Hi,
2013/6/17 Kai Liu <kliu.gis@anonymised.com>:
> Hi all,
>
> I am using the latest GitHub code and have a strange problem when
> searching a GeoNetwork catalogue with more than 100K records. When my
> GeoNetwork had 96K records, a search returned 96K records. After I
> added 10K more records, bringing the total to 106K, the interface and
> the CSW GetRecords response showed only 12965 records. After I deleted
> 10K records, bringing the total back to 96K, the search worked again
> and returned the exact results.
Do you have any indexing errors in the log? How many records does Luke
show when you open the index with it?
Cheers.
Francois
> I was wondering which setting I should modify to support searching
> more than 100K records.
>
> Thanks and best regards,
>
> Kai Liu
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> GeoNetwork-users mailing list
> GeoNetwork-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geonetwork-users
> GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork