[GeoNetwork-devel] [GeoNetwork opensource Developer website] #1127: recommendations on optimizing lucene indexes needed

#1127: recommendations on optimizing lucene indexes needed
---------------------+------------------------------------------------------
Reporter: plcking | Owner: geonetwork-devel@…
     Type: defect | Status: new
Priority: major | Milestone: v2.9.0
Component: General | Version: v2.6.4
Keywords: |
---------------------+------------------------------------------------------
I have 400K records (ISO19115) loaded into GeoNetwork. I have implemented
all the recommendations in
http://geonetwork-opensource.org/manuals/trunk/users/admin/advanced-configuration/index.html.
I am finding that the following CSW search (below) takes longer than
60 seconds at times. I am assuming that optimizing the Lucene indexes will
help, but I am worried about the amount of time that optimizing will
consume. Should I use lukeall-1.0.1.jar, or is there a way to call a
servlet directly (say, using wget) to invoke the optimization? Can I make
CSW calls with a reasonable response time during the optimization?

<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
service="CSW" version="2.0.2" resultType="hits"
outputSchema="http://www.isotc211.org/2005/gmd" maxRecords="10">
   <csw:Query typeNames="dataset,application,datasetcollection,service">
     <csw:ElementSetName>summary</csw:ElementSetName>
     <csw:Constraint version="1.1.0">
       <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc"
xmlns:gml="http://www.opengis.net/gml">
         <ogc:And>
           <ogc:PropertyIsLike wildCard="*" singleChar="#" escapeChar="\">
             <ogc:PropertyName>dc:title</ogc:PropertyName>
               <ogc:Literal>%Radarsat-2%</ogc:Literal>
           </ogc:PropertyIsLike>
           <ogc:PropertyIsGreaterThanOrEqualTo>
             <ogc:PropertyName>TempExtent_begin</ogc:PropertyName>
             <ogc:Literal>2012-10-01t00:00:00Z</ogc:Literal>
           </ogc:PropertyIsGreaterThanOrEqualTo>
           <ogc:PropertyIsLessThanOrEqualTo>
             <ogc:PropertyName>TempExtent_end</ogc:PropertyName>
             <ogc:Literal>2012-10-31t07:30:00Z</ogc:Literal>
           </ogc:PropertyIsLessThanOrEqualTo>
         </ogc:And>
       </ogc:Filter>
     </csw:Constraint>
   </csw:Query>
</csw:GetRecords>
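
Since the slowness only shows up "at times", it may help to time the same request repeatedly before and after any change. What follows is only a minimal sketch, not a GeoNetwork tool: the endpoint in CSW_URL, the file name getrecords.xml, and the helper names post_csw and time_calls are all assumptions made for illustration.

```python
import statistics
import time
import urllib.request

# Hypothetical endpoint: adjust host, port and service path to your installation.
CSW_URL = "http://localhost:8080/geonetwork/srv/en/csw"


def post_csw(body: bytes, url: str = CSW_URL, timeout: float = 120.0) -> bytes:
    """POST a CSW request document and return the raw response body."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/xml"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def time_calls(call, repeats: int = 5) -> dict:
    """Run call() `repeats` times and summarise the wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Example use, assuming the GetRecords request above is saved as getrecords.xml:
#   payload = open("getrecords.xml", "rb").read()
#   print(time_calls(lambda: post_csw(payload)))
```

A large gap between the median and the maximum would suggest intermittent stalls rather than a uniformly slow query, which is worth knowing before deciding whether to optimize the index.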

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127>
GeoNetwork opensource Developer website <http://sourceforge.net/projects/geonetwork/>
GeoNetwork opensource is a standards based, Free and Open Source catalog application to manage spatially referenced resources through the web. It provides powerful metadata editing and search functions as well as an embedded interactive web map viewer. This website contains information related to the development of the software.

Comment(by heikki):

I would be careful about assuming that optimizing will help at all, rather
than make things worse (see
http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you).

Maybe we should consider moving to Lucene 4.0 (see e.g.
http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).

It may also be helpful to use a profiler like YourKit first, to
determine whether your slowness really comes from Lucene.

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127#comment:1>

Comment(by fxp):

I agree with Heikki: you should first check where your slowness is. Also,
applying the same approach as in
http://trac.osgeo.org/geonetwork/wiki/proposals/LuceneOnlySearch to the CSW
service could probably help (but it may not be that easy if you need
ISO19139 as output).

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127#comment:2>

Comment(by plcking):

Replying to [comment:2 fxp]:
> I agree with Heikki: you should first check where your slowness is. Also,
> applying the same approach as in
> http://trac.osgeo.org/geonetwork/wiki/proposals/LuceneOnlySearch to the CSW
> service could probably help (but it may not be that easy if you need
> ISO19139 as output).

I'm in the process of installing YourKit now....Pat

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127#comment:3>

Comment(by plcking):

It looks like Lucene indexing search routines are the culprit.
I'm assuming that you have YourKit installed, so I would like to send you
CPU and memory snapshots of the query. I can't upload the snapshots to
this site because of the file size (> 4 MB). Do you have an alternative?

Pat

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127#comment:4>

Comment(by jesseeichar):

Do you have an FTP server we can download them from?

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127#comment:5>

Comment(by plcking):

Replying to [comment:5 jesseeichar]:
> Do you have an FTP server we can download them from?

Please try the following URLs:

http://ceocat.ccrs.nrcan.gc.ca/memory.snapshot
http://ceocat.ccrs.nrcan.gc.ca/cpu.snapshot

Pat

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127#comment:6>

Comment(by jesseeichar):

I have downloaded them, but I am really busy right now, so I am not
completely sure when I will have time to analyze them. Perhaps you can
get the queries and run them with Luke to see what can be done to
speed them up?

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1127#comment:7>