Hi Guys
Is it possible to create a more ‘threaded’ version of the Lucene indexer? I have a lot of grunt to throw at it, but it barely uses more than one CPU on the box. The database is on a separate box, and it looks underutilised too. A flood of threads may be the answer here.
A config option that lets us set the thread pool size (defaulting to 1) should speed the process up a lot.
Also, Lucene 3.3 is out, and it seems to have a lot of improvements over the 2.9 (3.0) series. Has anyone had a look at it?
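To make the idea concrete, here is a rough sketch of what such an option could look like. It is only an illustration: the property name `indexer.threads`, the class `IndexerPool` and the per-record work are all made up for this example and are not existing GeoNetwork settings or code. The real per-record task would load the metadata and add it to the index.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexerPool {
    // Hypothetical property name, not an existing GeoNetwork setting.
    static final String THREADS_PROPERTY = "indexer.threads";

    // Reads the pool size from a system property; defaults to 1 (single-threaded).
    static int configuredThreads() {
        return Math.max(1, Integer.getInteger(THREADS_PROPERTY, 1));
    }

    // Submits one task per record to a fixed-size pool and waits for completion.
    // Returns the number of records processed.
    static int indexAll(List<String> recordIds, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger indexed = new AtomicInteger();
        for (final String id : recordIds) {
            pool.execute(() -> {
                // Placeholder for the real work: fetch record 'id' from the
                // database and hand the resulting document to the index writer.
                indexed.incrementAndGet();
            });
        }
        pool.shutdown();                          // accept no new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // block until queued tasks finish
        return indexed.get();
    }
}
```

With the default of 1 the behaviour would stay the same as today, and something like `-Dindexer.threads=4` would opt in to parallel indexing.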
Regards
Terry Rankine
The 3.0.2 IndexWriter (and the GeoNetwork IndexWriterFactory) is thread safe, so I tried threading the batch import metadata indexer. Results were nice: with Postgres, load time for 20,000-odd records went from approx. 1200 secs with a single thread to approx. 400 secs with four threads (this was running on a 2-core machine with a solid state disk). Utilization in perf monitor looked very good, machine fans on high too :-).
Unfortunately, some extra steps will have to be taken to avoid problems with some of our less capable databases (e.g. McKoi and H2, although I think H2 could be configured around this), as they seem to stall/deadlock when multiple threads are used to interact with them.
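One possible "extra step", sketched here purely as an assumption and not as what the indexer actually does, would be to fall back to a single thread whenever the configured database is one of the ones that stall. The class `IndexThreadPolicy` and its method are hypothetical, and matching on the JDBC URL prefix is just one simple way to detect the database.

```java
import java.util.Locale;

public class IndexThreadPolicy {
    // Returns the number of indexing threads to use for the given JDBC URL.
    // Databases seen to stall/deadlock under concurrent access (H2, McKoi)
    // are forced back to one thread; others get the requested count.
    static int threadsFor(String jdbcUrl, int requested) {
        String url = jdbcUrl.toLowerCase(Locale.ROOT);
        boolean fragile = url.startsWith("jdbc:h2:") || url.startsWith("jdbc:mckoi:");
        return fragile ? 1 : Math.max(1, requested);
    }
}
```

For example, `threadsFor("jdbc:postgresql://dbhost/geonetwork", 4)` would give 4, while an H2 or McKoi URL would give 1 regardless of the requested value.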
Cheers,
Simon
________________________________________
From: Terry.Rankine@anonymised.com [Terry.Rankine@anonymised.com]
Sent: Thursday, 28 July 2011 12:31 PM
To: geonetwork-devel@lists.sourceforge.net
Subject: [ExternalEmail] [GeoNetwork-devel] lucene indexer