Thanks for the replies, everyone. I'm working with a user who wants us to ingest one image, with associated radar data, every 5 minutes for several years into the future, so over 100,000 records per year. We can automate the process with a nested parent/child metadata hierarchy. I'm thinking of running a separate GN instance for each year and harvesting from those. I would bundle the images together if they were small enough, but the total comes out to 315 MB/day.
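For scale, here is a rough back-of-the-envelope sketch of those figures. Only the 5-minute interval and the 315 MB/day total come from the description above; the 365-day year and decimal gigabytes are simplifying assumptions.

```python
# Inputs taken from the message above; everything else is derived from them.
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 5
MB_PER_DAY = 315

images_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES   # 288 images per day
images_per_year = images_per_day * 365                 # 105,120 (non-leap year assumed)
mb_per_image = MB_PER_DAY / images_per_day             # roughly 1.09 MB per image
gb_per_year = MB_PER_DAY * 365 / 1000                  # roughly 115 GB per year (decimal GB)

print(images_per_day, images_per_year, round(mb_per_image, 2), round(gb_per_year, 1))
```

So each per-year instance would hold a little over 105,000 records.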
Does anyone have experience they would like to share with such large catalogs?
Thank You,
Jim
On 02/16/2012 04:50 AM, heikki wrote:
The TooManyClauses error happens because of the way certain queries are
constructed; it is not directly caused by the number of documents in the
index. See e.g. this explanation:
http://stackoverflow.com/questions/1534789/help-needed-figuring-out-reason-for-maxclausecount-is-set-to-1024-error.
MaxClauseCount is set to 16384 in GeoNetwork, which seems a bit arbitrary
to me; it could be set to Integer.MAX_VALUE (which is 2^31 - 1).
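To illustrate why the query shape matters more than the index size, here is a simplified model in Python. It is not Lucene's actual rewrite logic: it simply assumes, in line with the explanation linked above, that a term range query expands into one boolean clause per distinct indexed term inside the range. The function name, the timestamp format and the sample index are all hypothetical; the 54,500 figure and the 16384 limit come from the error report quoted further down in this thread.

```python
from datetime import datetime, timedelta

MAX_CLAUSE_COUNT = 16384  # limit quoted in the error message


def clauses_for_term_range(indexed_terms, lower, upper):
    """Simplified model: one clause per distinct indexed term in [lower, upper]."""
    return len({t for t in indexed_terms if lower <= t <= upper})


# Hypothetical index: 54,500 records with distinct timestamps, 5 minutes apart.
start = datetime(2012, 1, 1)
terms = [(start + timedelta(minutes=5 * i)).isoformat() for i in range(54500)]

# ISO 8601 strings sort chronologically, so plain string comparison is enough.
narrow = clauses_for_term_range(terms, "2012-01-01T00:00:00", "2012-01-31T23:59:59")
wide = clauses_for_term_range(terms, terms[0], terms[-1])

for label, n in (("one month", narrow), ("whole index", wide)):
    verdict = "TooManyClauses" if n > MAX_CLAUSE_COUNT else "ok"
    print(f"{label}: {n} clauses -> {verdict}")
```

Under this model the same index answers a one-month range without trouble but fails on a range that spans every record, which fits the idea that the limit is hit by the query, not by the index.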
Also, we're using TermRangeQuery for date ranges, where NumericRangeQuery
would probably be a better fit
(http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/NumericRangeQuery.html).
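The general idea behind numeric range queries, as I understand it from the linked documentation, is that values are also indexed at coarser precisions, so a range can be covered by a small number of prefix blocks instead of one term per value. The sketch below only illustrates that idea; it is not Lucene's algorithm. The function `prefix_blocks`, the 4-bit step and the epoch-seconds sample data are assumptions made for the example.

```python
import calendar


def prefix_blocks(lo, hi, step_bits=4):
    """Count aligned blocks (sizes 1, 16, 256, ... for step_bits=4)
    needed to cover the integer range [lo, hi] exactly."""
    blocks = 0
    while lo <= hi:
        size = 1
        # Grow the block while it stays aligned at lo and inside the range.
        while lo % (size << step_bits) == 0 and lo + (size << step_bits) - 1 <= hi:
            size <<= step_bits
        blocks += 1
        lo += size
    return blocks


# Hypothetical data: timestamps as epoch seconds, one record every 300 s,
# 54,500 records starting at 2012-01-01T00:00:00Z.
first = calendar.timegm((2012, 1, 1, 0, 0, 0))
last = first + 300 * (54500 - 1)

print(prefix_blocks(first, last))
```

In this model the block count depends on the width of the range and the step size, not on how many records fall inside it, so it stays far below a limit like 16384 even when the range spans the whole index.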
Apart from range queries, I would expect that a catalog with 100,000s or
millions of records would still be fine, though I have no performance
data on such sizes. If anyone does, please let us know.
Kind regards,
Heikki Doeleman
On Thu, Feb 16, 2012 at 2:20 PM, Victor Epitropou
<vepitrop@anonymised.com> wrote:
Well, I first investigated this aspect a few days ago, when somebody
posted a complaint about a maxClauseCount parameter in the Lucene
index (forwarding post):
On Tue, Feb 14, 2012 at 4:32 PM, <kieransun@anonymised.com> wrote:
> Hello GeoNetwork users,
>
> we are searching for time ranges in about 54500 datasets inside
the CSW and get the following error:
>
> Raised exception while searching metadata :
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount
is set to 16384
>
> How to raise it?
>
> Kind regards,
> Kieran
After that, just googling for "maxClauseCount" brought up an
interesting backstory to it. Apparently it is always hardcoded to some
"high enough" value (which, depending on who you ask, might be 16000,
32000 or 64000), and sooner or later that value proves too low.
Fortunately, it can be changed through configuration. That is the only
obvious hard-and-fast limit I could identify that comes purely from an
arbitrary constraint. Beyond that, I suppose interesting things could
happen if someone exceeds a total of 2^31 records (this might cause
32-bit integer overflows in certain software modules).
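To put the 2^31 figure in perspective, here is a quick estimate using the ingest rate described at the top of this thread (one record every 5 minutes). The 365-day year and the single-catalog assumption are mine.

```python
INT_MAX = 2**31 - 1                        # largest signed 32-bit integer
RECORDS_PER_YEAR = (24 * 60 // 5) * 365    # one record every 5 minutes, 365-day year

years_to_overflow = INT_MAX // RECORDS_PER_YEAR
print(years_to_overflow)
```

At that rate a single feed would need on the order of twenty thousand years to reach the 32-bit boundary, so this limit is not a practical concern here.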
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
James Long
International Arctic Research Center
University of Alaska Fairbanks
jlong|at|alaska.edu
(907) 474-2440
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%