[GeoNetwork-devel] Using Lucene to index document content

Hi list,

Has anyone already used Lucene to index document content?
I am currently studying the feasibility of indexing the content of documents referenced in the metadata (<onLineSrc>).

My goal is to let users search not only the metadata but also the content of the documents.
I have already looked at the Lucene FAQ (http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-e7d23f91df094d7baeceb46b04d518dc426d7d2e). It seems that parsers already exist to extract text from the documents we want to index with Lucene.

Does anyone in the GeoNetwork community have experience with this? Apparently I would need to handle at least PDF and MS Office files.

Regards.
Sylvain