Hi list,
Has anyone already used Lucene to index document content?
I am currently studying the feasibility of indexing the content of the documents referenced in the metadata (<onLineSrc>).
My goal is to let users search not only the metadata but also the content of those documents.
I have already looked at the Lucene FAQ (http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-e7d23f91df094d7baeceb46b04d518dc426d7d2e). It seems that parsers already exist to extract text from the documents we want to index with Lucene.
Does anyone in the Geonetwork community have experience with this? It looks like I would need to handle at least PDF and MS Office files …
Regards.
Sylvain