[GeoNetwork-devel] CFV: Proposal to add a Lucene-Only search mode

Dear PSC,

The goal of this proposal is to add a Lucene-only search service for use
with the widget UI. It provides much better performance (10 to 20 times
faster, with better concurrency support). The proposal is available here
[1]. A patch is available here [2].

For testing, a demo website (which may be offline at times) is
available with the new service here [3] and with the current search
here [4] or [5]. Run a search and page through the results to see the
difference (increase hits per page if needed).

Looking forward to your votes.

Regards

Francois

[1] http://trac.osgeo.org/geonetwork/wiki/proposals/LuceneOnlySearch
[2] http://trac.osgeo.org/geonetwork/ticket/652
[3] http://188.165.244.186/geonetwork/apps/search/index_debug.html
[4] http://188.165.244.186/geonetwork/apps/search/index_debug_slow.html
[5] http://188.165.244.186/geonetwork/srv/en/main.home

This was something I have been planning on doing as well. Well done.

I have done performance and scalability tests and shown that I can very easily perform a denial-of-service attack.

I think this change is a requirement at some point.

I am very much in support of this and will gladly help test this out in Geocat and give back bug fixes that I find.

My 2 cents,

Jesse

On Fri, Nov 25, 2011 at 6:02 PM, Francois Prunayre <fx.prunayre@anonymised.com> wrote:

All the data continuously generated in your IT infrastructure
contains a definitive record of customers, application performance,
security threats, fraudulent activity, and more. Splunk takes this
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d


GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Hi Francois,

+1 from me - nice improvement (also the graphs in the proposal showing the perf improvements will be very useful!)

Cheers and thanks,
Simon

On 11/26/2011 04:02 AM, Francois Prunayre wrote:

Hi Francois,

This is very interesting; it looks like a big performance improvement.
Just a question.

I didn't understand where the actual XML metadata documents are stored.
If not in the relational database, are they stored in Lucene's files or
in a separate collection of XML files on disc?

Regards,

Andrew

----- Original Message ----- From: "Francois Prunayre" <fx.prunayre@anonymised.com>
To: <geonetwork-devel@lists.sourceforge.net>
Sent: Saturday, November 26, 2011 4:02 AM
Subject: [GeoNetwork-devel] CFV: Proposal to add a Lucene-Only search mode

Hi Andrew,

2011/11/30 andrew walsh <awalsh@anonymised.com>:

Hi Francois,

This is very interesting, looks like a big performance improvement.
Just a question.

I didn't understand where the actual XML metadata documents are stored.

No changes are made to the XML document storage. The proposal only stores
some additional information in index fields so that search results can be
displayed without retrieving the XML document from the DB.

Cheers.

Francois
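A toy Python sketch of the idea described above, under stated assumptions: the index keeps copies of the fields needed to render a results page (here a hypothetical `title` and `abstract`), so searching and paging never read the XML document store; only opening a full record does. `ToyCatalogue`, `IndexedRecord` and the field names are invented for illustration, and a plain token set stands in for Lucene. This is neither GeoNetwork's code nor Lucene's API.

```python
# Toy model of a "Lucene-only" results page: display fields are copied into
# the index at indexing time, so search and paging never read the XML store.
# All names here (ToyCatalogue, IndexedRecord, title/abstract) are hypothetical.
from dataclasses import dataclass, field


@dataclass
class IndexedRecord:
    uuid: str
    tokens: set[str]          # searchable terms, standing in for indexed fields
    stored: dict[str, str]    # display-only copies, standing in for stored fields


@dataclass
class ToyCatalogue:
    xml_store: dict[str, str] = field(default_factory=dict)  # stands in for the DB
    index: list[IndexedRecord] = field(default_factory=list)
    db_reads: int = 0  # counts how often the XML store is touched

    def add(self, uuid: str, title: str, abstract: str, xml: str) -> None:
        """Save the full XML, then index it with copies of the summary fields."""
        self.xml_store[uuid] = xml
        tokens = set(f"{title} {abstract}".lower().split())
        self.index.append(
            IndexedRecord(uuid, tokens, {"title": title, "abstract": abstract})
        )

    def search(self, term: str, start: int = 0, hits_per_page: int = 10):
        """Return (total hits, one page of summaries) built from the index alone."""
        matches = [r for r in self.index if term.lower() in r.tokens]
        page = matches[start:start + hits_per_page]
        return len(matches), [{"uuid": r.uuid, **r.stored} for r in page]

    def get_full_record(self, uuid: str) -> str:
        """Only opening a full record reads the XML store."""
        self.db_reads += 1
        return self.xml_store[uuid]


if __name__ == "__main__":
    cat = ToyCatalogue()
    cat.add("a1", "Rivers of Europe", "Hydrography dataset", "<xml a1/>")
    cat.add("b2", "Land cover", "Raster dataset", "<xml b2/>")
    total, page = cat.search("dataset", hits_per_page=1)
    print(total, page[0]["title"], cat.db_reads)  # 2 Rivers of Europe 0
```

The trade-off in this sketch is a larger index, since the display fields are duplicated there alongside the searchable terms.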

+1

   Ciao,
   Emanuele

On Friday 25 November 2011 at 18:02:34, Francois Prunayre wrote:

Dear PSC,

The target of this proposal is to add a lucene-only search service to
use with the widget UI which provides much better performance (10 to
20 times
faster and better concurrency support). The proposal is available here
[1]. A patch is available here [2].

For testing, a demo website (which may be offline sometimes) is
available with the new service here [3] and the current search here
[4] or [5].
Run a search and do paging to see the differences (increase hits per
page if needed).

Francois, if the target is the widget UI, then you should probably also expose it to the OpenSearch API for full-text search.

+1 for widget UI and OpenSearch API

Doug.

--
Douglas D. Nebert
Senior Advisor for Geospatial Technology, System-of-Systems Architect
FGDC Secretariat Tel/Fax:+1 503 454-6248 Cell:+1 703 459-5860

I would like to see it extended to certain CSW searches as well, if that is OK. Obviously the ones that need the full metadata can't be done, but hits and summary can be, to some degree.

Does that sound like a good idea?

Jesse

On Wed, Nov 30, 2011 at 4:54 PM, Douglas Nebert <ddnebert@anonymised.com> wrote:

Francois, if the target is the widget UI, then you should probably also
expose it to the opensearch API for full text search.

+1 for widget UI and opensearch API

Doug.

2011/12/1 Jesse Eichar <jesse.eichar@anonymised.com>:

I would like to see it extended for certain CSW searches as well if that is
ok. Obviously the ones that need the full metadata can't be done but hits
and summary can be to some degree.
Does that sound like a good idea?

Sounds good to me if you have funding to do it, Jesse! I will commit
that proposal first, probably next week, and we could improve CSW,
OpenSearch, ... later when resources are available.

Also, for the summary in results_with_summary, I have been
investigating the Lucene faceting module [1], which may be better
than our custom summary builder (and may also improve performance).
But it needs some more work.

Cheers

Francois

PS: I'll try to be on IRC next week so we can discuss that if needed.

[1] http://www.neogeo-online.net/blog/archives/1524/

Dear François,
Nice proposal. +1 from me.
Cheers,
Jeroen

On 25 nov. 2011, at 18:02, Francois Prunayre wrote:

Sorry for my delay.

+1 from me as well,

Patrizia

-----Original Message-----
From: Francois Prunayre [mailto:fx.prunayre@anonymised.com]
Sent: 25 November 2011 18:03
To: Devel geonetwork-devel@lists.sourceforge.net
Subject: [GeoNetwork-devel] CFV: Proposal to add a Lucene-Only search mode
