[GeoNetwork-devel] Challenges of Geonetwork CSW performance for concurrent requests

Hi all,
We set up a clearinghouse based on GeoNetwork opensource 2.4.3 (http://clearinghouse.cisc.gmu.edu/geonetwork). About 26K records have been ingested into the clearinghouse.

When a single CSW request is sent to the clearinghouse, the response takes 2.6 seconds.

I then tested the performance with 100 CSW requests sent to the clearinghouse concurrently, and the average response time was 83 seconds.

The average response time (83 s) is much longer than for a single request. Is this normal?
Thanks,

Kai
Joint Center for Intelligent Spatial Computing
703-395-2337

Hi Kai

Can you provide more information about the server config and the requests sent?

I tried with JMeter, using this request against the server you provided:

<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body>
    <csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
        xmlns:ogc="http://www.opengis.net/ogc" maxRecords="10"
        outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2"
        resultType="results" service="CSW" version="2.0.2">
      <csw:Query typeNames="csw:Record">
        <csw:ElementSetName>full</csw:ElementSetName>
        <csw:Constraint version="1.1.0">
          <ogc:Filter xmlns:gml="http://www.opengis.net/gml">
            <ogc:PropertyIsLike escape="\" singleChar="_" wildCard="%">
              <ogc:PropertyName>AnyText</ogc:PropertyName>
              <ogc:Literal>%water%</ogc:Literal>
            </ogc:PropertyIsLike>
          </ogc:Filter>
        </csw:Constraint>
      </csw:Query>
    </csw:GetRecords>
  </env:Body>
</env:Envelope>
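
For anyone who wants to replay this request outside JMeter, here is a minimal Python sketch that wraps the same envelope in an HTTP POST. The endpoint URL and the `application/soap+xml` content type are assumptions (the JMeter settings actually used are not given), and `build_request` is just an illustrative helper name. Nothing is sent over the network until the returned object is passed to `urllib.request.urlopen`.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint; adjust to wherever the CSW service is actually published.
CSW_ENDPOINT = "http://clearinghouse.cisc.gmu.edu/srv/en/csw"

# Raw string so that the backslash in escape="\" is kept literally.
SOAP_ENVELOPE = r"""<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body>
    <csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
        xmlns:ogc="http://www.opengis.net/ogc" maxRecords="10"
        outputFormat="application/xml"
        outputSchema="http://www.opengis.net/cat/csw/2.0.2"
        resultType="results" service="CSW" version="2.0.2">
      <csw:Query typeNames="csw:Record">
        <csw:ElementSetName>full</csw:ElementSetName>
        <csw:Constraint version="1.1.0">
          <ogc:Filter xmlns:gml="http://www.opengis.net/gml">
            <ogc:PropertyIsLike escape="\" singleChar="_" wildCard="%">
              <ogc:PropertyName>AnyText</ogc:PropertyName>
              <ogc:Literal>%water%</ogc:Literal>
            </ogc:PropertyIsLike>
          </ogc:Filter>
        </csw:Constraint>
      </csw:Query>
    </csw:GetRecords>
  </env:Body>
</env:Envelope>
"""


def build_request(endpoint=CSW_ENDPOINT):
    """Return a POST request carrying the GetRecords envelope (not yet sent)."""
    body = SOAP_ENVELOPE.encode("utf-8")
    # Fail early with a ParseError if the envelope is not well-formed XML.
    ET.fromstring(body)
    return urllib.request.Request(
        endpoint,
        data=body,
        # Content type assumed for SOAP 1.2; the server may accept others.
        headers={"Content-Type": "application/soap+xml; charset=UTF-8"},
        method="POST",
    )
```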

And got these results:

1 thread/40 repeats: request average 0.57 s
20 threads/10 s ramp period/10 repeats: request average 1.3 s
100 threads/10 s ramp period/10 repeats: request average 8 s

The last is a bit high, but not as high as 83 s. If you can provide the queries you used, that would be helpful.
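
As a rough reading of these numbers, the sketch below converts each run into an approximate throughput. It assumes a closed-loop test in which every thread sends its next request as soon as the previous one returns, with no think time and ignoring the ramp-up period, so throughput is roughly threads divided by average response time. That model is an assumption about the JMeter setup, not something measured.

```python
# (threads, average response time in seconds) from the three runs above.
RUNS = [(1, 0.57), (20, 1.3), (100, 8.0)]


def throughput(threads, avg_response_s):
    """Approximate requests per second for a closed-loop load test."""
    return threads / avg_response_s


for threads, avg in RUNS:
    print(f"{threads:>3} threads: {throughput(threads, avg):.1f} requests/s")
```

Under that assumption the three runs come to roughly 1.8, 15.4 and 12.5 requests/s, which would suggest the server is already saturated somewhere between 20 and 100 threads and that the extra threads mostly add queueing time.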

Also, for version 2.6 (available in August), some improvements to the Lucene indexes have been made, so search times should hopefully benefit.

Regards,
Jose Garcia

On Wed, Jun 23, 2010 at 10:29 PM, Kai Liu <kliu4@anonymised.com> wrote:

> Hi all,
> We set up a clearinghouse based on GeoNetwork opensource 2.4.3 (http://clearinghouse.cisc.gmu.edu/geonetwork). About 26K records have been ingested into the clearinghouse.
>
> When a single CSW request is sent to the clearinghouse, the response takes 2.6 seconds.
>
> I then tested the performance with 100 CSW requests sent to the clearinghouse concurrently, and the average response time was 83 seconds.
>
> The average response time (83 s) is much longer than for a single request. Is this normal?
> Thanks,
>
> Kai
> Joint Center for Intelligent Spatial Computing
> 703-395-2337



GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

> Can you provide more information about the server config and the
> requests send?

We had stopped GeoNetwork just now for another test.
Could you please run that test again?
Our server config is as follows:
CPU: 2*4 2.33G,
Memory: 16G,
OS: RED HAT Enterprise 4.0
Java version: 1.6.0_20

> the last is a bit high, but not as high as 83 s. If you can provide
> the queries used should be helpful.

I sent "GET" requests like the ones below:
http://clearinghouse.cisc.gmu.edu/srv/en/csw?Service=CSW&Version=2.0.2&Request=GetCapabilities

http://clearinghouse.cisc.gmu.edu/srv/en/csw?Service=CSW&Version=2.0.2&Request=GetRecordById&ID=USGS_Map_MF-2337
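
A minimal Python sketch of how GET requests like these could be built and timed under concurrency. It is not the client actually used for the 83 s measurement, which is not described here; the helper names (`csw_url`, `average_response_time`) and the 120 s timeout are illustrative assumptions. The fetch function is passed in as a parameter so the timing logic can be tried without touching a real server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://clearinghouse.cisc.gmu.edu/srv/en/csw"


def csw_url(request, **extra):
    """Build a CSW key-value-pair GET URL for the given request type."""
    params = {"Service": "CSW", "Version": "2.0.2", "Request": request, **extra}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch(url):
    """Download the full response body (timeout value is an assumption)."""
    with urlopen(url, timeout=120) as response:
        response.read()


def timed(url, fetch_fn):
    """Return the wall-clock seconds one call to fetch_fn(url) takes."""
    start = time.perf_counter()
    fetch_fn(url)
    return time.perf_counter() - start


def average_response_time(urls, concurrency, fetch_fn=fetch):
    """Issue all urls using `concurrency` worker threads; return the mean time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(lambda u: timed(u, fetch_fn), urls))
    return mean(durations)
```

For example, `average_response_time([csw_url("GetCapabilities")] * 100, concurrency=100)` would send 100 GetCapabilities requests at once and return their average response time in seconds.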

> Also for 2.6 version (available in August) some improvements in lucene
> indexes have been done, so hopefully the search times will benefit.

I used GeoNetwork (6059) from trunk, and the Lucene version is 2.9.2.

Thanks and Best Regards,

Kai

Hi Kai

The server seems really good.

I ran a small test with your requests: 1 thread running a GetCapabilities and a GetRecordById, repeated 10 times.

The time taken by the GetCapabilities request is surprising: about 2.8 s, compared to about 0.3 s for GetRecordById.

I tried the GetCapabilities request on other servers running 2.4.3 and 2.5, and there it takes about 0.1 s.

The GetCapabilities response time seems to depend on the number of metadata records in the catalog, which are used to fill the ows:Keywords section. That could explain the very long time it takes on your server. I'll check how it is implemented to try to optimize it.

One more question: do you run GeoNetwork in Tomcat or Jetty?

If you are running in Tomcat, check this: http://osgeo-org.1803224.n2.nabble.com/Re-GeoNetwork-users-Java-updates-td4935064.html to apply the fixes for Saxon and JVM 1.6.0_20 that are applied by default when running in Jetty.

Also, please check the memory configuration for the JVM. The Jetty defaults, at least, are really low for the machine configuration you have (-Xms48m -Xmx512m).

Regards,
Jose Garcia

Hi Kai and Jose,

Collating the keywords by frequency for GetCapabilities can take a while if there are lots of keywords in your metadata (and I think the GCMD records Kai was using earlier had lots of keywords in each record!) and many thousands of metadata records. Just looking at the GetDomain.handlePropertyName method, it looks like we could speed things up in two ways: use a field selector to lighten the Lucene document retrieval load, and limit the maximum number of records processed in a GetDomain operation (with a user-configurable parameter). This is much the same as what we already do when collating keywords by frequency for a search.
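
To make the idea concrete, here is a plain Python stand-in, not the GetDomain code (which is Java on top of Lucene). Records are modelled as dictionaries, `keyword_frequencies`, `max_records` and the `"keyword"` field name are all hypothetical, and reading a single key per record only loosely mirrors what a Lucene field selector would do. The point is just that the work is bounded by the cap rather than by the size of the catalog.

```python
from collections import Counter
from itertools import islice


def keyword_frequencies(records, max_records=1000, field="keyword"):
    """Count keyword occurrences over at most `max_records` records.

    `records` is any iterable of dict-like records; only `field` is read
    from each one, and iteration stops once the cap is reached.
    """
    counts = Counter()
    for record in islice(records, max_records):
        # Touch only the field we need, leaving the rest of the record alone.
        counts.update(record.get(field, ()))
    # Most frequent keywords first.
    return counts.most_common()
```

The trade-off is that with a cap in place the frequencies become an estimate based on the first `max_records` records, which is why making it user-configurable seems sensible.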

Cheers,
Simon