[GeoNetwork-users] Geosource/Geonetwork robustness

Has anyone tested the robustness of GéoSource v2.1 or GeoNetwork when
managing many metadata records (20,000 or 30,000)?

I think GéoSource tries and fails to generate the Lucene index (I
waited over 40 minutes...).
I insert metadata directly into PostgreSQL without using the entry forms,
so the index has to be generated.

Has anyone experienced this problem?

Can GeoNetwork handle more metadata records than GéoSource?
Could using parent/child metadata records solve this problem?

--
View this message in context: http://n2.nabble.com/Geosource-Geonetwork-robustness-tp3092808p3092808.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

As a test, I put over 50,000 records into Geonetwork (caveat: these were just duplicates of a dozen or so different records). It indexed and searched them with very little trouble. As the numbers got into the tens of thousands, the Search Results page rendered slowly (10-20 seconds) -- probably because it uses XSLT on a very big XML stream. But overall, quite usable, especially with targeted searches returning only a few results.

--Rich

Richard Fozzard, Computer Scientist
  Geospatial Metadata at NGDC: http://www.ngdc.noaa.gov/metadata

Cooperative Institute for Research in Environmental Sciences (CIRES)
Univ. Colorado & NOAA National Geophysical Data Center, Enterprise Data Systems 325 S. Broadway, Skaggs 1B-305, Boulder, CO 80305
Office: 303-497-6487, Cell: 303-579-5615, Email: richard.fozzard@anonymised.com

af65 said the following on 06/17/2009 07:33 AM:
[...]

What do you mean by "these were just duplicates of a dozen or so different
records"?
Are you talking about parent/child metadata?
Perhaps my problem comes from having 20,000 identical records (except for one
attribute)?
Would it work better if I created one parent record and 20,000 children?

Alex

_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

--
View this message in context: http://n2.nabble.com/Geosource-Geonetwork-robustness-tp3092808p3095120.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Hello Alex

2009/6/17 af65 <af65@anonymised.com>:

> Has anyone tested the robustness of GéoSource v2.1 or GeoNetwork when
> managing many metadata records (20,000 or 30,000)?

As mentioned by Richard, some nodes run well with quite a lot
of records. You could also use GAST to load thousands of metadata
records into your node for testing.

> I think GéoSource tries and fails to generate the Lucene index (I
> waited over 40 minutes...).

That sounds like quite a long time :) Any errors or info in the log files?
In GéoSource we noticed an issue when harvesting records from
the French national metadata catalogue: it uses gmx:Anchor
elements with xlinks that the xlink resolver available in
GéoSource does not support. The next release of GéoSource will fix that
issue (coming soon, next week). We've also been adding a cache mechanism to
the xlink resolver to speed things up. You should try the next release.
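
The cache mechanism mentioned above can be illustrated with a minimal Python sketch that memoizes resolution per URL with `functools.lru_cache`. `fetch_fragment` and the example.org URLs are hypothetical placeholders. GéoSource's actual resolver is Java and is not shown in this thread:

```python
from functools import lru_cache

# Counts how many times the (hypothetical) remote fetch actually runs.
fetch_calls = {"count": 0}


def fetch_fragment(href: str) -> str:
    """Hypothetical stand-in for the network request that retrieves the XML
    fragment an xlink:href points to (e.g. a gmx:Anchor target)."""
    fetch_calls["count"] += 1
    return f"<fragment href='{href}'/>"


@lru_cache(maxsize=1024)
def resolve_xlink(href: str) -> str:
    """Resolve an xlink:href, reusing the cached result for a repeated URL."""
    return fetch_fragment(href)


if __name__ == "__main__":
    # 1000 records pointing at the same thesaurus entry, plus one distinct link.
    hrefs = ["http://example.org/thesaurus#a"] * 1000
    hrefs.append("http://example.org/thesaurus#b")
    for href in hrefs:
        resolve_xlink(href)
    print(fetch_calls["count"])  # 2
```

When thousands of records share the same few xlink targets, the number of remote requests drops to the number of distinct URLs. A production cache would also need an expiry policy so that changed targets are eventually picked up.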

> I insert metadata directly into PostgreSQL without using the entry forms,
> so the index has to be generated.

> Has anyone experienced this problem?

> Can GeoNetwork handle more metadata records than GéoSource?

I discussed that point with Heikki last week: we could probably improve
search speed by storing some more information in the index and using it
accordingly. Index updates could also be improved. Maybe a point to be
discussed next week in Bolsena.

> Could using parent/child metadata records solve this problem?

I'm not sure what you're trying to achieve?

Ciao.
Francois

When generating these records, I only knew a single x/y position, so I
used it for the bbox (xmin = xmax and ymin = ymax).
Could the fact that I didn't know the opposite corner, and repeated the
same point, have caused this problem?
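
Nobody in the thread confirms whether point-sized boxes affect indexing speed. To check how many records are affected before importing, you could use a minimal sketch built around a hypothetical `BBox` type. The `padded` helper and its `eps` value are illustrative assumptions, not a GeoNetwork requirement:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical bounding-box type: xmin, ymin, xmax, ymax in the record's CRS."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def is_point(self) -> bool:
        """True when the box collapses to one point (zero width and zero height)."""
        return self.xmin == self.xmax and self.ymin == self.ymax

    def padded(self, eps: float) -> "BBox":
        """Return a copy grown by eps on every side; eps is an arbitrary choice."""
        return BBox(self.xmin - eps, self.ymin - eps,
                    self.xmax + eps, self.ymax + eps)


def count_point_boxes(boxes) -> int:
    """Count how many boxes in an iterable are degenerate points."""
    return sum(1 for box in boxes if box.is_point())


if __name__ == "__main__":
    boxes = [BBox(2.35, 48.85, 2.35, 48.85), BBox(-5.0, 41.0, 9.6, 51.1)]
    print(count_point_boxes(boxes))  # 1
    print(boxes[0].padded(0.001).is_point())  # False
```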

Thanks


--
View this message in context: http://n2.nabble.com/Geosource-Geonetwork-robustness-tp3092808p3114518.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

I ran some tests because I noticed that creating the Lucene index took
a very long time.

For these tests, I used the GAST import procedure (Corine Landcover
example):
1 metadata record: 27 s
100 metadata records: 97 s
1000 metadata records: 10 min 47 s!!
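
These three timings can be read as a marginal cost per extra record. The following is a small arithmetic sketch. It assumes the cost stays roughly linear, which three points cannot prove. These are also end-to-end GAST import times, not indexing alone:

```python
# Timings reported above for the GAST import: (number of records, seconds).
TIMINGS = [(1, 27), (100, 97), (1000, 10 * 60 + 47)]


def marginal_cost(a, b):
    """Seconds per additional record between two (records, seconds) measurements."""
    (n1, t1), (n2, t2) = a, b
    return (t2 - t1) / (n2 - n1)


def extrapolate(n, a=TIMINGS[1], b=TIMINGS[2]):
    """Linear extrapolation from two measurements; assumes cost stays linear."""
    return a[1] + marginal_cost(a, b) * (n - a[0])


if __name__ == "__main__":
    for a, b in zip(TIMINGS, TIMINGS[1:]):
        print(f"{a[0]} -> {b[0]} records: {marginal_cost(a, b):.2f} s/record")
    print(f"20000 records: about {extrapolate(20000) / 3600:.1f} h")
```

With these numbers the marginal cost is about 0.71 s/record between 1 and 100 records and about 0.61 s/record between 100 and 1000. Linear growth would put 20,000 records at roughly 3.4 hours.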

These tests were run on a desktop PC with 1 GB of RAM. A dedicated server
would probably build the index faster, but these results are still
surprising...

Am I the only one to have noticed this problem? If so, do you think my
hardware is faulty (although I have not noticed any significant slowdowns
with other applications)? Could other people run this test so we can
compare the timings?

Thanks


--
View this message in context: http://n2.nabble.com/Geosource-Geonetwork-robustness-tp3092808p3261333.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Hello,

2009/7/15 af65 <af65@anonymised.com>:

> I ran some tests because I noticed that creating the Lucene index took
> a very long time.
>
> For these tests, I used the GAST import procedure (Corine Landcover
> example):
> 1 metadata record: 27 s
> 100 metadata records: 97 s
> 1000 metadata records: 10 min 47 s!!

A GAST import measures more than indexing time. GAST connects to the
catalogue and sends the document, which is indexed, parsed and stored in
the database. When importing a MEF file, it also sets privileges,
thumbnails and data according to the ZIP file content.
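
To see whether indexing or one of the other steps listed here dominates, one option is to time each phase separately. The following is a minimal Python sketch. The phase names and step functions are hypothetical placeholders for whatever your own import pipeline does:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(phase, totals):
    """Add the wall-clock time spent inside the block to totals[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[phase] = totals.get(phase, 0.0) + (time.perf_counter() - start)


def import_records(records, steps):
    """Run every step on every record and return seconds spent per phase.

    `steps` maps a phase name to a callable taking one record. The phase
    names and callables are hypothetical placeholders, not GAST functions.
    """
    totals = {}
    for record in records:
        for phase, step in steps.items():
            with timed(phase, totals):
                step(record)
    return totals


if __name__ == "__main__":
    steps = {
        "send": lambda rec: time.sleep(0.001),  # stand-in for network transfer
        "parse": lambda rec: rec.strip(),
        "index": lambda rec: rec.split(),
        "store": lambda rec: None,
    }
    totals = import_records(["<md/>"] * 100, steps)
    for phase, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{phase:>6}: {seconds:.3f} s")
```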

> These tests were run on a desktop PC with 1 GB of RAM. A dedicated server
> would probably build the index faster, but these results are still
> surprising...
>
> Am I the only one to have noticed this problem? If so, do you think my
> hardware is faulty (although I have not noticed any significant slowdowns
> with other applications)? Could other people run this test so we can
> compare the timings?

GeoNetwork and GéoSource do not usually index every record at the
same time (only on update/save operations and during harvesting). A
complete index rebuild is forced only if your Lucene index has crashed or
is missing. In 2.4, Lucene index creation is probably a bit slower because
of the spatial index, which was not there in 2.2.
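
The behaviour described here can be summarised as a small decision function. This is a simplified model written for illustration, and the event names are hypothetical; it is not GeoNetwork or GéoSource code. It makes explicit why rows written straight into PostgreSQL stay unindexed until a full rebuild:

```python
from enum import Enum


class IndexAction(Enum):
    NOTHING = "nothing"
    UPDATE_RECORDS = "update the affected records"
    FULL_REBUILD = "rebuild the whole index"


def index_action(event: str, index_present: bool, index_healthy: bool) -> IndexAction:
    """Simplified model of when the catalogue touches its Lucene index.

    event: "startup", "save" (record created or updated through the
    application), "harvest", or "sql_insert" (row written straight to the
    database, bypassing the application). All names are hypothetical.
    """
    if event == "startup":
        if not index_present or not index_healthy:
            return IndexAction.FULL_REBUILD
        return IndexAction.NOTHING
    if event in ("save", "harvest"):
        return IndexAction.UPDATE_RECORDS
    # Assumption: the application never sees direct database writes,
    # so nothing is indexed for them until the next full rebuild.
    return IndexAction.NOTHING


if __name__ == "__main__":
    print(index_action("sql_insert", True, True))  # IndexAction.NOTHING
```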

HTH. Francois

The timings were taken from the log files...
So I think I have excluded the other delays related to the insert/update
(privileges...).

When I run the import module, the records are new, so the index is
automatically recalculated, right? (I'm using GéoSource v2.2.)

Don't you find the times listed very long (about 1 hour for 10,000
records!!)?
What could I change?

I manage the metadata from another system that imports the data directly
into the GéoSource PostgreSQL database. As these are new records, I have to
recalculate the index... and it takes hours!!

Can you suggest another solution? Can I disable these indexes?

Thanks


--
View this message in context: http://n2.nabble.com/Geosource-Geonetwork-robustness-tp3092808p3263144.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Hi,
The subject of the message does not really seem to match the problem ;)
As for the time it takes to rebuild the indexes, that seems very long to
me. I used to rebuild the indexes for about 20,000 records in less than
10 minutes.
You can indeed just remove the index files; on startup the system will
automatically rebuild the indexes for you. v2.4 offers a special function
to trigger this rebuild in the Administration interface.
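
The first suggestion here (remove the index files and let the system rebuild them at startup) can be done more cautiously by moving the directory aside instead of deleting it. The following is a minimal Python sketch. The function name is hypothetical, and you must supply the index path for your own installation because the thread does not give it:

```python
import shutil
import tempfile
import time
from pathlib import Path


def move_index_aside(index_dir: str) -> Path:
    """Rename the index directory to a timestamped backup so the service
    finds no index on its next start and rebuilds it.

    Stop the service first. The location of index_dir depends on the
    installation and is an assumption you must supply.
    """
    src = Path(index_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no index directory at {src}")
    backup = src.with_name(f"{src.name}.bak-{time.strftime('%Y%m%d-%H%M%S')}")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.move(str(src), str(backup))
    return backup


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not on a real installation.
    demo = Path(tempfile.mkdtemp()) / "index"
    demo.mkdir()
    (demo / "segments").write_text("dummy")
    print(move_index_aside(str(demo)))
```

If the rebuilt index turns out to be no faster, the backup can simply be renamed back while the service is stopped.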
Ciao,
Jeroen

On Wed, 15 Jul 2009 06:20:57 -0700 (PDT), af65 <af65@anonymised.com> wrote:
[...]

> The subject of the message does not really seem to match the problem ;)

Yes ;) but at first I thought the problem was an inability to handle a lot
of data...

I know about the option to trigger the rebuild in the Administration
interface (it is already present in 2.2), and I also tried deleting all the
files before starting the service...

But the rebuild takes a very long time...

On my computer, rebuilding the index for 20,000 records would take about
80 minutes... compared with less than 10 for you!! What is my problem? Is my
computer that bad? Could you give me your configuration? Do you think a
dedicated server would speed up the rebuild?
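
On the same scale, the two rebuild times differ by a factor of about eight. Jeroen's "less than 10 minutes" is an upper bound, so the real gap is at least that large:

```latex
\text{af65: } \frac{20\,000\ \text{records}}{80 \times 60\ \text{s}} \approx 4.2\ \text{records/s},
\qquad
\text{Jeroen: } \frac{20\,000\ \text{records}}{10 \times 60\ \text{s}} \approx 33\ \text{records/s},
\qquad
\frac{80\ \text{min}}{10\ \text{min}} = 8
```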


--
View this message in context: http://n2.nabble.com/Geosource-Geonetwork-robustness-tp3092808p3263503.html
Sent from the geonetwork-users mailing list archive at Nabble.com.