[GeoNetwork-users] Geonetwork's data folder growing out of proportion

I wonder if anyone has noticed how quickly the data folder in the geonetwork folder grows over time:
our installation has been online for just under a year and hosts fewer than a dozen metadata records, yet the
data folder has grown to contain literally tens of thousands of files and folders (and subfolders), for a
total size of 2 GB.

I realize that some of the hosted data is actually stored in those folders with increasing serial numbers under
GEONETWORK_DIR\data (e.g. 01231-21345); however, there's no justification for such an ever-growing space
hog, especially when no new data has been explicitly added. Regardless, there seems to be a new folder
(or more) being added every day.

So far, I've seen no obvious functionality or guideline as to what the purpose of these folders is (considering that
metadata is supposed to be stored in the relational database) and which of these folders are safe to delete and
which are not. Some are obviously just cached thumbnails, but others seem to contain actual attachments to
metadata content (we have some downloads and custom thumbnails that go with certain metadata records, and
those surely are not in the DB).

Having tens of thousands of files is really problematic with any filesystem, and just makes backup slower.
Disk space usage and fragmentation over an extended period of time is also an issue, as there seems to be no upper
bound to the growth, so given enough time geonetwork will eventually consume all disk space available or flood
the filesystem with hundreds of thousands if not millions of files.
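
To check how much of the growth really is cached images, a small read-only report can help. The following is only a sketch: it walks a directory tree and totals file count and size per extension. The function names are made up for illustration and nothing here is part of GeoNetwork itself.

```python
import os
from collections import defaultdict


def summarize_by_extension(root):
    """Return {extension: (file_count, total_bytes)} for every file under root."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Normalize case so ".PNG" and ".png" are counted together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            counts[ext] += 1
            sizes[ext] += size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


def print_report(root):
    """Print one line per extension, largest total size first."""
    summary = summarize_by_extension(root)
    for ext, (count, total) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
        print(f"{ext:10s} {count:8d} files {total / 1024 / 1024:10.1f} MB")
```

Calling print_report on the data folder (for example print_report(r"GEONETWORK_DIR\data") with the real path substituted) only reads the tree and deletes nothing.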

Is keeping all that data around (most seem to be cached .png files anyway) really necessary?
Is there some function to clean up automatically, or a configuration option to limit the proliferation?

BTW, I'm using Geonetwork 2.4.3. Has this behavior been significantly changed in the 2.6.x branch?

Thanks in advance for any replies,
Victor Epitropou

Something that got fixed early on in 2.6:
- Fix #250: Removing old thumbnail for OGC harvester.

We suffered this too. We had 44,000 thumbnails from our 9 harvesters. It is fixed in 2.6 - but I never worked out a good way of 'cleaning out the leftovers' (ie - deleting the unused metadata dirs.)

If you do - please let me know.

Terry
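
One possible approach to the leftovers, sketched under assumptions that should be verified against a real installation first: that each numeric-range folder under the data directory contains one subfolder per metadata record, named after the record's numeric id, and that the list of ids still present in the database has been exported separately. The function name is hypothetical. The sketch only lists candidate folders and deletes nothing.

```python
import os


def find_orphan_dirs(data_root, live_ids):
    """List per-record folders whose numeric name is not in live_ids.

    Assumed layout (verify before relying on it):
        data_root/<range folder>/<metadata id>/...
    live_ids is an iterable of ids exported from the database.
    """
    live = {int(i) for i in live_ids}
    orphans = []
    for range_name in sorted(os.listdir(data_root)):
        range_path = os.path.join(data_root, range_name)
        if not os.path.isdir(range_path):
            continue
        for entry in sorted(os.listdir(range_path)):
            entry_path = os.path.join(range_path, entry)
            # Non-numeric folder names are left alone, since their purpose is unknown.
            if os.path.isdir(entry_path) and entry.isdecimal() and int(entry) not in live:
                orphans.append(entry_path)
    return orphans
```

The returned list is best reviewed by hand, with a backup of the data folder taken before anything is removed, since a folder missing from the id list is only a candidate for deletion and not proof that it is unused.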

-----Original Message-----
From: Victor Epitropou [mailto:vepitrop@anonymised.com]
Sent: Tuesday, 29 March 2011 4:28 PM
To: geonetwork-users@lists.sourceforge.net
Subject: [GeoNetwork-users] Geonetwork's data folder growing out of proportion

_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork