I wonder if anyone has noticed how quickly the data folder in the geonetwork folder grows over time:
our installation has been online for just under a year and hosts fewer than a dozen metadata records, yet the
data folder has grown to contain literally tens of thousands of files and folders (and subfolders), for a
total size of 2 GB.
I realize that some of the hosted data is actually stored in those folders with increasing serial numbers under
GEONETWORK_DIR\data (e.g. 01231-21345), but I see no justification for such an ever-increasing
space hog, especially when no new data has been explicitly added. Regardless, there seems to be one
(or more) such folder being added every day.
So far, I've found no obvious documentation or guideline explaining what the purpose of these folders is (considering that
metadata is supposed to be stored in the relational database), or which of these folders are safe to delete and
which are not. Some are obviously just cached thumbnails, but others seem to contain actual attachments to
metadata content (we have some downloads and custom thumbnails that go with certain metadata records, and
those surely are not in the DB).
Having tens of thousands of files is problematic on any filesystem, and at the very least makes backups slower.
Disk space usage and fragmentation over an extended period of time are also an issue, as there seems to be no upper
bound to the growth: given enough time, geonetwork will eventually consume all available disk space or flood
the filesystem with hundreds of thousands, if not millions, of files.
Is keeping all that data around (most seem to be cached .png files anyway) really necessary?
Is there some function to clean up automatically, or a configuration option to limit the proliferation?
BTW, I'm using Geonetwork 2.4.3. Has this behavior been significantly changed in the 2.6.x branch?
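In case anyone wants to compare against their own installation, here is a minimal sketch (Python, standard library only) of how the file counts and sizes described above could be tallied per file extension. It is only an illustration: the function name tally_by_extension is made up for this example, the directory to scan is passed on the command line, and nothing in it relies on GeoNetwork itself or deletes anything.

```python
import os
import sys
from collections import Counter


def tally_by_extension(root):
    """Walk `root` recursively and return two Counters keyed by
    lowercase file extension: number of files and total bytes."""
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File disappeared or is unreadable; skip it.
                continue
            counts[ext] += 1
            sizes[ext] += size
    return counts, sizes


if __name__ == "__main__":
    # Hypothetical usage: pass the data directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts, sizes = tally_by_extension(root)
    for ext, n in counts.most_common():
        print(f"{ext:10} {n:8d} files {sizes[ext] / 1e6:10.1f} MB")
```

Running it against the data folder prints one line per extension, most numerous first, which should show quickly whether cached .png files really account for most of the growth.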
Thanks in advance for any replies,
Victor Epitropou