[GRASS-dev] which files are mandatory for raster maps

Hello,

I'm working with a large batch of raster maps (4550 * 4 bands) and including all the auxiliary files this explodes my inode quota on our university's HPC system. While waiting to see if they can increase this, I was wondering which of the auxiliary files are absolutely mandatory.

Can I just erase colr/* and hist/* for example ?

Moritz

Hi Moritz,

On Tue, Apr 16, 2019 at 2:56 PM Moritz Lennert
<mlennert@club.worldonline.be> wrote:

Hello,

I'm working with a large batch of raster maps (4550 * 4 bands) and
including all the auxiliary files this explodes my inode quota on our
university's HPC system.

Out of curiosity, which file system is that?

While waiting to see if they can increase this,
I was wondering which of the auxiliary files are absolutely mandatory.

Can I just erase colr/* and hist/* for example ?

I just tried:

r.info elevation
WARNING: Unable to get history information for <elevation@>
...
[rest as usual]

So, it should not be a problem.
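
If you want to keep the history and color information around while still freeing the inodes, one option is to pack both directories into a single archive before erasing anything. This is only a minimal sketch: archive_aux is a hypothetical helper name, the paths are placeholders, and it assumes a tar that supports -z and -C. It does not delete anything itself.

```shell
# Pack hist/ and colr/ of a mapset directory into one tar archive, so the
# metadata survives if the individual files are removed afterwards.
# archive_aux is a hypothetical helper; it never deletes anything.
# Usage (placeholder paths): archive_aux /path/to/mapset /path/to/aux.tar.gz
archive_aux() {
    mapset_dir="$1"
    archive="$2"
    # Reuse the positional parameters as the list of directories to pack
    set --
    for element in hist colr; do
        [ -d "$mapset_dir/$element" ] && set -- "$@" "$element"
    done
    # Nothing to archive: report failure instead of writing an empty file
    [ "$#" -gt 0 ] || return 1
    tar -czf "$archive" -C "$mapset_dir" "$@"
}
```

The archive is a single file, so it costs one inode regardless of how many maps it covers.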

# idea r.external:
I just checked, it seems that r.external generates the same amount of
files, hence not helpful in this regard.

# idea find directories with most inode consumption (source [1]):
for i in `ls -1A | grep -v "\.\./" | grep -v "\./"`; do echo "`find $i | sort -u | wc -l` $i"; done | sort -rn | head -10
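
The one-liner above splits on whitespace in directory names. A slightly more defensive variant could look like the sketch below. count_entries is a hypothetical helper name, and -mindepth/-maxdepth are assumed to be available (GNU and BSD find both have them).

```shell
# Count filesystem entries (files and directories) below each subdirectory
# of a base directory and print the ten largest, biggest first.
# count_entries is a hypothetical helper name, not an existing tool.
# Usage: count_entries /path/to/mapset
count_entries() {
    base="${1:-.}"
    # -mindepth/-maxdepth are find extensions (GNU/BSD), assumed available
    find "$base" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r dir; do
        # Every entry found uses one inode (hard links aside);
        # the count includes the subdirectory itself
        n=$(find "$dir" | wc -l | tr -d ' ')
        printf '%d\t%s\n' "$n" "$dir"
    done | sort -rn | head -n 10
}
```

Names containing newlines would still be miscounted, which should not matter for a GRASS mapset.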

Maybe you can still delete something?

Markus

[1] https://talk.plesk.com/threads/inode-full-no-more-space.296442/

PS: it happened to me in the past when exporting XFS over NFS (while
it worked ok inside the HPC system). In the end I had to reformat the
disk array...

* Moritz Lennert <mlennert@club.worldonline.be> [2019-04-16 14:56:02 +0200]:

Hello,

I'm working with a large batch of raster maps (4550 * 4 bands) and including all the auxiliary files this explodes my inode quota on our university's HPC system. While waiting to see if they can increase this, I was wondering which of the auxiliary files are absolutely mandatory.

Can I just erase colr/* and hist/* for example ?

Related:

- https://lists.osgeo.org/pipermail/grass-dev/2018-May/088463.html
- https://gitlab.com/NikosAlexandris/r.internal.sh
    (see in particular
    https://gitlab.com/NikosAlexandris/r.internal.sh/blob/5aea42f14c0701938270c4787794f408ee046dfe/r.internal#L99)

It would be nice to know the minimum requirements for a raster
map. And nicer still to convert `r.internal.sh` into a Python script.

Nikos

On 16/04/19 18:10, Markus Neteler wrote:

Hi Moritz,

On Tue, Apr 16, 2019 at 2:56 PM Moritz Lennert
<mlennert@club.worldonline.be> wrote:

Hello,

I'm working with a large batch of raster maps (4550 * 4 bands) and
including all the auxiliary files this explodes my inode quota on our
university's HPC system.

Out of curiosity, which file system is that?

GPFS AFAIK.

But it is not an intrinsic limit of the system. The administrators limit individual usage so that they can be sure to provide services to the potentially thousands of users.

While waiting to see if they can increase this,
I was wondering which of the auxiliary files are absolutely mandatory.

Can I just erase colr/* and hist/* for example ?

I just tried:

r.info elevation
WARNING: Unable to get history information for <elevation@>
...
[rest as usual]

So, it should not be a problem.

# idea r.external:
I just checked, it seems that r.external generates the same amount of
files, hence not helpful in this regard.

# idea find directories with most inode consumption (source [1]):
for i in `ls -1A | grep -v "\.\./" | grep -v "\./"`; do echo "`find $i | sort -u | wc -l` $i"; done | sort -rn | head -10

Well, most are clearly in cell_misc as there are three files per raster map. But I do think these are necessary.
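
To check how many files a single map really uses, something like the following sketch could help. raster_files is a hypothetical helper, and the list of element directories is an assumption based on the usual GRASS 7 mapset layout, so it may not be complete.

```shell
# List every filesystem entry that belongs to one raster map inside a
# mapset directory. raster_files is a hypothetical helper; the element
# directory names are an assumption based on the usual GRASS 7 layout.
# Usage: raster_files /path/to/mapset mapname | wc -l
raster_files() {
    mapset_dir="$1"
    map="$2"
    for element in cell fcell cellhd cats colr hist cell_misc; do
        path="$mapset_dir/$element/$map"
        # cell_misc/<map> is a directory, the other elements are plain
        # files; find prints the directory itself plus its contents
        if [ -e "$path" ]; then
            find "$path"
        fi
    done
}
```

Piping the output through wc -l gives the number of inodes that one map occupies, including the per-map directory under cell_misc.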

Maybe you can still delete something?

At this stage they've raised my quota, so I can go on working for now, but I'll explore this when I have the time.

Markus

PS: it happened to me in the past when exporting XFS over NFS (while
it worked ok inside the HPC system). In the end I had to reformat the
disk array...

I have to admit that I don't know exactly how the different parts of the HPC system are connected.

Thanks for the help !

Moritz

Hi,

The small auxiliary files are also causing trouble on the NAS filers (NFS3 mounted network attached storage with SSD or Hybrid disk systems) where my IT department wants us to store (and process) GRASS data.

Especially for time series data with literally thousands of maps, performance decreases significantly.

Would it be possible to reduce the number of files for the native GRASS raster format in GRASS 8 (e.g. by writing color tables or other metadata into a kind of header)?

Cheers,

Stefan


From: grass-dev grass-dev-bounces@lists.osgeo.org on behalf of Moritz Lennert mlennert@club.worldonline.be
Sent: Wednesday, April 17, 2019 2:03:50 PM
To: Markus Neteler
Cc: GRASS developers list
Subject: Re: [GRASS-dev] which files are mandatory for raster maps


