[GRASS-dev] [RFC] Glossary: GISDATABASE -> DATASTORE

Hello,

I plan to make the following changes to the variable names in KerGIS.
Since we share the same origin, and since it would be suboptimal to
have distinct naming schemes, I'd like to hear your reactions to the
following choices.

Thanks in advance.

Glossary proposal for gis "database" naming scheme
--------------------------------------------------

A database is an organized set of logically related data. A table is a
set of instances of elements composed of a fixed number of sub-elements
(fields).

Historically, in CERL GRASS derived systems, GISDATABASE has been the
name of the variable holding the value of the pathname to a directory
where GRASS data is stored.

There is absolutely no question whether some parts of the gis data
stored are databases: they are. But this is not what is designated by
GISDATABASE. The real database level is the LOCATION, where all the data
is logically related at least by the REGION. The MAPSETs are the tables.

Hence the use of GISDATABASE is misleading in several ways:

1) The directory is not a database (databases are sub-directories);

2) This is not _the_ GISDATABASE since there may be many of them
(contrary to _the_ GISBASE where _this_ version of the gis system is
put);

3) Since, for non-geometrical attributes, other types of RDBMSs are
used, there is some confusion, or at least some implicit assumption,
about what a "database" is.

Hence I propose to replace GISDATABASE by: DATASTORE.

Not GISDATASTORE, to avoid the assumption of uniqueness (there is _the_
GISBASE, but one of several DATASTOREs; we are in the gis, so the prefix
can be taken for granted; furthermore GISDATABASE is a gis environment
variable, not a host system one [contrary to GISBASE]; so this makes
sense).

The SQL terminology (even if the CERL GRASS databases [LOCATIONs] have
nothing to do with SQL, they _are_ databases, so I looked for "prior
art"): "catalog" is not widely used and, for me, does not carry the
correct meaning: a catalog is the listing of what is in a store, not the
store itself.

The comparison with the SQL terminology and with an RDBMS (PostgreSQL)
can shed some additional light (this is a naming comparison; it should
not be pushed too far):

SQL        PostgreSQL               GRASS/KerGIS

cluster    cluster                  cluster
catalog    database cluster         DATASTORE
schema     database                 LOCATION
object     tables/views/routines    MAPSET

Note 1: someone wrote that one of the "out of fashion" aspects of CERL
GRASS derived systems was, along with the use of unfashionable
programming languages, the fact that the data is stored in a file
hierarchy. Well, that is exactly how PostgreSQL, for example, does it,
and I do not see why this should be branded a bad choice (it allows, for
example, dedicating a chunk of a disk sized to match the backup
capabilities, and using Unix access permissions or, where usable, ACLs
to manage access).

Note 2: in CERL GRASS, the gis environment variable LOCATION_NAME was
used for the "location", while LOCATION was set to the full path. I have
found that indeed, for users, LOCATION should be set to the location
name, symmetric with the use of MAPSET. The full pathname is only used
in scripts, since the gis has dedicated functions to precisely find a
name in its databases or tables (locations and mapsets). Any comments?
--
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>
                 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C

Thierry wrote:

> Glossary proposal for gis "database" naming scheme
..
> Hence I propose to replace GISDATABASE by: DATASTORE.

DATASTORE is a fine suggestion, the best I've heard so far. See also
email threads discussing the wording of the start-up mapset-picker menu.

> Note 2: in CERL GRASS, the gis environment variable LOCATION_NAME was
> used for the "location", while LOCATION was set to the full path. I
> have found that indeed, for users, LOCATION should be set to the
> location name, symmetric with the use of MAPSET. The full pathname is
> only used in scripts, since the gis has dedicated functions to
> precisely find a name in its databases or tables (locations and
> mapsets). Any comments?

In GPL GRASS 6 LOCATION_NAME is still a GIS variable (stored in
~/.grassrc6 and not a shell variable*) and viewed/changed with the
g.gisenv module. LOCATION_NAME is just the name, not the path.
GISDBASE has the path.

The shell variable $LOCATION has been abandoned, but a number of scripts
recreate it like this (the path of least modification after $LOCATION
was dropped):

eval `g.gisenv`
: ${GISBASE?} ${GISDBASE?} ${LOCATION_NAME?} ${MAPSET?}
LOCATION="$GISDBASE"/"$LOCATION_NAME"/"$MAPSET"

[*] not a shell variable, so that child processes (the GUI menus) can
adjust variables which affect the parent, and thus new non-derivative
sibling processes [e.g. switch mapsets without exiting GRASS].
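
The footnote can be illustrated with a minimal sketch. It assumes a
session file holding one "NAME: value" pair per line (believed to be the
layout of ~/.grassrc6, but treat that as an assumption). session_get and
session_set are hypothetical helpers, not GRASS commands, and a
temporary file stands in for the real session file.

```shell
#!/bin/sh
# Hypothetical helpers; a temporary file stands in for ~/.grassrc6.
SESSION_FILE=$(mktemp) || exit 1
printf 'LOCATION_NAME: spearfish\nMAPSET: PERMANENT\n' > "$SESSION_FILE"

# Print the value stored for variable $1.
session_get() {
    sed -n "s/^$1: //p" "$SESSION_FILE"
}

# Set variable $1 to value $2, replacing any previous setting.
session_set() {
    tmp=$(mktemp) || return 1
    grep -v "^$1: " "$SESSION_FILE" > "$tmp"
    printf '%s: %s\n' "$1" "$2" >> "$tmp"
    mv "$tmp" "$SESSION_FILE"
}

# A child changing its own environment does not affect the parent...
MAPSET=PERMANENT; export MAPSET
sh -c 'MAPSET=user1'
echo "environment: $MAPSET"                  # prints: environment: PERMANENT

# ...but a child rewriting the session file does (a subshell plays the
# role of the GUI menu here).
( session_set MAPSET user1 )
echo "session file: $(session_get MAPSET)"   # prints: session file: user1
```

Any sibling process that reads the file afterwards sees the new mapset,
which an exported shell variable could not provide.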

Hamish

On Mon, Mar 05, 2007 at 12:11:13PM +1300, Hamish wrote:

> Thierry wrote:
> > Glossary proposal for gis "database" naming scheme
> ..
> > Hence I propose to replace GISDATABASE by: DATASTORE.
>
> DATASTORE is a fine suggestion, the best I've heard so far. See also
> email threads discussing the wording of the start-up mapset-picker menu.

Thanks for the feedback, Hamish. I will have a look at the threads you
mention.

[about LOCATION_NAME and LOCATION]

> In GPL GRASS 6 LOCATION_NAME is still a GIS variable (stored in
> ~/.grassrc6 and not a shell variable*) and viewed/changed with the
> g.gisenv module. LOCATION_NAME is just the name, not the path.
> GISDBASE has the path.
>
> The shell variable $LOCATION has been abandoned, but a number of scripts
> do this to recreate it: (due to path of least modification after
> $LOCATION was dropped)
>
> eval `g.gisenv`
> : ${GISBASE?} ${GISDBASE?} ${LOCATION_NAME?} ${MAPSET?}
> LOCATION="$GISDBASE"/"$LOCATION_NAME"/"$MAPSET"

For now, in the published versions of KerGIS, this is the same: LOCATION
has been dropped.
What I wanted to say is that, from a user's point of view, one enters:

g.gisenv MAPSET=<some_name>

but
g.gisenv LOCATION_NAME=<some_name>

and a frequent error is to type:

g.gisenv LOCATION=<some_name>

And the user is right: there is a lack of symmetry, and if we speak all
along about a "location" (implied in the current DATASTORE), its name
(and not its pathname) should be set via LOCATION.
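
One way to tolerate the frequent mistake before any renaming takes place
would be to normalize the argument in a wrapper. This is only a sketch:
gisenv_normalize is a hypothetical helper (nothing of the sort is known
to exist in g.gisenv), and it only rewrites the argument text.

```shell
#!/bin/sh
# Hypothetical helper: map the symmetric spelling LOCATION=<name> onto the
# variable name the GIS actually stores, LOCATION_NAME=<name>.
# Any other argument is passed through unchanged.
gisenv_normalize() {
    case $1 in
        LOCATION=*/*)
            # A value containing a slash is a pathname, not a name: refuse.
            echo "LOCATION takes a location name, not a pathname" >&2
            return 1 ;;
        LOCATION=*)
            printf 'LOCATION_NAME=%s\n' "${1#LOCATION=}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

gisenv_normalize LOCATION=spearfish     # prints LOCATION_NAME=spearfish
gisenv_normalize MAPSET=user1           # prints MAPSET=user1
```

The normalized string would then be handed to g.gisenv in the same form
as the commands shown above.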

Since I do a lot of renaming to add implicit information to the sources
(so that the organization of the sources and the consistency of naming
are, to some extent, self-explanatory), I have introduced in the kergis
sources the tools/ed directory, where I record the correspondence
between old and new names like this (it works for C sources or scripts).
For example, in this case I will create a tools/ed/sh.ed (the format is:
<condition>|<regex>|<replacement>):

|GISDBASE|GISDATADIR
|GISDATADIR|GISDATABASE
|GISDATABASE|DATASTORE

[you see in this example that I did not find the correct name at first
glance:
GISDBASE -> GISDATADIR
  GISDBASE was too close to GISBASE, so I wanted a name more distinct
  "at first sight". And I thought that the data was not really a
  "database".
GISDATADIR -> GISDATABASE
  Blunder: the data _is_ organized as a database since there are
  dedicated functions (GISLIB) to implement a policy and to organize
  the data.
GISDATABASE -> DATASTORE
  Yes, databases, but not at the GISDATABASE level: below, at the
  location. And the SQL and PostgreSQL examples finally made the
  distinction clearer.
]

Then I automatically generate an ed(1) script:

g/GISDBASE/s/^GISDBASE$/GISDATADIR/
g/GISDBASE/s/^GISDBASE\([^a-zA-Z0-9_-]\)/GISDATADIR\1/
g/GISDBASE/s/\([^a-zA-Z0-9_-]\)GISDBASE$/\1GISDATADIR/
g/GISDBASE/s/\([^a-zA-Z0-9_-]\)GISDBASE\([^a-zA-Z0-9_-]\)/\1GISDATADIR\2/g
g/GISDATADIR/s/^GISDATADIR$/GISDATABASE/
g/GISDATADIR/s/^GISDATADIR\([^a-zA-Z0-9_-]\)/GISDATABASE\1/
g/GISDATADIR/s/\([^a-zA-Z0-9_-]\)GISDATADIR$/\1GISDATABASE/
g/GISDATADIR/s/\([^a-zA-Z0-9_-]\)GISDATADIR\([^a-zA-Z0-9_-]\)/\1GISDATABASE\2/g
g/GISDATABASE/s/^GISDATABASE$/DATASTORE/
g/GISDATABASE/s/^GISDATABASE\([^a-zA-Z0-9_-]\)/DATASTORE\1/
g/GISDATABASE/s/\([^a-zA-Z0-9_-]\)GISDATABASE$/\1DATASTORE/
g/GISDATABASE/s/\([^a-zA-Z0-9_-]\)GISDATABASE\([^a-zA-Z0-9_-]\)/\1DATASTORE\2/g
w
q

which is applied to the selected files. Just to say that "upgrading" is
relatively easy.
[In fact, to upgrade from CERL to KerGIS version d.d.d.d you will have
to apply in order: sh.ed sh-1.0.0.0 ... {last sh.ed with version <
target version}]
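
A minimal sketch of how such a generator could look, reading mapping
lines on stdin and writing ed(1) commands on stdout. The function name
gen_ed is hypothetical, and the handling of an empty <condition> (fall
back to <regex>) is an assumption inferred from the g/.../ prefixes in
the generated script above, not a description of the actual KerGIS tool.

```shell
#!/bin/sh
# Sketch of a generator: "<condition>|<regex>|<replacement>" lines in,
# ed(1) commands out.
# Assumption: an empty <condition> means "use <regex> as the condition".
gen_ed() {
    awk -F'|' '
    BEGIN { sep = "[^a-zA-Z0-9_-]" }   # any character that cannot continue a name
    NF == 3 {
        cond = ($1 == "") ? $2 : $1
        pat = $2; rep = $3
        g = "g/" cond "/s/"
        print g "^" pat "$/" rep "/"                                 # alone on the line
        print g "^" pat "\\(" sep "\\)/" rep "\\1/"                  # at start of line
        print g "\\(" sep "\\)" pat "$/\\1" rep "/"                  # at end of line
        print g "\\(" sep "\\)" pat "\\(" sep "\\)/\\1" rep "\\2/g"  # inside the line
    }
    END { print "w"; print "q" }'
}

# Demonstration with a single mapping line.
gen_ed <<'EOF'
|GISDBASE|GISDATADIR
EOF
```

The output could be saved and fed to ed(1) for each selected file, for
instance with "ed -s some_script < rename.ed", where both file names are
placeholders. Because the mappings chain (GISDBASE, then GISDATADIR,
then GISDATABASE), the order of the lines in the mapping file matters.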

> [*] not a shell variable to allow child process (the GUI menus) to
> adjust variables which affect the parent, and thus new non-derivative
> sibling processes. [e.g. switch mapsets without exiting GRASS]

That's another case where I think I will change a name.

`.grassrc' or `.kergisrc' are misnamed. This is in fact a _session_.
I will go with `.kergis_session'. The session is precisely what
you are describing.

To come back to the DATASTORE: the good thing about a comparison (which
is useful, for example, in teaching) is that it helps to find good
ideas. In this case, KerGIS should clearly implement (I don't know if
GPL GRASS has already done this):

g.dump(8)
g.restore(8)

to dump a location in ascii, and to restore the binary version of a
location from its ascii version. I use CVS a lot to store versions of
the ascii gis files, and this will be a great tool for archival and
transfer, and even for testing and debugging (to verify the basic
functionality of a fresh installation: g.restore spearfish, for
example).

It is almost straightforward to do a shell script for that. But the
devil is in the details:

1) I would like user kergis (the 'root' superuser of KerGIS) to be able
to dump a whole DATASTORE. But individual permissions on directories
might prevent this (kergis is not system root; sudo(1) may help in this
case);

2) Some of the options found in pg_dump and pg_restore would be useful
for CERL GRASS based GIS too, namely to dump in the ascii subdirectory,
or to tar (pax) the files for archival or transfer.

Once we bring the underlying rules to light (and naming is one means of
doing that), good suggestions and natural extensions come to mind :-)
And at the moment, at least KerGIS (GPL GRASS has perhaps made some
progress already) is poor in administration tools.

Cheers,
--
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>

tlaronde@polynum.com wrote:

> g.gisenv MAPSET=<some_name>
>
> but
> g.gisenv LOCATION_NAME=<some_name>
>
> and this is a frequent error to do:
>
> g.gisenv LOCATION=<some_name>
>
> And the user is right: there is a lack of symmetry, and if we speak all
> along about a "location" (implied in the current DATASTORE), its name
> (and not its pathname) shall be set via LOCATION.

LOCATION makes more sense if doing it cleanly, but I believe it was
named something different (_NAME) to limit bugs during the transition.
There is always user confusion about shell variables vs. grass
variables; having two different things use the same name would have
been bad.

As this variable isn't typically seen by the user (it is handled by the
g.mapset module in GRASS 6+), the extra _NAME part does more good than
harm I think. Call it a historically beneficial wart.

By GRASS 7 probably enough time has passed to rename it "LOCATION".

> To come back to the DATASTORE and to a comparison---that is useful for
> example in teaching---, the good thing is helping to find good ideas.
> In this case, KerGIS shall clearly (I don't know if GPL GRASS has
> already made this) implement:
>
> g.dump(8)
> g.restore(8)
>
> to dump a location in ascii, and to restore an ascii version of a
> location to the binary one. I use a lot CVS to store versions of the
> ascii gis files, and it will be a great archival/transfer even testing
> and debugging tool (to verify the basic functionality of a fresh
> installation, g.restore spearfish for example).

Is that just saving the projection settings & default region info found
in PERMANENT/, or is it a full "tar czf spearfish.tgz spearfish/"? Or
r.out.ascii and v.out.ascii on all maps?

Hamish

On Tue, Mar 06, 2007 at 02:02:12PM +1300, Hamish wrote:

> > g.dump(8)
> > g.restore(8)
> >
> > to dump a location in ascii, and to restore an ascii version of a
> > location to the binary one. I use a lot CVS to store versions of the
> > ascii gis files, and it will be a great archival/transfer even testing
> > and debugging tool (to verify the basic functionality of a fresh
> > installation, g.restore spearfish for example).
>
> is that just saving the projection settings & default region info found
> in PERMANENT/, or is it a full "tar czf spearfish.tgz spearfish/" ?
> r.out.ascii, v.out.ascii all maps?

Yes: like PostgreSQL's pg_dump(8) and pg_restore(8), it will dump in
ascii the binary files (raster, vector and other types, if any) of all
the MAPSETs in the LOCATION, and restore will rebuild everything. This
means that not only the geometry in ascii is saved [case tar or pax],
but also the cats and atts [or whatever they are called]: all the ascii
(or text) files needed to restore a complete compiled (binary) version
of the LOCATION. These text files are typically what should be kept in
CVS.
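
A rough sketch of the walk such a g.dump would have to do, written as a
dry run that only prints the export commands. Several things here are
assumptions: the function name plan_dump, the directory layout (rasters
listed in <mapset>/cell and vectors in <mapset>/vector, as in GPL GRASS
6; CERL-derived trees may differ), and the <mapset>/ascii output
directory. Attribute and category files are not handled.

```shell
#!/bin/sh
# Sketch only: print, without running them, the export commands a g.dump
# might issue for every mapset of one location.
# Usage: plan_dump <datastore_path> <location_name>
plan_dump() {
    datastore=$1
    location=$2
    for mapset_dir in "$datastore/$location"/*/; do
        [ -d "$mapset_dir" ] || continue          # no mapset at all
        mapset=$(basename "$mapset_dir")
        for kind in cell vector; do
            case $kind in
                cell)   cmd=r.out.ascii ;;        # raster export module
                vector) cmd=v.out.ascii ;;        # vector export module
            esac
            for map in "$mapset_dir$kind"/*; do
                [ -e "$map" ] || continue         # empty or missing element
                name=$(basename "$map")
                echo "$cmd input=$name@$mapset output=${mapset_dir}ascii/$kind/$name"
            done
        done
    done
}
```

Printing the plan first also makes the permission question visible: each
output path shows where the dumping user would need write access.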
--
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>

Hamish wrote:

> > g.gisenv MAPSET=<some_name>
> >
> > but
> > g.gisenv LOCATION_NAME=<some_name>
> >
> > and this is a frequent error to do:
> >
> > g.gisenv LOCATION=<some_name>
> >
> > And the user is right: there is a lack of symmetry, and if we speak all
> > along about a "location" (implied in the current DATASTORE), its name
> > (and not its pathname) shall be set via LOCATION.
>
> LOCATION makes more sense if doing it cleanly, but I believe it was
> named something different (_NAME) to limit bugs during the transition.
> There is always user confusion about shell variables vs. grass
> variables,

And, it seems, some developer confusion about shell variables versus
environment variables.

Also, it doesn't help that the documentation used to refer to GRASS
variables as environment variables.

> to have two different things using the same name would have
> been bad.
>
> As this variable isn't typically seen by the user (it is handled by the
> g.mapset module in GRASS 6+), the extra _NAME part does more good than
> harm I think. Call it a historically beneficial wart.
>
> By GRASS 7 probably enough time has passed to rename it "LOCATION".

Actually, I'd say that enough time has already passed. We stopped
exporting GRASS variables to the environment nearly 5 years ago:

  RCS file: /grassrepository/grass/src/general/g.gisenv/main.c,v

  revision 1.4
  date: 2002/03/26 13:25:09; author: glynn; state: Exp; lines: +3 -3
  branches: 1.4.2;
  Don't export GISDBASE, LOCATION_NAME, MAPSET, LOCATION to the environment
  Fix scripts to obtain settings from g.gisenv

  RCS file: /grassrepository/grass/src/general/init/init.sh,v
  
  revision 1.36
  date: 2002/03/26 13:25:10; author: glynn; state: Exp; lines: +2 -8
  branches: 1.36.2;
  Don't export GISDBASE, LOCATION_NAME, MAPSET, LOCATION to the environment
  Fix scripts to obtain settings from g.gisenv

And the GRASS variable has been called LOCATION_NAME for at least as
long as GRASS has been in CVS (1999-12-29).

The main issue is that we would have to fix all of the shell scripts
which use LOCATION as the path to the mapset directory and expect
"eval `g.gisenv`" to store the location in LOCATION_NAME rather than
in LOCATION. But that's equally true whether we change it now or
later.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> And the GRASS variable has been called LOCATION_NAME for at least as
> long as GRASS has been in CVS (1999-12-29).
>
> The main issue is that we would have to fix all of the shell scripts
> which use LOCATION as the path to the mapset directory and expect
> "eval `g.gisenv`" to store the location in LOCATION_NAME rather than
> in LOCATION. But that's equally true whether we change it now or
> later.

About 22 scripts, plus lib/init/init.sh:

grass63/scripts$ grep -rI LOCATION * | wc -l
83
grass63/scripts$ grep -rI LOCATION_NAME * | wc -l
53

grass63/scripts$ grep -rI LOCATION * | cut -f1 -d: | uniq

d.rast.leg/d.rast.leg
d.slide.show/d.slide.show
g.mremove/g.mremove
g.mremove/g.mremove1
i.image.mosaic/i.image.mosaic
i.in.spotvgt/i.in.spotvgt
i.oif/i.oif
r.fillnulls/r.fillnulls
r.in.srtm/r.in.srtm
r.mask/r.mask
r.plane/r.plane
v.build.all/v.build.all
v.convert.all/v.convert.all
v.db.addtable/v.db.addtable
v.db.reconnect.all/v.db.reconnect.all
v.in.e00/v.in.e00
v.in.garmin/v.in.garmin
v.in.gns/v.in.gns
v.in.gpsbabel/v.in.gpsbabel
v.in.mapgen/v.in.mapgen
v.in.sites.all/v.in.sites.all
v.report/v.report
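
Note that a plain "grep LOCATION" also matches LOCATION_NAME, so the
first count above includes both. A small sketch for isolating the files
that use the bare name; uses_bare_location is a hypothetical helper, and
-r and -I are the same GNU/BSD grep extensions already used above.

```shell
#!/bin/sh
# List files under directory $1 that use LOCATION as a whole word.
# -w keeps LOCATION_NAME from matching, since "_" counts as a word
# character; -l prints each file name once; -I skips binary files.
uses_bare_location() {
    grep -rIlw LOCATION "$1" | sort
}
```

For example, "uses_bare_location grass63/scripts | wc -l" would give the
number of files that still need the bare $LOCATION fixed.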

Hamish

On Tue, Mar 06, 2007 at 06:39:16PM +1300, Hamish wrote:

> grass63/scripts$ grep -rI LOCATION * | wc -l
> 83
> grass63/scripts$ grep -rI LOCATION_NAME * | wc -l
> 53
[..]

Well, for the scripts (or the programs), this is not a problem, whether
for GRASS/KerGIS core programs or for users' programs: ed(1) does the
job easily.
If the CHANGES are conspicuous, this will not come as a big surprise.

But it should come with a major change in the version number (that's one
of the reasons why I have still not released an official 1.0: I want to
make all the "dramatic" changes at once, to keep the 1.x family
relatively stable, on the naming scheme at least).

I think that, if we reflect on the naming scheme (LOCATION is one
element, DATASTORE is another, etc.), we should try to bundle the name
changes in one step. From a pedagogical and practical point of view, the
benefits clearly outweigh the drawbacks. If the legacy is not
consistent, it should be changed.

Cheers,
--
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>