[GRASS-dev] Non ASCII in GISBASE, LOCATION, MAPSET, MAP

Hi,
can you point me to a place where the restrictions on using non-ASCII characters
in GISBASE, LOCATION, MAPSET and MAP names are described?

I have only found this mail
http://www.osgeo.org/pipermail/grass-dev/2007-January/028553.html
which does not sound very optimistic.

Thanks
Radim

Radim Blazek wrote:

> can you point me to a place where the restrictions on using non-ASCII characters
> in GISBASE, LOCATION, MAPSET and MAP names are described?

The main restriction is G_legal_filename(), in lib/gis/legal_name.c.
This prohibits all characters >=127 and <=32 as well as the specific
characters:

  / " ' @ , = *

[slash, double quote, single quote, at, comma, equals, asterisk.]

However, it should really be extended to also prohibit some additional
characters which aren't allowed in FAT/NTFS filenames:

  \ : ? < > |

[backslash, colon, question mark, less than, greater than, vertical bar.]
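For illustration, the rule described above can be sketched in a few lines of Python. This is not the code in lib/gis/legal_name.c, only a restatement of the description in this mail. The function name is_legal_name, the strict_windows flag and the rejection of empty names are assumptions made for the sketch.

```python
# Sketch of the check as described in this thread; not the GRASS source.
FORBIDDEN = set('/"\'@,=*')
# Proposed (hypothetical) extension: characters not allowed in FAT/NTFS names.
WINDOWS_FORBIDDEN = set('\\:?<>|')


def is_legal_name(name, strict_windows=False):
    """Return True if every character of name passes the described checks."""
    if not name:
        # Assumption for this sketch: an empty name is never legal.
        return False
    for ch in name:
        code = ord(ch)
        # Control characters and space (<= 32), DEL and anything above (>= 127).
        if code <= 32 or code >= 127:
            return False
        if ch in FORBIDDEN:
            return False
        if strict_windows and ch in WINDOWS_FORBIDDEN:
            return False
    return True
```

With this sketch, is_legal_name("a:b") is accepted by the current rule and rejected only when strict_windows=True, which is the gap the proposed extension would close.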

Allowing 8-bit characters runs into all sorts of problems with
encoding issues, particularly when combined with case-folding and
codepage issues on Windows (and on Windows filesystems).

As you know, vector map names are further restricted to valid SQL
identifiers. Personally, I'd like to see that restriction removed.
There's no fundamental reason why a map's attribute table must have
*exactly* the same name as the map; an approximation (e.g. with
invalid characters replaced by underscores) would suffice, IMHO.
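A minimal sketch of such an approximation, in Python, might look like the following. The function name table_name_for_map and the "t_" prefix for names that do not start with a letter are assumptions for illustration; the exact rules for a valid SQL identifier depend on the database driver.

```python
import re


def table_name_for_map(map_name):
    """Approximate a map name as a conservative SQL identifier."""
    # Replace every character outside A-Z, a-z, 0-9 and underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", map_name)
    # Assumption: identifiers must start with a letter, so add a prefix if not.
    if not re.match(r"[A-Za-z]", name):
        name = "t_" + name
    return name
```

For example, "roads-2010.v2" becomes "roads_2010_v2". Note that distinct map names such as "a-b" and "a.b" both map to "a_b", so the result is not guaranteed to be unique.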

--
Glynn Clements <glynn@gclements.plus.com>

On 25/03/10 16:57, Glynn Clements wrote:

> As you know, vector map names are further restricted to valid SQL
> identifiers. Personally, I'd like to see that restriction removed.
> There's no fundamental reason why a map's attribute table must have
> *exactly* the same name as the map; an approximation (e.g. with
> invalid characters replaced by underscores) would suffice, IMHO.

Even today, there is no obligation for the attribute table to have the
same name as the map, since you can link the map to any table you want.
Radim, correct me if I'm wrong, but I believe this restriction is
currently imposed on new map creation only as a convenience, to avoid
having to check and, if necessary, transform the name of the table. I
imagine it would not be too complicated to replace all offending
characters with an underscore or something similar.

Moritz

On Thu, Mar 25, 2010 at 5:57 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

> The main restriction is G_legal_filename(), in lib/gis/legal_name.c.
> This prohibits all characters >=127 and <=32 as well as the specific
> characters:

G_legal_filename is used in lib/init/set_data.c to check MAPSET and
LOCATION_NAME. Does it mean that there are no restrictions for
GISBASE?

> As you know, vector map names are further restricted to valid SQL
> identifiers. Personally, I'd like to see that restriction removed.
> There's no fundamental reason why a map's attribute table must have
> *exactly* the same name as the map; an approximation (e.g. with
> invalid characters replaced by underscores) would suffice, IMHO.

Wouldn't simple replacement result in duplicate names?

Radim

2010/4/20, Radim Blazek <radim.blazek@gmail.com>:

> G_legal_filename is used in lib/init/set_data.c to check MAPSET and
> LOCATION_NAME. Does it mean that there are no restrictions for
> GISBASE?

Recently there were fixes to permit various characters in GISBASE. It
should work with spaces and non-Latin characters, as users on Windows
might have no control over the GISBASE and GISDBASE paths.

>> As you know, vector map names are further restricted to valid SQL
>> identifiers. Personally, I'd like to see that restriction removed.
>> There's no fundamental reason why a map's attribute table must have
>> *exactly* the same name as the map; an approximation (e.g. with
>> invalid characters replaced by underscores) would suffice, IMHO.
>
> Wouldn't simple replacement result in duplicate names?
>
> Radim

Replacement isn't an option for non-Latin-based languages, i.e. what
would a map name look like with replacement characters for "āšņļ"
(Latin-based), "ЙЦУКЕН" (Cyrillic) or "仪仫们仭"?
One option would be to use some internal ID for SQL table names, but
that would not be very hacker-friendly.
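As a rough illustration of the internal ID idea, a table name could be derived from a hash of the map name. This is only a sketch in Python: the function name table_id_for_map, the "t_" prefix, the use of SHA-1 and the 12-character length are all assumptions, not anything GRASS does.

```python
import hashlib


def table_id_for_map(map_name, length=12):
    """Derive an opaque, ASCII-only table name from any map name."""
    # Hash the UTF-8 bytes so that any script gives a plain hex string.
    digest = hashlib.sha1(map_name.encode("utf-8")).hexdigest()
    # Hypothetical "t_" prefix so the identifier starts with a letter.
    return "t_" + digest[:length]
```

The result is a valid identifier for names in any script, but it shows the drawback mentioned above: the table name tells a human nothing about the map it belongs to. Because the digest is truncated, collisions remain possible in principle.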

Maris.

PS. Radim - thumbs up for keeping this discussion alive :)

Radim Blazek wrote:

>> The main restriction is G_legal_filename(), in lib/gis/legal_name.c.
>> This prohibits all characters >=127 and <=32 as well as the specific
>> characters:
>
> G_legal_filename is used in lib/init/set_data.c to check MAPSET and
> LOCATION_NAME. Does it mean that there are no restrictions for
> GISBASE?

Correct. At least, there *shouldn't* be any restrictions for GISDBASE
(or GISBASE).

[On Windows, they must be accessible via the codepage-based API,
i.e. they can't contain characters outside of the current codepage.
This isn't something which can easily be worked around.]
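The codepage constraint can be approximated by checking whether a path survives encoding to a given codepage. The Python sketch below is only an approximation of what the Windows API does (Windows may also apply best-fit mappings); the function name fits_codepage and the example paths are hypothetical.

```python
def fits_codepage(path, codepage):
    """Return True if every character of path can be encoded in codepage."""
    try:
        path.encode(codepage)
    except UnicodeEncodeError:
        # At least one character has no representation in this codepage.
        return False
    return True
```

For example, a path containing "ЙЦУКЕН" fits cp1251 (Cyrillic) but not cp1252 (Western European), so whether it is usable depends on the system's current codepage.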

>> As you know, vector map names are further restricted to valid SQL
>> identifiers. Personally, I'd like to see that restriction removed.
>> There's no fundamental reason why a map's attribute table must have
>> *exactly* the same name as the map; an approximation (e.g. with
>> invalid characters replaced by underscores) would suffice, IMHO.
>
> Wouldn't simple replacement result in duplicate names?

It's possible that could occur, but it seems unlikely in practice. In
the worst case, the code could add a unique suffix (_1, _2, etc) in
the event of a conflict.
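The suffix approach could be sketched as follows in Python. The function name unique_table_name and the idea of passing the existing table names as a set are assumptions for illustration; real code would have to query the database for the existing names.

```python
import re


def unique_table_name(map_name, existing):
    """Underscore-replace invalid characters, then add _1, _2, ... on conflict."""
    base = re.sub(r"[^A-Za-z0-9_]", "_", map_name)
    candidate = base
    n = 1
    # Try base, base_1, base_2, ... until a free name is found.
    while candidate in existing:
        candidate = f"{base}_{n}"
        n += 1
    return candidate
```

With no conflict, "a-b" becomes "a_b"; if "a_b" is already taken, "a.b" becomes "a_b_1".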

--
Glynn Clements <glynn@gclements.plus.com>

Maris Nartiss wrote:

>>> As you know, vector map names are further restricted to valid SQL
>>> identifiers. Personally, I'd like to see that restriction removed.
>>> There's no fundamental reason why a map's attribute table must have
>>> *exactly* the same name as the map; an approximation (e.g. with
>>> invalid characters replaced by underscores) would suffice, IMHO.
>>
>> Wouldn't simple replacement result in duplicate names?
>
> Replacement isn't an option for non-Latin-based languages, i.e. what
> would a map name look like with replacement characters for "āšņļ"
> (Latin-based), "ЙЦУКЕН" (Cyrillic) or "仪仫们仭"?
> One option would be to use some internal ID for SQL table names, but
> that would not be very hacker-friendly.

GRASS has never allowed 8-bit characters in the names of maps,
mapsets, locations, etc.

I'm not proposing that we lift that restriction, only that the rules
for vector maps be made consistent with the rest of GRASS. The sole
reason for the restriction was to avoid having to create a unique name
for the map's attribute table (although this would still be an issue
if you want to use a single database for multiple mapsets).

--
Glynn Clements <glynn@gclements.plus.com>