[GRASS-user] SQL: generating numeric class numbers from class text labels?

Hi,

I have a landuse map with text labels (forest, street, ...). For
r.learn.ml I need to have them as numeric classes.
It is not important for me which number is assigned but I search for
an automated solution, i.e. SQL statement unless there is a different
way.

So:

cat|label|label_int
1|forest|1
2|forest|1
3|street|2
4|forest|1
5|street|2
6|urban|3
...

I guess I have done that already some years ago but I can't remember
the trick :slight_smile:

thanks for a hint,
Markus

Hi

I cannot think of a short and simple SQL "one liner" statement but can think of a short script that will do that using SQL statements.

In pseudo-code, translate to your favorite language:

# get the unique classes

SELECT distinct(label) FROM table;

-> fetch results in an array "txt_label"

# iterate over array and insert new integer labels in table

int_label = 0

foreach txt_label {

 INSERT INTO table\.label\_int VALUES int\_label WHERE label = 'txt\_label'

 n\+\+

}

This may seem obvious to you, but since you asked :slight_smile:

Cheers and good luck

HernĂĄn

On 2019-12-04 18:11, Markus Neteler wrote:

Hi,

I have a landuse map with text labels (forest, street, ...). For
r.learn.ml I need to have them as numeric classes.
It is not important for me which number is assigned but I search for
an automated solution, i.e. SQL statement unless there is a different
way.

So:

cat|label|label_int
1|forest|1
2|forest|1
3|street|2
4|forest|1
5|street|2
6|urban|3
...

I guess I have done that already some years ago but I can't remember
the trick :slight_smile:

thanks for a hint,
Markus
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-user

The variable to increment should be of course as corrected below:

How about doing this in R? The labels will be read into R as factors, and the factor levels can easily be extracted as numbers.

Something like this:

micha@tp480:~$ v.info -c stations
Displaying column types/names for database connection of layer <1>:
INTEGER>cat
INTEGER>station_num
TEXT>station_he
TEXT>station_en
TEXT>type
INTEGER>x_coord
INTEGER>y_coord
DOUBLE PRECISION|long
DOUBLE PRECISION|lat
INTEGER>elev
TEXT>date_open
DOUBLE PRECISION|dist
DOUBLE PRECISION|azim

micha@tp480:~$ R

R version 3.5.2 (2018-12-20) – “Eggshell Igloo”
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
…

library(rgrass7)
Loading required package: XML
GRASS GIS interface loaded with GRASS version: GRASS 7.6.0 (2019)
and location: ITM
use_sf()
stations = readVECT(“stations”)
WARNING: Vector map is 3D. Use format specific layer creation
options (parameter ‘lco’) to export <in 3D rather than 2D
(default).
Exporting 94 features…
100%
…

stations[‘new_station_num’] = as.numeric(stations$station_en)
stations$new_station_num
[1] 71 26 6 55 54 63 7 8 31 30 46 84 92 38 32 88 27 12 67 62 47 33 53 76 89
[26] 2 86 11 40 65 64 45 13 85 60 59 1 74 73 22 19 15 39 50 56 14 44 23 36 83
[51] 41 42 43 18 17 75 16 82 81 37 48 28 87 3 66 10 34 91 61 93 94 72 5 4 68
[76] 78 77 9 29 51 58 57 49 52 24 25 80 79 35 70 69 90 21 20

writeVECT(SDF=stations, vname=“new_stations”)

Best regards, Micha

¡¡¡

On 04/12/2019 19:11, Markus Neteler wrote:

Hi,

I have a landuse map with text labels (forest, street, ...). For
r.learn.ml I need to have them as numeric classes.
It is not important for me which number is assigned but I search for
an automated solution, i.e. SQL statement unless there is a different
way.

So:

cat|label|label_int
1|forest|1
2|forest|1
3|street|2
4|forest|1
5|street|2
6|urban|3
...

I guess I have done that already some years ago but I can't remember
the trick :-)

thanks for a hint,
Markus
_______________________________________________
grass-user mailing list
[grass-user@lists.osgeo.org](mailto:grass-user@lists.osgeo.org)
[https://lists.osgeo.org/mailman/listinfo/grass-user](https://lists.osgeo.org/mailman/listinfo/grass-user)
-- 
Micha Silver
Ben Gurion Univ.
Sde Boker, Remote Sensing Lab
cell: +972-523-665918
[https://orcid.org/0000-0002-1128-1325](https://orcid.org/0000-0002-1128-1325)

How about doing this in R? The labels will be read into R as factors, and the factor levels can easily be extracted as numbers.

Something like this:

micha@tp480:~$ v.info -c stations
Displaying column types/names for database connection of layer <1>:
INTEGER|cat
INTEGER|station_num
TEXT|station_he
TEXT|station_en
TEXT|type
INTEGER|x_coord
INTEGER|y_coord
DOUBLE PRECISION|long
DOUBLE PRECISION|lat
INTEGER|elev
TEXT|date_open
DOUBLE PRECISION|dist
DOUBLE PRECISION|azim

micha@tp480:~$ R

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
.....

> library(rgrass7)
Loading required package: XML
GRASS GIS interface loaded with GRASS version: GRASS 7.6.0 (2019)
and location: ITM
> use_sf()
> stations = readVECT("stations")
WARNING: Vector map <stations> is 3D. Use format specific layer creation
options (parameter 'lco') to export <in 3D rather than 2D
(default).
Exporting 94 features...
100%
.....

> stations['new_station_num'] = as.numeric(stations$station_en)
> stations$new_station_num
[1] 71 26 6 55 54 63 7 8 31 30 46 84 92 38 32 88 27 12 67 62 47 33 53 76 89
[26] 2 86 11 40 65 64 45 13 85 60 59 1 74 73 22 19 15 39 50 56 14 44 23 36 83
[51] 41 42 43 18 17 75 16 82 81 37 48 28 87 3 66 10 34 91 61 93 94 72 5 4 68
[76] 78 77 9 29 51 58 57 49 52 24 25 80 79 35 70 69 90 21 20

> writeVECT(SDF=stations, vname="new_stations")

Best regards, Micha

On 04/12/2019 19:11, Markus Neteler wrote:

Hi,

I have a landuse map with text labels (forest, street, ...). For
r.learn.ml I need to have them as numeric classes.
It is not important for me which number is assigned but I search for
an automated solution, i.e. SQL statement unless there is a different
way.

So:

cat|label|label_int
1|forest|1
2|forest|1
3|street|2
4|forest|1
5|street|2
6|urban|3
...

I guess I have done that already some years ago but I can't remember
the trick :slight_smile:

thanks for a hint,
Markus
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-user

--
Micha Silver
Ben Gurion Univ.
Sde Boker, Remote Sensing Lab
cell: +972-523-665918
https://orcid.org/0000-0002-1128-1325

Thanks for your answers.
In fact I need it in Python…

Best
Markus

Micha Silver <tsvibar@gmail.com> schrieb am Mi., 4. Dez. 2019, 18:57:

How about doing this in R? The labels will be read into R as factors, and the factor levels can easily be extracted as numbers.

Something like this:

micha@tp480:~$ v.info -c stations
Displaying column types/names for database connection of layer <1>:
INTEGER|cat
INTEGER|station_num
TEXT|station_he
TEXT|station_en
TEXT|type
INTEGER|x_coord
INTEGER|y_coord
DOUBLE PRECISION|long
DOUBLE PRECISION|lat
INTEGER|elev
TEXT|date_open
DOUBLE PRECISION|dist
DOUBLE PRECISION|azim

micha@tp480:~$ R

R version 3.5.2 (2018-12-20) – “Eggshell Igloo”
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
…

library(rgrass7)
Loading required package: XML
GRASS GIS interface loaded with GRASS version: GRASS 7.6.0 (2019)
and location: ITM
use_sf()
stations = readVECT(“stations”)
WARNING: Vector map is 3D. Use format specific layer creation
options (parameter ‘lco’) to export <in 3D rather than 2D
(default).
Exporting 94 features…
100%
…

stations[‘new_station_num’] = as.numeric(stations$station_en)
stations$new_station_num
[1] 71 26 6 55 54 63 7 8 31 30 46 84 92 38 32 88 27 12 67 62 47 33 53 76 89
[26] 2 86 11 40 65 64 45 13 85 60 59 1 74 73 22 19 15 39 50 56 14 44 23 36 83
[51] 41 42 43 18 17 75 16 82 81 37 48 28 87 3 66 10 34 91 61 93 94 72 5 4 68
[76] 78 77 9 29 51 58 57 49 52 24 25 80 79 35 70 69 90 21 20

writeVECT(SDF=stations, vname=“new_stations”)

Best regards, Micha

On 04/12/2019 19:11, Markus Neteler wrote:

Hi,

I have a landuse map with text labels (forest, street, ...). For
[r.learn.ml](http://r.learn.ml) I need to have them as numeric classes.
It is not important for me which number is assigned but I search for
an automated solution, i.e. SQL statement unless there is a different
way.

So:

cat|label|label_int
1|forest|1
2|forest|1
3|street|2
4|forest|1
5|street|2
6|urban|3
...

I guess I have done that already some years ago but I can't remember
the trick :-)

thanks for a hint,
Markus
_______________________________________________
grass-user mailing list
[grass-user@lists.osgeo.org](mailto:grass-user@lists.osgeo.org)
[https://lists.osgeo.org/mailman/listinfo/grass-user](https://lists.osgeo.org/mailman/listinfo/grass-user)
-- 
Micha Silver
Ben Gurion Univ.
Sde Boker, Remote Sensing Lab
cell: +972-523-665918
[https://orcid.org/0000-0002-1128-1325](https://orcid.org/0000-0002-1128-1325)

On 4/12/19 19:58, Markus Neteler wrote:

Thanks for your answers.
In fact I need it in Python...

Using SQL, you can do something like this (SQLite version):

create table mytab (cat int, label varchar, labelint int);

inserts...

select * from mytab;
1|forest|
2|forest|
3|forest|
4|street|
5|street|
6|forest|
7|forest|
8|street|
9|grass|
10|grass|

SELECT cat, label, rank() OVER win FROM mytab WINDOW win as (ORDER BY label);
1|forest|1
2|forest|1
3|forest|1
6|forest|1
7|forest|1
9|grass|6
10|grass|6
4|street|8
5|street|8
8|street|8

Playing around with that should allow you to feed your table.

Or in pure python:

- get unique labels with v.db.select col=label group=label and put them in a list
- get numbers with something like this: classnums = [x+1 for x in range(len(labels))]
- zip the two lists: zip(labels, classnums)
- for each tuple in the list:
  v.db.update col=labelint value=tuple[1] where=label=tuple[0]

Probably there are more elegant solutions.

Moritz

Micha Silver <tsvibar@gmail.com <mailto:tsvibar@gmail.com>> schrieb am Mi., 4. Dez. 2019, 18:57:

    How about doing this in R? The labels will be read into R as
    factors, and the factor levels can easily be extracted as numbers.

    Something like this:

    micha@tp480:~$ v.info <http://v.info> -c stations
    Displaying column types/names for database connection of layer <1>:
    INTEGER|cat
    INTEGER|station_num
    TEXT|station_he
    TEXT|station_en
    TEXT|type
    INTEGER|x_coord
    INTEGER|y_coord
    DOUBLE PRECISION|long
    DOUBLE PRECISION|lat
    INTEGER|elev
    TEXT|date_open
    DOUBLE PRECISION|dist
    DOUBLE PRECISION|azim

    micha@tp480:~$ R

    R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
    Copyright (C) 2018 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)
    .....

     > library(rgrass7)
    Loading required package: XML
    GRASS GIS interface loaded with GRASS version: GRASS 7.6.0 (2019)
    and location: ITM
     > use_sf()
     > stations = readVECT("stations")
    WARNING: Vector map <stations> is 3D. Use format specific layer creation
      options (parameter 'lco') to export <in 3D rather than 2D
      (default).
    Exporting 94 features...
      100%
    .....

     > stations['new_station_num'] = as.numeric(stations$station_en)
     > stations$new_station_num
      [1] 71 26 6 55 54 63 7 8 31 30 46 84 92 38 32 88 27 12 67 62 47
    33 53 76 89
    [26] 2 86 11 40 65 64 45 13 85 60 59 1 74 73 22 19 15 39 50 56 14
    44 23 36 83
    [51] 41 42 43 18 17 75 16 82 81 37 48 28 87 3 66 10 34 91 61 93 94
    72 5 4 68
    [76] 78 77 9 29 51 58 57 49 52 24 25 80 79 35 70 69 90 21 20

     > writeVECT(SDF=stations, vname="new_stations")

    Best regards, Micha

    On 04/12/2019 19:11, Markus Neteler wrote:

    Hi,

    I have a landuse map with text labels (forest, street, ...). For
    r.learn.ml <http://r.learn.ml> I need to have them as numeric classes.
    It is not important for me which number is assigned but I search for
    an automated solution, i.e. SQL statement unless there is a different
    way.

    So:

    cat|label|label_int
    1|forest|1
    2|forest|1
    3|street|2
    4|forest|1
    5|street|2
    6|urban|3
    ...

    I guess I have done that already some years ago but I can't remember
    the trick :slight_smile:

    thanks for a hint,
    Markus
    _______________________________________________
    grass-user mailing list
    grass-user@lists.osgeo.org <mailto:grass-user@lists.osgeo.org>
    https://lists.osgeo.org/mailman/listinfo/grass-user

    -- Micha Silver
    Ben Gurion Univ.
    Sde Boker, Remote Sensing Lab
    cell: +972-523-665918
    https://orcid.org/0000-0002-1128-1325

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-user

Hi Moritz, all,

On Thu, Dec 5, 2019 at 9:22 AM Moritz Lennert
<mlennert@club.worldonline.be> wrote:

On 4/12/19 19:58, Markus Neteler wrote:
> Thanks for your answers.
> In fact I need it in Python...

Using SQL, you can do something like this (SQLite version):

create table mytab (cat int, label varchar, labelint int);

inserts...

select * from mytab;
1|forest|
2|forest|
3|forest|
4|street|
5|street|
6|forest|
7|forest|
8|street|
9|grass|
10|grass|

SELECT cat, label, rank() OVER win FROM mytab WINDOW win as (ORDER BY
label);
1|forest|1
2|forest|1
3|forest|1
6|forest|1
7|forest|1
9|grass|6
10|grass|6
4|street|8
5|street|8
8|street|8

Playing around with that should allow you to feed your table.

I tried this with NC:

g.copy vector=zipcodes_wake,myzipcodes_wake
db.select sql="SELECT cat, NAME, rank() OVER win as NAME_num FROM
myzipcodes_wake WINDOW win as (ORDER BY NAME);"
cat|NAME|NAME_num
22|ANGIER|1
31|ANGIER|1
16|APEX|3
23|APEX|3
30|APEX|3
34|APEX|3
29|CARY|7
35|CARY|7
36|CARY|7
38|CARY|7
19|CLAYTON|11
1|CREEDMOOR|12
...

Cool :slight_smile:

The question is now: how to turn this into an UPDATE statement (for
v.db.update or db.execute)?

thanks
Markus