[GRASS-user] Script for idw cross validation

Hello! I'm trying to validate my IDW model for field values of pH (and other parameters) using the following:

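from grass.pygrass.modules import Module

# Partition sample points in 4 groups
Module("v.kcv", map="pHsr", npartitions=4, column="part")
# Calibration set: every point NOT in partition 1 (-r reverses the selection)
Module("v.extract", flags="r", input="pHsr", output="pHcalibration",
       where="part=1")
# Validation set: the points in partition 1
Module("v.extract", input="pHsr", output="pHvalidation",
       where="part=1")
# Interpolate from the calibration points only
Module("v.surf.idw", input="pHcalibration", column="pH_values",
       npoints=12, power=2, output="pH_CAL_idw")
# Write the queried raster values into columns of the validation points
Module("v.what.rast", map="pHvalidation", raster="pHsr",
       column="actual")
Module("v.what.rast", map="pHvalidation", raster="pH_CAL_idw", column="Estimated")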

Things work pretty well so far: I obtained a vector with the actual values and the estimated ones. The next step would be to evaluate the root mean square error (RMSE) for this routine. (Eventually I also want to loop the IDW estimation over different sets of validation and calibration groups.)
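Since the goal is to repeat this for every partition, here is a stdlib-only Python sketch of the whole k-fold loop outside GRASS, just to make the logic explicit. It assigns points to k roughly equal partitions (similar in spirit to v.kcv), then for each fold calibrates on the other k-1 partitions (what the `flags="r"` extraction does above), estimates the held-out points and computes one RMSE per fold. The helpers `idw_estimate` and `kfold_rmse` are hypothetical and only approximate the npoints/power behaviour of v.surf.idw, which writes a raster that v.what.rast then samples, rather than estimating point by point. Inside GRASS the loop body would instead run v.extract, v.surf.idw and v.what.rast with something like `where=f"part={fold}"`.

```python
import math
import random


def idw_estimate(x, y, points, npoints=12, power=2.0):
    """IDW estimate at (x, y) from the npoints nearest (px, py, value) samples.

    Hypothetical helper: mirrors the npoints/power idea of v.surf.idw,
    not its actual implementation.
    """
    nearest = sorted(
        (math.hypot(px - x, py - y), value) for px, py, value in points
    )[:npoints]
    num = den = 0.0
    for dist, value in nearest:
        if dist == 0.0:
            return value  # query point coincides with a sample
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den


def kfold_rmse(points, k=4, seed=0, **idw_kwargs):
    """Return one RMSE per fold, holding out partition 1..k in turn.

    Hypothetical helper for illustration: partitions are assigned after a
    seeded shuffle, so they are random but reproducible and differ in size
    by at most one point.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    part = {idx: rank % k + 1 for rank, idx in enumerate(order)}  # like column "part"

    rmses = []
    for fold in range(1, k + 1):
        calibration = [p for i, p in enumerate(points) if part[i] != fold]
        validation = [p for i, p in enumerate(points) if part[i] == fold]
        sq_err = [
            (idw_estimate(x, y, calibration, **idw_kwargs) - actual) ** 2
            for x, y, actual in validation
        ]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return rmses


# Illustrative only: 25 grid points with a constant value of 7.0,
# so every held-out point is estimated exactly and each RMSE is ~0.
grid = [(float(i), float(j), 7.0) for i in range(5) for j in range(5)]
print(kfold_rmse(grid, k=4))
```

With real data, `points` would be (x, y, pH) tuples read from the sample points, and averaging the per-fold RMSEs would give a single cross-validation score.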
So far I haven't used the script; instead I used the GUI for every single command (v.kcv, v.extract, v.surf.idw and v.what.rast).
For the following section I wanted to use a script, but I have trouble getting it to work, as stats is 'not defined'.

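import numpy as np
from subprocess import PIPE
from grass.pygrass.modules import Module

sqlstat = "SELECT actual,estimated FROM pHvalidation"
stats = Module("db.select", flags="c", sql=sqlstat,
               stdout_=PIPE).outputs.stdout
# one "actual|estimated" row per line -> flat list of values
stats = stats.replace("\n", "|")[:-1].split("|")
# back to an n x 2 array; // keeps the row count an integer under Python 3
stats = (np.asarray([float(x) for x in stats], dtype="float").
         reshape(len(stats) // 2, 2))
# np.diff along axis 1 gives estimated - actual for each row
rmse = np.sqrt(np.mean(np.diff(stats, axis=1)**2))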
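The two parsing lines assume that `db.select -c` prints one row per line with the fields separated by `|` and no header. Note that `stats` is an ordinary Python variable: it only exists once these lines have actually run in the same Python session (for example a script executed inside a GRASS session), so running db.select from the GUI never defines it. A stdlib-only sketch of the parse-and-RMSE step, using a hypothetical helper `rmse_from_db_select` that takes the captured text and skips rows with an empty field (how a NULL estimate usually shows up, e.g. for a point outside the interpolated raster):

```python
import math


def rmse_from_db_select(text, sep="|"):
    """RMSE from `db.select -c` style text: one 'actual|estimated' row per line.

    Hypothetical helper: rows with an empty field (assumed to be NULLs)
    and blank lines are skipped instead of raising ValueError in float().
    """
    sq_err = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(sep)]
        if len(fields) != 2 or "" in fields:
            continue  # blank line or row containing a NULL
        actual, estimated = (float(f) for f in fields)
        sq_err.append((estimated - actual) ** 2)
    if not sq_err:
        raise ValueError("no complete actual/estimated rows found")
    return math.sqrt(sum(sq_err) / len(sq_err))


# Illustrative input only: errors of 1 and 7, plus one row with a NULL estimate.
print(rmse_from_db_select("1|2\n0|7\n5|\n"))
```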

I tried running db.select from the GUI, entering SELECT actual,estimated FROM pHvalidation as the sql statement.
In the GUI output I got the right selected values, but I don't know how to proceed with calculating the RMSE; when I try to run the script above, stats stays 'not defined'. I was also expecting a vector with two columns* containing my selected values as output of the GUI, but I don't know if and where such a file is being saved.
I guess that the line stats = Module("db.select", flags="c", sql=sqlstat, stdout_=PIPE).outputs.stdout is required to define stats, but a reference to stats must be missing somewhere before that.

Is it possible that I just have to create a vector with the columns I'm trying to select and name that vector stats?

*I get the idea that the GUI output gives me the right values, but I don't know how to create a vector from that output, call it stats and proceed with the script.

I hope I was able to explain the issue properly. I may be missing something simple, because I'm not an expert at all in using GRASS GIS, especially when it comes to writing scripts.
Thanks in advance,
FS.

Hi Francesco,

On 13/11/18 05:57, francesco sapienza wrote:

> Hello! I'm trying to validate my IDW model for field values of pH (and other parameters) using the following:
>
> # Partition sample points in 4 groups
> Module("v.kcv", map="pHsr", npartitions=4, column="part")
> Module("v.extract", flags="r", input="pHsr", output="pHcalibration",
>        where="part=1")
> Module("v.extract", input="pHsr", output="pHvalidation",
>        where="part=1")

You probably want to change your part=1 to something else in one of these calls.

> Module("v.surf.idw", input="pHcalibration", column="pH_values",
>        npoints=12, power=2, output="pH_CAL_idw")
> Module("v.what.rast", map="pHvalidation", raster="pHsr",
>        column="actual")
> Module("v.what.rast", map="pHvalidation", raster="pH_CAL_idw", column="Estimated")
>
> Things work pretty well so far: I obtained a vector with the actual values and the estimated ones. The next step would be to evaluate the root mean square error for this routine.

Calculating something like the RMSE is actually quite easy, with no need to use any SQL:

- Create a new column for the squared error, of type double precision, with v.db.addcolumn

- Fill it with v.db.update using query_column="(estimated - actual)*(estimated - actual)"

- Extract aggregated statistics on that column with v.univar

- Calculate the square root of the mean value to get the RMSE
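Written out, with $z_i$ the actual value and $\hat z_i$ the IDW estimate at validation point $i$, and $n$ the number of validation points (notation introduced here only for illustration), these steps compute the following. The new column holds each $(\hat z_i - z_i)^2$, and the mean reported by v.univar is the fraction under the root:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat z_i - z_i\right)^2}
$$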

(You could also just take the mean of the absolute values of a column containing simply (estimated - actual); note that this gives the mean absolute error (MAE), not the RMSE.)
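The two measures are not interchangeable: the MAE weights all errors equally, while the RMSE squares them first, so it is never smaller than the MAE and grows faster when a few points are badly estimated. A small stdlib sketch with made-up numbers (the `mae` and `rmse` helpers are hypothetical, not GRASS functions):

```python
import math
from statistics import mean


def mae(actual, estimated):
    """Mean absolute error: mean of |estimated - actual|."""
    return mean(abs(e - a) for a, e in zip(actual, estimated))


def rmse(actual, estimated):
    """Root mean square error: square root of the mean squared error."""
    return math.sqrt(mean((e - a) ** 2 for a, e in zip(actual, estimated)))


# Illustrative values only (not from the thread): errors of 1 and 7 pH units.
actual = [1.0, 0.0]
estimated = [2.0, 7.0]
print(mae(actual, estimated))   # 4.0
print(rmse(actual, estimated))  # 5.0
```

The two only coincide when every error has the same magnitude.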

> (Eventually I also want to loop the IDW estimation over different sets of validation and calibration groups.)
> So far I haven't used the script; instead I used the GUI for every single command (v.kcv, v.extract, v.surf.idw and v.what.rast).
> For the following section I wanted to use a script, but I have trouble getting it to work, as stats is 'not defined'.
>
> sqlstat = "SELECT actual,estimated FROM pHvalidation"
> stats = Module("db.select", flags="c", sql=sqlstat,
>                stdout_=PIPE).outputs.stdout
> stats = stats.replace("\n", "|")[:-1].split("|")
> stats = (np.asarray([float(x) for x in stats], dtype="float").
>          reshape(len(stats)/2, 2))
> rsme = np.sqrt(np.mean(np.diff(stats, axis=1)**2))
>
> I tried running db.select from the GUI, entering SELECT actual,estimated FROM pHvalidation as the sql statement.
> In the GUI output I got the right selected values, but I don't know how to proceed with calculating the RMSE value; when I try to run the script above, stats stays 'not defined'.

I don't think you're using Module as expected. Please read [1]. AFAIR, Module.outputs is a dictionary of the module's output parameters, not the output of the module run.

Moritz

[1] https://grass.osgeo.org/grass74/manuals/libpython/pygrass_modules.html