[GRASS-dev] Test suite for GRASS - proposal, discussion welcome

Hello all,

Sören and I discussed today creating a comprehensive test suite for
GRASS. We wrote down a draft [0], and I plan to work on the
implementation as soon as possible.

I hope it is of interest to the developer team, and of course more
opinions are welcome, as well as coding hands.

best regards from the GRASS Community Sprint!
Anne

[0] http://grass.osgeo.org/wiki/Test_Suite

Anne Ghisla wrote:

Sören and I discussed today creating a comprehensive test suite for
GRASS. We wrote down a draft [0], and I plan to work on the
implementation as soon as possible.

I hope it is of interest to the developer team, and of course more
opinions are welcome, as well as coding hands.

A couple of comments (I thought about using the Wiki "talk" page but I
wasn't sure if anyone would read it):

We plan to run unittests and integration tests for both libraries
and modules. The test suite will be run after compilation, with a
command like:

$ make tests [ proj [ libs | modules | all ] ]

I'd suggest adding a "test" target to the various *.make files, so you
can do e.g.:

  make -C display test
  make -C display/d.rast test

etc.

Tests are in each module's folder (written in Python)

Python might be overkill for module tests. The build system absolutely
requires a Bourne shell (for make), so portability isn't an issue.

The per-module test script shouldn't need to analyse the output; it
can just be a sequence of commands which generate textual output (e.g.
a test script for a raster module might use r.out.ascii to generate
this; a test script for a display module would output to a PPM file
which can just be "cat"d). The analysis would consist of the output
being compared against a reference file.

E.g. Module.make might have:

  $(PGM).out: $(BIN)/$(PGM)$(EXE) $(PGM).test
    $(call run_grass,$(SHELL) $(PGM).test > $(PGM).out)
  
  test: $(PGM).out $(PGM).ok
    diff -u $(PGM).out $(PGM).ok || echo "$(PGM) test failed"

Each directory would have a <module>.test script and a <module>.ok
reference file.

This would make it relatively straightforward to add test cases without
needing to know Python or the test framework. Just add commands to the
test script, run "make test", and rename <module>.out to <module>.ok.

More complex features (e.g. multiple test scripts) are possible. The
general idea remains the same: make the test "system" deal with as
much as possible so that adding new tests is easy.

--
Glynn Clements <glynn@gclements.plus.com>

Hello Glynn, all,

On Sun, 2011-05-22 at 20:56 +0100, Glynn Clements wrote:

Anne Ghisla wrote:

> Sören and I discussed today creating a comprehensive test suite for
> GRASS. We wrote down a draft [0], and I plan to work on the
> implementation as soon as possible.

[...]

> We plan to run unittests and integration tests for both libraries
> and modules. The test suite will be run after compilation, with a
> command like:
>
> $ make tests [ proj [ libs | modules | all ] ]

I'd suggest adding a "test" target to the various *.make files, so you
can do e.g.:

  make -C display test
  make -C display/d.rast test

Good idea. Sadly I don't know the makefile system well, so most of my
questions are basic. Your suggestion is to explicitly run tests on a
given module. Would it also be possible to run all tests in one go, by
launching 'make' in the code root folder?

> Tests are in each module's folder (written in Python)

Python might be overkill for module tests. The build system absolutely
requires a Bourne shell (for make), so portability isn't an issue.

The per-module test script shouldn't need to analyse the output; it
can just be a sequence of commands which generate textual output (e.g.
a test script for a raster module might use r.out.ascii to generate
this; a test script for a display module would output to a PPM file
which can just be "cat"d). The analysis would consist of the output
being compared against a reference file.

It saves us writing lots of functions that would compare datatypes; but
OTOH it would add dependencies on other GRASS modules. Then, it will be
necessary (or at least better) to test the modules without dependencies
first. We would need to check if a test includes more than one GRASS
module... or just let all tests run and fail, and then work out which
modules are actually broken and causing other (correct) modules' tests
to fail.

E.g. Module.make might have:

  $(PGM).out: $(BIN)/$(PGM)$(EXE) $(PGM).test
    $(call run_grass,$(SHELL) $(PGM).test > $(PGM).out)
  
  test: $(PGM).out $(PGM).ok
    diff -u $(PGM).out $(PGM).ok || echo "$(PGM) test failed"

Each directory would have a <module>.test script and a <module>.ok
reference file.

This would make it relatively straightforward to add test cases without
needing to know Python or the test framework. Just add commands to the
test script, run "make test", and rename <module>.out to <module>.ok.

More complex features (e.g. multiple test scripts) are possible. The
general idea remains the same: make the test "system" deal with as
much as possible so that adding new tests is easy.

I definitely agree, writing tests must be as easy as possible for the
developers.
We chose Python, as Sören already wrote tests as shell scripts and found
them too cumbersome and difficult to maintain. Python already provides a
good testing framework, and we plan to write base classes specific to
GRASS.
At the moment I'm unable to work on makefiles and don't have enough
familiarity with shell to write better tests than the ones written by
Sören; whereas I/we can produce a proof of concept of Python tests in a
few weeks.

So, it seems that the choice of language is up to the coders involved.
If more developers are willing to work in shell, and have some time to
give me some hints, let's go for it. The ultimate goal is to create a
testing framework (currently missing) on which as many developers as
possible can work.

all the best!
Anne
--
http://wiki.osgeo.org/wiki/User:Aghisla

Anne Ghisla wrote:

> I'd suggest adding a "test" target to the various *.make files, so you
> can do e.g.:
>
> make -C display test
> make -C display/d.rast test

Good idea. Sadly I don't know the makefile system well, so most of my
questions are basic. Your suggestion is to explicitly run tests on a
given module. Would it also be possible to run all tests in one go, by
launching 'make' in the code root folder?

Yes. In Dir.make (used for directories) the "test" target would be
recursive, i.e. it would run "make -C $dir test" for each
subdirectory, the same way that the "clean" target is handled. This
target would literally be just:

  test: test-recursive

Dir.make already has a pattern rule for "%-recursive", which makes the
stem (whatever the "%" matches) for each subdirectory.

> > Tests are in each module's folder (written in Python)
>
> Python might be overkill for module tests. The build system absolutely
> requires a Bourne shell (for make), so portability isn't an issue.
>
> The per-module test script shouldn't need to analyse the output; it
> can just be a sequence of commands which generate textual output (e.g.
> a test script for a raster module might use r.out.ascii to generate
> this; a test script for a display module would output to a PPM file
> which can just be "cat"d). The analysis would consist of the output
> being compared against a reference file.

It saves us writing lots of functions that would compare datatypes; but
OTOH it would add dependencies on other GRASS modules. Then, it will be
necessary (or at least better) to test the modules without dependencies
first. We would need to check if a test includes more than one GRASS
module... or just let all tests run and fail, and then work out which
modules are actually broken and causing other (correct) modules' tests
to fail.

I don't think that there's any way around this. Few GRASS modules are
useful in isolation. On the positive side, modules which are essential
for tests (e.g. g.region, *.out.ascii) are unlikely to be changed
regularly.

As I see it, this isn't really any different to the situation where
someone commits a change which causes a library to fail to compile,
and every module which uses that library also fails to compile.

We chose Python, as Sören already wrote tests as shell scripts and found
them too cumbersome and difficult to maintain.

But is this because you're trying to perform analysis of the results
within the test scripts?

--
Glynn Clements <glynn@gclements.plus.com>

On Mon, May 23, 2011 at 4:04 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

Anne Ghisla wrote:

> I'd suggest adding a "test" target to the various *.make files, so you
> can do e.g.:
>
> make -C display test
> make -C display/d.rast test

Good idea. Sadly I don't know the makefile system well, so most of my
questions are basic. Your suggestion is to explicitly run tests on a
given module. Would it also be possible to run all tests in one go, by
launching 'make' in the code root folder?

Yes. In Dir.make (used for directories) the "test" target would be
recursive, i.e. it would run "make -C $dir test" for each
subdirectory, the same way that the "clean" target is handled. This
target would literally be just:

   test: test-recursive

Dir.make already has a pattern rule for "%-recursive", which makes the
stem (whatever the "%" matches) for each subdirectory.

There is already a test in the vector libs [0], performed during
compilation because there is no test target in the GRASS make system.
Maybe looking at [0] gives you an idea of how to add tests to the make
system. In [0], a test target is added and always called when running
make (the default target calls test). It's pretty much like Glynn
suggested: have a reference file *.ok, create a test file, and compare
reference and test file, here with diff. If the files differ, the test
failed. That was pretty convenient when I implemented vector LFS and
tested it on different platforms.

Markus M

[0] https://trac.osgeo.org/grass/browser/grass/trunk/lib/vector/diglib/Makefile#L26

Hi,

[snip]

It saves us writing lots of functions that would compare datatypes; but
OTOH it would add dependencies on other GRASS modules. Then, it will be
necessary (or at least better) to test the modules without dependencies
first. We would need to check if a test includes more than one GRASS
module... or just let all tests run and fail, and then work out which
modules are actually broken and causing other (correct) modules' tests
to fail.

I don't think that there's any way around this. Few GRASS modules are
useful in isolation. On the positive side, modules which are essential
for tests (e.g. g.region, *.out.ascii) are unlikely to be changed
regularly.

Indeed, we will need several modules for data generation, region
settings and data output. The test suite for GRASS 6.4 uses g.region,
r/r3/v.out.ascii, r.mapcalc and several other modules.

We chose Python, as Sören already wrote tests as shell scripts and found
them too cumbersome and difficult to maintain.

But is this because you're trying to perform analysis of the results
within the test scripts?

In many cases there are situations where you will need to perform
several independent tests for a single module, with several kinds of
output for each test:

* Module output on stdout
* single or multiple raster and voxel maps
* single or multiple vector maps
* mixed raster and vector output
* and so on

The framework should be able to handle such output automatically, e.g.:
* parsing stdout and comparing it with validation data (text file,
MD5 checksum, ...)
* generating ASCII output from raster, vector and voxel maps to
compare with validation data (text file, MD5 checksum, ...)

To make it as easy as possible, the writer of a test should only
provide the command lines for the test and the data generation, plus
the validation data. I tried to implement this using the shell
approach, but because of the limitations of the shell this was much too
complex. I am not sure whether this kind of data handling can be done
using only the make system?
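For illustration, a minimal Python sketch of this kind of automatic
output handling; the helper names are assumptions and not part of any
existing framework, and the r.out.ascii usage (dp=3, output to stdout)
just follows the examples in this thread:

{{{
# Sketch: run a module, capture its stdout, and compare it against
# validation data given either as a reference text file or as an MD5 checksum.
import hashlib
import subprocess

def run_and_capture(cmd):
    # cmd is a list, e.g. ["r.out.ascii", "input=result", "dp=3", "output=-"]
    return subprocess.check_output(cmd)

def validate_stdout(output, reference_path):
    # reference_path may be a plain text dump (*.ref) or a stored checksum (*.md5)
    if reference_path.endswith(".md5"):
        expected = open(reference_path).read().strip()
        return hashlib.md5(output).hexdigest() == expected
    return output == open(reference_path, "rb").read()

# Usage (hypothetical file names):
# ok = validate_stdout(run_and_capture(["r.out.ascii", "input=result",
#                                       "dp=3", "output=-"]), "result.ref")
}}}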

The next reason is the generation of an HTML representation of the test
results for web access. The idea is to run the tests on several
differently configured servers after compilation. The results of these
tests will be published on the web. This is an important feature,
because developers will see which modules/libraries failed the tests on
which machine configuration. Have a look at the VTK dashboard [1] and
the results of the old test suite [2].

The old test suite can run each module in a valgrind environment to
find memory leaks. It is able to validate outputs of different kinds
against MD5 checksums. It logs stdout and stderr of the module and
provides this data as HTML output [3]. It links to the test
description [4].

Using Python, the implementation will be much easier than with shell
and the related Unix toolset.

Best regards
Soeren

[1] http://www.cdash.org/CDash/index.php?project=VTK
[2] http://www-pool.math.tu-berlin.de/~soeren/grass/GRASS_TestSuite/html_grass-6.4/
[3] http://www-pool.math.tu-berlin.de/~soeren/grass/GRASS_TestSuite/html_grass-6.4/log.html
[4] http://www-pool.math.tu-berlin.de/~soeren/grass/GRASS_TestSuite/html_grass-6.4//r.gwflow-test.sh.html


Hi,
I have updated the wiki and added a simple Python example of a
hypothetical r.series test. Please have a look at:
http://grass.osgeo.org/wiki/Test_Suite#Test_framework

The sample Python code shows in principle what a test case would look
like and what kind of methods should be provided by the framework.

I have put the old but functional test suite for grass6 into the
google code repository:
http://code.google.com/p/grass6-test-suite/

Maybe the documentation is of interest for you too:
http://grass6-test-suite.googlecode.com/files/GRASS-Testsuite-Presentation.pdf

Here is a quick start:
http://code.google.com/p/grass6-test-suite/wiki/QuickStart

Best
Soeren

Soeren Gebbert wrote:

I have updated the wiki and added a simple Python example of a
hypothetical r.series test. Please have a look at:
http://grass.osgeo.org/wiki/Test_Suite#Test_framework

The sample Python code shows in principle what a test case would look
like and what kind of methods should be provided by the framework.

This isn't a particularly good example. For most modules, you won't be
able to synthesise the reference data with r.mapcalc; you'll need to
provide it.

But my main concern is that few people will bother to learn a complex
test framework. If you want people to write tests, they need to be
little more than sample commands. Anything else (e.g. directives which
indicate how to compare the output) needs to be capable of being
summarised in a dozen lines. E.g.:

  # Test the average method
  r.series input=testmap1,testmap2 output=resmap_av method=average
  # Test the sum method
  r.series input=testmap1,testmap2 output=resmap_sum method=sum
  # Test the max method
  r.series input=testmap1,testmap2 output=resmap_max method=max

The implementation of the framework which executes test scripts can be
as complex as you like, but the interface (i.e. the syntax of test
scripts) needs to be trivial.

Boilerplate should be minimised; a region should already be set, test
maps should be available, parameters should have default values. Don't
force the user to specify anything that the framework ought to be able
to do itself.

E.g. in the above example, if a command creates a raster map named
resmap_av and a file named resmap_av.ref exists, that should be
sufficient for the framework to deduce that it should compare the map
to the reference data using default parameters.
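To make the deduction rule concrete, a hedged Python sketch: for every
NAME.ref in the test directory, check whether NAME now exists as a
raster or vector map and export it with default parameters for
comparison. g.findfile, r.out.ascii and v.out.ascii are real GRASS
modules, and dp=3/format=standard follow this thread; the glue code and
the exact handling of g.findfile's output are assumptions.

{{{
import glob
import os
import subprocess

def map_exists(name, element):
    # element is "cell" for raster, "vector" for vector maps; g.findfile
    # prints shell-style variables (details may vary by GRASS version)
    proc = subprocess.Popen(["g.findfile", "element=" + element, "file=" + name],
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode == 0 and b"name=''" not in out

def export_for_comparison(name):
    if map_exists(name, "cell"):
        return subprocess.check_output(
            ["r.out.ascii", "input=" + name, "dp=3", "output=-"])
    if map_exists(name, "vector"):
        return subprocess.check_output(
            ["v.out.ascii", "input=" + name, "format=standard", "dp=3"])
    return None

for ref in glob.glob("*.ref"):
    name = os.path.splitext(ref)[0]
    data = export_for_comparison(name)
    ok = data is not None and data == open(ref, "rb").read()
    print("%s: %s" % (name, "ok" if ok else "FAILED"))
}}}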

--
Glynn Clements <glynn@gclements.plus.com>

Hi Glynn,

2011/6/3 Glynn Clements <glynn@gclements.plus.com>:

Soeren Gebbert wrote:

I have updated the wiki and added a simple Python example of a
hypothetical r.series test. Please have a look at:
http://grass.osgeo.org/wiki/Test_Suite#Test_framework

The sample Python code shows in principle what a test case would look
like and what kind of methods should be provided by the framework.

This isn't a particularly good example. For most modules, you won't be
able to synthesise the reference data with r.mapcalc; you'll need to
provide it.

Indeed, you are right, bad example. I just tried to show some
hypothetical interface methods. There should of course be default
region settings and test maps available.

But my main concern is that few people will bother to learn a complex
test framework. If you want people to write tests, they need to be
little more than sample commands. Anything else (e.g. directives which
indicate how to compare the output) needs to be capable of being
summarised in a dozen lines. E.g.:

   # Test the average method
   r.series input=testmap1,testmap2 output=resmap_av method=average
   # Test the sum method
   r.series input=testmap1,testmap2 output=resmap_sum method=sum
   # Test the max method
   r.series input=testmap1,testmap2 output=resmap_max method=max

I was thinking about a similar approach, but parsing the modules' XML
interface descriptions to identify the command line arguments and
compare the created data was too much effort for me.

Besides that, the handling of test descriptions, module dependencies
and the comparison of multiple/time series outputs (r.sim.water)
bothered me too. I still have no simple (interface) answers to these
issues (maybe they are not issues at all??).

The implementation of the framework which executes test scripts can be
as complex as you like, but the interface (i.e. the syntax of test
scripts) needs to be trivial.

Well, I thought my example was trivial ... but you may be right. My
perspective is a developer's perspective, familiar with different test
frameworks. Switching to a software tester's perspective is difficult
for me, so many thanks for your feedback, Glynn!

Boilerplate should be minimised; a region should already be set, test
maps should be available, parameters should have default values. Don't
force the user to specify anything that the framework ought to be able
to do itself.

You are completely right, I had this in mind too. I would like to put
several small test locations into the GRASS source tree for this reason.
These locations, which differ only in the coordinate reference system,
should provide test maps and data for most test cases in the PERMANENT
mapset. The module test mapsets are created on the fly and removed for
each module test, to avoid messing up the test locations.

E.g. in the above example, if a command creates a raster map named
resmap_av and a file named resmap_av.ref exists, that should be
sufficient for the framework to deduce that it should compare the map
to the reference data using default parameters.

I see, a simple but powerful approach; I sometimes have the feeling
that I overcomplicate things.

Best regards
Soeren


Soeren Gebbert wrote:

I was thinking about a similar approach, but parsing the modules' XML
interface descriptions to identify the command line arguments and
compare the created data was too much effort for me.

I don't see a need to parse the command; just execute it and see what
files it creates.

Besides that, the handling of test descriptions, module dependencies
and the comparison of multiple/time series outputs (r.sim.water)
bothered me too. I still have no simple (interface) answers to these
issues (maybe they are not issues at all??).

Dependencies aren't really an issue. You build all of GRASS first,
then test. Any modules which are used for generating test maps or
analysing data are assumed to be correct (they will have test cases of
their own; the most that's required is that such modules are marked as
"critical" so that any failure will be presumed to invalidate the
results of all other tests).

> E.g. in the above example, if a command creates a raster map named
> resmap_av and a file named resmap_av.ref exists, that should be
> sufficient for the framework to deduce that it should compare the map
> to the reference data using default parameters.

I see, a simple but powerful approach; I sometimes have the feeling
that I overcomplicate things.

I don't normally advocate such approaches, but testing is one of those
areas which (like documentation) is much harder to get people to work
on than e.g. programming, so minimising the effort involved is
important.

Minimising the learning curve is probably even more important. If you
can get people to start writing tests, they're more likely to put in
the effort to learn the less straightforward aspects as it becomes
necessary.

--
Glynn Clements <glynn@gclements.plus.com>

Hello Glynn,
I was thinking a lot about your approach and mine, and finally decided
to try your approach first, in the hope that it will be sufficient for
all kinds of test cases. I still have concerns about the comparison of
floating point data regarding precision, e.g. coordinates, region
settings, FCELL and DCELL maps, vector attributes ... . More below:

2011/6/3 Glynn Clements <glynn@gclements.plus.com>:

Soeren Gebbert wrote:

I was thinking about a similar approach, but parsing the modules' XML
interface descriptions to identify the command line arguments and
compare the created data was too much effort for me.

I don't see a need to parse the command; just execute it and see what
files it creates.

OK, I see.

Besides that, the handling of test descriptions, module dependencies
and the comparison of multiple/time series outputs (r.sim.water)
bothered me too. I still have no simple (interface) answers to these
issues (maybe they are not issues at all??).

Dependencies aren't really an issue. You build all of GRASS first,
then test. Any modules which are used for generating test maps or
analysing data are assumed to be correct (they will have test cases of
their own; the most that's required is that such modules are marked as
"critical" so that any failure will be presumed to invalidate the
results of all other tests).

I assume such critical modules are coded in the framework, not in the
test scripts? But this also means that the test scripts
must be interpreted and executed line by line by the framework to
identify critical modules used for data generation?

Example for a synthetic r.series test using r.mapcalc for data
generation. r.mapcalc is marked as critical in the framework:

{{{
# r.series synthetic average test with r.mapcalc generated data
# The r.series result is validated using the result.ref file in this
# test directory

# Generate the data
r.mapcalc expression="input1 = 1"
r.mapcalc expression="input2 = 2"

# Test the average method of r.series
r.series input=input1,input2 output=result method=average
}}}

Here is the assumed workflow:
The framework will read the test script and analyse it line by line.
In case r.mapcalc is marked as critical and the framework finds the
keyword "r.mapcalc" in the script, appearing as the first word outside
of a comment, it checks whether the r.mapcalc test(s) have already run
correctly, and stops the r.series test if they have not. In case the
r.mapcalc tests are valid, it starts the r.mapcalc commands and checks
their return values. If the return values are correct, the rest of the
script is executed. After reaching the end of the script, the framework
looks for any generated data in the current mapset (raster, raster3d,
vector, color, regions, ...) and looks for corresponding validation
files in the test directory. In this case it will find the raster maps
input1, input2 and result in the current mapset and result.ref in the
test directory. It will use r.out.ascii on the result map, choosing a
low precision (dp=3??), and compare the output with result.ref, which
was hopefully generated using the same precision.

This example should cover many raster and voxel test cases.
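Roughly sketched in Python, that loop could look like this (all helper
logic is hypothetical, only the raster case is handled, and stdout/stderr
logging and mapset handling are omitted; a real framework would dispatch
on the data type):

{{{
import glob
import os
import shlex
import subprocess

# assumed list of critical modules, maintained inside the framework
CRITICAL = {"g.region", "r.mapcalc", "r.out.ascii"}

def run_test_case(script_path, test_dir):
    failed = []
    for line in open(script_path):
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # comments double as documentation
        cmd = shlex.split(line)
        ret = subprocess.call(cmd)         # stdout/stderr logging omitted here
        if ret != 0:
            if cmd[0] in CRITICAL:
                raise RuntimeError("critical module %s failed" % cmd[0])
            failed.append(line)
    # validate only data for which a corresponding .ref file exists
    for ref in glob.glob(os.path.join(test_dir, "*.ref")):
        name = os.path.splitext(os.path.basename(ref))[0]
        current = subprocess.check_output(
            ["r.out.ascii", "input=" + name, "dp=3", "output=-"])
        if current != open(ref, "rb").read():
            failed.append(name)
    return failed
}}}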

I don't normally advocate such approaches, but testing is one of those
areas which (like documentation) is much harder to get people to work
on than e.g. programming, so minimising the effort involved is
important.

Minimising the learning curve is probably even more important. If you
can get people to start writing tests, they're more likely to put in
the effort to learn the less straightforward aspects as it becomes
necessary.

OK, I will try to summarize this approach:

The test framework will be integrated into the GRASS source code and
will use the make system to execute tests.
The make system should be used to:
* run single module or library tests
* run all module (raster|vector|general|db ...) tests
* run all library tests
* run all tests (libraries, then modules)
* in case of an all-modules test, run critical module tests
automatically first

Two test locations (LL and UTM?) should be generated and added to the
GRASS sources. The test locations provide all kinds of needed test
data: raster maps of different types (elevation maps, images, maps of
CELL, FCELL and DCELL type, ...), vector maps (point, line, area,
mixed, with and without attribute data), voxel data, regions, raster
maps with different color tables, reclassified maps and so on ... .
The test data is only located in the PERMANENT mapset. The locations
should be small enough to fit into SVN without performance issues.

Each module and library has its own test directory. The test
directories contain the test cases, reference text files and data for
import (for *.in.* modules). Validation of data is based on the
reference text files located in the test directories for each
module/library. Files implementing test cases must end with ".sh",
reference files with ".ref". The test cases are based on simple
shell-style text files, so they can be easily implemented and executed
on the command line by non-developers. Comments in the test case files
are used as documentation of the test in the test summary.

The framework itself should be implemented in Python. It should
provide the following functionality:
* Parsing and interpretation of test case files
* Logging of all executed test cases
* Simple but excellent presentation of test results in different
formats (text, html, xml?)
* Setting up the test location environment and creating/removing
temporary mapsets for each test case run

* Comparison methods for all testable GRASS datatypes (raster, color,
raster3d, vector, db tables, region, ...) against text files
** test of equal data
** test of almost equal data (precision issue of floating point data
on different systems)
*** ! using *.out.ascii modules with the precision flag should work?
** Equal and almost equal key-value tests (g.region -g, r.univar, ...)
of text files <-- I am not sure how to realize this

* Execution of single test cases
** Reading and analyzing the test case
** Identification of critical modules
** Running of single modules, logging stdout, stderr and the return value
** Analysis of return values -> indicator of whether the module/test failed
*** ! this assumes that commands in the test cases make no use of pipes
** Recognition of all data generated by modules
*** Searching the GRASS database for new raster, vector and raster3d
maps, regions, ... in the temporary mapset
*** Searching for newly generated text or binary files in the test directory
** Recognition of validation data in the test directory
** Comparison of found data with available reference data
** Logging of the validation process
** Removing the temporary mapset and generated data in the test directory
* maybe much more ...

Here are some test cases which must be covered:

A simple g.region test with validation. A region.ref text file is
present in the test directory. It is a file with key-value pairs used
to validate the output of g.region -g.

g.region_test.sh
{{{
# This is the introduction text for the g.region test

# this is the description of the first module test run
g.region -g > region
}}}

The framework will recognize the new text file "region" and the
reference file "region.ref" in key-value format in the test directory,
and should use an almost-equal key-value test for validation. The same
approach should work for r.univar and similar modules with shell
output.
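A small Python sketch of such an almost-equal key-value comparison
(purely illustrative; the tolerance value and helper names are
assumptions, not settled framework behavior):

{{{
def parse_keyvalue(text):
    pairs = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs

def almost_equal_keyvalue(current_text, reference_text, rel_tol=1e-6):
    cur, ref = parse_keyvalue(current_text), parse_keyvalue(reference_text)
    if set(cur) != set(ref):
        return False
    for key in ref:
        try:
            a, b = float(cur[key]), float(ref[key])
            if abs(a - b) > rel_tol * max(1.0, abs(b)):
                return False
        except ValueError:
            if cur[key] != ref[key]:      # non-numeric values must match exactly
                return False
    return True

# Usage: almost_equal_keyvalue(open("region").read(), open("region.ref").read())
}}}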

Now a simple v.random test.
Because the data is generated randomly, the coordinates cannot be
compared. We need to compare the meta information. A file named
result.ref in key-value format is present.

v.random_test.sh
{{{
# This is a simple test of v.random
# validation is based on meta information

v.random output=random_points n=100
v.info -t random_points > result
}}}

As with the g.region test, the framework should recognize the text file
and perform key-value validation.

A simple v.buffer test. The vector point map "points" is located in
the PERMANENT mapset of the test location. A file named result.ref is
present in the test directory for validation. The file was generated
with v.out.ascii dp=3 format=standard.

v.buffer_test.sh
{{{
# Test the generation of a buffer around points

# Buffer of 10m radius
v.buffer input=points output=result distance=10
}}}

In this case the framework recognizes a new vector map and the
result.ref text file. It uses v.out.ascii with dp=3 to export the
result vector map in "standard" format and compares it with
result.ref. The "standard" format is the default method to compare
vector data and cannot be changed in the test case scripts.

I think most of the test cases which we need can be covered with this
approach. But the test designer must know that the validation data
must be of a specific type and precision.

I hope I was not too redundant in my thoughts and explanations. :-)

So what do you think, Glynn, Anne, Martin and all interested
developers? If this approach is OK, I will put it into the wiki.

Best regards
Soeren


Soeren Gebbert wrote:

> Dependencies aren't really an issue. You build all of GRASS first,
> then test. Any modules which are used for generating test maps or
> analysing data are assumed to be correct (they will have test cases of
> their own; the most that's required is that such modules are marked as
> "critical" so that any failure will be presumed to invalidate the
> results of all other tests).

I assume such critical modules are coded in the framework, not in the
test scripts?

I was thinking about a directive (e.g. "@critical") in the test script
so that any failure during the test would generate a more prominent
message. If any such errors occurred as a result of "make test", you
would ignore all the other failures. In the same way, if error.log has
an error for e.g. lib/gis, you wouldn't bother about all of the
subsequent errors, and would focus on what was wrong with lib/gis.

But this also means that the test scripts
must be interpreted and executed line by line by the framework to
identify critical modules used for data generation?

Test failures should not occur for critical modules. If they do, you
deal with the critical module, and ignore everything else until that
has been dealt with.

The test scripts would need to be processed a command at a time for
other reasons (assuming that the framework is going to be doing more
than simply executing the commands).

Example for a synthetic r.series test using r.mapcalc for data
generation. r.mapcalc is marked as critical in the framework:

In case r.mapcalc is marked as critical and the framework finds the
keyword "r.mapcalc" in the script, appearing as the first word outside
of a comment, it checks whether the r.mapcalc test(s) have already run
correctly, and stops the r.series test if they have not.

I wouldn't bother with this part. If the user runs "make test" from
the top level, r.mapcalc's tests will end up getting run. If they
fail, then the user will get an error message informing them that a
critical module failed and that they should ignore everything else
until that has been addressed.

If you're doing repeated tests on a specific module that you're
working on, you don't want to be re-running the r.mapcalc, r.out.ascii
etc tests every time.

In case the r.mapcalc
tests are valid, it starts the r.mapcalc commands and checks their
return values. If the return values are correct, the rest of the
script is executed. After reaching the end of the script, the
framework looks for any generated data in the current mapset (raster,
raster3d, vector, color, regions, ...) and looks for corresponding
validation files in the test directory. In this case it will find the
raster maps input1, input2 and result in the current mapset and
result.ref in the test directory. It will use r.out.ascii on the
result map, choosing a low precision (dp=3??), and compare the output
with result.ref, which was hopefully generated using the same
precision.

Only result.ref would exist, so there's no need to export and validate
input1 and input2. In general, you don't need to traverse the entire
mapset directory, but only test for specific files for which a
corresponding ".ref" file exists.

I'd export as much precision as is likely to be meaningful, erring on
the side of slightly too much precision. The default comparison
tolerance should be just large enough that it won't produce noise for
the majority of modules. Modules which require more tolerance (e.g.
due to numerical instability) should explicitly enlarge the tolerance
and/or set an "allowed failures" limit.
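A possible shape for such a comparison, sketched in Python (the default
values are placeholders, not agreed numbers): compare two ASCII dumps
token by token, allow a per-value tolerance, and tolerate a limited
number of mismatches.

{{{
def compare_ascii_dumps(current, reference, tolerance=1e-6, allowed_failures=0):
    cur_tokens = current.split()
    ref_tokens = reference.split()
    if len(cur_tokens) != len(ref_tokens):
        return False
    failures = 0
    for a, b in zip(cur_tokens, ref_tokens):
        try:
            if abs(float(a) - float(b)) > tolerance:
                failures += 1
        except ValueError:                 # header entries, null cells ("*"), ...
            if a != b:
                failures += 1
    return failures <= allowed_failures

# A numerically unstable module would call this with a larger tolerance
# and/or allowed_failures > 0.
}}}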

The test framework will be integrated into the GRASS source code and
will use the make system to execute tests.
The make system should be used to:
* run single module or library tests
* run all module (raster|vector|general|db ...) tests
* run all library tests
* run all tests (libraries, then modules)
* in case of an all-modules test, run critical module tests
automatically first

Any directory with a Makefile should support "make test" one way or
another. Usually via appropriate rules in Lib.make, Module.make,
Dir.make, etc. Dir.make would just run "make test" recursively (see
the %-recursive pattern rule in Dir.make); the others would look for a
test script then use the framework to execute it.

The top-level Makefile includes Dir.make, so "make test" would use the
recursive rule (a special rule for testing critical modules could be
added as a prerequisite). Testing the libraries would just be
"make -C lib test" (i.e. recursive in the "lib" directory); similarly
for raster, vector, etc.

Two test locations (LL and UTM?)

Possibly X-Y as well; even if we don't add any test data to them, a
test script can create a map more easily than it can create a location.

Each module and library has its own test directory. The test
directories contain the test cases, reference text files and data for
import (for *.in.* modules).

I'm not sure we need a separate subdirectory for the test data.

** Equal and almost equal key-value tests (g.region -g, r.univar, ...)
of text files <-- I am not sure how to realize this

Equal is easy, but almost equal requires being able to isolate
numbers, and possibly determine their context so that context-specific
comparison parameters can be used.

v.random output=random_points n=100
v.info -t random_points > result

If the framework understands vector maps, it shouldn't be necessary to
use v.info; it should be able to compare the random_points map to
random_points.ref.

One thing that I hadn't thought about much until now is that maps can
have a lot of different components, different modules affect different
components, and the correct way to perform comparisons would vary
between components.

Having the framework auto-detect maps doesn't tell it which components
of the maps it should be comparing. But having the test script perform
export to text files doesn't tell the framework anything about how to
perform the comparison (unless the framework keeps track of which
commands generated which text file, and has specific rules for specific
commands).

The only "simple" solution (in terms of writing test scripts) is to
have the framework compare all components of the map against the
reference, which means that the export needs to be comprehensive (i.e.
include all metadata).

--
Glynn Clements <glynn@gclements.plus.com>

On 09/06/11 10:01, Glynn Clements wrote:

Soeren Gebbert wrote:

v.random output=random_points n=100
v.info -t random_points > result

If the framework understands vector maps, it shouldn't be necessary to
use v.info; it should be able to compare the random_points map to
random_points.ref.

But how to compare two randomly generated maps ? Unless the system is not really random, the maps _should_ be different.

Moritz

Hi,
That's why I compare the topology information and not the positions/coordinates. In any case, 100 points must be generated.

Best
Soeren

On 09.06.2011 12:21, “Moritz Lennert” <mlennert@club.worldonline.be> wrote:

On 09/06/11 10:01, Glynn Clements wrote:

Soeren Gebbert wrote:

v.random output=random_points n=100
v.info -t random_points > result

If the framework understands vector maps, it shouldn’t be necessary to
use v.info; it should be able to compare the random_points map to
random_points.ref.

But how to compare two randomly generated maps ? Unless the system is
not really random, the maps should be different.

Moritz

Moritz wrote:

But how to compare two randomly generated maps ? Unless the
system is not really random, the maps _should_ be different.

Good eye, but the system is not really random, only pseudo-random.

Use the GRASS_RND_SEED environment variable to get the same result
each time by using the same seed. (Currently only implemented for
r.mapcalc?)

Hamish

Moritz Lennert wrote:

>> v.random output=random_points n=100
>> v.info -t random_points > result
>
> If the framework understands vector maps, it shouldn't be necessary to
> use v.info; it should be able to compare the random_points map to
> random_points.ref.

But how to compare two randomly generated maps ? Unless the system is
not really random, the maps _should_ be different.

Ugh; v.random should have a seed= option (or something) to allow
repeatability. Or at least an option to disable seeding with the PID.

For comparison, r.mapcalc's rand() is seeded from the GRASS_RND_SEED
environment variable. If that isn't set, it isn't explicitly seeded
(i.e. the default seed is used). If you want different results for
each run, you need to explicitly change GRASS_RND_SEED each time.
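For example, a test could pin the seed before invoking r.mapcalc; a
minimal sketch, assuming the 7.0-style expression= parameter (the map
name and expression are made up for illustration):

{{{
import os
import subprocess

env = dict(os.environ, GRASS_RND_SEED="42")   # fixed seed for reproducible rand()
subprocess.check_call(
    ["r.mapcalc", "expression=random_map = rand(0, 100)"], env=env)
}}}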

--
Glynn Clements <glynn@gclements.plus.com>

On 10/06/11 02:29, Glynn Clements wrote:

Moritz Lennert wrote:

v.random output=random_points n=100
v.info -t random_points > result

If the framework understands vector maps, it shouldn't be
necessary to use v.info; it should be able to compare the
random_points map to random_points.ref.

But how to compare two randomly generated maps ? Unless the system
is not really random, the maps _should_ be different.

Ugh; v.random should have a seed= option (or something) to allow
repeatability. Or at least an option to disable seeding with the
PID.

For comparison, r.mapcalc's rand() is seeded from the GRASS_RND_SEED
environment variable. If that isn't set, it isn't explicitly seeded
(i.e. the default seed is used). If you want different results for
each run, you need to explicitly change GRASS_RND_SEED each time.

So which of the two options (a seed= parameter / disabling seeding by
PID vs GRASS_RND_SEED) is preferable? I would think that we should try
to be consistent across modules, but r.mapcalc uses GRASS_RND_SEED and
v.perturb uses 'seed=' (but also doesn't use rand()). Maybe this is the
occasion to unify all this?

Moritz

Moritz Lennert wrote:

So which of the two options (a seed= parameter / disabling seeding by
PID vs GRASS_RND_SEED) is preferable? I would think that we should try
to be consistent across modules, but r.mapcalc uses GRASS_RND_SEED and
v.perturb uses 'seed=' (but also doesn't use rand()). Maybe this is the
occasion to unify all this?

r.mapcalc uses an environment variable because the pre-7.0 version
doesn't use G_parser(); all arguments are concatenated and the result
is treated as an expression.

In 7.0, adding seed= to r.mapcalc would be straightforward; in 6.x,
it's not really an option.

--
Glynn Clements <glynn@gclements.plus.com>

Hi,

With regard to (but not limited to) the test suite, see also wish #618,
"rfe: r.md5sum":

https://trac.osgeo.org/grass/ticket/618

maybe it is a useful tool to have in the mix.

Hamish

Hello,

I assume such critical modules are coded in the framework, not in the
test scripts?

I was thinking about a directive (e.g. "@critical") in the test script
so that any failure during the test would generate a more prominent
message.

Such annotations/directives are a great idea. I was thinking about a
similar approach. I had in mind adding such directives to test cases to
identify data preprocessing steps, test calls and critical modules. I
would like to use them as part of the test case documentation:

r.mapcalc_test.sh
{{{
# This test case is designed to test r.mapcalc,
# which is @critical for many other tests.

# We need to perform a @preprocess step
# with g.region to set up a specific LL test region
g.region s=0 w=0 n=90 e=180 res=1

# The first @test generates a CELL raster map with value 1
r.mapcalc expression="result1 = 1"
...
}}}

IMHO each test case should be well documented, so why not use
annotations as part of the documentation? Additionally, I would like to
automatically add the tests to the bottom of the HTML manual pages as
examples.
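A sketch of how such directives could be pulled out of the comments
(the directive names follow the example above; everything else,
including the function name, is an assumption):

{{{
import re

def scan_directives(script_path):
    directives, doc = [], []
    for lineno, line in enumerate(open(script_path), 1):
        line = line.strip()
        if line.startswith("#"):
            doc.append(line.lstrip("# "))
            directives += [(name, lineno) for name in re.findall(r"@(\w+)", line)]
    return directives, doc

# For the r.mapcalc example above this would report the @critical,
# @preprocess and @test markers together with the comment text, which
# could be reused in the HTML summary or appended to the manual page.
}}}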

If any such errors occurred as a result of "make test", you would
ignore all the other failures. In the same way, if error.log has an
error for e.g. lib/gis, you wouldn't bother about all of the subsequent
errors, and would focus on what was wrong with lib/gis.

Is it possible to stop "make test" in case a library test or a
critical module test fails?

But this also means that the test scripts
must be interpreted and executed line by line by the framework to
identify critical modules used for data generation?

Test failures should not occur for critical modules. If they do, you
deal with the critical module, and ignore everything else until that
has been dealt with.

Indeed. I would suggest putting critical modules at the top of the
directory Makefiles, to ensure that they are executed first and that
recursive testing stops when one of them fails.
In case any library test or critical module test fails, no further
module tests should be performed.

The test scripts would need to be processed a command at a time for
other reasons (assuming that the framework is going to be doing more
than simply executing the commands).

I had in mind that the return value of each command is checked and
stderr is logged for further analysis. The framework must be able to
identify which commands failed/succeeded and for which commands data
validation was available and successful. This data should be available
in the detailed, test-case-specific HTML log files.

Example for a synthetic r.series test using r.mapcalc for data
generation. r.mapcalc is marked as critical in the framework:

In case r.mapcalc is marked as critical and the framework finds the
keyword "r.mapcalc" in the script, appearing as the first word outside
of a comment, it checks whether the r.mapcalc test(s) have already run
correctly, and stops the r.series test if they have not.

I wouldn't bother with this part. If the user runs "make test" from
the top level, r.mapcalc's tests will end up getting run. If they
fail, then the user will get an error message informing them that a
critical module failed and that they should ignore everything else
until that has been addressed.

If you're doing repeated tests on a specific module that you're
working on, you don't want to be re-running the r.mapcalc, r.out.ascii
etc tests every time.

I don't know if this can be avoided in an automated test system,
especially when a temporary mapset is created each time a test case
gets executed. The exception would be if the developer comments out the
preprocessing steps and executes the script manually in the test
location.

In case the r.mapcalc
tests are valid, it starts the r.mapcalc commands and checks their
return values. If the return values are correct, the rest of the
script is executed. After reaching the end of the script, the
framework looks for any generated data in the current mapset (raster,
raster3d, vector, color, regions, ...) and looks for corresponding
validation files in the test directory. In this case it will find the
raster maps input1, input2 and result in the current mapset and
result.ref in the test directory. It will use r.out.ascii on the
result map, choosing a low precision (dp=3??), and compare the output
with result.ref, which was hopefully generated using the same
precision.

Only result.ref would exist, so there's no need to export and validate
input1 and input2. In general, you don't need to traverse the entire
mapset directory, but only test for specific files for which a
corresponding ".ref" file exists.

That's indeed much more efficient.

I'd export as much precision as is likely to be meaningful, erring on
the side of slightly too much precision. The default comparison
tolerance should be just large enough that it won't produce noise for
the majority of modules. Modules which require more tolerance (e.g.
due to numerical instability) should explicitly enlarge the tolerance
and/or set an "allowed failures" limit.

Where should the precision be set in the test case? As a @precision
directive which will be used for each test in the test case file, or as
an environment variable? The previously discussed hierarchical Python
class solution would provide a specific function for this case ... .

Any directory with a Makefile should support "make test" one way or
another. Usually via appropriate rules in Lib.make, Module.make,
Dir.make, etc. Dir.make would just run "make test" recursively (see
the %-recursive pattern rule in Dir.make); the others would look for a
test script then use the framework to execute it.

The top-level Makefile includes Dir.make, so "make test" would use the
recursive rule (a special rule for testing critical modules could be
added as a prerequisite). Testing the libraries would just be
"make -C lib test" (i.e. recursive in the "lib" directory); similarly
for raster, vector, etc.

Yes. Do we need special rules for critical modules, or is the order of
the module directories in the Makefiles, combined with a critical
annotation, sufficient?

Two test locations (LL and UTM?)

Possibly X-Y as well; even if we don't add any test data to them, a
test script can create a map more easily than it can create a location.

Each module and library has its own test directory. The test
directories contain the test cases, reference text files and data for
import (for *.in.* modules).

I'm not sure we need a separate subdirectory for the test data.

I am sure we need several. I would suggest a separate test directory
for each test location: "test_UTM" and "test_LL". Several modules will
only work in UTM locations, others in both. Each directory may contain
several test case files for different modules (r.univar/r3.univar) and
several .ref files.

** Equal and almost equal key-value tests (g.region -g, r.univar, ...)
of text files <-- I am not sure how to realize this

Equal is easy, but almost equal requires being able to isolate
numbers, and possibly determine their context so that context-specific
comparison parameters can be used.

The framework should support almost-equal comparison and identification
for the shell-style output available in several modules:

north=234532.45
south=5788374.45
...

Almost-equal comparison for raster, voxel and vector maps must be
realized using the precision option of the *.out.ascii modules. In the
case of database output, I am not sure how to realize almost-equal
comparison for floating point data.

v.random output=random_points n=100
v.info -t random_points > result

If the framework understands vector maps, it shouldn't be necessary to
use v.info; it should be able to compare the random_points map to
random_points.ref.

v.info may not be necessary in case a seed option is available for v.random.

One thing that I hadn't thought about much until now is that maps can
have a lot of different components, different modules affect different
components, and the correct way to perform comparisons would vary
between components.

Do you refer to different feature types in vector maps? Point, line,
boundary, area, centroid and so on?

Having the framework auto-detect maps doesn't tell it which components
of the maps it should be comparing. But having the test script perform
export to text files doesn't tell the framework anything about how to
perform the comparison (unless the framework keeps track of which
commands generated which text file, and has specific rules for specific
commands).

IMHO it's not a good framework design if the framework has to know
specific rules for specific commands.

The only "simple" solution (in terms of writing test scripts) is to
have the framework compare all components of the map against the
reference, which means that the export needs to be comprehensive (i.e.
include all metadata).

This is the solution which I have in mind. But in the case of vector
data we may need to combine v.out.ascii format=standard + v.info +
db.select.
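A hedged sketch of such a combined export (format=standard, dp=3 and
v.info -t follow this thread; db.select's table= usage and the
assumption that the attribute table shares the map's name are mine):

{{{
import subprocess

def dump_vector(name):
    parts = [
        subprocess.check_output(
            ["v.out.ascii", "input=" + name, "format=standard", "dp=3"]),
        subprocess.check_output(["v.info", "-t", "map=" + name]),      # topology
        subprocess.check_output(["db.select", "table=" + name]),       # attributes
    ]
    return b"\n".join(parts)

# The combined dump would then be compared against NAME.ref like any
# other text reference.
}}}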

Best regards
Soeren