[Geoserver-devel] Idea for speeding up tests: why don't we stop writing the configuration to disk?

Hi,
the last series of changes to the build on master sped it up quite a bit,
for reference, a “-T3 -Prelease” build on my desktop took around 12
minutes some time ago, today I’ve seen one complete in 9m30s.
(btw, why -T3? cause it’s the setup that provides the best results on
my machine).

Not bad, but I’m wondering if we could do better.
One thing I’ve noticed is that the GeoTools build scales up much better,
I get the best times with -T2C (equivalent to -T16 on my machine,
4 cores + HT that Linux recognizes as another set of 4 cores).
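
To make the -T2C arithmetic concrete, here is a small sketch. Maven's -T option takes either an absolute thread count or a per-core multiplier with a C suffix. The class and method names below are made up for illustration, and this is not Maven's actual parsing code.

```java
final class MavenThreads {
    /** Translates a -T argument such as "3" or "2C" into a thread count (illustrative helper). */
    static int threadsFor(String tArg, int logicalCores) {
        if (tArg.endsWith("C")) {
            // "2C" means two threads per core visible to the JVM
            double perCore = Double.parseDouble(tArg.substring(0, tArg.length() - 1));
            return Math.max(1, (int) (perCore * logicalCores));
        }
        return Integer.parseInt(tArg);
    }

    public static void main(String[] args) {
        // 4 physical cores + HT show up as 8 logical cores on Linux
        int cores = 8;
        System.out.println("-T2C on " + cores + " cores: " + threadsFor("2C", cores) + " threads");
        System.out.println("-T3 on " + cores + " cores: " + threadsFor("3", cores) + " threads");
        System.out.println("this machine reports "
                + Runtime.getRuntime().availableProcessors() + " logical cores");
    }
}
```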

My guess at the moment is that the GeoServer build fails to scale because
most tests are writing out the data directory (and there is only one disk…).
While we have to write out the data, I guess we could avoid writing out
the configuration, and have the test work fully from memory instead.

Maybe not all tests will work that way, but most should not really care.
Do you see any reason why this may not work?

Cheers
Andrea

==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


On 17/02/13 22:54, Andrea Aime wrote:

While we have to write out the data, I guess we could avoid writing out
the configuration, and have the test work fully from memory instead.
Maybe not all tests will work that way, but most should not really care.
Do you see any reason why this may not work?

It depends on what we are trying to test. Skipping reading makes the tests faster but might reduce coverage of configuration loading (but unless the configuration is *changing*, we have plenty).

Are you using an SSD? Even with a huge amount of memory to cache, filesystem buffer flushing may be a bottleneck. Have you tried building in a ramdisk (like tmpfs)? On Debian I set /tmp to be a ramdisk in /etc/default/tmpfs. It grows as needed.
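
A quick way to check from Java whether a directory really sits on a ramdisk is to ask for its file store type. This is only a sketch: the class and method names are hypothetical, and it inspects the JVM temp directory, whereas the test data directories may live elsewhere (a module's target directory could be passed instead).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TmpStoreCheck {
    /** Returns the file store type (for example "tmpfs" or "ext4") backing the directory. */
    static String storeType(Path dir) {
        try {
            return Files.getFileStore(dir).type();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(tmp + " is on a file store of type " + storeType(tmp));
    }
}
```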

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre

Makes sense, I can’t think of any issues except that components that expect a data directory on the disk probably won’t be happy. Anyways, easy enough to try. Should just be a matter of updating SystemTestData createCatalog() and createConfig() removing the xstream persister listeners.


Geoserver-devel mailing list
Geoserver-devel@anonymised.comsts.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Right, the idea is that there would be a property that specific tests can override if they really need the
configuration to be stored on disk (pretty much like tests now can say that they want the full
data dir setup to be re-created for each test)

Cheers
Andrea

On Mon, Feb 18, 2013 at 4:03 AM, Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com> wrote:

It depends on what we are trying to test. Skipping reading makes the tests faster but might reduce coverage of configuration loading (but unless the configuration is changing, we have plenty).

It surely will, but some tests will need the configuration to be actually written anyways, I plan to add a flag for that

Are you using an SSD? Even with a huge amount of memory to cache, filesystem buffer flushing may be a bottleneck. Have you tried building in a ramdisk (like tmpfs)? On Debian I set /tmp to be a ramdisk in /etc/default/tmpfs. It grows as needed.

I don’t have SSDs (can anyone sell me a 2TB SSD, if such a thing even exists, that I don’t have to sell my car to buy?),
I’m aware of the ramdisk thing, but it’s not an option imho, I want something that works out of the box.
We cannot have N file systems, one for each target directory (the data dirs are created there)

Cheers
Andrea

On Tue, Feb 19, 2013 at 3:02 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Makes sense, I can’t think of any issues except that components that expect a data directory on the disk probably won’t be happy. Anyways, easy enough to try. Should just be a matter of updating SystemTestData createCatalog() and createConfig() removing the xstream persister listeners.

Followed up on these suggestions and came up with this patch:
https://github.com/aaime/geoserver/commit/932a1e2020d8e98be4aa469fe17d045f3c0076a5

Basically did what Justin suggested, and added an option in the test base class to write
out the configuration/catalog, or not.
The majority of tests did not blink, some did, in particular the following classes needed
the config to be written out:

  • all tests doing a catalog.reload() (ok, expected)
  • most security tests (have no clue why)
  • all app-schema tests (no idea here either)

I’m pretty sure there is something still wrong, as it seems several tests mocking with
the GeoServer service config/global settings do not work properly if we don’t write out
the config… just don’t know what that is.
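
For readers who have not opened the patch, the general shape of the approach can be sketched as below. This is a self-contained toy, not the GeoServer code: ToyCatalog, ToyTestSupport, isPersistConfiguration() and the other names are all hypothetical, and the "persister" just records file names instead of writing XML to disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a catalog that notifies listeners on every change (hypothetical). */
final class ToyCatalog {
    interface Listener { void onAdd(String name, String value); }

    private final Map<String, String> entries = new LinkedHashMap<>();
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener listener) { listeners.add(listener); }

    void add(String name, String value) {
        entries.put(name, value); // always kept in memory
        for (Listener listener : listeners) listener.onAdd(name, value);
    }

    int size() { return entries.size(); }
}

/** Hypothetical test base class: subclasses opt in to on-disk configuration. */
class ToyTestSupport {
    final List<String> written = new ArrayList<>(); // stands in for files on disk
    final ToyCatalog catalog = new ToyCatalog();

    /** Override and return true in tests that need the configuration on disk. */
    protected boolean isPersistConfiguration() { return false; }

    void setUp() {
        if (isPersistConfiguration()) {
            // the "persister" listener is attached only when a test asks for it
            catalog.addListener((name, value) -> written.add(name + ".xml"));
        }
        catalog.add("topp:states", "feature type");
    }
}

/** A test that reloads the catalog would opt in like this. */
class ReloadingTest extends ToyTestSupport {
    @Override protected boolean isPersistConfiguration() { return true; }
}

final class PersistToggleDemo {
    public static void main(String[] args) {
        ToyTestSupport plain = new ToyTestSupport();
        plain.setUp();
        ToyTestSupport reloading = new ReloadingTest();
        reloading.setUp();
        System.out.println("plain test wrote " + plain.written.size() + " files");
        System.out.println("reloading test wrote " + reloading.written.size() + " files");
    }
}
```

The default is in-memory only, so the common case pays no disk cost, and the tests that do a reload flip a single method.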

Build times improved somewhat from my previous 9m30s (with -Prelease -T3 -nsu), in particular:

  • -Prelease -T3 -nsu → 9m10s
  • -Prelease -T4 -nsu → 9m00s

Beyond that build times do not improve anymore, and with -T2C we’re back to 9m50s

Hmmm… I was kinda hoping for more… one thing that I’ve noticed during the build
looking at CPU graphs is that there are significant bits of time in which the CPUs are not
busy because only one module is building. In particular, the WFS module seems to be
rather pivotal, lots of stuff depends on it… including the WMS one.
I’m guessing parallel build times would improve if we broke that dependency, allowing
WMS to build right after main finished.
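
The effect of that dependency on a parallel build can be illustrated with a critical-path calculation. The sketch below is only an illustration: the module durations are made-up numbers, not measurements, the three-module graph is a drastic simplification of the real build, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CriticalPath {
    /** Earliest finish time of a module: its own duration plus its slowest dependency chain. */
    static int finish(String module, Map<String, Integer> minutes,
                      Map<String, List<String>> deps, Map<String, Integer> memo) {
        Integer cached = memo.get(module);
        if (cached != null) return cached;
        int start = 0;
        for (String dep : deps.getOrDefault(module, List.of())) {
            start = Math.max(start, finish(dep, minutes, deps, memo));
        }
        int end = start + minutes.get(module);
        memo.put(module, end);
        return end;
    }

    /** Lower bound on wall-clock time with unlimited threads: the longest dependency chain. */
    static int criticalPath(Map<String, Integer> minutes, Map<String, List<String>> deps) {
        Map<String, Integer> memo = new HashMap<>();
        int longest = 0;
        for (String module : minutes.keySet()) {
            longest = Math.max(longest, finish(module, minutes, deps, memo));
        }
        return longest;
    }

    public static void main(String[] args) {
        // Made-up durations in minutes, NOT measurements
        Map<String, Integer> minutes = Map.of("main", 3, "wfs", 2, "wms", 2);
        Map<String, List<String>> today =
                Map.of("wfs", List.of("main"), "wms", List.of("wfs"));
        Map<String, List<String>> decoupled =
                Map.of("wfs", List.of("main"), "wms", List.of("main"));
        System.out.println("wms waits for wfs:  " + criticalPath(minutes, today) + " min");
        System.out.println("wms waits for main: " + criticalPath(minutes, decoupled) + " min");
    }
}
```

With a serial chain, extra threads cannot help because each module waits for the previous one. Once WMS depends only on main, WFS and WMS can overlap and the longest chain gets shorter.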

Anyways… enough for today, if you’re interested in the topic please chime in

Cheers
Andrea
