[GRASS-dev] GRASS GIS nightly builds

Hi,

To test the efficiency (does running at 650% CPU go 6.5x as fast
as running at 100% on a single core?) you can use the OMP_*
environment variables. From the bash command line:

# try running it serially:
OMP_NUM_THREADS=1
export OMP_NUM_THREADS
time g.module ...

# let OpenMP set number of concurrent threads to number of local CPU cores
unset OMP_NUM_THREADS
time g.module ...

Then compare the overall (real) and system time to complete.
See http://grasswiki.osgeo.org/wiki/OpenMP#Run_time
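
A minimal sketch of a fuller scaling test, still using g.module
as a placeholder for the real command, is to step through a few
thread counts and time each run:

# time the same job with 1, 2, 4 and 8 OpenMP threads
for n in 1 2 4 8; do
    echo "OMP_NUM_THREADS=$n"
    OMP_NUM_THREADS=$n
    export OMP_NUM_THREADS
    time g.module ...
done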

If that is horribly inefficient, it will probably be more
effective to run multiple (different) single-threaded jobs at
the same time. The bash "wait" command is quite nice for that;
it waits for all backgrounded jobs to complete before going on.
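
For example, a minimal sketch (job1, job2 and job3 are just
stand-ins for whatever single-threaded GRASS commands you want
to run):

# make sure each job runs single-threaded
OMP_NUM_THREADS=1
export OMP_NUM_THREADS

# start one job per core in the background
job1 &
job2 &
job3 &

# block until all backgrounded jobs have finished
wait
echo "all jobs done"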

For r.in.{xyz|lidar|mb} this works quite well for generating
multiple statistics at the same time, as the jobs will all want
to read the same part of the input file at about the same time,
so it will still be fresh in the disk cache, keeping I/O levels
low. (See the r3.in.xyz scripts.)
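
A rough sketch of that pattern (the input and output names here
are made up; double-check the r.in.xyz parameter names against
the manual page):

# one run per statistic, all reading the same point file at once
for stat in n min max mean; do
    r.in.xyz input=points.txt output=elev_$stat method=$stat &
done

# wait for all of the backgrounded r.in.xyz jobs to finish
wait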

For v.surf.bspline my plan was to put each of the data
subregions in its own thread; for v.surf.rst my plan was to put
each of the quadtree squares in its own thread. Since each
thread introduces a finite amount of time to create and destroy,
the goal is to make fewer, longer-running ones. Anything more
than about an order of magnitude more than the number of cores
you have is unneeded overhead.

e.g., processing all satellite bands at the same time is a
nice, efficient win. If you process all 2000 rows of a raster
map in 2000 just-an-instant-to-complete threads, the ratio of
create/destroy overhead to thread lifetime really takes its
toll. Even as thread creation/destruction overhead becomes more
efficiently handled by operating systems and compilers, the
situation will still be the same. The interesting case is
OpenCL, where your video card can run 500 GPU units.

Hamish

On Mon, Feb 25, 2013 at 6:14 AM, Hamish <hamish_b@yahoo.com> wrote:

> Hi,
>
> for v.surf.bspline my plan was to put each of the data
> subregions in its own thread

Be aware that the order of the subregions matters right now.
You will need to rewrite lib/lidar and all modules that use the
lidarlib in order to change that and be able to put subregions
into their own threads. Also be aware that disk I/O is not
thread-safe; you would need to read the input data for each
subregion into a separate temp file, with a unique file
descriptor per thread.

Markus M

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Ok folks,
I am a bit confused now. After setting OMP_NUM_THREADS=1 and exporting, I get

100%
v.surf.rst complete.

real 352m46.451s
user 341m14.196s
sys 2m16.477s

Over 100 minutes faster than the multi-threaded run. So the multiple cores get in each other's way...

Recompiling without OpenMP…

Thanks!

Doug

Doug Newcomb
USFWS
Raleigh, NC
919-856-4520 ext. 14 doug_newcomb@fws.gov

The opinions I express are my own and are not representative of the official policy of the U.S.Fish and Wildlife Service or Dept. of the Interior. Life is too short for undocumented, proprietary data formats.