[GRASS-dev] [GRASS GIS] #50: g.copy segfaults (debian.gfoss.it package)

#50: g.copy segfaults (debian.gfoss.it package)
---------------------+------------------------------------------------------
Reporter: steko | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: major | Milestone: 6.3.0
Component: default | Version: 6.3.0 RCs
Keywords: |
---------------------+------------------------------------------------------
{{{
GRASS 6.3.0svn (snicolo):~ > g.copy rast=fracz,frecz
Copy raster <fracz@PERMANENT> to current mapset as <frecz>
Segmentation fault

}}}

On grass, frankie suggested that this is probably caused by optimization.
However, IMHO this is a bug: optimization shouldn't result in a segfault :)

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/50>
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#50: g.copy segfaults (debian.gfoss.it package)
----------------------+-----------------------------------------------------
  Reporter: steko | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.4.0
Component: default | Version: 6.3.0 RCs
Resolution: | Keywords: g.copy
----------------------+-----------------------------------------------------
Changes (by marisn):

  * keywords: => g.copy
  * milestone: 6.3.0 => 6.4.0

Comment:

This is a known problem, and it manifests itself only when g.copy is
compiled with optimisation and the system does not have stack protection.
Was your GRASS also compiled with GCC 4.x?

More info here:
http://wald.intevation.org/tracker/index.php?func=detail&aid=431&group_id=21&atid=204

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/50#comment:1>

#50: g.copy segfaults (debian.gfoss.it package)
Comment (by steko):

Yes, that package should have been compiled with a recent GCC (4.2.3), as
per the Debian sid default. AFAIK the default compiler for Lenny should
even be GCC 4.3.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/50#comment:2>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Changes (by dylan):

  * summary: g.copy segfaults (debian.gfoss.it package) => g.copy
              segfaults (debian.gfoss.it package) [and latest
              SVN]

Comment:

Ouch. This bug has been biting me for a while now. How can optimization
cause a segfault for {{{g.copy rast=rast1,rast2}}} but not for {{{g.copy
vect=vect1,vect2}}} ? This sounds like a nasty bug to me.

I have noticed this problem with the latest SVN on x86 and AMD platforms,
using the standard compile options:

{{{
# options for opteron:
export CFLAGS="-march=opteron -O2"
export CXXFLAGS="-march=opteron -O2"
#optimization for P4:
export CFLAGS="-march=pentium4 -O2"
export CXXFLAGS="-march=pentium4 -O2"
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:3>

GRASS GIS wrote:

> Comment (by dylan):
>
> Ouch. This bug has been biting me for a while now. How can optimization
> cause a segfault for {{{g.copy rast=rast1,rast2}}} but not for {{{g.copy
> vect=vect1,vect2}}} ?

For vectors, the copying is performed by calling Vect_copy(). For all
other types, g.copy copies the files itself.

> This sounds like a nasty bug to me.

Unfortunately, it isn't likely to get fixed unless someone who can get
the segfault to occur can debug it, at least to the extent of
identifying exactly where the segfault occurs.

As it only occurs with optimisation enabled, it will probably need to
be debugged in assembler ("set language asm"), as GDB cannot reliably
relate execution to the source code for optimised code (the structure
of optimised code often bears little resemblance to the source code).

--
Glynn Clements <glynn@gclements.plus.com>

On Friday 11 April 2008, Glynn Clements wrote:

> For vectors, the copying is performed by calling Vect_copy(). For all
> other types, g.copy copies the files itself.

OK.

> Unfortunately, it isn't likely to get fixed unless someone who can get
> the segfault to occur can debug it, at least to the extent of
> identifying exactly where the segfault occurs.

Dang! I have seen the bug for a number of months now, across operating systems
and architectures. For starters, here are the GCC optimization flags:

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

> As it only occurs with optimisation enabled, it will probably need to
> be debugged in assembler ("set language asm"), as GDB cannot reliably
> relate execution to the source code for optimised code (the structure
> of optimised code often bears little resemblance to the source code).

Yikes. Can't really do that.

I will post some details to the original ticket.

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by dylan):

Some updates:

Glynn says that this can only really be debugged by looking at assembly.
Since I cannot do that, another approach might be toggling various
optimization flags in order to get an idea of what may be causing the
segfault. Below are some details on my system, optimization flags, and
results.

GCC details:

{{{
gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.1.3 --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.3 20070601 (prerelease) (Debian 4.1.2-12)
}}}

Results by optimization level:

  * -O3 (probably segfault)
  * -O2 (segfault)
  * -O1 (or) -O (segfault)
  * -O0 (works as expected)

The actual CFLAGS for no optimization were: {{{export CFLAGS="-march=opteron -O0 -ggdb -Wall -Werror-implicit-function-declaration" ; export CXXFLAGS="-march=opteron -O0" ./configure ...}}} .

So it is probably one of these flags that is causing the
optimization-related segfault:
{{{
-fauto-inc-dec
-fcprop-registers
-fdce
-fdefer-pop
-fdelayed-branch
-fdse
-fguess-branch-probability
-fif-conversion2
-fif-conversion
-finline-small-functions
-fipa-pure-const
-fipa-reference
-fmerge-constants
-fsplit-wide-types
-ftree-ccp
-ftree-ch
-ftree-copyrename
-ftree-dce
-ftree-dominator-opts
-ftree-dse
-ftree-fre
-ftree-sra
-ftree-ter
-funit-at-a-time
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:4>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by dylan):

I have attached two strace logs, one from the working version of g.copy
(-O0), and one from the non-working version (-O1). It looks like the first
system call that appears only in the working version's log has something to
do with the 'cellhd' file:

{{{
access("/data/ogeen_lab_shared/grass/sfrec/rtk/cellhd", F_OK) = 0
}}}

This appears to be where the segfault is occurring in the broken version of
g.copy, maybe?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:5>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by glynn):

Replying to [comment:5 dylan]:
> I have attached two strace logs, one from the working version of g.copy
> (-O0), and one from the non-working version (-O1).

strace doesn't provide enough detail for this kind of problem. All that
can be deduced from the strace logs is that the segfault occurs sometime
after the "cell" files are closed (recursive_copy(),
general/manage/lib/do_copy.c:152) but before the access() test for whether
the cellhd file exists (G__make_mapset_element(), lib/gis/mapset_msc.c:60,
called from do_copy(), general/manage/lib/do_copy.c:42).

AFAICT, the segfault could be occurring in recursive_copy(), G_verbose(),
G__make_mapset_element() or do_copy().

A gdb C backtrace would be more useful, even if it isn't enough to
pinpoint the problem. A C backtrace will still report the correct
function; it just won't necessarily report the correct line number, or the
actual state of any variables (it *might* get this information right, but
as you can't tell whether the information is accurate, it doesn't
necessarily help).

It might be useful to add some printf() statements to do_copy() to print
out variable values. OTOH, adding those statements might just make the bug
disappear.

Also, it would be useful to know whether the problem occurs when lib/gis
is compiled with optimisation or when general/manage is compiled with
optimisation.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:6>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by dylan):

Replying to [comment:6 glynn]:
> A gdb C backtrace would be more useful, even if it isn't enough to
> pinpoint the problem. A C backtrace will still report the correct
> function; it just won't necessarily report the correct line number, or the
> actual state of any variables (it *might* get this information right, but
> as you can't tell whether the information is accurate, it doesn't
> necessarily help).

Thanks for the tips Glynn. Any further hints on how to do the above?

> Also, it would be useful to know whether the problem occurs when lib/gis
> is compiled with optimisation or when general/manage is compiled with
> optimisation.

After the configure step, how can I disable optimization for specific
parts of the source tree?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:7>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by marisn):

Hunting down this bug is not easy. I once spent some days trying, without
luck (1):

  * I enabled/disabled all -O1 flags one by one, but no single optimisation
    flag causes the segfault. It must be a combination of them.
  * Using valgrind on optimised code is not useful.
  * Replacing G_lstat() with lstat() in general/manage/lib/do_copy.c fixes
    the problem.

So far, all segfault reporters were using GCC 4.x. I have not tested
grass_trunk for this bug for some time, but when it was first reported I
was able to reproduce the segfault on two 32-bit Gentoo systems with GCC 4.x
without stack protection. Glynn, try disabling stack protection before
running g.copy. Also, please explain how to get a proper asm backtrace, or
which variables you are interested in so that I can add printf()s; when I
used printf()s, I was not able to spot any strange values.

1.
http://wald.intevation.org/tracker/index.php?func=detail&aid=431&group_id=21&atid=204

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:8>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by hamish):

fyi, debugging howto hints can be viewed/added here:
   http://grass.gdf-hannover.de/wiki/GRASS_Debugging

If GDB is not an option I have found that just adding really basic printfs
can help to split the problem in half, then in half again, and so on until
you isolate the line the problem is on.
   printf("--> made it this far. fn():line xxxx\n");
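A minimal sketch of that printf bisection idea in C. The names checkpoint() and CHECKPOINT() are hypothetical helpers and not part of GRASS. Note that, as mentioned earlier in this ticket, adding such statements can itself change the generated code and make an optimisation-dependent bug disappear.

```c
#include <stdio.h>

/* Hypothetical helper (not part of GRASS): print a marker to stderr and
 * flush right away, so the message is not lost if the process crashes
 * shortly afterwards. Returns the number of characters written. */
int checkpoint(const char *func, int line)
{
    int written = fprintf(stderr, "--> made it this far. %s():line %d\n",
                          func, line);

    fflush(stderr);
    return written;
}

/* Records the enclosing function name and the current source line. */
#define CHECKPOINT() checkpoint(__func__, __LINE__)
```

Placing CHECKPOINT(); halfway through a suspect function, and then halfway through whichever half still crashes, narrows the location down in a few rebuilds.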

the raster libs could use a few new permanent G_debug()s at key spots.

Hamish

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:9>

GRASS GIS wrote:

> Comment (by dylan):
>
> > A gdb C backtrace would be more useful, even if it isn't enough to
> > pinpoint the problem.
>
> Thanks for the tips Glynn. Any further hints on how to do the above?

  $ gdb g.copy
  (gdb) run rast=t1,t1_copy --o

When it stops due to the segfault:

  (gdb) where
  (gdb) quit

> > Also, it would be useful to know whether the problem occurs when lib/gis
> > is compiled with optimisation or when general/manage is compiled with
> > optimisation.
>
> After the configure step, how can I disable optimization for specific
> parts of the source tree?

First, compile everything. Then, re-compile specific parts with
different flags using e.g.:

  make -C lib/gis clean
  make -C lib/gis CFLAGS1='-g -O0'

--
Glynn Clements <glynn@gclements.plus.com>

GRASS GIS wrote:

> Currently all segfault reporters were using GCC 4.x. I have not tested
> grass_trunk for some time on this bug, but when it was first reported, I
> was able to reproduce segfault on two 32bit Gentoo systems with GCC-4.x
> without stack protection.

I'll install gcc 4.x and see if I can reproduce it.

> Glynn - try to disable stack protection before
> running g.copy.

What is the option for that?

> Also please explain how to get proper asm backtrace or any
> other variables You are interested into to add printf's, as when I used
> printf's, I was not able to spot any strange values.

Useful commands for assembler debugging:

"disassemble" prints disassembled code, by default the function
corresponding to the selected stack frame.

"info registers" prints the CPU registers.

"x" displays arbitrary regions of memory.

Use "help" within gdb for more information; for full details on the
"x" command, see section 8.5 of the gdb Info file.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, Apr 12, 2008 at 1:15 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

> First, compile everything. Then, re-compile specific parts with
> different flags using e.g.:
>
>         make -C lib/gis clean
>         make -C lib/gis CFLAGS1='-g -O0'

Thanks for the tips Glynn.

Some tests:

1. everything _but_ lib/gis compiled with "-O2"
make -C lib/gis clean
make -C lib/gis CFLAGS1='-g -O0'

(segmentation fault)

2. make everything _but_ lib/gis and general/manage with "-O2":
make -C lib/gis clean
make -C lib/gis CFLAGS1='-g -O0'

make -C general/manage/ clean
make -C general/manage/ CFLAGS1='-g -O0'

(NO segfault)

3. make everything _but_ general/manage with "-O2":
make -C lib/gis clean
make -C lib/gis CFLAGS1='-g -O2'

make -C general/manage/ clean
make -C general/manage/ CFLAGS1='-g -O0'

(NO segfault)

4. enable optimization on general/manage: (everything else compiled with -O2)

make -C general/manage clean
make -C general/manage CFLAGS1='-g -O2'

(segfault)

5. test with -O1:
make -C general/manage clean
make -C general/manage CFLAGS1='-g -O1'

(segfault)

... so it looks like something in general/manage is screwed up by
optimization. Incrementally adding optimization:

make -C general/manage/cmd/ clean
make -C general/manage/cmd/ CFLAGS1='-g -O0'

(segfault)

# test separately:
make -C general/manage/cmd/ clean
make -C general/manage/cmd/ CFLAGS1='-g -O2'

make -C general/manage/lib/ clean
make -C general/manage/lib/ CFLAGS1='-g -O0'

(segfault)

# reset
make -C general/manage clean
make -C general/manage CFLAGS1='-g -O0'

(NO segfault)

Final thoughts: something in 'general/manage' . I can't really
speculate any further.

Dylan Beaudette wrote:

> Final thoughts: something in 'general/manage' . I can't really
> speculate any further.

Probably something in general/manage/lib (probably do_copy.c).

The library is a static library, so just rebuilding the library
doesn't help, as the g.copy command will still be using the old
library code.

You can confirm this with:

  make -C general/manage clean
  make -C general/manage CFLAGS1='-g -O0'
  rm general/manage/lib/OBJ.*/do_copy.o
  rm dist.*/bin/g.copy
  make -C general/manage CFLAGS1='-g -O2'

The last "make" should re-compile do_copy.c (with -O2) and re-link
g.copy.

If the problem is in do_copy.c, the resulting g.copy will segfault.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, Apr 12, 2008 at 4:51 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

> The last "make" should re-compile do_copy.c (with -O2) and re-link
> g.copy.
>
> If the problem is in do_copy.c, the resulting g.copy will segfault.

Thanks for the excellent advice Glynn.

Based on the above set of commands, I get a segfaulting g.copy
rast=..., suggesting that there is a problem in "do_copy.c".

I will update the ticket with these details.

Thanks,

Dylan

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by dylan):

Following Glynn's advice to try isolating the segfault to something in
[http://trac.osgeo.org/grass/browser/grass/trunk/general/manage/lib/do_copy.c do_copy.c]:

> The last "make" should re-compile do_copy.c (with -O2) and re-link
> g.copy.
> If the problem is in do_copy.c, the resulting g.copy will segfault.

{{{
make -C general/manage clean
make -C general/manage CFLAGS1='-g -O0'
rm general/manage/lib/OBJ.*/do_copy.o
rm dist.*/bin/g.copy
make -C general/manage CFLAGS1='-g -O2'
}}}

This results in a segfault when executing g.copy rast=...

It [http://trac.osgeo.org/grass/log/grass/trunk/general/manage/lib/do_copy.c?rev=23024 looks like]
g.copy was made less UNIX-dependent at some point, with the introduction of
recursive_copy replacing cp (?). Could an out-of-control recursion be broken
by compiler optimization?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:10>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by glynn):

Replying to [comment:8 marisn]:

> Changing G_lstat() with lstat() in general/manage/lib/do_copy.c fixes
> problem

Okay, I've installed gcc 4.1.2, can reproduce the segfault, and have
identified the problem: LFS.

When --enable-largefile is used, libgis is built with
-D_FILE_OFFSET_BITS=64. This causes the "struct stat" used by e.g.
G_lstat() in lib/gis/paths.c to be the 64-bit version (which is 96 bytes),
and lstat() to be an alias for lstat64(). But general/manage (and, in
fact, most of GRASS) is built without that flag, so "struct stat" is the
32-bit version (which is only 88 bytes).

The end result is that "sb" in recursive_copy is only 88 bytes, but
G_lstat() writes 96 bytes, overflowing the buffer by 8 bytes. This trashes
the saved copy of the %esi register, which holds the "dirp" variable. When
recursive_copy() returns, the trashed value is restored to %esi, meaning
that "dirp" has been trashed, resulting in the segfault.
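To make the mechanism concrete, here is a small, safe model of the overflow in C. The sizes 88 and 96 are the ones given above; everything else (struct frame_model, library_fill_stat(), saved_register_survives()) is a hypothetical illustration and not GRASS code. The real stack frame layout depends on the compiler, and the real overwrite is undefined behaviour, so the model places both regions inside one struct.

```c
#include <stdint.h>
#include <string.h>

enum { CALLER_STAT_SIZE = 88, LIBRARY_STAT_SIZE = 96 };

/* Simplified model of the caller's stack frame: the buffer the caller
 * believes is a whole "struct stat", followed by a saved-register slot. */
struct frame_model {
    unsigned char sb[CALLER_STAT_SIZE];
    unsigned char saved_reg[LIBRARY_STAT_SIZE - CALLER_STAT_SIZE];
};

/* Models the library side, which was compiled with the larger layout and
 * therefore writes LIBRARY_STAT_SIZE bytes through the pointer it gets. */
static void library_fill_stat(unsigned char *dst)
{
    memset(dst, 0xAB, LIBRARY_STAT_SIZE);
}

/* Returns 1 if the saved value is intact after the call, 0 if clobbered. */
int saved_register_survives(void)
{
    struct frame_model frame;
    uint64_t before = 0x1122334455667788ULL;
    uint64_t after;

    memset(frame.sb, 0, sizeof frame.sb);
    memcpy(frame.saved_reg, &before, sizeof before);

    /* The write starts at sb but runs 8 bytes past its end. */
    library_fill_stat((unsigned char *)&frame);

    memcpy(&after, frame.saved_reg, sizeof after);
    return after == before;
}
```

In this model saved_register_survives() returns 0: the 8 extra bytes land exactly on the slot that follows the buffer, which is the role the saved %esi plays in the analysis above.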

In practice, exactly what gets trashed depends upon what is above "sb" on
the stack, which depends upon the compiler version and compilation
switches. Right now, the only files which use G_stat() or G_lstat() are:

{{{
grass=> select * from obj_imp where symbol in ('G_stat', 'G_lstat') ;
                       object                       | symbol
----------------------------------------------------+---------
 general/manage/lib/OBJ.i686-pc-linux-gnu/do_copy.o | G_lstat
 lib/gis/OBJ.i686-pc-linux-gnu/mapset_msc.o         | G_stat
 lib/gis/OBJ.i686-pc-linux-gnu/remove.o             | G_lstat
 lib/gis/OBJ.i686-pc-linux-gnu/unix_socks.o         | G_lstat
 lib/gis/OBJ.i686-pc-linux-gnu/user_config.o        | G_lstat

Four of them are part of libgis, so they will use the correct "struct
stat" size. The other is do_copy.c, which loses.

As for possible solutions: one option is to change G_[l]stat() to allocate
the "struct stat" with e.g. G_malloc(). Another is to move
recursive_copy() into libgis.

The problem with the latter option is that, so long as G_[l]stat() exist
in their current form, any code which uses them but which has a different
off_t size to libgis will likely get bitten by the same issue (and by the
time that it actually gets discovered, we will probably have forgotten
about all of this).
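One conceivable guard against forgetting about this, sketched here purely as an assumption and not as anything GRASS provides: the library could export the "struct stat" size it was compiled with, so that callers can compare it with their own. Both function names below are hypothetical. The check only detects a mismatch when the two layouts differ in size; it would not notice fields that moved within a struct of the same size.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical library-side function: reports sizeof(struct stat) as seen
 * by the translation unit (and compile flags) the library was built with. */
size_t lib_struct_stat_size(void)
{
    return sizeof(struct stat);
}

/* Hypothetical caller-side check: pass sizeof(struct stat) as seen by the
 * caller. Returns 1 when both sides agree on the size, 0 otherwise. */
int stat_layout_matches(size_t caller_size)
{
    return caller_size == lib_struct_stat_size();
}
```

A module would call stat_layout_matches(sizeof(struct stat)) once at start-up and bail out with a clear error instead of corrupting its stack later.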

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:11>

#50: g.copy segfaults (debian.gfoss.it package) [and latest SVN]
Comment (by glynn):

Replying to [comment:11 glynn]:

> As for possible solutions: one option is to change G_[l]stat() to
> allocate the "struct stat" with e.g. G_malloc().

Drat. That isn't entirely safe, as the fields of "struct stat" may be at
different offsets depending upon the _FILE_OFFSET_BITS value. For the
specific case of recursive_copy() on Linux, we'll get away with it, as
it's only using the st_mode field, and that doesn't move.

OTOH, for functions which only need to test for a directory, we could add
e.g. G_is_directory(path) to libgis, which avoids having to deal with
"struct stat" outside of libgis.
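A minimal sketch of what such a helper might look like, assuming a POSIX system. The name sketch_is_directory() is hypothetical and this is not the actual libgis implementation; the point is only that "struct stat" is declared and filled in within a single translation unit, so the caller never depends on its size or layout.

```c
#include <sys/stat.h>

/* Hypothetical sketch (not the libgis implementation): returns 1 if path
 * names a directory, 0 if it does not or cannot be examined. The struct
 * stat never crosses the function boundary. */
int sketch_is_directory(const char *path)
{
    struct stat sb;

    /* stat() follows symbolic links; code that must not follow them,
     * as with lstat() in a recursive copy, would use lstat() here. */
    if (stat(path, &sb) != 0)
        return 0;

    return S_ISDIR(sb.st_mode) ? 1 : 0;
}
```

Callers then only exchange a path and an int with the library, both of which have the same representation regardless of the off_t size in use.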

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/50#comment:12>