[GRASS-dev] [grass-code I][431] g.copy segmentation fault

code I item #431, was opened at 2007-06-20 13:53
Status: Open
Priority: 3
Submitted By: Otto Dassau (dassau)
Assigned to: Nobody (None)
Summary: g.copy segmentation fault
Issue type: module bug
Issue status: None
GRASS version: CVS HEAD
GRASS component: general
Operating system: Linux
Operating system version: debian testing
GRASS CVS checkout date, if applies (YYMMDD): 070620

Initial Comment:
The module g.copy segfaults.

Example:
g.copy rast=objekthoehen20,muell10
Copy raster <objekthoehen20@testdaten> to current mapset as <muell10>
Segmentation fault

With gdb I get the following message (maybe helpful):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1242368304 (LWP 12305)]
do_copy (n=0, old=0x80506b8 "objekthoehen20", mapset=0x8050800 "testdaten", new=0x80506d0 "muell") at do_copy.c:40
/data/software/grass6/general/manage/lib/do_copy.c:40:964:beg:0x8049abd
(gdb)

I have another CVS version from 20070604 running, and it works there. Also, I start the segfaulting GRASS version from the CVS directory (no make install). Could that be a problem?

thanks,
Otto

----------------------------------------------------------------------

You can respond by visiting:
http://wald.intevation.org/tracker/?func=detail&atid=204&aid=431&group_id=21

> example
> g.copy rast=objekthoehen20,muell10
> Copy raster <objekthoehen20@testdaten> to current mapset as <muell10>
> Segmentation fault

It turned out to be a problem caused by two conflicting GRASS 6.3 installations.
Case closed.

Maciek

Hi,

Discussing this off-list, it turned out that this issue is NOT solved yet. I encounter the same error with a CVS install from scratch (about 2 weeks old).

I tried some debugging, that's what I found:
do_copy.c (in .../general/manage/lib) seems to produce the segfault, specifically the function recursive_copy(). I can't catch the return value of this function, as the segfault happens immediately after returning from it.
Within recursive_copy(), the /* src is a file */ section is entered, and all "if" conditions and "while" loops are simply skipped. The variables that get set have the following values:
fd = 3
fd2 = 4
len = 0

Any ideas?
Thanks,
Volker

_______________________________________________
grass-dev mailing list
grass-dev@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass-dev
  

Digging a little deeper, I found that the return values of the two open() calls (variables fd and fd2) are reasonable. But what about the return value of the read() call (line 146, do_copy.c):
len = read(fd, buf, 1024)
Why is 0 returned on the first call? My dataset is a valid raster, so why EOF before anything has been read?

Maybe this is a file permission issue? Any help is appreciated!
Thanks,
Volker


Volker Wichmann wrote:

> Digging a little deeper, I found that the return values of the two
> 'open' commands (variables fd and fd2) are reasonable - but what about
> the return value of the read command (line 146, do_copy.c):
> len = read(fd, buf, 1024)
> Why is a 0 returned at the first call - my dataset is a valid raster, so
> why an EOF before reading anything?

Some of the files comprising the map may be empty, e.g. FP maps have
an empty "cell" file (the actual data is in the "fcell" file).
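Glynn's point can be illustrated with a plain read()/write() copy loop. The sketch below is not the actual recursive_copy() from do_copy.c; copy_file() is a hypothetical helper that assumes the file branch follows the usual POSIX pattern of reading into a 1024-byte buffer until read() returns 0. It shows that 0 on the very first read() is simply EOF on an empty file, and that the loop then produces an empty copy without any error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the regular file `src` to `dst`.
 * Returns 0 on success, -1 on failure (errno set by the failing call).
 * read() returning 0 means EOF; for an empty source (e.g. the empty
 * "cell" file of an FP map) that happens on the first call, and the
 * result is an empty destination file. */
int copy_file(const char *src, const char *dst)
{
    char buf[1024];
    ssize_t len;
    int fd, fd2, saved;

    assert(src != NULL && dst != NULL);

    /* Permission problems would be reported here, by open(). */
    if ((fd = open(src, O_RDONLY)) < 0)
        return -1;
    if ((fd2 = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666)) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    while ((len = read(fd, buf, sizeof buf)) > 0) {
        /* write() may write fewer bytes than requested; loop until done. */
        char *p = buf;
        while (len > 0) {
            ssize_t n = write(fd2, p, (size_t)len);
            if (n < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted by a signal: retry */
                goto fail;
            }
            p += n;
            len -= n;
        }
    }
    if (len < 0)   /* a real read() error, as opposed to EOF (0) */
        goto fail;

    close(fd);
    return close(fd2);

fail:
    saved = errno;
    close(fd);
    close(fd2);
    errno = saved;
    return -1;
}
```

For example, copying /dev/null (which always reads as EOF) returns 0 and leaves an empty destination file, which is the same situation as copying the empty "cell" file of an FP map.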

> Maybe this is an issue of file permission?

If you don't have the necessary permissions, the open() call will
fail; read() and write() can't fail due to permissions.
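A minimal sketch of this behaviour on a local POSIX filesystem, using two hypothetical helpers. write_after_revoke() removes all permission bits from a file while it is still open and then writes to the existing descriptor; the write succeeds because access was checked when the file was opened. open_errno() shows that a fresh open() of the same file then fails with EACCES. (A process running as root bypasses these permission checks, so the second part only applies to ordinary users.)

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` read-only.
 * Returns 0 if open() succeeded (the descriptor is closed again),
 * otherwise the errno value, e.g. EACCES (permissions) or ENOENT. */
int open_errno(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Hypothetical helper: create `path`, revoke every permission bit while
 * the descriptor is still open, then write through that descriptor.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_after_revoke(const char *path)
{
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (chmod(path, 0) != 0) {   /* mode 0000: no access for anyone */
        close(fd);
        return -1;
    }
    n = write(fd, "abc", 3);     /* not re-checked against the new mode */
    close(fd);
    return n;
}
```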

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> Volker Wichmann wrote:
>
>> Digging a little deeper, I found that the return values of the two 'open' commands (variables fd and fd2) are reasonable - but what about the return value of the read command (line 146, do_copy.c):
>> len = read(fd, buf, 1024)
>> Why is a 0 returned at the first call - my dataset is a valid raster, so why an EOF before reading anything?
>
> Some of the files comprising the map may be empty, e.g. FP maps have
> an empty "cell" file (the actual data is in the "fcell" file).

OK, thanks for that hint. I was indeed using an FP map...
Trying to copy a map of type CELL, the files are read and written until EOF, but
the segmentation fault on return remains. When capturing the return value, I
sometimes get corrupted output (a string of unprintable garbage characters);
in other cases I get nothing before the segfault.

I get these results on fresh Fedora 6 and 7 installs. I know of
others with the same problem (on Fedora 6) who use the g.copy from 6.2 as a workaround. I had a look at the diffs but didn't find anything I would put the blame on.

Any help is appreciated,
thanks,
Volker

By the way, trying to display the copied/corrupted map I get the
following errors:
WARNING: Can't open header file for [test@PERMANENT in PERMANENT]
WARNING: category support for [test@PERMANENT] in mapset [PERMANENT]
missing
WARNING: can't get history information for [test@PERMANENT] in mapset
[PERMANENT]
WARNING: can't read range file for [test@PERMANENT in PERMANENT]
ERROR: Unable to read range file

... but I have to remove the map before each new debugging attempt, as g.copy
reports that the map already exists.


Volker Wichmann wrote:

>> Digging a little deeper, I found that the return values of the two
>> 'open' commands (variables fd and fd2) are reasonable - but what about
>> the return value of the read command (line 146, do_copy.c):
>> len = read(fd, buf, 1024)
>> Why is a 0 returned at the first call - my dataset is a valid raster, so
>> why an EOF before reading anything?
>>
>
> Some of the files comprising the map may be empty, e.g. FP maps have
> an empty "cell" file (the actual data is in the "fcell" file).

> ok, thanks for that hint. I was indeed using a FP map ...
> trying to copy a map of type CELL reads and writes the map until EOF but
> the segmentation fault on return remains. Capturing the return value I
> sometimes get a corrupted output (e.g. [unprintable characters]) in other
> cases I get none before the seg fault.

You will need to provide more detailed information, i.e. the complete
backtrace at the point of the segfault and the values of any relevant
variables.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> You will need to provide more detailed information, i.e. the complete
> backtrace at the point of the segfault and the values of any relevant
> variables.

I ran a backtrace, set a breakpoint a few lines before the crash and re-ran the program. Here is the output:

-------------------------------------------------------------------------------------------------
Starting program: /usr/local/grass-6.3.cvs/bin/g.copy rast=last.min,test5
[Thread debugging using libthread_db enabled]
[New Thread -1208260912 (LWP 4913)]
Copy raster <last.min@PERMANENT> to current mapset as <test5>
[Switching to Thread -1208260912 (LWP 4913)]

Breakpoint 1, do_copy (n=0, old=0x897c270 "last.min",
mapset=0x897c2b8 "PERMANENT", new=0x897c280 "test5") at do_copy.c:31
31 hold_signals(1);
(gdb) next
32 if ( G_strcasecmp (list[n].alias, "vect") == 0 ) {
(gdb) next
40 for (i = 0; i < list[n].nelem; i++)
(gdb) next
35 G_warning ("Cannot copy <%s> to current mapset as <%s>",
(gdb) next
42 G__make_mapset_element (list[n].element[i]);
(gdb) next
43 G__file_name (path, list[n].element[i], old, mapset);
(gdb) next
44 if (access (path, 0) != 0)
(gdb) next
52 G__file_name (path2, list[n].element[i], new, G_mapset());
(gdb) next
53 if ( recursive_copy(path, path2) == 1 )
(gdb) print path
$1 = "/data/grassdb/demo/PERMANENT/cell/last.min\000\217\221\000\001\000\000\000\b\000\000\000��������\000\000\000\000\200s�\000��\227\b\003\000\000\000\000\000\000\000���\000\000\000\000\000Q���", '\0' <repeats 16 times>, "�o�\000\000\000\000\000ص\227\b\001\000\000\000\000\000\000\000\002\000\000\000ص\227\b���\000����8���\202B\025\000�{�\000ص\227\b \000\000\000�xP�\000\000\000\000ص\227\b\202��\000K֨\000\202��\000 \201�\000Q\000\000\000L\201�\000\000\000\000\000"...
(gdb) print path2
$2 = "/data/grassdb/demo/PERMANENT/cell/test5\000\227\b\001\000\000\000 \000\000\000\002\000\000\000�輿3d.view:3dview:3D viewing parameters:3D view parameters\000\000n files\000\000���\217\221\000x\221��L鼿\200鼿�h \220\000\000\000\000\000kI \024\000㰬\000�\217\221\000�'���q�\000nȫ\000\ f\226\227\b\000\000\000\000 \212\224\227\b�o�\000W\030\031\000\000\000\000\000T�"...
(gdb) next
59 if (G_verbose() == G_verbose_max())
(gdb) next
40 for (i = 0; i < list[n].nelem; i++)
(gdb) next

Program received signal SIGSEGV, Segmentation fault.
do_copy (n=0, old=0x897c270 "last.min", mapset=0x897c2b8 "PERMANENT",
new=0x897c280 "test5") at do_copy.c:40
40 for (i = 0; i < list[n].nelem; i++)
(gdb) print i
$3 = 0
(gdb) print n
$4 = 0
(gdb) print list[n].nelem
$5 = 8
(gdb) print path
$6 = "/data/grassdb/demo/PERMANENT/cell/last.min\000\217\221\000\001\000\000\000\b\000\000\000��������\000\000\000\000\200s�\000��\227\b\003\000\000\000\000\000\000\000���\000\000\000\000\000Q���", '\0' <repeats 16 times>, "�o�\000\000\000\000\000ص\227\b\001\000\000\000\000\000\000\000\002\000\000\000ص\227\b���\000����8���\202B\025\000�{�\000ص\227\b \000\000\000�xP�\000\000\000\000ص\227\b\202��\000K֨\000\202��\000 \201�\000Q\000\000\000L\201�\000\000\000\000\000"...
(gdb) print path2
$7 = "/data/grassdb/demo/PERMANENT/cell/test5\000\227\b\001\000\000\000 \000\000\000\002\000\000\000�輿3d.view:3dview:3D viewing parameters:3D view parameters\000\000n files\000\000���\217\221\000x\221��L鼿\200鼿�h \220\000\000\000\000\000kI \024\000㰬\000�\217\221\000�'���q�\000nȫ\000\ f\226\227\b\000\000\000\000 \212\224\227\b�o�\000W\030\031\000\000\000\000\000T�"...
(gdb) bt full
#0 do_copy (n=0, old=0x897c270 "last.min", mapset=0x897c2b8 "PERMANENT",
new=0x897c280 "test5") at do_copy.c:40
i = 0
ret = <value optimized out>
path = "/data/grassdb/demo/PERMANENT/cell/last.min\000\217\221\000\001\000\000\000\b\000\000\000��������\000\000\000\000\200s�\000��\227\b\003\000\000\000\000\000\000\000���\000\000\000\000\000Q���", '\0' <repeats 16 times>, "�o�\000\000\000\000\000ص\227\b\001\000\000\000\000\000\000\000\002\000\000\000ص\227\b���\000����8���\202B\025\000�{�\000ص\227\b \000\000\000�xP�\000\000\000\000ص\227\b\202��\000K֨\000\202��\000 \201�\000Q\000\000\000L\201�\000\000\000\000\000"...
path2 = "/data/grassdb/demo/PERMANENT/cell/test5\000\227\b\001\000\000\000 \000\000\000\002\000\000\000�輿3d.view:3dview:3D viewing parameters:3D view parameters\000\000n files\000\000���\217\221\000x\221��L鼿\200鼿�h \220\000\000\000\000\000kI \024\000㰬\000�\217\221\000�'���q�\000nȫ\000\ f\226\227\b\000\000\000\000 \212\224\227\b�o�\000W\030\031\000\000\000\000\000T�"...
result = 0
#1 0x080493eb in main (argc=2, argv=0xbfbcf1a4) at copy.c:95
n = <value optimized out>
mapset = 0x897c2b8 "PERMANENT"
module = (struct GModule *) 0x199b90
parm = (struct Option **) 0x897b5d8
p = <value optimized out>
to = 0x897c280 "test5"
result = 0
(gdb) continue
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
-------------------------------------------------------------------------------------------------

It seems to me that path and path2 are corrupted?
I'm also unsure why line 35 is entered while stepping through the code. Is that a gdb quirk I can ignore?

Thanks,
Volker

Volker Wichmann wrote:

> You will need to provide more detailed information, i.e. the complete
> backtrace at the point of the segfault and the values of any relevant
> variables.
>

> I did a backtrace, placed a break point a few lines before the crash and
> did a re-run. Here is the output:
>
> Seems to me that path and path2 are corrupted?

No; they're both NUL-terminated. When printing the contents of a
char array, gdb prints the entire array; it doesn't stop at the first NUL.
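To see why the bytes after the NUL are harmless, consider a fixed-size path buffer filled by snprintf(). build_path() below is a hypothetical helper, not GRASS's G__file_name(); it only illustrates that C string functions stop at the first NUL, while the rest of the array keeps whatever it held before (here simulated with 'X' bytes), which is what gdb shows when it prints the whole array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<dir>/<name>" into buf (capacity `size`).
 * snprintf() stores the text plus one terminating NUL and leaves the
 * remaining bytes of buf untouched.
 * Returns the string length, or -1 if the result did not fit. */
int build_path(char *buf, size_t size, const char *dir, const char *name)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* encoding error or truncated output */
    return n;
}
```

After build_path(path, sizeof path, "cell", "test5"), strlen(path) is 10 and path[10] is the NUL, but path[11] onwards still holds the old contents; printing path in gdb shows all of them, exactly like the trailing bytes in the backtrace above.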

> What I'm also unsure about is why line 35 is entered while stepping
> through the code - a gdb issue I can forget about?

The code was compiled with optimisation enabled (the default CFLAGS
are '-g -O2'), so the object code doesn't directly correspond to the
source code.

It might help if you re-compile the general/manage directory without
optimisation, e.g.:

  make -C general/manage clean
  make -C general/manage CFLAGS1='-g'

That will certainly make it easier to debug; OTOH, it might simply
make the bug disappear.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> The code was compiled with optimisation enabled (the default CFLAGS
> are '-g -O2'), so the object code doesn't directly correspond to the
> source code.
>
> It might help if you re-compile the general/manage directory without
> optimisation, e.g.:
>
>   make -C general/manage clean
>   make -C general/manage CFLAGS1='-g'
>
> That will certainly make it easier to debug; OTOH, it might simply
> make the bug disappear.

Crazy, that did the trick - no segmentation fault anymore!
Thanks a lot Glynn!!

As I know of several people who have the same problem, I would like to keep this solution on record. Glynn, would you be so kind as to give me some more background on why this can happen? Does this 'feature' relate to specific CPUs, operating systems, etc.?

Volker

Volker Wichmann wrote:

> The code was compiled with optimisation enabled (the default CFLAGS
> are '-g -O2'), so the object code doesn't directly correspond to the
> source code.
>
> It might help if you re-compile the general/manage directory without
> optimisation, e.g.:
>
> make -C general/manage clean
> make -C general/manage CFLAGS1='-g'
>
> That will certainly make it easier to debug; OTOH, it might simply
> make the bug disappear.

> Crazy, that did the trick - no segmentation fault anymore!
> Thanks a lot Glynn!!

That isn't necessarily a good thing; it may just be fixing the symptom
rather than the problem. It's quite possible that the bug is still
there, but we can't find it.

[A bug which "disappears" when you start looking for it (e.g. by
compiling with options suitable for debugging) is sometimes referred
to as a "Heisenbug", in reference to the quantum physics concept that
a system can be changed by simply observing it.]

> As I know of several people that they have the same problem, I would
> like to keep this solution for the record. Glynn, would you be so kind
> to give me some more background why this can happen? Does this 'feature'
> relate to specific CPUs, operating systems etc. ?

Segmentation faults often depend upon exactly how variables are
arranged in memory. Disabling optimisation may change the layout
(typically making it less dense; optimisation tends to reduce the
number of variables which are actually stored in memory).

OTOH, it's possible that the problem is due to a bug in the compiler's
optimisation code. An optimising compiler is *much* more complex than
a "dumb" compiler; if you check the change logs for a compiler, you
normally find that the vast majority of bug fixes apply to bugs which
only occur when optimisation is enabled.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> Volker Wichmann wrote:
>
>>> The code was compiled with optimisation enabled (the default CFLAGS
>>> are '-g -O2'), so the object code doesn't directly correspond to the
>>> source code.
>>>
>>> It might help if you re-compile the general/manage directory without
>>> optimisation, e.g.:
>>>
>>>   make -C general/manage clean
>>>   make -C general/manage CFLAGS1='-g'
>>>
>>> That will certainly make it easier to debug; OTOH, it might simply
>>> make the bug disappear.
>>
>> Crazy, that did the trick - no segmentation fault anymore!
>> Thanks a lot Glynn!!
>
> That isn't necessarily a good thing; it may just be fixing the symptom
> rather than the problem. It's quite possible that the bug is still
> there, but we can't find it.
>
> [A bug which "disappears" when you start looking for it (e.g. by
> compiling with options suitable for debugging) is sometimes referred
> to as a "Heisenbug", in reference to the quantum physics concept that
> a system can be changed by simply observing it.]

Thanks a lot for that background information, nice metaphor.

>> As I know of several people that they have the same problem, I would like to keep this solution for the record. Glynn, would you be so kind to give me some more background why this can happen? Does this 'feature' relate to specific CPUs, operating systems etc. ?
>
> Segmentation faults often depend upon exactly how variables are
> arranged in memory. Disabling optimisation may change the layout
> (typically making it less dense; optimisation tends to reduce the
> number of variables which are actually stored in memory).
>
> OTOH, it's possible that the problem is due to a bug in the compiler's
> optimisation code. An optimising compiler is *much* more complex than
> a "dumb" compiler; if you check the change logs for a compiler, you
> normally find that the vast majority of bug fixes apply to bugs which
> only occur when optimisation is enabled.

So what to do now? Is there any way to trace the bug, or at least
to make sure it isn't related to GRASS (which I assume, as most people don't have problems with g.copy)?
- I'm using gcc version 4.1.2

Thanks a lot for your support,
Volker

Glynn Clements wrote:

> It might help if you re-compile the general/manage directory without
> optimisation, e.g.:
>
>   make -C general/manage clean
>   make -C general/manage CFLAGS1='-g'
>
> That will certainly make it easier to debug; OTOH, it might simply
> make the bug disappear.

... just as a follow up: discussion off-list showed that this solved the
problem for several people who encountered this error too.

Volker

--
View this message in context: http://www.nabble.com/-grass-code-I--431--g.copy-segmentation-fault-tf3952533.html#a11943808
Sent from the Grass - Dev mailing list archive at Nabble.com.