[GRASS-dev] segmentation fault with temporal commands

Hi devs,

I get a "Segmentation fault (core dumped)" message whenever I run a temporal command, e.g. t.list, t.info, t.create, t.register (I haven't tested them all). The commands themselves work as expected and produce the correct output; the message only appears at the very end. I am using GRASS trunk r73007 on Fedora 28.

Here is the final part of the strace output (not sure whether that's the useful part):

strace t.info LST_Day_monthly

[…]
stat("/home/veroandreo/grassdata/nc_spm_08_grass7/modis_lst/tgis/sqlite.db", {st_mode=S_IFREG|0644, st_size=585728, ...}) = 0
close(3) = 0
getpid() = 7519
wait4(7527, 0x7fff654f8644, WNOHANG, NULL) = 0
write(5, "\0\0\0\17\200\2]q\1U\4STOPq\2a.", 19) = 19
getpid() = 7519
wait4(7527, 0x7fff654f83d4, WNOHANG, NULL) = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=7527, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
wait4(7527, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG, NULL) = 7527
close(5) = 0
futex(0x55b0471463a0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff) = 0
getpid() = 7519
wait4(7528, 0x7fff654f8644, WNOHANG, NULL) = 0
write(7, "\0\0\0\t\200\2]q\1K\0a.", 13) = 13
kill(7528, SIGTERM) = 0
close(7) = 0
wait4(7528, 0x7fff654f8644, WNOHANG, NULL) = 0
kill(7528, SIGTERM) = 0
wait4(7528, 0x7fff654f8644, WNOHANG, NULL) = 0
getpid() = 7519
wait4(7528, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 7528
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=7528, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f78a8694fc0}, {sa_handler=0x7f78a8a2c880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f78a8694fc0}, 8) = 0
munmap(0x7f78a8f16000, 32) = 0
close(9) = 0
munmap(0x7f78a8f17000, 32) = 0
close(6) = 0
close(8) = 0
futex(0x7f7897973574, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f78966421e4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
munmap(0x7f787bda5000, 33554496) = 0
munmap(0x7f7879da4000, 33554496) = 0
munmap(0x7f7877da3000, 33554496) = 0
munmap(0x7f7875da2000, 33554496) = 0
munmap(0x7f7873da1000, 33554496) = 0
munmap(0x7f7871da0000, 33554496) = 0
munmap(0x7f786fd9f000, 33554496) = 0
munmap(0x7f786dd9e000, 33554496) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f787bda5008} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
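
One detail in the trace may be worth a closer look: the faulting address (SEGV_MAPERR means the address is not mapped) appears to fall just inside the first of the eight regions that were munmap()ed right before the crash. That would be consistent with something touching a buffer after it was released, though the trace alone does not prove it. A small sketch to check the arithmetic, with all values copied from the trace; the function name find_unmapped_region is made up for this illustration:

```python
# Values copied from the strace output above.
FAULT_ADDR = 0x7f787bda5008

# (start address, length in bytes) of the eight munmap() calls before the crash.
UNMAPPED = [
    (0x7f787bda5000, 33554496),
    (0x7f7879da4000, 33554496),
    (0x7f7877da3000, 33554496),
    (0x7f7875da2000, 33554496),
    (0x7f7873da1000, 33554496),
    (0x7f7871da0000, 33554496),
    (0x7f786fd9f000, 33554496),
    (0x7f786dd9e000, 33554496),
]


def find_unmapped_region(addr, regions):
    """Return the (start, length) region containing addr, or None."""
    for start, length in regions:
        if start <= addr < start + length:
            return start, length
    return None


hit = find_unmapped_region(FAULT_ADDR, UNMAPPED)
if hit is None:
    print("fault address is not inside any of the unmapped regions")
else:
    start, length = hit
    print("fault address lies %d bytes into the region unmapped at %s"
          % (FAULT_ADDR - start, hex(start)))
```

Each region is 32 MiB plus 64 bytes long, so the check is only about which region (if any) contains the address.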

best,

Vero

On Sat, Jul 28, 2018 at 11:40 PM, Veronica Andreo <veroandreo@gmail.com> wrote:

...

I get the same segfault.

Using pdb as described in
https://grasswiki.osgeo.org/wiki/GRASS_Debugging#Python_debugging_with_pdb
I found this:

GRASS 7.4.1svn (nc_spm_08_grass7):~ > t.list raster
----------------------------------------------
Time stamped raster maps with absolute time available in mapset <user1>:
a_beam_rad_08.00@user1
a_beam_rad_09.00@user1
a_beam_rad_10.00@user1
a_beam_rad_11.00@user1
a_beam_rad_12.00@user1
a_beam_rad_13.00@user1
a_beam_rad_14.00@user1
a_beam_rad_15.00@user1
a_beam_rad_16.00@user1
a_beam_rad_17.00@user1
a_beam_rad_18.00@user1

> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/scripts/t.list(182)main()
-> if outpath:
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/scripts/t.list(184)main()
-> dbif.close()
(Pdb) s
--Call--
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(921)close()
-> def close(self):
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(927)close()
-> for key in self.unique_connections.keys():
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(928)close()
-> self.unique_connections[key].close()
(Pdb) s
--Call--
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(1133)close()
-> def close(self):
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(1140)close()
-> self.connection.commit()
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(1141)close()
-> self.cursor.close()
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(1142)close()
-> self.connected = False
(Pdb) s
--Return--
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(1142)close()->None
-> self.connected = False
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(927)close()
-> for key in self.unique_connections.keys():
(Pdb) s
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(930)close()
-> self.connected = False
(Pdb) s
--Return--
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py(930)close()->None
-> self.connected = False
(Pdb) s
--Return--
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/scripts/t.list(184)main()->None
-> dbif.close()
(Pdb) s
--Return--
> /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/scripts/t.list(188)<module>()->None
-> main()
(Pdb) s
Exception AttributeError: "'NoneType' object has no attribute 'path'" in <function _remove at 0x7f81b64a68c0> ignored
Segmentation fault (core dumped)

Interestingly, there is no "path" at all in scripts/t.list.

Filling up the script with print statements:

...
    print("bla1")
    dbif.close()
    print("bla2")

if __name__ == "__main__":
    options, flags = gscript.parser()
    main()
    print("bla3")

I get

t.list raster
...
bla1
bla2
bla3
Segmentation fault (core dumped)

So the segfault happens when t.list (or other t.* scripts) close.
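
For anyone wanting to detect this "output is fine, but the process dies on exit" pattern from a wrapper script, one option might be to look at the child's return code: on POSIX, Python's subprocess module reports death by signal as a negative return code. A rough sketch, assuming Python 3; the child here is only a stand-in that prints and then sends SIGSEGV to itself (it is not t.list), and run_and_classify is a made-up helper name:

```python
import signal
import subprocess
import sys

# Stand-in child: prints its output, flushes it, then dies from SIGSEGV.
CHILD = (
    "import os, signal; "
    "print('bla3', flush=True); "
    "os.kill(os.getpid(), signal.SIGSEGV)"
)


def run_and_classify(argv):
    """Run argv and return (stdout, human-readable exit status)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode < 0:
        # A negative return code means the child was terminated by a signal.
        status = "killed by " + signal.Signals(-proc.returncode).name
    elif proc.returncode == 0:
        status = "ok"
    else:
        status = "exit code %d" % proc.returncode
    return proc.stdout, status


out, status = run_and_classify([sys.executable, "-c", CHILD])
print(repr(out), status)
```

The same helper could be pointed at a real command inside a GRASS session, e.g. run_and_classify(["t.list", "raster"]), to see whether the output arrives intact before the signal.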

?

Markus

Hi again,

On Mon, Jul 30, 2018 at 12:33 PM, Markus Neteler <neteler@osgeo.org> wrote:
...

So the segfault happens when t.list (or other t.* scripts) close.

?

Now using gdb to debug the script (and G74svn recompiled from scratch):

GRASS 7.4.2svn (nc_spm_08_grass7):~/software/grass74 > gdb python
GNU gdb (GDB) Fedora 8.1-19.fc28
...
(gdb) run /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/scripts/t.list raster
Starting program: /usr/bin/python /home/mneteler/software/grass74/dist.x86_64-pc-linux-gnu/scripts/t.list raster
[Thread debugging using libthread_db enabled]
...
[New Thread 0x7fffbebaf700 (LWP 18929)]
----------------------------------------------
Time stamped raster maps with absolute time available in mapset <user1>:
a_beam_rad_08.00@user1
...
a_beam_rad_17.00@user1
a_beam_rad_18.00@user1
[Thread 0x7fffbebaf700 (LWP 18929) exited]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffd7ac305b in blas_shutdown () from /lib64/libopenblaso.so.0
(gdb) bt full
#0 0x00007fffd7ac305b in blas_shutdown () from /lib64/libopenblaso.so.0
No symbol table info available.
#1 0x00007fffd788f015 in gotoblas_quit () from /lib64/libopenblaso.so.0
No symbol table info available.
#2 0x00007ffff7de58e6 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#3 0x00007ffff6c4466c in __run_exit_handlers () from /lib64/libc.so.6
No symbol table info available.
#4 0x00007ffff6c4479c in exit () from /lib64/libc.so.6
No symbol table info available.
#5 0x00007ffff6c2e192 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#6 0x000055555555484a in _start ()
No symbol table info available.
(gdb)

# query RPM package
rpm -qf /lib64/libopenblaso.so.0
openblas-openmp-0.3.1-1.fc28.x86_64

This is how I configured G74svn here:
./configure \
  --with-cxx \
  --enable-largefile \
  --with-proj --with-proj-share=/usr/share/proj \
  --with-gdal=/usr/bin/gdal-config \
  --with-python \
  --with-geos \
  --with-sqlite \
  --with-nls \
  --with-liblas \
  --with-cairo --with-cairo-ldflags=-lfontconfig \
  --with-freetype --with-freetype-includes=/usr/include/freetype2 \
  --with-wxwidgets \
  --with-fftw \
  --with-motif \
  --with-x \
  --with-postgres --with-postgres-includes="/usr/include/pgsql" \
  --without-netcdf \
  --without-mysql \
  --without-odbc \
  --without-openmp \
  --without-ffmpeg

Something is pulling in BLAS/openMP, though.

Do we perhaps have a namespace collision?
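
One way to narrow down what pulls the library in might be to look at /proc/self/maps from inside the Python process, before and after importing a suspect module. This is only a sketch for Linux; the function names shared_objects and loaded_in_this_process are invented here, and the approach only shows that a library is mapped, not which import chain requested it:

```python
def shared_objects(maps_lines, needle):
    """Return sorted paths from /proc/<pid>/maps lines that contain needle.

    Each line has the form: address perms offset dev inode [pathname].
    """
    found = set()
    for line in maps_lines:
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in path:
                found.add(path)
    return sorted(found)


def loaded_in_this_process(needle):
    """Check the current process (Linux only, needs /proc)."""
    with open("/proc/self/maps") as maps:
        return shared_objects(maps, needle)


if __name__ == "__main__":
    # Empty list here, non-empty after importing a module that links
    # against OpenBLAS, would point at that module.
    print(loaded_in_this_process("openblas"))
```

Calling loaded_in_this_process("openblas") once at the top of a script and again after each import would show at which point the library gets mapped.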

Markus

Hi again,

On Mon, Jul 30, 2018 at 1:05 PM, Markus Neteler <neteler@osgeo.org> wrote:
...

Something is pulling in BLAS/openMP, though.

Of course GDAL comes to mind:

ldd `which gdalinfo` | grep openbla
    libopenblaso.so.0 => /lib64/libopenblaso.so.0 (0x00007f4058bdc000)
    libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f4050b9f000)

So, could it be related to lib/raster/* and GDAL related calls? I am
not sure how to debug further.
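
To repeat this check over several binaries or shared objects (GDAL, GRASS libraries, Python extension modules), the ldd output could be filtered from a script. A minimal sketch, assuming ldd is on the PATH; parse_ldd and ldd_deps are hypothetical helper names, and only the usual "name => path (address)" lines are handled:

```python
import subprocess


def parse_ldd(output, needle):
    """Map library name -> resolved path for ldd lines whose name contains needle."""
    deps = {}
    for line in output.splitlines():
        parts = line.split()
        # Expected shape: name => /path/to/lib (0xaddress)
        if len(parts) >= 3 and parts[1] == "=>" and needle in parts[0]:
            if parts[2:4] == ["not", "found"]:
                deps[parts[0]] = None
            else:
                deps[parts[0]] = parts[2]
    return deps


def ldd_deps(binary, needle):
    """Run ldd on binary and return the matching dependencies."""
    result = subprocess.run(
        ["ldd", binary],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_ldd(result.stdout, needle)
```

For instance, ldd_deps(shutil.which("gdalinfo"), "openbla") would be the scripted counterpart of the command above (with shutil imported). Note that ldd only lists link-time dependencies, so a library loaded later via dlopen() would not show up.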

Markus

On Mon, Jul 30, 2018 at 1:37 PM, Markus Neteler <neteler@osgeo.org> wrote:

...

So, could it be related to lib/raster/* and GDAL related calls? I am
not sure how to debug further.

It could also be python/numpy…

Markus M


On Mon, Jul 30, 2018, 13:37, Markus Neteler <neteler@osgeo.org> wrote:

...

Something is pulling in BLAS/openMP, though.

Confirmed:

https://bugzilla.redhat.com/show_bug.cgi?id=1605231

It was fixed in Fedora today, so this is not a GRASS bug.

Markus