[GRASS-dev] Open/close db drivers order

I found, that db drivers must be closed in reverse order to the order
in which were opened, otherwise it hangs (Linux). I vaguely remember
that this was a problem years ago. Is it still an issue, anybody knows
more about the problem?

Radim

On Tue, Oct 20, 2015 at 7:18 PM, Radim Blazek <radim.blazek@gmail.com> wrote:

I found, that db drivers must be closed in reverse order to the order
in which were opened, otherwise it hangs (Linux). I vaguely remember
that this was a problem years ago. Is it still an issue, anybody knows
more about the problem?

I found a note in copy_tab.c: "Warning: driver opened as second must
be closed as first, otherwise it hangs, not sure why."

It is hanging in db_shutdown_driver on
  G_wait(driver->pid) == -1 ? -1 : 0;
both driver processes are still running. So fclose(driver->send) does
not close the pipe if another one was opened after?

Radim

I have created https://trac.osgeo.org/grass/ticket/2775

It works if DB_PROC_SHUTDOWN_DRIVER is sent like on Windows. Patch
(very simple) is supplied.

Radim

On Wed, Oct 21, 2015 at 10:16 AM, Radim Blazek <radim.blazek@gmail.com> wrote:

On Tue, Oct 20, 2015 at 7:18 PM, Radim Blazek <radim.blazek@gmail.com> wrote:

I found, that db drivers must be closed in reverse order to the order
in which were opened, otherwise it hangs (Linux). I vaguely remember
that this was a problem years ago. Is it still an issue, anybody knows
more about the problem?

I found a note in copy_tab.c: "Warning: driver opened as second must
be closed as first, otherwise it hangs, not sure why."

It is hanging in db_shutdown_driver on
  G_wait(driver->pid) == -1 ? -1 : 0;
both driver processes are still running. So fclose(driver->send) does
not close the pipe if another one was opened after?

Radim

Radim Blazek wrote:

I found, that db drivers must be closed in reverse order to the order
in which were opened, otherwise it hangs (Linux). I vaguely remember
that this was a problem years ago. Is it still an issue, anybody knows
more about the problem?

It sounds like an issue with inheriting file descriptors. I.e.:

1. Opening the first driver causes the main process to get some file
descriptors for the pipes connecting it to the driver

2. Opening the second driver will result in the driver inheriting
copies of those descriptors from the main process.

3. When the main process closes the write end of the pipe whose read
end is the first driver's stdin, the first driver won't get EOF on its
stdin because the second driver still has a copy of the descriptor for
the write end of the pipe and hasn't closed it (it doesn't even know
it has it).

If that's what's happening, setting the close-on-exec flag for each
descriptor should fix it. E.g.

        void close_on_exec(int fd)
        {
        #ifndef __MINGW32__
            int flags = fcntl(fd, F_GETFD);
            fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
        #endif
        }

  close_on_exec(p1[READ]);
  close_on_exec(p1[WRITE]);
  close_on_exec(p2[READ]);
  close_on_exec(p2[WRITE]);

p1[WRITE] is the one which is most likely to matter; that's the one
which needs to be closed (in all processes) for the child to get EOF
on stdin.

p1[READ] and p2[WRITE] are closed in the parent after the child has
been spawned, so they won't be inherited by subsequent child
processes.

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, Oct 22, 2015 at 4:16 AM, Glynn Clements
<glynn@gclements.plus.com> wrote:

Radim Blazek wrote:

I found, that db drivers must be closed in reverse order to the order
in which were opened, otherwise it hangs (Linux). I vaguely remember
that this was a problem years ago. Is it still an issue, anybody knows
more about the problem?

It sounds like an issue with inheriting file descriptors. I.e.:

1. Opening the first driver causes the main process to get some file
descriptors for the pipes connecting it to the driver

2. Opening the second driver will result in the driver inheriting
copies of those descriptors from the main process.

3. When the main process closes the write end of the pipe whose read
end is the first driver's stdin, the first driver won't get EOF on its
stdin because the second driver still has a copy of the descriptor for
the write end of the pipe and hasn't closed it (it doesn't even know
it has it).

If that's what's happening, setting the close-on-exec flag for each
descriptor should fix it. E.g.

        void close_on_exec(int fd)
        {
        #ifndef __MINGW32__
            int flags = fcntl(fd, F_GETFD);
            fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
        #endif
        }

        close_on_exec(p1[READ]);
        close_on_exec(p1[WRITE]);
        close_on_exec(p2[READ]);
        close_on_exec(p2[WRITE]);

p1[WRITE] is the one which is most likely to matter; that's the one
which needs to be closed (in all processes) for the child to get EOF
on stdin.

p1[READ] and p2[WRITE] are closed in the parent after the child has
been spawned, so they won't be inherited by subsequent child
processes.

This works, thanks. I have attached the patch also to the issue.

But there may be other file descriptors not related to db drivers. Is
it possible to close all of them in driver process?

Radim

Radim Blazek wrote:

But there may be other file descriptors not related to db drivers. Is
it possible to close all of them in driver process?

Maybe.

When fork() clones a process, the descriptor table is cloned along
with everything else. When the process changes its program with
execve(), any descriptors without the FD_CLOEXEC flag remain open.

But the new program won't know of their existence, where they came
from, what they refer to, etc. The "knowledge" about the descriptors
only exists in the original program, so closing specific descriptors
(or setting their close-on-exec flags) has to be done prior to the
execve().

Once the program is changed and that information is lost, about the
only option available to the driver is to close *every* open
descriptor except for 0, 1 and 2 (standard input/output/error). This
needs to be done before calling any library function which might
create additional descriptors (which in practice means almost any
library function).

Even that is complicated by the fact that there's no standard
mechanism for discovering the highest open descriptor number. You just
have to iterate until reaching some arbitrary maximum, or until some
(again, arbitrary) number of consecutive descriptors are found to be
unused (by close() failing with EBADF), and hope that's all of them.

The only robust solution is for the original program to systematically
set the close-on-exec flag on every descriptor *except* those which
are meant to be inherited.

The fact that descriptors are always inherited across fork() and
preserved across execve() by default is generally considered to be a
mistake in the POSIX API, but we're now stuck with it.

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, Oct 22, 2015 at 11:58 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

But there may be other file descriptors not related to db drivers. Is
it possible to close all of them in driver process?

Maybe.

When fork() clones a process, the descriptor table is cloned along
with everything else. When the process changes its program with
execve(), any descriptors without the FD_CLOEXEC flag remain open.

But the new program won't know of their existence, where they came
from, what they refer to, etc. The "knowledge" about the descriptors
only exists in the original program, so closing specific descriptors
(or setting their close-on-exec flags) has to be done prior to the
execve().

Once the program is changed and that information is lost, about the
only option available to the driver is to close *every* open
descriptor except for 0, 1 and 2 (standard input/output/error). This
needs to be done before calling any library function which might
create additional descriptors (which in practice means almost any
library function).

Even that is complicated by the fact that there's no standard
mechanism for discovering the highest open descriptor number. You just
have to iterate until reaching some arbitrary maximum, or until some
(again, arbitrary) number of consecutive descriptors are found to be
unused (by close() failing with EBADF), and hope that's all of them.

Should it better go to the original program between fork() and execve()
or into the driver executable?

The only robust solution is for the original program to systematically
set the close-on-exec flag on every descriptor *except* those which
are meant to be inherited.

The original program may be using third party libraries which will
never care of.

Radim

The fact that descriptors are always inherited across fork() and
preserved across execve() by default is generally considered to be a
mistake in the POSIX API, but we're now stuck with it.

--
Glynn Clements <glynn@gclements.plus.com>

Radim Blazek wrote:

Should it better go to the original program between fork() and execve()
or into the driver executable?

Ideally, it should be done by whatever creates the descriptor in the
first place.

Doing it when spawning a DBMI driver (whether in the parent or in the
child) only avoids the specific case where the process holding onto
the descriptor is a DBMI driver.

There are other types of child process which might be spawned
(possibly invisibly to GRASS, e.g. an external library spawning a
"helper" process), and the consequences are always the same.

If I was going to add a mechanism for closing all descriptors other
than 0/1/2, it would probably be as an option to G_spawn_ex().

Also, the situation is potentially worse on Windows, as handles can't
be enumerated (and even if they could, process creation is atomic;
there's no way to execute code in the child process prior to the start
of the new program). If CreateProcess() is called with bInheritHandles
set to TRUE (which is required for stdin/stdout/stderr to be
inherited), all inheritable handles are inherited.

> The only robust solution is for the original program to systematically
> set the close-on-exec flag on every descriptor *except* those which
> are meant to be inherited.

The original program may be using third party libraries which will
never care of.

For libraries, it's even more important that they set the
close-on-exec flag on "internal" descriptors, because nothing else
can.

Ultimately, you can have one library which creates "internal"
descriptors and another library which spawns "internal" helper
processes, and there's not much that the main program can do about the
interaction between these.

It's generally considered the responsibility of the code creating
descriptors to set the close-on-exec flag, because that's easier (and
safer) than expecting code which spawns child processes to close
descriptors which it knows nothing about.

In this case, the key issue is that the DBMI client library needs to
set the close-on-exec flag on p1[WRITE], as it's important that this
descriptor doesn't get duplicated.

--
Glynn Clements <glynn@gclements.plus.com>