[GRASS5] R_open_driver() after fork()

Hi,

I have one problem: if module creates child process by fork(),
it is impossible open driver by R_open_driver() in parent.
In sync_driver() it hangs on read(). SIGALRM is recieved and dead()
executed, but module doesn't continue in execution - still hanging on read(),
until child is killed.

Could you help me? Thanks.

Radim

Radim Blazek wrote:

I have one problem: if module creates child process by fork(),
it is impossible open driver by R_open_driver() in parent.

Are you calling R_open_driver() in both parent and child? That would
explain why the read() call hangs.

In sync_driver() it hangs on read(). SIGALRM is recieved and dead()
executed, but module doesn't continue in execution - still hanging on read(),
until child is killed.

The timeout code relies upon signal() having the SysV semantics
(signals interrupt system calls); it won't work if signal() has the
BSD semantics (which is the case on Linux with glibc 2.x).

Using sigaction() (without SA_RESTART) works; try the attached patch.

However, I'm unsure as to the portability issues involved. E.g.
signal() is specified by ANSI (but its semantics aren't well defined),
while sigaction() is POSIX. OTOH, the new r.mapcalc uses sigaction()
to catch SIGFPE, and no-one has complained about that yet (but that
doesn't really say much; we only get regular feedback for Linux and
MacOSX, and occasional feedback for (specific versions of) Cygwin,
Irix and Solaris).

--
Glynn Clements <glynn.clements@virgin.net>

(attachments)

io.c-signal.diff (1.33 KB)

On Thu, Feb 13, 2003 at 11:23:29AM +0100, Radim Blazek wrote:

Hi,

I have one problem: if module creates child process by fork(),
it is impossible open driver by R_open_driver() in parent.
In sync_driver() it hangs on read(). SIGALRM is recieved and dead()
executed, but module doesn't continue in execution - still hanging on read(),
until child is killed.

The following seems to work okay for me. Perhaps you are trying to have
both parent and child connected to the same server at the same time?
The display server architecture only will process one connection at a
time.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include "gis.h"

int main (int argc, char *argv)
{
   pid_t pid;
   G_gisinit(argv[0]);

   if ((pid = fork()) == 0) {
        /* child */
        sleep(5);
   } else if (pid > 0) {
        /* parent */
        if (R_open_driver() == 0) {
            fprintf (stderr, "Driver Opened\n");
            R_close_driver();
            fprintf (stderr, "Driver Closed\n");
        } else {
            /* failed to open driver ... */
            G_fatal_error("Failed to open driver\n");
        }
   } else {
       /* fork() failed */
       G_fatal_error ("Fork failed!\n");
   }

   if (pid > 0) {
       int status;
       waitpid (pid, &status, 0);
   }

   return 0;
}

--
echo ">gra.fcw@2ztr< eryyvZ .T pveR" | rot13 | reverse

Thank you for your prompt help,

On Thursday 13 February 2003 12:08 pm, you wrote:

Radim Blazek wrote:
> I have one problem: if module creates child process by fork(),
> it is impossible open driver by R_open_driver() in parent.

Are you calling R_open_driver() in both parent and child? That would
explain why the read() call hangs.

No, in parent only.

> In sync_driver() it hangs on read(). SIGALRM is recieved and dead()
> executed, but module doesn't continue in execution - still hanging on
> read(), until child is killed.

The timeout code relies upon signal() having the SysV semantics
(signals interrupt system calls); it won't work if signal() has the
BSD semantics (which is the case on Linux with glibc 2.x).

Using sigaction() (without SA_RESTART) works; try the attached patch.

However, I'm unsure as to the portability issues involved. E.g.
signal() is specified by ANSI (but its semantics aren't well defined),
while sigaction() is POSIX. OTOH, the new r.mapcalc uses sigaction()
to catch SIGFPE, and no-one has complained about that yet (but that
doesn't really say much; we only get regular feedback for Linux and
MacOSX, and occasional feedback for (specific versions of) Cygwin,
Irix and Solaris).

No, it did not help, but I was not precise enough in my first mail,
I discovered later that the problem appeares if fork() is executed
when driver is opened:

R_open_driver();
pid fork();
if (pid > 0) {
    R_close_driver();
    R_open_driver();
}

I can avoid this by closing monitor before fork() is called -
but is it final solution?

Thanks

Radim

Thanks for help,

On Thursday 13 February 2003 12:34 pm, Eric G. Miller wrote:

On Thu, Feb 13, 2003 at 11:23:29AM +0100, Radim Blazek wrote:
> Hi,
>
> I have one problem: if module creates child process by fork(),
> it is impossible open driver by R_open_driver() in parent.
> In sync_driver() it hangs on read(). SIGALRM is recieved and dead()
> executed, but module doesn't continue in execution - still hanging on
> read(), until child is killed.

The following seems to work okay for me. Perhaps you are trying to have
both parent and child connected to the same server at the same time?
The display server architecture only will process one connection at a
time.

No, just from parent.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include "gis.h"

int main (int argc, char *argv)
{
   pid_t pid;
   G_gisinit(argv[0]);

   if ((pid = fork()) == 0) {
        /* child */
        sleep(5);
   } else if (pid > 0) {
        /* parent */
        if (R_open_driver() == 0) {
            fprintf (stderr, "Driver Opened\n");
            R_close_driver();
            fprintf (stderr, "Driver Closed\n");
        } else {
            /* failed to open driver ... */
            G_fatal_error("Failed to open driver\n");
        }
   } else {
       /* fork() failed */
       G_fatal_error ("Fork failed!\n");
   }

   if (pid > 0) {
       int status;
       waitpid (pid, &status, 0);
   }

   return 0;
}

Yes, this works, problem is if fork() is called with opened monitor:

int main (int argc, char *argv)
{
    int i;
    pid_t pid;
    G_gisinit(argv[0]);

    if (R_open_driver() == 0) fprintf (stderr, "Driver Opened\n");
    else G_fatal_error("Failed to open driver\n");

    if ((pid = fork()) == 0) { /* child */
        sleep(5);
        fprintf (stderr, "Child end\n");
    } else if (pid > 0) { /* parent */
        R_close_driver();
        fprintf (stderr, "Driver Closed\n");

        if (R_open_driver() == 0) {
            fprintf (stderr, "Driver Opened\n");
            R_close_driver();
            fprintf (stderr, "Driver Closed\n");
        } else {
            G_fatal_error("Failed to open driver\n");
        }
    } else { /* fork() failed */
       G_fatal_error ("Fork failed!\n");
    }

    if (pid > 0) {
       int status;
       waitpid (pid, &status, 0);
    }

    return 0;
}

Thank you anyway, with help of your example I discovered where
the problem is.

Radim

Does it work correctly to have the child also call R_close_driver()?

--
echo ">gra.fcw@2ztr< eryyvZ .T pveR" | rot13 | reverse

On Thu, Feb 13, 2003 at 03:04:54PM +0100, Radim Blazek wrote:

No, it did not help, but I was not precise enough in my first mail,
I discovered later that the problem appeares if fork() is executed
when driver is opened:

R_open_driver();
pid fork();
if (pid > 0) {
    R_close_driver();
    R_open_driver();
}

I can avoid this by closing monitor before fork() is called -
but is it final solution?

Maybe, just call R_close_driver() in the process that won't be using it.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "gis.h"
#include "raster.h"

int main (int argc, char *argv) {
    
   pid_t pid;
   G_gisinit(argv[0]);

   if (R_open_driver() != 0) {
       G_fatal_error ("R_open_driver");
   }
   if ((pid = fork()) == 0) {
        /* child */
        R_close_driver();
        fprintf (stderr, "Driver Closed in child\n");
        sleep(5);
   } else if (pid > 0) {
        /* parent */
       if (R_color(1) == 0) {
           fprintf (stderr, "R_color(1) called in parent\n");
       } else {
           fprintf (stderr, "R_color(1) failed\n");
       }
       R_close_driver();
   } else {
       /* fork() failed */
       G_fatal_error ("Fork failed!\n");
   }

   if (pid > 0) {
       int status;
       waitpid (pid, &status, 0);
   }

   return 0;

}

--
echo ">gra.fcw@2ztr< eryyvZ .T pveR" | rot13 | reverse

Yes, this works, but in fact I do not call fork() directly in module.
I call F_open() to open GUI form and this function creates child,
so child doesn't know if driver is open.
Can I get info if driver is open?

Radim

On Thursday 13 February 2003 04:58 pm, Eric G. Miller wrote:

Maybe, just call R_close_driver() in the process that won't be using it.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "gis.h"
#include "raster.h"

int main (int argc, char *argv) {

   pid_t pid;
   G_gisinit(argv[0]);

   if (R_open_driver() != 0) {
       G_fatal_error ("R_open_driver");
   }
   if ((pid = fork()) == 0) {
        /* child */
        R_close_driver();
        fprintf (stderr, "Driver Closed in child\n");
        sleep(5);
   } else if (pid > 0) {
        /* parent */
       if (R_color(1) == 0) {
           fprintf (stderr, "R_color(1) called in parent\n");
       } else {
           fprintf (stderr, "R_color(1) failed\n");
       }
       R_close_driver();
   } else {
       /* fork() failed */
       G_fatal_error ("Fork failed!\n");
   }

   if (pid > 0) {
       int status;
       waitpid (pid, &status, 0);
   }

   return 0;

}

Radim Blazek wrote:

> > In sync_driver() it hangs on read(). SIGALRM is recieved and dead()
> > executed, but module doesn't continue in execution - still hanging on
> > read(), until child is killed.
>
> The timeout code relies upon signal() having the SysV semantics
> (signals interrupt system calls); it won't work if signal() has the
> BSD semantics (which is the case on Linux with glibc 2.x).
>
> Using sigaction() (without SA_RESTART) works; try the attached patch.
>
> However, I'm unsure as to the portability issues involved. E.g.
> signal() is specified by ANSI (but its semantics aren't well defined),
> while sigaction() is POSIX. OTOH, the new r.mapcalc uses sigaction()
> to catch SIGFPE, and no-one has complained about that yet (but that
> doesn't really say much; we only get regular feedback for Linux and
> MacOSX, and occasional feedback for (specific versions of) Cygwin,
> Irix and Solaris).

No, it did not help,

With the patch, R_open_driver() shouldn't hang indefinitely in
sync_driver(); you should get a warning after 5 seconds then failure
after a further 10 seconds (15 seconds total). Basically, the second
R_open_driver() call should eventually fail, rather than blocking
indefinitely.

If SIGALRM fails to interrupt the read() call even with the patch,
then your OS kernel is broken.

There are two distinct issues here.

The first is that the child is holding the connection open, preventing
further connection attempts. This is essentially a problem with your
program, although the libraster API could be changed so as to prevent
it.

The second is that the existing timeout code doesn't work if signal()
has BSD semantics.

The patch only attempts to fix the second problem (which is a bug in
libraster), not the first (which is both a bug in your program and a
limitation of the current libraster API).

but I was not precise enough in my first mail,
I discovered later that the problem appeares if fork() is executed
when driver is opened:

R_open_driver();
pid fork();
if (pid > 0) {
    R_close_driver();
    R_open_driver();
}

I can avoid this by closing monitor before fork() is called -
but is it final solution?

At present, yes; otherwise the child will hold the connection open,
preventing subsequent connections.

Radim Blazek wrote (subsequently):

Yes, this works, but in fact I do not call fork() directly in module.
I call F_open() to open GUI form and this function creates child,
so child doesn't know if driver is open.

Presumably the child calls one of the exec*() functions shortly after
the fork()? In which case, we should probably set the close-on-exec
flag on monitor connections:

  flags = fcntl(_wfd, F_GETFD);
  flags |= FD_CLOEXEC;
  fcntl(_wfd, F_SETFD, flags);

This will force the descriptor to be closed automatically upon
exec*().

Can I get info if driver is open?

No.

Eric G. Miller wrote:

Does it work correctly to have the child also call R_close_driver()?

In general, it won't; R_close_driver() calls R_stabilize(), which
sends data to the monitor; that data may end up being interleaved with
data from the other process.

This is similar to the problem of using fork() with ANSI I/O; if any
output streams have buffered data, it may end up being written twice.

More generally, fork() and high-level APIs don't mix.

--
Glynn Clements <glynn.clements@virgin.net>

On Thursday 13 February 2003 06:25 pm, Glynn Clements wrote:

With the patch, R_open_driver() shouldn't hang indefinitely in
sync_driver(); you should get a warning after 5 seconds then failure
after a further 10 seconds (15 seconds total). Basically, the second
R_open_driver() call should eventually fail, rather than blocking
indefinitely.

If SIGALRM fails to interrupt the read() call even with the patch,
then your OS kernel is broken.

Yes, I have got:
Warning - no response from graphics monitor <x0>.
Check to see if the mouse is still active.
ERROR - no response from graphics monitor <x0>.
Monitor <x0>: Caught SIGPIPE

Presumably the child calls one of the exec*() functions shortly after
the fork()?

Yes.

In which case, we should probably set the close-on-exec
flag on monitor connections:

  flags = fcntl(_wfd, F_GETFD);
  flags |= FD_CLOEXEC;
  fcntl(_wfd, F_SETFD, flags);

This will force the descriptor to be closed automatically upon
exec*().

OK. Could you do that as xdriver expert?

More generally, fork() and high-level APIs don't mix.

Yes, I see, it is old story - better framework for GUI.

Thank you a lot.
Radim

Radim Blazek wrote:

> With the patch, R_open_driver() shouldn't hang indefinitely in
> sync_driver(); you should get a warning after 5 seconds then failure
> after a further 10 seconds (15 seconds total). Basically, the second
> R_open_driver() call should eventually fail, rather than blocking
> indefinitely.
>
> If SIGALRM fails to interrupt the read() call even with the patch,
> then your OS kernel is broken.

Yes, I have got:
Warning - no response from graphics monitor <x0>.
Check to see if the mouse is still active.
ERROR - no response from graphics monitor <x0>.
Monitor <x0>: Caught SIGPIPE

OK; the patch works as intended, then.

In which case, should this be changed for 5.0.x? I'm not sure that it
should. While it does technically fix a bug, it's a bug that shouldn't
be triggered in normal use; the situation (where a client tries to
connect to a monitor while it is in use by another client) shouldn't
happen.

> Presumably the child calls one of the exec*() functions shortly after
> the fork()?

Yes.

> In which case, we should probably set the close-on-exec
> flag on monitor connections:
>
> flags = fcntl(_wfd, F_GETFD);
> flags |= FD_CLOEXEC;
> fcntl(_wfd, F_SETFD, flags);
>
> This will force the descriptor to be closed automatically upon
> exec*().

OK. Could you do that as xdriver expert?

Well, it's a libraster issue rather than anything to do with the
monitors themselves.

As for implementation, again: should this go into 5.0.x?

> More generally, fork() and high-level APIs don't mix.

Yes, I see, it is old story - better framework for GUI.

I wasn't referring to GRASS APIs in particular; many third-party
libraries have highly "undefined" behaviour w.r.t. fork().

--
Glynn Clements <glynn.clements@virgin.net>

On Thursday 13 February 2003 08:58 pm, Glynn Clements wrote:

OK; the patch works as intended, then.

In which case, should this be changed for 5.0.x? I'm not sure that it
should. While it does technically fix a bug, it's a bug that shouldn't
be triggered in normal use; the situation (where a client tries to
connect to a monitor while it is in use by another client) shouldn't
happen.

I think that if GUI (TclTk) is used, it may happen often. Question is
if in that case better is when module crashes or waits until the
second (first) one finishes.
Raster lib is probably stable now and no changes are expected in 5.0
so we could add patched io.c to 5.1 (where the problem appeares).

> > In which case, we should probably set the close-on-exec
> > flag on monitor connections:
> >
> > flags = fcntl(_wfd, F_GETFD);
> > flags |= FD_CLOEXEC;
> > fcntl(_wfd, F_SETFD, flags);
> >
> > This will force the descriptor to be closed automatically upon
> > exec*().
>
> OK. Could you do that as xdriver expert?

Well, it's a libraster issue rather than anything to do with the
monitors themselves.

As for implementation, again: should this go into 5.0.x?

As above, add patched io.c to 5.1.

> > More generally, fork() and high-level APIs don't mix.
>
> Yes, I see, it is old story - better framework for GUI.

I wasn't referring to GRASS APIs in particular; many third-party
libraries have highly "undefined" behaviour w.r.t. fork().

DBMI drivers are also started by fork()+execl(), may it be a problem?
I found for example, that it is impossible to start and close more drivers
in 'FILO' order, driver opened as last must be closed first.
Is it bug in library? Somewhere in db_shutdown_driver (driver):?
/* wait for the driver to finish */
    status = -1;
    while ((pid = wait(&status)) > 0 && pid != driver->pid) {}

Radim

Radim Blazek wrote:

> OK; the patch works as intended, then.
>
> In which case, should this be changed for 5.0.x? I'm not sure that it
> should. While it does technically fix a bug, it's a bug that shouldn't
> be triggered in normal use; the situation (where a client tries to
> connect to a monitor while it is in use by another client) shouldn't
> happen.

I think that if GUI (TclTk) is used, it may happen often. Question is
if in that case better is when module crashes or waits until the
second (first) one finishes.
Raster lib is probably stable now and no changes are expected in 5.0
so we could add patched io.c to 5.1 (where the problem appeares).

Well, you can *make* the problem appear on 5.0.x, e.g. by running an
interactive display command in the background, then running another
display command.

The point is that sync_driver() is clearly intended to time out
eventually, but the code relies upon signal() having the SysV
semantics (which isn't the case for Linux/glibc2, which is probably
~90% of GRASS' user base).

> > > More generally, fork() and high-level APIs don't mix.
> >
> > Yes, I see, it is old story - better framework for GUI.
>
> I wasn't referring to GRASS APIs in particular; many third-party
> libraries have highly "undefined" behaviour w.r.t. fork().

DBMI drivers are also started by fork()+execl(), may it be a problem?

Well, I consider a native Win32 port to be desirable, so *all* use of
fork() is problematic in that sense.

Give some consideration to redesigning the DBMI driver interface to
use e.g. dlopen() (LoadLibrary() on Windows); most modern OSes have
equivalent functionality (besides dlopen(), XFree86 4.x includes code
for dynamically loading a.out, ELF and COFF files).

I found for example, that it is impossible to start and close more drivers
in 'FILO' order, driver opened as last must be closed first.
Is it bug in library? Somewhere in db_shutdown_driver (driver):?
/* wait for the driver to finish */
    status = -1;
    while ((pid = wait(&status)) > 0 && pid != driver->pid) {}

It looks like the same problem; when you spawn the second driver, it
will inherit from the parent the descriptors for the client's end of
the connection to the first driver. The first driver won't receive EOF
until *all* copies of the other end of the pipe have been closed, i.e.
until the parent closes its copy *and* the second driver is dead.

Again, one solution would be to set the close-on-exec flag on the
descriptors, in db_start_driver().

Basically, when spawning a subprocess, you normally want to close all
descriptors except those which are actually meant to be inherited. Of
course, that means that the code which performs the fork() must
actually be capable of enumerating the set of open descriptors. This
is why fork() and high-level APIs don't mix.

Using the close-on-exec flag only works if the library which allocates
the descriptor sets it; and there isn't a close-on-fork flag, so
fork-without-exec is problematic.

--
Glynn Clements <glynn.clements@virgin.net>