[GRASS-dev] native WinGRASS and attribute data deadlock, next try

Hello,

I have tried to get some more debug output for the wingrass db deadlock
problem. I started from Benjamin's work, and tried to apply Glynn's
suggestions. I have to admit, though, that I still do not completely
understand xdr, so I'm not sure if what I did makes sense.

To keep the different debug outputs from mingling, I had to create 8 log
files, one for each function and for each stream. See xdr_logs.zip.

I also attached the original xdr_stdio.c and a diff with everything I added.

When I now run our test v.out.ogr line:

GRASS 6.3.cvs (qgis-test):C:\GRASSSRC\grass6\bin.i686-pc-mingw32

v.out.ogr points type=point dsn=c:\test

I get the message:

Exporting 2898 points/lines...

And then nothing else. The percentage counter never appears.

Please tell me if these logs are in any way helpful and, if not, what I
could do to improve them.

Moritz

(attachments)

xdr_logs.zip (12.2 KB)
xdr_stdio.zip (3.54 KB)

Moritz Lennert wrote:

I have tried to get some more debug output for the wingrass db deadlock
problem. I started from Benjamin's work, and tried to apply Glynn's
suggestions. I have to admit, though, that I still do not completely
understand xdr, so I'm not sure if what I did makes sense.

To keep the different debug outputs from mingling, I had to create 8 log
files, one for each function and for each stream. See xdr_logs.zip.

I also attached the original xdr_stdio.c and a diff with everything I added.

When I now run our test v.out.ogr line:

GRASS 6.3.cvs (qgis-test):C:\GRASSSRC\grass6\bin.i686-pc-mingw32
>v.out.ogr points type=point dsn=c:\test

I get the message:

Exporting 2898 points/lines...

And then nothing else. The percentage counter never appears.

Please tell me if these logs are in any way helpful and, if not, what I
could do to improve them.

The first things which spring to mind are:

1. There needs to be a single log file for each process, i.e. one for
the client, one for the driver. It should be possible to display the
files side-by-side and match each "write" from the client with the
corresponding "read" from the driver and vice-versa.

2. The debug code needs to call fflush() after each record. Otherwise,
data will be missing from the end of the file if the process
terminates abnormally (see getlong1.txt). Moreover, the part which is
missing is likely to be the most useful.

3. The actual data which is read or written should be included in the
log entry.

--
Glynn Clements <glynn@gclements.plus.com>

Thanks for the help, Glynn.

On 18/09/07 21:38, Glynn Clements wrote:

The first things which spring to mind are:

1. There needs to be a single log file for each process, i.e. one for
the client, one for the driver. It should be possible to display the
files side-by-side and match each "write" from the client with the
corresponding "read" from the driver and vice-versa.

As I said, I'm not sure I understand xdr enough to grasp everything...(nor C for that matter).

Just to make sure I do grasp enough to go on:

We have two streams (*1 and *2) and in each stream we have 'puts' and 'gets'. Thus, if you look at putlong1.txt and getlong1.txt, you see the corresponding writes and reads. However, in each stream it is either longs or bytes that are moved, and I separated these. I will instead try to create "putstream1" and "getstream1" logs which mix bytes and longs.

Is this what you mean ?

2. The debug code needs to call fflush() after each record. Otherwise,
data will be missing from the end of the file if the process
terminates abnormally (see getlong1.txt). Moreover, the part which is
missing is likely to be the most useful.

I thought the setbuf(debug, NULL); took care of that. Will add fflush().

3. The actual data which is read or written should be included in the
log entry.

Isn't this what is in the buffer= field ?

For longs we have

xdrstdio_getlong(xdrs, lp)
  XDR *xdrs;
  register long *lp;

So lp points to the value, right ?

I have

fprintf(debug, "\tbuffer = %x", *lp);

which I adapted from Benjamin's code:

fprintf ( stderr, "\t addr (%i); new val = %i.\n", lp, *lp );

Should this be &lp instead of *lp ?

For bytes we have:

xdrstdio_getbytes(xdrs, addr, len)
  XDR *xdrs;
  caddr_t addr;
  u_int len;

So here, addr is the address of the value, right ? How do I access the actual value ?

Moritz

Moritz Lennert wrote:

> The first things which spring to mind are:
>
> 1. There needs to be a single log file for each process, i.e. one for
> the client, one for the driver. It should be possible to display the
> files side-by-side and match each "write" from the client with the
> corresponding "read" from the driver and vice-versa.

As I said, I'm not sure I understand xdr enough to grasp
everything...(nor C for that matter).

Just to make sure I do grasp enough to go on:

We have two streams (*1 and *2) and in each stream we have 'puts' and
'gets'. Thus, if you look at putlong1.txt and getlong1.txt, you see the
corresponding writes and reads. However, in each stream it is either
longs or bytes that are moved, and I separated these. I will instead try
to create "putstream1" and "getstream1" logs which mix bytes and longs.

Is this what you mean ?

That's part of it.

There are two processes, the client and driver, connected by a pair of
pipes. Each process has a descriptor for the write end of one pipe and
the read end of another.

There should be one log file per process, containing both reads and
writes. If the log files are viewed side by side, you would expect to
see e.g.:

  client                      driver

  putlong STREAM2 4 bytes     getlong STREAM1 4 bytes
  getbytes STREAM1 99 bytes   putbytes STREAM2 99 bytes
  ...

Each process will need to use a different name for its logfile.
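One way to arrange that is to build the log file name from the process ID; a sketch only, not part of the actual patch (the `xdr_log_` name pattern and the helper name are made up here):

```c
#include <stdio.h>
#include <unistd.h>   /* getpid() */
#include <assert.h>

/* Open one log file per process, named after the PID so the client and
   the driver never write to the same file; unbuffered, so records are
   not lost if the process terminates abnormally. */
static FILE *open_debug_log(void)
{
    char name[64];
    FILE *fp;

    snprintf(name, sizeof(name), "xdr_log_%ld.txt", (long) getpid());
    fp = fopen(name, "w");
    if (fp)
        setbuf(fp, NULL);   /* disable stdio buffering entirely */
    return fp;
}
```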

> 2. The debug code needs to call fflush() after each record. Otherwise,
> data will be missing from the end of the file if the process
> terminates abnormally (see getlong1.txt). Moreover, the part which is
> missing is likely to be the most useful.

I thought the setbuf(debug, NULL); took care of that. Will add fflush().

Hmm; the setbuf() should suffice. There may be other reasons why
getlong1.txt was truncated, e.g. both processes writing to the same
log file.

> 3. The actual data which is read or written should be included in the
> log entry.

Isn't this what is in the buffer= field ?

For longs we have

xdrstdio_getlong(xdrs, lp)
  XDR *xdrs;
  register long *lp;

So lp points to the value, right ?

I have

fprintf(debug, "\tbuffer = %x", *lp);

which I adapted from Benjamin's code:

fprintf ( stderr, "\t addr (%i); new val = %i.\n", lp, *lp );

Should this be &lp instead of *lp ?

No, this part is okay.

For bytes we have:

xdrstdio_getbytes(xdrs, addr, len)
  XDR *xdrs;
  caddr_t addr;
  u_int len;

So here, addr is the address of the value, right ? How do I access the
actual value ?

  int i;
  for (i = 0; i < len; i++)
    fprintf(debug, "%02x ", ((unsigned char *)addr)[i]);
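Wrapped into a complete helper, with a flush per record as suggested earlier (a sketch; the `log_bytes` name is hypothetical):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Dump `len` bytes starting at `addr` to the log as two-digit hex
   values, one record per line, flushing so the tail survives a hang. */
static void log_bytes(FILE *debug, const void *addr, unsigned int len)
{
    unsigned int i;

    for (i = 0; i < len; i++)
        fprintf(debug, "%02x ", ((const unsigned char *) addr)[i]);
    fprintf(debug, "\n");
    fflush(debug);
}
```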

--
Glynn Clements <glynn@gclements.plus.com>

Here's a new try at getting useful log output, and I actually do get
strange results.

Running 'v.out.ogr points type=point dsn=c:\test' I still don't get any
message (except for once where '1%' appeared). I can see dbf.exe in the
Windows task manager, but no I/O activity whatsoever. However, when I open
the resulting log file in oocalc, it is 'read-only', which makes me suppose
that the file is not closed yet. It seems as if the process hangs in the
middle of a function, which might explain the incomplete lines we've
already seen towards the end of the log file. I always have to Ctrl-C to
get out of the command and to get write access to the log files.

I have no idea how to identify which is the driver and which the client -
and also no idea if this makes any difference - so I just called them
side1 and side2 in the code. In the attached log file I already put them
next to each other.

Some of the log files do not show anything interesting (AFAICS), but in
some, one of the processes (the one on the sending side in stream 1)
creates much more output than the other, and just before the other (i.e.
the receiving side) stops, the values begin to differ, with the receiving
side truncating a few zeros of the value (see lines 23ff in the log file).

The layout of the log file is

Function,Stream,Bytes,Value,Function,Stream,Bytes,Value,Bytestest,Valuetest

where the first 4 are from one side and the second 4 from the other side.

Bytestest and Valuetest give the result of a simple =if(x<>y;"ERROR";"OK")
test in oocalc for the "Bytes" and "Values" columns. Just makes it faster
to spot the errors.

Hope this helps identify the problem.

I've also attached the diff to xdr_stdio.c if anyone else wants to try on
a different machine.

Moritz

(attachments)

xdr_stdio.zip (2.08 KB)

Moritz Lennert wrote:

Here's a new try at getting useful log output, and I actually do get
strange results.

Running 'v.out.ogr points type=point dsn=c:\test' I still don't get any
message (except for once where '1%' appeared). I can see dbf.exe in the
Windows task manager, but no I/O activity whatsoever. However, when I open
the resulting log file in oocalc, it is 'read-only', which makes me suppose
that the file is not closed yet. It seems as if the process hangs in the
middle of a function, which might explain the incomplete lines we've
already seen towards the end of the log file. I always have to Ctrl-C to
get out of the command and to get write access to the log files.

I have no idea how to identify which is the driver and which the client -

The communication starts with the driver writing a success/failure
code. So, in your log.csv file, the left-hand columns are the client
and the right-hand columns are the driver.

and also no idea if this makes any difference - so I just called them
side1 and side2 in the code. In the attached log file I already put them
next to each other.

Some of the log files do not show anything interesting (AFAICS), but in
some, one of the processes (the one on the sending side in stream 1)
creates much more output than the other, and just before the other (i.e.
the receiving side) stops, the values begin to differ, with the receiving
side truncating a few zeros of the value (see lines 23ff in the log file).

Yep. Converting them to formatted text, the server side has:

20 putbytes STREAM1 3 63 61 74
21 putbytes STREAM1 1 00
22 putlong STREAM1 4 00 00 00 01
23 putlong STREAM1 4 00 00 00 00
24 putlong STREAM1 4 00 00 00 03
25 putlong STREAM1 4 00 00 00 02

while the client has:

20 getbytes STREAM1 3 63 61 74
21 getbytes STREAM1 1 00
22 getlong STREAM1 4 00 00 01 00
23 getlong STREAM1 4 00 00 00 00
24 getlong STREAM1 4 00 00 03 00
25 getlong STREAM1 4 00 00 02 00

At line 22, a zero goes missing from the stream. From there on, all
bets are off.

It doesn't stop there; at line 27 or 28, another zero goes missing.

My next suggestion is to re-write xdr_stdio to use read() and write()
instead of fread() and fwrite(), e.g.:

- if (fread((caddr_t)lp, sizeof(long), 1, (FILE *)xdrs->x_private) != 1)
+ if (read(fileno((FILE *)xdrs->x_private), (caddr_t)lp, sizeof(long)) != sizeof(long))

That should tell us whether the problem is with the MSVCRT stdio
implementation or a lower level. If it's stdio, we can replace that
with read/write. If it still manifests itself with those functions, we
may need to use ReadFile/WriteFile instead. Or a named pipe.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, September 22, 2007 05:27, Glynn Clements wrote:

My next suggestion is to re-write xdr_stdio to use read() and write()
instead of fread() and fwrite(), e.g.:

- if (fread((caddr_t)lp, sizeof(long), 1, (FILE *)xdrs->x_private) != 1)
+ if (read(fileno((FILE *)xdrs->x_private), (caddr_t)lp, sizeof(long)) !=
sizeof(long))

That seems to work !
I just changed the four occurrences of fread/fwrite to read/write (see
diff attached) and now v.out.ogr runs smoothly both with Benjamin's test
point data and with the spearfish landcover areas.

I'll post new binaries as soon as compilation is over, so that others can
test as well.

Thanks a lot, Glynn !

Moritz

(attachments)

xdr_stdio.diff (1.13 KB)

Moritz Lennert wrote:

> My next suggestion is to re-write xdr_stdio to use read() and write()
> instead of fread() and fwrite(), e.g.:
>
> - if (fread((caddr_t)lp, sizeof(long), 1, (FILE *)xdrs->x_private) != 1)
> + if (read(fileno((FILE *)xdrs->x_private), (caddr_t)lp, sizeof(long)) !=
> sizeof(long))

That seems to work !

Now that's strange. I was expecting it to provide more information on
the errors, not to work.

On Unix, read/write can return short counts on pipes. If a process
tries to write 4 bytes to a pipe, and there's only enough space for 2
bytes, write() will return 2 rather than waiting for the pipe to
drain. Similarly if you try to read more data than is available in the
pipe.

If you want to block until the requested amount of data has been read
or written, you need to either use readn/writen (which are
non-standard), or implement them yourself, e.g.:

ssize_t readn(int fd, void *buf, size_t count)
{
  ssize_t total = 0;

  while (total < count)
  {
    ssize_t n = read(fd, (char *) buf + total, count - total);
    if (n < 0)
      return n;
    if (n == 0)
      break;
    total += n;
  }

  return total;
}

ssize_t writen(int fd, const void *buf, size_t count)
{
  ssize_t total = 0;

  while (total < count)
  {
    ssize_t n = write(fd, (const char *) buf + total, count - total);
    if (n < 0)
      return n;
    if (n == 0)
      break;
    total += n;
  }

  return total;
}

The above assumes that non-blocking I/O hasn't been enabled for the
descriptor. If it has, you need to either disable it for the duration
of the function or use select() or poll() to wait until the descriptor
is ready.

I'll post new binaries as soon as compilation is over, so that others can
test as well.

If you want to test it, I'd suggest reducing the size of the pipes in
lib/db/dbmi_client/start.c. A larger pipe won't prevent errors, but it
may make them less likely to show up.

--
Glynn Clements <glynn@gclements.plus.com>

On 24/09/07 03:41, Glynn Clements wrote:

Moritz Lennert wrote:

My next suggestion is to re-write xdr_stdio to use read() and write()
instead of fread() and fwrite(), e.g.:

- if (fread((caddr_t)lp, sizeof(long), 1, (FILE *)xdrs->x_private) != 1)
+ if (read(fileno((FILE *)xdrs->x_private), (caddr_t)lp, sizeof(long)) !=
sizeof(long))

That seems to work !

Now that's strange. I was expecting it to provide more information on
the errors, not to work.

On Unix, read/write can return short counts on pipes. If a process
tries to write 4 bytes to a pipe, and there's only enough space for 2
bytes, write() will return 2 rather than waiting for the pipe to
drain. Similarly if you try to read more data than is available in the
pipe.

If you want to block until the requested amount of data has been read
or written, you need to either use readn/writen (which are
non-standard), or implement them yourself, e.g.:

ssize_t readn(int fd, void *buf, size_t count)
{
  ssize_t total = 0;

  while (total < count)
  {
    ssize_t n = read(fd, (char *) buf + total, count - total);
    if (n < 0)
      return n;
    if (n == 0)
      break;
    total += n;
  }

  return total;
}

ssize_t writen(int fd, const void *buf, size_t count)
{
  ssize_t total = 0;

  while (total < count)
  {
    ssize_t n = write(fd, (const char *) buf + total, count - total);
    if (n < 0)
      return n;
    if (n == 0)
      break;
    total += n;
  }

  return total;
}

The above assumes that non-blocking I/O hasn't been enabled for the
descriptor. If it has, you need to either disable it for the duration
of the function or use select() or poll() to wait until the descriptor
is ready.

I'll post new binaries as soon as compilation is over, so that others can
test as well.

If you want to test it, I'd suggest reducing the size of the pipes in
lib/db/dbmi_client/start.c. A larger pipe won't prevent errors, but it
may make them less likely to show up.

I changed the values in lines 146 & 147 and went down all the way to 2 (from the current setting of 250000): I cannot reproduce the deadlock... v.out.ogr works as it should.

So, where should we go from here ? Is it still wiser to implement readn/writen as you suggest above ? If yes, where, and how do I see whether non-blocking I/O is enabled or not ? (On MSDN it says: "In multithreaded programs, no locking is performed. The file descriptors returned are newly opened and should not be referenced by any thread until after the _pipe call is complete." - Is this what you mean ?)

Benjamin, could you try the simple write/read (instead of fwrite/fread) solution, so that we can make sure that this works for someone else ?

Moritz

I'd love to give this a spin on a Win2000 and XP box.
Unfortunately, I don't have access to one right now.

Moritz, could you email all the files you changed?
I will try to get hold of a Windows machine to compile
and test as soon as possible.

Benjamin

Moritz Lennert wrote:

On 24/09/07 03:41, Glynn Clements wrote:

Moritz Lennert wrote:

My next suggestion is to re-write xdr_stdio to use read() and write()
instead of fread() and fwrite(), e.g.:

- if (fread((caddr_t)lp, sizeof(long), 1, (FILE *)xdrs->x_private) != 1)
+ if (read(fileno((FILE *)xdrs->x_private), (caddr_t)lp, sizeof(long)) !=
sizeof(long))

That seems to work !

Now that's strange. I was expecting it to provide more information on
the errors, not to work.

On Unix, read/write can return short counts on pipes. If a process
tries to write 4 bytes to a pipe, and there's only enough space for 2
bytes, write() will return 2 rather than waiting for the pipe to
drain. Similarly if you try to read more data than is available in the
pipe.

If you want to block until the requested amount of data has been read
or written, you need to either use readn/writen (which are
non-standard), or implement them yourself, e.g.:

ssize_t readn(int fd, void *buf, size_t count)
{
    ssize_t total = 0;

    while (total < count)
    {
        ssize_t n = read(fd, (char *) buf + total, count - total);
        if (n < 0)
            return n;
        if (n == 0)
            break;
        total += n;
    }

    return total;
}

ssize_t writen(int fd, const void *buf, size_t count)
{
    ssize_t total = 0;

    while (total < count)
    {
        ssize_t n = write(fd, (const char *) buf + total, count - total);
        if (n < 0)
            return n;
        if (n == 0)
            break;
        total += n;
    }

    return total;
}

The above assumes that non-blocking I/O hasn't been enabled for the
descriptor. If it has, you need to either disable it for the duration
of the function or use select() or poll() to wait until the descriptor
is ready.

I'll post new binaries as soon as compilation is over, so that others can
test as well.

If you want to test it, I'd suggest reducing the size of the pipes in
lib/db/dbmi_client/start.c. A larger pipe won't prevent errors, but it
may make them less likely to show up.

I changed the values in lines 146 & 147 and went down all the way to 2 (from the current setting of 250000): I cannot reproduce the deadlock... v.out.ogr works as it should.

So, where should we go from here ? Is it still wiser to implement readn/writen as you suggest above ? If yes, where, and how do I see whether non-blocking I/O is enabled or not ? (On MSDN it says: "In multithreaded programs, no locking is performed. The file descriptors returned are newly opened and should not be referenced by any thread until after the _pipe call is complete." - Is this what you mean ?)

Benjamin, could you try the simple write/read (instead of fwrite/fread) solution, so that we can make sure that this works for someone else ?

Moritz

--
Benjamin Ducke, M.A.
Archäoinformatik
(Archaeoinformation Science)
Institut für Ur- und Frühgeschichte
(Inst. of Prehistoric and Historic Archaeology)
Christian-Albrechts-Universität zu Kiel
Johanna-Mestorf-Straße 2-6
D 24098 Kiel
Germany

Tel.: ++49 (0)431 880-3378 / -3379
Fax : ++49 (0)431 880-7300
www.uni-kiel.de/ufg

Moritz Lennert wrote:

So, where should we go from here ? Is it still wiser to implement
readn/writen as you suggest above ?

This is where it gets awkward.

GRASS uses XDR/RPC for two purposes: libgis uses it for reading and
writing FP raster maps, and DBMI uses it for communication between the
client and driver.

Changing the XDR library to use read/write will probably kill
performance for raster I/O, as you lose the buffering.

DBMI disables buffering on the streams, so the lack of buffering won't
matter for DBMI.

OTOH, the raster I/O only uses XDR on files, so the problems with
fread/fwrite on pipes don't apply there.

My preferred approach would be to change lib/db/dbmi_base to simply
not use XDR (that isn't anywhere near as much work as it might sound).

As the driver and client always run on the same system, it doesn't
matter if the protocol is platform-dependent (I have no idea what
Radim was thinking when he decided to use XDR for the DBMI
communication).

The dbmi_base library uses the following functions from XDR:

  xdr_char
  xdr_double
  xdr_float
  xdr_int
  xdr_short

  xdr_string

  xdrstdio_create

The first five all simply read/write the specified value in a fixed
(i.e. non-platform-dependent) format; i.e. convert to big-endian
order, and convert FP values to IEEE format. For DBMI, we can just use
the host's native format, so the first five all amount to calling
read/write on the value.

xdr_string is slightly more complex: read/write the length (including
the terminating NUL) as an unsigned int, followed by the bytes.
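For illustration, the wire format described here (the length including the NUL as an unsigned int, then the bytes) can be sketched with an in-memory encode/decode pair. The function names are hypothetical, not the actual dbmi_base API:

```c
#include <string.h>
#include <assert.h>

/* Encode: the length (including the terminating NUL) as an unsigned
   int in host byte order, followed by the bytes themselves.
   Returns the number of bytes written to `out`. */
static size_t encode_string(const char *s, unsigned char *out)
{
    unsigned int len = (unsigned int) strlen(s) + 1;  /* include NUL */

    memcpy(out, &len, sizeof(len));
    memcpy(out + sizeof(len), s, len);
    return sizeof(len) + len;
}

/* Decode into a caller-supplied buffer; returns bytes consumed. */
static size_t decode_string(const unsigned char *in, char *out)
{
    unsigned int len;

    memcpy(&len, in, sizeof(len));
    memcpy(out, in + sizeof(len), len);
    return sizeof(len) + len;
}
```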

xdrstdio_create just sets everything up so that the read/write
operations go through fread/fwrite (as opposed to e.g. xdrmem_create
which sets up for reading/writing from memory).

IOW, the actual implementation of an XDR replacement is trivial. So
trivial that you would just inline most of it into the
db__{send,recv}_* functions.

It's changing the dbmi_base library to use it which will be most of
the work. You would probably want separate put/get functions rather
than the XDR mechanism of setting the "direction" when creating the
XDR object and having a single function for both read and write.

E.g. db__send_int() would change from:

  int
  db__send_int(int n)
  {
      XDR xdrs;
      int stat;
  
      stat = DB_OK;
  
      xdr_begin_send (&xdrs);
      if(!xdr_int (&xdrs, &n))
          stat = DB_PROTOCOL_ERR;
      xdr_end_send (&xdrs);
  
      if (stat == DB_PROTOCOL_ERR)
          db_protocol_error();
      return stat;
  }

to something like:

  int
  db__send_int(int n)
  {
      int stat = DB_OK;
  
      if (!db__send(&n, sizeof(n)))
          stat = DB_PROTOCOL_ERR;
  
      if (stat == DB_PROTOCOL_ERR)
          db_protocol_error();
      return stat;
  }

with db__send() defined as:

  int
  db__send(void *buf, size_t size)
  {
      return writen(_send_fd, buf, size) == size;
  }

[Actually, you would probably inline writen() here, as this is the
only place it would be used.]
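The receiving side would presumably mirror this. A hedged sketch: db__recv and _recv_fd follow the naming used in the text, but the harness below is illustrative (it simplifies the error handling by returning the status instead of calling db_protocol_error()):

```c
#include <assert.h>
#include <unistd.h>

#define DB_OK           0
#define DB_PROTOCOL_ERR 1

static int _recv_fd;   /* read end of the driver-to-client pipe */

/* Block until exactly `size` bytes have been read; EOF or a read
   error counts as failure. */
static int db__recv(void *buf, size_t size)
{
    size_t total = 0;

    while (total < size) {
        ssize_t n = read(_recv_fd, (char *) buf + total, size - total);
        if (n <= 0)
            return 0;
        total += (size_t) n;
    }
    return 1;
}

/* Counterpart of db__send_int(): receive an int in native format. */
static int db__recv_int(int *n)
{
    return db__recv(n, sizeof(*n)) ? DB_OK : DB_PROTOCOL_ERR;
}
```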

If yes, where and how do I see if non-blocking I/O is enabled or not

There is no need; the descriptors for the DBMI pipes will never be
non-blocking. I was just commenting that a "real" readn/writen
implementation (as is found on some Unices) is a bit more complex, but
we don't need that for our purposes.

(on MSDN it says: "In multithreaded
programs, no locking is performed. The file descriptors returned are
newly opened and should not be referenced by any thread until after the
_pipe call is complete." - Is this what you mean ?) ?

No, that's something else, and isn't relevant here.

--
Glynn Clements <glynn@gclements.plus.com>

On 04/10/07 23:28, Glynn Clements wrote:

Moritz Lennert wrote:

So, where should we go from here ? Is it still wiser to implement readn/writen as you suggest above ?

This is where it gets awkward.

Reading on below it actually seems quite straightforward. :wink:

GRASS uses XDR/RPC for two purposes: libgis uses it for reading and writing FP raster maps, and DBMI uses it for communication between
the client and driver.

So we should probably thoroughly test libgis' uses of it on Windows. What would be the best way to do this (i.e. in which circumstances/modules is the xdr part of libgis called) ?

[...]

My preferred approach would be to change lib/db/dbmi_base to simply not use XDR (that isn't anywhere near as much work as it might
sound).

As the driver and client always run on the same system, it doesn't matter if the protocol is platform-dependent (I have no idea what Radim was thinking when he decided to use XDR for the DBMI communication).

I always had the same question, but (in my ignorance) I thought that it
might make a difference if the actual database was on a different
system. If it doesn't, then let's drop xdr for DBMI (we do really need
it for libgis, I suppose ?).

The dbmi_base library uses the following functions from XDR:

xdr_char
xdr_double
xdr_float
xdr_int
xdr_short

xdr_string

xdrstdio_create

The first five all simply read/write the specified value in a fixed (i.e. non-platform-dependent) format; i.e. convert to big-endian order, and convert FP values to IEEE format. For DBMI, we can just
use the host's native format, so the first five all amount to calling
read/write on the value.

xdr_string is slightly more complex: read/write the length (including
the terminating NUL) as an unsigned int, followed by the bytes.

xdrstdio_create just sets everything up so that the read/write operations go through fread/fwrite (as opposed to e.g. xdrmem_create which sets up for reading/writing from memory).

IOW, the actual implementation of an XDR replacement is trivial. So trivial that you would just inline most of it into the db__{send,recv}_* functions.

It's changing the dbmi_base library to use it which will be most of the work. You would probably want separate put/get functions rather than the XDR mechanism of setting the "direction" when creating the XDR object and having a single function for both read and write.

E.g. db__send_int() would change from:

int
db__send_int(int n)
{
    XDR xdrs;
    int stat;

    stat = DB_OK;

    xdr_begin_send (&xdrs);
    if(!xdr_int (&xdrs, &n))
        stat = DB_PROTOCOL_ERR;
    xdr_end_send (&xdrs);

    if (stat == DB_PROTOCOL_ERR)
        db_protocol_error();
    return stat;
}

to something like:

int
db__send_int(int n)
{
    int stat = DB_OK;

    if (!db__send(&n, sizeof(n)))
        stat = DB_PROTOCOL_ERR;

    if (stat == DB_PROTOCOL_ERR)
        db_protocol_error();
    return stat;
}

with db__send() defined as:

int
db__send(void *buf, size_t size)
{
    return writen(_send_fd, buf, size) == size;
}

[Actually, you would probably inline writen() here, as this is the only place it would be used.]

I will need a while to understand this, especially which files need replacement.

Currently we have the following files in dbmi_base (ls -1 xdr*):

xdr.h
xdr.c
xdrchar.c
xdrcolumn.c
xdrdatetime.c
xdrdouble.c
xdrfloat.c
xdrhandle.c
xdrindex.c
xdrint.c
xdrprocedure.c
xdrshort.c
xdrstring.c
xdrtable.c
xdrtoken.c
xdrvalue.c

of these only the following reference xdr.h ( grep -l xdr.h *.c)

xdr.c
xdrchar.c
xdrdouble.c
xdrfloat.c
xdrint.c
xdrprocedure.c
xdrshort.c
xdrstring.c

xdrchar.c, xdrdouble.c, xdrfloat.c, xdrint.c, xdrshort.c, xdrstring.c contain the functions you explain how to change above. The only place where they are called is in the macros.h in the same directory. I imagine that xdr.c and xdrprocedure.c can just go to the bin.

So "all" we need to do is create new files containing the functions called in macros.h according to the model you describe above.

What about the other xdr* files which do not actually directly use any xdr calls (xdrhandle.c, xdrindex.c, xdrtable.c, xdrtoken.c, xdrvalue.c). Should we just leave them as they are ? Should we rename them to avoid future confusion ?

Moritz

On 04/10/07 20:43, Benjamin Ducke wrote:

I'd love to give this a spin on a Win2000 and XP box.
Unfortunately, I don't have acccess to one right now.

Moritz, could you email all the files you changed?
I will try to get hold of a Windows machine to compile
and test as soon as possible.

All you need is the attached diff to xdr_stdio.c.

However, Glynn has suggested dropping xdr altogether for dbmi communication, so this will become obsolete.

Moritz

(attachments)

xdr_stdio.diff (1.13 KB)

OK, how about putting some #ifdefs into xdr_stdio.c, so that
unbuffered i/o only gets compiled on Win32?
For the time being, this would mean losing raster performance
on native win, but in exchange having full vector functionality.
Not a bad deal until xdr_stdio.c gets a full re-write, I would
think.

Ben

On 04/10/07 20:43, Benjamin Ducke wrote:

I'd love to give this a spin on a Win2000 and XP box.
Unfortunately, I don't have access to one right now.

Moritz, could you email all the files you changed?
I will try to get hold of a Windows machine to compile
and test as soon as possible.

All you need is the attached diff to xdr_stdio.c.

However, Glynn has suggested dropping xdr altogether for dbmi
communication, so this will become obsolete.

Moritz

Moritz Lennert wrote:

>> So, where should we go from here ? Is it still wiser to implement
>> readn/writen as you suggest above ?
>
> This is where it gets awkward.

Reading on below it actually seems quite straightforward. :wink:

> GRASS uses XDR/RPC for two purposes: libgis uses it for reading and
> writing FP raster maps, and DBMI uses it for communication between
> the client and driver.

So we should probably thoroughly test libgis' uses of it on Windows.

I don't think that this is necessary. AFAICT, stdio works okay on
files; it's only on pipes where it misbehaves.

Actually, libgis uses xdrmem_create() to create a memory stream; it
doesn't use xdrstdio at all. So ignore what I said about removing the
buffering being a performance hit for raster I/O.

What would be the best way to do this (i.e. in which
circumstances/modules is the xdr part of libgis called) ?

XDR is used for reading and writing FP maps.

> My preferred approach would be to change lib/db/dbmi_base to simply
> not use XDR (that isn't anywhere near as much work as it might
> sound).
>
> As the driver and client always run on the same system, it doesn't
> matter if the protocol is platform-dependent (I have no idea what
> Radim was thinking when he decided to use XDR for the DBMI
> communication).

I always had the same question, but (in my ignorance) I thought that it
might make a difference if the actual database was on a different
system.

No; the driver does the actual database I/O. XDR is only used for the
communication between the client and the driver.

[The vector library has its own "portable" I/O routines for reading
and writing vector files, which are entirely unrelated to XDR or
DBMI.]

If it doesn't, then let's drop xdr for DBMI (we do really need
it for libgis, I suppose ?).

The libgis raster I/O functions use it to ensure that the format of FP
rasters isn't platform-dependent (it uses its own encoding/decoding
routines for integers).

Having said that, if you only care about platforms which use IEEE-754
format for floating-point (which is every common architecture except
for VAX), the only thing that XDR actually does is to convert 32- and
64-bit quantities to big-endian.
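On IEEE-754 hosts that conversion is just explicit byte serialization; a generic sketch for the 32-bit case, not the XDR library's actual code:

```c
#include <stdint.h>
#include <assert.h>

/* Serialize a 32-bit quantity to big-endian (XDR order) regardless of
   the host's own byte order. */
static void put_be32(uint32_t v, unsigned char *out)
{
    out[0] = (unsigned char) (v >> 24);
    out[1] = (unsigned char) (v >> 16);
    out[2] = (unsigned char) (v >> 8);
    out[3] = (unsigned char) v;
}

/* The inverse: reassemble the value from four big-endian bytes. */
static uint32_t get_be32(const unsigned char *in)
{
    return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16)
         | ((uint32_t) in[2] << 8)  |  (uint32_t) in[3];
}
```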

The only thing that using XDR really gets us is the ability to have FP
maps which are portable between VAX and non-VAX systems.

> The dbmi_base library uses the following functions from XDR:
>
> xdr_char
> xdr_double
> xdr_float
> xdr_int
> xdr_short
>
> xdr_string
>
> xdrstdio_create
>
> The first five all simply read/write the specified value in a fixed
> (i.e. non-platform-dependent) format; i.e. convert to big-endian
> order, and convert FP values to IEEE format. For DBMI, we can just
> use the host's native format, so the first five all amount to calling
> read/write on the value.
>
> xdr_string is slightly more complex: read/write the length (including
> the terminating NUL) as an unsigned int, followed by the bytes.
>
> xdrstdio_create just sets everything up so that the read/write
> operations go through fread/fwrite (as opposed to e.g. xdrmem_create
> which sets up for reading/writing from memory).
>
> IOW, the actual implementation of an XDR replacement is trivial. So
> trivial that you would just inline most of it into the
> db__{send,recv}_* functions.
>
> It's changing the dbmi_base library to use it which will be most of
> the work. You would probably want separate put/get functions rather
> than the XDR mechanism of setting the "direction" when creating the
> XDR object and having a single function for both read and write.
>
> E.g. db__send_int() would change from:
>
> int db__send_int(int n)
> {
>     XDR xdrs;
>     int stat;
>
>     stat = DB_OK;
>     xdr_begin_send (&xdrs);
>     if (!xdr_int (&xdrs, &n))
>         stat = DB_PROTOCOL_ERR;
>     xdr_end_send (&xdrs);
>     if (stat == DB_PROTOCOL_ERR)
>         db_protocol_error();
>     return stat;
> }
>
> to something like:
>
> int db__send_int(int n)
> {
>     int stat = DB_OK;
>
>     if (!db__send(&n, sizeof(n)))
>         stat = DB_PROTOCOL_ERR;
>     if (stat == DB_PROTOCOL_ERR)
>         db_protocol_error();
>     return stat;
> }
>
> with db__send() defined as:
>
> int db__send(void *buf, size_t size)
> {
>     return writen(_send_fd, buf, size) == size;
> }
>
> [Actually, you would probably inline writen() here, as this is the
> only place it would be used.]

I will need a while to understand this, especially which files need
replacement.

Currently we have the following files in dbmi_base (ls -1 xdr*):

xdr.h
xdr.c
xdrchar.c
xdrcolumn.c
xdrdatetime.c
xdrdouble.c
xdrfloat.c
xdrhandle.c
xdrindex.c
xdrint.c
xdrprocedure.c
xdrshort.c
xdrstring.c
xdrtable.c
xdrtoken.c
xdrvalue.c

Of these, only the following reference xdr.h (grep -l xdr.h *.c):

xdr.c
xdrchar.c
xdrdouble.c
xdrfloat.c
xdrint.c
xdrprocedure.c
xdrshort.c
xdrstring.c

xdrchar.c, xdrdouble.c, xdrfloat.c, xdrint.c, xdrshort.c, xdrstring.c
contain the functions you explain above how to change. The only place
they are called is in macros.h in the same directory. I imagine that
xdr.c and xdrprocedure.c can just go in the bin.

xdrprocedure.c can stay. db__start_procedure_call() uses the DB_*
macros rather than XDR. db__recv_procnum() is almost the same as
db__recv_int(), except that it passes the success/failure code back to
the caller rather than processing it itself.

[When the client terminates the communication, the driver will
typically be waiting for db__recv_procnum() to return the next request
from the client, so in this specific case EOF needs to be treated as a
termination request rather than as an error. Getting EOF anywhere else
is an error.]

xdr.c would be where db__send() and db__recv() go, in place of the
xdr_{begin,end}_{send,recv} functions.

So "all" we need to do is create new files containing the functions
called in macros.h according to the model you describe above.

I would be inclined to modify the existing files to use the new
db__{send,recv} functions in place of XDR functions. There's no harm
in leaving the "xdr" prefix in their names for now (once we move to
SVN, they can be renamed without losing the change history).

What about the other xdr* files which do not actually use any xdr calls
directly (xdrhandle.c, xdrindex.c, xdrtable.c, xdrtoken.c, xdrvalue.c)?
Should we just leave them as they are, or rename them to avoid future
confusion?

Leave them alone.

--
Glynn Clements <glynn@gclements.plus.com>

On Fri, October 5, 2007 12:35, benjamin.ducke@ufg.uni-kiel.de wrote:

OK, how about putting some defs into xdr_stdio.c, so that
unbuffered i/o only gets compiled on Win32?

No need, as we supply a special static version of libxdr with WinGRASS,
so whatever we do there does not affect other versions of GRASS.

For the time being, this would mean losing raster performance
on native win, but in exchange having full vector functionality.
Not a bad deal until xdr_stdio.c gets a full re-write, I would
think.

Actually, as Glynn points out, we do not even have a performance loss
since libgis does not use pipes. So the current wingrass version should
work without any problems.

However, I agree with Glynn that the ideal solution is to just get rid of
xdr for DBMI communication.

Moritz

Glynn Clements wrote:

My preferred approach would be to change lib/db/dbmi_base to simply
not use XDR (that isn't anywhere near as much work as it might sound).

I have attached a patch to do this. I haven't committed it yet in case
it breaks anything (and if it breaks anything, it may break a lot of
things).

Could someone familiar with DBMI and vectors test it out?

--
Glynn Clements <glynn@gclements.plus.com>

(attachments)

files.diff (19.7 KB)

On 07/10/07 23:04, Glynn Clements wrote:

Glynn Clements wrote:

My preferred approach would be to change lib/db/dbmi_base to simply
not use XDR (that isn't anywhere near as much work as it might sound).

I have attached a patch to do this. I haven't committed it yet in case
it breaks anything (and if it breaks anything, it may break a lot of
things).

Could someone familiar with DBMI and vectors test it out?

Thanks a lot!

I will test as soon as I find the time, tonight at the latest.

Moritz

On Sun, October 7, 2007 23:04, Glynn Clements wrote:

Glynn Clements wrote:

My preferred approach would be to change lib/db/dbmi_base to simply
not use XDR (that isn't anywhere near as much work as it might sound).

I have attached a patch to do this. I haven't committed it yet in case
it breaks anything (and if it breaks anything, it may break a lot of
things).

Could someone familiar with DBMI and vectors test it out?

Index: lib/db/dbmi_base/xdr.c

RCS file: /grassrepository/grass6/lib/db/dbmi_base/xdr.c,v
retrieving revision 1.5
diff -u -r1.5 xdr.c
--- lib/db/dbmi_base/xdr.c 26 Apr 2007 16:44:44 -0000 1.5
+++ lib/db/dbmi_base/xdr.c 7 Oct 2007 20:50:35 -0000
@@ -15,47 +15,84 @@
  *****************************************************************************/
#include "xdr.h"

+#ifdef __MINGW32__
+#define USE_STDIO 0
+#define USE_READN 1
+#else
+#define USE_STDIO 1
+#define USE_READN 0
+#endif
+
+#ifndef USE_STDIO
+#include <unistd.h>
+#endif
+
static FILE *_send, *_recv;

-void
-db__set_protocol_fds (FILE *send, FILE *recv)
-{
- _send = send;
- _recv = recv;
-}
+#if USE_READN

-int
-xdr_begin_send(XDR *xdrs)
+static ssize_t readn(int fd, void *buf, size_t count)
{
- xdrstdio_create (xdrs, _send, XDR_ENCODE);
+ ssize_t total = 0;
+
+ while (total < count)
+ {
+ ssize_t n = read(fd, (char *) buf + total, count - total)

The MinGW compiler complains about a missing ';' at the end here; gcc
under Linux doesn't... I don't know why.

+ ssize_t n = write(fd, (const char *) buf + total, count - total)

same here

Moritz

On Sun, October 7, 2007 23:04, Glynn Clements wrote:

Glynn Clements wrote:

My preferred approach would be to change lib/db/dbmi_base to simply
not use XDR (that isn't anywhere near as much work as it might sound).

I have attached a patch to do this. I haven't committed it yet in case
it breaks anything (and if it breaks anything, it may break a lot of
things).

Could someone familiar with DBMI and vectors test it out?

Rapid testing on Debian (v.out/in.ogr, v.univar, d.vect.thematic, g.copy)
with dbf, sqlite, and postgres does not show any problems. I will test on
Windows as soon as I solve the compile problems.

Moritz