[GRASS-dev] native WinGRASS and attribute data deadlock, next try

Benjamin wrote:

>> So, how are we going to go ahead?
>

Glynn answered:

> Figure out how to debug the processes. If you can't get gdb to work, I
> can only suggest logging every significant event at the lowest level,
> i.e. log every read/write operation: the arguments, the return code,
> and the complete data (i.e. the buffer contents before read and after
> write). This is all done in the RPC/XDR library, in xdr_stdio.c. It
> will probably help to also log the beginning/end of each procedure
> call (i.e. lib/db/dbmi_base/xdrprocedure.c).

I would really like this to be solved, so I am willing to try to find some time for the logging effort. Benjamin, have you made any progress on this?

I will need some time to understand the XDR logic and code, but hope to be able to help with this.

Moritz

On Mon, 3 Sep 2007, Moritz Lennert wrote:

> Benjamin wrote:
>
> >> So, how are we going to go ahead?
>
> Glynn answered:
>
> > Figure out how to debug the processes. If you can't get gdb to work, I
> > can only suggest logging every significant event at the lowest level,
> > i.e. log every read/write operation: the arguments, the return code,
> > and the complete data (i.e. the buffer contents before read and after
> > write). This is all done in the RPC/XDR library, in xdr_stdio.c. It
> > will probably help to also log the beginning/end of each procedure
> > call (i.e. lib/db/dbmi_base/xdrprocedure.c).
>
> I would really like this to be solved, so I am willing to try to find
> some time for the logging effort. Benjamin, have you made any progress
> on this?
>
> I will need some time to understand the XDR logic and code, but hope to
> be able to help with this.

I have a feeling there is more than one C implementation of the XDR standard available on the 'net. Perhaps an alternative one would be worth trying, in case there's a bug in the version we're using. We already had problems that went undetected for a long time, caused by the current version needing to be compiled statically.

I'd like to try this too if I had time, but unfortunately I am really busy right now.

Paul

On Mon, September 3, 2007 17:38, Moritz Lennert wrote:

> Benjamin wrote:
>
> >> So, how are we going to go ahead?
> >
>
> Glynn answered:
>
> > Figure out how to debug the processes. If you can't get gdb to work, I
> > can only suggest logging every significant event at the lowest level,
> > i.e. log every read/write operation: the arguments, the return code,
> > and the complete data (i.e. the buffer contents before read and after
> > write). This is all done in the RPC/XDR library, in xdr_stdio.c. It
> > will probably help to also log the beginning/end of each procedure
> > call (i.e. lib/db/dbmi_base/xdrprocedure.c).
>
> I would really like this to be solved, so I am willing to try to find
> some time for the logging effort. Benjamin, have you made any progress
> on this?
>
> I will need some time to understand the XDR logic and code, but hope to
> be able to help with this.

OK, a very first simple debugging effort seems to confirm a timing issue.
Here's what I did:

diff -u dbmi_base dbmi_base_debug/
Common subdirectories: dbmi_base/CVS and dbmi_base_debug/CVS
Only in dbmi_base_debug/: OBJ.i686-pc-mingw32
diff -u dbmi_base/xdrint.c dbmi_base_debug/xdrint.c
--- dbmi_base/xdrint.c Thu Oct 5 06:13:28 2006
+++ dbmi_base_debug/xdrint.c Mon Sep 3 20:17:35 2007
@@ -10,10 +10,12 @@

     stat = DB_OK;

+    G_debug(1, "xdrint.c: Begin send");
     xdr_begin_send (&xdrs);
     if(!xdr_int (&xdrs, &n))
        stat = DB_PROTOCOL_ERR;
     xdr_end_send (&xdrs);
+    G_debug(1, "xdrint.c: End send");

     if (stat == DB_PROTOCOL_ERR)
        db_protocol_error();
diff -u dbmi_base/xdrprocedure.c dbmi_base_debug/xdrprocedure.c
--- dbmi_base/xdrprocedure.c Thu Oct 5 06:13:28 2006
+++ dbmi_base_debug/xdrprocedure.c Mon Sep 3 20:17:35 2007
@@ -40,10 +40,12 @@

     stat = DB_OK;

+    G_debug(1, "xdrprocedure.c: Begin receive");
     xdr_begin_recv (&xdrs);
     if(!xdr_int (&xdrs, n))
        stat = DB_EOF;
     xdr_end_recv (&xdrs);
+    G_debug(1, "xdrprocedure.c: End receive");

     return stat;
}

and now, after setting 'g.gisenv set=DEBUG=1', I cannot reproduce the
deadlock anymore, using Benjamin's test data, except when I do other
things on the machine (open other windows, type an email, etc). When I
just run the command and stare at the screen I get no deadlock. With
DEBUG=0 I get the same irregular deadlock.

I'll dig into xdrstdio.c now.

Moritz

Moritz Lennert wrote:

> >> So, how are we going to go ahead?
> >
>
> Glynn answered:
>
> > Figure out how to debug the processes. If you can't get gdb to work, I
> > can only suggest logging every significant event at the lowest level,
> > i.e. log every read/write operation: the arguments, the return code,
> > and the complete data (i.e. the buffer contents before read and after
> > write). This is all done in the RPC/XDR library, in xdr_stdio.c. It
> > will probably help to also log the beginning/end of each procedure
> > call (i.e. lib/db/dbmi_base/xdrprocedure.c).
>
> I would really like this to be solved, so I am willing to try to find
> some time for the logging effort. Benjamin, have you made any progress
> on this?
>
> I will need some time to understand the XDR logic and code, but hope to
> be able to help with this.
>

> OK, a very first simple debugging effort seems to confirm a timing issue.
> Here's what I did:
>
> diff -u dbmi_base dbmi_base_debug/
> Common subdirectories: dbmi_base/CVS and dbmi_base_debug/CVS
> Only in dbmi_base_debug/: OBJ.i686-pc-mingw32
> diff -u dbmi_base/xdrint.c dbmi_base_debug/xdrint.c
> --- dbmi_base/xdrint.c Thu Oct 5 06:13:28 2006
> +++ dbmi_base_debug/xdrint.c Mon Sep 3 20:17:35 2007
> @@ -10,10 +10,12 @@
>
>      stat = DB_OK;
>
> +    G_debug(1, "xdrint.c: Begin send");
>      xdr_begin_send (&xdrs);
>      if(!xdr_int (&xdrs, &n))
>         stat = DB_PROTOCOL_ERR;
>      xdr_end_send (&xdrs);
> +    G_debug(1, "xdrint.c: End send");
>
>      if (stat == DB_PROTOCOL_ERR)
>         db_protocol_error();
> diff -u dbmi_base/xdrprocedure.c dbmi_base_debug/xdrprocedure.c
> --- dbmi_base/xdrprocedure.c Thu Oct 5 06:13:28 2006
> +++ dbmi_base_debug/xdrprocedure.c Mon Sep 3 20:17:35 2007
> @@ -40,10 +40,12 @@
>
>      stat = DB_OK;
>
> +    G_debug(1, "xdrprocedure.c: Begin receive");
>      xdr_begin_recv (&xdrs);
>      if(!xdr_int (&xdrs, n))
>         stat = DB_EOF;
>      xdr_end_recv (&xdrs);
> +    G_debug(1, "xdrprocedure.c: End receive");
>
>      return stat;
> }
>
> and now, after setting 'g.gisenv set=DEBUG=1', I cannot reproduce the
> deadlock anymore, using Benjamin's test data, except when I do other
> things on the machine (open other windows, type an email, etc). When I
> just run the command and stare at the screen I get no deadlock. With
> DEBUG=0 I get the same irregular deadlock.
>
> I'll dig into xdrstdio.c now.

If it is a timing issue, then you'll need to log the fread/fwrite
return values, along with the actual data, for both ends (client and
server).
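
As a rough illustration of that kind of logging, here is a minimal sketch. The names logged_fwrite(), logged_fread() and log_bytes() are hypothetical (they exist neither in GRASS nor in the XDR library), and where exactly they would be hooked into xdr_stdio.c is left open. Each call records its arguments, the return value, and a hex dump of the data, and flushes the log immediately so that nothing is lost if the process hangs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: append n bytes of buf to the log as hex. */
static void log_bytes(FILE *log, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    size_t i;

    for (i = 0; i < n; i++)
        fprintf(log, " %02x", p[i]);
    fputc('\n', log);
}

/* Hypothetical wrapper: fwrite() plus one log record.
 * 'who' is a free-form tag such as "client" or "server". */
size_t logged_fwrite(FILE *log, const char *who, const void *buf,
                     size_t size, size_t nmemb, FILE *fp)
{
    size_t ret;

    assert(log != NULL && fp != NULL);
    ret = fwrite(buf, size, nmemb, fp);
    fprintf(log, "%s fwrite size=%lu nmemb=%lu ret=%lu data:",
            who, (unsigned long) size, (unsigned long) nmemb,
            (unsigned long) ret);
    log_bytes(log, buf, size * nmemb);  /* everything we tried to send */
    fflush(log);                        /* keep the log usable after a hang */
    return ret;
}

/* Hypothetical wrapper: fread() plus one log record. */
size_t logged_fread(FILE *log, const char *who, void *buf,
                    size_t size, size_t nmemb, FILE *fp)
{
    size_t ret;

    assert(log != NULL && fp != NULL);
    ret = fread(buf, size, nmemb, fp);
    fprintf(log, "%s fread size=%lu nmemb=%lu ret=%lu data:",
            who, (unsigned long) size, (unsigned long) nmemb,
            (unsigned long) ret);
    log_bytes(log, buf, size * ret);    /* only the complete items received */
    fflush(log);
    return ret;
}
```

Since the client and the driver are separate processes, each would presumably write to its own log file. Note that the extra I/O changes the timing, so the deadlock may become harder to trigger while logging is enabled.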

Then, find out where the two diverge (i.e. what is received isn't what
was sent).
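
Finding the point of divergence can be sketched as a byte-by-byte comparison of what one end wrote against what the other end read. This assumes the raw bytes from each side have been dumped to two separate files; first_divergence() is a hypothetical helper, not an existing GRASS function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the offset of the first byte at which the
 * two streams differ, or -1 if they are identical. If one stream is a
 * strict prefix of the other, the length of the shorter one is returned. */
long first_divergence(FILE *sent, FILE *received)
{
    long offset = 0;
    int a, b;

    assert(sent != NULL && received != NULL);
    for (;;) {
        a = fgetc(sent);
        b = fgetc(received);
        if (a != b)
            return offset;  /* also covers one side hitting EOF first */
        if (a == EOF)
            return -1;      /* both ended at the same point */
        offset++;
    }
}
```

On Windows the dump files should be opened in binary mode ("rb"/"wb"), because text mode translates line endings and would make the comparison misleading.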

However, I have a strong suspicion that the eventual answer will be
"MSVCRT's stdio implementation sucks". I already know this to be true;
what I don't know is whether it's the cause of the DBMI problems and,
if so, how much stuff we will need to re-write.

--
Glynn Clements <glynn@gclements.plus.com>