[GRASS-dev] locking on a raster

Ivan Shmakov <ivan@theory.asu.ru> writes:
Glynn Clements <glynn@gclements.plus.com> writes:

>>> It seems that the reasonable behaviour would be to make a
>>> back-reference if possible, and issue a warning if not.

>> A warning isn't sufficient. If the GC relies upon back-references,
>> but you can't create one, the data could disappear while a process
>> is trying to use it.

> Still it's an improvement, since currently the data could just as
> well be overwritten. The transition to tmp/ + rename () alone
> doesn't help much, as a support file may be renamed while the other
> is being read.

  Thinking on it further, the change to the
  LOCATION/MAPSET/raster/RASTER/ scheme doesn't actually improve
  the situation, as the directory may be renamed while the other
  party reads the support files; e. g., reading the raster `foo':

    raster/foo/bar (old) is read;
    [raster/foo/ is replaced with a newer version;]
    raster/foo/baz (new) is read.

[...]

>> Essentially, if you can't ensure that the back-references can be
>> created, you can't design a GC which relies upon them.

> Well, yes. That's why I've suggested some solutions in my previous
> post. To simplify, they are:

> * hardlink the entire maps into a place where you can create
> back-references, e. g. the current mapset;

> Pros: could be done with `g.copy' (changed); becomes possible as
> soon as both the ``in place'' and ``atomic replace'' issues have
> been dealt with, no need for an inventory scheme;

  Actually no, because the raster might be replaced while
  `g.copy' is being run. And since the other mapset is read-only,
  there's no way to prevent it.

  It thus seems that, of the RASTER/ scheme and the inventory
  scheme, only the latter provides good support for concurrency.

[...]

Ivan Shmakov wrote:

>>> It seems that the reasonable behaviour would be to make a
>>> back-reference if possible, and issue a warning if not.

>> A warning isn't sufficient. If the GC relies upon back-references,
>> but you can't create one, the data could disappear while a process
>> is trying to use it.

> Still it's an improvement, since currently the data could just as
> well be overwritten. The transition to tmp/ + rename () alone
> doesn't help much, as a support file may be renamed while the other
> is being read.

  Thinking on it further, the change to the
  LOCATION/MAPSET/raster/RASTER/ scheme doesn't actually improve
  the situation, as the directory may be renamed while the other
  party reads the support files

That can be handled by locking, and it can probably be done without
significant changes to the API.

If modules are changed to always read support files before reading the
data, the library functions can acquire the lock on G_open_cell_old()
and release it upon the first call to G_get_raster_row().
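
For illustration, a minimal sketch of that read side using POSIX
fcntl() locks; G_open_cell_old() and G_get_raster_row() are the real
entry points, while the helpers and the choice of which file to lock
are assumptions:

    #include <fcntl.h>
    #include <unistd.h>

    /* Take a shared (read) lock on an already-open descriptor,
     * waiting until it is granted.  A read lock only requires the
     * descriptor to be open for reading. */
    static int lock_shared(int fd)
    {
        struct flock fl = { 0 };
        fl.l_type = F_RDLCK;        /* shared lock */
        fl.l_whence = SEEK_SET;     /* l_start = l_len = 0: whole file */
        return fcntl(fd, F_SETLKW, &fl);
    }

    static int unlock(int fd)
    {
        struct flock fl = { 0 };
        fl.l_type = F_UNLCK;
        fl.l_whence = SEEK_SET;
        return fcntl(fd, F_SETLK, &fl);
    }

G_open_cell_old() would call lock_shared() on the map (e.g. on the
data file's descriptor), and the first G_get_raster_row() call would
call unlock() once the support files have been read.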

For output maps, the situation is slightly more tricky, but it may be
possible to use a similar mechanism.

The module would write any support files after the last call to
G_put_raster_row() but before the G_close_cell() call. When any
element of a map which is open for write is itself opened for write,
the call would open a new file in the output map's temporary
directory. The G_close_cell() call would lock the destination; upon
acquiring the lock, the entire directory would be replaced with the
new version, and the lock released.
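
Sketched minimally below, under stated assumptions: the lock target
is a designated lock file kept beside the map directory (since
fcntl() locks on directories themselves aren't portable), and
remove_tree() stands for a recursive-delete helper, not existing API:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    extern int remove_tree(const char *path);   /* assumed helper */

    static int replace_map_dir(const char *lock_path,
                               const char *tmp_dir, const char *dst_dir)
    {
        struct flock fl = { 0 };
        int ret = -1;
        int fd = open(lock_path, O_RDWR | O_CREAT, 0666);
        if (fd < 0)
            return -1;

        fl.l_type = F_WRLCK;                    /* exclusive lock */
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_SETLKW, &fl) == 0) {    /* wait out readers */
            remove_tree(dst_dir);
            ret = rename(tmp_dir, dst_dir);     /* install new version */
            fl.l_type = F_UNLCK;
            fcntl(fd, F_SETLK, &fl);            /* release the lock */
        }
        close(fd);
        return ret;
    }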

Also, locking doesn't have any permission issues; if you can read a
file, then you can acquire a lock on it.

>> Essentially, if you can't ensure that the back-references can be
>> created, you can't design a GC which relies upon them.

> Well, yes. That's why I've suggested some solutions in my previous
> post. To simplify, they are:

> * hardlink the entire maps into a place where you can create
> back-references, e. g. the current mapset;

> Pros: could be done with `g.copy' (changed); becomes possible as
> soon as both the ``in place'' and ``atomic replace'' issues have
> been dealt with, no need for an inventory scheme;

  Actually no, because the raster might be replaced while
  `g.copy' is being run. And since the other mapset is read-only,
  there's no way to prevent it.

  It thus seems that, of the RASTER/ scheme and the inventory
  scheme, only the latter provides good support for concurrency.

Even with no changes to the directory structure, you can eliminate the
race conditions with enough locking. A single directory just makes
locking simpler.

The inventory system can prevent inconsistencies without any locking.
However, there's still the potential for updates to be discarded if
you don't have locking. I.e. if two processes attempt to update a map
concurrently, the update which completes first will be lost.
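
For illustration only (the inventory format here, one name and
version directory per tab-separated line, is an assumption, as is the
rest of the sketch), a reader might resolve a map to its current
version like this:

    #include <stdio.h>
    #include <string.h>

    /* Resolve `name' to its current version directory.  A writer
     * replacing the inventory atomically via tmp + rename() can't
     * make this reader see a torn entry: the reader works from
     * whichever complete inventory file it opened. */
    static int resolve_map(const char *inventory, const char *name,
                           char *dir, size_t dir_len)
    {
        char line[1024];
        FILE *fp = fopen(inventory, "r");
        if (!fp)
            return -1;
        while (fgets(line, sizeof line, fp)) {
            char *tab = strchr(line, '\t');
            if (!tab)
                continue;
            *tab = '\0';
            if (strcmp(line, name) == 0) {
                snprintf(dir, dir_len, "%s", tab + 1);
                dir[strcspn(dir, "\n")] = '\0';   /* strip newline */
                fclose(fp);
                return 0;
            }
        }
        fclose(fp);
        return -1;
    }

A writer would build the new version directory first, then rename() a
new inventory file over the old one; the reader above keeps using
whichever complete inventory it opened.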

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements <glynn@gclements.plus.com> writes:

>>>>> It seems that the reasonable behaviour would be to make a
>>>>> back-reference if possible, and issue a warning if not.

>>>> A warning isn't sufficient. If the GC relies upon back-references,
>>>> but you can't create one, the data could disappear while a process
>>>> is trying to use it.

>>> Still it's an improvement, since currently the data could just as
>>> well be overwritten. The transition to tmp/ + rename () alone
>>> doesn't help much, as a support file may be renamed while the other
>>> is being read.

>> Thinking on it further, the change to the
>> LOCATION/MAPSET/raster/RASTER/ scheme doesn't actually improve
>> the situation, as the directory may be renamed while the other
>> party reads the support files

> That can be handled by locking, and it can probably be done without
> significant changes to the API.

  Oh, I see that you mean fcntl () locks here. But weren't there
  issues with them on certain OSes (W32) and filesystems (NFS, in
  particular with -o nolock)?

[...]

> The module would write any support files after the last call to
> G_put_raster_row() but before the G_close_cell() call. When any
> element of a map which is open for write is itself opened for write,
> the call would open a new file in the output map's temporary
> directory. The G_close_cell() call would lock the destination; upon
> acquiring the lock, the entire directory would be replaced with the
> new version, and the lock released.

> Also, locking doesn't have any permission issues; if you can read a
> file, then you can acquire a lock on it.

  So, a read lock could be used to create a ``reference'' for the
  purposes of GC.
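
  For instance (a sketch of the idea only), the GC could use
  F_GETLK to test whether any process still holds a lock on a
  candidate file before reclaiming it:

    #include <fcntl.h>

    /* Returns 1 if some process holds a lock on `fd' (the file is
     * still referenced, so it must not be reclaimed), 0 if not,
     * -1 on error.  To close the test-then-delete window, the GC
     * would then take a real exclusive lock before deleting. */
    static int is_referenced(int fd)
    {
        struct flock fl = { 0 };
        fl.l_type = F_WRLCK;    /* would an exclusive lock conflict? */
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &fl) < 0)
            return -1;
        return fl.l_type != F_UNLCK;
    }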

>>>> Essentially, if you can't ensure that the back-references can be
>>>> created, you can't design a GC which relies upon them.

>>> Well, yes. That's why I've suggested some solutions in my previous
>>> post. To simplify, they are:

>>> * hardlink the entire maps into a place where you can create
>>> back-references, e. g. the current mapset;

>>> Pros: could be done with `g.copy' (changed); becomes possible as
>>> soon as both the ``in place'' and ``atomic replace'' issues have
>>> been dealt with, no need for an inventory scheme;

>> Actually no, because the raster might be replaced while `g.copy'
>> is being run. And since the other mapset is read-only, there's no
>> way to prevent it.

>> It thus seems that, of the RASTER/ scheme and the inventory
>> scheme, only the latter provides good support for concurrency.

> Even with no changes to the directory structure, you can eliminate
> the race conditions with enough locking. A single directory just
> makes locking simpler.

> The inventory system can prevent inconsistencies without any locking.
> However, there's still the potential for updates to be discarded if
> you don't have locking. I.e. if two processes attempt to update a
> map concurrently, the update which completes first will be lost.

  Agreed. Though I'm not sure that the only reasonable way to
  prevent this is to make any concurrent write access fail.

  FWIW, appropriate locking for a writable mapset could be
  implemented without fcntl ().
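
  E. g. the classic lock file, which only needs write access to
  the mapset (a sketch; see the drawbacks noted in the reply
  below):

    #include <fcntl.h>
    #include <unistd.h>

    /* O_CREAT | O_EXCL makes create-if-absent atomic on a local
     * filesystem (historically not on NFS).  Stale locks left by
     * a crashed holder have to be dealt with separately. */
    static int lockfile_acquire(const char *path)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
        if (fd < 0)
            return -1;            /* already held, or error */
        close(fd);
        return 0;
    }

    static void lockfile_release(const char *path)
    {
        unlink(path);
    }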

Ivan Shmakov wrote:

> That can be handled by locking, and it can probably be done without
> significant changes to the API.

  Oh, I see that you mean fcntl () locks here. But weren't there
  issues with them on certain OSes (W32) and filesystems (NFS, in
  particular with -o nolock)?

Win32 has locking primitives.

And NFS loses for more reasons than just locking, e.g. rename() isn't
guaranteed to be atomic. It is for good reason that NFS is widely
believed to stand for "Not a File System" (as it violates a
significant portion of POSIX).

If you use NFS, either you ensure that locking works, or you
essentially give up the ability to safely perform concurrent file
access; this problem isn't limited to GRASS.

> The inventory system can prevent inconsistencies without any locking.
> However, there's still the potential for updates to be discarded if
> you don't have locking. I.e. if two processes attempt to update a
> map concurrently, the update which completes first will be lost.

  Agreed. Though I'm not sure that the only reasonable way to
  prevent this is to make any concurrent write access fail.

I'm not suggesting having concurrent access fail, just waiting until
the operation is safe (e.g. F_SETLKW rather than F_SETLK).
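
As a sketch, the waiting variant is a one-liner apart from signal
handling:

    #include <errno.h>
    #include <fcntl.h>

    /* Block until an exclusive lock on `fd' is granted; F_SETLK in
     * its place would fail at once if the lock is held elsewhere. */
    static int lock_wait(int fd)
    {
        struct flock fl = { 0 };
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;   /* l_start = l_len = 0: whole file */
        while (fcntl(fd, F_SETLKW, &fl) < 0) {
            if (errno != EINTR)   /* retry only after a signal */
                return -1;
        }
        return 0;
    }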

If modules are structured correctly (i.e. all support files are read
in a burst before starting to read the data, or written in a burst
after writing the data), both the probability of contention and the
resulting delay will be minimal.

In the case of multiple concurrent writes, you're bound to lose one of
the two writes whichever mechanism is used. And if you want to block
reads while a write is in progress (i.e. you specifically want the new
data rather than just consistent data), you have to wait until the
write operation completes.

I'm primarily concerned with consistency, i.e. a read operation won't
see a mix of data from multiple versions of a map, and it won't find
that the map doesn't exist if it tries to read it at the moment that
it's being replaced.

IOW, ensuring individual read and write operations behave as atomic
transactions. If you want a complete read-process-write operation to
behave as an atomic transaction, delays (or "busy" errors) cannot be
eliminated (and there's also the potential for deadlock).

  FWIW, appropriate locking for a writable mapset could be
  implemented without fcntl ().

There are other ways to do it (e.g. lock files), but the OS' locking
primitives are the simplest to use (no chance of stale lock files, no
hacks to deal with NFS' non-atomicity, built-in deadlock detection,
etc).
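
E.g. deadlock detection surfaces as a plain error code (a sketch;
detection is best-effort and system-dependent):

    #include <errno.h>
    #include <fcntl.h>

    /* A blocking request that would deadlock two processes fails
     * with EDEADLK instead of hanging forever; a lock-file scheme
     * would have to detect that by hand. */
    static int lock_or_backoff(int fd, struct flock *fl)
    {
        if (fcntl(fd, F_SETLKW, fl) == 0)
            return 0;             /* lock granted */
        if (errno == EDEADLK)
            return 1;             /* deadlock: drop locks and retry */
        return -1;                /* some other error */
    }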

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements <glynn@gclements.plus.com> writes:

>>> That can be handled by locking, and it can probably be done without
>>> significant changes to the API.

>> Oh, I see that you mean fcntl () locks here. But weren't there
>> issues with them on certain OSes (W32) and filesystems (NFS, in
>> particular with -o nolock)?

> Win32 has locking primitives.

  ACK.

> And NFS loses for more reasons than just locking, e.g. rename() isn't
> guaranteed to be atomic. It is for good reason that NFS is widely
> believed to stand for "Not a File System" (as it violates a
> significant portion of POSIX).

> If you use NFS, either you ensure that locking works, or you
> essentially give up the ability to safely perform concurrent file
> access; this problem isn't limited to GRASS.

  As I've noted before, the inventory scheme allows for lock-less
  read-only access to be performed over NFS in the presence of
  concurrent writes, but without GC.

  Unfortunately, enabling the locking protocol seems to impact
  reliability. In particular, I wonder, could there be a way for
  the server to know whether the client that acquired the lock is
  still alive? Seemingly this one reduces to the two generals'
  problem.

>>> The inventory system can prevent inconsistencies without any
>>> locking. However, there's still the potential for updates to be
>>> discarded if you don't have locking. I.e. if two processes attempt
>>> to update a map concurrently, the update which completes first will
>>> be lost.

>> Agreed. Though I'm not sure that the only reasonable way to prevent
>> this is to make any concurrent write access fail.

> I'm not suggesting having concurrent access fail, just waiting until
> the operation is safe (e.g. F_SETLKW rather than F_SETLK).

> If modules are structured correctly (i.e. all support files are read
> in a burst before starting to read the data, or written in a burst
> after writing the data), both the probability of contention and the
> resulting delay will be minimal.

> In the case of multiple concurrent writes, you're bound to lose one
> of the two writes whichever mechanism is used.

  I don't understand how the schemes being discussed are different
  with respect to this point. The only way not to lose the data
  just computed would be to make the attempt to open the target
  GRASS object for write fail in the presence of a concurrent
  process.

  OTOH, a mechanism could be designed that would allow arbitrary
  code to be run when certain operations are requested on the
  database (creation, update and removal of maps, for example).
  With such a mechanism it may become possible to alter the name
  of the GRASS object being written, instead of overwriting the
  one that appeared in the middle of processing and bears the same
  name.

  Though solving such a hard-to-imagine problem alone would hardly
  justify the effort.

[...]

> IOW, ensuring individual read and write operations behave as
> atomic transactions. If you want a complete read-process-write
> operation to behave as an atomic transaction, delays (or "busy"
> errors) cannot be eliminated (and there's also the potential for
> deadlock).

  Forking a ``parallel existence'' at the beginning of processing
  apparently eliminates the possibility of deadlocks, though there
  will certainly be some problems with merging it back. E. g., in
  PostgreSQL, a transaction is discarded with an error if it
  conflicts with changes already merged from other finished
  transactions by the time of the merge.

>> FWIW, appropriate locking for a writable mapset could be
>> implemented without fcntl ().

> There are other ways to do it (e.g. lock files), but the OS' locking
> primitives are the simplest to use (no chance of stale lock files, no
> hacks to deal with NFS' non-atomicity, built-in deadlock detection,
> etc).

  Yes.

  From the discussion, I could conclude that the inventory scheme
  doesn't necessarily imply any better concurrency (other than by
  allowing certain ways of lock-less access), nor does it prevent
  concurrency from being improved (by means of OS locking).

  Still, I believe the inventory scheme to be more flexible, and
  to allow for both a cleaner, more extensible API and a cleaner
  implementation.

  Hopefully, I'll have the time to experiment with it.

Ivan Shmakov wrote:

> If you use NFS, either you ensure that locking works, or you
> essentially give up the ability to safely perform concurrent file
> access; this problem isn't limited to GRASS.

  As I've noted before, the inventory scheme allows for lock-less
  read-only access to be performed over NFS in the presence of
  concurrent writes, but without GC.

  Unfortunately, enabling the locking protocol seems to impact
  reliability. In particular, I wonder, could there be a way for
  the server to know whether the client that acquired the lock is
  still alive? Seemingly this one reduces to the two generals'
  problem.

By "client", do you mean the host or the process? When the process
terminates, any files which it has open will be closed, and any locks
will be released. This is handled by the kernel.

If a host which is an NFS client dies or loses network connectivity,
the server won't know until a timeout occurs. That should be a rare
occurrence, though.

>>> The inventory system can prevent inconsistencies without any
>>> locking. However, there's still the potential for updates to be
>>> discarded if you don't have locking. I.e. if two processes attempt
>>> to update a map concurrently, the update which completes first will
>>> be lost.

>> Agreed. Though I'm not sure that the only reasonable way to prevent
>> this is to make any concurrent write access fail.

> I'm not suggesting having concurrent access fail, just waiting until
> the operation is safe (e.g. F_SETLKW rather than F_SETLK).

> If modules are structured correctly (i.e. all support files are read
> in a burst before starting to read the data, or written in a burst
> after writing the data), both the probability of contention and the
> resulting delay will be minimal.

> In the case of multiple concurrent writes, you're bound to lose one
> of the two writes whichever mechanism is used.

  I don't understand how the schemes being discussed are different
  with respect to this point.

They aren't.

The only way that this issue can be prevented is if each module is
treated as a single transaction. In practice, it shouldn't be that
much of a problem; if two processes are trying to replace the same map
with competing versions, that's really an organisational issue, not a
technical issue.

  The only way not to lose the data just computed would be to make
  the attempt to open the target GRASS object for write fail in
  the presence of a concurrent process.

Yes; IOW, the transaction covers the entire process, not just the
final step where the existing map is replaced with an updated version.

Whether or not this is even desirable is open to debate. And if you do
want to prevent this, do you want to wait until the first process
completes (which could be a long time if the process is e.g. r.flow),
or just make the second process fail with a "busy" error?
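
The "busy" variant is simply the non-waiting request (a sketch):

    #include <errno.h>
    #include <fcntl.h>

    /* Try to take the lock without waiting; both EACCES and EAGAIN
     * are seen in practice when another process holds it. */
    static int lock_try(int fd)
    {
        struct flock fl = { 0 };
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;
        if (errno == EACCES || errno == EAGAIN)
            return 1;             /* busy: fail instead of waiting */
        return -1;
    }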

I'm only interested in making individual read and write operations
into atomic transactions, so that a reader which opens a map while it
is being replaced won't see an intermediate state.

>> FWIW, appropriate locking for a writable mapset could be
>> implemented without fcntl ().

> There are other ways to do it (e.g. lock files), but the OS' locking
> primitives are the simplest to use (no chance of stale lock files, no
> hacks to deal with NFS' non-atomicity, built-in deadlock detection,
> etc).

  Yes.

  From the discussion, I could conclude that the inventory scheme
  doesn't necessarily imply any better concurrency (other than by
  allowing certain ways of lock-less access), nor does it prevent
  concurrency from being improved (by means of OS locking).

  Still, I believe the inventory scheme to be more flexible, and
  to allow for both a cleaner, more extensible API and a cleaner
  implementation.

The main disadvantage is that it makes it harder for the user to
analyse or modify the database manually (which is sometimes
necessary).

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements <glynn@gclements.plus.com> writes:

[...]

>>> In the case of multiple concurrent writes, you're bound to lose one
>>> of the two writes whichever mechanism is used.

>> I don't understand how the schemes being discussed are different
>> with respect to this point.

> They aren't.

> The only way that this issue can be prevented is if each module is
> treated as a single transaction. In practice, it shouldn't be that
> much of a problem; if two processes are trying to replace the same
> map with competing versions, that's really an organisational issue,
> not a technical issue.

  Agreed.

[...]

>> From the discussion, I could conclude that the inventory scheme
>> doesn't necessarily imply any better concurrency (other than by
>> allowing certain ways of lock-less access), nor does it prevent
>> concurrency from being improved (by means of OS locking).

>> Still, I believe the inventory scheme to be more flexible, and
>> to allow for both a cleaner, more extensible API and a cleaner
>> implementation.

  ... And it eliminates certain problems with database replication
  as well.

> The main disadvantage is that it makes it harder for the user to
> analyse or modify the database manually (which is sometimes
> necessary).

  The use of advisory locks seems to make manual modifications
  harder as well. May I suggest g.replacefile and/or g.editfile
  to complement g.findfile, ensuring proper locking is done?

  BTW, it would be nice to have a way to obtain a list of all the
  files comprising the given raster or rasters.
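
  For what it's worth, such a list could be assembled from the
  standard per-element directories of a mapset; the set below is a
  sketch and isn't exhaustive (e. g. reclassed maps add more):

    #include <stdio.h>

    /* Print candidate paths making up raster <name> in a mapset.
     * cell_misc/<name>/ is itself a directory with further
     * per-map files, and not every element exists for every map. */
    static void list_raster_files(const char *mapset, const char *name)
    {
        static const char *elements[] = {
            "cell", "fcell", "cellhd", "cats",
            "colr", "hist", "cell_misc",
        };
        size_t i;
        for (i = 0; i < sizeof elements / sizeof *elements; i++)
            printf("%s/%s/%s\n", mapset, elements[i], name);
    }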