[Geoserver-devel] Jdbc resource store

Hi Kevin and Gabriel,

I have a few questions regarding the jdbc resource store and the resource store API in general that I'd like to hear your opinion on. Including the mailing list because others might have thoughts on this.

1) The ResourceStore does not appear to provide anything for creating directories. At the moment GeoServerResourceLoader.getOrCreateDirectory is used for that. How can we change all modules to use the generic Resource Store rather than the file based one directly without such a method in the API?

2) In your implementation, the JDBCResource is using the OID primary key as a handle. At first I thought that made sense but I think there are some issues which respect to how the API assumes resources to work. Imagine that somebody else (possibly a different geoserver instance, as this is meant for clustering) overwrites the file or deletes it and then recreates one at the same location. Your resource holding the OID handle will be out of date. It might be able to notice the delete, but not the new content. But if you for example look at the rename test theory, it expects that both resources have changed to reflect the new reality. Many other parts of the API and the tests suggest that the ResourceStore is meant to support concurrent access.

The way that File System resources work is that they are simply wrapping around a File which is basically just wrapping around String which holds the path, without any state stored. Each method will query the file system using that path. My idea is that the only way for the JDBCStore to have the same expected behaviour is to also use the full path as the handle and in each method query the database using the full path. Does that make sense? Do you agree with my conclusion?

Kind Regards
Niels

On Tue, Jul 21, 2015 at 6:18 PM Niels Charlier <niels@anonymised.com> wrote:

Hi Kevin and Gabriel,

I have a few questions regarding the jdbc resource store and the
resource store API in general that I’d like to hear your opinion on.
Including the mailing list because others might have thoughts on this.

  1. The ResourceStore does not appear to provide anything for creating
    directories. At the moment GeoServerResourceLoader.getOrCreateDirectory
    is used for that. How can we change all modules to use the generic
    Resource Store rather than the file based one directly without such a
    method in the API?

Directories do not exist until needed, just like resources are not created until needed. We worked hard to simplify all the common checks including that one :slight_smile: So for a file system resource store if you create a resource using resource.out() the file (and the directory it belongs to) will be created as needed.

Do you have an example that needs a directory that we can port together? I tried to include before and after examples in the proposal.

That makes sense. Have a look at anything that calls GeoServerResourceLoader.getOrCreateDirectory Guess they can be modified to work with Resources instead of the file system Cheers Niels

···

On 23-07-15 07:05, Jody Garnett wrote:

Directories do not exist until needed, just like resources are not created until needed. We worked hard to simplify all the common checks including that one :slight_smile: So for a file system resource store if you create a resource using resource.out() the file (and the directory it belongs to) will be created as needed.

Do you have an example that needs a directory that we can port together? I tried to include before and after examples in the proposal.

By the way, if that is the case, that yet another reason to use paths rather than IDS as handles in the JDBCResourceStore. COnsidering that a resource can be initialized that contains multiple directories that do not even exist yet. Cheers Niels

···

On 23-07-15 10:16, Niels Charlier wrote:

That makes sense. Have a look at anything that calls GeoServerResourceLoader.getOrCreateDirectory Guess they can be modified to work with Resources instead of the file system

On 23-07-15 07:05, Jody Garnett wrote:

Directories do not exist until needed, just like resources are not created until needed. We worked hard to simplify all the common checks including that one :slight_smile: So for a file system resource store if you create a resource using resource.out() the file (and the directory it belongs to) will be created as needed.

Do you have an example that needs a directory that we can port together? I tried to include before and after examples in the proposal.

Have a look at the examples I linked to. We need a second QA pass through the codebase to remove references to files (and deprecate methods like getOrCreateDirectory). For the first pass I just implemented them all in terms of Resource - but any code that asks for a file will get one (unpacking the blob from the resource store if needed).

We can wait to hear from kevin, but I am not sure he is too concerned about how JDBCResourceStore takes shape. We went over a couple different designs (including placeholder entries for directories, so that resources could link to their “parent”).

What is you idea with Paths? Use a like filter to shortlist directory contents for Resource.list() ?

···

On 23 July 2015 at 01:16, Niels Charlier <niels@anonymised.com> wrote:

On 23-07-15 07:05, Jody Garnett wrote:

Directories do not exist until needed, just like resources are not created until needed. We worked hard to simplify all the common checks including that one :slight_smile: So for a file system resource store if you create a resource using resource.out() the file (and the directory it belongs to) will be created as needed.

Do you have an example that needs a directory that we can port together? I tried to include before and after examples in the proposal.

That makes sense. Have a look at anything that calls GeoServerResourceLoader.getOrCreateDirectory
Guess they can be modified to work with Resources instead of the file system

Cheers
Niels


Jody Garnett

Hi Jody,

On 23-07-15 17:53, Jody Garnett wrote:

Have a look at the examples I linked to. We need a second QA pass through the codebase to remove references to files (and deprecate methods like getOrCreateDirectory). For the first pass I just implemented them all in terms of Resource - but any code that asks for a file will get one (unpacking the blob from the resource store if needed).

I'm currently trying to implement that changes to the cached files are noticed by a file-watcher and then written to the database, it's technically possible but that seems quite dodgy and error prone to me. It is fine as a fall-back mechanism for backwards compatibility and it allows us to gradually change all the modules rather than all at once, but I would strongly discourage it and deprecate those methods in the API.
  Right now we actually have lots of code that just asks one time for one directory in the datadir and then uses the file system from there onwards....

Either way, I still find it strange in the API that we can have a resource that represent a directory, but no way to create such a thing, unless by creating a file inside it. This was my concern, and I haven't been 100% convinced it makes sense.

You can do:

    Resource fileRes = store.get("/some/path/to/a/file");
    fileRes.out() //creates the file and the directories "some", "path", "to", and "a" if necessary

Okay, that is convenient. But what about this:

     Resource dirRes = fileRes.getParent();
    Type type = dirRes.getType(); //returns dirRes.getDirectory()

Cool. But imagine now doing this:

    Resource dirRes = store.get("/some/path/to/a/dir");
    //there is no way to simply create this path as a directory in the store!
    //note the dirRes.dir() always creates a directory on the _file system_ (caching it if necessary), a method we wish to deprecate!

Yes, of course, technically we could modify all the code to create configuration files directly by their full path, and ignore directories altogether. But then why is there such a thing as a Resource of type directory?
For the sake of convenience you could want to work with a directory resource to query and modify its children. For example :

class MyClass {

     Resource myDir;

     void myMethod() {
  ... //I need to do something depending on all the files inside our directory
  List<Resource> children = myDir.list() ...
         ...
     }

     MyClass() {
         //initialize our config dir
         myDir = store.get("/my_config_dir");

         if (myDir.getType() == Type.UNDEFINED) {
    //what now???
    //there is no way to create this directory unless we create a file inside it. but what if we don't need to yet at this stage?
    //create a dummy file == ugly
  }

     }

}

See how it can be inconvenient that there is no explicit directory creating mechanism?

We can wait to hear from kevin, but I am not sure he is too concerned about how JDBCResourceStore takes shape. We went over a couple different designs (including placeholder entries for directories, so that resources could link to their "parent").

That seems to be the design indeed.

What is you idea with Paths? Use a like filter to shortlist directory contents for Resource.list() ?

The design of the database does not need to be changed necessarily, but definitely the design of the code. The resource object needs to hold the path, not the ID. Otherwise, it will not be possible to hold use a resource object for longer than a single method, and even then it is not concurrency safe.
However, the efficiency of the database design might indeed be questioned if we change the design of the code. In this design each method would need find the record by recursively looping through the path to find the resource. Is that so different from how files query the file system though? I'm not sure. Advise regarding this question is welcome.

Regards
Niels

On 23-07-15 17:53, Jody Garnett wrote:

Have a look at the examples I linked to. We need a second QA pass through

the codebase to remove references to files (and deprecate methods like
getOrCreateDirectory). For the first pass I just implemented them all in
terms of Resource - but any code that asks for a file will get one
(unpacking the blob from the resource store if needed).

I'm currently trying to implement that changes to the cached files are
noticed by a file-watcher and then written to the database, it's
technically possible but that seems quite dodgy and error prone to me. It
is fine as a fall-back mechanism for backwards compatibility and it allows
us to gradually change all the modules rather than all at once, but I would
strongly discourage it and deprecate those methods in the API.

I implemented replacement watchers as part of the migration? Are they not
working for you ...

Right now we actually have lots of code that just asks one time for one
directory in the datadir and then uses the file system from there
onwards....

And that code should still work, the directory will get unpacked - but it
is not any kind of fun.

Either way, I still find it strange in the API that we can have a resource
that represent a directory, but no way to create such a thing, unless by
creating a file inside it. This was my concern, and I haven't been 100%
convinced it makes sense.

Let me know how you go, you can still list the contents and so on ...

You can do:

   Resource fileRes = store.get("/some/path/to/a/file");
   fileRes.out() //creates the file and the directories "some", "path",
"to", and "a" if necessary

Okay, that is convenient. But what about this:

   Resource dirRes = fileRes.getParent();
   Type type = dirRes.getType(); //returns dirRes.getDirectory()

Cool. But imagine now doing this:

   Resource dirRes = store.get("/some/path/to/a/dir");
   //there is no way to simply create this path as a directory in the
store!
   //note the dirRes.dir() always creates a directory on the _file system_
(caching it if necessary), a method we wish to deprecate!

You statement is a bit circular. Yes you can use dir() to create the folder
on the file system if needed (just like you can use file() to create a
file). Is there anything else you need that is not provided?

Like who cares if the directory is created or not, as long as it behaves
the same... It is up to your store implementation if you want an internal
representation of a directory or not. I think you can create it in a lazy
fashion as needed while respecting the api ...

Is directory creation or not just an internal detail?

Yes, of course, technically we could modify all the code to create

configuration files directly by their full path, and ignore directories
altogether. But then why is there such a thing as a Resource of type
directory?

To list contents, rename, etc...

For the sake of convenience you could want to work with a directory
resource to query and modify its children. For example :

class MyClass {

    Resource myDir;

    void myMethod() {
        ... //I need to do something depending on all the files inside our
directory
        List<Resource> children = myDir.list() ...
        ...
    }

    MyClass() {
        //initialize our config dir
        myDir = store.get("/my_config_dir");

        if (myDir.getType() == Type.UNDEFINED) {
                //what now???
                //there is no way to create this directory unless we
create a file inside it. but what if we don't need to yet at this stage?
                //create a dummy file == ugly
        }

    }
}

Okay I see your concern. If you list() an undefined resource you get an
empty list .. can you use a UNDEFINED resource and trust it to behave as a
directory for your code?

    Resource myDir;

    void myMethod() {
        ... //I need to do something depending on all the files inside our
directory
        List<Resource> children = myDir.list() ...
        ...
    }

    init() {
        // initialize our config dir
        myDir = store.get("/my_config_dir");
        if (myDir.getType() == Type.UNDEFINED) {
                // empty configuration, directory will be created during
first save()
        }
        else {
                // configuration exists, work with myDir.list() to access
contents...
        }
    }
}

This would help if you can hunt down an example from the codebase :slight_smile:
Especially if you want to replace FileWatch ...

See how it can be inconvenient that there is no explicit directory creating

mechanism?

We can wait to hear from kevin, but I am not sure he is too concerned

about how JDBCResourceStore takes shape. We went over a couple different
designs (including placeholder entries for directories, so that resources
could link to their "parent").

That seems to be the design indeed.

What is you idea with Paths? Use a like filter to shortlist directory

contents for Resource.list() ?

The design of the database does not need to be changed necessarily, but

definitely the design of the code. The resource object needs to hold the
path, not the ID. Otherwise, it will not be possible to hold use a resource
object for longer than a single method, and even then it is not concurrency
safe.
However, the efficiency of the database design might indeed be questioned
if we change the design of the code. In this design each method would need
find the record by recursively looping through the path to find the
resource. Is that so different from how files query the file system though?
I'm not sure. Advise regarding this question is welcome.

I am not close enough to the code to know specifically what you are talking
about.

On 23-07-15 18:45, Jody Garnett wrote:

I implemented replacement watchers as part of the migration? Are they not working for you ...
And that code should still work, the directory will get unpacked - but it is not any kind of fun.

If the changes are picked up and written to the db properly..

I am having some issues with watching actually. Can we talk about this on skype or IRC ? Are you online today?

Either way, you have to admit it is not the ideal way of doing this, writing data to the file system and secretly picking this up in another thread and writing it to the database. It's also very hard to test, at this moment the resourcestore test assume that changes happen immediately. I found that I had to put in a delay of 20 seconds (!) for the test to work on the jdbcresourcestore, but there is no guarantee that this will work always and everywhere within that time frame. We cannot possibly know when the other thread has finished updating the database, can we? How would you suggest testing it?

You statement is a bit circular. Yes you can use dir() to create the folder on the file system if needed (just like you can use file() to create a file). Is there anything else you need that is not provided?

Like who cares if the directory is created or not, as long as it behaves the same... It is up to your store implementation if you want an internal representation of a directory or not. I think you can create it in a lazy fashion as needed while respecting the api ...

Is directory creation or not just an internal detail?

Do we agree on this: the file() and dir() methods are only for backwards compatibility and should be discouraged (deprecated in my opinion). People should be use in() and out() instead, right? If you want to really use the resource system, and NOT the file system, there is no way of making sure a directory exists in the resource store (DB).

To list contents, rename, etc...

Indeed, but we can only do all of these things if the directory already exists. And we cannot easily force it to exist if we we need it to exist (see my example code). Basically, it is not possible to create an empty directory unless you create a dummy file and then delete it.

Okay I see your concern. If you list() an undefined resource you get an empty list .. can you use a UNDEFINED resource and trust it to behave as a directory for your code?

I thought list() would throw an exception, but it actually returns null. I guess you have a point there, just leaving it as undefined would work. It does require us to do a null check, but okay. Some operation like rename might also require us to check if it actually exists, while if we knew for sure it did we didn't have to worry. It seems like it would add more ifs to the code, often needing to verify whether it is actually a directory yet or not.

Also , if the existence of directories is to an implementation detail hidden from the users, why is it possible to delete (but not to create an empty) directory?

This would help if you can hunt down an example from the codebase :slight_smile: Especially if you want to replace FileWatch ...

Anything that calls findOrCreateDirectory().

Not sure what you mean with replacing FileWatch ?

I am not close enough to the code to know specifically what you are talking about.

I probably didn't explain it properly, but it's not really necessary to know the code. Basically, currently the jdbcresource holds a OID of the resource that is uses to query the database. But if something deletes and replaces the resource located at that same path with new data, a resource object still holding the old OID will not be able to know this. While you would expect it to, since the filesystemresource works this way.

Regards
Niels

Comments inline.

···

On 24 July 2015 at 07:30, Niels Charlier <niels@anonymised.com…> wrote:

On 23-07-15 18:45, Jody Garnett wrote:

I implemented replacement watchers as part of the migration? Are they not working for you …
And that code should still work, the directory will get unpacked - but it is not any kind of fun.

If the changes are picked up and written to the db properly…

I think it was a one way migration disk to JDBCConfig. I was thinking specifically of the FileWatcher class which was changed to accept a Resource. It should be able to listen to a resource changing and notify code if required. The file based implantation checks file timestamps in the background, although Java 7 has event notification for this.

Not sure how you want to handle this for JDBCConfig?

Still we may be thinking in a different direction. The initial import from data directory to JDBCConfig is a one time journey. After that the idea is to manage via REST API (yes ResourceStore still needs a REST API).

It sounds like you are considering editing files on disk, and then reimporting them as they change? That is a very tricky problem since the data directory would then be a local cache and and place for editing.

I am having some issues with watching actually. Can we talk about this on skype or IRC ? Are you online today?

I am online for a bit, will join IRC now and see if you are there.

Either way, you have to admit it is not the ideal way of doing this, writing data to the file system and secretly picking this up in another thread and writing it to the database. It’s also very hard to test, at this moment the resourcestore test assume that changes happen immediately. I found that I had to put in a delay of 20 seconds (!) for the test to work on the jdbcresourcestore, but there is no guarantee that this will work always and everywhere within that time frame. We cannot possibly know when the other thread has finished updating the database, can we? How would you suggest testing it?

See above, import should be a one time trip. Edit via REST API where you can tell when change is written to database (and issue change notification for other nodes in the cluster).

You statement is a bit circular. Yes you can use dir() to create the folder on the file system if needed (just like you can use file() to create a file). Is there anything else you need that is not provided?

Like who cares if the directory is created or not, as long as it behaves the same… It is up to your store implementation if you want an internal representation of a directory or not. I think you can create it in a lazy fashion as needed while respecting the api …

Is directory creation or not just an internal detail?

Do we agree on this: the file() and dir() methods are only for backwards compatibility and should be discouraged (deprecated in my opinion).

I don’t completely agree, we have a few sections of the code that require file(). See the initial design doc for details (it was one of templating things (GeoServerTeamplateLoader?) and perhaps Fonts in the styles directory.

Since dir() is a short cut for going through list() and calling file() it would remain undeprecated.

People should be use in() and out() instead, right?

Yes, we should only have a couple uses of file() in our application at the end of the day - and they will be very specific.

If you want to really use the resource system, and NOT the file system, there is no way of making sure a directory exists in the resource store (DB).

If you feel this is necessary, go ahead, I just do not see the need myself.

To list contents, rename, etc…

Indeed, but we can only do all of these things if the directory already exists. And we cannot easily force it to exist if we we need it to exist (see my example code). Basically, it is not possible to create an empty directory unless you create a dummy file and then delete it.

I saw your example code, I just do not understand why your code needs the directory to exist. That is why I was hoping for a real example from the codebase…

Okay I see your concern. If you list() an undefined resource you get an empty list … can you use a UNDEFINED resource and trust it to behave as a directory for your code?

I thought list() would throw an exception, but it actually returns null. I guess you have a point there, just leaving it as undefined would work. It does require us to do a null check, but okay. Some operation like rename might also require us to check if it actually exists, while if we knew for sure it did we didn’t have to worry. It seems like it would add more ifs to the code, often needing to verify whether it is actually a directory yet or not.

I would prefer to set rename up to be cheerful to rename undefined things, idea is to remove null checks and special cases where possible. So if you need an undefined resource to return an empty list you could make that change … indeed that may be smart so code like the following can work with out modification:

for (Resource r : resource.list() ){

}

Also , if the existence of directories is to an implementation detail hidden from the users, why is it possible to delete (but not to create an empty) directory?

This would help if you can hunt down an example from the codebase :slight_smile: Especially if you want to replace FileWatch …

Anything that calls findOrCreateDirectory().

You know resource.dir() acts like find or create a directory right? It will make the directory if needed …

Okay here is the search: https://github.com/geoserver/geoserver/search?utf8=✓&q=findOrCreateDirectory

src/community/python/src/main/java/org/geoserver/python/Python.java

  • This is a community module so it is not really our concern to port it.
  • That said it looks like it could be ported to resource use if required, if the script languages really want a directory to play with than using dir() to unpack a directory with all the scripts should be fine.

src/extension/csw/core/src/main/java/org/geoserver/csw/store/internal/GeoServerInternalCatalogStore.java

  • replace with Resource use

src/community/jdbcconfig/src/main/java/org/geoserver/jdbcconfig/internal/JDBCConfigPropertiesFactoryBean.java

  • not sure about this, creating a base directory is fine, but that would also be created as needed
  • this class also has a method to create the scripts directory - although see above scripts is a community module

src/community/w3ds/src/main/java/org/geoserver/w3ds/utilities/X3DInfoExtract.java

  • should just be using dir()

src/extension/importer/rest/src/main/java/org/geoserver/importer/rest/TaskResource.java

  • example of file system based code, not using resource store

src/community/wms-eo/src/main/java/org/geoserver/wms/eo/EoStyleCatalogListener.java

  • setting up for copyUrlToFile - should be able to use one of the resource utility classes for this

src/restconfig/src/main/java/org/geoserver/catalog/rest/FreemarkerTemplateResource.java

  • Some file upload code, should be able to convert to Resource use

src/community/w3ds/src/main/java/org/geoserver/w3ds/web/X3DLayerConfigPanel.java

  • Convert to resource use, a TileSetParser wants to be pointed at a directory

src/gwc/src/main/java/org/geoserver/gwc/layer/DefaultTileLayerCatalog.java

  • trying to get access to the base directory for tile sets

So when you go through these consider what the code is about:

  • if it is getting access to configuration it should use the resource store API (and thus JDBCConfigResourceStore) in(), out(), etc…
  • if it is needing a configuration file on disk (for use with a third-party library) it should use resource.file() or resource.dir() as appropriate (for use of Fonts, Templates).
  • If it is needing a cache (for example GWC tile cache) then it should be using Java File API (since such things are defined by environmental variables and may (or may not) be in the data directory they will not end up using the Resource Store API

Not sure what you mean with replacing FileWatch?

https://github.com/geoserver/geoserver/blob/master/src/main/src/main/java/org/geoserver/security/file/FileWatcher.java

Have a look at the above, it is used by security system to “watch” for configuration changes.
Jody

Jody,

Well this was something Kevin suggested that was considered before. That’s why I was saying we need to get away from the use of file(), of course we want people to use the DB directly. Without this, we will definitely need to change any code that writes configuration to a file, because that will not be saved then. I am aware of it, I was looking into it for the reason above. I’m just not sure why you thought I wanted to replace it. This is why we should continue this on a chat, miscommunications… :wink: Okay, understood. Fair enough, as long as they are not used to write data, and preferably not when not necessary but you agree with that. Yes I know, but again, I was talking about creating a directory in the store (which may or may not be a FS or a DB) and not on the FS. Anyway, let’s finish this on chat. Regards Niels

···

On 07/24/2015 10:28 PM, Jody Garnett wrote:

Still we may be thinking in a different direction. The initial import from data directory to JDBCConfig is a one time journey. After that the idea is to manage via REST API (yes ResourceStore still needs a REST API).

It sounds like you are considering editing files on disk, and then reimporting them as they change? That is a very tricky problem since the data directory would then be a local cache and and place for editing.

https://github.com/geoserver/geoserver/blob/master/src/main/src/main/java/org/geoserver/security/file/FileWatcher.java

Have a look at the above, it is used by security system to “watch” for configuration changes.

I don’t completely agree, we have a few sections of the code that require file(). See the initial design doc for details (it was one of templating things (GeoServerTeamplateLoader?) and perhaps Fonts in the styles directory.

Since dir() is a short cut for going through list() and calling file() it would remain undeprecated.

You know resource.dir() acts like find or create a directory right? It will make the directory if needed …