I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
Yes, I admit it’s my own fault for missing this discussion at the time.
I think it would be a shame to let the codebase further drift away from the intended use of the resource store. We have done the work to make the entirety of geoserver generic with respect to the implementation of the data directory and it is only a minimal effort to keep it this way. Even though jdbc store is still a community module and the Redis based geoserver project has been discontinued, it has been proven that alternative implementations for the data directory work and the jdbc store module is actually being used in production successfully.
Now, interestingly, I have discovered that there are contradictions in how it works / is documented now.
I discovered that while ‘theoryRootIsAbsolute’ test is successful, it is actually very misleading. At least on linux, paths starting with ‘/’ still create a file relative to the data directory!. The only difference is the string returned by the path() method. So while the path might seem absolute, the file you are accessing is not. So the path() method is misleading and in reality the leading slash is still being ignored (again, at least on linux).
Note that the javadoc of ResourceStore still says:
This abstraction assumes a unix like file system, all paths are relative
There it says it: all paths are relative for the resourcestore. Absolute paths have no meaning or function when it comes to the resource store.
I think this contradiction could be resolved in two possible ways:
the test ‘theoryIsRootSlashIsIgnored’ should come back instead of ‘theoryRootIsAbsolute’. The root slash is ignored and removed from the path.
The resource store throws an exception when you ask for an absolute path.
Either way, all absolute paths should be handled outside of the ResourceStore, for instance by calling Files.asResource(). So problems with absolute paths in the rest service should be resolved in the rest service, or some other layer that is used by the rest service.
···
Kind Regards
Niels
On 04/07/2023 21:59, Jody Garnett wrote:
Hello Niels,
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.
If “the codebase had drifted away from the intended use over time” I think it’s even more important to stick to the contract and not the other way around.
As far as I can see, there are two abstractions, ResourceStore and Resource, the former clearly says
“This abstraction assumes a unix like file system, all paths are relative and use forward slash {@code /} as the separator”,
the latter “Resource used for configuration storage.”.
Only by adhering to this contract, we could provide alternative implementations.
So if the changes were made to accommodate “common usage” on the specific case of a filesystem-based ResourceStore
implementation, it should be the other way around, to find out usages that don’t adhere to the spec and fix them.
Concretely, I think the only place where absolute URI’s would be requested to the ResourceStore, is where a data, not configuration,
file, is resolved, expecting the ResourceStore to be smart and resolve “file:data/roads.shp” relative to the datadir, and “file:/data/roads.shp”
relative to the file system.
Now, in doing so, we’re asking the ResourceStore to do what it’s not intended to. The test case Niels mentioned used to check that a leading “/”
had no meaning for the resource store (i.e. /data/roads.shp == data/roads.shp), and that was changed to mean the opposite.
So, IMHO, the responsibility of resolving files outside the datadir shouldn’t be on ResourceStore, but on the client code. Something like:
As a matter of fact, Resource.file():java.io.File should be deprecated and eventually removed. Resource is for config contents and
has Resource.in():InputStream and Resource.out():OutputStream.
It is hard enough already to adhere to lax interface contracts (catalog’s default workspace hello) as to keep on making it more and more difficult.
So what do we do? can we get to an agreement that the contract in the interfaces is mandatory and work from there?
Hey folks, I was tired and not in position to act clearly when these gaps were noticed last year.
We can tighten up the api definition, at least with the changes made we can now notice when an absolute path was used.
I never quite managed to communicate the separation between using a file URL and File access (when DataStores wish to access local files) as distinct from use of a resource for managing configuration files.
I suspect you both (Niels and Gabe) have met this distinction first hand - but it is hard to explain the value.
I agree with most of what Gabriel says, it is mainly the same point as mine except that I think it is unrealistic to deprecate the file() method. First of all we would need to to move the resource API to geotools because SLD files are stored in the resource store, but there are still tertiary tools that require files like template and image processors. The idea is that the file created on disk is only a cache for these tertiary tools and manipulating it will have no effect.
Jody, most of the docs seem fine, except that the second blue box explanation for ‘path’ is confusing paths with URLs: ‘file:/’ URLs are not part of the resourcestore and resource rest API, and the other problems is that docs are missing any mention of the “resource:” URLs. We must first of all distinguish between (1) resourcestore API and (2) Resources.fromURL method, I think there might be some confusion there. Then the distinction between file: and resource: urls need to be in the docs.
The (1) Resourcestore does not deal with URLs or absolute paths, only relative paths. Internally, geoserver modules load their configuration files in the resource store directly from this API.
The (2) Resources.fromURL deals with URLS and can handle both absolute file paths and resources from the resourcestore using the ‘resource:’ protocol prefix. This is for external use: the purpose is so that people can refer to both files on disk and resources in the resource store wherever they can specify a URL, for instance inside style files or in the parameters of a data store. File: URLs is for the entire file system, not the data directory and get ‘converted’ to a resource using the Files.asResource wrapping method. Resource: URLs refer to anything in the data directory or alternative resource store.
As legacy from pre-resourcestore times, geoserver supports (supported?) ‘file:’ URLs relative to the data directory (this is a little bit ‘dodgy’ since ‘file:’ urls are standardized and do not paths without leading slash - https://en.wikipedia.org/wiki/File_URI_scheme ). Note that the Resources.fromURL javadoc says this is deprecated and should be replaced with resource: URLs.
Kind Regards
Niels
···
On 05/07/2023 15:06, Jody Garnett wrote:
Hey folks, I was tired and not in position to act clearly when these gaps were noticed last year.
We can tighten up the api definition, at least with the changes made we can now notice when an absolute path was used.
I never quite managed to communicate the separation between using a file URL and File access (when DataStores wish to access local files) as distinct from use of a resource for managing configuration files.
I suspect you both (Niels and Gabe) have met this distinction first hand - but it is hard to explain the value.
If “the codebase had drifted away from the intended use over time” I think it’s even more important to stick to the contract and not the other way around.
As far as I can see, there are two abstractions, ResourceStore and Resource, the former clearly says
“This abstraction assumes a unix like file system, all paths are relative and use forward slash {@code /} as the separator”,
the latter “Resource used for configuration storage.”.
Only by adhering to this contract, we could provide alternative implementations.
So if the changes were made to accommodate “common usage” on the specific case of a filesystem-based ResourceStore
implementation, it should be the other way around, to find out usages that don’t adhere to the spec and fix them.
Concretely, I think the only place where absolute URI’s would be requested to the ResourceStore, is where a data, not configuration,
file, is resolved, expecting the ResourceStore to be smart and resolve “file:data/roads.shp” relative to the datadir, and “file:/data/roads.shp”
relative to the file system.
Now, in doing so, we’re asking the ResourceStore to do what it’s not intended to. The test case Niels mentioned used to check that a leading “/”
had no meaning for the resource store (i.e. /data/roads.shp == data/roads.shp), and that was changed to mean the opposite.
So, IMHO, the responsibility of resolving files outside the datadir shouldn’t be on ResourceStore, but on the client code. Something like:
As a matter of fact, Resource.file():java.io.File should be deprecated and eventually removed. Resource is for config contents and
has Resource.in():InputStream and Resource.out():OutputStream.
It is hard enough already to adhere to lax interface contracts (catalog’s default workspace hello) as to keep on making it more and more difficult.
So what do we do? can we get to an agreement that the contract in the interfaces is mandatory and work from there?
Yes, I admit it’s my own fault for missing this discussion at the time.
I think it would be a shame to let the codebase further drift away from the intended use of the resource store. We have done the work to make the entirety of geoserver generic with respect to the implementation of the data directory and it is only a minimal effort to keep it this way. Even though jdbc store is still a community module and the Redis based geoserver project has been discontinued, it has been proven that alternative implementations for the data directory work and the jdbc store module is actually being used in production successfully.
Now, interestingly, I have discovered that there are contradictions in how it works / is documented now.
I discovered that while ‘theoryRootIsAbsolute’ test is successful, it is actually very misleading. At least on linux, paths starting with ‘/’ still create a file relative to the data directory!. The only difference is the string returned by the path() method. So while the path might seem absolute, the file you are accessing is not. So the path() method is misleading and in reality the leading slash is still being ignored (again, at least on linux).
Note that the javadoc of ResourceStore still says:
This abstraction assumes a unix like file system, all paths are relative
There it says it: all paths are relative for the resourcestore. Absolute paths have no meaning or function when it comes to the resource store.
I think this contradiction could be resolved in two possible ways:
the test ‘theoryIsRootSlashIsIgnored’ should come back instead of ‘theoryRootIsAbsolute’. The root slash is ignored and removed from the path.
The resource store throws an exception when you ask for an absolute path.
Either way, all absolute paths should be handled outside of the ResourceStore, for instance by calling Files.asResource(). So problems with absolute paths in the rest service should be resolved in the rest service, or some other layer that is used by the rest service.
Kind Regards
Niels
On 04/07/2023 21:59, Jody Garnett wrote:
Hello Niels,
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.
I do not understand the comment about deprecating the file() method? I agree with everything you said about how it is used and useful.
I am not sure when resource URLs were added, I thought they were only added to pass information over to geotools side of things for SLD validation and did not expect to see them used directly in any user configuration. I guess we could talk resource: URL here in the developers docs explicitly. Do we know who wrote that (I hope it was not me).
Still let’s take this discussion and update the docs and API.
My comment about deprecating the file() was in response to what Gabriel said about that.
Resource: urls are indeed for instance supported by SLD, if you want to link to a resource inside your data directory / resource store. But I think they can be used about anywhere where file: or other URLs are supported. As the javadoc for Resources.fromURL says: relative file: URLs are deprecated, use resource: instead. I don’t know why they were never documented.
Kind Regards
Niels
···
On 07/07/2023 16:48, Jody Garnett wrote:
Niels,
I do not understand the comment about deprecating the file() method? I agree with everything you said about how it is used and useful.
I am not sure when resource URLs were added, I thought they were only added to pass information over to geotools side of things for SLD validation and did not expect to see them used directly in any user configuration. I guess we could talk resource: URL here in the developers docs explicitly. Do we know who wrote that (I hope it was not me).
Still let’s take this discussion and update the docs and API.
I agree with most of what Gabriel says, it is mainly the same point as mine except that I think it is unrealistic to deprecate the file() method. First of all we would need to to move the resource API to geotools because SLD files are stored in the resource store, but there are still tertiary tools that require files like template and image processors. The idea is that the file created on disk is only a cache for these tertiary tools and manipulating it will have no effect.
Jody, most of the docs seem fine, except that the second blue box explanation for ‘path’ is confusing paths with URLs: ‘file:/’ URLs are not part of the resourcestore and resource rest API, and the other problems is that docs are missing any mention of the “resource:” URLs. We must first of all distinguish between (1) resourcestore API and (2) Resources.fromURL method, I think there might be some confusion there. Then the distinction between file: and resource: urls need to be in the docs.
The (1) Resourcestore does not deal with URLs or absolute paths, only relative paths. Internally, geoserver modules load their configuration files in the resource store directly from this API.
The (2) Resources.fromURL deals with URLS and can handle both absolute file paths and resources from the resourcestore using the ‘resource:’ protocol prefix. This is for external use: the purpose is so that people can refer to both files on disk and resources in the resource store wherever they can specify a URL, for instance inside style files or in the parameters of a data store. File: URLs is for the entire file system, not the data directory and get ‘converted’ to a resource using the Files.asResource wrapping method. Resource: URLs refer to anything in the data directory or alternative resource store.
As legacy from pre-resourcestore times, geoserver supports (supported?) ‘file:’ URLs relative to the data directory (this is a little bit ‘dodgy’ since ‘file:’ urls are standardized and do not paths without leading slash - https://en.wikipedia.org/wiki/File_URI_scheme ). Note that the Resources.fromURL javadoc says this is deprecated and should be replaced with resource: URLs.
Kind Regards
Niels
On 05/07/2023 15:06, Jody Garnett wrote:
Hey folks, I was tired and not in position to act clearly when these gaps were noticed last year.
We can tighten up the api definition, at least with the changes made we can now notice when an absolute path was used.
I never quite managed to communicate the separation between using a file URL and File access (when DataStores wish to access local files) as distinct from use of a resource for managing configuration files.
I suspect you both (Niels and Gabe) have met this distinction first hand - but it is hard to explain the value.
If “the codebase had drifted away from the intended use over time” I think it’s even more important to stick to the contract and not the other way around.
As far as I can see, there are two abstractions, ResourceStore and Resource, the former clearly says
“This abstraction assumes a unix like file system, all paths are relative and use forward slash {@code /} as the separator”,
the latter “Resource used for configuration storage.”.
Only by adhering to this contract, we could provide alternative implementations.
So if the changes were made to accommodate “common usage” on the specific case of a filesystem-based ResourceStore
implementation, it should be the other way around, to find out usages that don’t adhere to the spec and fix them.
Concretely, I think the only place where absolute URI’s would be requested to the ResourceStore, is where a data, not configuration,
file, is resolved, expecting the ResourceStore to be smart and resolve “file:data/roads.shp” relative to the datadir, and “file:/data/roads.shp”
relative to the file system.
Now, in doing so, we’re asking the ResourceStore to do what it’s not intended to. The test case Niels mentioned used to check that a leading “/”
had no meaning for the resource store (i.e. /data/roads.shp == data/roads.shp), and that was changed to mean the opposite.
So, IMHO, the responsibility of resolving files outside the datadir shouldn’t be on ResourceStore, but on the client code. Something like:
As a matter of fact, Resource.file():java.io.File should be deprecated and eventually removed. Resource is for config contents and
has Resource.in():InputStream and Resource.out():OutputStream.
It is hard enough already to adhere to lax interface contracts (catalog’s default workspace hello) as to keep on making it more and more difficult.
So what do we do? can we get to an agreement that the contract in the interfaces is mandatory and work from there?
Yes, I admit it’s my own fault for missing this discussion at the time.
I think it would be a shame to let the codebase further drift away from the intended use of the resource store. We have done the work to make the entirety of geoserver generic with respect to the implementation of the data directory and it is only a minimal effort to keep it this way. Even though jdbc store is still a community module and the Redis based geoserver project has been discontinued, it has been proven that alternative implementations for the data directory work and the jdbc store module is actually being used in production successfully.
Now, interestingly, I have discovered that there are contradictions in how it works / is documented now.
I discovered that while ‘theoryRootIsAbsolute’ test is successful, it is actually very misleading. At least on linux, paths starting with ‘/’ still create a file relative to the data directory!. The only difference is the string returned by the path() method. So while the path might seem absolute, the file you are accessing is not. So the path() method is misleading and in reality the leading slash is still being ignored (again, at least on linux).
Note that the javadoc of ResourceStore still says:
This abstraction assumes a unix like file system, all paths are relative
There it says it: all paths are relative for the resourcestore. Absolute paths have no meaning or function when it comes to the resource store.
I think this contradiction could be resolved in two possible ways:
the test ‘theoryIsRootSlashIsIgnored’ should come back instead of ‘theoryRootIsAbsolute’. The root slash is ignored and removed from the path.
The resource store throws an exception when you ask for an absolute path.
Either way, all absolute paths should be handled outside of the ResourceStore, for instance by calling Files.asResource(). So problems with absolute paths in the rest service should be resolved in the rest service, or some other layer that is used by the rest service.
Kind Regards
Niels
On 04/07/2023 21:59, Jody Garnett wrote:
Hello Niels,
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.
I think we we made it so the ResourceStore implementations fail when provided an absolute path it would be best; and remove the test case that is causing problem.
That way only Resources.fromURL(base, url) would support “absolute” paths and life would be more clear?
In theory I agree but there is always a risk to changing lenient behavior to throwing exceptions, if there is code or production configs that have relied on paths starting with ‘/’ just passing as relative…
···
On 11/07/2023 22:27, Jody Garnett wrote:
Thanks:
Reading the docs, it is okay but not great.
I think we we made it so the ResourceStore implementations fail when provided an absolute path it would be best; and remove the test case that is causing problem.
That way only Resources.fromURL(base, url) would support “absolute” paths and life would be more clear?
My comment about deprecating the file() was in response to what Gabriel said about that.
Resource: urls are indeed for instance supported by SLD, if you want to link to a resource inside your data directory / resource store. But I think they can be used about anywhere where file: or other URLs are supported. As the javadoc for Resources.fromURL says: relative file: URLs are deprecated, use resource: instead. I don’t know why they were never documented.
Kind Regards
Niels
On 07/07/2023 16:48, Jody Garnett wrote:
Niels,
I do not understand the comment about deprecating the file() method? I agree with everything you said about how it is used and useful.
I am not sure when resource URLs were added, I thought they were only added to pass information over to geotools side of things for SLD validation and did not expect to see them used directly in any user configuration. I guess we could talk resource: URL here in the developers docs explicitly. Do we know who wrote that (I hope it was not me).
Still let’s take this discussion and update the docs and API.
I agree with most of what Gabriel says, it is mainly the same point as mine except that I think it is unrealistic to deprecate the file() method. First of all we would need to to move the resource API to geotools because SLD files are stored in the resource store, but there are still tertiary tools that require files like template and image processors. The idea is that the file created on disk is only a cache for these tertiary tools and manipulating it will have no effect.
Jody, most of the docs seem fine, except that the second blue box explanation for ‘path’ is confusing paths with URLs: ‘file:/’ URLs are not part of the resourcestore and resource rest API, and the other problems is that docs are missing any mention of the “resource:” URLs. We must first of all distinguish between (1) resourcestore API and (2) Resources.fromURL method, I think there might be some confusion there. Then the distinction between file: and resource: urls need to be in the docs.
The (1) Resourcestore does not deal with URLs or absolute paths, only relative paths. Internally, geoserver modules load their configuration files in the resource store directly from this API.
The (2) Resources.fromURL deals with URLS and can handle both absolute file paths and resources from the resourcestore using the ‘resource:’ protocol prefix. This is for external use: the purpose is so that people can refer to both files on disk and resources in the resource store wherever they can specify a URL, for instance inside style files or in the parameters of a data store. File: URLs is for the entire file system, not the data directory and get ‘converted’ to a resource using the Files.asResource wrapping method. Resource: URLs refer to anything in the data directory or alternative resource store.
As legacy from pre-resourcestore times, geoserver supports (supported?) ‘file:’ URLs relative to the data directory (this is a little bit ‘dodgy’ since ‘file:’ urls are standardized and do not paths without leading slash - https://en.wikipedia.org/wiki/File_URI_scheme ). Note that the Resources.fromURL javadoc says this is deprecated and should be replaced with resource: URLs.
Kind Regards
Niels
On 05/07/2023 15:06, Jody Garnett wrote:
Hey folks, I was tired and not in position to act clearly when these gaps were noticed last year.
We can tighten up the api definition, at least with the changes made we can now notice when an absolute path was used.
I never quite managed to communicate the separation between using a file URL and File access (when DataStores wish to access local files) as distinct from use of a resource for managing configuration files.
I suspect you both (Niels and Gabe) have met this distinction first hand - but it is hard to explain the value.
If “the codebase had drifted away from the intended use over time” I think it’s even more important to stick to the contract and not the other way around.
As far as I can see, there are two abstractions, ResourceStore and Resource, the former clearly says
“This abstraction assumes a unix like file system, all paths are relative and use forward slash {@code /} as the separator”,
the latter “Resource used for configuration storage.”.
Only by adhering to this contract, we could provide alternative implementations.
So if the changes were made to accommodate “common usage” on the specific case of a filesystem-based ResourceStore
implementation, it should be the other way around, to find out usages that don’t adhere to the spec and fix them.
Concretely, I think the only place where absolute URI’s would be requested to the ResourceStore, is where a data, not configuration,
file, is resolved, expecting the ResourceStore to be smart and resolve “file:data/roads.shp” relative to the datadir, and “file:/data/roads.shp”
relative to the file system.
Now, in doing so, we’re asking the ResourceStore to do what it’s not intended to. The test case Niels mentioned used to check that a leading “/”
had no meaning for the resource store (i.e. /data/roads.shp == data/roads.shp), and that was changed to mean the opposite.
So, IMHO, the responsibility of resolving files outside the datadir shouldn’t be on ResourceStore, but on the client code. Something like:
As a matter of fact, Resource.file():java.io.File should be deprecated and eventually removed. Resource is for config contents and
has Resource.in():InputStream and Resource.out():OutputStream.
It is hard enough already to adhere to lax interface contracts (catalog’s default workspace hello) as to keep on making it more and more difficult.
So what do we do? can we get to an agreement that the contract in the interfaces is mandatory and work from there?
Yes, I admit it’s my own fault for missing this discussion at the time.
I think it would be a shame to let the codebase further drift away from the intended use of the resource store. We have done the work to make the entirety of geoserver generic with respect to the implementation of the data directory and it is only a minimal effort to keep it this way. Even though jdbc store is still a community module and the Redis based geoserver project has been discontinued, it has been proven that alternative implementations for the data directory work and the jdbc store module is actually being used in production successfully.
Now, interestingly, I have discovered that there are contradictions in how it works / is documented now.
I discovered that while ‘theoryRootIsAbsolute’ test is successful, it is actually very misleading. At least on linux, paths starting with ‘/’ still create a file relative to the data directory!. The only difference is the string returned by the path() method. So while the path might seem absolute, the file you are accessing is not. So the path() method is misleading and in reality the leading slash is still being ignored (again, at least on linux).
Note that the javadoc of ResourceStore still says:
This abstraction assumes a unix like file system, all paths are relative
There it says it: all paths are relative for the resourcestore. Absolute paths have no meaning or function when it comes to the resource store.
I think this contradiction could be resolved in two possible ways:
the test ‘theoryIsRootSlashIsIgnored’ should come back instead of ‘theoryRootIsAbsolute’. The root slash is ignored and removed from the path.
The resource store throws an exception when you ask for an absolute path.
Either way, all absolute paths should be handled outside of the ResourceStore, for instance by calling Files.asResource(). So problems with absolute paths in the rest service should be resolved in the rest service, or some other layer that is used by the rest service.
Kind Regards
Niels
On 04/07/2023 21:59, Jody Garnett wrote:
Hello Niels,
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.
In theory I agree but there is always a risk to changing lenient behavior to throwing exceptions, if there is code or production configs that have relied on paths starting with ‘/’ just passing as relative…
On 11/07/2023 22:27, Jody Garnett wrote:
Thanks:
Reading the docs, it is okay but not great.
I think we we made it so the ResourceStore implementations fail when provided an absolute path it would be best; and remove the test case that is causing problem.
That way only Resources.fromURL(base, url) would support “absolute” paths and life would be more clear?
My comment about deprecating the file() was in response to what Gabriel said about that.
Resource: urls are indeed for instance supported by SLD, if you want to link to a resource inside your data directory / resource store. But I think they can be used about anywhere where file: or other URLs are supported. As the javadoc for Resources.fromURL says: relative file: URLs are deprecated, use resource: instead. I don’t know why they were never documented.
Kind Regards
Niels
On 07/07/2023 16:48, Jody Garnett wrote:
Niels,
I do not understand the comment about deprecating the file() method? I agree with everything you said about how it is used and useful.
I am not sure when resource URLs were added, I thought they were only added to pass information over to geotools side of things for SLD validation and did not expect to see them used directly in any user configuration. I guess we could talk resource: URL here in the developers docs explicitly. Do we know who wrote that (I hope it was not me).
Still let’s take this discussion and update the docs and API.
I agree with most of what Gabriel says, it is mainly the same point as mine except that I think it is unrealistic to deprecate the file() method. First of all we would need to to move the resource API to geotools because SLD files are stored in the resource store, but there are still tertiary tools that require files like template and image processors. The idea is that the file created on disk is only a cache for these tertiary tools and manipulating it will have no effect.
Jody, most of the docs seem fine, except that the second blue box explanation for ‘path’ is confusing paths with URLs: ‘file:/’ URLs are not part of the resourcestore and resource rest API, and the other problems is that docs are missing any mention of the “resource:” URLs. We must first of all distinguish between (1) resourcestore API and (2) Resources.fromURL method, I think there might be some confusion there. Then the distinction between file: and resource: urls need to be in the docs.
The (1) Resourcestore does not deal with URLs or absolute paths, only relative paths. Internally, geoserver modules load their configuration files in the resource store directly from this API.
The (2) Resources.fromURL deals with URLS and can handle both absolute file paths and resources from the resourcestore using the ‘resource:’ protocol prefix. This is for external use: the purpose is so that people can refer to both files on disk and resources in the resource store wherever they can specify a URL, for instance inside style files or in the parameters of a data store. File: URLs is for the entire file system, not the data directory and get ‘converted’ to a resource using the Files.asResource wrapping method. Resource: URLs refer to anything in the data directory or alternative resource store.
As legacy from pre-resourcestore times, geoserver supports (supported?) ‘file:’ URLs relative to the data directory (this is a little bit ‘dodgy’ since ‘file:’ urls are standardized and do not paths without leading slash - https://en.wikipedia.org/wiki/File_URI_scheme ). Note that the Resources.fromURL javadoc says this is deprecated and should be replaced with resource: URLs.
Kind Regards
Niels
On 05/07/2023 15:06, Jody Garnett wrote:
Hey folks, I was tired and not in position to act clearly when these gaps were noticed last year.
We can tighten up the api definition, at least with the changes made we can now notice when an absolute path was used.
I never quite managed to communicate the separation between using a file URL and File access (when DataStores wish to access local files) as distinct from use of a resource for managing configuration files.
I suspect you both (Niels and Gabe) have met this distinction first hand - but it is hard to explain the value.
If “the codebase had drifted away from the intended use over time” I think it’s even more important to stick to the contract and not the other way around.
As far as I can see, there are two abstractions, ResourceStore and Resource, the former clearly says
“This abstraction assumes a unix like file system, all paths are relative and use forward slash {@code /} as the separator”,
the latter “Resource used for configuration storage.”.
Only by adhering to this contract, we could provide alternative implementations.
So if the changes were made to accommodate “common usage” on the specific case of a filesystem-based ResourceStore
implementation, it should be the other way around, to find out usages that don’t adhere to the spec and fix them.
Concretely, I think the only place where absolute URI’s would be requested to the ResourceStore, is where a data, not configuration,
file, is resolved, expecting the ResourceStore to be smart and resolve “file:data/roads.shp” relative to the datadir, and “file:/data/roads.shp”
relative to the file system.
Now, in doing so, we’re asking the ResourceStore to do what it’s not intended to. The test case Niels mentioned used to check that a leading “/”
had no meaning for the resource store (i.e. /data/roads.shp == data/roads.shp), and that was changed to mean the opposite.
So, IMHO, the responsibility of resolving files outside the datadir shouldn’t be on ResourceStore, but on the client code. Something like:
As a matter of fact, Resource.file():java.io.File should be deprecated and eventually removed. Resource is for config contents and
has Resource.in():InputStream and Resource.out():OutputStream.
It is hard enough already to adhere to lax interface contracts (catalog’s default workspace hello) as to keep on making it more and more difficult.
So what do we do? can we get to an agreement that the contract in the interfaces is mandatory and work from there?
Yes, I admit it’s my own fault for missing this discussion at the time.
I think it would be a shame to let the codebase further drift away from the intended use of the resource store. We have done the work to make the entirety of geoserver generic with respect to the implementation of the data directory and it is only a minimal effort to keep it this way. Even though jdbc store is still a community module and the Redis based geoserver project has been discontinued, it has been proven that alternative implementations for the data directory work and the jdbc store module is actually being used in production successfully.
Now, interestingly, I have discovered that there are contradictions in how it works / is documented now.
I discovered that while ‘theoryRootIsAbsolute’ test is successful, it is actually very misleading. At least on linux, paths starting with ‘/’ still create a file relative to the data directory!. The only difference is the string returned by the path() method. So while the path might seem absolute, the file you are accessing is not. So the path() method is misleading and in reality the leading slash is still being ignored (again, at least on linux).
Note that the javadoc of ResourceStore still says:
This abstraction assumes a unix like file system, all paths are relative
There it says it: all paths are relative for the resourcestore. Absolute paths have no meaning or function when it comes to the resource store.
I think this contradiction could be resolved in two possible ways:
the test ‘theoryIsRootSlashIsIgnored’ should come back instead of ‘theoryRootIsAbsolute’. The root slash is ignored and removed from the path.
The resource store throws an exception when you ask for an absolute path.
Either way, all absolute paths should be handled outside of the ResourceStore, for instance by calling Files.asResource(). So problems with absolute paths in the rest service should be resolved in the rest service, or some other layer that is used by the rest service.
Kind Regards
Niels
On 04/07/2023 21:59, Jody Garnett wrote:
Hello Niels,
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.
I think we make the change, so resource store fails with absolute path, and learn what other areas of the code assume resource store can access any location on disk.
Then we will know how widespread this problems is.
I am happy to split up the work, and we can add to the documentation a very clear sentence about resource store not supporting absolute paths.
I have been looking into this, but I do not think that throwing an exception with absolute paths is a viable option. It is too problematic for backwards compatibility. For one it seems to completely break the “file chooser” in web-core, and it causes problems all around.
The problem is the confusion of the term “absolute path” which can mean two things: absolute for the file system or absolute for the resource store’s structure. The thing is, it has always been so that the resource store “root” has been treated as a linux style root: “/”. It is normal to write ResourceStore.get(“/a/b/c”) or Resources.fromURL(“resource:/a/b/c”). It is even much more natural than to write this as a relative path. That is because it actually is not relative but absolute: absolute for the resource store, separate from the file system.
Also, checking for absolute paths would mean very different behaviour on windows than linux. On windows resource store style “/a/b/c/” paths would not be recognised as absolute and thus allowed, but they would be forbidden on linux. But the syntax of resource store paths is always the same, and is not dependent on the syntax of file system paths.
This is exactly what is so problematic with the ‘theoryRootIsAbsoluteTest’: it confirms whether a resource store path is absolute for the operating system file system. But as I said, the resource store path syntax does not depend on the operating system! So this is essentially a test that forces the resource store to be the file system, and not a custom implementation.
We need to get rid of this test and get back to the old test that confirms that “/a/b/c” and “a/b/c” are one and the same thing, since the resource store has no concept of either a “current directory” or a file system root. It has its own root and it always starts from there, for any path submitted to it. This is very clear and lenient at the same time.
Kind Regards
Niels
···
On 21/07/2023 16:06, Jody Garnett wrote:
I think we make the change, so resource store fails with absolute path, and learn what other areas of the code assume resource store can access any location on disk.
Then we will know how widespread this problems is.
I am happy to split up the work, and we can add to the documentation a very clear sentence about resource store not supporting absolute paths.
In theory I agree but there is always a risk to changing lenient behavior to throwing exceptions, if there is code or production configs that have relied on paths starting with ‘/’ just passing as relative…
On 11/07/2023 22:27, Jody Garnett wrote:
Thanks:
Reading the docs, it is okay but not great.
I think we we made it so the ResourceStore implementations fail when provided an absolute path it would be best; and remove the test case that is causing problem.
That way only Resources.fromURL(base, url) would support “absolute” paths and life would be more clear?
My comment about deprecating the file() was in response to what Gabriel said about that.
Resource: urls are indeed for instance supported by SLD, if you want to link to a resource inside your data directory / resource store. But I think they can be used about anywhere where file: or other URLs are supported. As the javadoc for Resources.fromURL says: relative file: URLs are deprecated, use resource: instead. I don’t know why they were never documented.
Kind Regards
Niels
On 07/07/2023 16:48, Jody Garnett wrote:
Niels,
I do not understand the comment about deprecating the file() method? I agree with everything you said about how it is used and useful.
I am not sure when resource URLs were added, I thought they were only added to pass information over to geotools side of things for SLD validation and did not expect to see them used directly in any user configuration. I guess we could talk resource: URL here in the developers docs explicitly. Do we know who wrote that (I hope it was not me).
Still let’s take this discussion and update the docs and API.
I agree with most of what Gabriel says, it is mainly the same point as mine except that I think it is unrealistic to deprecate the file() method. First of all we would need to to move the resource API to geotools because SLD files are stored in the resource store, but there are still tertiary tools that require files like template and image processors. The idea is that the file created on disk is only a cache for these tertiary tools and manipulating it will have no effect.
Jody, most of the docs seem fine, except that the second blue box explanation for ‘path’ is confusing paths with URLs: ‘file:/’ URLs are not part of the resourcestore and resource rest API, and the other problems is that docs are missing any mention of the “resource:” URLs. We must first of all distinguish between (1) resourcestore API and (2) Resources.fromURL method, I think there might be some confusion there. Then the distinction between file: and resource: urls need to be in the docs.
The (1) Resourcestore does not deal with URLs or absolute paths, only relative paths. Internally, geoserver modules load their configuration files in the resource store directly from this API.
The (2) Resources.fromURL deals with URLS and can handle both absolute file paths and resources from the resourcestore using the ‘resource:’ protocol prefix. This is for external use: the purpose is so that people can refer to both files on disk and resources in the resource store wherever they can specify a URL, for instance inside style files or in the parameters of a data store. File: URLs is for the entire file system, not the data directory and get ‘converted’ to a resource using the Files.asResource wrapping method. Resource: URLs refer to anything in the data directory or alternative resource store.
As legacy from pre-resourcestore times, geoserver supports (supported?) ‘file:’ URLs relative to the data directory (this is a little bit ‘dodgy’ since ‘file:’ urls are standardized and do not paths without leading slash - https://en.wikipedia.org/wiki/File_URI_scheme ). Note that the Resources.fromURL javadoc says this is deprecated and should be replaced with resource: URLs.
Kind Regards
Niels
On 05/07/2023 15:06, Jody Garnett wrote:
Hey folks, I was tired and not in position to act clearly when these gaps were noticed last year.
We can tighten up the api definition, at least with the changes made we can now notice when an absolute path was used.
I never quite managed to communicate the separation between using a file URL and File access (when DataStores wish to access local files) as distinct from use of a resource for managing configuration files.
I suspect you both (Niels and Gabe) have met this distinction first hand - but it is hard to explain the value.
If “the codebase had drifted away from the intended use over time” I think it’s even more important to stick to the contract and not the other way around.
As far as I can see, there are two abstractions, ResourceStore and Resource, the former clearly says
“This abstraction assumes a unix like file system, all paths are relative and use forward slash {@code /} as the separator”,
the latter “Resource used for configuration storage.”.
Only by adhering to this contract, we could provide alternative implementations.
So if the changes were made to accommodate “common usage” on the specific case of a filesystem-based ResourceStore
implementation, it should be the other way around, to find out usages that don’t adhere to the spec and fix them.
Concretely, I think the only place where absolute URI’s would be requested to the ResourceStore, is where a data, not configuration,
file, is resolved, expecting the ResourceStore to be smart and resolve “file:data/roads.shp” relative to the datadir, and “file:/data/roads.shp”
relative to the file system.
Now, in doing so, we’re asking the ResourceStore to do what it’s not intended to. The test case Niels mentioned used to check that a leading “/”
had no meaning for the resource store (i.e. /data/roads.shp == data/roads.shp), and that was changed to mean the opposite.
So, IMHO, the responsibility of resolving files outside the datadir shouldn’t be on ResourceStore, but on the client code. Something like:
As a matter of fact, Resource.file():java.io.File should be deprecated and eventually removed. Resource is for config contents and
has Resource.in():InputStream and Resource.out():OutputStream.
It is hard enough already to adhere to lax interface contracts (catalog’s default workspace hello) as to keep on making it more and more difficult.
So what do we do? can we get to an agreement that the contract in the interfaces is mandatory and work from there?
Yes, I admit it’s my own fault for missing this discussion at the time.
I think it would be a shame to let the codebase further drift away from the intended use of the resource store. We have done the work to make the entirety of geoserver generic with respect to the implementation of the data directory and it is only a minimal effort to keep it this way. Even though jdbc store is still a community module and the Redis based geoserver project has been discontinued, it has been proven that alternative implementations for the data directory work and the jdbc store module is actually being used in production successfully.
Now, interestingly, I have discovered that there are contradictions in how it works / is documented now.
I discovered that while ‘theoryRootIsAbsolute’ test is successful, it is actually very misleading. At least on linux, paths starting with ‘/’ still create a file relative to the data directory!. The only difference is the string returned by the path() method. So while the path might seem absolute, the file you are accessing is not. So the path() method is misleading and in reality the leading slash is still being ignored (again, at least on linux).
Note that the javadoc of ResourceStore still says:
This abstraction assumes a unix like file system, all paths are relative
There it says it: all paths are relative for the resourcestore. Absolute paths have no meaning or function when it comes to the resource store.
I think this contradiction could be resolved in two possible ways:
the test ‘theoryIsRootSlashIsIgnored’ should come back instead of ‘theoryRootIsAbsolute’. The root slash is ignored and removed from the path.
The resource store throws an exception when you ask for an absolute path.
Either way, all absolute paths should be handled outside of the ResourceStore, for instance by calling Files.asResource(). So problems with absolute paths in the rest service should be resolved in the rest service, or some other layer that is used by the rest service.
Kind Regards
Niels
On 04/07/2023 21:59, Jody Garnett wrote:
Hello Niels,
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.
You have made a long email about asking that an absolute path be treated as relative to data directory.
The problem we are faced with is that the code base has two inconsistent uses of absolute path.
No matter what decision made we are going to break half the code base.
With that in mind “causing problems all around” is expected.
I am also not sure if we are talking about the same thing exactly. If the documentation only paths (the Strings) are absolute.
When faced with a path we can clearly tell:
relative: process as relative to the data directory use Resource API to access content. If needed resource.file() can be used to unpack File onto local file system.
absolute: process as File. If needed for a test case or something there is a wrapper that can pretend a random File location is a resource.
I feel you are not understanding me, it seems you are not really responding to any my points.
I disagree with your analysis. There is no need to break half to code base. In my opinion there is nothing wrong with the original idea and that is already how nearly all of the code base works. I think barely any work is necessary, if we stick with the original design. The original design is clear and makes a lot of sense.
What is your definition of an “absolute path”? Currently the method Paths.isAbsolute() is OS dependent. Should the syntax of the resource store paths be OS dependent, so that paths starting with a forward slash are allowed in windows but not in linux? That makes no sense to me, so please clarify.
I think there are two big confusions at the heart of this:
confusing paths with URLs. Only URLS support both the file system and the resource store at the same time. Apart from that the file system and the resource store live in separate worlds. So there is no joint concept of a “path”. (they do not even necessarily have the same basis syntax: the resource store always uses forward slashes.) It is not true that we need to decide for a certain path whether it is for the file system or the resource store, based on it being relative or absolute. We only have to do this, and already do this, for URLs, using the the ‘protocol’ (resource: vs file:).
assuming that the resource store is part of the file system. As far the developer who uses the resource API is concerned, the resource store is not part of the file system. That is just one particular implementation.
The resource store literally has a “root”. This root is commonly referred to as “/”. It is very unnatural to start resource store paths immediately with a name, because the resource store is not relative to anything. It is its own thing and it has a root.
Regards
Niels
···
On 31/07/2023 16:12, Jody Garnett wrote:
Do I understand correctly,
You have made a long email about asking that an absolute path be treated as relative to data directory.
The problem we are faced with is that the code base has two inconsistent uses of absolute path.
No matter what decision made we are going to break half the code base.
With that in mind “causing problems all around” is expected.
I am also not sure if we are talking about the same thing exactly. If the documentation only paths (the Strings) are absolute.
When faced with a path we can clearly tell:
relative: process as relative to the data directory use Resource API to access content. If needed resource.file() can be used to unpack File onto local file system.
absolute: process as File. If needed for a test case or something there is a wrapper that can pretend a random File location is a resource.
I have been looking into this, but I do not think that throwing an exception with absolute paths is a viable option. It is too problematic for backwards compatibility. For one it seems to completely break the “file chooser” in web-core, and it causes problems all around.
The problem is the confusion of the term “absolute path” which can mean two things: absolute for the file system or absolute for the resource store’s structure. The thing is, it has always been so that the resource store “root” has been treated as a linux style root: “/”. It is normal to write ResourceStore.get(“/a/b/c”) or Resources.fromURL(“resource:/a/b/c”). It is even much more natural than to write this as a relative path. That is because it actually is not relative but absolute: absolute for the resource store, separate from the file system.
Also, checking for absolute paths would mean very different behaviour on windows than linux. On windows resource store style “/a/b/c/” paths would not be recognised as absolute and thus allowed, but they would be forbidden on linux. But the syntax of resource store paths is always the same, and is not dependent on the syntax of file system paths.
This is exactly what is so problematic with the ‘theoryRootIsAbsoluteTest’: it confirms whether a resource store path is absolute for the operating system file system. But as I said, the resource store path syntax does not depend on the operating system! So this is essentially a test that forces the resource store to be the file system, and not a custom implementation.
We need to get rid of this test and get back to the old test that confirms that “/a/b/c” and “a/b/c” are one and the same thing, since the resource store has no concept of either a “current directory” or a file system root. It has its own root and it always starts from there, for any path submitted to it. This is very clear and lenient at the same time.
Kind Regards
Niels
On 21/07/2023 16:06, Jody Garnett wrote:
I think we make the change, so resource store fails with absolute path, and learn what other areas of the code assume resource store can access any location on disk.
Then we will know how widespread this problems is.
I am happy to split up the work, and we can add to the documentation a very clear sentence about resource store not supporting absolute paths.
In theory I agree but there is always a risk to changing lenient behavior to throwing exceptions, if there is code or production configs that have relied on paths starting with ‘/’ just passing as relative…
On 11/07/2023 22:27, Jody Garnett wrote:
Thanks:
Reading the docs, it is okay but not great.
I think we we made it so the ResourceStore implementations fail when provided an absolute path it would be best; and remove the test case that is causing problem.
That way only Resources.fromURL(base, url) would support “absolute” paths and life would be more clear?
My comment about deprecating the file() was in response to what Gabriel said about that.
Resource: urls are indeed for instance supported by SLD, if you want to link to a resource inside your data directory / resource store. But I think they can be used about anywhere where file: or other URLs are supported. As the javadoc for Resources.fromURL says: relative file: URLs are deprecated, use resource: instead. I don’t know why they were never documented.
Kind Regards
Niels
On 07/07/2023 16:48, Jody Garnett wrote:
Niels,
I do not understand the comment about deprecating the file() method? I agree with everything you said about how it is used and useful.
I am not sure when resource URLs were added, I thought they were only added to pass information over to geotools side of things for SLD validation and did not expect to see them used directly in any user configuration. I guess we could talk resource: URL here in the developers docs explicitly. Do we know who wrote that (I hope it was not me).
Still let’s take this discussion and update the docs and API.
I agree with most of what Gabriel says, it is mainly the same point as mine except that I think it is unrealistic to deprecate the file() method. First of all we would need to to move the resource API to geotools because SLD files are stored in the resource store, but there are still tertiary tools that require files like template and image processors. The idea is that the file created on disk is only a cache for these tertiary tools and manipulating it will have no effect.
Jody, most of the docs seem fine, except that the second blue box explanation for ‘path’ is confusing paths with URLs: ‘file:/’ URLs are not part of the resourcestore and resource rest API, and the other problems is that docs are missing any mention of the “resource:” URLs. We must first of all distinguish between (1) resourcestore API and (2) Resources.fromURL method, I think there might be some confusion there. Then the distinction between file: and resource: urls need to be in the docs.
The (1) Resourcestore does not deal with URLs or absolute paths, only relative paths. Internally, geoserver modules load their configuration files in the resource store directly from this API.
The (2) Resources.fromURL deals with URLS and can handle both absolute file paths and resources from the resourcestore using the ‘resource:’ protocol prefix. This is for external use: the purpose is so that people can refer to both files on disk and resources in the resource store wherever they can specify a URL, for instance inside style files or in the parameters of a data store. File: URLs is for the entire file system, not the data directory and get ‘converted’ to a resource using the Files.asResource wrapping method. Resource: URLs refer to anything in the data directory or alternative resource store.
As legacy from pre-resourcestore times, geoserver supports (supported?) ‘file:’ URLs relative to the data directory (this is a little bit ‘dodgy’ since ‘file:’ urls are standardized and do not paths without leading slash - https://en.wikipedia.org/wiki/File_URI_scheme ). Note that the Resources.fromURL javadoc says this is deprecated and should be replaced with resource: URLs.
Kind Regards
Niels
On 05/07/2023 15:06, Jody Garnett wrote:
Hey folks, I was tired and not in position to act clearly when these gaps were noticed last year.
We can tighten up the api definition, at least with the changes made we can now notice when an absolute path was used.
I never quite managed to communicate the separation between using a file URL and File access (when DataStores wish to access local files) as distinct from use of a resource for managing configuration files.
I suspect you both (Niels and Gabe) have met this distinction first hand - but it is hard to explain the value.
If “the codebase had drifted away from the intended use over time” I think it’s even more important to stick to the contract and not the other way around.
As far as I can see, there are two abstractions, ResourceStore and Resource, the former clearly says
“This abstraction assumes a unix like file system, all paths are relative and use forward slash {@code /} as the separator”,
the latter “Resource used for configuration storage.”.
Only by adhering to this contract, we could provide alternative implementations.
So if the changes were made to accommodate “common usage” on the specific case of a filesystem-based ResourceStore
implementation, it should be the other way around, to find out usages that don’t adhere to the spec and fix them.
Concretely, I think the only place where absolute URI’s would be requested to the ResourceStore, is where a data, not configuration,
file, is resolved, expecting the ResourceStore to be smart and resolve “file:data/roads.shp” relative to the datadir, and “file:/data/roads.shp”
relative to the file system.
Now, in doing so, we’re asking the ResourceStore to do what it’s not intended to. The test case Niels mentioned used to check that a leading “/”
had no meaning for the resource store (i.e. /data/roads.shp == data/roads.shp), and that was changed to mean the opposite.
So, IMHO, the responsibility of resolving files outside the datadir shouldn’t be on ResourceStore, but on the client code. Something like:
As a matter of fact, Resource.file():java.io.File should be deprecated and eventually removed. Resource is for config contents and
has Resource.in():InputStream and Resource.out():OutputStream.
It is hard enough already to adhere to lax interface contracts (catalog’s default workspace hello) as to keep on making it more and more difficult.
So what do we do? can we get to an agreement that the contract in the interfaces is mandatory and work from there?
Yes, I admit it’s my own fault for missing this discussion at the time.
I think it would be a shame to let the codebase further drift away from the intended use of the resource store. We have done the work to make the entirety of geoserver generic with respect to the implementation of the data directory and it is only a minimal effort to keep it this way. Even though jdbc store is still a community module and the Redis based geoserver project has been discontinued, it has been proven that alternative implementations for the data directory work and the jdbc store module is actually being used in production successfully.
Now, interestingly, I have discovered that there are contradictions in how it works / is documented now.
I discovered that while ‘theoryRootIsAbsolute’ test is successful, it is actually very misleading. At least on linux, paths starting with ‘/’ still create a file relative to the data directory!. The only difference is the string returned by the path() method. So while the path might seem absolute, the file you are accessing is not. So the path() method is misleading and in reality the leading slash is still being ignored (again, at least on linux).
Note that the javadoc of ResourceStore still says:
This abstraction assumes a unix like file system, all paths are relative
There it says it: all paths are relative for the resourcestore. Absolute paths have no meaning or function when it comes to the resource store.
I think this contradiction could be resolved in two possible ways:
the test ‘theoryIsRootSlashIsIgnored’ should come back instead of ‘theoryRootIsAbsolute’. The root slash is ignored and removed from the path.
The resource store throws an exception when you ask for an absolute path.
Either way, all absolute paths should be handled outside of the ResourceStore, for instance by calling Files.asResource(). So problems with absolute paths in the rest service should be resolved in the rest service, or some other layer that is used by the rest service.
Kind Regards
Niels
On 04/07/2023 21:59, Jody Garnett wrote:
Hello Niels,
The PRs for this activity contain extensive discussion.
The fundamental issue was the handling of absolute paths which was done differently by different sections of code. Specifically we found that the REST API endpoint was allowing paths //data and /data to both reference content in the data directory, rather than treating the first one as an absolute path. In response we tightened up the javadocs and added test cases including the one you mentioned.
I agree that this goes against the goal of resource store, but the codebase had drifted away from the intended use over time.
Now that you are present in the discussion it would be a good opportunity to discuss this with the parties involved.
I am having trouble understanding the changes that were made about 6 months ago to the ResourceStore’s expected behaviour.
In particular, in the class ‘org.geoserver.platform.resource.ResourceTheoryTest’, the unit test ‘theoryRootSlashIsIgnored’ was replaced by ‘theoryRootIsAbsolute’. I cannot make sense out of this theory test at all.
This seems to be entirely contradictory to the whole reason that the ResourceStore API was created, that is to make an abstraction of the Data Directory, so that it can be replaced by something else (such as jdbc store or other implementations that have been made). There was already support for absolute file paths in all circumstances by using “file:” URLs. This will bypass the resource store and call Files.asResource instead. But resource: URLs are for the data directory or alternative resource store only. How does it make sense to get absolute paths from the resource store?
In order to make jdbc-config pass the tests, I will have to turn off this particular method. But why should the test even be there if the file resource store is the only one that could ever support it? Programmers and users will rely on this behaviour and support for all alternative implementations of ResourceStore will be broken. In this case we may as well do away with the API and just use the file system directly again.