[Geoserver-devel] ResourceStore API workparty

Hello Jody,

Unfortunately I am not really on board with your plan. I really don’t think we need to make a new branch and do a lot of work. This seems like a lot of unnecessary work. I think the original design is fine, and does not require breaking the code base. We just need to go back to how things were and perhaps fix a few cases where people have gone wrong since then.

  • Gabe and Niels are having trouble with the API changes made earlier this year

  • The definition of path was inconsistent for windows absolute paths

I disagree. The definition of ‘path’ for the resource store is not in any way related to the OS, because the resource store lives in a separate world from the file system. The resource store and the file system have no joint concept of ‘path’.

  • Early resource store work was not strict about use of paths, and when to use Resource and when to use File

I disagree. It was always very clear about this. There should obviously never be hard coded absolute paths for the file system. In general the code should basically always use resources. Only where users can use URLs they have access to both the file system and the resource store, this is always how it was intended to work, as can be read from the javadoc.

  • The email conversation is going in circles because something has to break

  • We have to accept that something that break

I disagree. I think nearly all of the code base is fine with how it was designed to work.

Regards

Niels

···

On 01/08/2023 19:31, Jody Garnett wrote:

Niel, Gabe:

Our email thread on this topic has explored the topic by now, I think action rather than more communication is needed next.

Can we setup a 1/2 day breakout work-party to get this activity underway (prior to the Bolsena code sprint and the next release cycle). Even a couple hours to get the topic started on a branch would be great.

I am proposing two times below, and am willing to wake up early to see this topic move along:

  1. August 8th 15 CEST / 11 BRT / 6 PDT
  2. August 15th: 15 CEST / 11 BRT / 6 PDT

Is this a good idea :slight_smile:


Jody Garnett

Niels:

We may be on the same page, you indicate ResoueceStore should never use absolute paths for the file system.

That is the decision I wish to make, and then we should go do the work to enforce it. How much work? I am not sure, hence the work party.

There is something subtle here though. You are focused on the ResourceStore API, and I am asking you to consider the application as a whole.

We do have paths in the application including absolute paths. The QA activity to check that these are handled correctly is the work.

Paths are not only for ResourceStore.

Jody

···


Jody Garnett


Jody Garnett

Hi Jody,

I do support checking everything is handled correctly.

I just don’t seem the point in breaking huge amounts of code for nothing. And that is what will happen if we start throwing exceptions in the resource store if the path starts with a “/”. Which is actually normal, because the resource store has a root.

Please: define “absolute path”. Is this OS dependent? Should the behaviour of the resource store be OS dependent? What /exactly/ are you suggesting should be allowed or not allowed to access a resource from the resource store?

Yes, “paths” are used across the application. But “path” is a vague term, ‘xpaths’ are paths too, they have no relationship with file paths. Resource paths are no different, they live in their own world and are not part of the file system. The notion of a path being ‘absolute’ or ‘relative’ has no meaning when you are talking about paths in general, this is file system terminology.

When someone gets a resource directly from the resource store, they are using the resource store as a black box, and the “path” is always relative to the root of the resource store.

Resource.fromURL is a different story, these are not hard-coded URL’s, they are user provided, and then we allow both the file system as the resource store to be accessed, where file: is for the file system and resource: is for the resource store.

I just haven’t seen any valid argument why the original design is flawed.

Kind Regards

Niels

Niels:

Comments inline

Hi Jody,

I do support checking everything is handled correctly.

Thanks Niels, sorry I am a bit frustrated these days so am probably not explaining clearly.

I just don’t seem the point in breaking huge amounts of code for nothing. And that is what will happen if we start throwing exceptions in the resource store if the path starts with a “/”. Which is actually normal, because the resource store has a root.

ResourceStore no longer uses / as a root.
The work done earlier this year removed the use leading / from the codebase (including the REST API Controller).

Please: define “absolute path”. Is this OS dependent? Should the behaviour of the resource store be OS dependent? What /exactly/

I put the definitions into the document I shared and the javadocs (not sure why I cannot find the javadocs online?).

When I say “path” I mean the strings used by the class Paths.isAbsolute(String)

This is covered in the developers guide here: https://docs.geoserver.org/latest/en/developer/programming-guide/config/resource.html#paths

Paths are broken down into a sequence of names, as listed by Paths.names(path):

  • Path.names("data/tasmania/roads.shp") is represented as a list of data, tasmania, roads.shp.

  • On linux Path.names("/src/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in /, src, gis, cadaster, district.geopkg.

  • On windows Path.names("D:/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in D:/, gis, cadaster, district.geopkg.

are you suggesting should be allowed or not allowed to access a resource from the resource store?

I am suggesting paths that return true for this method be not allowed to access resources from the resource store.

Yes, “paths” are used across the application. But “path” is a vague term, ‘xpaths’ are paths too, they have no relationship with file paths. Resource paths are no different, they live in their own world and are not part of the file system. The notion of a path being ‘absolute’ or ‘relative’ has no meaning when you are talking about paths in general, this is file system terminology.

Earlier in this email thread I shared the developer guide documentation which was written to define what Paths mean for GeoServer.

When someone gets a resource directly from the resource store, they are using the resource store as a black box, and the “path” is always relative to the root of the resource store.

Correct, relative paths are with respect to the resource store location. Actually when I did he QA I found that there was at one point a search path that could be scanned to allow “data directory” to be spread across several folders (terrifying).

Resource.fromURL is a different story, these are not hard-coded URL’s, they are user provided, and then we allow both the file system as the resource store to be accessed, where file: is for the file system and resource: is for the resource store.

I have not seen users use “resource:”, to my knowledge that was only used internally in the logic between in geoserver and geotools ( applied by a style visitor to help geotools resolve icons in the data directory.)

See the examples Resource.fromURL(base, url):

  • Resources.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Resources.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Resources.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Resources.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference)

See the examples Paths.convert(base, filename)

  • Paths.convert(base,file) - uses URI relativize method to determine relative path (between file and base)

  • Paths.convert(base,folder, fileLocation) - can resolve relative location, limited to content within the base directory

  • Paths.convert(base, filename)

See examples Files.fromURL( base, url)

  • Files.fromURL( null, "resource:styles/logo.svg") - internal url format restricted to data directory content

  • Files.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Files.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Files.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Files.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference ignored as we cannot determine a file)

  • Files.fromURL( baseDirectory, "sde://user:pass@anonymised.com:port") - null (custom strings are ignored as we cannot determine a file)

Folks use Paths (in the sense defined above) here also; the is the motivation for fixing absolute paths on windows (as the definition had changed in Java 11) and where the inconsistency of use was found.

I just haven’t seen any valid argument why the original design is flawed.

I do not believe the origional design was flawed, it was not consistently applied in the codebase. Where a path is used

Kind Regards

Niels

Does it help that path and absolute path is defined strictly above?

Thanks,
Jody

···


Jody Garnett


Jody Garnett


Jody Garnett

Hello Jody,

You write "I am suggesting paths that return true for this method be not allowed to access resources from the resource store. "

I think this is problematic: the method is OS dependent: a path with leading slash will return “false” on windows and “true” on linux. So your suggestion is that resource paths can start with a leading slash on windows, but not on linux. I don’t think the syntax of the resource store should be OS dependent. Because the resource store is not part of the file system or the OS, it is an internal system of geoserver.

I think the recent changes are problematic: you added a test method to the resource theory test that can only work for a file system based resource store, and not for alternative implementations. It makes the existence of the resource store abstract API obsolete. There is a lot of confusion going on and creeping into the code base.

My position is that this should be pulled back. Removing all leading slashes from resource stores paths will require a huge amount of unnecessary work, and I still do not see a reason why this is necessary.

Regards

Niels

···

On 03/08/2023 01:49, Jody Garnett wrote:

Niels:

Comments inline


Jody Garnett

On Aug 2, 2023 at 12:20:43 PM, Niels Charlier <niels@…2918…> wrote:

Hi Jody,

I do support checking everything is handled correctly.

Thanks Niels, sorry I am a bit frustrated these days so am probably not explaining clearly.

I just don’t seem the point in breaking huge amounts of code for nothing. And that is what will happen if we start throwing exceptions in the resource store if the path starts with a “/”. Which is actually normal, because the resource store has a root.

ResourceStore no longer uses / as a root.
The work done earlier this year removed the use leading / from the codebase (including the REST API Controller).

Please: define “absolute path”. Is this OS dependent? Should the behaviour of the resource store be OS dependent? What /exactly/

I put the definitions into the document I shared and the javadocs (not sure why I cannot find the javadocs online?).

When I say “path” I mean the strings used by the class Paths.isAbsolute(String)

This is covered in the developers guide here: https://docs.geoserver.org/latest/en/developer/programming-guide/config/resource.html#paths

Paths are broken down into a sequence of names, as listed by Paths.names(path):

  • Path.names("data/tasmania/roads.shp") is represented as a list of data, tasmania, roads.shp.

  • On linux Path.names("/src/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in /, src, gis, cadaster, district.geopkg.

  • On windows Path.names("D:/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in D:/, gis, cadaster, district.geopkg.

are you suggesting should be allowed or not allowed to access a resource from the resource store?

I am suggesting paths that return true for this method be not allowed to access resources from the resource store.

Yes, “paths” are used across the application. But “path” is a vague term, ‘xpaths’ are paths too, they have no relationship with file paths. Resource paths are no different, they live in their own world and are not part of the file system. The notion of a path being ‘absolute’ or ‘relative’ has no meaning when you are talking about paths in general, this is file system terminology.

Earlier in this email thread I shared the developer guide documentation which was written to define what Paths mean for GeoServer.

When someone gets a resource directly from the resource store, they are using the resource store as a black box, and the “path” is always relative to the root of the resource store.

Correct, relative paths are with respect to the resource store location. Actually when I did he QA I found that there was at one point a search path that could be scanned to allow “data directory” to be spread across several folders (terrifying).

Resource.fromURL is a different story, these are not hard-coded URL’s, they are user provided, and then we allow both the file system as the resource store to be accessed, where file: is for the file system and resource: is for the resource store.

I have not seen users use “resource:”, to my knowledge that was only used internally in the logic between in geoserver and geotools ( applied by a style visitor to help geotools resolve icons in the data directory.)

See the examples Resource.fromURL(base, url):

  • Resources.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Resources.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Resources.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Resources.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference)

See the examples Paths.convert(base, filename)

  • Paths.convert(base,file) - uses URI relativize method to determine relative path (between file and base)

  • Paths.convert(base,folder, fileLocation) - can resolve relative location, limited to content within the base directory

  • Paths.convert(base, filename)

See examples Files.fromURL( base, url)

  • Files.fromURL( null, "resource:styles/logo.svg") - internal url format restricted to data directory content

  • Files.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Files.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Files.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Files.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference ignored as we cannot determine a file)

  • Files.fromURL( baseDirectory, "sde://user:pass@anonymised.com:port") - null (custom strings are ignored as we cannot determine a file)

Folks use Paths (in the sense defined above) here also; the is the motivation for fixing absolute paths on windows (as the definition had changed in Java 11) and where the inconsistency of use was found.

I just haven’t seen any valid argument why the original design is flawed.

I do not believe the origional design was flawed, it was not consistently applied in the codebase. Where a path is used

Kind Regards

Niels

Does it help that path and absolute path is defined strictly above?

Thanks,
Jody

On 02/08/2023 16:40, Jody Garnett wrote:

Niels:

We may be on the same page, you indicate ResoueceStore should never use absolute paths for the file system.

That is the decision I wish to make, and then we should go do the work to enforce it. How much work? I am not sure, hence the work party.

There is something subtle here though. You are focused on the ResourceStore API, and I am asking you to consider the application as a whole.

We do have paths in the application including absolute paths. The QA activity to check that these are handled correctly is the work.

Paths are not only for ResourceStore.

Jody

On Tue, Aug 1, 2023 at 11:55 PM Niels Charlier <niels@anonymised.com> wrote:

Hello Jody,

Unfortunately I am not really on board with your plan. I really don’t think we need to make a new branch and do a lot of work. This seems like a lot of unnecessary work. I think the original design is fine, and does not require breaking the code base. We just need to go back to how things were and perhaps fix a few cases where people have gone wrong since then.

  • Gabe and Niels are having trouble with the API changes made earlier this year

  • The definition of path was inconsistent for windows absolute paths

I disagree. The definition of ‘path’ for the resource store is not in any way related to the OS, because the resource store lives in a separate world from the file system. The resource store and the file system have no joint concept of ‘path’.

  • Early resource store work was not strict about use of paths, and when to use Resource and when to use File

I disagree. It was always very clear about this. There should obviously never be hard coded absolute paths for the file system. In general the code should basically always use resources. Only where users can use URLs they have access to both the file system and the resource store, this is always how it was intended to work, as can be read from the javadoc.

  • The email conversation is going in circles because something has to break

  • We have to accept that something that break

I disagree. I think nearly all of the code base is fine with how it was designed to work.

Regards

Niels

On 01/08/2023 19:31, Jody Garnett wrote:

Niel, Gabe:

Our email thread on this topic has explored the topic by now, I think action rather than more communication is needed next.

Can we setup a 1/2 day breakout work-party to get this activity underway (prior to the Bolsena code sprint and the next release cycle). Even a couple hours to get the topic started on a branch would be great.

I am proposing two times below, and am willing to wake up early to see this topic move along:

  1. August 8th 15 CEST / 11 BRT / 6 PDT
  2. August 15th: 15 CEST / 11 BRT / 6 PDT

Is this a good idea :slight_smile:


Jody Garnett


Jody Garnett

Jody,

Perhaps I can offer a suggestion that may be considered a reasonable compromise.

The “resource:” URLs are not used because they were never documented in user docs. (The javadoc however clearly states that it always was the intention to be used for access to the ‘data directory’ via URLs.) I can see that one of the issues you have been working to resolve is the usage of URLs without a prefix, which are just naked paths and where there is ambiguity in interpretation.

The resourcestore is low level API only accessible to developers, not users. Users will always access it via some layer in between, like rest or something that calls Resources.fromURL. When a developer call resourcestore.get he knows he is not necessarily using the file system, and the paths should not be considered file system paths. With Resources.fromURL this is not so clear.

My suggestion is that we bring back / leave the resource store API to how it was, being lenient on the leading slash because there is no possible confusion there, and it avoids a huge amount of work. We may however agree (and document this) that the Resources.fromURL, as well as REST or other services accessible to users use the following guidelines (and we document this for both developers as well as users):

  • file: URLS will use the file system
  • resource: URLs will use the resource store
  • naked path without URL prefix: absolute paths are interpreted as file: URLs, relative paths are interpreted as resource: URLs.

This way the issue is addressed at the level where it actually occurs, and OS/filesystem stuff does not creep into the resource store API.

Kind Regards

Niels

···

On 03/08/2023 01:49, Jody Garnett wrote:

Niels:

Comments inline


Jody Garnett

On Aug 2, 2023 at 12:20:43 PM, Niels Charlier <niels@…2918…> wrote:

Hi Jody,

I do support checking everything is handled correctly.

Thanks Niels, sorry I am a bit frustrated these days so am probably not explaining clearly.

I just don’t seem the point in breaking huge amounts of code for nothing. And that is what will happen if we start throwing exceptions in the resource store if the path starts with a “/”. Which is actually normal, because the resource store has a root.

ResourceStore no longer uses / as a root.
The work done earlier this year removed the use leading / from the codebase (including the REST API Controller).

Please: define “absolute path”. Is this OS dependent? Should the behaviour of the resource store be OS dependent? What /exactly/

I put the definitions into the document I shared and the javadocs (not sure why I cannot find the javadocs online?).

When I say “path” I mean the strings used by the class Paths.isAbsolute(String)

This is covered in the developers guide here: https://docs.geoserver.org/latest/en/developer/programming-guide/config/resource.html#paths

Paths are broken down into a sequence of names, as listed by Paths.names(path):

  • Path.names("data/tasmania/roads.shp") is represented as a list of data, tasmania, roads.shp.

  • On linux Path.names("/src/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in /, src, gis, cadaster, district.geopkg.

  • On windows Path.names("D:/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in D:/, gis, cadaster, district.geopkg.

are you suggesting should be allowed or not allowed to access a resource from the resource store?

I am suggesting paths that return true for this method be not allowed to access resources from the resource store.

Yes, “paths” are used across the application. But “path” is a vague term, ‘xpaths’ are paths too, they have no relationship with file paths. Resource paths are no different, they live in their own world and are not part of the file system. The notion of a path being ‘absolute’ or ‘relative’ has no meaning when you are talking about paths in general, this is file system terminology.

Earlier in this email thread I shared the developer guide documentation which was written to define what Paths mean for GeoServer.

When someone gets a resource directly from the resource store, they are using the resource store as a black box, and the “path” is always relative to the root of the resource store.

Correct, relative paths are with respect to the resource store location. Actually when I did he QA I found that there was at one point a search path that could be scanned to allow “data directory” to be spread across several folders (terrifying).

Resource.fromURL is a different story, these are not hard-coded URL’s, they are user provided, and then we allow both the file system as the resource store to be accessed, where file: is for the file system and resource: is for the resource store.

I have not seen users use “resource:”, to my knowledge that was only used internally in the logic between in geoserver and geotools ( applied by a style visitor to help geotools resolve icons in the data directory.)

See the examples Resource.fromURL(base, url):

  • Resources.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Resources.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Resources.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Resources.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference)

See the examples Paths.convert(base, filename)

  • Paths.convert(base,file) - uses URI relativize method to determine relative path (between file and base)

  • Paths.convert(base,folder, fileLocation) - can resolve relative location, limited to content within the base directory

  • Paths.convert(base, filename)

See examples Files.fromURL( base, url)

  • Files.fromURL( null, "resource:styles/logo.svg") - internal url format restricted to data directory content

  • Files.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Files.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Files.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Files.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference ignored as we cannot determine a file)

  • Files.fromURL( baseDirectory, "sde://user:pass@anonymised.com:port") - null (custom strings are ignored as we cannot determine a file)

Folks use Paths (in the sense defined above) here also; the is the motivation for fixing absolute paths on windows (as the definition had changed in Java 11) and where the inconsistency of use was found.

I just haven’t seen any valid argument why the original design is flawed.

I do not believe the origional design was flawed, it was not consistently applied in the codebase. Where a path is used

Kind Regards

Niels

Does it help that path and absolute path is defined strictly above?

Thanks,
Jody

On 02/08/2023 16:40, Jody Garnett wrote:

Niels:

We may be on the same page, you indicate ResoueceStore should never use absolute paths for the file system.

That is the decision I wish to make, and then we should go do the work to enforce it. How much work? I am not sure, hence the work party.

There is something subtle here though. You are focused on the ResourceStore API, and I am asking you to consider the application as a whole.

We do have paths in the application including absolute paths. The QA activity to check that these are handled correctly is the work.

Paths are not only for ResourceStore.

Jody

On Tue, Aug 1, 2023 at 11:55 PM Niels Charlier <niels@anonymised.com> wrote:

Hello Jody,

Unfortunately I am not really on board with your plan. I really don’t think we need to make a new branch and do a lot of work. This seems like a lot of unnecessary work. I think the original design is fine, and does not require breaking the code base. We just need to go back to how things were and perhaps fix a few cases where people have gone wrong since then.

  • Gabe and Niels are having trouble with the API changes made earlier this year

  • The definition of path was inconsistent for windows absolute paths

I disagree. The definition of ‘path’ for the resource store is not in any way related to the OS, because the resource store lives in a separate world from the file system. The resource store and the file system have no joint concept of ‘path’.

  • Early resource store work was not strict about use of paths, and when to use Resource and when to use File

I disagree. It was always very clear about this. There should obviously never be hard coded absolute paths for the file system. In general the code should basically always use resources. Only where users can use URLs they have access to both the file system and the resource store, this is always how it was intended to work, as can be read from the javadoc.

  • The email conversation is going in circles because something has to break

  • We have to accept that something that break

I disagree. I think nearly all of the code base is fine with how it was designed to work.

Regards

Niels

On 01/08/2023 19:31, Jody Garnett wrote:

Niel, Gabe:

Our email thread on this topic has explored the topic by now, I think action rather than more communication is needed next.

Can we setup a 1/2 day breakout work-party to get this activity underway (prior to the Bolsena code sprint and the next release cycle). Even a couple hours to get the topic started on a branch would be great.

I am proposing two times below, and am willing to wake up early to see this topic move along:

  1. August 8th 15 CEST / 11 BRT / 6 PDT
  2. August 15th: 15 CEST / 11 BRT / 6 PDT

Is this a good idea :slight_smile:


Jody Garnett


Jody Garnett

Jody,

Note that the current javadoc of org.geoserver.platform.resource.Files is highly problematic.

From the original javadoc we get:

"Utility class for handling Resource paths in a consistent fashion.

This utility class is primarily aimed at implementations of ResourceStore …"

This utility class was indeed intended for resource paths and the resource store, not file paths and the file system.

"The base location is represented with “”, * relative paths are not supported.

The sentence ‘relative paths are not supported’ was meant with respect to the resource store. The meaning here is that all paths in the resource store are absolute, with respect to the REsource Store root, not the file system root!

And then comes the new sentence:

" Absolute paths are supported, with Linux systems using a leading {@code /}, and windows using * {@code L:}."

This sentence contradicts everything before it. If this is a resourcestore utility class,

why are we suddenly talking about absolute paths in the operating system? The resource store has nothing to do with this.

There never should have been anything inside this class

testing on operating system. There should never have been the method called “isAbsolute”. This should have been put inside another class that is used for file paths.

This will incredibly confuse new developers. They will not at all understand the purpose of the

resource store as an abstraction that should not be assumed to be part of the file system.

Instead of clarifying things for developers, make sure they understand how to use it and not abuse

it, it seems we are giving up on its intended purpose completely and just making the confusion

worse by writing nonsensical javadoc about it. Unless this is rolled back, we may as well

delete the ‘jdbcstore’ module and give up entirely on the resource API.

If that is the chose direction, fine, it is not up to me, but then be clear about it.

Kind Regards

Niels

···

On 03/08/2023 08:46, Niels Charlier via Geoserver-devel wrote:

Jody,

Perhaps I can offer a suggestion that may be considered a reasonable compromise.

The “resource:” URLs are not used because they were never documented in user docs. (The javadoc however clearly states that it always was the intention to be used for access to the ‘data directory’ via URLs.) I can see that one of the issues you have been working to resolve is the usage of URLs without a prefix, which are just naked paths and where there is ambiguity in interpretation.

The resourcestore is low level API only accessible to developers, not users. Users will always access it via some layer in between, like rest or something that calls Resources.fromURL. When a developer call resourcestore.get he knows he is not necessarily using the file system, and the paths should not be considered file system paths. With Resources.fromURL this is not so clear.

My suggestion is that we bring back / leave the resource store API to how it was, being lenient on the leading slash because there is no possible confusion there, and it avoids a huge amount of work. We may however agree (and document this) that the Resources.fromURL, as well as REST or other services accessible to users use the following guidelines (and we document this for both developers as well as users):

  • file: URLS will use the file system
  • resource: URLs will use the resource store
  • naked path without URL prefix: absolute paths are interpreted as file: URLs, relative paths are interpreted as resource: URLs.

This way the issue is addressed at the level where it actually occurs, and OS/filesystem stuff does not creep into the resource store API.

Kind Regards

Niels

On 03/08/2023 01:49, Jody Garnett wrote:

Niels:

Comments inline


Jody Garnett

On Aug 2, 2023 at 12:20:43 PM, Niels Charlier <niels@anonymised.com> wrote:

Hi Jody,

I do support checking everything is handled correctly.

Thanks Niels, sorry I am a bit frustrated these days so am probably not explaining clearly.

I just don’t seem the point in breaking huge amounts of code for nothing. And that is what will happen if we start throwing exceptions in the resource store if the path starts with a “/”. Which is actually normal, because the resource store has a root.

ResourceStore no longer uses / as a root.
The work done earlier this year removed the use leading / from the codebase (including the REST API Controller).

Please: define “absolute path”. Is this OS dependent? Should the behaviour of the resource store be OS dependent? What /exactly/

I put the definitions into the document I shared and the javadocs (not sure why I cannot find the javadocs online?).

When I say “path” I mean the strings used by the class Paths.isAbsolute(String)

This is covered in the developers guide here: https://docs.geoserver.org/latest/en/developer/programming-guide/config/resource.html#paths

Paths are broken down into a sequence of names, as listed by Paths.names(path):

  • Path.names("data/tasmania/roads.shp") is represented as a list of data, tasmania, roads.shp.

  • On linux Path.names("/src/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in /, src, gis, cadaster, district.geopkg.

  • On windows Path.names("D:/gis/cadaster/district.geopkg") starts with a marker to indicate an absolute path, resulting in D:/, gis, cadaster, district.geopkg.

are you suggesting should be allowed or not allowed to access a resource from the resource store?

I am suggesting paths that return true for this method be not allowed to access resources from the resource store.

Yes, “paths” are used across the application. But “path” is a vague term, ‘xpaths’ are paths too, they have no relationship with file paths. Resource paths are no different, they live in their own world and are not part of the file system. The notion of a path being ‘absolute’ or ‘relative’ has no meaning when you are talking about paths in general, this is file system terminology.

Earlier in this email thread I shared the developer guide documentation which was written to define what Paths mean for GeoServer.

When someone gets a resource directly from the resource store, they are using the resource store as a black box, and the “path” is always relative to the root of the resource store.

Correct, relative paths are with respect to the resource store location. Actually when I did he QA I found that there was at one point a search path that could be scanned to allow “data directory” to be spread across several folders (terrifying).

Resource.fromURL is a different story, these are not hard-coded URL’s, they are user provided, and then we allow both the file system as the resource store to be accessed, where file: is for the file system and resource: is for the resource store.

I have not seen users use “resource:”, to my knowledge that was only used internally in the logic between in geoserver and geotools ( applied by a style visitor to help geotools resolve icons in the data directory.)

See the examples Resource.fromURL(base, url):

  • Resources.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Resources.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Resources.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Resources.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference)

See the examples Paths.convert(base, filename)

  • Paths.convert(base,file) - uses URI relativize method to determine relative path (between file and base)

  • Paths.convert(base,folder, fileLocation) - can resolve relative location, limited to content within the base directory

  • Paths.convert(base, filename)

See examples Files.fromURL( base, url)

  • Files.fromURL( null, "resource:styles/logo.svg") - internal url format restricted to data directory content

  • Files.fromURL( null, "/src/gis/cadaster/district.geopgk") - absolute file path (linux)

  • Files.fromURL( baseDirectory, "D:\\gis\\cadaster\\district.geopkg") - absolute file path (windows)

  • Files.fromURL( baseDirectory, "file:///D:/gis/cadaster/district.geopkg") - absolute file url (windows)

  • Files.fromURL( baseDirectory, "ftp://veftp.gsfc.nasa.gov/bluemarble/") - null (external reference ignored as we cannot determine a file)

  • Files.fromURL( baseDirectory, "sde://user:pass@anonymised.com:port") - null (custom strings are ignored as we cannot determine a file)

Folks use Paths (in the sense defined above) here also; the is the motivation for fixing absolute paths on windows (as the definition had changed in Java 11) and where the inconsistency of use was found.

I just haven’t seen any valid argument why the original design is flawed.

I do not believe the origional design was flawed, it was not consistently applied in the codebase. Where a path is used

Kind Regards

Niels

Does it help that path and absolute path is defined strictly above?

Thanks,
Jody

On 02/08/2023 16:40, Jody Garnett wrote:

Niels:

We may be on the same page, you indicate ResoueceStore should never use absolute paths for the file system.

That is the decision I wish to make, and then we should go do the work to enforce it. How much work? I am not sure, hence the work party.

There is something subtle here though. You are focused on the ResourceStore API, and I am asking you to consider the application as a whole.

We do have paths in the application including absolute paths. The QA activity to check that these are handled correctly is the work.

Paths are not only for ResourceStore.

Jody

On Tue, Aug 1, 2023 at 11:55 PM Niels Charlier <niels@anonymised.com> wrote:

Hello Jody,

Unfortunately I am not really on board with your plan. I really don’t think we need to make a new branch and do a lot of work. This seems like a lot of unnecessary work. I think the original design is fine, and does not require breaking the code base. We just need to go back to how things were and perhaps fix a few cases where people have gone wrong since then.

  • Gabe and Niels are having trouble with the API changes made earlier this year

  • The definition of path was inconsistent for windows absolute paths

I disagree. The definition of ‘path’ for the resource store is not in any way related to the OS, because the resource store lives in a separate world from the file system. The resource store and the file system have no joint concept of ‘path’.

  • Early resource store work was not strict about use of paths, and when to use Resource and when to use File

I disagree. It was always very clear about this. There should obviously never be hard coded absolute paths for the file system. In general the code should basically always use resources. Only where users can use URLs they have access to both the file system and the resource store, this is always how it was intended to work, as can be read from the javadoc.

  • The email conversation is going in circles because something has to break

  • We have to accept that something that break

I disagree. I think nearly all of the code base is fine with how it was designed to work.

Regards

Niels

On 01/08/2023 19:31, Jody Garnett wrote:

Niel, Gabe:

Our email thread on this topic has explored the topic by now, I think action rather than more communication is needed next.

Can we setup a 1/2 day breakout work-party to get this activity underway (prior to the Bolsena code sprint and the next release cycle). Even a couple hours to get the topic started on a branch would be great.

I am proposing two times below, and am willing to wake up early to see this topic move along:

  1. August 8th 15 CEST / 11 BRT / 6 PDT
  2. August 15th: 15 CEST / 11 BRT / 6 PDT

Is this a good idea :slight_smile:


Jody Garnett


Jody Garnett

_______________________________________________
Geoserver-devel mailing list
[Geoserver-devel@lists.sourceforge.net](mailto:Geoserver-devel@anonymised.comsourceforge.net)
[https://lists.sourceforge.net/lists/listinfo/geoserver-devel](https://lists.sourceforge.net/lists/listinfo/geoserver-devel)

Niels:

The ResourceStore API should only use relative paths.

The test case is problematic and in error.

Removing leading slashes from resource store paths was already done.

···


Jody Garnett


Jody Garnett


Jody Garnett


Jody Garnett

Niels:

Thanks for the analysis I believe we are now working on the same problem: clarifying api design

org.geosever.platform.resource.Files and org.geoserver.platform.resource.Paths are not strictly a ResourceStore utility class. These classes helped the GeoServer codebase adjust to having ResourceStore be such a strict API.

Files and Paths utility classes work with absolute paths (because our code base does).
ResourceStore does not, and I believe we should take steps to remove the test case that assumes it is willing to. And take steps to provide a warning if an linux absolute path is used (and the ResourceStore will assume that such a path is intended to be relative to the data directory). I have found several cases where such paths were used (notably in the rest Resource Controller) which have been addressed.

···


Jody Garnett


Jody Garnett


Jody Garnett


Jody Garnett

Jody,

My apologies if I sound or have sounded agitated in this discussion. I do hope we find a solution.

Thanks for the analysis I believe we are now working on the same problem: clarifying api design

Yes, I share your goal but we don't seem to find a way to agree? I have done a proposal for a possible compromise, but afterwards I found another problem (the Paths class). If you read the entire javadoc at the top, you must agree something doesn't make sense there.

org.geosever.platform.resource.Files and org.geoserver.platform.resource.Paths are not strictly a ResourceStore utility class. These classes helped the GeoServer codebase adjust to having ResourceStore be such a strict API.

org.geoserver.platform.resource.Files is indeed about handling Files. That would have been the right place to put OS & file system related stuff. Not Paths, the original javadoc (everything apart from the sentence you added) clearly states that this is resourcestore path utility class, and that relative paths are not supported (by the resource store that is).

Your sentence about operating systems and absolute paths on the file system seems entirely out of place there.

Files and Paths utility classes work with absolute paths (because our code base does).
ResourceStore does not, and I believe we should take steps to remove the test case that assumes it is willing to.

I agree about removing the test case, but I feel you are still missing the point.

Relative/absolute have different meaning when you are talking about resource store versus file system, because they have a different root. As said, the resource store in fact *only* supports absolute paths, absolute for its own root that is. Windows paths have nothing to do with it. The resource store API should be considered as entirely separate from the file system, that is just one implementation.

Perhaps the confusion is over the "path" that is used to retrieve resources from the resource store versus the "path" that the file system resource store specifically uses to access the files behind the resources.

And take steps to provide a warning if an linux absolute path is used (and the ResourceStore will assume that such a path is intended to be relative to the data directory). I have found several cases where such paths were used (notably in the rest Resource Controller) which have been addressed.

But how so? The resource controller is for the resource store, it should not have access to the file system? That would be a security issue too.

I think what probably should have happened, is that was made sure the file system resource store specifically never uses absolute files, only files relative to the data directory (but again, just one implementation).

Warning instead of an exception is perhaps a good idea. But which classes / methods do you suggest would provide this warning?

Kind Regards

Niels