[Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

Hi dev list,

I frequently noticed, that Resource.in() just creates non existing files and then returns an empty input stream. This almost ever causes further problems like https://github.com/geoserver/geoserver/pull/6216.

I think the root cause is the implementation of Resource.in() for UNDEFINED resources.
Looking at Files.ResourceAdaptor.in() I find:

final File actualFile = file();
if (!actualFile.exists()) {
throw new IllegalStateException("Cannot access " + actualFile);
}

Seems, the initial intention was to throw an exception, but since file() creates the missing file, it does not work as intended. I analyzed the former GSIP 106, but I did not find any hints about the expected behavior here.

This can be fixed easily and in think an Exception about a missing file is much more expressive than a message from a failed content parser.

Most use cases check for Type.RESOURCE in advance or have some elaborated exception handlers and are thus unaffected. Some usages however try to handle this condition later on by using inputStream.available() like EchoParametersDao.getEchoParameters(in) does and will fail then.

I’m working on a pull request to change the behavior this way, but this causes adaptions to other related areas. Even SpringResourceAdaptor is affected, since it relays on IOExcetion which have been wrapped into IllegalStateException (and for which I think an UncheckedIOException is the better choice).

Other solutions are:

return null (can be checked afterwards but not so nice)

return an empty InputStream (Java 11 even has some: InputStream.nullInputStream())

Do you think it is worth to propose a refactoring here or should I better keep hands off?

Dieter.

···

Privacy statements

YouTube | Twitter | LinkedIn

It is worth proposing a refactoring here.

The guideline during the api developement was to optimize for the most common patterns of File usage we found in the codebase; and utillity methods such as in() for common usage patterns. Where possible we provided method name compatibility so it would be easier to migrate code from File to Resource.

I was expecting far more input on this API when it was first introduced; your input is welcome now.

For the specific example you show:

  • in() should not accidentally create a file for UNDEFINED
  • returning an empty input stream would be okay - less for client code to handle. I understand that failed parsing expression is not so fun
  • throwing UncheckedIOException is okay - concerned it would lead to a bunch of boiler plate code try / catch being introduced

Since we have the whole codebase available it should be easy to access what this change would require and answer the concerns above.

Whatever change is made I ask that the javadocs be clearly updated.

···


Jody Garnett

The primary change (throwing an Exception for missing files) is quite easy, but I found lots of unit tests failing thereafter.

Some are quite easy to fix, others require some refactorings.

I’m a bit in struggle with EchoParametersDao and RulesDao. Those commonly use already opened InputStreams to read.

Unfortunately it unclear who has to close them again and if try/resource block can be used.

In the context of GEOS-9882 the InputStream was turned into a Supplier

https://github.com/geoserver/geoserver/commit/9a8e4c30#diff-4b4acded1754ed2027dfc585dcfe55d05348911be59e55624c70b65fb6a815bbR53

May be this was related to: https://github.com/pmd/pmd/issues/3235

I think the Resource must be forwarded down to the read function itself to keep the try/catch block as small as possible.

(Similar argument for OutputStream here)

I found, that the InputStream becomes necessary by TestSupport. doWork only, which works on File instead of Resource.

And here we find another try-resource block again. This shows, how fragile it is to pass an open stream.

Does it make sense to wrap the file using Files.asResource into a Resource here?

Anyway: is the URL returned by Class.getResource always a File if it comes from a JAR instead?

I think It’s worth to open a separate subtask to refactor this at first.

Dieter.

···

Von: Jody Garnett jody.garnett@anonymised.com
Gesendet: Freitag, 21. Oktober 2022 21:58
An: Dieter Stüken - con terra GmbH d.stueken@anonymised.com
Cc: GeoServer geoserver-devel@lists.sourceforge.net
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

It is worth proposing a refactoring here.

The guideline during the api developement was to optimize for the most common patterns of File usage we found in the codebase; and utillity methods such as in() for common usage patterns. Where possible we provided method name compatibility so it would be easier to migrate code from File to Resource.

I was expecting far more input on this API when it was first introduced; your input is welcome now.

For the specific example you show:

  • in() should not accidentally create a file for UNDEFINED

  • returning an empty input stream would be okay - less for client code to handle. I understand that failed parsing expression is not so fun

  • throwing UncheckedIOException is okay - concerned it would lead to a bunch of boiler plate code try / catch being introduced

Since we have the whole codebase available it should be easy to access what this change would require and answer the concerns above.

Whatever change is made I ask that the javadocs be clearly updated.

Jody

On Fri, Oct 21, 2022 at 12:03 PM Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@lists.sourceforge.net> wrote:

Hi dev list,

I frequently noticed, that Resource.in() just creates non existing files and then returns an empty input stream. This almost ever causes further problems like https://github.com/geoserver/geoserver/pull/6216.

I think the root cause is the implementation of Resource.in() for UNDEFINED resources.
Looking at Files.ResourceAdaptor.in() I find:

final File actualFile = file();
if (!actualFile.exists()) {
throw new IllegalStateException("Cannot access " + actualFile);
}

Seems, the initial intention was to throw an exception, but since file() creates the missing file, it does not work as intended. I analyzed the former GSIP 106, but I did not find any hints about the expected behavior here.

This can be fixed easily and in think an Exception about a missing file is much more expressive than a message from a failed content parser.

Most use cases check for Type.RESOURCE in advance or have some elaborated exception handlers and are thus unaffected. Some usages however try to handle this condition later on by using inputStream.available() like EchoParametersDao.getEchoParameters(in) does and will fail then.

I’m working on a pull request to change the behavior this way, but this causes adaptions to other related areas. Even SpringResourceAdaptor is affected, since it relays on IOExcetion which have been wrapped into IllegalStateException (and for which I think an UncheckedIOException is the better choice).

Other solutions are:

return null (can be checked afterwards but not so nice)

return an empty InputStream (Java 11 even has some: InputStream.nullInputStream())

Do you think it is worth to propose a refactoring here or should I better keep hands off?

Dieter.

Dieter Stüken

Software Engineer

Nature Environment and Resources

T +49 251 59689 403

d.stueken@anonymised.com

con terra GmbH

Martin-Luther-King-Weg 20

48155 Münster

conterra.de

Geschäftsführung: Karl Wiesmann, Uwe König

HRB 4149, Amtsgericht Münster

Privacy statements

YouTube | Twitter | LinkedIn


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Jody Garnett

Good discussion, feedback inline:

On Wed, 26 Oct 2022 at 03:13, Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@anonymised.comt> wrote:

The primary change (throwing an Exception for missing files) is quite easy, but I found lots of unit tests failing thereafter.

Some are quite easy to fix, others require some refactorings.

So this is going to be a judgement call:

  • Is it more clear to have an exception (forcing client code to handle) or runtime exception?
  • Is returning an empty inputstream going to be kinder / easier for code to manage?

I’m a bit in struggle with EchoParametersDao and RulesDao. Those commonly use already opened InputStreams to read.

Unfortunately it unclear who has to close them again and if try/resource block can be used.

In the context of GEOS-9882 the InputStream was turned into a Supplier

https://github.com/geoserver/geoserver/commit/9a8e4c30#diff-4b4acded1754ed2027dfc585dcfe55d05348911be59e55624c70b65fb6a815bbR53

May be this was related to: https://github.com/pmd/pmd/issues/3235

I think the Resource must be forwarded down to the read function itself to keep the try/catch block as small as possible.

(Similar argument for OutputStream here)

This is an example where returning an empty inputstream would smoothly allow the existing code to function. The resulting stream.available() would always return 0 producing the desired functionality.

However I like your approach of using Resource in the method api: RulesDao getRules() / RulesDao.saveOrUpdate() / deleteRules() …

I found, that the InputStream becomes necessary by TestSupport. doWork only, which works on File instead of Resource.

And here we find another try-resource block again. This shows, how fragile it is to pass an open stream.

That is a good one to fix; I was less concerned with the test cases using Files/

Does it make sense to wrap the file using Files.asResource into a Resource here?

Yes, it is why the adaptor was created, to help test API in isolation.

Anyway: is the URL returned by Class.getResource always a File if it comes from a JAR instead?

I think it is when working with resources from your own module; but when working with resources from a jar it is best to use the GeoTools TestSupport class (which can unpack jar resources into a tmp file if needed).

I think It’s worth to open a separate subtask to refactor this at first.

Agreed, thanks for digging into this Dieter it is a good maintenance activity that will benefit the codebase.

Jody

Dieter.

Von: Jody Garnett <jody.garnett@anonymised.com>
Gesendet: Freitag, 21. Oktober 2022 21:58
An: Dieter Stüken - con terra GmbH <d.stueken@anonymised.com>
Cc: GeoServer <geoserver-devel@lists.sourceforge.net>
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

It is worth proposing a refactoring here.

The guideline during the api developement was to optimize for the most common patterns of File usage we found in the codebase; and utillity methods such as in() for common usage patterns. Where possible we provided method name compatibility so it would be easier to migrate code from File to Resource.

I was expecting far more input on this API when it was first introduced; your input is welcome now.

For the specific example you show:

  • in() should not accidentally create a file for UNDEFINED

  • returning an empty input stream would be okay - less for client code to handle. I understand that failed parsing expression is not so fun

  • throwing UncheckedIOException is okay - concerned it would lead to a bunch of boiler plate code try / catch being introduced

Since we have the whole codebase available it should be easy to access what this change would require and answer the concerns above.

Whatever change is made I ask that the javadocs be clearly updated.

Jody

On Fri, Oct 21, 2022 at 12:03 PM Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@lists.sourceforge.net> wrote:

Hi dev list,

I frequently noticed, that Resource.in() just creates non existing files and then returns an empty input stream. This almost ever causes further problems like https://github.com/geoserver/geoserver/pull/6216.

I think the root cause is the implementation of Resource.in() for UNDEFINED resources.
Looking at Files.ResourceAdaptor.in() I find:

final File actualFile = file();
if (!actualFile.exists()) {
throw new IllegalStateException("Cannot access " + actualFile);
}

Seems, the initial intention was to throw an exception, but since file() creates the missing file, it does not work as intended. I analyzed the former GSIP 106, but I did not find any hints about the expected behavior here.

This can be fixed easily and in think an Exception about a missing file is much more expressive than a message from a failed content parser.

Most use cases check for Type.RESOURCE in advance or have some elaborated exception handlers and are thus unaffected. Some usages however try to handle this condition later on by using inputStream.available() like EchoParametersDao.getEchoParameters(in) does and will fail then.

I’m working on a pull request to change the behavior this way, but this causes adaptions to other related areas. Even SpringResourceAdaptor is affected, since it relays on IOExcetion which have been wrapped into IllegalStateException (and for which I think an UncheckedIOException is the better choice).

Other solutions are:

return null (can be checked afterwards but not so nice)

return an empty InputStream (Java 11 even has some: InputStream.nullInputStream())

Do you think it is worth to propose a refactoring here or should I better keep hands off?

Dieter.

Dieter Stüken

Software Engineer

Nature Environment and Resources

T +49 251 59689 403

d.stueken@anonymised.com…514…

con terra GmbH

Martin-Luther-King-Weg 20

48155 Münster

conterra.de

Geschäftsführung: Karl Wiesmann, Uwe König

HRB 4149, Amtsgericht Münster

Privacy statements

YouTube | Twitter | LinkedIn


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Jody Garnett


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Throwing an Exception seems to be the better solution, I think.

So you can differ between an empty file and a not existing file.

I discovered some possible bug at:

https://github.com/geoserver/geoserver/blob/main/src/main/src/main/java/org/geoserver/config/GeoServerPropertyConfigurer.java#L104

Here an IOException is expected if the file is missing.

Currently an empty file is silently created and I think this was not intended.

If an IllegalStateException is thrown, it does not work well, too.

This is the reason for [GEOS-10724].

I just try to find a unit test for GeoServerPropertyConfigurer.loadProperties with missing files.

The existence can also be checked before without depending on an IOException.

Should empty config files also be replaced here?

Dieter.

···

Von: Jody Garnett jody.garnett@anonymised.com
Gesendet: Donnerstag, 27. Oktober 2022 01:22
An: Dieter Stüken - con terra GmbH d.stueken@anonymised.com
Cc: geoserver-devel@lists.sourceforge.net
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

Good discussion, feedback inline:

On Wed, 26 Oct 2022 at 03:13, Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@lists.sourceforge.net> wrote:

The primary change (throwing an Exception for missing files) is quite easy, but I found lots of unit tests failing thereafter.

Some are quite easy to fix, others require some refactorings.

So this is going to be a judgement call:

  • Is it more clear to have an exception (forcing client code to handle) or runtime exception?

  • Is returning an empty inputstream going to be kinder / easier for code to manage?

I’m a bit in struggle with EchoParametersDao and RulesDao. Those commonly use already opened InputStreams to read.

Unfortunately it unclear who has to close them again and if try/resource block can be used.

In the context of GEOS-9882 the InputStream was turned into a Supplier

https://github.com/geoserver/geoserver/commit/9a8e4c30#diff-4b4acded1754ed2027dfc585dcfe55d05348911be59e55624c70b65fb6a815bbR53

May be this was related to: https://github.com/pmd/pmd/issues/3235

I think the Resource must be forwarded down to the read function itself to keep the try/catch block as small as possible.

(Similar argument for OutputStream here)

This is an example where returning an empty inputstream would smoothly allow the existing code to function. The resulting stream.available() would always return 0 producing the desired functionality.

However I like your approach of using Resource in the method api: RulesDao getRules() / RulesDao.saveOrUpdate() / deleteRules() …

I found, that the InputStream becomes necessary by TestSupport. doWork only, which works on File instead of Resource.

And here we find another try-resource block again. This shows, how fragile it is to pass an open stream.

That is a good one to fix; I was less concerned with the test cases using Files/

Does it make sense to wrap the file using Files.asResource into a Resource here?

Yes, it is why the adaptor was created, to help test API in isolation.

Anyway: is the URL returned by Class.getResource always a File if it comes from a JAR instead?

I think it is when working with resources from your own module; but when working with resources from a jar it is best to use the GeoTools TestSupport class (which can unpack jar resources into a tmp file if needed).

I think It’s worth to open a separate subtask to refactor this at first.

Agreed, thanks for digging into this Dieter it is a good maintenance activity that will benefit the codebase.

Jody

Dieter.

Von: Jody Garnett <jody.garnett@anonymised.com>
Gesendet: Freitag, 21. Oktober 2022 21:58
An: Dieter Stüken - con terra GmbH <d.stueken@anonymised.com>
Cc: GeoServer <geoserver-devel@lists.sourceforge.net>
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

It is worth proposing a refactoring here.

The guideline during the api developement was to optimize for the most common patterns of File usage we found in the codebase; and utillity methods such as in() for common usage patterns. Where possible we provided method name compatibility so it would be easier to migrate code from File to Resource.

I was expecting far more input on this API when it was first introduced; your input is welcome now.

For the specific example you show:

  • in() should not accidentally create a file for UNDEFINED

  • returning an empty input stream would be okay - less for client code to handle. I understand that failed parsing expression is not so fun

  • throwing UncheckedIOException is okay - concerned it would lead to a bunch of boiler plate code try / catch being introduced

Since we have the whole codebase available it should be easy to access what this change would require and answer the concerns above.

Whatever change is made I ask that the javadocs be clearly updated.

Jody

On Fri, Oct 21, 2022 at 12:03 PM Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@lists.sourceforge.net> wrote:

Hi dev list,

I frequently noticed, that Resource.in() just creates non existing files and then returns an empty input stream. This almost ever causes further problems like https://github.com/geoserver/geoserver/pull/6216.

I think the root cause is the implementation of Resource.in() for UNDEFINED resources.
Looking at Files.ResourceAdaptor.in() I find:

final File actualFile = file();
if (!actualFile.exists()) {
throw new IllegalStateException("Cannot access " + actualFile);
}

Seems, the initial intention was to throw an exception, but since file() creates the missing file, it does not work as intended. I analyzed the former GSIP 106, but I did not find any hints about the expected behavior here.

This can be fixed easily and in think an Exception about a missing file is much more expressive than a message from a failed content parser.

Most use cases check for Type.RESOURCE in advance or have some elaborated exception handlers and are thus unaffected. Some usages however try to handle this condition later on by using inputStream.available() like EchoParametersDao.getEchoParameters(in) does and will fail then.

I’m working on a pull request to change the behavior this way, but this causes adaptions to other related areas. Even SpringResourceAdaptor is affected, since it relays on IOExcetion which have been wrapped into IllegalStateException (and for which I think an UncheckedIOException is the better choice).

Other solutions are:

return null (can be checked afterwards but not so nice)

return an empty InputStream (Java 11 even has some: InputStream.nullInputStream())

Do you think it is worth to propose a refactoring here or should I better keep hands off?

Dieter.

Dieter Stüken

Software Engineer

Nature Environment and Resources

T +49 251 59689 403

d.stueken@anonymised.com

con terra GmbH

Martin-Luther-King-Weg 20

48155 Münster

conterra.de

Geschäftsführung: Karl Wiesmann, Uwe König

HRB 4149, Amtsgericht Münster

Privacy statements

YouTube | Twitter | LinkedIn


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Jody Garnett


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Yes it looks like an empty file should be created; the comments added to the top of the file probably form some instructions for use / customization.

···


Jody Garnett

Hi Jody,

after the pull requests for [GEOS-10723] [GEOS-10724] and [GEOS-10743] accepted, we may also close the issues, doo.

I went back to [10705] again and started another PR #6570.

The more I dive into the Resource API the more questions arise.

After reading the GSIPs about it, I think I should probably have made a separate GSIP from it.

Solving the Resource.in() problem, I noticed, that calling Resorce.file() raises similar problems.

Probably we need a hint whether the intention is to read the file or to create the file.

But this really breaks the current API.

I also just detected the jdbcstore community module. I’m currently about to understand it….

Then I think about the IllegalStateException thrown.

This should possibly be an UncheckedIOException (since 1.8) which definitely covers the underlying FileNotFound Exception.

Don’t see if this concerns any error handling.

And then the idea of using Optional or Consumer to hide the sensitive parts of dealing with the Inputstream.

This may also improve many of the deeply nested error handlers.

Regards, Dieter.

···

Von: Jody Garnett jody.garnett@anonymised.com
Gesendet: Freitag, 21. Oktober 2022 21:58
An: Dieter Stüken - con terra GmbH d.stueken@anonymised.com
Cc: GeoServer geoserver-devel@lists.sourceforge.net
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

It is worth proposing a refactoring here.

The guideline during the api developement was to optimize for the most common patterns of File usage we found in the codebase; and utillity methods such as in() for common usage patterns. Where possible we provided method name compatibility so it would be easier to migrate code from File to Resource.

I was expecting far more input on this API when it was first introduced; your input is welcome now.

For the specific example you show:

  • in() should not accidentally create a file for UNDEFINED

  • returning an empty input stream would be okay - less for client code to handle. I understand that failed parsing expression is not so fun

  • throwing UncheckedIOException is okay - concerned it would lead to a bunch of boiler plate code try / catch being introduced

Since we have the whole codebase available it should be easy to access what this change would require and answer the concerns above.

Whatever change is made I ask that the javadocs be clearly updated.

Jody

On Fri, Oct 21, 2022 at 12:03 PM Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@lists.sourceforge.net> wrote:

Hi dev list,

I frequently noticed, that Resource.in() just creates non existing files and then returns an empty input stream. This almost ever causes further problems like https://github.com/geoserver/geoserver/pull/6216.

I think the root cause is the implementation of Resource.in() for UNDEFINED resources.
Looking at Files.ResourceAdaptor.in() I find:

final File actualFile = file();
if (!actualFile.exists()) {
throw new IllegalStateException("Cannot access " + actualFile);
}

Seems, the initial intention was to throw an exception, but since file() creates the missing file, it does not work as intended. I analyzed the former GSIP 106, but I did not find any hints about the expected behavior here.

This can be fixed easily and in think an Exception about a missing file is much more expressive than a message from a failed content parser.

Most use cases check for Type.RESOURCE in advance or have some elaborated exception handlers and are thus unaffected. Some usages however try to handle this condition later on by using inputStream.available() like EchoParametersDao.getEchoParameters(in) does and will fail then.

I’m working on a pull request to change the behavior this way, but this causes adaptions to other related areas. Even SpringResourceAdaptor is affected, since it relays on IOExcetion which have been wrapped into IllegalStateException (and for which I think an UncheckedIOException is the better choice).

Other solutions are:

return null (can be checked afterwards but not so nice)

return an empty InputStream (Java 11 even has some: InputStream.nullInputStream())

Do you think it is worth to propose a refactoring here or should I better keep hands off?

Dieter.

Dieter Stüken

Software Engineer

Nature Environment and Resources

T +49 251 59689 403

d.stueken@anonymised.com

con terra GmbH

Martin-Luther-King-Weg 20

48155 Münster

conterra.de

Geschäftsführung: Karl Wiesmann, Uwe König

HRB 4149, Amtsgericht Münster

Privacy statements

YouTube | Twitter | LinkedIn


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Jody Garnett

Thanks!

Dieter you have prepared a number of fixes now; may I add you to the correct jira group so I can credit you as the person doing the fix?
This also let’s you indicate when you are working on an issue (ie in progress).

···


Jody Garnett

Dieter:

For Resource.file() we are stuck, the API allows this use in order to create a file (and any directories required for that file location) in order to have a file to write out to.

···


Jody Garnett

Yes, exactly.

My IDE finds 163 usages. Most of them are clearly read attempts.

What is missing is some hint about the intention (read or write).

Currently we have no change, since the API promises to create the file.

But I think this is misleading. I begin to understand the background of Resource related to JDBCResource.

In this case a file is created on demand from a database resource.

But the relevant part is creating any missing directory structure, I think.

Is there any description about what JDBCResource exactly does?

If a Resource on the database was found, creating the associated file seems OK.

If not, does it make sense to create an empty file in advance?

If the file Is written using out(), is it finally written back to the database on close()?

A possible refactoring process might be to introduce a method like file(boolean create).

At first we must define file() → file(true), to retain the API.

But we may change all those obvious read attempts to use file(false) in a first step.

Later on we can analyze the remaining calls.

I observed, that FileSystemResource and ResourceAdaptor have a lot of duplicate code in common.

Is this intended? ResourceAdaptor got some improvements which are not propagated to FileSystemResource.

I may have some suggestions to refactor the duplicate code…

I also try to understand the lock() system.

I read something about previous problems and I get the feeling there is still a race condition.

But I have to analyze this a bit deeper….

And some suggestion: I like to replace the IllegalStateException by UncheckedIOException (since 1.8).

I hope I’m not too annoying with all my overwhelming visions :blush:

Regards, Dieter.

···

Von: Jody Garnett <jody.garnett@…403…>
Gesendet: Samstag, 4. Februar 2023 09:17
An: Dieter Stüken - con terra GmbH <d.stueken@…514…>
Cc: GeoServer geoserver-devel@lists.sourceforge.net
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

Dieter:

For Resource.file() we are stuck, the API allows this use in order to create a file (and any directories required for that file location) in order to have a file to write out to.

Jody Garnett

On Fri, Feb 3, 2023 at 6:24 PM Dieter Stüken - con terra GmbH <d.stueken@…514…> wrote:

Hi Jody,

after the pull requests for [GEOS-10723] [GEOS-10724] and [GEOS-10743] accepted, we may also close the issues, doo.

I went back to [10705] again and started another PR #6570.

The more I dive into the Resource API the more questions arise.

After reading the GSIPs about it, I think I should probably have made a separate GSIP from it.

Solving the Resource.in() problem, I noticed, that calling Resorce.file() raises similar problems.

Probably we need a hint whether the intention is to read the file or to create the file.

But this really breaks the current API.

I also just detected the jdbcstore community module. I’m currently about to understand it….

Then I think about the IllegalStateException thrown.

This should possibly be an UncheckedIOException (since 1.8) which definitely covers the underlying FileNotFound Exception.

Don’t see if this concerns any error handling.

And then the idea of using Optional or Consumer to hide the sensitive parts of dealing with the Inputstream.

This may also improve many of the deeply nested error handlers.

Regards, Dieter.

Von: Jody Garnett <jody.garnett@…403…>
Gesendet: Freitag, 21. Oktober 2022 21:58
An: Dieter Stüken - con terra GmbH <d.stueken@…514…>
Cc: GeoServer <geoserver-devel@lists.sourceforge.net>
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

It is worth proposing a refactoring here.

The guideline during the api developement was to optimize for the most common patterns of File usage we found in the codebase; and utillity methods such as in() for common usage patterns. Where possible we provided method name compatibility so it would be easier to migrate code from File to Resource.

I was expecting far more input on this API when it was first introduced; your input is welcome now.

For the specific example you show:

  • in() should not accidentally create a file for UNDEFINED

  • returning an empty input stream would be okay - less for client code to handle. I understand that failed parsing expression is not so fun

  • throwing UncheckedIOException is okay - concerned it would lead to a bunch of boiler plate code try / catch being introduced

Since we have the whole codebase available it should be easy to access what this change would require and answer the concerns above.

Whatever change is made I ask that the javadocs be clearly updated.

Jody

On Fri, Oct 21, 2022 at 12:03 PM Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@lists.sourceforge.net> wrote:

Hi dev list,

I frequently noticed, that Resource.in() just creates non existing files and then returns an empty input stream. This almost ever causes further problems like https://github.com/geoserver/geoserver/pull/6216.

I think the root cause is the implementation of Resource.in() for UNDEFINED resources.
Looking at Files.ResourceAdaptor.in() I find:

final File actualFile = file();
if (!actualFile.exists()) {
throw new IllegalStateException("Cannot access " + actualFile);
}

Seems, the initial intention was to throw an exception, but since file() creates the missing file, it does not work as intended. I analyzed the former GSIP 106, but I did not find any hints about the expected behavior here.

This can be fixed easily and in think an Exception about a missing file is much more expressive than a message from a failed content parser.

Most use cases check for Type.RESOURCE in advance or have some elaborated exception handlers and are thus unaffected. Some usages however try to handle this condition later on by using inputStream.available() like EchoParametersDao.getEchoParameters(in) does and will fail then.

I’m working on a pull request to change the behavior this way, but this causes adaptions to other related areas. Even SpringResourceAdaptor is affected, since it relays on IOExcetion which have been wrapped into IllegalStateException (and for which I think an UncheckedIOException is the better choice).

Other solutions are:

return null (can be checked afterwards but not so nice)

return an empty InputStream (Java 11 even has some: InputStream.nullInputStream())

Do you think it is worth to propose a refactoring here or should I better keep hands off?

Dieter.

Dieter Stüken

Software Engineer

Nature Environment and Resources

T +49 251 59689 403

d.stueken@…514…

con terra GmbH

Martin-Luther-King-Weg 20

48155 Münster

conterra.de

Geschäftsführung: Karl Wiesmann, Uwe König

HRB 4149, Amtsgericht Münster

Privacy statements

YouTube | Twitter | LinkedIn


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Jody Garnett

Dieter:

First up you are absolutely not “annoying” with all your visions. Indeed I am quite happy to have such a conversation and feel it it excellent to have such discussion with a) the initial migration completed so we can see how the API is used b) someone (you) passionate / interested / caring.

You may wish to try and attend one of the bi-weekly meetings for a more interactive discussion.

I will try and respond to some of your ideas “inline” as part of this reply.

My IDE finds 163 usages. Most of them are clearly read attempts.

What is missing is some hint about the intention (read or write).

Indeed if these are clearly read attempts then it would be good if they used in() or checked if the resource is a FILE before attempting a read.

We have the file() method for when we work with systems like freemarker templates which only understand a java File reference. Any GeoServer code should be possible to refactor to use in().

Currently we have no change, since the API promises to create the file.

But I think this is misleading. I begin to understand the background of Resource related to JDBCResource.

In this case a file is created on demand from a database resource.

But the relevant part is creating any missing directory structure, I think.

There are several relevant parts, creating the missing directory structure and creating the file avoid lots of code having to make such work.

Is there any description about what JDBCResource exactly does?

It should be an implementation of the Resource API backed by JDBC Blobs rather than files. In the case where a file is needed (for example freemarker template) the file contents are “unpacked” from the database blob into a file on disk for use. Indeed in a cluster each geoserver blob would unpack such a file for use.

If a Resource on the database was found, creating the associated file seems OK.

If not, does it make sense to create an empty file in advance?

If the file is being created to write to then indeed it should be created.

If the file Is written using out(), is it finally written back to the database on close()?

Yes.

A possible refactoring process might be to introduce a method like file(boolean create).

I am not sure this is needed. We do have some utility methods for common cases that could be used. I wish to have the minimal number of methods for resource store api implementations to manage.

At first we must define file() → file(true), to retain the API.

But we may change all those obvious read attempts to use file(false) in a first step.

Later on we can analyze the remaining calls.

Could we make a summary of how many can be refactored to use out(), and make the rest use:

  • Resources.find(resource)

  • Resources.find(resource, boolean)

Which behave in the manner you are proposing.

I observed, that FileSystemResource and ResourceAdaptor have a lot of duplicate code in common.

Is this intended?

ResourceAdaptor is intended to be quite independent, used for test cases and so forth.

ResourceAdaptor got some improvements which are not propagated to FileSystemResource.

I may have some suggestions to refactor the duplicate code…

Adding some package visible methods to Files class would be an appropriate location.

I also try to understand the lock() system.

I am curious what you will find.

Another thing: If you are brave, the first thing you will notice is a terrible simulation of the Java file watching system. We were using an older version of Java that did not yet have WatchService when the API was implemented. What is there now was designed to be thrown away and replaced with WatchService when we upgraded to Java 7 :slight_smile: The style of events is similar.

I read something about previous problems and I get the feeling there is still a race condition.

But I have to analyze this a bit deeper….

And some suggestion: I like to replace the IllegalStateException by UncheckedIOException (since 1.8).

I think that will be fine; it will not be so confusing for JDBC implementation.


Jody

Hi Jody,

I get the impression I just scratched the tip of the iceberg :thinking:

Resource.dir() creates a directory or throws an error.

Resource.file() creates a file from an external resource if it exists.

For plain filesystem resources however an empty file is created (if missing), but I still think this is questionable.

Using out() creates the necessary directory and the file.

There are very few places, where file() is used as output file to be opened directly, where the parent directory is expected to exist.

In general I think exposing file() and in() or out() must be avoided as much as possible.

Many pattern are like: Resources.copy(input.file(), output) or Resources.copy(input.in(), output).

Those might be replaced by Resource.copy(Resource in, Resource out), but there arises a problem:

While copy(File, Resource) assumes its output resource to be a directory and creates an output file with the same name,

Resource.copy(InputStream, Resource) takes the output resource as file resource directly.

OK, we have Resource.copy(InputStream, Resource, name) to use a target directory instead.

Resource.copy(Resource, Resource) claims to support directories, but only for input.

Currently there is no easy way, to copy a file resource into a target directory.

It might be possible, to check if the output resource is a directory, but this is still ambiguous if both are directories.

May be we simply need an extra method copyInto(Resource any, Resource directory) to make it explicit.

This way about 65 usages of r.file() and r.in() can be avoided (there is soooo much boilerplate code to find).

You may also have a look at my Pull Request #6570 which addresses a similar problem with XStreamPersister.load().

Concerning the Resource.Lock:

On FileSystemResource.in() the lock is created in advance and I worried about what happens on error.

But I did not see the final catch (FileNotFoundException e), where the lock is released, again.

But what happens on any other exception?

Why not create the lock within the FileInputStream constructor AFTER calling super(file) succeeds.

And I frequently see: finally() { lock.release(); }

Why not Lock extends Closeable?

Dieter.

···

Von: Jody Garnett <jody.garnett@…403…>
Gesendet: Samstag, 4. Februar 2023 17:11
An: Dieter Stüken - con terra GmbH <d.stueken@…514…>
Cc: GeoServer geoserver-devel@lists.sourceforge.net
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

Dieter:

First up you are absolutely not “annoying” with all your visions. Indeed I am quite happy to have such a conversation and feel it it excellent to have such discussion with a) the initial migration completed so we can see how the API is used b) someone (you) passionate / interested / caring.

You may wish to try and attend one of the bi-weekly meetings for a more interactive discussion.

I will try and respond to some of your ideas “inline” as part of this reply.

My IDE finds 163 usages. Most of them are clearly read attempts.

What is missing is some hint about the intention (read or write).

Indeed if these are clearly read attempts then it would be good if they used in() or checked if the resource is a FILE before attempting a read.

We have the file() method for when we work with systems like freemarker templates which only understand a java File reference. Any GeoServer code should be possible to refactor to use in().

Currently we have no change, since the API promises to create the file.

But I think this is misleading. I begin to understand the background of Resource related to JDBCResource.

In this case a file is created on demand from a database resource.

But the relevant part is creating any missing directory structure, I think.

There are several relevant parts, creating the missing directory structure and creating the file avoid lots of code having to make such work.

Is there any description about what JDBCResource exactly does?

It should be an implementation of the Resource API backed by JDBC Blobs rather than files. In the case where a file is needed (for example freemarker template) the file contents are “unpacked” from the database blob into a file on disk for use. Indeed in a cluster each geoserver blob would unpack such a file for use.

If a Resource on the database was found, creating the associated file seems OK.

If not, does it make sense to create an empty file in advance?

If the file is being created to write to then indeed it should be created.

If the file Is written using out(), is it finally written back to the database on close()?

Yes.

A possible refactoring process might be to introduce a method like file(boolean create).

I am not sure this is needed. We do have some utility methods for common cases that could be used. I wish to have the minimal number of methods for resource store api implementations to manage.

At first we must define file() → file(true), to retain the API.

But we may change all those obvious read attempts to use file(false) in a first step.

Later on we can analyze the remaining calls.

Could we make a summary of how many can be refactored to use out(), and make the rest use:

  • Resources.find(resource)

  • Resources.find(resource, boolean)

Which behave in the manner you are proposing.

I observed, that FileSystemResource and ResourceAdaptor have a lot of duplicate code in common.

Is this intended?

ResourceAdaptor is intended to be quite independent, used for test cases and so forth.

ResourceAdaptor got some improvements which are not propagated to FileSystemResource.

I may have some suggestions to refactor the duplicate code…

Adding some package visible methods to Files class would be an appropriate location.

I also try to understand the lock() system.

I am curious what you will find.

Another thing: If you are brave, the first thing you will notice is a terrible simulation of the Java file watching system. We were using an older version of Java that did not yet have WatchService when the API was implemented. What is there now was designed to be thrown away and replaced with WatchService when we upgraded to Java 7 :slight_smile: The style of events is similar.

I read something about previous problems and I get the feeling there is still a race condition.

But I have to analyze this a bit deeper….

And some suggestion: I like to replace the IllegalStateException by UncheckedIOException (since 1.8).

I think that will be fine; it will not be so confusing for JDBC implementation.

Jody

Hey Dieter:

Checking-in on this email thread - you are welcome to make a GSIP page now as it could help capture your understanding and improvements you have made thus far (and are considering). Indeed I recommend you do this as you are asking more and more interesting questions :slight_smile:

Do you wish me to review #6570 - or are you still thinking through options…

···


Jody Garnett

Hi Jody,

I think a GSIP will be helpful to review the Resource package in general.

For #6570 we have the option to accept the first commit (don’t create input files) only.

This throws an exception if the input file is missing and it is quite small.

Together with the other (already approved) fixes, it passes all unit tests now.

Then we may close GEOS-10705 with a hint to the GSIP.

The second commit proposes a change concerning XStreamPersister.

This needs some more reviews and possibly a discussion about using a BufferedInputStream in general.

But I get the impression, this is only a first example of the ideas for the new GSIP.

BTW. I tried to test all extension and community modules, too. But I get strange errors then.

Are there any GitHub-CI jobs running on those modules?

Regards, Dieter.

···

Von: Jody Garnett jody.garnett@anonymised.com
Gesendet: Mittwoch, 15. Februar 2023 18:17
An: Dieter Stüken - con terra GmbH d.stueken@anonymised.com
Cc: GeoServer geoserver-devel@lists.sourceforge.net
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

Hey Dieter:

Checking-in on this email thread - you are welcome to make a GSIP page now as it could help capture your understanding and improvements you have made thus far (and are considering). Indeed I recommend you do this as you are asking more and more interesting questions :slight_smile:

Do you wish me to review #6570 - or are you still thinking through options…

Jody Garnett

On Fri, Feb 3, 2023 at 9:24 AM Dieter Stüken - con terra GmbH <d.stueken@anonymised.com> wrote:

Hi Jody,

after the pull requests for [GEOS-10723] [GEOS-10724] and [GEOS-10743] accepted, we may also close the issues, doo.

I went back to [10705] again and started another PR #6570.

The more I dive into the Resource API the more questions arise.

After reading the GSIPs about it, I think I should probably have made a separate GSIP from it.

Solving the Resource.in() problem, I noticed, that calling Resorce.file() raises similar problems.

Probably we need a hint whether the intention is to read the file or to create the file.

But this really breaks the current API.

I also just detected the jdbcstore community module. I’m currently about to understand it….

Then I think about the IllegalStateException thrown.

This should possibly be an UncheckedIOException (since 1.8) which definitely covers the underlying FileNotFound Exception.

Don’t see if this concerns any error handling.

And then the idea of using Optional or Consumer to hide the sensitive parts of dealing with the Inputstream.

This may also improve many of the deeply nested error handlers.

Regards, Dieter.

Von: Jody Garnett <jody.garnett@anonymised.com>
Gesendet: Freitag, 21. Oktober 2022 21:58
An: Dieter Stüken - con terra GmbH <d.stueken@anonymised.com>
Cc: GeoServer <geoserver-devel@lists.sourceforge.net>
Betreff: Re: [Geoserver-devel] (GEOS-10705) Loading missing resources creates unsolicited empty files

It is worth proposing a refactoring here.

The guideline during the api developement was to optimize for the most common patterns of File usage we found in the codebase; and utillity methods such as in() for common usage patterns. Where possible we provided method name compatibility so it would be easier to migrate code from File to Resource.

I was expecting far more input on this API when it was first introduced; your input is welcome now.

For the specific example you show:

  • in() should not accidentally create a file for UNDEFINED

  • returning an empty input stream would be okay - less for client code to handle. I understand that failed parsing expression is not so fun

  • throwing UncheckedIOException is okay - concerned it would lead to a bunch of boiler plate code try / catch being introduced

Since we have the whole codebase available it should be easy to access what this change would require and answer the concerns above.

Whatever change is made I ask that the javadocs be clearly updated.

Jody

On Fri, Oct 21, 2022 at 12:03 PM Dieter Stüken - con terra GmbH via Geoserver-devel <geoserver-devel@lists.sourceforge.net> wrote:

Hi dev list,

I frequently noticed, that Resource.in() just creates non existing files and then returns an empty input stream. This almost ever causes further problems like https://github.com/geoserver/geoserver/pull/6216.

I think the root cause is the implementation of Resource.in() for UNDEFINED resources.
Looking at Files.ResourceAdaptor.in() I find:

final File actualFile = file();
if (!actualFile.exists()) {
throw new IllegalStateException("Cannot access " + actualFile);
}

Seems, the initial intention was to throw an exception, but since file() creates the missing file, it does not work as intended. I analyzed the former GSIP 106, but I did not find any hints about the expected behavior here.

This can be fixed easily and in think an Exception about a missing file is much more expressive than a message from a failed content parser.

Most use cases check for Type.RESOURCE in advance or have some elaborated exception handlers and are thus unaffected. Some usages however try to handle this condition later on by using inputStream.available() like EchoParametersDao.getEchoParameters(in) does and will fail then.

I’m working on a pull request to change the behavior this way, but this causes adaptions to other related areas. Even SpringResourceAdaptor is affected, since it relays on IOExcetion which have been wrapped into IllegalStateException (and for which I think an UncheckedIOException is the better choice).

Other solutions are:

return null (can be checked afterwards but not so nice)

return an empty InputStream (Java 11 even has some: InputStream.nullInputStream())

Do you think it is worth to propose a refactoring here or should I better keep hands off?

Dieter.

Dieter Stüken

Software Engineer

Nature Environment and Resources

T +49 251 59689 403

d.stueken@anonymised.com

con terra GmbH

Martin-Luther-King-Weg 20

48155 Münster

conterra.de

Geschäftsführung: Karl Wiesmann, Uwe König

HRB 4149, Amtsgericht Münster

Privacy statements

YouTube | Twitter | LinkedIn


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Jody Garnett