[Geoserver-devel] Importer and grabbing files from external (remote) locations

Hi,
some time ago I asked for feedback on this question, related to importing (possibly
big) rasters:

  1. The client indicates that one or more “golden” images need to be imported
    into GeoServer for WMS/WCS usage. The original files are big, and must
    not be touched, so ideally the client would just tell GeoServer where the
    files are (disk shares) and GeoServer should copy them autonomously

I proposed a few options, but got no feedback.

In the spirit of collaboration, here is what I’m going to do. The REST api
will recognize a new bit in “data” called “source”, which can be either
a plain string, or an array of strings, something like this:

"data": {
  "type": "directory",
  "source": "/mnt/share/myImage.tif"
}

When this tag is found, the file (or files) pointed to by source will be treated
as data that needs to be copied over to GeoServer, and the importer
will act just as if it were an upload: it will grab those files, copy
them locally into a random subdirectory of $DATA_DIR/uploads,
and then proceed from there as if the files had been uploaded.
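
The copy step described above can be sketched as follows. This is only an illustration in Python, not the importer's actual (Java) code: the function name `stage_sources` and the use of a random hex name for the subdirectory are assumptions made for the example.

```python
import shutil
import uuid
from pathlib import Path
from typing import List, Union


def stage_sources(source: Union[str, List[str]], uploads_dir: Path) -> Path:
    """Copy the file(s) named by `source` into a new random subdirectory.

    `source` mirrors the proposed REST attribute: either a plain string
    or an array of strings. The originals are copied, never moved.
    """
    paths = [source] if isinstance(source, str) else list(source)
    # Hypothetical naming scheme: a random hex string per import.
    target = uploads_dir / uuid.uuid4().hex
    target.mkdir(parents=True)
    for p in paths:
        shutil.copy2(p, target / Path(p).name)
    return target
```

From the returned directory onwards, the import could proceed exactly as it would for an uploaded file.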

I am planning to use commons VFS to perform the copy, just like it’s
already done with compressed files, in the hope of supporting
more sources (like FTP), although I see we are depending on
an old version (1.0 instead of the current 2.0), and I’m not sure
what 1.0 supports, or how hard an upgrade to 2.0 would be.

The goal/scope for this set of changes is just to copy over from a share
to the local file system, but I don’t want to lock the module into that.

Cheers
Andrea

==

GeoServer Professional Services from the experts! Visit
http://goo.gl/NWWaa2 for more information.

==

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003

Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate nel messaggio stesso. Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e-mail e di procedere alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.


Sorry Andrea, I was away … and do not have much to say about the importer REST API for fear of making it more complicated. I will check with Torben who has more recently looked at the importer internals and get back to you for any specific feedback.

In general I like the workflow of uploading the single original file and then processing it in place. It sounds much more sensible/efficient to do it this way than to take on the overhead of uploading multiple processed images.

The importer REST API already has an async workflow:

  • Allowing vector data to be processed into a database (think it even lets you poll status).
  • Allowing the SRS to be defined after the initial file(s) are uploaded.

I would hope the processing of the file would provide similar feedback?

On 13 May 2015 at 10:46, Andrea Aime <andrea.aime@anonymised.com> wrote:

[original message quoted in full; see above]


Jody Garnett

On Wed, May 13, 2015 at 8:06 PM, Jody Garnett <jody.garnett@anonymised.com>
wrote:

> Sorry Andrea, I was away ... and do not have much to say about the
> importer REST API for fear of making it more complicated. I will check with
> Torben who has more recently looked at the importer internals and get back
> to you for any specific feedback.
>
> In general I like the workflow of uploading the single original file and
> then processing it in place. It sounds much more sensible/efficient to do
> it this way than to take on the overhead of uploading multiple processed
> images.

I am confused by your sentence here: where do multiple images come into the
picture? (At least, they were not part of my original mail; maybe you're
extending the scope in a way I don't understand.)

> The importer REST API already has an async workflow:
> - Allowing vector data to be processed into a database (think it even lets
> you poll status).
> - Allowing the SRS to be defined after the initial file(s) are uploaded.

I'm aware of it, but I don't think it has anything to do with the use case
I'm trying to discuss?

> I would hope the processing of the file would provide similar feedback?

As stated in my first mail (see the other thread), we cannot go for an HTTP
upload: the server has to grab the files on its own, because
the files are too big and the speed difference between an HTTP transfer and
a simple shared filesystem copy is (rather) significant.

Also, if VFS also gives me support for FTP (as said, out of scope, but
I'm trying to leave the door open for it),
we could have a light JS client point the importer to heavy files available
on an FTP server, instead of making the client grab them from FTP (which I
don't know is feasible to start with) and then HTTP upload
them to GeoServer.

Cheers
Andrea


-------------------------------------------------------

Sorry, that was my assumption:

  • sometimes an import applies to more than one file (say in a directory)
  • processing may produce multiple files out of a single raster (say, if tiling was used to match the block size of the distributed file system).

I was more interested in the async workflow in case processing takes some time (regardless of where it is importing from).

On 13 May 2015 at 11:13, Andrea Aime <andrea.aime@anonymised.com> wrote:

[previous message quoted in full; see above]


On Wed, May 13, 2015 at 9:51 PM, Jody Garnett <jody.garnett@anonymised.com>
wrote:

> Sorry that was my assumption:
> - sometimes an import applies to more than one file (say in a directory)
> - The result of processing may result in multiple files out of a single
> raster (say if tiling was used to match the block size of the distributed
> file system).
>
> I was more interested in the async workflow in case processing takes some
> time (regardless of where it is importing from).

Yep, and the importer uses async by default in the GUI, and you can make
it go async in the REST case.

What's annoying is that the REST setup workflow has a synchronous
upload/config part: you post a request or a file
to the task list, and it will generate the tasks for you. Handy, but that
bit is synchronous, so it kind of stifles the idea of
doing the setup up front and then having the transfer run asynchronously
as well (which is kind of mandatory for very large files).

In my use case that's not a killer, because over modern networks you can
transfer lots of data fast, but
if we think FTP, it will be limited to small-ish files or local networks,
because the POST request that creates
the import is synchronous, and you cannot launch the import until you have
the ImportData and its tasks ready
(and only the run will be executed asynchronously).

An issue to be solved another day I guess (it will require some refactoring
of the core code, the REST module,
and the REST API workflow itself).

Cheers
Andrea


-------------------------------------------------------

On Wed, May 13, 2015 at 9:58 PM, Andrea Aime <andrea.aime@anonymised.com>
wrote:

> An issue to be solved another day I guess (it will require some refactor
> of both the core code, the REST module,
> and the REST api workflow itself).

Hum... maybe it does not have to be that way, maybe we can handle this
today.
Bear with me for a minute.

When creating the import context, we normally indicate the data to be
imported, and that results
in the creation of tasks (all of this, REST wise, is a synchronous HTTP
request),
and then we can attach transformations to the tasks that were created.

But say we need the importer to fetch the data, and that might take a lot
of time... we cannot rely
on a synchronous request, and we would like the fetch to happen while the
import runs asynchronously.

So, what about a new type of ImportData, tentatively named FetchData (feel
free to suggest a better name),
which has a location attribute, saying where we should get the data from,
and a transformation attribute, listing the transformations
we would like to apply to the data to be imported (which will be either
raster or vector, but not both,
and which we assume to be uniform: not the 100% case, but pretty
common).

No tasks are created during the creation. When run happens, the importer
checks for FetchData and,
if available, fetches it, creates a File or Directory import data,
creates the tasks, attaches
the transformations, and then has everything run as normal.

This basically moves the expensive move operation down the line, to a place
that can be run
in async mode, and possibly retried. This would make it suitable for FTP
transfer, HTTP fetching, and the like.
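
To make the proposed flow concrete, here is a minimal sketch in Python (the importer itself is Java). Only the name FetchData and its location and transformation attributes come from the proposal above; `Task`, `ImportContext`, `create_context`, `run` and the pluggable `fetch` callable are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FetchData:
    location: str  # where the data should be fetched from
    transformations: List[str] = field(default_factory=list)


@dataclass
class Task:
    path: str
    transformations: List[str] = field(default_factory=list)


@dataclass
class ImportContext:
    data: FetchData
    tasks: List[Task] = field(default_factory=list)


def create_context(data: FetchData) -> ImportContext:
    # Cheap and synchronous: nothing is fetched, no tasks are created.
    return ImportContext(data=data)


def run(context: ImportContext, fetch: Callable[[str], List[str]]) -> None:
    # Expensive part, meant to be called from an asynchronous job.
    # `fetch` copies the remote data locally and returns the local paths.
    for path in fetch(context.data.location):
        # Each task gets its own copy of the uniform transformations.
        context.tasks.append(Task(path, list(context.data.transformations)))
    # The regular import of each task would follow here.
```

Because the fetch only happens inside `run`, the request that creates the context can return immediately regardless of how large the remote files are.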

Does this work for you?

Cheers
Andrea


-------------------------------------------------------

I like the idea of some sort of remote ImportData, but I feel like we would still be better off uploading/configuring it in the createContext step.

> So, what about a new type of ImportData, tentatively named FetchData (feel free to suggest a better name),

How about RemoteData, or OnlineData?

> No tasks are created during the creation, when run happens, the importer checks for FetchData,
> if available, it will fetch it, create a File or Directory import data, create the tasks, attach
> the transformations

Rather than have this occur in “run”, I feel like this step would make more sense in “createContext”. I’m not sure about the REST API, but the Importer code does have a createContextAsync method, so this would still support asynchronous behavior.

If this option also gets pulled into the importer UI, it would make for a clearer user experience if you could see what is being imported before you import it.

So I feel like this new type of ImportData is a good idea, but we should still try to upload it and create tasks in the createContext step. If the existing REST API does not support asynchronously creating the context, then this is functionality that should be added, especially since there is already asynchronous support in the Importer code.

Torben

On Thu, May 21, 2015 at 11:01 PM, Torben Barsballe <
tbarsballe@anonymised.com> wrote:

> I like the idea of some sort of remote ImportData, but I feel like we
> would still be better off uploading/configuring it in the createContext
> step.
>
>> So, what about a new type of ImportData, tentatively named FetchData
>> (feel free to suggest a better name),
>
> How about RemoteData, or OnlineData?

RemoteData, yes, that sounds good.

>> No tasks are created during the creation, when run happens, the importer
>> checks for FetchData,
>> if available, it will fetch it, create a File or Directory import data,
>> create the tasks, attach
>> the transformations
>
> Rather than have this occur in "run", I feel like this step would make
> more sense in "createContext". I'm not sure about the REST API, but the
> Importer code does have a createContextAsync method, so this would still
> support asynchronous behavior.
> If this option also gets pulled into the importer UI, it would make for a
> clearer user experience if you could see what is being imported before you
> import it

Yes, I can go for that, extending the REST API in that direction does not
look too hard, and makes a lot of sense.
There is however a catch: in the REST API we need to return ... something.
Normally it would be a link to the import context just created, but as far
as I can see createContextAsync
returns a job id, not a context id. Jobs are not exposed in the REST API,
and I'm not sure about going there:
for example, for an async run we are still exposing the context, not its job.

So here is a counter-proposal in the same spirit as yours. We allow the
creation of the context and register it
in the store, but then have an initAsynch(...) that will only do the
asynchronous initialization.
This way the REST async creation can still return a link to a Context, but
one that will be in a new state,
State.INIT, so that REST clients will get to know the job is still
initializing, and any call to
run can be rejected.

Once init is complete, we can switch to pending and let the REST client do
its own thing with the tasks
and eventually run the import.
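
The lifecycle in this counter-proposal can be sketched as a small state machine. The Python below is an illustration only, not the importer's Java code: State.INIT and the pending state come from the text above, while the `Importer` and `Context` classes, `register_context`, the `RUNNING` state and the `load` callable are assumptions made for the example.

```python
import threading
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    INIT = "INIT"
    PENDING = "PENDING"
    RUNNING = "RUNNING"  # hypothetical name for "the import is running"


class Context:
    def __init__(self, context_id: int) -> None:
        self.id = context_id
        self.state = State.INIT
        self.tasks: List[str] = []


class Importer:
    def __init__(self) -> None:
        self._store: Dict[int, Context] = {}
        self._next_id = 0

    def register_context(self) -> Context:
        # Synchronous and cheap: the context exists right away, so a REST
        # layer could return a link to it while initialization continues.
        context = Context(self._next_id)
        self._store[context.id] = context
        self._next_id += 1
        return context

    def init_async(self, context: Context,
                   load: Callable[[], List[str]]) -> threading.Thread:
        def job() -> None:
            context.tasks = load()         # slow part: fetch data, build tasks
            context.state = State.PENDING  # only now may the client run it

        thread = threading.Thread(target=job)
        thread.start()
        return thread

    def run(self, context: Context) -> None:
        if context.state is not State.PENDING:
            raise RuntimeError("context %d is not ready to run" % context.id)
        context.state = State.RUNNING
```

A client polling the context would see INIT until the background job finishes, and any attempt to run before that point is rejected.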

About the list of "default" transformations in RemoteData, I'd still
maintain it: if one knows the task
units being imported are uniform, or sort of uniform, it makes sense to
set up the transformations just once,
and then possibly work by difference if we know there are a few oddballs
that are different.

This actually also makes sense for other types of data: say we have, for
example, 1000 shapefiles to be
imported, all needing a reprojection, or something similar. So... what
about we add a "defaultRasterTransformations"
and a "defaultVectorTransformations" to ImportData (the base class for all
ImportData) and just attach
the transforms while we create the tasks?
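
As a rough sketch of that idea, in Python rather than the importer's Java: the two default lists live on the base class, and each new task receives a copy of the matching list. The `Directory` and `Task` shapes, the extension-based raster check and the transformation names ("reproject", "overviews") are hypothetical and only serve the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImportData:
    # Defaults declared on the base class, so every kind of data has them.
    default_raster_transformations: List[str] = field(default_factory=list)
    default_vector_transformations: List[str] = field(default_factory=list)


@dataclass
class Directory(ImportData):
    files: List[str] = field(default_factory=list)


@dataclass
class Task:
    path: str
    transformations: List[str]


# Assumption for the example: rasters are recognised by file extension.
RASTER_EXTENSIONS = (".tif", ".tiff")


def create_tasks(data: Directory) -> List[Task]:
    tasks = []
    for path in data.files:
        if path.lower().endswith(RASTER_EXTENSIONS):
            defaults = data.default_raster_transformations
        else:
            defaults = data.default_vector_transformations
        # Copy the list so an oddball task can be adjusted on its own.
        tasks.append(Task(path, list(defaults)))
    return tasks
```

Copying the defaults per task is what allows the "work by difference" step: a single task can be changed afterwards without affecting the others.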

Cheers
Andrea


-------------------------------------------------------

Hi,
the pull request to allow async retrieval of remote data is here:
https://github.com/geoserver/geoserver/pull/1091

For the moment I left out the default transformations, but upgraded commons-vfs to 2.0
as I had trouble with an FTP server. Also, vfs 2.0 seems more compatible with our libs:
vfs 1.0 depended on commons http client 2.0, while vfs 2.0 depends on http client 3.1.

Cheers
Andrea

On Mon, May 25, 2015 at 8:14 PM, Torben Barsballe <tbarsballe@anonymised.com> wrote:

Comments inline:

On Mon, May 25, 2015 at 5:43 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

I was also running into a similar issue with the createContextAsync (and runAsync) only returning job ids. Creating and registering the context first seems like a good solution. One suggestion though is to make initAsync(…) do almost all of the heavy lifting of setting up the context (including loading the data) as well as the other asynchronous tasks. Especially with large databases, creating the context can take almost as much time as running the import.

So maybe something like registerContext(…) to get a context in State.INIT, having an Id and nothing else, then initAsync(…) to set up the context (or maybe instead of an initAsync(…) method simply extend createContextAsync to also accept an empty context?).

This seems like a good idea.

Torben

On Thu, May 21, 2015 at 11:01 PM, Torben Barsballe <tbarsballe@anonymised.com> wrote:

I like the idea of some sort of remote ImportData, but I feel like we would still be better off uploading/configuring it in the createContext step.

RemoteData, yes, that sounds good.

Yes, I can go for that; extending the REST API in that direction does not look too hard, and makes a lot of sense.
There is a catch, however: in the REST API we need to return … something.
Normally it would be a link to the import context just created, but as far as I can see createContextAsync
returns a job id, not a context id. Jobs are not exposed in the REST API, and I’m not sure about getting there;
for example, for an async run we are still exposing the context, not its job.

So here is a counter-proposal in the same spirit as yours. We allow the creation of the context and register it
in the store, but then have an initAsync(…) that only does the asynchronous initialization.
This way the REST async creation can still return a link to a Context, but one that is in a new state,
State.INIT, so that REST clients know the context is still initializing, and any call to
run can be rejected.
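
A minimal Java sketch of the lifecycle proposed above, under stated assumptions: this is not the importer's actual code. The names registerContext, initAsync, State.INIT and the pending state come from this thread; the Context class, the in-memory store, the getContext lookup and the INIT_ERROR state are hypothetical stand-ins for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class ContextLifecycleSketch {

    /** INIT and PENDING follow the thread; INIT_ERROR is a hypothetical addition. */
    enum State { INIT, PENDING, INIT_ERROR }

    /** Hypothetical stand-in for an import context: an id and a state, nothing else. */
    static final class Context {
        final long id;
        volatile State state = State.INIT;
        Context(long id) { this.id = id; }
    }

    private final Map<Long, Context> store = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    /** Registers an empty context synchronously, so a REST layer could link to it at once. */
    Context registerContext() {
        Context context = new Context(ids.incrementAndGet());
        store.put(context.id, context);
        return context;
    }

    /** Looks up a registered context by id, or returns null if unknown. */
    Context getContext(long id) {
        return store.get(id);
    }

    /** Runs the slow setup (for example fetching remote files) in the background. */
    CompletableFuture<Void> initAsync(Context context, Runnable slowSetup) {
        return CompletableFuture.runAsync(slowSetup).<Void>handle((ok, error) -> {
            // Leave INIT only once the setup has finished, successfully or not.
            context.state = (error == null) ? State.PENDING : State.INIT_ERROR;
            return null;
        });
    }

    /** Rejects run requests until initialization has completed. */
    void run(Context context) {
        if (context.state != State.PENDING) {
            throw new IllegalStateException(
                    "Context " + context.id + " is in state " + context.state);
        }
        // The actual import would start here.
    }

    public static void main(String[] args) {
        ContextLifecycleSketch importer = new ContextLifecycleSketch();
        Context context = importer.registerContext();
        System.out.println("after register: " + context.state);
        importer.initAsync(context, () -> { /* pretend to copy remote files */ }).join();
        System.out.println("after init: " + context.state);
        importer.run(context);
    }
}
```

The point of the split is that the id exists before any slow work starts, so a client can poll the context and see it leave State.INIT, while run stays rejected until then.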

How about RemoteData, or OnlineData?

So, what about a new type of ImportData, tentatively named FetchData (feel free to suggest a better name),

Rather than have this occur in “run”, I feel like this step would make more sense in “createContext”. I’m not sure about the REST API, but the Importer code does have a createContextAsync method, so this would still support asynchronous behavior.

If this option also gets pulled into the importer UI, it would make for a clearer user experience if you could see what is being imported before you import it

No tasks are created during context creation. When run happens, the importer checks for FetchData and,
if it is available, fetches it, creates a File or Directory import data, creates the tasks, and attaches
the transformations

Once init is complete, we can switch to pending and let the REST client do its own thing with the tasks
and eventually run the import.

About the list of “default” transformations in RemoteData, I’d still keep it: if one knows the
units being imported are uniform, or mostly uniform, it makes sense to set up the transformations just once,
and then work by difference if we know there are a few oddballs that are different.

This actually also makes sense for other types of data; say we have, for example, 1000 shapefiles to be
imported, all needing a reprojection, or something similar. So… what about adding “defaultRasterTransformations”
and “defaultVectorTransformations” to ImportData (the base class for all ImportData) and just attaching
the transforms while we create the tasks?
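
The default-transformations idea could look roughly like the Java sketch below. It is only an illustration under assumptions: defaultRasterTransformations, defaultVectorTransformations and ImportData are the names proposed in this thread, while Task, Kind, createTask and the use of plain strings as transformation names are hypothetical placeholders, not the importer's real types.

```java
import java.util.ArrayList;
import java.util.List;

class DefaultTransformsSketch {

    /** Hypothetical marker for the kind of data a task imports. */
    enum Kind { RASTER, VECTOR }

    /** Stand-in for the ImportData base class, holding the two proposed default lists. */
    static class ImportData {
        final List<String> defaultRasterTransformations = new ArrayList<>();
        final List<String> defaultVectorTransformations = new ArrayList<>();
    }

    /** Hypothetical import task with its own, independently editable transformation list. */
    static final class Task {
        final String name;
        final Kind kind;
        final List<String> transformations = new ArrayList<>();
        Task(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** Creates a task and seeds it with a copy of the defaults matching its kind. */
    static Task createTask(ImportData data, String name, Kind kind) {
        Task task = new Task(name, kind);
        task.transformations.addAll(kind == Kind.RASTER
                ? data.defaultRasterTransformations
                : data.defaultVectorTransformations);
        return task;
    }

    public static void main(String[] args) {
        ImportData data = new ImportData();
        data.defaultVectorTransformations.add("reproject"); // hypothetical transform name

        Task regular = createTask(data, "roads.shp", Kind.VECTOR);
        Task oddball = createTask(data, "legacy.shp", Kind.VECTOR);
        oddball.transformations.add("attribute-remap"); // per-task difference for an oddball

        System.out.println(regular.name + " -> " + regular.transformations);
        System.out.println(oddball.name + " -> " + oddball.transformations);
    }
}
```

Copying the defaults into each task, rather than sharing one list, is what allows working "by difference": an oddball task can add or drop a transformation without affecting the others.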



On Wed, 27 May 2015 05:50:30 PM Andrea Aime wrote:

Hi,
pull request to allow asynch retrieval of remote data here:
https://github.com/geoserver/geoserver/pull/1091

I didn't do any real review, but I wonder if this should be restricted to
cases where the default password has been changed (or similar - I think we
have some logic for one of the script or REST APIs) to avoid this being used
to DoS another server.

Brad

On Thu, May 28, 2015 at 3:17 AM, Brad Hards <bradh@anonymised.com> wrote:

On Wed, 27 May 2015 05:50:30 PM Andrea Aime wrote:
> Hi,
> pull request to allow asynch retrieval of remote data here:
> https://github.com/geoserver/geoserver/pull/1091
> I didn't do any real review, but I wonder if this should be restricted to
> cases where the default password has been changed (or similar - I think we
> have some logic for one of the script or REST APIs) to avoid this being used
> to DoS another server.

Importer access is limited to the administrator(s) only; do we need to
defend against the admins too? :)

Cheers
Andrea


-------------------------------------------------------