[Geoserver-devel] Improving scalability of UI (and other subsystems) against very large catalogs

Hi,
we are looking at what makes GeoServer hard to use (or simply impossible to use)
when one has a very large number of layers/styles/stores/workspaces configured,
possibly stored in jdbcconfig.

Case in point, to give you an idea: 450 workspaces, 1000 stores, 100k+ layers.

So the first thing that shows up is the UI: the table pages, except the layer one,
are very slow. This is because Gabriel migrated the layer page so that filtering,
sorting and paging are done inside the catalog, but all the others were not.
Migrating these is not too difficult, and seems to be rather mechanical, with
no visible consequences for the user (or rather, just one: the UI suddenly becomes
worth using :wink: )
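
To make "done inside the catalog" concrete, here is a minimal Java sketch. `PagedCatalogSketch` and its `page` method are hypothetical stand-ins invented for illustration and are not the GeoServer Catalog API. The point is only that the table page asks for one filtered, sorted page instead of receiving the full list and trimming it in the UI.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a catalog; names and signatures are assumptions.
class PagedCatalogSketch {
    private final List<String> storeNames = new ArrayList<>();

    PagedCatalogSketch(int size) {
        for (int i = 0; i < size; i++) {
            storeNames.add(String.format("store%05d", i));
        }
    }

    // Filtering, sorting and paging all happen on the catalog side, so the
    // caller only ever receives at most `limit` items.
    List<String> page(Predicate<String> filter, Comparator<String> order,
                      int offset, int limit) {
        return storeNames.stream()
                .filter(filter)
                .sorted(order)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

This in-memory version still visits every item. The assumption is that a database-backed catalog would translate the same request into a query with a WHERE clause, an ORDER BY and a row limit, which is where the real saving would come from.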

Then there is a category of issues that instead mix together a full
listing of the catalog with the “wrong” UI element: we have dropdowns
or “right/left” choosers, such as the style and security ones, that end up
listing the names of all stores/layers/styles and whatnot.

In this case it’s the UI element itself that is not really scaling up: it was meant
for small lists.
For these cases I’m not totally sure what to do.
One possible approach is to create “smart” UI components, that would show
a DropDown if the list is short (less than 1000 items?) but would switch
to a textbox with autocomplete otherwise, which would start proposing autocomplete
values as the user types into it.
Or it could be a read-only box, with a link that opens a dialog to choose from,
with a table in the dialog.
These at least are the approaches that are relatively easy to implement.
A combo box looking like a drop-down, but editable and with the ability to
shrink the contents of the drop-down as one types, would probably be best,
but I’m not sure how that would be done in Wicket.
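
The "smart" component idea can be sketched independently of Wicket. The class below is hypothetical (`SmartChooserSketch`, `Mode`, `suggest` are names made up for this example), and the 1000-item threshold is simply the number floated above. The sorted in-memory set stands in for what would presumably be a filtered catalog query in a real implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper; it only models the decision and the prefix lookup.
class SmartChooserSketch {
    enum Mode { DROP_DOWN, AUTOCOMPLETE }

    // Threshold suggested in the thread, not a measured value.
    static final int DROP_DOWN_LIMIT = 1000;

    private final TreeSet<String> names;

    SmartChooserSketch(Collection<String> allNames) {
        this.names = new TreeSet<>(allNames);
    }

    // Short lists keep the plain drop-down, long ones switch to autocomplete.
    Mode mode() {
        return names.size() < DROP_DOWN_LIMIT ? Mode.DROP_DOWN : Mode.AUTOCOMPLETE;
    }

    // Returns at most maxResults names starting with the typed prefix
    // (case-sensitive). In a sorted set the matches are contiguous, so the
    // loop stops at the first name that no longer matches.
    List<String> suggest(String typed, int maxResults) {
        List<String> matches = new ArrayList<>();
        for (String candidate : names.tailSet(typed, true)) {
            if (matches.size() >= maxResults || !candidate.startsWith(typed)) {
                break;
            }
            matches.add(candidate);
        }
        return matches;
    }
}
```

Capping the number of suggestions matters as much as the prefix match: a one-letter prefix against 100k names would otherwise rebuild the original problem inside the autocomplete popup.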

Finally, there is a very nasty issue in WFS: if you have 100k layers, the first
request will try to build a WFS schema containing all the schemas of all
the vector layers.
Do we have any indication on how to proceed, in order to make requests
work only with the feature types needed, instead of having to build a
Godzilla schema and having to drop it every time anything in the config
changes (feature type config, new ft, ft removed)?
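
One way to picture the alternative to the Godzilla schema is a per-feature-type cache that builds lazily and invalidates one entry at a time. This is only a sketch under assumptions: `SchemaCacheSketch` is a hypothetical class, schemas are modelled as plain strings, and nothing here reflects how GeoServer actually builds or stores WFS schemas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical per-type schema cache; schemas are plain strings here.
class SchemaCacheSketch {
    private final Map<String, String> schemaByType = new ConcurrentHashMap<>();
    private final Function<String, String> schemaBuilder;

    // Counts builder invocations, kept only to make the behaviour observable.
    final AtomicInteger builds = new AtomicInteger();

    SchemaCacheSketch(Function<String, String> schemaBuilder) {
        this.schemaBuilder = schemaBuilder;
    }

    // Builds (or reuses) schemas only for the feature types in the request.
    List<String> schemasFor(List<String> requestedTypes) {
        List<String> schemas = new ArrayList<>();
        for (String type : requestedTypes) {
            schemas.add(schemaByType.computeIfAbsent(type, t -> {
                builds.incrementAndGet();
                return schemaBuilder.apply(t);
            }));
        }
        return schemas;
    }

    // A change to one feature type drops only that entry, not the whole cache.
    void featureTypeChanged(String type) {
        schemaByType.remove(type);
    }
}
```

With this shape, adding or editing one feature type costs one rebuild on its next use, while the other entries stay valid.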

There are more issues, like CascadeDeleteVisitor doing a linear scan,
or capabilities document generation grabbing a list instead of scrolling over
an iterator, that need to be addressed as well, but let’s say these are not
as visible as the issues above: you need some load and very large
caps documents to get into trouble with these. Fortunately, in our case
the layers are split among many workspaces, which makes things
quite a bit more manageable.
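
The difference between grabbing a list and scrolling over an iterator can be sketched as follows. `StreamingCapsSketch` is a hypothetical class, the element names are illustrative only, and layer names are assumed to be XML-safe (no escaping is done). It is not the code of any GeoServer transformer.

```java
import java.io.PrintWriter;
import java.util.Iterator;

// Hypothetical writer: emits one element per layer as it is pulled from the
// iterator, so memory use does not grow with the number of layers.
class StreamingCapsSketch {
    static int writeLayers(Iterator<String> layerNames, PrintWriter out) {
        int written = 0;
        out.print("<Capability>");
        while (layerNames.hasNext()) {
            // Names are assumed XML-safe in this sketch; real code must escape.
            out.print("<Layer><Name>" + layerNames.next() + "</Name></Layer>");
            written++;
        }
        out.print("</Capability>");
        return written;
    }
}
```

The caller decides where the iterator comes from, so a catalog that can page through its backing store never has to hold all layers at once.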

Anyways… feedback, suggestions, very important bits that we might
have missed?

Cheers
Andrea


==

GeoServer Professional Services from the experts! Visit
http://goo.gl/NWWaa2 for more information.

==

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003

Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate nel messaggio stesso. Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e-mail e di procedere alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.


Hi Andrea,

I’ve never used it with anything like those numbers of layers/stores/etc. But a few thoughts triggered from your post:

> One possible approach is to create “smart” UI components, that would show
> a DropDown if the list is short (less than 1000 items?) but would switch
> to a textbox with autocomplete otherwise, which would start proposing autocomplete
> values as the user types into it.

How about just adding an “autocomplete” which is available for everyone to use, as well as the drop-down? And then disable the drop-down when there are > 1000 items.

As a bonus, the autocomplete could probably also be used for things like layer-searching.

> if you have 100k layers the first
> request will try to build a WFS schema containing all the schemas of all
> the vector layers.

And I thought it was slow with just a couple of hundred WFS features! How about caching individual WFS schemas to disk and rebuilding only ones that have been changed? Seems pretty obvious so I guess there’s a reason that won’t work. :)

HT(try and)H.

Cheers,

Jonathan


On Wed, Nov 12, 2014 at 4:29 PM, Jonathan Moules <J.Moules@anonymised.com> wrote:

> Hi Andrea,
>
> I’ve never used it with anything like those numbers of
> layers/stores/etc. But a few thoughts triggered from your post:
>
>> One possible approach is to create "smart" UI components, that would show
>> a DropDown if the list is short (less than 1000 items?) but would switch
>> to a textbox with autocomplete otherwise, which would start proposing
>> autocomplete values as the user types into it.
>
> How about just adding an “autocomplete” which is available for everyone to
> use, as well as the drop-down? And then disable the drop-down when there
> are > 1000 items.
>
> As a bonus, the autocomplete could probably also be used for things like
> layer-searching.

Thank you for the feedback. Having both controls showing at the same time
seems a bit confusing... at least to me?

>> if you have 100k layers the first
>> request will try to build a WFS schema containing all the schemas of all
>> the vector layers.
>
> And I thought it was slow with just a couple of hundred WFS features! How
> about caching individual WFS schemas to disk and rebuilding only ones that
> have been changed? Seems pretty obvious so I guess there’s a reason that
> won’t work. :)

Jody reported during the biweekly Skype meeting that some work has been
done in this area already in 2.6.x, so maybe it's not as bad as I thought.
I haven't tried it yet though: the large data dir I have is made of
thousands of raster files...

Cheers
Andrea

-------------------------------------------------------

On Mon, Nov 10, 2014 at 12:29 PM, Andrea Aime <andrea.aime@anonymised.com>
wrote:

> Hi,
> we are looking at what makes GeoServer hard to use (or simply impossible
> to use) when one has a very large number of layers/styles/stores/workspaces
> configured, possibly stored in jdbcconfig.
>
> Case in point, to give you an idea: 450 workspaces, 1000 stores, 100k+
> layers.
>
> So the first thing that shows up is the UI: the table pages, except the
> layer one, are very slow. This is because Gabriel migrated the layer page
> so that filtering, sorting and paging are done inside the catalog, but all
> the others were not.

That's correct. At the time it was meant to demo the perf benefit of using
the list catalog method and to serve as a reference to migrate the rest of
the pages. Then it looks like nobody, including myself, had the
time/energy/mandate to complete the work.

There's another showcase of the UI taking advantage of the new Catalog
methods that might not be so obvious. The home page uses Catalog.count(...)
for the ws/store/layers links. Otherwise it would take the time of loading
the three sets into memory just to get the list size.

> Migrating these is not too difficult, and seems to be rather mechanical,
> with no visible consequences for the user (or rather, just one: the UI
> suddenly becomes worth using :wink: )

Agreed.

> Then there is a category of issues that instead mix together a full
> listing of the catalog with the "wrong" UI element: we have dropdowns
> or "right/left" choosers, such as the style and security ones, that end up
> listing the names of all stores/layers/styles and whatnot.
>
> In this case it's the UI element itself that is not really scaling up: it
> was meant for small lists.
> For these cases I'm not totally sure what to do.
> One possible approach is to create "smart" UI components, that would show
> a DropDown if the list is short (less than 1000 items?) but would switch
> to a textbox with autocomplete otherwise, which would start proposing
> autocomplete values as the user types into it.
> Or it could be a read-only box, with a link that opens a dialog to choose
> from, with a table in the dialog.
> These at least are the approaches that are relatively easy to implement.
> A combo box looking like a drop-down, but editable and with the ability to
> shrink the contents of the drop-down as one types, would probably be best,
> but I'm not sure how that would be done in Wicket.
>
> Finally, there is a very nasty issue in WFS: if you have 100k layers, the
> first request will try to build a WFS schema containing all the schemas of
> all the vector layers.
>
> Do we have any indication on how to proceed, in order to make requests
> work only with the feature types needed, instead of having to build a
> Godzilla schema and having to drop it every time anything in the config
> changes (feature type config, new ft, ft removed)?

The only workaround I found so far is to disable global services and force
using workspace local catalogs. This way only the namespaces reachable from
each workspace seem to be listed in the xml root element. That's a band-aid
solution though. I agree it'd be much better if the schema was built for
only the namespaces reachable from the requested feature type.
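
Restricting the declarations to the namespaces a request actually touches could look roughly like this. It is a sketch under assumptions: `NamespaceSubsetSketch` is a hypothetical helper, type names are assumed to be written as `prefix:localName`, and the namespace URIs in any usage are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: picks, from all catalog namespaces, only those whose
// prefix appears in the requested feature type names.
class NamespaceSubsetSketch {
    static Map<String, String> declarationsFor(Map<String, String> allNamespaces,
                                               List<String> qualifiedTypeNames) {
        Map<String, String> needed = new TreeMap<>();
        for (String name : qualifiedTypeNames) {
            int colon = name.indexOf(':');
            if (colon <= 0) {
                continue; // unqualified names are skipped in this sketch
            }
            String prefix = name.substring(0, colon);
            String uri = allNamespaces.get(prefix);
            if (uri != null) {
                needed.put(prefix, uri);
            }
        }
        return needed;
    }
}
```

A request for two types in the same workspace would then declare a single namespace, regardless of how many workspaces the catalog holds.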

> There are more issues, like CascadeDeleteVisitor doing a linear scan,
> or capabilities document generation grabbing a list instead of scrolling
> over an iterator, that need to be addressed as well, but let's say these
> are not as visible as the issues above: you need some load and very large
> caps documents to get into trouble with these. Fortunately, in our case
> the layers are split among many workspaces, which makes things
> quite a bit more manageable.

As for capabilities document generation, the GSIP for the Catalog API
extension included the migration of Capabilities_1_3_0_Transformer in the
wms module. Others could follow suit.

> Anyways... feedback, suggestions, very important bits that we might
> have missed?

Looks like a pretty good synthesis of what the current situation is. I'd
say let's create separate issues for each of them and call for a volunteer
effort to get them fixed. The UI ones you mentioned (lists and dropboxes)
seem like the most complicated at first glance.

> Cheers
> Andrea


_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

--

Gabriel Roldán
Software Developer | Boundless <http://boundlessgeo.com/>
groldan@anonymised.com
@boundlessgeo <http://twitter.com/boundlessgeo/>