ScaleOpImage - performance JAI vs. JAI-EXT

Hello,

we are facing an issue here with ScaleOpImage as part of jt-scale-1.1.24. I am not sure if this is the right place to ask, but I want to ask anyway, before creating a Jira ticket.

We are using GeoServer and have recently encountered significant performance problems with some large image data that we serve over WMS. To be more specific, these weather radar images are ca. 6600x5500 pixels. The performance degradation was noticed especially in cases where the final WMS image size was significantly smaller than the size of the original data. For example, we generate small preview images (WIDTH=100&HEIGHT=100), and producing such an image could take as long as 3-4 minutes.

After hours of debugging GeoServer code it turned out that the majority of the time is spent scaling the image, which is triggered by GridCoverageRenderer#affine() and eventually ends up in it.geosolutions.jaiext.scale.ScaleOpImage#computeTile(..). By accident we figured out that the problem disappears immediately when switching to the JAI version of the operator (switching the Scale operator from JAI-EXT to JAI on the GeoServer admin page).

After comparing the two implementations we found the difference, and before I create a ticket I would like to ask here whether we are overlooking something.

The following figure illustrates the difference:

On the top we see the JAI ScaleOpImage class, on the bottom the JAI-EXT version. As far as I understand, the code in both versions basically runs through the X- and Y-splits, computes smaller rectangles from the original image and computes destination tiles.

For this, the code must obtain the smaller pieces from the original image, so it calls source0.getData(Rect) with the rectangle that should be considered.

Looking closer at both versions, we see that there is only one major difference between them: while JAI passes the newSrcRect into the getData() call, JAI-EXT passes the srcRect.

And that makes all the difference. On the right-hand side of the above figure you can see the values for our large image for the first x-/y-split:

  • srcRect := [x=64,y=389, width=5273, height=4371]
  • newSrcRect := [x=64, y=389, width=6, height=1]

Now, whether ScaleOpImage passes one or the other rectangle into PlanarImage#getData() makes a huge performance difference. The reason is that PlanarImage#getData() takes a shortcut when startTileX == endTileX and startTileY == endTileY, i.e. when the requested region lies within a single tile. That is the case if newSrcRect is used, as can be seen in the following figure:

In the case where srcRect is used, the shortcut cannot be taken and the code runs through all the cobbling below, which takes forever and sometimes never finishes at all.
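To illustrate, this is roughly what that shortcut looks like (a simplified sketch around PlanarImage, not the actual JAI source):

import java.awt.Rectangle;
import java.awt.image.Raster;
import javax.media.jai.PlanarImage;

// Simplified sketch of the fast/slow path in PlanarImage#getData(Rectangle);
// illustration only, not the actual JAI source.
static Raster getDataSketch(PlanarImage img, Rectangle region) {
    int startTileX = img.XToTileX(region.x);
    int startTileY = img.YToTileY(region.y);
    int endTileX = img.XToTileX(region.x + region.width - 1);
    int endTileY = img.YToTileY(region.y + region.height - 1);
    if (startTileX == endTileX && startTileY == endTileY) {
        // Fast path: the region lies inside a single tile, so a child of that
        // tile's Raster can be returned without copying any pixel data.
        Raster tile = img.getTile(startTileX, startTileY);
        return tile.createChild(region.x, region.y, region.width, region.height,
                region.x, region.y, null);
    }
    // Slow path: a fresh WritableRaster covering the whole region is allocated
    // and "cobbled" together tile by tile - very expensive for a region as
    // large as the srcRect above (5273 x 4371 pixels).
    return img.getData(region);
}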

The code that actually made this change was https://github.com/geosolutions-it/jai-ext/pull/51.

My question now is: why is this done differently than in JAI, and could this be a bug? I'm asking because I am not an expert in JAI, but the code looks suspicious, especially since, in the case where no border extender was defined, the method does in fact use the newSrcRect as input to the getData() call:
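To paraphrase the difference (variable names illustrative, not a verbatim quote of either code base):

// JAI ScaleOpImage.computeTile(), per split:
//     Raster source = source0.getExtendedData(newSrcRect, extender);  // small per-split rectangle
//
// JAI-EXT ScaleOpImage.computeTile(), when a border extender is set:
//     Raster source = extendedImage.getData(srcRect);                 // the full srcRect -> cobbling
//
// JAI-EXT ScaleOpImage.computeTile(), when no border extender is set:
//     Raster source = source0.getData(newSrcRect);                    // small per-split rectangle again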

I hope someone can answer this question, or if not, can point me to a more appropriate place or person to ask.

Thanks so much,
Sören

Hello @skalesse,

I expect you will get a discussion here, or on the geotools-devel mailing list if nobody replies.

You already have more than enough information to open an issue for the jai-ext project on GitHub and start crafting a pull request.

Thanks @jive ,

I started off the discussion here, because we are experiencing the problems in the context of using GeoServer (WMS). But yeah, you’re right, maybe in the end a PR at jai-ext would be the result of the discussion.

What makes me nervous is the fact that PR #51 (“Removed getExtendedData from all the modules”, by n-lagomarsini, geosolutions-it/jai-ext) changes so many places in many of the operators, and even so many places in ScaleOpImage alone. I can hardly believe that so many places could potentially be wrong.

That’s why I’d rather discuss this here first.

Btw, I was attempting to make a test case in GeoServer WMS that shows the problem, but it's not so easy to do. If it turns out there really is no other way to show the potentially wrong behaviour, I will try harder to make that test case.

Thanks!

Amusingly, that #51 from 2014 was made to resolve:

The getExtendedData() method causes bad performances. It must be changed.

Is it easier to make a test case directly in JAI-EXT? It would be useful anyway once you have a patch to test.

A quick update. I was able to write a jai-ext test case with approximately the same input image that we use in GeoServer. Unfortunately (or maybe fortunately) I was not immediately able to reproduce the problem there. So I guess that's good news for jai-ext.

But then, the problem must have something to do with the chain of operations that WMS GetMap performs while processing the data. Unfortunately I have not yet been able to replicate the quite complex chain of operations that GeoServer builds when it processes the data. There's a lot of warping and cropping and all that happening.

I guess what I would really need to do is somehow write a WMS GetMap test case. I'll give it a try and keep you updated about the results.

What really makes me wonder, though, is how switching the Scale to JAI (instead of JAI-EXT) can then make such a big difference. But I guess we'll see.

Thanks! Sören

Hi all,
it’s hard to say anything about that pull request, it’s 10 years old and was done by the developer that
worked heavily on jai-ext for a couple of years, mostly on NODATA support, and eventually left a year after that commit.

Just out of curiosity, I’ve tried the suggested change in the incriminated lines of code, and run the build: it breaks several
tests, so I’m guessing there is more to it.
Given the numbers you’re reporting, I’m guessing you’re looking at one of those cases where a low resolution
image is upscaled 1000-10000 times and interpolated?

Like Jody, I also encourage you to reproduce the issue. To make things simpler, you can try to use the JAI chain visual debugger.
Steps involved:

  • Run GeoServer on a machine with a GUI (desktop/laptop)
  • Start the JVM with -Dwms.raster.enableRasterChainDebug=true
  • Find the GetMap request that triggers the issue. Make sure you're working against a single raster layer or the chain won't show
  • In the GetMap request, add “&showChain=true” to the URL (see the example below)
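For example (layer name made up), the debug request looks like an ordinary GetMap with the extra parameter:

http://localhost:8080/geoserver/ows?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&LAYERS=topp:myraster&STYLES=&FORMAT=image/png&WIDTH=100&HEIGHT=100&CRS=EPSG:4326&BBOX=-90,-180,90,180&showChain=true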

Here is a sample display, based on the demo mosaic layer (made of many little images), reprojected to 3857:

For each step in the chain you can analyze inputs, outputs, operation parameters, and save a copy of any image in the chain as png/tiff (it's a pure image viewer, it has no idea about georeferencing).

If for any reason you cannot use this to reproduce, find the bit of code where the incriminated affine is built and modify it to call the operation browser programmatically:

RenderedImageBrowser.showChainAndWaitOnClose(theAffineImage);

Hope this helps

Hi @aaime-geosolutions ,

wow, thanks so much for the idea with the chain analyzer GUI. It worked perfectly out of the box. Here you can see the typical operation stack for our data:

We're always starting off with some writable raster (usually a buffered image) that holds the data itself. All the rest of the chain is what GeoServer does with the data: 1) crop by 2 pixels in the y-direction, 2) warp, 3) scale, 4) classify, 5) mosaic. It's always like that.

Here’s the dump of the op-stack:

JAI op: Mosaic(it.geosolutions.jaiext.mosaic.MosaicOpImage) at Level: 0, offset:0, 0, size:100 x 100, tile size:100 x 100
Params. Parameter 1:MOSAIC_TYPE_OVERLAY; Parameter 2:null; Parameter 3:[javax.media.jai.ROI@3f04f371]; Parameter 4:[[0.0]]; Parameter 5:[1.0]; Parameter 6:[RangeDouble[-Infinity, 0.0)]; 
Bands: 1, type: Byte; Color model:class java.awt.image.IndexColorModel, transparency: Translucent
Tile cache: it.geosolutions.concurrent.ConcurrentTileCacheMultiMap@222e782
Tile scheduler: com.sun.media.jai.util.SunTileScheduler@543ce332<global>, parallelism 20, priority 5
Number of sources: 1
   JAI op: RasterClassifier(it.geosolutions.jaiext.classifier.RasterClassifierOpImage) at Level: 1, offset:34, 29, size:31 x 42, tile size:31 x 42
   Params. Parameter 1:[Domain description:
name= no data
input range=RangeDouble(-Infinity, -990.0)
output range=RangeDouble[0.0, 0.0]
colors=java.awt.Color[r=133,g=133,b=133], Domain description:
name=(dBZ)
input range=RangeDouble[-990.0, 1.0)
output range=RangeDouble[1.0, 1.0]
colors=java.awt.Color[r=0,g=0,b=0], Domain description:
name=  1-5,5
input range=RangeDouble[1.0, 5.5)
output range=RangeDouble[2.0, 2.0]
colors=java.awt.Color[r=153,g=255,b=255], Domain description:
name=  5,5-10
input range=RangeDouble[5.5, 10.0)
output range=RangeDouble[3.0, 3.0]
colors=java.awt.Color[r=51,g=255,b=255], Domain description:
name= 10-14,5
input range=RangeDouble[10.0, 14.5)
output range=RangeDouble[4.0, 4.0]
colors=java.awt.Color[r=0,g=202,b=202], Domain description:
name= 14,5-19
input range=RangeDouble[14.5, 19.0)
output range=RangeDouble[5.0, 5.0]
colors=java.awt.Color[r=0,g=153,b=52], Domain description:
name= 19-23,5
input range=RangeDouble[19.0, 23.5)
output range=RangeDouble[6.0, 6.0]
colors=java.awt.Color[r=77,g=191,b=26], Domain description:
name= 23,5-28
input range=RangeDouble[23.5, 28.0)
output range=RangeDouble[7.0, 7.0]
colors=java.awt.Color[r=153,g=204,b=0], Domain description:
name= 28-32,5
input range=RangeDouble[28.0, 32.5)
output range=RangeDouble[8.0, 8.0]
colors=java.awt.Color[r=204,g=230,b=0], Domain description:
name= 32,5-37
input range=RangeDouble[32.5, 37.0)
output range=RangeDouble[9.0, 9.0]
colors=java.awt.Color[r=255,g=255,b=0], Domain description:
name= 37-41,5
input range=RangeDouble[37.0, 41.5)
output range=RangeDouble[10.0, 10.0]
colors=java.awt.Color[r=255,g=196,b=0], Domain description:
name= 41,5-46
input range=RangeDouble[41.5, 46.0)
output range=RangeDouble[11.0, 11.0]
colors=java.awt.Color[r=255,g=137,b=0], Domain description:
name= 46-50,5
input range=RangeDouble[46.0, 50.5)
output range=RangeDouble[12.0, 12.0]
colors=java.awt.Color[r=255,g=0,b=0], Domain description:
name= 50,5-55
input range=RangeDouble[50.5, 55.0)
output range=RangeDouble[13.0, 13.0]
colors=java.awt.Color[r=180,g=0,b=0], Domain description:
name= 55-60
input range=RangeDouble[55.0, 60.0)
output range=RangeDouble[14.0, 14.0]
colors=java.awt.Color[r=72,g=72,b=255], Domain description:
name= 60-65
input range=RangeDouble[60.0, 65.0)
output range=RangeDouble[15.0, 15.0]
colors=java.awt.Color[r=0,g=0,b=202], Domain description:
name= 65-75
input range=RangeDouble[65.0, 75.0)
output range=RangeDouble[16.0, 16.0]
colors=java.awt.Color[r=153,g=0,b=153], Domain description:
name= 75-85
input range=RangeDouble[75.0, 85.0)
output range=RangeDouble[17.0, 17.0]
colors=java.awt.Color[r=255,g=51,b=255], Domain description:
name=No data1
input range=RangeDouble[NaN, NaN]
output range=RangeDouble[18.0, 18.0]
colors=java.awt.Color[r=0,g=0,b=0]]; Parameter 2:-1; Parameter 3:javax.media.jai.ROI@3989839b; Parameter 4:null; 
   Bands: 1, type: Byte; Color model:class java.awt.image.IndexColorModel, transparency: Translucent
   Tile cache: it.geosolutions.concurrent.ConcurrentTileCacheMultiMap@222e782
   Tile scheduler: com.sun.media.jai.util.SunTileScheduler@543ce332<global>, parallelism 20, priority 5
   Number of sources: 1
      JAI op: Scale(it.geosolutions.jaiext.scale.ScaleNearestOpImage) at Level: 2, offset:34, 29, size:31 x 42, tile size:100 x 100
      Params. Parameter 1:0.004739226; Parameter 2:0.007972344; Parameter 3:34.369297; Parameter 4:29.058283; Parameter 5:InterpolationNearest; Parameter 6:javax.media.jai.ROI@7f1aebc2; Parameter 7:false; Parameter 8:null; Parameter 9:null; 
      Bands: 1, type: Float; Color model:class java.awt.image.ComponentColorModel, transparency: Opaque
      Tile cache: it.geosolutions.concurrent.ConcurrentTileCacheMultiMap@222e782
      Tile scheduler: com.sun.media.jai.util.SunTileScheduler@543ce332<global>, parallelism 20, priority 5
      Number of sources: 1
         JAI op: Warp(it.geosolutions.jaiext.warp.WarpNearestOpImage) at Level: 3, offset:0, 1, size:6500 x 5299, tile size:100 x 100
         Params. Parameter 1:javax.media.jai.WarpGrid@41fb4e3d; Parameter 2:InterpolationNearest; Parameter 3:[NaN]; Parameter 4:javax.media.jai.ROI@d01a0bb; Parameter 5:null; 
         Bands: 1, type: Float; Color model:class java.awt.image.ComponentColorModel, transparency: Opaque
         Tile cache: it.geosolutions.concurrent.ConcurrentTileCacheMultiMap@222e782
         Tile scheduler: com.sun.media.jai.util.SunTileScheduler@543ce332<global>, parallelism 20, priority 5
         Number of sources: 1
            JAI op: Crop(it.geosolutions.jaiext.crop.CropOpImage) at Level: 4, offset:0, 1, size:6500 x 5299, tile size:6500 x 5300
            Params. Parameter 1:0.0; Parameter 2:1.0; Parameter 3:6500.0; Parameter 4:5299.0; Parameter 5:null; Parameter 6:null; Parameter 7:[0.0]; 
            Bands: 1, type: Float; Color model:class java.awt.image.ComponentColorModel, transparency: Opaque
            Tile cache: it.geosolutions.concurrent.ConcurrentTileCacheMultiMap@222e782
            Tile scheduler: com.sun.media.jai.util.SunTileScheduler@543ce332<global>, parallelism 20, priority 5
            Number of sources: 1
               Non op: class javax.media.jai.WritableRenderedImageAdapter at Level: 5, offset:0, 0, size:6500 x 5300, tile size:6500 x 5300
               Bands: 1, type: Float; Color model:class java.awt.image.ComponentColorModel, transparency: Opaque
               Tile cache: null
               Tile scheduler: null

I am not expecting anyone to analyze the stack and point me to our problem, but it gives me a good tool to figure out what might be going wrong.

Btw, switching from JAI-EXT to JAI doesn't make a single difference in the above chain (except, of course, the class of the scale op).

@aaime-geosolutions, thanks for taking a look at the JAI-EXT Scale. I agree with you that making that single change in JAI-EXT will probably not be enough. There must be more to it, if the problem is indeed there.

My idea is still to try making a reproducible JUnit test that shows the problem. It's just not that easy, because of all the parameters, properties and rendering hints that have to be passed into all the operations involved in the stack. I will try further.

One last question: is there a test example somewhere (presumably in WMS) that allows me to directly perform the GetMap and in fact return the rendered image? I tried to find one, but wasn't yet able to find a good example.

Thanks!!

Hi,

just for completeness: it looks as if I got a bit further with my analysis.

It now seems that the WarpNearest operation we are seeing in the chain actually stems from the reprojection:

DirectRasterRenderer.readWithProjectionHandling(...)
|-> GridCoverageRenderer.renderImage(...)
     |-> GridCoverageRendererUtilities.reproject(...)
          |-> GridCoverageRendererUtilities.resample(...)
               |-> Resampler2D.reproject(...)

It looks to me as if we are running into the else case that handles the general case, where the transformation is not affine. Therefore it constructs a warp instead.
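If I read the code correctly, that warp is built by sampling the (non-affine) target-to-source transform on a regular grid of control points; roughly like this (an illustrative sketch with made-up numbers, not the actual GeoTools code):

import javax.media.jai.Warp;
import javax.media.jai.WarpGrid;

// Illustrative sketch: sample the target->source transform on a regular grid of
// control points and hand the result to JAI as a WarpGrid. Grid size and the
// identity mapping below are made up.
static Warp buildWarpGridSketch() {
    int xStart = 0, yStart = 0;        // first grid point in destination space
    int xStep = 10, yStep = 10;        // spacing between grid points
    int xNumCells = 10, yNumCells = 10;
    // For each grid point, warpPositions holds the corresponding (x, y)
    // position in the source image, row by row.
    float[] warpPositions = new float[2 * (xNumCells + 1) * (yNumCells + 1)];
    int k = 0;
    for (int j = 0; j <= yNumCells; j++) {
        for (int i = 0; i <= xNumCells; i++) {
            // In the real code these come from evaluating the non-affine
            // target-to-source transform at the grid point.
            warpPositions[k++] = xStart + i * xStep;   // placeholder: identity mapping
            warpPositions[k++] = yStart + j * yStep;
        }
    }
    return new WarpGrid(xStart, xStep, xNumCells, yStart, yStep, yNumCells, warpPositions);
}

That would also explain the javax.media.jai.WarpGrid instance we can see among the parameters of the Warp op in the chain above.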

The transform that is used here for this data is this:

CONCAT_MT[PARAM_MT["Affine", 
    PARAMETER["num_row", 3], 
    PARAMETER["num_col", 3], 
    PARAMETER["elt_0_0", 2387.9519425806548], 
    PARAMETER["elt_0_2", -7204824.823130887], 
    PARAMETER["elt_1_1", -2039.256232067769], 
    PARAMETER["elt_1_2", 13097420.10137948]], 
  INVERSE_MT[PARAM_MT["Popular Visualisation Pseudo Mercator", 
      PARAMETER["semi_major", 6378137.0], 
      PARAMETER["semi_minor", 6378137.0], 
      PARAMETER["latitude_of_origin", 0.0], 
      PARAMETER["central_meridian", 0.0], 
      PARAMETER["scale_factor", 1.0], 
      PARAMETER["false_easting", 0.0], 
      PARAMETER["false_northing", 0.0]]], 
  PARAM_MT["Lambert_Azimuthal_Equal_Area", 
    PARAMETER["semi_major", 6378137.0], 
    PARAMETER["semi_minor", 6356752.314245179], 
    PARAMETER["latitude_of_center", 52.0], 
    PARAMETER["longitude_of_center", 10.0], 
    PARAMETER["false_easting", 3760756.2464729534], 
    PARAMETER["false_northing", -2656141.300687874]], 
  PARAM_MT["Affine", 
    PARAMETER["num_row", 3], 
    PARAMETER["num_col", 3], 
    PARAMETER["elt_0_0", 0.001], 
    PARAMETER["elt_0_2", 0.5], 
    PARAMETER["elt_1_1", -0.001], 
    PARAMETER["elt_1_2", 0.5]]]


Clearly not affine :wink:

The source/target BBOX are

GridEnvelope2D[0..6499, 1..5299]

and the image Layout is this:

ImageLayout[MIN_X=0, MIN_Y=1, WIDTH=6500, HEIGHT=5299, TILE_GRID_X_OFFSET=0, TILE_GRID_Y_OFFSET=0, TILE_WIDTH=10, TILE_HEIGHT=10]

It seems to me that if this warp is wrapped in the JAI-EXT scale, it behaves differently than if wrapped in a JAI scale (or the slightly different implementation of the two scale ops makes the difference).

Just a gut feeling looking at the image: the tile grid is very dense (too dense, unless it's zoomed out a few times). What's the tile structure of the original data?

Hi @aaime-geosolutions ,

What we are seeing in the screenshot above should in fact be the tile grid of the Warp. If I understand it correctly, the warp gets the tile grid from the scale, and the tile grid of the scale is (for whatever reason) the tile grid of the final image. In our example it should be 100x100, because that's the requested final image size.

As for the original data: to be honest, it is not tiled at all. We read the data from the original file and simply convert it into a writable raster, which is later turned into a buffered image by the grid coverage factory.

Do you think this could be a problem - esp. with respect to performance?
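If it is, I guess one thing we could do on our side (an untested sketch, staying with the in-memory approach) is to give the image a proper tile layout via the JAI "format" operation before handing it to the grid coverage factory:

import java.awt.RenderingHints;
import java.awt.image.RenderedImage;
import java.awt.image.renderable.ParameterBlock;
import javax.media.jai.ImageLayout;
import javax.media.jai.JAI;
import javax.media.jai.RenderedOp;

// Untested sketch: re-tile an untiled in-memory image into 512x512 tiles by
// running it through the JAI "format" operation with an ImageLayout hint.
static RenderedImage retile(RenderedImage untiled) {
    ImageLayout layout = new ImageLayout(untiled);
    layout.setTileWidth(512);
    layout.setTileHeight(512);
    RenderingHints hints = new RenderingHints(JAI.KEY_IMAGE_LAYOUT, layout);
    ParameterBlock pb = new ParameterBlock();
    pb.addSource(untiled);
    pb.add(untiled.getSampleModel().getDataType());  // keep the original data type
    RenderedOp tiled = JAI.create("format", pb, hints);
    return tiled;
}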

Thanks!
Sören

P.S. @aaime-geosolutions, you are right, the image was zoomed out a lot in the viewer, so that you could get a better impression of what it looks like.

I’m not sure if it’s related to the problem, but yes, that image structure is a problem in itself:

  • Reading 6500*6500 to generate a 100x100 output is bad to the point of being criminal, please try to have overviews in your source image
  • Also, make sure your original image is tiled, so that you can read only the portion you need when displaying data zoomed-in

Also have a look at the other recommendations in this presentation:

Cheers
Andrea


Hi @aaime-geosolutions

you're absolutely right. The way we are currently processing the data is not optimal, and we will try to work on improving that on the data management side.

I want to focus again on the JAI vs. JAI-EXT problem though for a minute.

I have analyzed it further, and I still have no clue what exactly is making the tremendous difference between the two, but here are some observations. For this I have made some flame graphs, which give an insight into where the system actually spends its time.

I have run the system twice with the exact same WMS request:

http://localhost:8080/geoserver/ows?REQUEST=GetMap&SERVICE=WMS&VERSION=1.3.0&FORMAT=image/png&STYLES=dwd.radar:Radar.Reflectivity_dBz_DWD_RX_WV&TRANSPARENT=TRUE&LAYERS=dwd.radar:Radar_eucom-zhproduct_1x1km_eu&tiled=false&TIME=2025-03-04T07:30:00.000Z&WIDTH=100&HEIGHT=100&CRS=EPSG:3857&BBOX=-16300258.14747685,-1974264.273731744,20239495.021893382,17508820.823895693

Once with the Scale operation set to JAI, the other time with it set to JAI-EXT. During each request I made sure nothing else was going on on the server, and I captured the flame graph.

Here we can see how much time is spent where and, astonishingly, also which different operations are involved in the two versions.

(DISCLAIMER: the flame graph is not a call stack. It’s rather a filtered version of a call stack, where only those methods are retained that contribute significant time to the entire operation).

A) Flame graph using JAI Scale

Generating the image took about 1.4s. You can see that the majority of the time is spent in two sub-tasks performed during the mosaicking, equally distributed between the two. The two tasks are:

  • GenericPiecewiseOpImage
  • the scale

B) Flame graph using JAI-EXT Scale

Generating the image took about 94.5s. Here as well, the majority of the time is equally distributed between GenericPiecewiseOpImage and the scale.

What I find extremely interesting is that the ‘call stack’ is so different between the two versions.

The stack in the JAI version is much bigger but still performs better. In the JAI-EXT version a lot of time is spent in the JDKWorkarounds class as well as in the construction of data buffers. These parts seem to be completely missing in the JAI version (or they just don't take any time at all).

For better comparison I have re-arranged the images that we just saw so that the two involved heavy operations are displayed next to each other.

C) Scale operation - JAI vs. JAI-EXT

The JAI op runs in 700ms, the JAI-EXT in 35s. You can also clearly see that the JAI version uses the getExtendedData() call that was removed from JAI-EXT a while ago with the mentioned PR.

D) GenericPiecewiseOpImage - JAI vs. JAI-EXT

The JAI op runs in 690ms and the JAI-EXT one in 58s. The interesting part here is that even in the JAI version (left-hand side) it uses the JAI-EXT scale. I had already noticed that a few days ago. It seems that moving operators from one panel to the other on the GeoServer admin page doesn't always have an effect. My assumption is that it depends on how the operator is actually initialized.

Nonetheless, again it is very interesting to see how the ‘call stack’ is so different between the two versions.

In the JAI-EXT version, most of the time is really spent in PlanarImage.createWritableRaster() and PlanarImage.cobbleFloat().

What would be interesting to know (but the flame graph unfortunately doesn’t tell us) is

  • what the arguments to the heavy-load methods are, and how they differ
  • how many times these methods are called; the flame graph won't tell us whether some of them have been called 10,000 times, which would add up to these numbers

So, even though I do agree that we must work on our data ingest, it seems clear to me that something is extremely weird in how these operators are chained and called.

And again, from @aaime-geosolutions's great analyzer functionality we can see that, operation-wise, there does not seem to be the slightest difference. The only one is the JAI scale vs. the JAI-EXT scale.

So, I guess I will have to dig further. But thanks for your valuable input!
Sören

Hi,

the more I think about it, the more I am convinced that what JAI is doing must be correct and, in turn, JAI-EXT must be wrong. Here's my reasoning. Let's look at what JAI does in ScaleOpImage.computeTile(..):

So we're putting in a source image that should be scaled to 100x100. Now we're getting into computeTile(...).

  1. It calculates the splits:

  2. Then it loops over the splits

and for each split it calculates the split rectangle used for obtaining the data from the image.

In the figure you can see that it considers the tileWidth/tileHeight, constructs a source tile rectangle and then intersects it with the srcRect of the original image.

That will be the newSrcRect: the cut-out of the original image, at the desired position and with width/height adjusted to the required tileWidth/tileHeight.

To me that makes sense, doesn’t it?

  3. Obtain the tile from the original image

Now comes the interesting part: we need to obtain the data from the source image.

JAI even has a comment, “// Do the operations with these new rectangles”, and, as can be seen, it passes the newSrcRect into source0.getExtendedData().

So for a single split, I would be passing newSrcRect=[x=93,y=81,width=7,height=19] instead of srcRect=[x=93,y=81,width=6274,height=5161].

To me that makes perfect sense. Why would I pass the srcRect? It's way too big! My splits are for 100x100 in this example.
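Put together, my understanding of the JAI loop is roughly this (a simplified sketch with my own variable names, not the actual JAI source):

import java.awt.Rectangle;
import java.awt.image.Raster;
import javax.media.jai.BorderExtender;
import javax.media.jai.PlanarImage;

// Simplified sketch of how JAI walks the x-/y-splits; names are mine.
static void loopSplitsSketch(PlanarImage source0, BorderExtender extender,
        Rectangle srcRect, int[] xSplits, int[] ySplits,
        int tileWidth, int tileHeight) {
    for (int ys = 0; ys < ySplits.length; ys++) {
        for (int xs = 0; xs < xSplits.length; xs++) {
            // tile-aligned rectangle in source space for this split ...
            Rectangle srcTileRect =
                    new Rectangle(xSplits[xs], ySplits[ys], tileWidth, tileHeight);
            // ... clipped against the source rectangle of the destination tile
            Rectangle newSrcRect = srcTileRect.intersection(srcRect);
            // "Do the operations with these new rectangles": only the small
            // per-split rectangle is requested from the source image
            Raster source = (extender != null)
                    ? source0.getExtendedData(newSrcRect, extender)
                    : source0.getData(newSrcRect);
            // ... scale the pixels of 'source' into the destination tile ...
        }
    }
}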

Something doesn’t seem right the way JAI-EXT does it.

Oh, btw, the PlanarImage.getExtendedData(..) that was removed from JAI-EXT with the mentioned PR isn't all that bad after all, because what happens there is this:
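Roughly speaking (paraphrasing from memory, since I cannot inline the source here):

// PlanarImage#getExtendedData(Rectangle region, BorderExtender extender) behaves
// like getData(region), except that the region is allowed to stick out beyond the
// image bounds; the missing pixels are filled in by the BorderExtender. For a
// region that lies entirely inside the image it boils down to getData(region).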

So that was what the JAI-EXT version had been replaced with. To me, the JAI-EXT source code seems correct, if replacing PlanarImage.getExtendedData(..) with PlanarImage.getData(..) was the intention of the PR.

But it seems as if, since that migration, the wrong argument has been passed into the getData(...) method.

But I will investigate further, maybe making a local patch for JAI-EXT and seeing how it performs. Still, I wasn't able to make a reproducible test case, neither in jai-ext nor in GeoServer.

Thanks for reading.
Sören

Hi Soren,
I don’t believe you need to convince anyone that there is an issue with how JAI-Ext is behaving.
Since you’re deep in the code already, these are viable options:

  • Reproduce it with a single image call, and report it to the project, and wait for someone to act on it (with no expectation as to when it will be fixed)
  • Make a fix, issue a PR, and ensure it's not breaking anything downstream. Then patiently wait for review and release (again, with no expectation on when that will happen; jai-ext does not have a regularly scheduled release, but in this case the fix will be able to hop into the next one, when it happens)

Uahhhhhhhh! SUCCESS! :slight_smile:

Based on the above depictions I was able to adjust JAI-EXT ScaleOpImage and fix all places in computeTile(...) to use the corresponding individual new src rectangles instead of just srcRect.

That made all the test cases pass but one. It is really important that all places are fixed with their corresponding new src rects, not just the one place that I had depicted in one of the above comments.

The one still-failing test case was for bicubic interpolation, where, with the adjusted code, the interpolation now runs into an ArrayIndexOutOfBoundsException. I was able to find the cause: the initialization and computation of the extended images.

There is code in that computation that handles interpolation explicitly:

It will pad the image on the left and top by one pixel in case a border extender is used. So that's good, I guess.

But a similar special handling is missing in case we are using a ROI. See here:

That code doesn't make that distinction, and that is what crashes the bicubic test case.

Adding this extra padding also for the ROI handling makes all test cases run successfully.
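For reference, the kind of padding I mean looks roughly like this (a sketch with my own names, not the actual jai-ext code):

import java.awt.Rectangle;
import javax.media.jai.Interpolation;

// Sketch: bicubic interpolation reads neighbouring pixels, so the extended
// source rectangle has to be grown by the interpolation padding.
static Rectangle padForInterpolation(Rectangle srcBounds, Interpolation interp) {
    int left = interp.getLeftPadding();     // 1 for bicubic
    int top = interp.getTopPadding();       // 1 for bicubic
    int right = interp.getRightPadding();
    int bottom = interp.getBottomPadding();
    return new Rectangle(
            srcBounds.x - left,
            srcBounds.y - top,
            srcBounds.width + left + right,
            srcBounds.height + top + bottom);
}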

I ran the patched jt-scale library in our GeoServer and, amazingly, it turns out that it fixes the problem entirely!! :smiley:

The request I was always referring to in this discussion now runs in 2s instead of 94s. And even a 10x10 image composed from the original 6500x5300 image runs in 500ms instead of crashing the GeoServer, which it had done previously.

Guess I will now have to make that PR.

Question to you, @aaime-geosolutions: looking again at the initial PR #51 (“Removed getExtendedData from all the modules”, by n-lagomarsini, geosolutions-it/jai-ext), it seems to me as if this was all a big copy-and-paste error, because if you look closer at the ScaleOpImage commit, the first occurrence of the code block:

was copied to all the other places. The first occurrence, though, is right: there it has to be the srcRect that is passed to getData(...). But all the other occurrences are wrong.

Now I have understood that and it can be fixed.

But now, looking at other places, e.g. RescaleOpImage, the same problem is there as well:

How to deal with that? I get the feeling, all those places in all operations need the same fix. Should this be part of the PR?

Thanks
Sören

Oh wait, never mind about the other operations. It looks as if I was mistaken about the copy-and-paste to the other ones. I have checked again, and it seems that maybe indeed only ScaleOpImage is affected. But I will check the others anyway.

An issue has been filed against jai-ext (“Scale operator performs poorly on very large scale ratios and tiling”, geosolutions-it/jai-ext issue #309), along with a corresponding PR #310.

I have marked the PR as draft so far, because even though it looks as if only ScaleOpImage might be affected, I would rather go and check the other files that were part of the initial PR to see if the error has been repeated there.

Thanks for the discussion and your patience!
Sören


Thanks for investigating and sharing your notes; it will really help the next person troubleshooting JAI.

Hi,

just wanted to let you know that I have now marked PR #310 (“Scale operator performs poorly on very large scale ratios and tiling”, geosolutions-it/jai-ext) as ready.

Indeed only the two classes ScaleOpImage (jai-ext/jt-scale) and Scale2OpImage (jai-ext/jt-scale2) were directly affected by the bug. I have looked through all the other classes touched by PR #51, and there the changes seem correct.

I was also able to resolve my locally failing test cases by switching to Java 8, which seems to be the required platform. All test cases now run successfully.

I am looking forward to a review of the PR#310.

Thank you so much!
Sören

Hi again,

I just wanted to let you know that with issue #309 (“Scale operator performs poorly on very large scale ratios and tiling”, geosolutions-it/jai-ext) fixed, we were able to patch our GeoServer with jai-ext version 1.1.30, and that relieves a lot of stress from the server.

Not only is it not crashing anymore on the preview pictures, we also have the 'feeling' that its liveliness has increased. We are still evaluating the metrics, but it looks as if this fix helps a lot.

Here's an example of the images that gave us headaches over the last weeks:

Thanks to everybody who was involved in this, especially @aaime-geosolutions for guiding us through the whole workflow. If we ever meet, I owe you a beer. :smiley:

Regards
Sören
