[Geoserver-users] Improved code in ShapeZipOutputFormat

Hello everybody,

I'm not sure if this is the right place to post this but...

I have corrected a method in ShapeZipOutputFormat (in package
org.geoserver.wfs.response) that makes the shape-zip output format more
than 10 times faster. Especially for large shape-zip outputs.

The new code looks like this and it would be nice if it could be
corrected in the next release:

/**
  * readInWriteOutBytes Description: Reads in the bytes from the input
stream
  * and writes them to the output stream.
  *
  * @param output
  * @param in
  *
  * @throws IOException
  */
private void readInWriteOutBytes(OutputStream output, InputStream in)
throws IOException {
  int c;

  byte buffer = new byte[1024 * 1024];
  while (-1 != (c = in.read(buffer))) {
   output.write(buffer, 0, c);
  }
}

(In the current trunk version the method reads one byte at the time
which seems inefficient by nature and possibly even worse when writing
to a ZipOutputStream.)

Regards,
Albin

Lundmark Albin ha scritto:

Hello everybody,
I'm not sure if this is the right place to post this but...
I have corrected a method in ShapeZipOutputFormat (in package
org.geoserver.wfs.response) that makes the shape-zip output format more
than 10 times faster. Especially for large shape-zip outputs.
The new code looks like this and it would be nice if it could be
corrected in the next release:
/**
  * readInWriteOutBytes Description: Reads in the bytes from the input
stream
  * and writes them to the output stream.
  * * @param output
  * @param in
  * * @throws IOException
  */
private void readInWriteOutBytes(OutputStream output, InputStream in)
throws IOException {
  int c;
   byte buffer = new byte[1024 * 1024];
  while (-1 != (c = in.read(buffer))) {
   output.write(buffer, 0, c);
  }
}
(In the current trunk version the method reads one byte at the time
which seems inefficient by nature and possibly even worse when writing
to a ZipOutputStream.)

Ouch, you're right, this is bad, thanks a ton for noticing.
Can I ask you a couple of extra things:
- open a jira issue on jira.codehaus.org
- have you tested with a smaller buffer, such as 4KB? That's
   the size I usually use for file reads, my experience is that
   speedups go down very steeply after that size. A buffer of
   one meg has the side effect of visibly limiting the number
   of concurrent clients that can issue such a request (we already
   have WMS which is very memory hungry by nature), so I'd check
   with different sizes and see what's the speedup for each,
   trying to find a balance between single user speed and memory
   consuption.

If you can't do the latter, never mind, I'll find some time to do
it myself. Thanks a lot again for spotting that issue and providing
a patch, this is most welcome.

Cheers
Andrea

Hello again,

I have opened a jira issue that got the name GEOS-2121.

Please correct it if needed, not sure if I choose the right version numbers and stuff.

I have not yet tested any other buffer sizes but will update the jira issue if I do. (If I can update it?)

Regards,
Albin

-----
Från: Andrea Aime [mailto:aaime@anonymised.com]
Skickat: den 11 augusti 2008 14:14
Till: Lundmark Albin
Kopia: geoserver users; Harrtell Björn
Ämne: Re: [Geoserver-users] Improved code in ShapeZipOutputFormat

Lundmark Albin ha scritto:

Hello everybody,

I'm not sure if this is the right place to post this but...

I have corrected a method in ShapeZipOutputFormat (in package
org.geoserver.wfs.response) that makes the shape-zip output format
more than 10 times faster. Especially for large shape-zip outputs.

The new code looks like this and it would be nice if it could be
corrected in the next release:

/**
  * readInWriteOutBytes Description: Reads in the bytes from the input
stream
  * and writes them to the output stream.
  *
  * @param output
  * @param in
  *
  * @throws IOException
  */
private void readInWriteOutBytes(OutputStream output, InputStream in)
throws IOException {
  int c;

  byte buffer = new byte[1024 * 1024];
  while (-1 != (c = in.read(buffer))) {
   output.write(buffer, 0, c);
  }
}

(In the current trunk version the method reads one byte at the time
which seems inefficient by nature and possibly even worse when writing
to a ZipOutputStream.)

Ouch, you're right, this is bad, thanks a ton for noticing.
Can I ask you a couple of extra things:
- open a jira issue on jira.codehaus.org
- have you tested with a smaller buffer, such as 4KB? That's
   the size I usually use for file reads, my experience is that
   speedups go down very steeply after that size. A buffer of
   one meg has the side effect of visibly limiting the number
   of concurrent clients that can issue such a request (we already
   have WMS which is very memory hungry by nature), so I'd check
   with different sizes and see what's the speedup for each,
   trying to find a balance between single user speed and memory
   consuption.

If you can't do the latter, never mind, I'll find some time to do it myself. Thanks a lot again for spotting that issue and providing a patch, this is most welcome.

Cheers
Andrea

From the information in GEOS-2121, “Fixed in revisions 9971 , 9969 and 9968 for trunk, 1.6.x and 1.7.x respectively.” I would assume this issue is fixed in the 1.6.5 release, but it’s not labeled as such and is not included in the list of fixed issues for 1.6.5

Can someone clear this up for me?

Regards,

/Björn Harrtell

On Tue, Aug 12, 2008 at 9:26 AM, Lundmark Albin <Albin.Lundmark@…1668…> wrote:

Hello again,

I have opened a jira issue that got the name GEOS-2121.

Please correct it if needed, not sure if I choose the right version numbers and stuff.

I have not yet tested any other buffer sizes but will update the jira issue if I do. (If I can update it?)

Regards,
Albin


Från: Andrea Aime [mailto:aaime@…1671…]
Skickat: den 11 augusti 2008 14:14
Till: Lundmark Albin
Kopia: geoserver users; Harrtell Björn
Ämne: Re: [Geoserver-users] Improved code in ShapeZipOutputFormat

Lundmark Albin ha scritto:

Hello everybody,

I’m not sure if this is the right place to post this but…

I have corrected a method in ShapeZipOutputFormat (in package
org.geoserver.wfs.response) that makes the shape-zip output format
more than 10 times faster. Especially for large shape-zip outputs.

The new code looks like this and it would be nice if it could be
corrected in the next release:

/**

  • readInWriteOutBytes Description: Reads in the bytes from the input
    stream
  • and writes them to the output stream.
  • @param output
  • @param in
  • @throws IOException
    */
    private void readInWriteOutBytes(OutputStream output, InputStream in)
    throws IOException {
    int c;

byte buffer = new byte[1024 * 1024];
while (-1 != (c = in.read(buffer))) {
output.write(buffer, 0, c);
}
}

(In the current trunk version the method reads one byte at the time
which seems inefficient by nature and possibly even worse when writing
to a ZipOutputStream.)

Ouch, you’re right, this is bad, thanks a ton for noticing.
Can I ask you a couple of extra things:

  • open a jira issue on jira.codehaus.org
  • have you tested with a smaller buffer, such as 4KB? That’s
    the size I usually use for file reads, my experience is that
    speedups go down very steeply after that size. A buffer of
    one meg has the side effect of visibly limiting the number
    of concurrent clients that can issue such a request (we already
    have WMS which is very memory hungry by nature), so I’d check
    with different sizes and see what’s the speedup for each,
    trying to find a balance between single user speed and memory
    consuption.

If you can’t do the latter, never mind, I’ll find some time to do it myself. Thanks a lot again for spotting that issue and providing a patch, this is most welcome.

Cheers
Andrea

---------- Forwarded message ----------
From: “Albin Lundmark (JIRA)” <jira@…159…>
To: “Lundmark Albin” <Albin.Lundmark@…1668…>
Date: Tue, 12 Aug 2008 09:19:26 +0200
Subject: [jira] Created: (GEOS-2121) Change of buffer size in ShapeZipOutputFormat
Change of buffer size in ShapeZipOutputFormat

Key: GEOS-2121
URL: http://jira.codehaus.org/browse/GEOS-2121
Project: GeoServer
Issue Type: Improvement
Components: WFS
Affects Versions: 1.6.4
Environment: Irrelevant
Reporter: Albin Lundmark
Assignee: Andrea Aime
Fix For: 1.6.5

I have corrected a method in ShapeZipOutputFormat (in package
org.geoserver.wfs.response) that makes the shape-zip output format more than 10 times faster. Especially for large shape-zip outputs.

The new code looks like this and it would be nice if it could be corrected in the next release:

/**

  • readInWriteOutBytes Description: Reads in the bytes from the input stream
  • and writes them to the output stream.
  • @param output
  • @param in
  • @throws IOException
    */
    private void readInWriteOutBytes(OutputStream output, InputStream in) throws IOException {
    int c;

byte buffer = new byte[1024 * 1024];
while (-1 != (c = in.read(buffer))) {
output.write(buffer, 0, c);
}
}

(In the current trunk version the method reads one byte at the time which seems inefficient by nature and possibly even worse when writing to a ZipOutputStream.)

The above buffer size is not optimized and propably does not need to be that big as Andrea Aime stated in out conversation on the geoserver-users list:
“- have you tested with a smaller buffer, such as 4KB? That’s
the size I usually use for file reads, my experience is that
speedups go down very steeply after that size. A buffer of
one meg has the side effect of visibly limiting the number
of concurrent clients that can issue such a request (we already
have WMS which is very memory hungry by nature), so I’d check
with different sizes and see what’s the speedup for each,
trying to find a balance between single user speed and memory
consuption.”

I have not yet tested any other buffer size than the one above.


This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa

For more information on JIRA, see: http://www.atlassian.com/software/jira


This SF.Net email is sponsored by the Moblin Your Move Developer’s challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/


Geoserver-users mailing list
Geoserver-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-users

Hi,

it was committed to the branch that 1.6.5 is built from [1], so it should be in there.

I think what happened was that Justin upgraded the issue to be fixed for RC1 rather than RC2 [2], and the select box makes it impossible to see that more releases were selected.

-Arne

1: http://fisheye.codehaus.org/changelog/geoserver/?cs=9969
2: http://jira.codehaus.org/browse/GEOS-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:changehistory-tabpanel

Björn Harrtell wrote:

From the information in GEOS-2121, "Fixed in revisions 9971 , 9969 and 9968 for trunk, 1.6.x and 1.7.x respectively." I would assume this issue is fixed in the 1.6.5 release, but it's not labeled as such and is not included in the list of fixed issues for 1.6.5

Can someone clear this up for me?

Regards,

/Björn Harrtell

On Tue, Aug 12, 2008 at 9:26 AM, Lundmark Albin <Albin.Lundmark@anonymised.com <mailto:Albin.Lundmark@anonymised.com>> wrote:

    Hello again,

    I have opened a jira issue that got the name GEOS-2121.

    Please correct it if needed, not sure if I choose the right
    version numbers and stuff.

    I have not yet tested any other buffer sizes but will update the
    jira issue if I do. (If I can update it?)

    Regards,
    Albin

<snip>

--
Arne Kepp
OpenGeo - http://opengeo.org
Expert service straight from the developers