[Geoserver-devel] Excessive method length code smell

Hey all,
sorry for cross-posting but this relates to both projects.

One of the PMD rules I’d like to set up the most is ExcessiveMEthodLength. It’ll just fail on a threshold for lines of code in a single method.

I’ve made some assessments for 500 and 100 lines of code limits, check this out:

GeoTools:
https://github.com/groldan/geoserver/wiki/ExcessiveMethodLength-PMD-assessment—GeoTools

GeoServer: https://github.com/groldan/geoserver/wiki/ExcessiveMethodLength-PMD-assessment—GeoServer

I think avoiding too large methods is important for maintainability and to lower the barrier entry to the project for other people, because trying to keep methods concise forces you to think of the design and to split them into smaller methods. Ideally, a single method would be succinct and easy to read, for anyone to be able to figure out what it does at a given level of abstraction, and having to drill down to a lower level of abstraction method only if necessary.

I myself often feel overwhelmed when facing such methods, despite having so many years of experience on the codebase, it’s discouraging. So maybe we could do something at least in identifying the most relevant ones and trying to make them cleaner.
Of course some of them would not be a problem cause they’re just, either generated or not, linear setups of state, like org.geotools.gml3.GMLSchema:GMLSchema(), which is 978 lines long. But mostly there’s a lot of code that would benefit from a second round of thinking.

As an example, I tried to tidy up org.geotools.coverage.processing.operation.Resampler2D:reproject(…), which is currently 618 lines long, and made it 49. Most important thing I think is it now reads as intended, instead of having to have so many comments explaining what the following hundred lines of code do.

This is the original:

https://github.com/geotools/geotools/blob/64e0026c62cf37662194b92e8052594d57ef826e/modules/library/coverage/src/main/java/org/geotools/coverage/processing/operation/Resampler2D.java#L245-L863

And this the refactored:

https://github.com/groldan/geotools/blob/d25f6424f8f4188824c840f400c3414a9ce12531/modules/library/coverage/src/main/java/org/geotools/coverage/processing/operation/Resampler2D.java#L302-L351

Not saying it’s perfect, that shared catch-all state class is debatable, but I can certainly get an idea of what the method does at a glimpse and follow through if needed.

I know it’s something that’d require quite significant and lacking resources, but also think it’s worth pursuing. Probably a first round to set a soft limit of 200 lines long or so, with a longer term goal of 100?

Looking forward to your comments,
Cheers.

···

Gabriel Roldán

I think ExcessiveMethodLength is an important check for the readability and maintainability for code. Is the opposite check also available? Something that would flag too much abstraction and/or too much inheritance or too many interfaces, leading to confusing, unreadable empty methods and classes? In my experience both projects also have a fair amount of that problem.

Joe Miller

On Wed, Mar 10, 2021 at 9:12 AM Gabriel Roldan <gabriel.roldan@anonymised.com> wrote:

Hey all,
sorry for cross-posting but this relates to both projects.

One of the PMD rules I’d like to set up the most is ExcessiveMEthodLength. It’ll just fail on a threshold for lines of code in a single method.

I’ve made some assessments for 500 and 100 lines of code limits, check this out:

GeoTools:
https://github.com/groldan/geoserver/wiki/ExcessiveMethodLength-PMD-assessment—GeoTools

GeoServer: https://github.com/groldan/geoserver/wiki/ExcessiveMethodLength-PMD-assessment—GeoServer

I think avoiding too large methods is important for maintainability and to lower the barrier entry to the project for other people, because trying to keep methods concise forces you to think of the design and to split them into smaller methods. Ideally, a single method would be succinct and easy to read, for anyone to be able to figure out what it does at a given level of abstraction, and having to drill down to a lower level of abstraction method only if necessary.

I myself often feel overwhelmed when facing such methods, despite having so many years of experience on the codebase, it’s discouraging. So maybe we could do something at least in identifying the most relevant ones and trying to make them cleaner.
Of course some of them would not be a problem cause they’re just, either generated or not, linear setups of state, like org.geotools.gml3.GMLSchema:GMLSchema(), which is 978 lines long. But mostly there’s a lot of code that would benefit from a second round of thinking.

As an example, I tried to tidy up org.geotools.coverage.processing.operation.Resampler2D:reproject(…), which is currently 618 lines long, and made it 49. Most important thing I think is it now reads as intended, instead of having to have so many comments explaining what the following hundred lines of code do.

This is the original:

https://github.com/geotools/geotools/blob/64e0026c62cf37662194b92e8052594d57ef826e/modules/library/coverage/src/main/java/org/geotools/coverage/processing/operation/Resampler2D.java#L245-L863

And this the refactored:

https://github.com/groldan/geotools/blob/d25f6424f8f4188824c840f400c3414a9ce12531/modules/library/coverage/src/main/java/org/geotools/coverage/processing/operation/Resampler2D.java#L302-L351

Not saying it’s perfect, that shared catch-all state class is debatable, but I can certainly get an idea of what the method does at a glimpse and follow through if needed.

I know it’s something that’d require quite significant and lacking resources, but also think it’s worth pursuing. Probably a first round to set a soft limit of 200 lines long or so, with a longer term goal of 100?

Looking forward to your comments,
Cheers.

Gabriel Roldán


GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Hi Joseph,

I’m not sure there’s anything that specific, but not an expert either. There are a number of design checks though[1], maybe you want to explore a little bit.

Trying to be too strict will get us in trouble, for example, if we wanted to honor the LawOfDemeter one.
To be clear, I’m not advocating to set up all possible rules, but some of them do make a lot of sense given the size and complexity of our projects.

[1] https://pmd.github.io/latest/pmd_rules_java_design.html

···

Gabriel Roldán

Hi Gabriel,
I would love to see having a limit on the method length… but remember when running these QAs you’re pretty much alone
(as I’ve been for most of the QA work, even when other people participated, I ended up doing well over 90% of the job).

I would suggest you start with a very high limit, get it fixed, and then you have a limit that PMD can apply, even if stupidly high.
Then it can go down little by little, until you exhaust the time you want to dedicate on it.

These days I won’t be able to do much, with full lockdown in place and kids at home, it’s a lot if I can run some cleanups
where IntelliJ does most of the work for me :smiley: (which is, what I’ve been doing for the past month or two).

Cheers
Andrea

···

Regards, Andrea Aime

== GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information. == Ing. Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via di Montramito 3/A 55054 Massarosa (LU) phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549 http://www.geo-solutions.it http://twitter.com/geosolutions_it ------------------------------------------------------- Con riferimento alla normativa sul trattamento dei dati personali (Reg. UE 2016/679 - Regolamento generale sulla protezione dei dati “GDPR”), si precisa che ogni circostanza inerente alla presente email (il suo contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra operazione è illecita. Le sarei comunque grato se potesse darmene notizia. This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail.

Hey Andrea,

On Wed, 10 Mar 2021 at 12:02, Andrea Aime <andrea.aime@anonymised.com> wrote:

Hi Gabriel,
I would love to see having a limit on the method length… but remember when running these QAs you’re pretty much alone
(as I’ve been for most of the QA work, even when other people participated, I ended up doing well over 90% of the job).

Understood. No rush though, we can consider most of this 15+ years old technical debt.

I would suggest you start with a very high limit, get it fixed, and then you have a limit that PMD can apply, even if stupidly high.
Then it can go down little by little, until you exhaust the time you want to dedicate on it.

Yeah, 500 sounded like a stupidly high limit, sounds like a good one to start with, with 4 offenders per project.

These days I won’t be able to do much, with full lockdown in place and kids at home, it’s a lot if I can run some cleanups
where IntelliJ does most of the work for me :smiley: (which is, what I’ve been doing for the past month or two).

Don’t worry, and thanks for the interest. I know you’ve worked hard to get us where we’re. As said, no rush, and let’s see if others feel like contributing.

Cheers,
Gabriel.

Cheers
Andrea

On Wed, Mar 10, 2021 at 3:13 PM Gabriel Roldan <gabriel.roldan@anonymised.com…403…> wrote:

Hey all,
sorry for cross-posting but this relates to both projects.

One of the PMD rules I’d like to set up the most is ExcessiveMEthodLength. It’ll just fail on a threshold for lines of code in a single method.

I’ve made some assessments for 500 and 100 lines of code limits, check this out:

GeoTools:
https://github.com/groldan/geoserver/wiki/ExcessiveMethodLength-PMD-assessment—GeoTools

GeoServer: https://github.com/groldan/geoserver/wiki/ExcessiveMethodLength-PMD-assessment—GeoServer

I think avoiding too large methods is important for maintainability and to lower the barrier entry to the project for other people, because trying to keep methods concise forces you to think of the design and to split them into smaller methods. Ideally, a single method would be succinct and easy to read, for anyone to be able to figure out what it does at a given level of abstraction, and having to drill down to a lower level of abstraction method only if necessary.

I myself often feel overwhelmed when facing such methods, despite having so many years of experience on the codebase, it’s discouraging. So maybe we could do something at least in identifying the most relevant ones and trying to make them cleaner.
Of course some of them would not be a problem cause they’re just, either generated or not, linear setups of state, like org.geotools.gml3.GMLSchema:GMLSchema(), which is 978 lines long. But mostly there’s a lot of code that would benefit from a second round of thinking.

As an example, I tried to tidy up org.geotools.coverage.processing.operation.Resampler2D:reproject(…), which is currently 618 lines long, and made it 49. Most important thing I think is it now reads as intended, instead of having to have so many comments explaining what the following hundred lines of code do.

This is the original:

https://github.com/geotools/geotools/blob/64e0026c62cf37662194b92e8052594d57ef826e/modules/library/coverage/src/main/java/org/geotools/coverage/processing/operation/Resampler2D.java#L245-L863

And this the refactored:

https://github.com/groldan/geotools/blob/d25f6424f8f4188824c840f400c3414a9ce12531/modules/library/coverage/src/main/java/org/geotools/coverage/processing/operation/Resampler2D.java#L302-L351

Not saying it’s perfect, that shared catch-all state class is debatable, but I can certainly get an idea of what the method does at a glimpse and follow through if needed.

I know it’s something that’d require quite significant and lacking resources, but also think it’s worth pursuing. Probably a first round to set a soft limit of 200 lines long or so, with a longer term goal of 100?

Looking forward to your comments,
Cheers.

Gabriel Roldán


GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Regards, Andrea Aime

== GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information. == Ing. Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via di Montramito 3/A 55054 Massarosa (LU) phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549 http://www.geo-solutions.it http://twitter.com/geosolutions_it ------------------------------------------------------- Con riferimento alla normativa sul trattamento dei dati personali (Reg. UE 2016/679 - Regolamento generale sulla protezione dei dati “GDPR”), si precisa che ogni circostanza inerente alla presente email (il suo contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra operazione è illecita. Le sarei comunque grato se potesse darmene notizia. This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail.

Gabriel Roldán

Ah yeah sorry… I quickly scanned through the thread and missed the report (I don’t really read mails anymore, just glance at them…).
Yup, 4 per project seem to be a good starting point. Some of those are dragons, they will resist splitting (GetFeature.run).

Cheers
Andrea

···

== GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information. == Ing. Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via di Montramito 3/A 55054 Massarosa (LU) phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549 http://www.geo-solutions.it http://twitter.com/geosolutions_it ------------------------------------------------------- Con riferimento alla normativa sul trattamento dei dati personali (Reg. UE 2016/679 - Regolamento generale sulla protezione dei dati “GDPR”), si precisa che ogni circostanza inerente alla presente email (il suo contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra operazione è illecita. Le sarei comunque grato se potesse darmene notizia. This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail.