[GRASS-dev] Python and script header definitions for modules

wenzeslaus · April 21, 2021, 3:11am

Dear all,

After PR 1446, g.parser now accepts # % in addition to #% in order to allow following of the Python practice starting a block comment with # . This is checked by Flake8 E265. This resolved numerous warnings which you would otherwise get for each module when using Flake8 with default, that is, expected, settings. Consequently, users writing their own scripts/modules can use Flake8 with default settings without getting a flood of messages.

However, not all issues were addressed by that. Many modules generate “E501 Line too long” warnings. Following default settings for the Black formater, we are now using 88 characters as line limit. Option and flag descriptions and the value of descriptions describing options* have often more characters than that.

Currently, the E501 warning is disabled for all files in scripts and temporal directories and for selected g.gui.*.py files in gui/wxpython. However, we want to have the warning enabled globally. Although, Black takes care of most of the long lines, it does not touch some lines, namely long strings, and thus we want each file to be checked for Flake8 E501 compliance.

I would like to disable the check in each file just for the specific block of code, however, this is not possible because Flake8 does not allow disabling for blocks of code. Requiring the lines to be shorter won’t work either because the descriptions item needs to be long. This leaves us with the following options:

Use per-file ignores to disable the warning and keep adding the files which need it. This makes the Flake8 configuration larger over time while our goal is to make it smaller over time. It also leaves the warning disabled for (other code in) these files.
Add inline Flake8 ignore comment to the offending line. This will make the line little longer, but it would be a good solution for normal Python code. However, in case of the script header, we would need to teach g.parser to understand trailing comments inside the relevant fields so that the Flake8 ignore comment does not leak into the user interface description.
According to the PEP 257 - Docstring Conventions document, the ‘docstring of a script (a stand-alone program) should be usable as its “usage” message.’ I don’t think sticking something like our parser instructions into the docstring was what the authors had in mind. Additionally, it is not used like this either as far as I can tell. However, it would solve our issue. Just adding """ before the definition and adding """ # noqa: E501 after that disables the warning for the definition. The nice bonus is that we comply with PEP 257 by providing module docstring and by describing its interface there (in some way). The docstring presence is checked by Pylint’s C0111 “Missing … docstring”.
We modify the parser so that at least some of the items can have multiple lines. However, the parser is currently quite line-oriented and the cost-benefit ratio may be low.
We change the script header definition format to some existing format that can break lines, i.e. allowing multi-line values. A clear candidate for the format is YAML or rather its simpler subset. This would have additional benefits of making the format a standard format. Which in turn would be beneficial for other things, e.g., for easier learning of the syntax. Combining this with option 3, we could drop the # % part to make the YAML more readily readable.

Let me know what you think, what option you like, what you would add, and what you can help with.

Thank you,

Vaclav

Clearly, we should do something about the naming here, too!

Example for option 3:

“”"

%module

% description: Imports raster data into a GRASS raster map using GDAL library and reprojects on the fly.

…

%option

% key: resample

% type: string

% required: no

% multiple: no

% options: nearest,bilinear,bicubic,lanczos,bilinear_f,bicubic_f,lanczos_f

% description: Resampling method to use for reprojection

% descriptions: nearest;nearest neighbor;bilinear;bilinear interpolation;bicubic;bicubic interpolation;lanczos;lanczos filter;bilinear_f;bilinear interpolation with fallback;bicubic_f;bicubic interpolation with fallback;lanczos_f;lanczos filter with fallback

% answer: nearest

% guisection: Output

%end

…

%rules

% required: output,-e

%end

“”" # noqa: E501

Links:

https://github.com/OSGeo/grass/pull/1446
https://www.flake8rules.com/rules/E265.html
https://www.flake8rules.com/rules/E501.html
http://pylint-messages.wikidot.com/messages:c0111

hcho · June 13, 2021, 3:02pm

Vaclav,

It’s a great discussion. I personally think that enforcing Black or Flake8 is less ideal [1], but coding consistency is always good, I agree. Just one excerpt from [1]: “Autoformatters like Black are soulless, they won’t understand how each specific case will be most readable. This should always be the developer’s concern.” Anyway, I consider it a necessary evil.

Option 3 is readily available with no changes in the core library, which is great. It just takes mental effort to add the extra comment (# noqa: E501, especially this number, I’ll keep forgetting it). Does it even solve no space between # and %, I hope?

Like option 5 as well, but can we embed YAML code as Python comments, not as a separate file, which I don’t like. Maybe, as option 6, we write a YAML file and create a small utility that translates it to the current parser format in option 3 and replace it in the Python script? Maybe, then, some developers just write header definitions manually without YAML at all and running the extra script is an additional burden.

Option 4 may not be too bad. We can just concatenate any lines with more than a certain number (4?) of leading spaces to the current definition.

My biggest complaint is “#-space-%-space-key:-space-value”. I just have to type too many spaces manually. We could write a small script that handles this, but again, that’s an extra effort. Is it possible to create hooks so we checkout “#%” and commit “#-space-%” automatically? Maybe, even long definitions can be handled in the same way?

Best,
Huidae

[1] https://luminousmen.com/post/my-unpopular-opinion-about-black-code-formatter

···

Huidae Cho, Ph.D., GISP, /hidɛ t͡ɕo/, 조희대, 曺喜大
GRASS GIS Developer
https://idea.isnew.info/

wenzeslaus · June 16, 2021, 3:53am

On Sun, Jun 13, 2021 at 11:02 AM Huidae Cho <grass4u@gmail.com> wrote:

Option 3 is readily available with no changes in the core library, which is great. It just takes mental effort to add the extra comment (# noqa: E501, especially this number, I’ll keep forgetting it). Does it even solve no space between # and %, I hope?

In that case, you can write #% and no checks will care. When it is in a docstring, we could drop the # part too. Or you somehow mark the beginning and optionally the end and you can drop % or % too, getting close to YAML in the basic syntax (not in the value nesting or indentation).

You need # noqa: E501 only when your lines are long, which is not always. It should/would be part of submitting guidelines for Python code, so then there would be no harm done if one needs to review those during writing of a module.

Like option 5 as well, but can we embed YAML code as Python comments, not as a separate file, which I don’t like. Maybe, as option 6, we write a YAML file and create a small utility that translates it to the current parser format in option 3 and replace it in the Python script? Maybe, then, some developers just write header definitions manually without YAML at all and running the extra script is an additional burden.

I’m not sure what option you prefer, even if you want to write YAML or not, but YAML can be part of the comment, docstring, or a separate file. Keeping it in the Python file has its advantages like you always know you have the file and you can programmatically access it from there. Generating it and placing it into the file is quite doable, but R packages work in the way that many of the files in your package repo are generated and it becomes quite hard to navigate I think, unless you are very familiar with it.

Python script itself is actually calling the g.parser module, so with a proper g.parser interface and wrappers in grass.script, we can be quite flexible in what is accepted without modifying g.parser that much. I’m thinking here of having the YAML as a(any type of) string in Python and then passing it to grass.script.parse_cli_yaml(). Let’s call this option 7. It is kind of 3 and 5 together but with most YAML burden in Python and maybe slower parsing. The nice thing is that you need the same YAML-to-current-parser as for option 6, so doing the conversion part does not commit you to one solution or the other.

Option 4 may not be too bad. We can just concatenate any lines with more than a certain number (4?) of leading spaces to the current definition.

From what I have seen in the source code, this is not a trivial change.

In syntax, it is again going closer to YAML which would be | instead of the value for the first/zeroth row and typically 2 spaces for indent.

My biggest complaint is “#-space-%-space-key:-space-value”. I just have to type too many spaces manually.

If you write it like that, it seems really frightening, but # % key: value doesn’t look that bad. I’m copy-pasting these a lot and I think a lot of people do, so a space here and there is not a big deal. Since the space between % and key is optional, you can also see a lot of inconsistencies related to that. For typing, I don’t like the whole # % piece with spaces or not.

I think we would design the YAML structure so that a typical line would be key: value, i.e., “key:value” assuming it is in a separate file or string, not a comment.

We could write a small script that handles this, but again, that’s an extra effort. Is it possible to create hooks so we checkout “#%” and commit “#-space-%” automatically?

Perhaps a sed one-liner is all that is needed? This looks like what I used to convert things in the source code:

sed -i ‘s/^#%/# %/g’ /.py

Significantly simpler than transitioning to YAML in a docstring, but that would have many other benefits.

Maybe, even long definitions can be handled in the same way?

Seems like too much work, too fragile. I’m afraid that in any scenario, people will have to break their lines manually as they need to do, e.g., with strings in Python and C.

Best,
Vaclav

On Tue, Apr 20, 2021 at 11:11 PM Vaclav Petras <wenzeslaus@gmail.com> wrote:

I would like to disable the check in each file just for the specific block of code, however, this is not possible because Flake8 does not allow disabling for blocks of code. Requiring the lines to be shorter won’t work either because the descriptions item needs to be long. This leaves us with the following options:

Use per-file ignores to disable the warning and keep adding the files which need it. This makes the Flake8 configuration larger over time while our goal is to make it smaller over time. It also leaves the warning disabled for (other code in) these files.

Add inline Flake8 ignore comment to the offending line. This will make the line little longer, but it would be a good solution for normal Python code. However, in case of the script header, we would need to teach g.parser to understand trailing comments inside the relevant fields so that the Flake8 ignore comment does not leak into the user interface description.

According to the PEP 257 - Docstring Conventions document, the ‘docstring of a script (a stand-alone program) should be usable as its “usage” message.’ I don’t think sticking something like our parser instructions into the docstring was what the authors had in mind. Additionally, it is not used like this either as far as I can tell. However, it would solve our issue. Just adding """ before the definition and adding """ # noqa: E501 after that disables the warning for the definition. The nice bonus is that we comply with PEP 257 by providing module docstring and by describing its interface there (in some way). The docstring presence is checked by Pylint’s C0111 “Missing … docstring”.

We modify the parser so that at least some of the items can have multiple lines. However, the parser is currently quite line-oriented and the cost-benefit ratio may be low.

We change the script header definition format to some existing format that can break lines, i.e. allowing multi-line values. A clear candidate for the format is YAML or rather its simpler subset. This would have additional benefits of making the format a standard format. Which in turn would be beneficial for other things, e.g., for easier learning of the syntax. Combining this with option 3, we could drop the # % part to make the YAML more readily readable.

wenzeslaus · June 28, 2021, 3:53am

Black formatting is unrelated to the main issue here and, as Huidae mentioned in #560, it is too late to discuss it. The place to do that was #543 as stated in on grass-dev in February-April and June 1 threads and in #766.

However, I think it is worth clarifying what went into the decision to use Black.

https://github.com/OSGeo/grass-addons/pull/560#issuecomment-866803190

https://github.com/OSGeo/grass/issues/543
https://github.com/OSGeo/grass/pull/766#issuecomment-731637722
[February-April] https://lists.osgeo.org/pipermail/grass-dev/2021-February/094993.html

[June 1] https://lists.osgeo.org/pipermail/grass-dev/2021-June/095199.html

On Sun, Jun 13, 2021 at 11:02 AM Huidae Cho <grass4u@gmail.com> wrote:

It’s a great discussion. I personally think that enforcing Black or Flake8 is less ideal [1], but coding consistency is always good, I agree. Just one excerpt from [1]: “Autoformatters like Black are soulless, they won’t understand how each specific case will be most readable. This should always be the developer’s concern.” Anyway, I consider it a necessary evil.

I considered the blogpost when considering Black. It makes three “I would only use Black if” points. Here are my thoughts.

“I would only use Black if: You have an old code base that has been made by more than one generation of developers and you want to make it more or less readable.” Well, that’s exactly our case. Our formatting was all over the place even with many fixes we did in the past. I would add that if you need to do it once as one of the blogspot comments mentions, you may as well enforce it because you will end up in the same situation after a couple years which I think was not acknowledged there. The other comments are mentioning Darker which is enforcing Black, but only for the new code. Combining that with the ‘apply once to an old code base’ approach, you get our solution.

“I would only use Black if: You don’t want to bother with the styling guide on your little project.” We don’t want to bother with a styling guide and way of enforcing it regardless of project size.

“I would only use Black if: You have a team of juniors that you don’t really want to keep track of but end up wanting a consistent code.” We are in a similar situation. We want clear rules and consistent results, just instead of a team of juniors, we have all kinds of contributors.

It also mentions how Black aims at diffs which is exactly why Black is so appealing since many people spend a lot of time reading diffs.

Finally, the statement from the blogpost, “…they won’t understand how each specific case will be most readable,” seems to go directly against “Special cases aren’t special enough to break the rules.” from PEP 20 – The Zen of Python [2] or the more general one “your case is not special.”

To address the line length issue as well, the number 88 we are using (coming from Black) breaks PEP8 and this is unfortunate. However, I think it falls under the next point from Zen of Python: “Although practicality beats purity.” for reasons discussed, e.g., in Black documentation [3].

[1] https://luminousmen.com/post/my-unpopular-opinion-about-black-code-formatter
[2] https://www.python.org/dev/peps/pep-0020/
[3] https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length