Hi,
currently the wingrass78 build on wingrass.fsv.cvut.cz fails with [1]:
VERSION_NUMBER=7.8.8dev VERSION_DATE=2022 MODULE_TOPDIR=../.. \
python3 /usr/src/grass78/dist.x86_64-w64-mingw32/tools/mkhtml.py v.random > /usr/src/grass78/dist.x86_64-w64-mingw32/docs/html/v.random.html
Traceback (most recent call last):
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 648, in <module>
    git_commit = get_last_git_commit(
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 235, in get_last_git_commit
    stdout = decode(stdout)
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 111, in decode
    return bytes_.decode(enc)
  File "C:\\OSGeo4W\\apps\\Python39\lib\encodings\cp1250.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 58: character maps to <undefined>
On Windows 2016 I can't change the encoding to UTF-8 (I get a similar error with cp1252). I tried setting LC_ALL or PYTHONIOENCODING, but nothing helped: the default encoding reported by locale.getdefaultlocale() is still cp1250/cp1252, not UTF-8. Any idea how I can change the default encoding on Windows to UTF-8?
Thanks in advance! Martin
[1] https://wingrass.fsv.cvut.cz/grass78/x86_64/logs/log-r1f724052b-1/package.log
···
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa
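The failure in the traceback can be reproduced outside the build. The sketch below is illustrative only: the sample bytes are an assumption (the actual git output is not shown in the log), and `try_decode` is a hypothetical helper, not part of mkhtml.py. It suggests that a UTF-8 byte sequence containing 0x81 cannot be decoded with cp1250 or cp1252, while UTF-8 handles it.

```python
import locale
import sys

# Illustrative sample (assumption): U+00C1 encodes to b'\xc3\x81' in UTF-8,
# so the byte 0x81 from the error message is present.
raw = "\u00c1".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short failure note if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason}"


# Compare the two Windows code pages from the report with UTF-8.
for enc in ("cp1250", "cp1252", "utf-8"):
    print(enc, "->", ascii(try_decode(raw, enc)))

# Diagnostics: what Python considers the preferred encoding, and whether
# UTF-8 mode is active in this interpreter.
print("preferred encoding:", locale.getpreferredencoding(False))
print("UTF-8 mode:", sys.flags.utf8_mode)
```

Python 3.7+ also offers a UTF-8 mode (PYTHONUTF8=1 or -X utf8) that changes what locale.getpreferredencoding() returns. Whether it affects the value mkhtml.py obtains from locale.getdefaultlocale() on this machine is not verified here.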
Dear all,
On Tue, 23 Aug 2022 at 18:49, Martin Landa <landa.martin@gmail.com> wrote:
> python3 /usr/src/grass78/dist.x86_64-w64-mingw32/tools/mkhtml.py v.random > /usr/src/grass78/dist.x86_64-w64-mingw32/docs/html/v.random.html
> Traceback (most recent call last):
> [...]
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 58: character maps to <undefined>
The question is also why we are using the default OS encoding to decode HTML pages [1]. Couldn’t we simply use UTF-8 regardless of the system locale?
Martin
[1] https://github.com/OSGeo/grass/blob/releasebranch_7_8/tools/mkhtml.py#L93
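Reading a file as UTF-8 regardless of the system locale only requires passing the encoding explicitly. A minimal sketch, assuming the HTML sources really are UTF-8 (or plain ASCII); `read_html_utf8` and the sample file are hypothetical and not taken from mkhtml.py.

```python
import os
import tempfile


def read_html_utf8(path):
    """Read a text file as UTF-8, independent of the OS locale."""
    # errors="replace" substitutes U+FFFD for stray non-UTF-8 bytes
    # instead of aborting with UnicodeDecodeError.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


# Demonstration with a throwaway file containing one non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.html")
    with open(sample, "wb") as f:
        f.write("<p>caf\u00e9</p>".encode("utf-8"))
    text = read_html_utf8(sample)

print(text == "<p>caf\u00e9</p>")
```

Without the explicit `encoding` argument, `open()` falls back to the locale's preferred encoding, which is cp1250 or cp1252 on the machine described above.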
On Wed, 24 Aug 2022 at 10:24, Martin Landa <landa.martin@gmail.com> wrote:
> the question is also why we are using default OS encoding to decode HTML pages [1]. Couldn’t we simply use UTF-8 regardless of OS system locale?
See also the related PR: https://github.com/OSGeo/grass/pull/2533
Martin
Hi Martin,
On Wed, 24 Aug 2022 at 04:25, Martin Landa <landa.martin@gmail.com> wrote:
> the question is also why we are using default OS encoding to decode HTML pages [1]. Couldn’t we simply use UTF-8 regardless of OS system locale?
There seems to be some general confusion around that, or more likely it is just legacy code.
The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just checked that now), so that’s what an HTML reader should be using. That’s of course not what we want at this point. It just should be UTF-8 everywhere.
The HTML files may already use UTF-8 (?), but the parser may emit HTML in system-dependent encoding. However, the source code it is using should be UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
Vaclav
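A reader that honours the charset declared in the HTML file could look it up before decoding. A minimal sketch, assuming the declaration is a `<meta ... charset=...>` tag near the top of the file; `declared_charset` and the sample markup are hypothetical and not copied from parser_html.c or mkhtml.py.

```python
import re

# Matches charset=... inside a <meta> tag, with or without quotes.
_CHARSET_RE = re.compile(
    rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def declared_charset(html_bytes: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a <meta> tag, or `default` if none is found."""
    # Only scan the beginning of the file, where the declaration belongs.
    match = _CHARSET_RE.search(html_bytes[:2048])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


# Hypothetical header in the style described above.
sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=iso-8859-1"></head><body></body></html>'
)
encoding = declared_charset(sample)
print(encoding)
print(sample.decode(encoding)[:12])
```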
Hi Vaclav,
On Wed, 24 Aug 2022 at 10:41, Vaclav Petras <wenzeslaus@gmail.com> wrote:
> The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just checked that now), so that’s what an HTML reader should be using. That’s of course not what we want at this point. It just should be UTF-8 everywhere.
+1 for switching to UTF-8
> The HTML files may already use UTF-8 (?), but the parser may emit HTML in system-dependent encoding. However, the source code it is using should be UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
I am not sure why the parser should emit HTML in a system-dependent encoding. Why not simply use UTF-8 as suggested in the PR [1]?
Back to the original problem, how can we solve the problem with compilation on Windows 2016 without changing the code base of grass78 significantly? BTW, I was able to compile grass78 on the same machine a few weeks ago and I don’t see any related changes in v.random.html… (?)
Martin
On Wed, Aug 24, 2022, 4:51 AM Martin Landa <landa.martin@gmail.com> wrote:
> Hi Vaclav,
>
> On Wed, 24 Aug 2022 at 10:41, Vaclav Petras <wenzeslaus@gmail.com> wrote:
>> The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just checked that now), so that’s what an HTML reader should be using. That’s of course not what we want at this point. It just should be UTF-8 everywhere.
> +1 for switching to UTF-8
>> The HTML files may already use UTF-8 (?), but the parser may emit HTML in system-dependent encoding. However, the source code it is using should be UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
> I am not sure why the parser should emit HTML in system-dependent encoding. Why simply not use UTF-8 as suggested in PR [1]?
It should emit UTF-8; I don’t know what it does now.
> Back to the original problem, how can we solve the problem with compilation on Windows 2016 without changing the code base of grass78 significantly? BTW, I was able to compile grass78 on the same machine a few weeks ago and I don’t see any related changes in v.random.html… (?)
The PR looks okay on the surface. Maybe you can just remove the problematic character in 7.8.
Hi,
On Wed, 24 Aug 2022 at 12:03, Vaclav Petras <wenzeslaus@gmail.com> wrote:
>> Back to the original problem, how can we solve the problem with compilation on Windows 2016 without changing the code base of grass78 significantly? BTW, I was able to compile grass78 on the same machine a few weeks ago and I don’t see any related changes in v.random.html… (?)
> The PR looks okay on the surface. Maybe you can just remove the problematic character in 7.8.
The source of the problem is a git log message [1], not the manual page itself. I modified the PR [2]:
- the HTML file is decoded using ISO-8859-1
- the git log message is decoded using UTF-8
Martin
[1] https://github.com/OSGeo/grass/commit/3b6d257bdfc18a58dd42c5ab06c69ada99c56a24
[2] https://github.com/OSGeo/grass/pull/2533/files
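The split described above (ISO-8859-1 for the HTML file, UTF-8 for the git log message) can be sketched as follows. This is not the code from the PR: the helper names and the exact git invocation are assumptions made for illustration.

```python
import subprocess


def decode_git_output(stdout: bytes) -> str:
    """Decode git output as UTF-8, git's default log output encoding."""
    # errors="replace" keeps the build running if a commit message
    # contains bytes that are not valid UTF-8.
    return stdout.decode("utf-8", errors="replace")


def decode_html_source(data: bytes) -> str:
    """Decode an HTML source file as ISO-8859-1."""
    # ISO-8859-1 maps every byte value to a code point, so this cannot raise.
    return data.decode("iso-8859-1")


def last_commit_info(path, cwd=None):
    """Hypothetical helper: text of the last commit touching `path`, or None."""
    try:
        proc = subprocess.run(
            ["git", "log", "-1", "--", path],
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return None  # git is not installed or not on PATH
    if proc.returncode != 0:
        return None  # not a git repository, or git reported an error
    return decode_git_output(proc.stdout)


# Bytes that broke cp1250 decoding are harmless with either decoder.
print(ascii(decode_git_output(b"fix \xc3\x81")))
print(len(decode_html_source(bytes(range(256)))))
```

Calling `last_commit_info("v.random.html")` inside a git checkout would return the decoded log entry; outside one it returns None instead of raising.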
Hi,
On Wed, 24 Aug 2022 at 12:03, Vaclav Petras <wenzeslaus@gmail.com> wrote:
>> +1 for switching to UTF-8
>>> The HTML files may already use UTF-8 (?), but the parser may emit HTML in system-dependent encoding. However, the source code it is using should be UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
>> I am not sure why the parser should emit HTML in system-dependent encoding. Why simply not use UTF-8 as suggested in PR [1]?
> It should emit UTF-8, I don’t know what it does now.
I have created a new PR [1].
Martin
[1] https://github.com/OSGeo/grass/pull/2547