#612: g.html2man: parsing leads to man page errors
------------------------+---------------------------------------------------
Reporter: hamish | Owner: grass-dev@lists.osgeo.org
Type: defect | Status: new
Priority: normal | Milestone: 6.5.0
Component: Docs | Version: 6.4.0 RCs
Keywords: g.html2man | Platform: Unspecified
Cpu: All |
------------------------+---------------------------------------------------
Hi,
tools/g.html2man has a number of parsing problems.
there are a few like cairodriver.1 which happen to start lines with ".",
which gets parsed incorrectly by the man program. e.g. in cairodriver it
lists output formats, and the '.pn' of .png gets hijacked and all those
image types end up missing from the resulting man page.
another popular one is <OL><LI> becoming ..IP instead of .IP (e.g.
pngdriver.1 just after "Example")
and yet another is g.parser.1 where #%multiple: gets eaten.
Comment (by glynn):
Replying to [ticket:612 hamish]:
> tools/g.html2man has a number of parsing problems.
>
> there are a few like cairodriver.1 which happen to start lines with ".",
> which gets parsed incorrectly by the man program. e.g. in cairodriver it
> lists output formats, and the '.pn' of .png gets hijacked and all those
> image types end up missing from the resulting man page.
I've committed some fixes in r37386. Apart from escaping dots and single
quotes at the beginning of a line, it doesn't remove leading whitespace
from pre-formatted text and doesn't insert line breaks within .IP "..."
(this last one only affected d.graph).
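For readers unfamiliar with the problem: man treats a line that begins with "." or "'" as a roff request, so text such as ".png" at the start of a line is swallowed. The sketch below shows one common way to neutralise such lines, by prefixing them with the zero-width escape \&. It is an illustration only; the function name is hypothetical and this is not claimed to be the code committed in r37386.

```python
import re

def escape_control_lines(text: str) -> str:
    """Prefix lines that start with "." or "'" with the roff escape \\&.

    Hypothetical helper for illustration; not taken from g.html2man.
    """
    # re.MULTILINE makes ^ match at the start of every line, not just the first.
    return re.sub(r"^([.'])", r"\\&\1", text, flags=re.MULTILINE)
```

Lines that do not begin with a control character pass through unchanged, so the helper can be applied to a whole block of text at once.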
> another popular one is <OL><LI> becoming ..IP instead of .IP (e.g.
> pngdriver.1 just after "Example")
>
> and yet another is g.parser.1 where #%multiple: gets eaten.
Note that the "bad whatis" entries correspond to an HTML file which lacks
a description in the NAME section. This generally only occurs with HTML
files which aren't generated from --html-description.
How is the script supposed to determine whether a '-' in the HTML is a
minus or a hyphen? For now, I've changed it to convert all occurrences of
'-' to '\-'.
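As a rough sketch of that blanket conversion (the function name is hypothetical, and it assumes the input is plain text taken from the HTML, with no roff escapes in it yet):

```python
def escape_hyphens(text: str) -> str:
    """Turn every '-' into the roff minus sign '\\-'.

    Hypothetical helper for illustration. It assumes the text has not been
    escaped before; running it twice would double the backslashes' effect.
    """
    return text.replace("-", "\\-")
```

The trade-off is that genuine hyphens in prose are also rendered as minus signs, while option names such as --help stay safe to copy and paste.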
> I looked, but I've got no idea how to backport this stuff to the perl
> version. does the perl version still need to be there in trunk?
Comment (by hamish):
Replying to [comment:3 glynn]:
> How is the script supposed to determine whether a '-' in the HTML
> is a minus or a hyphen?
fwiw, lintian's perl detection goes like: http://ftp.de.debian.org/debian/pool/main/l/lintian/lintian_2.2.10.tar.gz
{{{
# Catch hyphens used as minus signs by looking for ones at the
# beginning of a word, but don't generate false positives on \s-1
# (small font), \*(-- (pod2man long dash), or things like \h'-1'.
if ($line =~ /^(
([^\.].*)?
[\s\'\"\`\(\[]
(?<! \\s | \*\( | \(- | \w\' )
)?
(--?\w+)/ox) {
}}}
> For now, I've changed it to convert all occurrences of '-' to '\-'.
ok; cosmetic rendering errors are better than syntax ones I guess.
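For anyone who wants to try the same heuristic from the Python side, here is a rough port of the lintian excerpt above. It is a sketch only: the names are hypothetical, it assumes the character class in the excerpt ends with an opening square bracket, and it is not part of g.html2man. It works on already generated roff lines, so it can flag suspicious hyphens but cannot tell a minus from a hyphen in the HTML.

```python
import re

# Illustrative port of the lintian check quoted above. Every alternative in
# the lookbehind is two characters wide, which Python's re module requires.
HYPHEN_AS_MINUS = re.compile(
    r"""^(
        ([^.].*)?                      # leading text, unless the line starts with a dot
        [\s'"`(\[]                     # whitespace, a quote, or an opening bracket
        (?<! \\s | \*\( | \(- | \w' )  # skip cases such as \h'-1'
    )?
    (--?\w+)                           # one or two hyphens starting a word
    """,
    re.VERBOSE,
)

def find_hyphen_as_minus(line: str):
    """Return the first word that starts with an unescaped '-', or None."""
    match = HYPHEN_AS_MINUS.match(line)
    return match.group(3) if match else None
```

A line such as "use the -v flag" is flagged, while request lines starting with a dot, already escaped "\-v", and ordinary compounds like "well-known" are left alone.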
Comment (by glynn):
Replying to [comment:5 hamish]:
> some fixes for non-module help pages (were causing 'mandb -c' whatis
> errors) in devbr6 and trunk in r37877, ..
The changes are meaningless; g.html2man.py discards all comments.
To get a suitable whatis entry, the HTML file needs to include a
{{{<h2>NAME</h2>}}} section containing the module name followed by a dash
then the description. This is added automatically by --html-description,
but non-module pages will need to have it added manually.
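To check whether a hand-written page meets that requirement, something along these lines could be used. It is a minimal sketch with hypothetical class and function names, written against Python 3's html.parser (the module was named HTMLParser in Python 2), and it assumes the separator is a plain " - ".

```python
from html.parser import HTMLParser

class NameSectionFinder(HTMLParser):
    """Collect the text that follows an <h2>NAME</h2> heading."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self._heading = ""
        self._in_name = False
        self.name_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h2":  # tag names arrive lowercased
            self._in_h2 = True
            self._heading = ""
            self._in_name = False  # any new heading ends the NAME section

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
            self._in_name = self._heading.strip().upper() == "NAME"

    def handle_data(self, data):
        if self._in_h2:
            self._heading += data
        elif self._in_name:
            self.name_text += data

def whatis_entry(html: str):
    """Return (name, description) from the NAME section, or None if absent."""
    finder = NameSectionFinder()
    finder.feed(html)
    finder.close()
    text = " ".join(finder.name_text.split())  # collapse runs of whitespace
    name, sep, description = text.partition(" - ")
    return (name, description) if sep and name and description else None
```

A page for which whatis_entry() returns None would be a candidate for a "bad whatis" entry.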
Comment (by hamish):
Replying to [comment:6 glynn]:
> The changes are meaningless;
a few qualifiers on that are appropriate: a) currently; b) just for the
python version in gr7. (the perl version in all Gr versions now knows
about it)
> g.html2man.py discards all comments.
the solution I used in the perl version was to check for that meta tag
before the comment stripping code.
> To get a suitable whatis entry, the HTML file needs to include
> a {{{<h2>NAME</h2>}}} section containing the module name
> followed by a dash then the description. This is added
> automatically by --html-description, but non-module pages will
> need to have it added manually.
yeah, I looked at doing that first. But the <H2>NAME really wasn't
appropriate for the intro and driver custom HTML pages I looked at and so
I went with the meta-tag solution.
I couldn't see how to make that work with the python version (does
HTMLParser.py strip out the comments before we can get our hands on
them?), and so I left it for now.
> I couldn't see how to make that work with the python version (does
> HTMLParser.py strip out the comments before we can get our hands on
> them?), and so I left it for now.
It's possible to add a handler for comments, but I don't consider this
appropriate.
Comments are comments; you are supposed to be able to use them as you
wish, without any consequences. The only situation where it's appropriate
for an application to take note of comments in its input is if it intends
to include them as comments in its output.
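On the mechanics only: the parser does not strip comments itself; it passes each one to handle_comment(), whose default implementation does nothing. A minimal sketch of such a handler follows, with hypothetical names, written against Python 3's html.parser (HTMLParser in Python 2). Whether a converter should act on comments at all is the design question discussed above, and the sketch takes no position on it.

```python
from html.parser import HTMLParser

class CommentCollector(HTMLParser):
    """Record every HTML comment seen while parsing."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        self.comments.append(data.strip())

def collect_comments(html: str):
    """Return the comments found in html, in document order."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments
```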
#612: g.html2man: parsing leads to man page errors
------------------------------+---------------------------------------------
Reporter: hamish | Owner: grass-dev@…
Type: defect | Status: new
Priority: normal | Milestone: 6.5.0
Component: Docs | Version: svn-develbranch6
Keywords: g.html2man, utf8 | Platform: Unspecified
Cpu: All |
------------------------------+---------------------------------------------
Changes (by hamish):
* keywords: g.html2man => g.html2man, utf8
Comment:
(G6.x only)
re. `man` treating flag names as hyphens and breaking them for cut & paste
when utf8 is used, here's some post-processing sed regex to catch many of
them: