... the `[[<GISDBASE>/]<LOCATION_NAME>/] <MAPSET>]` part has lost its
words even though &gt;, &lt; were used and not something which could be
mistaken for a <html tag>. Is the `DoEscape` subroutine converting '&gt;'
to '>' before any unknown html tags are thrown away? If so it should be
moved to after that; see lines 136 and 141:
#2087: grass64 man page: missing words
------------------------+---------------------------------------------------
Reporter: hamish | Owner: grass-dev@…
Type: defect | Status: new
Priority: critical | Milestone: 6.4.4
Component: Docs | Version: 6.4.3
Keywords: g.html2man | Platform: Linux
Cpu: All |
------------------------+---------------------------------------------------
Comment(by mlennert):
Replying to [comment:1 neteler]:
> Also in GRASS 7, the final part which is in HTML
>
> {{{
> ... -wxpython | -wx]] [[[<GISDBASE>/]<LOCATION_NAME>/]
> }}}
>
>
> becomes in MAN:
>
> {{{
> ... -wxpython | -wx]] [[[/]/] ]
> }}}
I cannot confirm this. With a freshly checked out and compiled
grass_trunk, I get:
Comment(by mlennert):
The problem seems to be in the function DoLine, lines 136ff:
{{{
&DoEscape($_);
&DoPara($_);
if (! $preformat) {
if (m/^$/) {return 0};
s#^[ \t]*##;
s#<[^>]*>##g;
}}}
DoEscape is called first, which replaces the &lt; and &gt; by the
respective symbols, and then, in the last line of DoLine, these symbols
and everything between them are replaced by an empty string. Commenting out
the last line, i.e. s#<[^>]*>##g;, solves the problem for grass6.html, but
I don't know what other effects this has.
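The ordering described above can be reproduced outside of g.html2man. The following is only a minimal Python sketch, not the Perl code itself: `html.unescape` stands in for `DoEscape` (an assumption, since the real subroutine may handle a different set of entities), and `escape_then_strip` is a hypothetical name.

```python
import re
from html import unescape

# Same pattern as the Perl substitution s#<[^>]*>##g
TAG_RE = re.compile(r"<[^>]*>")


def escape_then_strip(line: str) -> str:
    """Decode entities first, then strip anything that looks like a tag."""
    decoded = unescape(line)        # &lt;GISDBASE&gt; becomes <GISDBASE>
    return TAG_RE.sub("", decoded)  # ...which is then removed as a "tag"


html_line = "[[[&lt;GISDBASE&gt;/]&lt;LOCATION_NAME&gt;/] &lt;MAPSET&gt;]"
print(escape_then_strip(html_line))  # prints "[[[/]/] ]"
```

Once the entities are decoded, the placeholder names are indistinguishable from real tags, which is why the words disappear.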
Comment(by mlennert):
Replying to [comment:3 mlennert]:
> The problem seems to be in the function DoLine, lines 136ff:
>
>
> {{{
> &DoEscape($_);
> &DoPara($_);
> if (! $preformat) {
> if (m/^$/) {return 0};
> s#^[ \t]*##;
> s#<[^>]*>##g;
> }}}
>
> DoEscape is called first, which replaces the &lt; and &gt; by the
> respective symbols, and then, in the last line of DoLine, these symbols
> and everything between them are replaced by an empty string. Commenting out
> the last line solves the problem for grass6.html, but I don't know what
> other effects this has.
It leaves a series of HTML tags in the output. So the art will be to erase
all these tags, without erasing the <> around the variable names. This said,
do we really need those?
Comment(by mlennert):
I've attached a very quick and dirty hack that solves this specific issue
for me. I don't find it particularly elegant, though. Maybe someone with
more perl/regex foo can find a better solution.
Comment(by wenzeslaus):
Replying to [comment:4 mlennert]:
> without erasing the <> around the variable names. This said, do we
> really need those?
I would say no. We are still following man pages formatting in HTML and I
don't think that `<` and `>` are part of it. For example, this is my `man
grep`:
By the way, I'm still not sure that parsing whole HTML pages is a good idea.
If I were starting from scratch, I would probably use the module's HTML stub
and XML interface description, because XML is easier to parse than HTML tag
soup (but since we already have the parsing, and the Makefiles are also
designed for parsing whole HTML, it is probably not worth trying).
Comment(by glynn):
Replying to [comment:5 mlennert]:
> Maybe someone with more perl/regex foo can find a better solution.
Does using g.html2man.py from GRASS 7 qualify?
The main drawbacks are that it makes Python a build-time dependency (but
eliminates the Perl dependency), and may require some clean-up of the HTML
files (the Python version will fail hard on invalid HTML).
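To illustrate what "fail hard on invalid HTML" can mean in practice, here is a small hypothetical checker. It is not the code of g.html2man.py; it only sketches, with Python's standard `html.parser`, a parser that raises on mismatched or unclosed tags instead of silently dropping them. The names `StrictTagChecker`, `check` and `VOID_TAGS` are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (illustrative subset, an assumption)
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class StrictTagChecker(HTMLParser):
    """Track open tags and raise as soon as the nesting is inconsistent."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # tolerate <br/> and stray </br>
        if not self.stack or self.stack[-1] != tag:
            raise ValueError(f"unexpected </{tag}>")
        self.stack.pop()


def check(html_text: str) -> None:
    """Raise ValueError if tags are mismatched or left unclosed."""
    checker = StrictTagChecker()
    checker.feed(html_text)
    checker.close()
    if checker.stack:
        raise ValueError(f"unclosed <{checker.stack[-1]}>")


check("<p><b>ok</b><br></p>")   # passes silently
# check("<p><b>bad</p>")        # would raise ValueError: unexpected </p>
```

A converter built this way forces the HTML sources to be cleaned up before the man pages can be generated, which is the clean-up cost mentioned above.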
Comment(by wenzeslaus):
Replying to [comment:7 glynn]:
> Replying to [comment:5 mlennert]:
>
> > Maybe someone with more perl/regex foo can find a better solution.
>
> Does using g.html2man.py from GRASS 7 qualify?
>
> The main drawbacks are that it makes Python a build-time dependency (but
> eliminates the Perl dependency), and may require some clean-up of the HTML
> files (the Python version will fail hard on invalid HTML).
I would just remove the problematic `<` and `>` and leave GRASS 6 (core)
without Python (build) dependency. (We have two versions of GRASS, let's
keep them different from each other.)
Comment(by mlennert):
Replying to [comment:8 wenzeslaus]:
> Replying to [comment:7 glynn]:
> > Replying to [comment:5 mlennert]:
> >
> > > Maybe someone with more perl/regex foo can find a better solution.
> >
> > Does using g.html2man.py from GRASS 7 qualify?
> >
> > The main drawbacks are that it makes Python a build-time dependency
> > (but eliminates the Perl dependency), and may require some clean-up of the
> > HTML files (the Python version will fail hard on invalid HTML).
>
> I would just remove the problematic `<` and `>` and leave GRASS 6 (core)
> without Python (build) dependency.
As there were no objections to this, I took the liberty to just erase
these symbols from the file. The resulting html page and man file appear
easily readable to me and I don't think that this issue warrants changing
g.html2man.
Leaving this ticket open for now in case anyone objects now or in case
someone sees the same problem in another man page.
Comment(by mlennert):
Replying to [comment:9 mlennert]:
> Replying to [comment:8 wenzeslaus]:
> > Replying to [comment:7 glynn]:
> > > Replying to [comment:5 mlennert]:
> > >
> > > > Maybe someone with more perl/regex foo can find a better solution.
> > >
> > > Does using g.html2man.py from GRASS 7 qualify?
> > >
> > > The main drawbacks are that it makes Python a build-time dependency
> > > (but eliminates the Perl dependency), and may require some clean-up of the
> > > HTML files (the Python version will fail hard on invalid HTML).
> >
> > I would just remove the problematic `<` and `>` and leave GRASS 6
> > (core) without Python (build) dependency.
>
> As there were no objections to this, I took the liberty to just erase
> these symbols from the file.
Forgot to mention: r60237 for develbranch6 and r60238 for
releasebranch_6_4.