[GRASS-dev] [GRASS GIS] #151: make documentation be full text searchable

#151: make documentation be full text searchable
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: grass-dev@lists.osgeo.org
     Type: enhancement | Status: new
Priority: major | Milestone: 7.0.0
Component: default | Version: unspecified
Keywords: |
-------------------------+--------------------------------------------------
The current HTML documentation consists of separate HTML-formatted man
pages linked together, which offers good help for the experienced user.
But it would be an advantage to have full-text search over the documentation:

Use case:
A user wants to remove a mapset or georeference a file but doesn't know
which commands to use.

Good example for a full text searchable documentation:
http://docs.python.org/dev/
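
A minimal sketch of what full-text search over the existing HTML pages could look like, using only the Python standard library. The helper names (`html_to_text`, `build_search_index`, `search`) are hypothetical and not part of GRASS; it is a plain inverted index with AND matching, with no stemming or ranking, and it does not skip script or style content.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

TOKEN = re.compile(r"[a-z0-9_.]+")


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def _tokens(text):
    # Keep dots inside module names (r.cost) but drop sentence-ending dots.
    words = (w.strip(".") for w in TOKEN.findall(text.lower()))
    return [w for w in words if w]


def build_search_index(pages):
    """pages maps a page name to its HTML source (assumed input shape)."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in _tokens(html_to_text(html)):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = _tokens(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

With such an index, a query made of the words the user knows ("remove", "mapset") returns the pages that mention all of them, which is the use case described above.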

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Changes (by hamish):

* cc: grass-dev@lists.osgeo.org (added)

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/151#comment:4>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Changes (by neteler):

  * summary: make documentation be full text searchable => make
              documentation be full text searchable: use
              sphinx

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:5>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Comment (by neteler):

I have locally converted '''most''' pages (using html2rest.py by Gerard
Flanagan at
http://bazaar.launchpad.net/~grflanagan/python-rattlebag/trunk/annotate/head:/src/html2rest.py ).
A subset of the GRASS HTML files fails with problems like
{{{
reST markup error:
/home/neteler/grass65/dist.x86_64-unknown-linux-gnu/docs/html/rst/source/r.coin.rst:66: (SEVERE/4) Title level inconsistent:

:
:
make: *** [html] Error 1
}}}

or

{{{
reST markup error:
/home/neteler/grass65/dist.x86_64-unknown-linux-gnu/docs/html/rst/source/r.cost.rst:183: (SEVERE/4) Title level inconsistent:

Algorithm notes
```````````````
make: *** [html] Error 1
}}}

This indicates to some extent HTML errors in the original as well as
Sphinx problems with the tags
{{{
<dt> ...
<dd> ...
}}}

So with some effort the HTML pages could be made Sphinx compliant (perfect
power user job).
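
As far as I understand the docutils rule, "Title level inconsistent" is raised when a title uses an underline character that does not fit the nesting order established earlier in the same file: either a known style that skips a level, or a new style introduced while a deeper level already exists. A rough sketch of a pre-check that could be run on the converted files before Sphinx follows; `find_title_level_jumps` is a hypothetical helper, not part of docutils or Sphinx, and it only handles underlined titles (no overlines, no awareness of literal blocks).

```python
import re

# A line made of one punctuation character repeated at least three times.
ADORNMENT = re.compile(r"^([!-/:-@\[-`{-~])\1{2,}\s*$")


def find_title_level_jumps(text):
    """Return (line_number, title) pairs whose underline breaks the nesting.

    line_number is the 1-based line of the title text.
    """
    styles = []    # underline characters in order of first use; index + 1 = level
    current = 0    # level of the section we are currently inside
    problems = []
    lines = text.splitlines()
    for i in range(1, len(lines)):
        title, underline = lines[i - 1], lines[i]
        match = ADORNMENT.match(underline)
        # Skip non-titles: no underline, empty title (a transition),
        # or a "title" that is itself an adornment line.
        if not match or not title.strip() or ADORNMENT.match(title):
            continue
        char = match.group(1)
        if char in styles:
            level = styles.index(char) + 1
            if level > current + 1:        # known style, but skips a level
                problems.append((i, title.strip()))
            else:
                current = level
        elif len(styles) == current:       # new style one level deeper
            styles.append(char)
            current += 1
        else:                              # new style, deeper level already taken
            problems.append((i, title.strip()))
    return problems
```

Run over each generated .rst file, this would point at titles such as "Algorithm notes" in r.cost.rst, where the converter apparently picked an underline character that does not match the levels used earlier in the page.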

Here is the list of failing HTML files in 6.5.svn: d.graph.rst, d.his.rst,
d.linegraph.rst, d.mapgraph.rst, d.menu.rst, d.out.file.rst,
d.text.freetype.rst, g.gisenv.rst, g.message.rst, grass6.rst,
g.region.rst, i.ortho.photo.rst, m.proj.rst, ps.map.rst, r.category.rst,
r.coin.rst, r.cost.rst, r.distance.rst, r.in.gdal.rst, r.in.xyz.rst,
r.mfilter.fp.rst, r.mfilter.rst, r.out.gdal.rst, r.proj.rst, r.ros.rst,
r.spreadpath.rst, r.spread.rst, r.terraflow.rst, r.tileset.rst,
r.watershed.rst, r.what.rst, v.label.rst, v.lidar.correction.rst,
v.lidar.edgedetection.rst, v.lidar.growing.rst, v.outlier.rst,
v.reclass.rst, v.segment.rst, v.surf.bspline.rst.

I've put everything online to give you an impression (yes, partially messy
but not so bad...):

http://grass.osgeo.org/grass65/manuals/sphinx/

Markus

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:6>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Comment (by neteler):

Here is the procedure:

{{{
cd dist.x86_64-unknown-linux-gnu/docs/html/

# convert HTML to reST (one .rst file per HTML page):
mkdir rst
cd rst
for i in ../*.html ; do echo "$i:"; html2rest.py < "$i" > "$(basename "$i" .html).rst" ; done

sphinx-quickstart

# to avoid name conflict or define better in sphinx-quickstart:
mv index.rst oldindex.rst
mv *.rst source/

# convert with sphinx
make html
}}}

The resulting Sphinx-HTML manual is stored in the build/ directory.
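
Since the converted GRASS index.rst is moved out of the way in the procedure above, Sphinx still needs a root document that lists the converted pages. A minimal sketch of generating one follows; `build_toctree_index` and the default title are assumptions for illustration, while `.. toctree::` and `:maxdepth:` are standard Sphinx markup.

```python
def build_toctree_index(page_names, title="GRASS GIS manual pages"):
    """Return reST text for a root document listing the given pages.

    page_names are document names without the .rst suffix, e.g. "r.cost".
    """
    # De-duplicate, sort, and leave out the root document itself.
    entries = sorted(n for n in set(page_names) if n != "index")
    lines = [title, "=" * len(title), "", ".. toctree::", "   :maxdepth: 1", ""]
    lines.extend("   " + name for name in entries)
    return "\n".join(lines) + "\n"
```

The names could be collected with `pathlib.Path("source").glob("*.rst")` and `.stem`, and the result written to `source/index.rst` before running `make html`.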

Markus

PS: once the Wiki is back this should go there

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:7>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Comment (by hamish):

Replying to [comment:6 neteler]:
...
> This indicates to some extent HTML errors in the original as well as
> Sphinx problems with the tags
> {{{
> <dt> ...
> <dd> ...
> }}}
>
> So with some effort the HTML pages could be made Sphinx compliant
> (perfect power user job).
>
> Here the list of failing HTML files in 6.5.svn: d.graph.rst, d.his.rst,
> d.linegraph.rst, d.mapgraph.rst, d.menu.rst, d.out.file.rst,
> d.text.freetype.rst, g.gisenv.rst, g.message.rst, grass6.rst,
> g.region.rst, i.ortho.photo.rst, m.proj.rst, ps.map.rst, r.category.rst,
> r.coin.rst, r.cost.rst, r.distance.rst, r.in.gdal.rst, r.in.xyz.rst,
> r.mfilter.fp.rst, r.mfilter.rst, r.out.gdal.rst, r.proj.rst, r.ros.rst,
> r.spreadpath.rst, r.spread.rst, r.terraflow.rst, r.tileset.rst,
> r.watershed.rst, r.what.rst, v.label.rst, v.lidar.correction.rst,
> v.lidar.edgedetection.rst, v.lidar.growing.rst, v.outlier.rst,
> v.reclass.rst, v.segment.rst, v.surf.bspline.rst.

all of the above should (now) be html bug-free, as checked by dillo's lint
verifier.

if that is so and all is valid HTML, the remaining problems are for the
sphinx people to fix IMO.

> I've put everything online give you an impression (yes,
> partially messy but not so bad...):
> http://grass.osgeo.org/grass65/manuals/sphinx/

specifically, bolds and newlines need work.

Hamish

ps- reST is good stuff.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:8>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Comment (by neteler):

If the HTML is bug-free, then it depends on
http://bazaar.launchpad.net/~grflanagan/python-rattlebag/trunk/annotate/head:/src/html2rest.py
which perhaps needs some tweaks to write clean reST.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:9>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Comment (by hamish):

FWIW, reStructuredText (reST) docs:
http://docutils.sourceforge.net/rst.html

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/151#comment:10>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
--------------------------+-------------------------------------------------
  Reporter: timmie | Owner: epatton
      Type: enhancement | Status: assigned
  Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Resolution: | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Comment (by neteler):

Came across another HTML to reST (Sphinx) converter:

http://johnmacfarlane.net/pandoc/

Online try (throw in GRASS HTML file):
http://johnmacfarlane.net/pandoc/try

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:11>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by martinl):

Please follow wiki page
http://grass.osgeo.org/wiki/Man_Pages_Improvement_Sprint

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:12>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by hamish):

Hi,

after extensive use of reST + sphinx for the osgeo LiveDVD*
(live.osgeo.org) documentation and website over the last year+, I am now
of the opinion that GRASS's current html-source man pages are far superior
to what would be accomplished by reST-source man pages; both in terms of
expressibility and aggravation. The critical thing is to get it into a
stable mark-up language; once there, there's little reason (besides the
usual bugs) why html2rest or some any2pdf style program couldn't translate
between them and make a search index. Maybe MediaWiki is a bit easier
markup language than html, but if you are reading this you are highly
likely to be smart enough to learn that <b> means bold and we don't
actually do much complicated with it. I think we forget how simple stock
HTML really is, and that when it comes to documentation, the steak is much
more important than the sizzle.

[*] https://trac.osgeo.org/osgeo/browser/livedvd/gisvm/trunk/doc/

best,
Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/151#comment:13>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by hamish):

i.e. to say, I'd rather invest the time in helping to debug htDig.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/151#comment:14>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by neteler):

New URL:
https://bitbucket.org/djerdo/musette/src/tip/musette/html/html2rest.py

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/151#comment:15>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by lucadelu):

Some improvements: I obtained a working version of the documentation with
Sphinx. I really like it, but there are some things to fix.
Here is the procedure (use a recent version of pandoc; older ones are buggy
for me):

{{{
cd dist.x86_64-unknown-linux-gnu/docs/
mkdir rst
# work inside rst/ so that the ../html/ paths below resolve
cd rst
# convert html to rst
for i in ../html/*.html; do pandoc -s -c ../html/grassdocs.css -r html "$i" -w rst -o "$(basename "$i" .html).rst"; done
# move other files
cp ../html/*.png ../html/*.jpg .
cp ../html/grassdocs.css .
cp ../html/grass_logo.txt .
cp -rf ../html/icons/ .
# start sphinx
sphinx-quickstart
# move all to source directory
mv *.rst *.png *.jpg icons/ grass* source/
# create html documentation
make html
}}}
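
The conversion loop above can also be sketched in Python, which makes it easier to collect the pages that pandoc fails on. This is only an illustration under assumptions: `pandoc_command` and `convert_all` are hypothetical helper names, `-f`/`-t` are the long-standing equivalents of `-r`/`-w`, and the `-c` stylesheet option is left out of this sketch.

```python
import subprocess
from pathlib import Path


def pandoc_command(html_file, out_dir):
    """Build the pandoc argument list for converting one HTML page to reST."""
    html_file = Path(html_file)
    target = Path(out_dir) / (html_file.stem + ".rst")
    return ["pandoc", "-s", "-f", "html", "-t", "rst",
            "-o", str(target), str(html_file)]


def convert_all(html_dir, out_dir):
    """Convert every *.html page in html_dir; return names of failed pages.

    Raises FileNotFoundError if pandoc itself is not installed.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for html_file in sorted(Path(html_dir).glob("*.html")):
        result = subprocess.run(pandoc_command(html_file, out_dir))
        if result.returncode != 0:
            failed.append(html_file.name)
    return failed
```

Calling `convert_all("../html", ".")` from inside the rst directory would mirror the shell loop; the returned list shows which pages still need attention.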

Over the next weeks I'll try to study Sphinx a little to fix some of these
problems.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:16>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by lucadelu):

In r52658 I added a first implementation of reStructuredText documentation
for grass7. It uses the --rest-description flag and the
[http://johnmacfarlane.net/pandoc/ pandoc] software.[[BR]]
You can simply run
{{{
  make restdocs
  cd dist.XXXX/doc/rest
  make html
}}}
to create the documentation in reST format and convert it to beautiful
HTML using Sphinx. Some issues are still open, in order of
importance (if someone skilled with the makefile system wants to
help me, it would be really appreciated):
  * when launching only "make", the reStructuredText documentation should not
be created, but some documents are created anyway;
  * I cannot convert helptext.html and the wxgui documentation due to some Make
problems;
  * Some documents have bad indentation because "pandoc" converts the <br> tag
incorrectly; the solution should be: remove the white space if the second
character is not another white space, but some problems could remain;
  * Some other problems remain (special chars, formatting) in the new reST
pages.
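
One possible reading of the indentation rule in the third item, as a small post-processing sketch: a line that starts with exactly one space followed by a non-space character loses that space, while deeper indentation (which reST uses for block quotes and directive content) is left alone. `strip_stray_indent` is a hypothetical name and this is not what r52658 does.

```python
def strip_stray_indent(rst_text):
    """Drop a single stray leading space from lines of converted reST."""
    fixed = []
    for line in rst_text.split("\n"):
        # One leading space followed by a non-space: assume a <br> artifact.
        if len(line) >= 2 and line[0] == " " and not line[1].isspace():
            line = line[1:]
        fixed.append(line)
    return "\n".join(fixed)
```

As noted in the list, some problems could remain: a one-space indent is occasionally intentional, so the output would still need a visual check.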

Once solved, the resulting HTML pages could replace the current manual
pages (since also search is provided).

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:17>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by hamish):

Replying to [comment:17 lucadelu]:
> Once solved, the resulting HTML pages could replace the current manual
> pages (since also search is provided).

erhm, once solved and building in parallel ''discussion'' on if that
should happen could begin. Personally I am not in favour of throwing away
all the strongly marked up work we have done in the description.html files
in favour of the rather erratic and obscure markup of reSt for those
pages. Perhaps 'finicky' is a better word. I'm happy to see the help pages
get pretty, and yes reSt+sphinx-alikes is very pretty, but would like it
to be in parallel, and reSt translated from our existing HTML docs
automatically (ie the description.html parts), in the same way (or better)
than the man pages are now.

I don't think lack of a working htDig install*, or reliance on "site:"
google search, is a fatal blow for html.

[*] (is that still the case? if so as offered earlier, I'm happy to spend
a little time on it)

thanks,
Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/151#comment:18>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by hellik):

Replying to [comment:17 lucadelu]:
> In r52658 I added a first implementation of reStructuredText
> documentation for grass7. It uses the --rest-description flag and the
> [http://johnmacfarlane.net/pandoc/ pandoc] software.[[BR]]

does this mean that pandoc would be another external dependency to get the
docs?

on windows an extra step would be needed to install pandoc
(http://johnmacfarlane.net/pandoc/installing.html).

Helmut

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:19>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by hamish):

[slight follow up]

Hamish wrote:

> , but would like it to be in parallel, and reSt translated from our
> existing HTML docs automatically

I am glad to see that is indeed the case, but <br> -> <br /> and <br> ->
<p> in all the html files?! if the converter is broken, fix the converter!
"<br>" is not a tough one to parse..

thanks,
Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/151#comment:20>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by lucadelu):

Replying to [comment:18 hamish]:
> Replying to [comment:17 lucadelu]:
> > Once solved, the resulting HTML pages could replace the current manual
> > pages (since also search is provided).
>
> erhm, once solved and building in parallel ''discussion'' on if that
> should happen could begin. Personally I am not in favour of throwing away
> all the strongly marked up work we have done in the description.html files
> in favour of the rather erratic and obscure markup of reSt for those
> pages. Perhaps 'finicky' is a better word. I'm happy to see the help pages
> get pretty, and yes reSt+sphinx-alikes is very pretty, but would like it
> to be in parallel, and reSt translated from our existing HTML docs
> automatically (ie the description.html parts), in the same way (or better)
> than the man pages are now.
>

yes no problem for me to keep both versions

>
> thanks,
> Hamish

best
Luca

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:21>
GRASS GIS <http://grass.osgeo.org>

#151: make documentation be full text searchable: use sphinx
-------------------------+--------------------------------------------------
Reporter: timmie | Owner: epatton
     Type: enhancement | Status: assigned
Priority: major | Milestone: 7.0.0
Component: Website | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by lucadelu):

Replying to [comment:19 hellik]:
> Replying to [comment:17 lucadelu]:
> > In r52658 I added a first implementation of reStructuredText
> > documentation for grass7. It uses the --rest-description flag and the
> > [http://johnmacfarlane.net/pandoc/ pandoc] software.[[BR]]
>
> does this mean that pandoc would be another extern dependecy to get the
> docs?
>

so far I have only tested on Linux; if pandoc is missing, an error is returned
but it is not reported at the end of the make process. In future I hope
to fix the compilation issue so that reStructuredText is compiled only with
make restdocs and not, as now, with plain make. If someone can help with the
Make configuration, it would be really appreciated.[[BR]]
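
Until the Makefile reports the missing dependency properly, a small check at the start of the docs build could fail early and visibly. A minimal sketch, assuming the build can call a Python helper; `require_tools` is a hypothetical name, and `shutil.which` simply looks a command up on PATH (on Windows too).

```python
import shutil


def require_tools(*names):
    """Raise RuntimeError naming every command that is not found on PATH."""
    missing = [name for name in names if shutil.which(name) is None]
    if missing:
        raise RuntimeError("missing required tool(s): " + ", ".join(missing))
```

Calling `require_tools("pandoc", "sphinx-build")` before the conversion would stop the build with a clear message instead of an error buried in the make output.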

Could you test compilation on windows please?

> Helmut

best
Luca

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/151#comment:22>
GRASS GIS <http://grass.osgeo.org>