On Jul 25, 2008, at 6:56 PM, Glynn Clements wrote:
Michael Barton wrote:
ax.hist() and numpy.histogram() are broken by design. They should
accept an iterator as an argument. Requiring the entire data to be
passed as a list makes them useless for large amounts of data.
Well, for *really* large amounts of data, I suppose. And indeed
sometimes people have GB-sized maps to work with. However, 15 sec. for
histogramming a 30-million-cell, 23 MB ASTER file isn't too bad. As you
say, it would be faster to use C modules in GRASS for binning if we had
them.
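The iterator point can be illustrated with NumPy alone. If the bin edges are fixed in advance, the counts that numpy.histogram() returns for each chunk can simply be added together, so only one chunk has to be in memory at a time. This is a minimal sketch. It assumes the data arrive as an iterable of arrays (for instance one raster row at a time) and that the value range is known beforehand. The chunk source and the function name histogram_from_chunks are hypothetical; they are not an existing GRASS or NumPy API.

```python
import numpy as np

def histogram_from_chunks(chunks, bin_edges):
    """Accumulate histogram counts over an iterable of array chunks.

    Hypothetical helper: because the bin edges are fixed up front, the
    per-chunk counts are additive, so memory use is bounded by one chunk.
    """
    bin_edges = np.asarray(bin_edges, dtype=float)
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        # Drop NaNs (e.g. null cells) so they are not binned.
        chunk = chunk[np.isfinite(chunk)]
        chunk_counts, _ = np.histogram(chunk, bins=bin_edges)
        counts += chunk_counts
    return counts, bin_edges
```

For example, a generator yielding one row at a time could be passed together with np.linspace(min_value, max_value, n_bins + 1) as the edges. The result matches a single np.histogram() call over the whole data set with the same edges.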
Your comments bring out a number of important points that we need to consider both in this particular case, and more in general too. Here are a few responses.
Things can be written if there is a need for them. But doing things
"right" is more important than doing things "right now".
I agree. This is a proposal accompanied by examples so that others can try it out. Sort of like Cairo.
Many of the current problems with GRASS are due to cases where
expediency was allowed to win out over good design (or *any* design).
[As to the specific task of binning, you could probably get a suitable
tool within 5 minutes by scavenging from r.quantile.]
This would be good to add to r.stats, which already has an interface for this but which only works with float maps.
Sure, we can provide "basic" functionality, but where do you draw the
line? Which features of a "real" statistics package *wouldn't* be
useful in GRASS? I'm really quite worried about the potential for
feature creep in this area.
I guess I'd leave it up to the user/developer base to decide what
kinds of functionality we need. If it is something regularly
necessary, someone will probably craft a script for it. If this is
really useful, it might get translated into C code. Visualization and
spatial analysis are important parts of GIS functionality to me. In
fact, a lot of graphing programs and stat packages would have
difficulty dealing with 20, 50, or 100 million points. MatPlotLib
(or something like it) seems like a good tool to have available for
development. It's also an encouragement for people to begin to develop
scripts in Python.
It's also far too low-level. If you start writing lots of scripts
which call matplotlib directly, you will quickly end up with a
situation where one script lets you set the axis colour and the label
font but not the symbols, another lets you control the symbols but not
the font, etc.
It's not nearly as low level as d.graph or bash. Using pyplot or pylab dispenses with even more code. But it's important to see how this would work in a GRASS environment rather than just for creating general purpose graphs. This means it's up to the script/module/GUI developer to decide how much control is appropriate to pass on to users.
[I'm using stylistic properties here for simplicity; there are
probably other properties which are far more important, e.g. being
able to zoom, select log/linear scaling, etc.]
If you can't find or aren't happy with existing graphing libraries,
then write one. But don't require each script to include its own
written-from-scratch-in-one-hour graphing library.
My thought was that this could potentially serve as a 'standard' graphics library for Python scripting in GRASS. Depending on the graphics requirements, it could also serve for some of the 'built-in' graphing functions in a wxPython GUI environment, that is, things built into the GUI or modules that ship with GRASS. Obviously, we'd need to try it out and maybe look at other possible alternatives. This library is widely used, well maintained, and has a lot of functionality, so it seems like a good candidate for something like this.
What is required is something at a level where the script generates
some data then indicates that the data should be displayed as a
scatter plot. The library should take care of the rest (scales, sizes,
colours, ...) without the script having to explicitly read options
from the user and pass them to matplotlib.
I agree, especially for 'built-in' graphing like histogramming.
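A convenience layer of the kind described here might look like the following minimal sketch. The script passes only data and labels, and the wrapper chooses marker size, grid, and layout. The function name scatter_plot and its size heuristic are hypothetical illustrations, not an existing GRASS or matplotlib API. The matplotlib calls themselves (plt.subplots, Axes.scatter, Figure.tight_layout) are standard.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt
import numpy as np

def scatter_plot(x, y, xlabel="x", ylabel="y", title=None):
    """Hypothetical wrapper: the caller supplies data and labels only;
    styling decisions (marker size, grid, layout) are made here."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 4))
    # Use smaller markers for large point counts so dense data stays readable.
    size = 20.0 if x.size < 1000 else 2.0
    ax.scatter(x, y, s=size, alpha=0.6)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if title:
        ax.set_title(title)
    ax.grid(True, linestyle=":")
    fig.tight_layout()
    return fig
```

A script would then only call something like scatter_plot(elev, slope, "elevation", "slope").savefig("scatter.png"), where elev and slope are whatever arrays it has computed.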
The idea is not that we should attempt to create general-purpose graphing applications, but to provide a flexible set of tools that GRASS developers and sophisticated users can use.
Are you going to hard-code the fonts, colours, line width, symbols,
etc, or allow the user to set them?
I guess it depends on the application. Sometimes this is superfluous and other times it's very useful.
Assuming the latter, are you going
to force them to specify font= linecolor= textcolor= etc for every
command that they type, or allow them to set defaults? Assuming the
latter, how will the script obtain this information?
The examples that I did gave the user virtually no choice over graph formatting or the type of graph displayed, although it is easy enough to build it into a script GUI. The existence of a library which allows a programmer to produce diverse, publication quality graphs doesn't mean that it is necessary to push all of the options onto users. This would be creating a general purpose graphing application. And as you point out, other applications can fill that role.
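Glynn's question about defaults has at least one answer inside matplotlib itself. Its rcParams settings can be applied temporarily with matplotlib.pyplot.rc_context(), so a shared helper could hold the defaults, and individual scripts would never need font=, linecolor=, and so on. This is a sketch under stated assumptions. The PLOT_DEFAULTS dictionary and the line_plot helper are hypothetical, and in practice the defaults might be loaded from a user settings file rather than hard-coded.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical shared defaults; a real setup might read these from a
# per-user settings file instead of hard-coding them here.
PLOT_DEFAULTS = {
    "font.size": 10,
    "lines.linewidth": 2.0,
    "axes.grid": True,
}

def line_plot(x, y, style=None):
    """Draw a line plot using the shared defaults; `style` overrides keys."""
    rc = dict(PLOT_DEFAULTS)
    if style:
        rc.update(style)
    # rc_context applies the settings only inside the block and restores
    # the previous rcParams afterwards.
    with plt.rc_context(rc):
        fig, ax = plt.subplots()
        ax.plot(x, y)
    return fig
```

The design choice is that a script decides *whether* to expose a style option at all. When it does not, the user still gets consistent output from the shared defaults.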
Something like MatPlotLib does give us a way to create an important class of visualization that is largely lacking in GRASS, which is otherwise rich in analytical tools. I realize that people vary in how they respond to information presented in different ways. I am someone who usually much prefers to see a graph than a table of numbers. So having something along the lines of MatPlotLib available seems like a good thing to me.
Also, are these scripts going to be usable from within the GUI? (I.e.
display the graphics in a window created by the GUI, not just dump a
PNG file to the disk).
Currently some scripts are usable in the GUI simply because they are called from the menu. There is no particular integration beyond that. Some scripts (and C modules) produce dense numerical output that could benefit from optionally being drawn to a graph in a file, rather than only being written to a text file. In other cases, there may be functions, like interactive profiling, that need to be wrapped into the GUI to be fully usable. The current interactive profiling module uses a wxPython library. This is fine, but it is difficult to use outside of the wxPython GUI. Because there is considerable demand (at least among the developer community) for keeping tools usable both inside and outside the GUI, I've been looking for something that would work in both environments. That is one reason I like MatPlotLib, although there may be better alternatives I'm not yet aware of. It has a wxPython backend that allows it to be wrapped completely in the GUI and display to a canvas. Or it can create its own display environment. Or it can output to a file. And it is as easy to insert into a stand-alone script as it is to embed into the wxPython GUI environment.
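The "same graph, several destinations" point can be sketched as follows. The graph is built once as a matplotlib.figure.Figure with no reference to any GUI toolkit. It is then either written to a file or handed to matplotlib's wxPython canvas class, FigureCanvasWxAgg from matplotlib.backends.backend_wxagg. The three function names are hypothetical, and the profile data and axis labels are placeholders. The wx path is not exercised here and assumes wxPython is installed.

```python
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

def build_profile_figure(distance, elevation):
    """Build the graph once, independent of any GUI toolkit (hypothetical)."""
    fig = Figure(figsize=(6, 3))
    ax = fig.add_subplot(111)
    ax.plot(distance, elevation)
    ax.set_xlabel("distance")
    ax.set_ylabel("elevation")
    return fig

def save_figure(fig, filename):
    """Stand-alone script: write to a file; format follows the extension."""
    # Attaching an Agg canvas explicitly keeps this working on older
    # matplotlib versions that required a canvas before savefig().
    FigureCanvasAgg(fig)
    fig.savefig(filename)

def embed_in_wx(parent, fig):
    """GUI: wrap the same Figure in a wxPython canvas widget."""
    # Imported lazily so scripts that never open a GUI don't need wxPython.
    from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg
    return FigureCanvasWxAgg(parent, -1, fig)
```

Keeping figure construction separate from output would also make it easier to decide later whether a graph belongs in its own window or somewhere else in the GUI.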
How are the file format, dimensions, filename
etc communicated between the two?
Will the script be able to communicate the set of available display
attributes to the GUI (in such a way that it can distinguish "what to
display" from "how to display it")?
Currently no independent, stand-alone modules communicate with the GUI display. Jachym and Martin have created some hooks to make it possible to send the output from a display command (e.g. d.rast) to the wxPython canvas. I'm not sure if this is still active and I'm not sure that it should be a high priority, though I know that others might disagree.
Since MatPlotLib is pure Python and has a wxPython backend, it shouldn't be too hard to be able to have a graph created by this library display in the mapdisplay canvas. But I'm not sure that is a good idea. The mapdisplay canvas, with its toolbar, is pretty specialized for displaying maps and map-like imagery. Usually, I'd think a user would prefer to have a graph display in a separate window and a different kind of window. This is easy enough to do with MatPlotLib in a wxPython (or other) environment.
If, after testing, this seems to be a potentially valuable addition to the GRASS system, it would probably be good to build some standardized convenience libraries to easily create graphs with a standard look, manage data flows from GRASS modules to MatPlotLib/NumPy, create a standard graph display window (the toolbar that comes with MatPlotLib is only partly useful IMHO), etc. Perhaps this is what you meant above. It's also how the examples I did work: axis labels, title, and scaling are all automatic.
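One piece of such a convenience library, moving a module's text output into NumPy and then into a standard-look chart, might be sketched like this. It assumes whitespace-separated "value count" lines with "*" marking nulls, roughly what r.stats -c prints. That format should be checked against the actual module output before relying on it. The functions parse_counts and count_chart, and the map name used in the title, are hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def parse_counts(text):
    """Parse 'value count' lines into NumPy arrays, skipping null ('*') rows.

    Assumed input format; verify against the module actually used.
    """
    values, counts = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[0] == "*":
            continue
        values.append(float(fields[0]))
        counts.append(int(fields[1]))
    return np.array(values), np.array(counts)

def count_chart(values, counts, map_name):
    """Standard-look bar chart; axis labels and title are generated here."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(values, counts, width=0.8)
    ax.set_xlabel("category")
    ax.set_ylabel("cell count")
    ax.set_title(f"Distribution of {map_name}")
    fig.tight_layout()
    return fig
```

Splitting parsing from drawing means the same parser could feed a file-output script or a GUI canvas.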
This, in a nutshell, is the difference between "software engineering"
and "coding". I've worked with too many "coders"; and by "worked
with", I mean "cleaned up after".
I agree very much. I have far less experience in this than you, but over the past several years, I've had to sort through a lot of poorly designed, accretionary code. On the other hand, almost none of us on the development team are programmers and development has been a long-term accretionary process. But I suppose that is all the more reason to try to work out the concepts better.
That isn't to say that you shouldn't start to write anything until you
have a complete architectural design. Just that you should expect the
first dozen or so attempts to serve as learning exercises rather than
something which will eventually be used. And expect the first few
attempts to tell you more about what *won't* work than what will.
In many regards, trial-and-error can produce a better design than
trying to operate from a purely theoretical perspective. Hindsight
tends to be more accurate than foresight, albeit with the drawback
that it takes rather more effort to obtain.
And some parts of GRASS very much need more in the way of well-thought-out design concepts, while others can be more evolutionary. In my mind, that is where we are with a graphing library at the moment. I, at least, think it would be very good to have one. Someone could certainly program one in C, but that seems like a lot of work if we can just use one that is already built. That leaves us more resources to build the pieces that are NOT out there to use. I'm simply proposing MatPlotLib as a potential candidate to use for creating graphs in a Python-enabled GRASS. We need to work with it some to see what its potential and limitations are.
Michael