Hello,
This might seem a little off topic, but maybe someone here can help me.
I need to extract toponymic data (place names) from old digitized paper maps. I wish to explore optical character recognition (OCR).
Does anyone have a suggestion or experience with this kind of challenge?
Thank you,
André Mano
–
Associação Leonel Trindade
SOCIEDADE DE HISTÓRIA NATURAL
Apartado 25 2564-909 Torres Vedras Portugal
Sede e Biblioteca: rua Cavaleiros da Espora Dourada, 27A 2560 Torres Vedras
Laboratório de Paleontologia e Paleoecologia: Polígono Industrial do Alto do Ameal 2565-641 Ramalhal
http://alt-shn.blogspot.com
www.alt-shn.org
On Wed, May 4, 2011 at 11:18 AM, ALT SHN <i.geografica@alt-shn.org> wrote:
Hello,
This might seem a little off topic, but maybe someone here can help me.
I need to extract toponymic data (place names) from old digitized paper maps. I wish to
explore optical character recognition (OCR).
Does anyone have a suggestion or experience with this kind of challenge?
You could try with this software:
http://www.gnu.org/software/ocrad/ocrad.html
Markus
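For anyone who wants to script Ocrad over many cropped map labels, here is a minimal sketch in Python. It is an illustration only, not something posted in this thread: the helper names ocrad_command and run_ocrad and the file name label.pgm are made up, and it assumes the label has already been cropped and saved in a PNM format (pbm/pgm/ppm) that Ocrad reads. The --scale and --invert options are standard Ocrad options, but check `ocrad --help` for your version.

```python
import shutil
import subprocess


def ocrad_command(image_path, scale=None, invert=False):
    """Build the argument list for one ocrad run (nothing is executed here)."""
    cmd = ["ocrad"]
    if scale is not None:
        cmd.append(f"--scale={scale}")  # enlarge small lettering before recognition
    if invert:
        cmd.append("--invert")          # for light text on a dark background
    cmd.append(image_path)
    return cmd


def run_ocrad(image_path, **options):
    """Run ocrad and return the recognized text; raises if ocrad is not installed."""
    if shutil.which("ocrad") is None:
        raise FileNotFoundError("ocrad executable not found on PATH")
    result = subprocess.run(
        ocrad_command(image_path, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical usage: print(run_ocrad("label.pgm", scale=2))
```

Keeping the command construction separate from the call makes it easy to inspect what would be run before pointing it at a whole directory of crops.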
Thank you for your suggestions!
I’ll begin to explore Ocrad (thanks for the tip, Markus).
If I achieve relevant results, I will share them here.
Best regards,
André
On Thu, 5 May 2011, Markus Neteler wrote:
You could try with this software:
http://www.gnu.org/software/ocrad/ocrad.html
Nah, that won't work. OCR, as the name implies, recognizes text
characters: letters and digits. I've used gocr/jocr for years with varying
degrees of success. Unless the typeface is simple (monospaced, for example)
the software cannot make any sense of it.
A map is not made up of characters, so OCR is inappropriate.
What might work is to open the scanned image in The GIMP, open a
transparent layer on top of it, and trace the lines of interest. Then run
the file (you can try this on the original scanned map, too) through
ImageMagick's 'convert' program to produce a .pdf version. Note that a .pdf
made this way still embeds the bitmap; turning it into vectors needs a
tracing step, for example with potrace or Inkscape's Trace Bitmap, and the
resulting .svg can then be cleaned and tweaked with Inkscape.
I have a large digitizing tablet with 4-button cursor I'm looking to sell
because it's been years since I've needed to digitize a paper map. Won't
help in this case because shipping this 2'x3' digitizer would be quite
expensive.
Rich
You can also try selecting by color in GIMP. I've done this in the past with varying degrees of success. GIMP lets you adjust the threshold for that kind of selection. GRASS's r.thin and r.to.vect might come in handy if you get any useful information from this technique. Obviously, it depends on the map and on whether the features you are hoping to extract have unique, or nearly unique, colors.
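To make the idea of a thresholded color selection concrete, here is a rough sketch in plain Python. It is only loosely modelled on what a select-by-color tool does and is not GIMP's actual algorithm: the function name color_mask, the per-channel distance rule, and the tiny sample image are all assumptions chosen for illustration.

```python
def color_mask(pixels, target, threshold):
    """Return a 0/1 mask marking pixels close to the target color.

    pixels    -- image as a list of rows, each row a list of (r, g, b) tuples
    target    -- (r, g, b) color of the features to extract
    threshold -- largest allowed difference in any single channel
    """
    tr, tg, tb = target
    mask = []
    for row in pixels:
        mask.append([
            1 if max(abs(r - tr), abs(g - tg), abs(b - tb)) <= threshold else 0
            for (r, g, b) in row
        ])
    return mask


# Tiny made-up image: reddish map lettering on a white sheet, one black pixel.
image = [
    [(255, 255, 255), (200, 30, 30), (255, 255, 255)],
    [(210, 40, 25), (205, 35, 35), (0, 0, 0)],
]
mask = color_mask(image, target=(205, 35, 30), threshold=15)
```

A mask like this, written out as an image and imported into GRASS as a raster, is the kind of input that r.thin and r.to.vect would then work on. Raising the threshold picks up more faded ink but also more background noise.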
Good luck,
John
On May 5, 2011, at 5:18 AM, Rich Shepard wrote:
What might work is to open the scanned image in The GIMP, open a
transparent layer on top of it, and trace the lines of interest.
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user
On Mon, May 9, 2011 at 5:06 AM, John C. Tull <jctull@gmail.com> wrote:
You can also try selecting by color in GIMP. I've done this in the past with varying degrees of success. GIMP lets you adjust the threshold for that kind of selection. GRASS's r.thin and r.to.vect might come in handy if you get any useful information from this technique. Obviously, it depends on the map and on whether the features you are hoping to extract have unique, or nearly unique, colors.
In this regard, also check the "r.seg" extension by Alfonso Vitti,
Univ. of Trento
(available in GRASS Addons), which does wonders in preprocessing such data.
Markus