
tesseract and ocrfeeder

Posted: Oct 26th, '12, 23:49
by Rich49
I have been trying to do some OCR on Mageia. gocr works, but I'm somewhat underwhelmed with the results, so I've installed ocrfeeder and tesseract, but I can't get either to work. When I try to run ocrfeeder, I get an icon in my taskbar for a few seconds, but nothing else happens. If I attempt to use tesseract from a command line, I get the following messages:

name_to_image_type:Error:Unrecognized image type:xsane.png
IMAGE::read_header:Error:Can't read this image type:xsane.png
Read of file xsane.png failed.

I have also tried this with a PNM file with the same results (I own both files and have read/write access). I have also tried the above on Ubuntu using exactly the same files, and both ocrfeeder and tesseract work fine there, so my files would appear to be OK.

I'm running Mageia 2 64-bit, fully updated, and have the core, nonfree and tainted release and update channels checked.

Any ideas greatly appreciated, thanks...

Re: tesseract and ocrfeeder

Posted: Oct 27th, '12, 01:26
by tom_
http://en.wikipedia.org/wiki/Tesseract_(software)

Tesseract up to and including version 2 could only accept TIFF images of simple one column text as inputs. These early versions did not include layout analysis and so inputting multi-columned text, images, or equations produced a garbled output. Since version 3.00 Tesseract has supported output text formatting, hOCR positional information and page layout analysis. Support for a number of new image formats was added using the Leptonica library.


Maybe Ubuntu has this Leptonica library and Mageia doesn't?
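If the Mageia 2 tesseract build really was compiled without Leptonica, the Wikipedia note above suggests a possible workaround: give it a TIFF instead of a PNG or PNM. This is a minimal sketch, not tested on Mageia. It assumes ImageMagick's `convert` is installed, and that an uncompressed TIFF is enough for this particular build (an assumption). The helper names `is_tiff` and `ocr_image` are hypothetical.

```shell
#!/bin/sh
# Sketch only: feed tesseract a TIFF, since the build in this thread rejects
# PNG/PNM with "Unrecognized image type". Assumes ImageMagick's `convert`
# is installed; is_tiff and ocr_image are hypothetical helper names.

# Succeed if the file starts with a TIFF header:
# "II*\0" (little-endian) or "MM\0*" (big-endian).
is_tiff() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "49492a00" ] || [ "$magic" = "4d4d002a" ]
}

# Convert to uncompressed TIFF if needed, then run tesseract on the result.
ocr_image() {
    in=$1
    out=$2    # output base name; tesseract appends .txt to it
    if is_tiff "$in"; then
        tif=$in
    else
        tif="${in%.*}.tif"    # e.g. xsane.png -> xsane.tif
        convert "$in" -compress none "$tif" || return 1
    fi
    tesseract "$tif" "$out"
}
```

For example, `ocr_image xsane.png xsane` would convert the scan to `xsane.tif` and write the recognized text to `xsane.txt`. If tesseract still fails on the TIFF, the problem is probably not the image format.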

Re: tesseract and ocrfeeder

Posted: Oct 27th, '12, 14:32
by doktor5000
Well, leptonica was imported/submitted just recently, as it needed to be relicensed before it could be included in Mageia.
BTW, the author was really nice and changed the license after I notified him about the problem. But tesseract should also work without leptonica.
Tesseract should be available for Mageia 3, though.

Apart from that, I can recommend cuneiform, and as a frontend either yagf or kbookocr. There are some more alternatives, but they are not as good, IMHO:
Tesseract-gui, gimagereader, ocrad and Rubuquet+

Re: tesseract and ocrfeeder

Posted: Oct 28th, '12, 22:39
by Rich49

Are you saying that leptonica is in the repositories, then? I can't find it. Ditto cuneiform, or I would try that. Is the only other option to compile them from source, and if so, could anyone recommend a REALLY SIMPLE guide for doing this?

Also, any ideas why tesseract and ocrfeeder (both of which I did install from the repos) aren't working?

Thanks :)

Re: tesseract and ocrfeeder

Posted: Oct 29th, '12, 00:21
by doktor5000
Leptonica has recently been submitted to Cauldron, so yes, it is in the repositories, but it may not work on Mageia 2.
I'm using locally built packages for both, or I used to. I'll try to take a look and upload them here. Are you running i586 or x86_64?

For tesseract I don't know; how are you using it? Do you have the needed language files installed, too? Ocrfeeder I don't know, as I haven't used it yet.

Re: tesseract and ocrfeeder

Posted: Oct 29th, '12, 18:36
by Rich49

If you get the time, that would be great, thanks. I'm running x86_64.

Which repo do I have to activate to find leptonica there?

I'm running tesseract in a terminal using the command:
tesseract <input image file> <output file>
It installed the English language pack with the main program.

Thanks again :)

Re: tesseract and ocrfeeder

Posted: Oct 29th, '12, 20:24
by doktor5000
Beware: Cauldron is the development version of Mageia: https://wiki.mageia.org/en/Cauldron


EDIT: Nope, scratch that one. It's not an option, because that way it pulls in too many dependencies from Cauldron,
i.e. a new glibc, shell and so on; I just tried it. I'll try to take a look at rebuilding those packages for Mageia 2.

Re: tesseract and ocrfeeder

Posted: Oct 29th, '12, 23:30
by Rich49
That would be great, thanks.

Re: tesseract and ocrfeeder

Posted: Oct 30th, '12, 00:07
by doktor5000
Well, I just tried tesseract, and it only produces garbage from a screenshot of some random English text (a screenshot from a website, simple font, black & white) :(
Currently working on an updated rebuild of cuneiform.