tesseract and ocrfeeder

Post by Rich49 » Oct 26th, '12, 23:49

I have been trying to do some OCR on Mageia. gocr works, but I'm somewhat underwhelmed by the results, so I've installed ocrfeeder and tesseract, but I can't get either to work. When I try to run ocrfeeder, an icon appears in my task bar for a few seconds, but nothing else happens. If I run tesseract from the command line, I get the following messages:

name_to_image_type:Error:Unrecognized image type:xsane.png
IMAGE::read_header:Error:Can't read this image type:xsane.png
Read of file xsane.png failed.

I have also tried this with a PNM file, with the same results (I own both files and have read/write access). I have also tried the above on Ubuntu using the exact same files, and both ocrfeeder and tesseract work fine, so my files appear to be OK.

I'm running Mageia 2 64-bit, fully updated, and have the core, nonfree and tainted release and update channels enabled.

Any ideas would be greatly appreciated, thanks.
Rich49
 
Posts: 4
Joined: Oct 26th, '12, 23:36

Re: tesseract and ocrfeeder

Post by tom_ » Oct 27th, '12, 01:26

http://en.wikipedia.org/wiki/Tesseract_(software)

Tesseract up to and including version 2 could only accept TIFF images of simple one column text as inputs. These early versions did not include layout analysis and so inputting multi-columned text, images, or equations produced a garbled output. Since version 3.00 Tesseract has supported output text formatting, hOCR positional information and page layout analysis. Support for a number of new image formats was added using the Leptonica library.


Maybe Ubuntu has the Leptonica library and Mageia doesn't?
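
If the Mageia build really is limited to TIFF input, one thing to try is converting the scan to TIFF first and feeding that to tesseract. Below is a rough Python sketch of that idea, not a confirmed fix. It assumes ImageMagick's `convert` is installed; the helper names `build_commands` and `run_ocr` are made up for illustration, and the uncompressed-TIFF option is only a guess at what an older tesseract build might need.

```python
import subprocess
from pathlib import Path


def build_commands(image, output_base, lang="eng"):
    """Return the command lines to run, in order, as lists of arguments.

    Non-TIFF input is first converted to TIFF next to the original file.
    """
    src = Path(image)
    commands = []
    if src.suffix.lower() in (".tif", ".tiff"):
        tiff = src  # already TIFF, no conversion step needed
    else:
        tiff = src.with_suffix(".tif")
        # Assumption: an uncompressed TIFF is the safest input for old builds.
        commands.append(["convert", str(src), "-compress", "None", str(tiff)])
    # tesseract appends ".txt" to the output base name itself.
    commands.append(["tesseract", str(tiff), output_base, "-l", lang])
    return commands


def run_ocr(image, output_base, lang="eng"):
    """Run the conversion (if any) and tesseract; raise if a step fails."""
    for cmd in build_commands(image, output_base, lang):
        subprocess.run(cmd, check=True)
    return output_base + ".txt"
```

Calling `run_ocr("xsane.png", "xsane")` would, if both tools succeed, leave the recognised text in `xsane.txt`. If tesseract still rejects the TIFF, the problem is probably elsewhere in the build.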
tom_
 
Posts: 423
Joined: Sep 3rd, '11, 12:26
Location: Porto Ercole, Italy

Re: tesseract and ocrfeeder

Post by doktor5000 » Oct 27th, '12, 14:32

Well, leptonica was imported/submitted just recently, as it needed to be relicensed before it could be included in Mageia.
BTW, the author was really nice and changed the license after I notified him about the problem. But tesseract should also work without leptonica.
Tesseract should be available for Mageia 3, though.

Apart from that, I can recommend cuneiform, with either yagf or kbookocr as a frontend. There are some more alternatives, but they are not as good, IMHO:
Tesseract-gui, gimagereader, ocrad and Rubuquet+
Cauldron is not for the faint of heart!
Caution: Hot, bubbling magic inside. May explode or cook your kittens!
----
Disclaimer: Beware of allergic reactions in answer to unconstructive complaint-type posts
doktor5000
 
Posts: 18045
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: tesseract and ocrfeeder

Post by Rich49 » Oct 28th, '12, 22:39

Are you saying that leptonica is in the repositories, then? I can't find it. Ditto cuneiform, or I would try that. Is the only other option to compile them from source, and if so, could anyone recommend a REALLY SIMPLE guide for doing this?

Also any ideas why tesseract and ocrfeeder (both of which I did install from the repos) aren't working?

Thanks :)

Re: tesseract and ocrfeeder

Post by doktor5000 » Oct 29th, '12, 00:21

leptonica has recently been submitted to Cauldron, so yes, it is in the repositories, but it may not work on Mageia 2.
I'm using locally built packages for both, or at least I used to. I'll try to take a look and upload them here. Are you running i586 or x86_64?

As for tesseract, I don't know. How are you using it? Do you have the needed language files installed, too? As for ocrfeeder, I don't know either; I haven't used it yet.

Re: tesseract and ocrfeeder

Post by Rich49 » Oct 29th, '12, 18:36

If you get the time, that would be great, thanks. I'm running x86_64.

Which repo do I have to activate to find leptonica there?

I'm running tesseract in a terminal using the command:
tesseract <input image file> <output file>
It installed the English language pack with the main program.
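
For anyone wanting to double-check that the language data is actually on disk, here is a small Python sketch that looks for files named `eng.*` in a few likely tessdata directories. The directory list is a guess, not confirmed for Mageia 2, and `find_language_files` is a made-up helper name. Tesseract 3 normally ships a single `eng.traineddata`, while older versions used several `eng.*` files, so the sketch matches any extension.

```python
import os
from pathlib import Path

# Candidate tessdata locations: assumptions, adjust for your distribution.
CANDIDATE_DIRS = [
    "/usr/share/tessdata",
    "/usr/share/tesseract/tessdata",
    "/usr/local/share/tessdata",
]


def find_language_files(lang="eng", dirs=None):
    """Return paths of files named '<lang>.*' found in the given directories."""
    if dirs is None:
        dirs = list(CANDIDATE_DIRS)
        prefix = os.environ.get("TESSDATA_PREFIX")
        if prefix:
            # Depending on the version, the variable points at tessdata
            # itself or at its parent directory, so check both.
            dirs = [prefix, os.path.join(prefix, "tessdata")] + dirs
    found = []
    for directory in dirs:
        path = Path(directory)
        if path.is_dir():
            found.extend(sorted(str(f) for f in path.glob(lang + ".*")))
    return found
```

An empty list from `find_language_files("eng")` would suggest the language pack is missing or installed somewhere else. If files are found, passing the language explicitly with `-l eng` on the tesseract command line rules out a wrong default.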

Thanks again :)

Re: tesseract and ocrfeeder

Post by doktor5000 » Oct 29th, '12, 20:24

Beware, Cauldron is the development version of Mageia: https://wiki.mageia.org/en/Cauldron


EDIT: Nope, scratch that one. That is not an option, as it pulls in too many dependencies from Cauldron (a new glibc, shell and so on); I just tried it. I'll try to rebuild those packages for Mageia 2.

Re: tesseract and ocrfeeder

Post by Rich49 » Oct 29th, '12, 23:30

That would be great, thanks.
Last edited by doktor5000 on Oct 29th, '12, 23:42, edited 1 time in total.
Reason: removed fullquote

Re: tesseract and ocrfeeder

Post by doktor5000 » Oct 30th, '12, 00:07

Well, I just tried tesseract, and it only produces garbage from a screenshot of some random English text (a screenshot from a website, simple font, black and white) :(
I'm currently working on an updated rebuild of cuneiform.

