OCR toolkit for use in a C++ program

Asked By Beta Version 260 points N/A Posted on - 05/10/2011

Hi,

I want to build software that scans an image (an admission form), identifies the text within its fields, and writes it to a file. I am doing this in C++. I have a little experience with OpenCV, but not as much as OCR requires. So I came up with a workaround: call an external command-line tool that converts an image to text, and then have my program fetch the required data from the resulting text file. It feels a bit hacky, and I don't actually know whether I have any other options. So my question is:

"Is there any kind of API that can be called from my program to convert an image to text?"

Status: Open
Question Views: 1831
Answer Count: 5

Answer Accepted: Yes
Question Category: C++

Best Answer by shababhsiddique

Best Answer

Answered By shababhsiddique 0 points N/A #91562

If you are asking about an API, I don't know of any that is free. But if you are willing to go the other way you mentioned (calling an external program that does the OCR), there is a tool named "Tesseract". As far as I know it is the simplest to start with, and its accuracy is also good. You can download it from Google Code:

Download Tesseract

After installing, just run this at the command prompt: tesseract testimage.tif outputtext

Here testimage.tif is the image file that contains the form, and outputtext is the base name of the text file that receives the extracted text (Tesseract writes it to outputtext.txt).
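
For the C++ side of this approach, here is a minimal sketch of calling the tool and reading back its output. It assumes the tesseract executable is on the PATH; the helper names buildTesseractCommand, readTextFile and ocrImage are made up for illustration and are not part of Tesseract.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>

// Builds the command line "tesseract <image> <outputBase>".
// Tesseract itself appends ".txt" to the output base name.
// (Hypothetical helper, not a Tesseract function.)
std::string buildTesseractCommand(const std::string& imagePath,
                                  const std::string& outputBase) {
    assert(!imagePath.empty() && !outputBase.empty());
    return "tesseract \"" + imagePath + "\" \"" + outputBase + "\"";
}

// Reads a whole text file into a string.
// Returns an empty string if the file cannot be opened.
std::string readTextFile(const std::string& path) {
    std::ifstream in(path);
    std::string text;
    std::string line;
    while (std::getline(in, line)) {
        text += line;
        text += '\n';
    }
    return text;
}

// Runs the external tool on one image and loads the recognized text.
// Returns false if the command reports a non-zero exit status.
bool ocrImage(const std::string& imagePath,
              const std::string& outputBase,
              std::string& textOut) {
    const std::string command = buildTesseractCommand(imagePath, outputBase);
    if (std::system(command.c_str()) != 0) {
        return false;
    }
    textOut = readTextFile(outputBase + ".txt");
    return true;
}
```

A call such as ocrImage("testimage.tif", "outputtext", text) would then leave the recognized text in text. Checking the return value of std::system is a cheap way to notice that the tool is missing or failed.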

Posted on - 05/11/2011

Answered By Beta Version 260 points N/A #91563

Thanks for the reply. It is useful, but how do I convert my scanned image (a JPEG) to TIFF? One more thing: Tesseract doesn't seem to preserve the layout. A multi-line document just comes out messy. Am I doing something wrong?

Posted on - 05/11/2011

Answered By Ochena Expert 0 points N/A #91564

As far as I know, Tesseract doesn't preserve the layout of multi-line documents, which may be the cause. If you want to convert the admission form to text, use your OpenCV skills to slice the form into its individual fields (e.g. name, address number). Save each slice as a separate .tif file, which also solves your conversion problem, and then call Tesseract on each slice to generate the text for each field.
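
One way to organize the slicing is to keep the field rectangles in a small table and derive the slice file names from it. This is only a sketch: the FieldRegion type and the helper functions are hypothetical, the two rectangles are the ones that appear in this thread, and the actual cropping still has to be done with OpenCV.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One form field: a name plus the rectangle to cut out of the scan.
// (Hypothetical type for illustration.)
struct FieldRegion {
    std::string name;
    int x;
    int y;
    int width;
    int height;
};

// True when the rectangle lies completely inside an image of the given
// size. Worth checking before handing the rectangle to the cropping code.
bool fitsInside(const FieldRegion& field, int imageWidth, int imageHeight) {
    return field.x >= 0 && field.y >= 0 &&
           field.width > 0 && field.height > 0 &&
           field.x + field.width <= imageWidth &&
           field.y + field.height <= imageHeight;
}

// File name for the saved slice; the .tif extension selects the format
// when the slice is written with an extension-aware image writer.
std::string sliceFileName(const FieldRegion& field) {
    assert(!field.name.empty());
    return field.name + ".tif";
}

// Example table holding the two rectangles used in this thread.
std::vector<FieldRegion> exampleFields() {
    return {
        {"firstName", 645, 1468, 565, 75},
        {"formNumber", 390, 290, 280, 60},
    };
}
```

Looping over exampleFields() gives one crop, one saved file, and one OCR call per field, and adding a field becomes a one-line change to the table.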

Posted on - 05/11/2011

Answered By Beta Version 260 points N/A #91565

I am trying to do as you said, but I have run into a small problem.

cvSetImageROI(img, cvRect(645,1468,565,75));
cvSaveImage("firstName.tif",img);
system("tesseract firstName.tif temp");
fio.open("temp.txt");
getline(fio,s1.firstname);
fio.close();

cvSetImageROI(img, cvRect(390,290,280,60));
cvSaveImage("formNumber.tif",img);

system("tesseract firstName.tif temp");

For some reason the program crashes with the message "CV_image buffer overloaded".

Posted on - 05/11/2011

Answered By shababhsiddique 0 points N/A #91566

It crashes because you haven't reset the ROI after each crop. Use something like this:

cvSetImageROI(img, cvRect(645,1468,565,75));
cvSaveImage("firstName.tif",img);
cvResetImageROI(img); // after saving is done, reset the image ROI to the original size

system("tesseract firstName.tif temp");

fio.open("temp.txt");
getline(fio,s1.firstname);
fio.close();

cvSetImageROI(img, cvRect(390,290,280,60)); // now the new ROI works like the first one
cvSaveImage("formNumber.tif",img);
cvResetImageROI(img);

system("tesseract formNumber.tif temp");
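
Since the reset is easy to forget, one option is a small scope guard that runs it automatically whenever the cropping function returns. The sketch below is not OpenCV code: FakeImage, setRoi and resetRoi are stand-ins so that the example compiles on its own. In the real program they would presumably be replaced by the IplImage pointer, cvSetImageROI and cvResetImageROI.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs the stored action when the guard goes out of scope.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> action)
        : action_(std::move(action)) {}
    ~ScopeExit() {
        if (action_) {
            action_();
        }
    }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    std::function<void()> action_;
};

// Stand-in for the image type; it only tracks whether an ROI is active.
struct FakeImage {
    bool roiSet = false;
};

void setRoi(FakeImage& image) { image.roiSet = true; }     // stand-in for cvSetImageROI
void resetRoi(FakeImage& image) { image.roiSet = false; }  // stand-in for cvResetImageROI

// Crops one field. The guard resets the ROI on every return path,
// including the early return taken when saving the slice fails.
bool cropField(FakeImage& image, bool saveSucceeds) {
    assert(!image.roiSet);  // no ROI should be left over from an earlier crop
    setRoi(image);
    ScopeExit restore([&image] { resetRoi(image); });
    if (!saveSucceeds) {
        return false;  // the guard still resets the ROI here
    }
    return true;
}
```

The saveSucceeds flag only simulates the result of saving the slice. The point of the design is that no code path can leave the image with a stale ROI.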

Posted on - 05/11/2011