Installation on Ubuntu
Tesseract is available directly from the Ubuntu repositories. You can install Tesseract and its development tools as follows [2].
$ sudo apt install tesseract-ocr
$ sudo apt install libtesseract-dev
To install an additional language, use 'sudo apt install tesseract-ocr-[lang]', where [lang] is the language code. For example, the following commands install the Myanmar language and then list the installed languages to confirm it.
$ sudo apt install tesseract-ocr-mya
$ tesseract --list-langs
Using tesseract command
To read the scanned image 'engim.jpg' using the English language and write the result to 'oute.txt', use the following command. Tesseract appends the '.txt' extension to the output base name 'oute'.
$ tesseract -l eng engim.jpg oute
Similarly, for the Myanmar language, using 'myaim.jpg' as input [3]:
$ tesseract -l mya myaim.jpg outm
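For batch jobs, the same command line can also be assembled and launched from a C++ program. The following is only a minimal sketch under stated assumptions, not part of the original example: the helper names shellQuote, buildTesseractCommand and runTesseract are hypothetical, and it assumes a POSIX shell with the tesseract binary on the PATH.

```cpp
#include <cstdlib>
#include <string>

// Wrap a string in single quotes so that a POSIX shell treats it as one
// argument. Embedded single quotes are escaped as '\''.
// (Hypothetical helper, not part of Tesseract.)
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Build "tesseract -l <lang> <image> <outputbase>", the form used above.
// Tesseract itself appends ".txt" to the output base name.
std::string buildTesseractCommand(const std::string& lang,
                                  const std::string& image,
                                  const std::string& outputBase) {
    return "tesseract -l " + shellQuote(lang) + " " + shellQuote(image) +
           " " + shellQuote(outputBase);
}

// Run the command through the shell and report whether it returned status 0.
bool runTesseract(const std::string& lang,
                  const std::string& image,
                  const std::string& outputBase) {
    const std::string cmd = buildTesseractCommand(lang, image, outputBase);
    return std::system(cmd.c_str()) == 0;
}
```

Calling runTesseract("mya", "myaim.jpg", "outm") would then correspond to the Myanmar command above. Quoting the arguments is a design choice that keeps file names containing spaces from being split by the shell.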
Tesseract C++ API with OpenCV
The following C++ example, ta.cpp, demonstrates using the Tesseract API with OpenCV [4, 5]. OpenCV must be installed as well; see the following link for more details about installing OpenCV on Linux.
http://cool-emerald.blogspot.com/2017/11/opencv-on-linux-using-g-cmake-qt.html
#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>
using namespace cv;

int main()
{
    char *otxt;
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English; Init returns 0 on success
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        delete api;
        exit(1);
    }
    // Open the input image with OpenCV (1 = load as 3-channel color)
    Mat img = imread("./engim.jpg", 1);
    if (img.empty()) {
        fprintf(stderr, "Could not read input image.\n");
        api->End();
        delete api;
        exit(1);
    }
    // Set image: data, width, height, bytes per pixel, bytes per line
    api->SetImage(img.data, img.cols, img.rows, 3, img.step);
    // Get the OCR result as UTF-8 text
    otxt = api->GetUTF8Text();
    printf("OCR output: \n%s\n", otxt);
    // Destroy used objects and release memory
    api->End();
    delete api;
    delete [] otxt;
    return 0;
}
Use the following commands to build and run the program. The input image 'engim.jpg' should be in the same directory.
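$ g++ ta.cpp `pkg-config --cflags --libs tesseract opencv` -o ta
$ ./ta
If pkg-config cannot find 'opencv', the package may be registered as 'opencv4' instead (as with OpenCV 4 installations); use that name in the command above.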
With wxWidgets
An example C++ program, tawxcv.cpp, using Tesseract together with OpenCV and wxWidgets can be found at the following link.
https://github.com/yan9a/cewx/tree/master/tawxcv
The program requires wxWidgets; details about setting up wxWidgets can be found at the following link.
http://cool-emerald.blogspot.com/2013/08/cross-platform-c-programming-with.html
To build the program using CMake and run it, use the following commands.
$ cmake .
$ make
$ ./tawxcv
The output of the program is illustrated below.
Acknowledgement
Thanks to my colleague, Ada, for introducing Tesseract to me.
References
[1] Wikipedia. Tesseract (software).
url: https://en.wikipedia.org/wiki/Tesseract_(software).
[2] Tesseract-OCR. Wiki page on GitHub.
url: https://github.com/tesseract-ocr/tesseract/wiki.
[3] Tesseract-OCR. Tesseract Manual Page.
url: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages.
[4] tessdoc. Tesseract documentation - API examples.
url: https://tesseract-ocr.github.io/tessdoc/APIExample.
[5] Evans Ehiorobo. Basic OCR with Tesseract and OpenCV.
url: https://medium.com/building-a-simple-text-correction-tool/basic-ocr-with-tesseract-and-opencv-34fae6ab3400.
