Installation on Ubuntu
Tesseract is available directly from the Ubuntu repositories. You can install Tesseract and its development tools as follows [2].
$ sudo apt install tesseract-ocr
$ sudo apt install libtesseract-dev
To install an additional language, use 'sudo apt install tesseract-ocr-[lang]', where [lang] is the language code. For example, to install the Myanmar language and then list the installed languages, use the following commands.
$ sudo apt install tesseract-ocr-mya
$ tesseract --list-langs
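If a program needs to check which languages are installed before running OCR, it can parse the text printed by 'tesseract --list-langs'. The following is a minimal C++ sketch of that idea, not part of Tesseract itself. The helpers parseLangList and hasLanguage are hypothetical names, and the code assumes the command prints a header line ending with a colon followed by one language code per line, which may differ between versions.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extract language codes from the text printed by
// `tesseract --list-langs`. Assumes a header line ending in ':' followed
// by one language code per line.
std::vector<std::string> parseLangList(const std::string &output) {
    std::vector<std::string> langs;
    std::istringstream in(output);
    std::string line;
    while (std::getline(in, line)) {
        // Strip trailing carriage returns and spaces.
        while (!line.empty() && (line.back() == '\r' || line.back() == ' ')) {
            line.pop_back();
        }
        // Skip blank lines and the header line.
        if (line.empty() || line.back() == ':') {
            continue;
        }
        langs.push_back(line);
    }
    return langs;
}

// Hypothetical helper: true if `lang` (e.g. "mya") is in the parsed list.
bool hasLanguage(const std::vector<std::string> &langs, const std::string &lang) {
    assert(!lang.empty());  // a language code must not be empty
    return std::find(langs.begin(), langs.end(), lang) != langs.end();
}
```

The captured output of the command would be passed to parseLangList, and hasLanguage(langs, "mya") would then tell whether the Myanmar data is available.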
Using tesseract command
To run OCR on the scanned image 'engim.jpg' using English and write the result to 'oute.txt', use the following command. Tesseract appends the '.txt' extension to the output name automatically.
$ tesseract -l eng engim.jpg oute
Similarly, for the Myanmar language with 'myaim.jpg' as input, use the following command [3].
$ tesseract -l mya myaim.jpg outm
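When the command is launched from a program rather than typed by hand, the same three arguments (language, input image, output base name) have to be assembled into a command line. The following C++ sketch shows one way to do that. The helper names buildTesseractCommand and outputTextPath are made up for illustration, and the code assumes the arguments contain no spaces or shell metacharacters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: compose the tesseract command line shown above.
// Assumes the arguments contain no spaces or shell metacharacters.
std::string buildTesseractCommand(const std::string &lang,
                                  const std::string &image,
                                  const std::string &outputBase) {
    assert(!lang.empty() && !image.empty() && !outputBase.empty());
    return "tesseract -l " + lang + " " + image + " " + outputBase;
}

// Hypothetical helper: name of the text file produced for an output base,
// given that the command appends ".txt" for plain-text output.
std::string outputTextPath(const std::string &outputBase) {
    return outputBase + ".txt";
}
```

The resulting string could be passed to std::system from <cstdlib>, for example std::system(buildTesseractCommand("mya", "myaim.jpg", "outm").c_str()), after which the recognized text would be read from outputTextPath("outm").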
Tesseract C++ API with OpenCV
The following C++ example, ta.cpp, demonstrates using the Tesseract API with OpenCV [4, 5]. For that, you need to install OpenCV as well. For more details about installing OpenCV on Linux, see the following link.
http://cool-emerald.blogspot.com/2017/11/opencv-on-linux-using-g-cmake-qt.html
#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>
using namespace cv;

int main()
{
    char *otxt;
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // initialize tesseract-ocr with English
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    // Open input image with OpenCV (1 = load as 3-channel color)
    Mat img = imread("./engim.jpg", 1);
    if (!img.data) {
        fprintf(stderr, "Could not read input image.\n");
        exit(1);
    }
    // set image: data, width, height, bytes per pixel, bytes per line
    api->SetImage(img.data, img.cols, img.rows, 3, img.step);
    // Get OCR result
    otxt = api->GetUTF8Text();
    printf("OCR output: \n%s\n", otxt);
    // destroy used obj and release memory
    api->End();
    delete api;
    delete [] otxt;
    return 0;
}
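Use the following commands to build and run the program. The input image 'engim.jpg' should be in the same directory. If pkg-config cannot find 'opencv', the package may be registered as 'opencv4' on newer OpenCV installations.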
$ g++ ta.cpp `pkg-config --cflags --libs tesseract opencv` -o ta
$ ./ta
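One detail worth knowing when passing OpenCV data to Tesseract is channel order. OpenCV's imread stores color pixels as BGR, while Tesseract's SetImage, as far as its source indicates, treats 3-byte pixels as RGB. Recognition often still works because the image is reduced to grayscale internally, but swapping the channels first is a common precaution. The following sketch shows the swap on a raw interleaved buffer. The helper swapRedBlue is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Swap the first and third byte of every 3-byte pixel, turning an
// interleaved BGR buffer into RGB (or back). Illustrative helper only.
void swapRedBlue(std::vector<unsigned char> &pixels) {
    assert(pixels.size() % 3 == 0);  // expects whole 3-byte pixels
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

With OpenCV itself, the same conversion is normally done with cv::cvtColor(img, rgb, cv::COLOR_BGR2RGB), and the converted Mat is then passed to SetImage in place of the original.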
With wxWidgets
An example C++ program, tawxcv.cpp, using Tesseract together with OpenCV and wxWidgets can be found at the following link.
https://github.com/yan9a/cewx/tree/master/tawxcv
The program requires wxWidgets. Details about setting up wxWidgets can be found at the following link.
http://cool-emerald.blogspot.com/2013/08/cross-platform-c-programming-with.html
To build the program with CMake and run it, use the following commands.
$ cmake .
$ make
$ ./tawxcv
The output of the program is illustrated below.
Acknowledgement
Thanks to my colleague, Ada, for introducing Tesseract to me.
References
[1] Wikipedia. Tesseract (software).
url: https://en.wikipedia.org/wiki/Tesseract_(software).
[2] Tesseract-OCR. Wiki page on GitHub.
url: https://github.com/tesseract-ocr/tesseract/wiki.
[3] Tesseract-OCR. Tesseract Manual Page.
url: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages.
[4] tessdoc. Tesseract documentation - API examples.
url: https://tesseract-ocr.github.io/tessdoc/APIExample.
[5] Evans Ehiorobo. Basic OCR with Tesseract and OpenCV.
url: https://medium.com/building-a-simple-text-correction-tool/basic-ocr-with-tesseract-and-opencv-34fae6ab3400.