Using Tesseract-OCR on Ubuntu 20.04
Installing Tesseract-OCR
On Ubuntu 20.04, we install it the simplest way described in the official documentation,
sudo apt install tesseract-ocr
If you need to do development or train your own models, you also need to install the development package,
sudo apt install libtesseract-dev
After installation, check the version; here it turns out to be 4.1.1
tesseract -v
tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
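If you want the latest version, such as 5.1.0, you need to download the source from the official site and build it yourself; that is skipped here for now.
Installing qtcreator
Because we will be calling a simple Qt interface here, we need to install the development tool qtcreator.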
sudo apt install qt5-default
Then download the installer from here,
Download Offline Installers | Source Package Offline Installer | Qt
For example, I downloaded version 6.0.2,
sudo chmod +x qt-creator-opensource-linux-x86_64-6.0.2.run
./qt-creator-opensource-linux-x86_64-6.0.2.run
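That completes the installation.
In qtcreator, choose New File or Project --> Non-Qt Project --> Plain C++ Application, then enter the source code below in main.cpp.
The source code is as follows,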
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <tesseract/publictypes.h>
#include <opencv2/imgproc.hpp>

int main()
{
    //std::cout << "Hello World!" << std::endl;
    std::string image_name = "/home/mc/ocr/testimg/testocr.png"; //"/home/mc/ocr/testimg/eurotext.png";
    cv::Mat imageMat;
    imageMat = cv::imread(image_name);
    if (imageMat.data == nullptr)
    {
        printf("No image data \n");
        return -1;
    }
    //cv::Mat blurMat;
    //cv::medianBlur(imageMat, blurMat, 5); // blur the image
    cv::Mat z1, g_grayImage;
    cv::cvtColor(imageMat, z1, cv::COLOR_BGR2GRAY); // convert to grayscale
    // cv::threshold(z1, z2, 214, 255, cv::THRESH_BINARY); // fixed threshold
    cv::adaptiveThreshold(z1, g_grayImage, 255, cv::ADAPTIVE_THRESH_MEAN_C, cv::THRESH_BINARY, 7, 25); // adaptive threshold, suppresses background noise
    cv::namedWindow("Image1", cv::WINDOW_AUTOSIZE);
    cv::imshow("Image1", g_grayImage);
    cv::waitKey(0); // press any key in the image window to continue
    //std::system("chcp 65001");
    char* outText;
    tesseract::TessBaseAPI api;
    //if (api.Init(NULL, "chi_sim")) // for Chinese
    // Init returns 0 on success, so a non-zero result means initialization failed
    if (api.Init("/home/mc/ocr/tesseract/tessdata_best-main", "eng", tesseract::OEM_DEFAULT))
    {
        std::cerr << "Could not initialize tesseract." << std::endl;
        exit(1);
    }
    // Pix *image = pixRead("3.jpg");
    // 1 byte per pixel; g_grayImage.step is the number of bytes per image row
    api.SetImage(g_grayImage.data, g_grayImage.cols, g_grayImage.rows, 1, static_cast<int>(g_grayImage.step));
    outText = api.GetUTF8Text();
    if (outText == nullptr)
    {
        std::cout << "No Data" << std::endl;
    }
    else
    {
        std::cout << outText << std::endl;
    }
    // Destroy used object and release memory
    api.End(); // delete api;
    delete[] outText; // pixDestroy(&image);
    return 0;
}
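To make the `cv::adaptiveThreshold` call in the program above less of a black box, here is a minimal plain C++ sketch of what mean adaptive thresholding with `THRESH_BINARY` does: each pixel is compared with the mean of its block_size x block_size neighborhood minus a constant C (7 and 25 in the program above). The function name `adaptive_mean_threshold` is hypothetical, the border handling (clamping coordinates to the image edge) is an assumption of this sketch, and it does not try to reproduce OpenCV's internal rounding or speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean adaptive threshold over a row-major 8-bit grayscale image.
// block_size must be odd; c is subtracted from the local mean.
// Neighbors outside the image are replaced by the nearest edge pixel
// (an assumption of this sketch).
std::vector<std::uint8_t> adaptive_mean_threshold(
    const std::vector<std::uint8_t>& src, int width, int height,
    int block_size, double c)
{
    assert(block_size > 1 && block_size % 2 == 1);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    const int r = block_size / 2;
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            long sum = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = std::max(0, std::min(y + dy, height - 1));
                    const int xx = std::max(0, std::min(x + dx, width - 1));
                    sum += src[yy * width + xx];
                }
            }
            const double mean =
                static_cast<double>(sum) / (block_size * block_size);
            // THRESH_BINARY: white if the pixel is brighter than (mean - c).
            dst[y * width + x] = (src[y * width + x] > mean - c) ? 255 : 0;
        }
    }
    return dst;
}
```

Because C is positive, a flat region always ends up white (every pixel is brighter than its local mean minus C), and only pixels clearly darker than their surroundings, such as text strokes, stay black. That is why this step cleans up the background before the image is handed to Tesseract.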
The qtcreator project file is shown below (note that I am not using the default gcc-9 but gcc-8); for details see my previous article,
qtcreator error: fatal error: stdlib.h: No such file or directory (高精度计算机视觉's blog, CSDN)
TEMPLATE = app
CONFIG += console c++11
CONFIG -= app_bundle
CONFIG -= qt
SOURCES += \
main.cpp
unix:!macx: LIBS += -L$$PWD/../../../../usr/local/lib/ -lopencv_world
INCLUDEPATH += $$PWD/../../../../usr/local/include/opencv4
DEPENDPATH += $$PWD/../../../../usr/local/include/opencv4
unix:!macx: LIBS += -L$$PWD/../../../../usr/lib/x86_64-linux-gnu/ -ltesseract
#INCLUDEPATH += $$PWD/../../../../usr/include
#DEPENDPATH += $$PWD/../../../../usr/include
INCLUDEPATH += /usr/include/c++/8
Reference: the official installation instructions
Ubuntu
If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):
sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install pkg-config
sudo apt-get install libpng-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev
if you plan to install the training tools, you also need the following libraries:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev
Leptonica
You also need to install Leptonica. Ensure that the development headers for Leptonica are installed before compiling Tesseract.
Tesseract versions and the minimum version of Leptonica required:
Tesseract | Leptonica | Ubuntu |
---|---|---|
4.00 | 1.74.2 | Ubuntu 18.04 |
3.05 | 1.74.0 | Must build from source |
3.04 | 1.71 | Ubuntu 16.04 |
3.03 | 1.70 | Ubuntu 14.04 |
3.02 | 1.69 | Ubuntu 12.04 |
3.01 | 1.67 | |
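Checking an installed Leptonica against the minimum versions in the table above comes down to comparing dotted version strings. The following is a minimal sketch of that comparison, not something provided by Tesseract or Leptonica: `parse_version` and `version_at_least` are hypothetical helper names, and the sketch assumes versions are plain dot-separated integers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split "1.79.0" into {1, 79, 0}.
// Components that are not numeric become 0 in this sketch.
std::vector<int> parse_version(const std::string& text)
{
    std::vector<int> parts;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, '.')) {
        parts.push_back(std::atoi(piece.c_str()));
    }
    return parts;
}

// True when `installed` is the same as or newer than `required`.
// Components are compared left to right; missing ones count as 0.
bool version_at_least(const std::string& installed,
                      const std::string& required)
{
    const std::vector<int> a = parse_version(installed);
    const std::vector<int> b = parse_version(required);
    const std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int ai = i < a.size() ? a[i] : 0;
        const int bi = i < b.size() ? b[i] : 0;
        if (ai != bi) {
            return ai > bi;
        }
    }
    return true;
}
```

For the versions reported by `tesseract -v` earlier, `version_at_least("1.79.0", "1.74.2")` is true, so the packaged Leptonica 1.79.0 meets the 1.74.2 minimum listed for Tesseract 4.00, while `version_at_least("1.71", "1.74.2")` is false.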
One option is to install the distro’s Leptonica package:
sudo apt-get install libleptonica-dev
but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source.
The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in Leptonica README.
Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at Stackoverflow is very helpful.
Installing Tesseract from Git
Please follow instructions in Compiling–GitInstallation
Also read Install Instructions
Install elsewhere / without root
Tesseract can be configured to install anywhere, which makes it possible to install it without root access.
To install it in $HOME/local:
./autogen.sh
./configure --prefix=$HOME/local/
make
make install
To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:
./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
--prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make
make install
On some systems, you might also need to specify the path to pkg-config before running the configure script:
export PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig