pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it‘s not in your PATH

OCR识别时，出现 pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH，找到下载的可执行文件，安装修改文件

yuping_zhu

11658人浏览 · 2020-08-09 11:02:18

yuping_zhu · 2020-08-09 11:02:18 发布

文章目录

报错场景

最近是在练习一个Selenium的测试项目，其中有一个测试用例需要验证码登录，用到了OCR识别，但是我安装了 pytesseract 后，执行程序依然报错，提示：

Traceback (most recent call last):
  File "D:\Python3.6.0\lib\site-packages\pytesseract\pytesseract.py", line 238, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "D:\Python3.6.0\lib\subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "D:\Python3.6.0\lib\subprocess.py", line 990, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ......
  File "D:\Python3.6.0\lib\site-packages\pytesseract\pytesseract.py", line 242, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH

搜索资料

网上搜索了一些资料，解决方法都大同小异，只是有一步骤没理解，浪费了一些时间。都说是去 https://github.com/tesseract-ocr/tesseract/wiki 下载安装，千篇一律，点进去后是github上的源码，下载压缩包后，并没有可以直接下载 .exe 可执行文件，也就无法按照很多指导中写的进行后续操作了。

其实是需要再进入其他链接找的，我把这种方法放后面了，先说以下这种方法。

下载方式1

正确的可以直接下载 tesseract.exe 可执行文件地址（windows64）：

https://digi.bib.uni-mannheim.de/tesseract/

如图所示，这些都是可以下载的可执行文件：
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-CVXLfFTR-1596941702781)(../../../markdown_pic/blog_ocr.png)]

点击下载文件后，在本地可以看到这样一个文件（可以自己选exe，不一定相同），双击该文件安装：
在这里插入图片描述

安装过程中，它会让你选择可识别的语言，如果需要识别汉字，需要勾选 Chinese，也可以全选，但是没必要。勾选安装语言组件后，就不用自己单去下载语言包了，很方便。
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-3bcgAi9E-1596941702814)(../../../markdown_pic/blog_ocr3.png)]
接着，需要选择安装路径，我默认安装在 C:\Program Files\Tesseract-OCR，这个路径很重要，以后要用到。

下载方式2

先进入 https://github.com/tesseract-ocr/tesseract，找到如图所示的点击：
进入 https://tesseract-ocr.github.io/tessdoc/Home.html，这边有各种下载版本的，Linux、MaxOS、Windows等，这边只展示windows下的截图了：
进入 https://github.com/UB-Mannheim/tesseract/wiki，可以看到最新的下载版本：
下载后的安装步骤，和下载方式1的后续一致，不再赘述了。

修改配置

方式1（不推荐）

安装完成后，找到 pytesseract.py 文件，并如下图所示修改：

在这里插入图片描述

这边的 r'C:\Program Files\Tesseract-OCR\tesseract.exe'是你刚刚下载的可执行文件的安装路径，如果安装在别处，需要用对应的路径。

最后，再执行原来报错的文件，就正常了。

方式2

在用到的时候，配置路径

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
image = Image.open(picture_new_path)
text = pytesseract.image_to_string(image, lang='eng', config="--psm 6")

或者

tesseract_executable = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
output_file = os.path.join(os.path.dirname(picture_path), 'output')
subprocess.run([tesseract_executable, picture_new_path, output_file, "-l", "eng"], check=True)
with open(output_file + ".txt", "r", encoding="utf-8") as result_file:
    text = result_file.read()
    print("OCR result:", text)

Mac 平台

在 Mac 上，使用官网推荐的方式安装：

brew install tesseract

The tesseract directory can then be found using brew info tesseract, e.g.

/usr/local/Cellar/tesseract/5.3.2/bin/tesseract

demo:

import pytesseract
from PIL import Image

# 可以写一个函数 crop_picture 将原图裁剪一下，只保留想要识别文本的部分，这样识别更加准确一些。
def crop_picture(picture_path, crop_box: list):
    """
    crap picture with crop_box
    :param picture_path: picture to be crapped
    :param crop_box: crop region, eg: [100, 200, 300, 350]
    :return: path of crapped picture
    """
    dirname = os.path.dirname(picture_path)
    basename = os.path.basename(picture_path)
    new_basename = ''.join([basename.split('.')[0], '_new.', basename.split('.')[1]])

    picture_origin = Image.open(picture_path)
    picture_origin_size = picture_origin.size
    if crop_box[2] is None:
        crop_box[2] = picture_origin_size[0]
    if crop_box[3] is None:
        crop_box[3] = picture_origin_size[1]
    picture_new = picture_origin.crop(tuple(crop_box))

    picture_new_path = os.path.join(dirname, new_basename)
    picture_new.save(picture_new_path)
    return picture_new_path

def get_text_from_picture(picture_path):
    """
    get text from picture
    :param picture_path: picture to be crapped
    :param crop_box: crop region, eg: [100, 200, 300, 350]
    :return: text
    """
    pytesseract.pytesseract.tesseract_cmd = r'/usr/local/Cellar/tesseract/5.3.2/bin/tesseract'
    picture_new_path = crop_picture(picture_path, crop_box=[585, 360, None, 800])
    image = Image.open(picture_new_path)
    text = pytesseract.image_to_string(image, lang='eng')
    return text