Python 3 OCR: A First Look


冷小鱼 · Published 2022-01-17 16:24:45

Environment

| Component | Version |
|---|---|
| Python | 3.9.2 |
| Tesseract client (used by pytesseract) | 5.0 |
| pytesseract (Python library) | 0.3.8 |
| Language packs | chi_sim.traineddata, chi_sim_vert.traineddata |
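To confirm that a machine matches the table, a small check like the one below may help. It is a sketch that assumes pytesseract 0.3.8 as listed (which provides `get_tesseract_version` and `get_languages`); `REQUIRED_LANGS`, `missing_languages` and `check_environment` are hypothetical helper names.

```python
# Language packs listed in the environment table
REQUIRED_LANGS = ("chi_sim", "chi_sim_vert")


def missing_languages(installed, required=REQUIRED_LANGS):
    """Return the required language codes absent from `installed`, sorted."""
    return sorted(set(required) - set(installed))


def check_environment():
    """Print the Tesseract version and any missing language packs."""
    try:
        import pytesseract
    except ImportError:
        print("pytesseract is not installed: pip install pytesseract")
        return
    try:
        # Both calls run the tesseract executable, so it must be reachable
        print("Tesseract version:", pytesseract.get_tesseract_version())
        missing = missing_languages(pytesseract.get_languages(config=""))
    except pytesseract.TesseractNotFoundError:
        print("Tesseract executable not found; check tesseract_cmd")
        return
    if missing:
        print("Missing language packs:", ", ".join(missing))
    else:
        print("All required language packs are installed")


if __name__ == "__main__":
    check_environment()
```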

Downloads

CSDN resource: https://download.csdn.net/download/yanxilou/76285648
Tesseract client: https://github.com/UB-Mannheim/tesseract/wiki
Language packs: https://github.com/tesseract-ocr/tessdata

Python library: pip install pytesseract

Configuration

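Copy the language pack files (chi_sim.traineddata, chi_sim_vert.traineddata) into the `tessdata` folder of the Tesseract installation directory.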
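This step can also be scripted. A minimal sketch, assuming the default tessdata folder of the UB-Mannheim Windows installer (`C:\Program Files\Tesseract-OCR\tessdata`) and that both .traineddata files were downloaded into one folder; `install_language_packs` is a hypothetical helper. Writing under `Program Files` usually needs an administrator prompt.

```python
import shutil
from pathlib import Path

# Assumption: default tessdata folder of the UB-Mannheim installer; adjust as needed
DEFAULT_TESSDATA = Path(r"C:\Program Files\Tesseract-OCR\tessdata")
LANG_FILES = ("chi_sim.traineddata", "chi_sim_vert.traineddata")


def install_language_packs(download_dir, tessdata_dir=DEFAULT_TESSDATA, files=LANG_FILES):
    """Copy each file in `files` from download_dir into tessdata_dir.

    Returns the destination paths; raises FileNotFoundError if a file is missing.
    """
    download_dir, tessdata_dir = Path(download_dir), Path(tessdata_dir)
    copied = []
    for name in files:
        src = download_dir / name
        if not src.is_file():
            raise FileNotFoundError(src)
        dst = tessdata_dir / name
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(dst)
    return copied
```

For example, `install_language_packs(r"C:\Users\you\Downloads")` (a hypothetical download folder) copies both packs into the default location.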
Then update the `tesseract_cmd` path so that pytesseract can find the Tesseract executable.
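One way to set this path from your own script is sketched below. It assumes the default install location of the UB-Mannheim Windows build (`C:\Program Files\Tesseract-OCR\tesseract.exe`) and falls back to a `tesseract` found on `PATH`; `find_tesseract` and `configure_pytesseract` are hypothetical helper names, while `pytesseract.pytesseract.tesseract_cmd` is the variable pytesseract uses to locate the executable.

```python
import os
import shutil

# Assumption: default location used by the UB-Mannheim Windows installer
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return the first existing candidate, else `tesseract` on PATH, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("tesseract")


def configure_pytesseract(candidates=DEFAULT_CANDIDATES):
    """Point pytesseract at the Tesseract executable and return the path used."""
    import pytesseract

    cmd = find_tesseract(candidates)
    if cmd is None:
        raise FileNotFoundError("Tesseract executable not found")
    # pytesseract launches whatever this module-level variable names
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once at startup should be enough, because pytesseract reads `tesseract_cmd` each time it runs Tesseract.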

Directory structure

The script reads its test image from `img\demo.png`, relative to the working directory.

Code

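```python
from PIL import Image
import pytesseract


def read_text(text_path):
    # Open the image; the with-block releases the file handle afterwards
    with Image.open(text_path) as im:
        # Convert to 8-bit grayscale (mode "L")
        imgry = im.convert('L')
    # Binarize with a fixed threshold: pixel values below `threshold`
    # map to 0 (black), the rest to 1 (white)
    threshold = 140
    table = [0 if j < threshold else 1 for j in range(256)]
    out = imgry.point(table, '1')
    # Recognize simplified Chinese; --psm 6 treats the image
    # as a single uniform block of text
    text = pytesseract.image_to_string(out, lang="chi_sim", config='--psm 6')
    return text


if __name__ == '__main__':
    # Windows-style relative path to the demo image
    print(read_text(r'.\img\demo.png'))
```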
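The fixed threshold of 140 is chosen by hand and may not suit other images. A common alternative, swapped in here and not part of the original code, is Otsu's method, which picks the threshold that maximizes the between-class variance of the grayscale histogram. A minimal pure-Python sketch; `otsu_threshold` and `binarize_table` are hypothetical names.

```python
def otsu_threshold(hist):
    """Otsu's method on a 256-bin grayscale histogram.

    Returns t such that pixels with value < t form the dark class,
    matching the `j < threshold` test used in read_text().
    """
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    weight_bg = 0  # pixels at or below the candidate split
    sum_bg = 0     # sum of their intensities
    best_split, best_var = 0, -1.0
    for t, count in enumerate(hist):
        weight_bg += count
        sum_bg += t * count
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_split = var_between, t
    return best_split + 1


def binarize_table(threshold):
    """Lookup table for Image.point(table, '1'): 0 below threshold, 1 otherwise."""
    return [0 if j < threshold else 1 for j in range(256)]
```

In `read_text`, `threshold = 140` could then become `threshold = otsu_threshold(imgry.histogram())`, since `histogram()` on a mode "L" image returns 256 bin counts.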

Result

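The demo reads horizontal text with `chi_sim` and `--psm 6`. The environment table also lists `chi_sim_vert`, the pack for vertically typeset simplified Chinese. A sketch of switching between the two, assuming page segmentation mode 5 (a single uniform block of vertically aligned text) suits your vertical images; `ocr_options` and `read_text_auto` are hypothetical names.

```python
def ocr_options(vertical=False):
    """Return (lang, config) for horizontal or vertical simplified Chinese text."""
    if vertical:
        # Assumption: --psm 5 treats the image as one vertical block of text
        return "chi_sim_vert", "--psm 5"
    # --psm 6: a single uniform block of text, as in read_text()
    return "chi_sim", "--psm 6"


def read_text_auto(image, vertical=False):
    """OCR a PIL image or image path with options chosen by ocr_options()."""
    import pytesseract

    lang, config = ocr_options(vertical)
    return pytesseract.image_to_string(image, lang=lang, config=config)
```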
