A Few Python OCR Libraries

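At work and in everyday life we often need to extract text from images. This used to mean typing the text out by hand, but as AI has matured over the last few years, more and more OCR products have appeared, and they cover most text-extraction needs for ordinary images. When there are many images to process, or the extraction has to be built into an application, code is the way to go. This post introduces a few Python OCR libraries that are beginner-friendly, simple, and practical, and that cover most routine text-extraction and captcha-recognition needs.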
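For the many-images case mentioned above, a minimal sketch of looping over a folder might look like this. The helper names `list_images` and `ocr_folder`, the extension set, and the `recognize` callable are all hypothetical; `recognize` stands for whichever library call you end up choosing.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to your own files.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def list_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def ocr_folder(folder, recognize):
    """Apply `recognize(path)` to every image and collect results by file name."""
    return {p.name: recognize(p) for p in list_images(folder)}
```

Passing the recognizer in as a function keeps the folder loop independent of any single OCR library.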

1. easyocr

An open-source OCR project with more than ten thousand stars on GitHub (https://github.com/JaidedAI/EasyOCR). It supports more than 80 languages and recognizes text with high accuracy.

Install the Python library with:

pip install easyocr

Image to recognize:

[Image: the picture to be recognized]

Code:

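import easyocr

# Recognize both Simplified Chinese and English.
# Before first use, if you hit an ANTIALIAS error (Pillow 10 and later removed Image.ANTIALIAS),
# edit easyocr/scripts/utils.py and replace ANTIALIAS with LANCZOS.
reader = easyocr.Reader(['ch_sim', 'en'], gpu=False)  # only needs to run once to load the model into memory

result = reader.readtext(r"d:\Desktop\test.png", detail=0)

print(result)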
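If you would rather not edit the installed library file, a commonly used alternative is to restore the missing attribute at runtime, before easyocr is imported. The sketch below assumes the error comes from Pillow 10 or later, which removed `Image.ANTIALIAS`; `add_missing_alias` and `make_reader` are hypothetical helper names.

```python
def add_missing_alias(module, old_name, new_name):
    """Make `module.old_name` point at `module.new_name` if only the new name exists.

    Returns True if an alias was added, False otherwise.
    """
    if not hasattr(module, old_name) and hasattr(module, new_name):
        setattr(module, old_name, getattr(module, new_name))
        return True
    return False


def make_reader(langs=("ch_sim", "en")):
    """Create an easyocr Reader after patching Pillow (needs Pillow and easyocr installed)."""
    from PIL import Image
    add_missing_alias(Image, "ANTIALIAS", "LANCZOS")  # no-op on older Pillow versions
    import easyocr
    return easyocr.Reader(list(langs), gpu=False)
```

The patch only lives for the current process, so it has to run in every script that uses easyocr.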

The first run downloads the detection and recognition models, so a good network connection is recommended:

Using CPU. Note: This module is much faster with a GPU.

Downloading detection model, please wait. This may take several minutes depending upon your network connection.

Downloading recognition model, please wait. This may take several minutes depending upon your network connection.

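The output is shown below. No text region was missed, although a few characters were misread (for example 'Mernaid' instead of 'Mermaid' and '甘犄图' instead of '甘特图'):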

['协助文档', '快捷键', '目录', '标题', '文本样式', '列表', '链接', '代码片', '表格', '注脚', '注释', '自定义列表', 'LaTex 数学公式', '插入甘犄图', '插入UML图', '插入Mernaid流程图', '插入 Flowchart流程图', '插入类图']
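The call above passes `detail = 0`, which returns only the text. With the default `detail = 1`, `readtext` returns one `(bounding box, text, confidence)` entry per region, which makes it possible to drop low-confidence results. A minimal sketch follows; `keep_confident`, the 0.5 threshold, and the sample values are illustrative assumptions, not real output.

```python
def keep_confident(results, min_conf=0.5):
    """Keep the text of (bbox, text, confidence) entries at or above `min_conf`."""
    return [text for _bbox, text, conf in results if conf >= min_conf]


# Made-up entries in the shape readtext returns with detail=1
sample = [
    ([[22, 18], [166, 18], [166, 57], [22, 57]], "快捷键", 0.98),
    ([[31, 111], [118, 111], [118, 141], [31, 141]], "目录", 0.31),
]
print(keep_confident(sample))

# With a real reader (requires easyocr):
# results = reader.readtext(image_path)  # detail defaults to 1
# print(keep_confident(results, min_conf=0.8))
```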

2. muggle_ocr

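muggle_ocr is a lightweight OCR library. As the name suggests, it is designed for Muggles, and it is very easy to use. Its strength is recognizing captchas of various kinds; it does somewhat worse at general text extraction.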

Install the Python library with:

pip install muggle_ocr

Captcha to recognize:

[Image: the captcha to be recognized]

Code:

import muggle_ocr

# Initialize the SDK. model_type has two modes, ModelType.OCR and ModelType.Captcha,
# for ordinary images and captchas respectively.
sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)

with open(r"d:\Desktop\四位验证码.png", "rb") as f:
    img = f.read()

text = sdk.predict(image_bytes=img)
print(text)

Output:

MuggleOCR Session [captcha] Loaded.
3n3d
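The SDK object can be reused, so several captchas can be recognized with one instance. The sketch below relies only on the `predict(image_bytes=...)` call shown above; `predict_files` is a hypothetical helper, and the file names in the usage comment are placeholders.

```python
def predict_files(sdk, paths):
    """Run sdk.predict on each file and return {path: recognized text}."""
    results = {}
    for path in paths:
        with open(path, "rb") as f:
            results[str(path)] = sdk.predict(image_bytes=f.read())
    return results


# Usage (requires muggle_ocr to be installed):
# import muggle_ocr
# sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)
# print(predict_files(sdk, ["captcha1.png", "captcha2.png"]))
```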

3. ddddocr

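ddddocr, also known as 带带弟弟OCR, is another open-source library for recognizing captchas, developed by sml2h3, a well-known figure in the web-scraping community. It performs very well and is especially effective on common digit and letter captchas.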

Install the Python library with:

pip install ddddocr

Captcha to recognize:

[Image: the captcha to be recognized]

Code:

import ddddocr

ocr = ddddocr.DdddOcr()

# Read the captcha image as raw bytes
with open(r"d:\Desktop\四位验证码2.png", 'rb') as f:
    img_bytes = f.read()

res = ocr.classification(img_bytes)

print(res)

The output is shown below. Even with some interfering lines, the four letters were recognized correctly:

jepv
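One result on one image says little about overall accuracy. If you have captchas with known answers, a rough check could look like the sketch below. `captcha_accuracy` is a hypothetical helper, `predict` is any function mapping image bytes to text (for example `ocr.classification`), and the case-insensitive comparison is an assumption that suits many but not all captchas.

```python
def captcha_accuracy(predict, labeled_samples):
    """Fraction of (image_bytes, expected_text) pairs that `predict` gets right."""
    if not labeled_samples:
        return 0.0
    correct = sum(
        1
        for img_bytes, expected in labeled_samples
        if predict(img_bytes).strip().lower() == expected.lower()
    )
    return correct / len(labeled_samples)


# Usage (requires ddddocr and your own labeled images):
# samples = [(open(path, "rb").read(), answer) for path, answer in labeled_paths]
# print(captcha_accuracy(ocr.classification, samples))
```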

4. PaddleOCR

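PaddleOCR is an open-source, deep-learning-based OCR library from Baidu. Its accuracy on Chinese is quite good, and it handles the vast majority of text-extraction needs.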

Three packages need to be installed in order, using the commands below. Installing shapely may fail on some systems; for a fix, see the blog post 百度OCR（文字识别）服务使用入坑指南.

pip install paddlepaddle
pip install shapely
pip install paddleocr

Image to recognize:

[Image: the picture to be recognized]

Code:

from paddleocr import PaddleOCR

if __name__ == '__main__':
    # Path of the image to recognize
    img_path = 'd:Desktop四位验证码2.png/bd.jpg'
    ocr = PaddleOCR(use_angle_cls=True, lang="ch")
    result = ocr.ocr(img_path, cls=True)
    for line in result:
        print(line)

The output is shown below. It includes the recognition confidence for each text region as well as its coordinates:

Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/Users/liyang/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/Users/liyang/.paddleocr/whl/rec/ch/ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/Users/liyang/Library/Python/3.9/lib/python/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='/Users/liyang/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, return_word_box=False, output='./output', table_max_len=488, 
table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, lang='ch', det=True, rec=True, type='ocr', ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
[2023/08/11 10:03:29] ppocr DEBUG: dt_boxes num : 20, elapse : 0.32483887672424316
[2023/08/11 10:03:29] ppocr DEBUG: cls num  : 20, elapse : 0.15648436546325684
[2023/08/11 10:03:31] ppocr DEBUG: rec_res num  : 20, elapse : 2.3386950492858887
[[[[22.0, 18.0], [166.0, 18.0], [166.0, 57.0], [22.0, 57.0]], ('协助文档', 0.9953004121780396)], [[[31.0, 111.0], [118.0, 111.0], [118.0, 141.0], [31.0, 141.0]], ('快捷键', 0.9989374279975891)], [[[174.0, 110.0], [233.0, 110.0], [233.0, 142.0], [174.0, 142.0]], ('目录', 0.9999440908432007)], [[[284.0, 110.0], [346.0, 110.0], [346.0, 141.0], [284.0, 141.0]], ('标题', 0.999672532081604)], [[[397.0, 112.0], [511.0, 112.0], [511.0, 141.0], [397.0, 141.0]], ('文本样式', 0.9988542795181274)], [[[562.0, 110.0], [626.0, 110.0], [626.0, 142.0], [562.0, 142.0]], ('列表', 0.9999798536300659)], [[[32.0, 175.0], [93.0, 175.0], [93.0, 206.0], [32.0, 206.0]], ('链接', 0.999297022819519)], [[[144.0, 175.0], [230.0, 175.0], [230.0, 205.0], [144.0, 205.0]], ('代码片', 0.9998571872711182)], [[[284.0, 175.0], [345.0, 175.0], [345.0, 206.0], [284.0, 206.0]], ('表格', 0.9998085498809814)], [[[394.0, 174.0], [457.0, 174.0], [457.0, 205.0], [394.0, 205.0]], ('注脚', 0.9997339248657227)], [[[506.0, 175.0], [569.0, 175.0], [569.0, 206.0], [506.0, 206.0]], ('注释', 0.9993423223495483)], [[[35.0, 240.0], [175.0, 240.0], [175.0, 269.0], [35.0, 269.0]], ('自定义列表', 0.9996191263198853)], [[[229.0, 239.0], [430.0, 241.0], [430.0, 269.0], [229.0, 267.0]], ('LaTeX数学公式', 0.99480140209198)], [[[485.0, 240.0], [624.0, 240.0], [624.0, 269.0], [485.0, 269.0]], ('插入甘特图', 0.9993108510971069)], [[[34.0, 304.0], [180.0, 304.0], [180.0, 332.0], [34.0, 332.0]], ('插入UML图', 0.9942235946655273)], [[[237.0, 306.0], [495.0, 306.0], [495.0, 331.0], [237.0, 331.0]], ('插入Mermaid流程图', 0.9910006523132324)], [[[34.0, 367.0], [306.0, 369.0], [306.0, 397.0], [34.0, 395.0]], ('插入Flowchait流程图', 0.9257834553718567)], [[[358.0, 370.0], [474.0, 370.0], [474.0, 396.0], [358.0, 396.0]], ('插入类图', 0.9856606125831604)], [[[471.0, 368.0], [649.0, 370.0], [649.0, 399.0], [470.0, 397.0]], ('q38017966', 0.9926116466522217)]]
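
The nested result can be flattened into plain records. This sketch is based on the output format shown above, where each entry is `[box, (text, confidence)]`; `flatten_page` is a hypothetical helper, and the two sample entries are copied from that output.

```python
def flatten_page(page):
    """Turn one page of [box, (text, confidence)] entries into dicts."""
    return [
        {"text": text, "confidence": conf, "box": box}
        for box, (text, conf) in page
    ]


page = [
    [[[31.0, 111.0], [118.0, 111.0], [118.0, 141.0], [31.0, 141.0]],
     ("快捷键", 0.9989374279975891)],
    [[[34.0, 367.0], [306.0, 369.0], [306.0, 397.0], [34.0, 395.0]],
     ("插入Flowchait流程图", 0.9257834553718567)],
]

# List the least confident entries first, since they are the most likely to be wrong
for item in sorted(flatten_page(page), key=lambda r: r["confidence"]):
    print(f'{item["confidence"]:.3f}  {item["text"]}')
```

Sorting by confidence is a quick way to find entries worth checking by hand; in the output above, the lowest-scoring region is also the one with a misread word ('Flowchait').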