
Python code implements image text recognition

零到壹度
Original, 2018-04-02 14:12:09

This article shares Python code that recognizes the text in an image. I hope it helps anyone who needs it.

Let's take recognizing a poem as an example.
The following is the picture we want to recognize:

[Image: the picture of the poem to be recognized]

Let's look at the result first.

[Image: screenshot of the recognition result]
Running the code gives the recognition result below. A few characters were not recognized correctly, but most were.

风急天高猿啸哀 渚芸胄芳少白鸟飞凤
无边落木萧萧下, 不尽长量工盲衮宕衮来
万里悲秋常1乍窨, 百年多病独登氤
艰难苦恨擎霜量 漂倒新停澍酉帆
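To put a rough number on "most characters were recognized", the output can be compared with the actual text of the poem. The sketch below is not part of the original article: it uses difflib from the standard library, and the names REFERENCE, OCR_OUTPUT, normalize and similarity are made up for illustration. The score is a similarity ratio, not a strict per-character accuracy.

```python
import difflib

# Reference text of Du Fu's poem 登高, with punctuation removed.
REFERENCE = (
    "风急天高猿啸哀渚清沙白鸟飞回"
    "无边落木萧萧下不尽长江滚滚来"
    "万里悲秋常作客百年多病独登台"
    "艰难苦恨繁霜鬓潦倒新停浊酒杯"
)

# OCR output copied from the result shown above.
OCR_OUTPUT = """风急天高猿啸哀 渚芸胄芳少白鸟飞凤
无边落木萧萧下, 不尽长量工盲衮宕衮来
万里悲秋常1乍窨, 百年多病独登氤
艰难苦恨擎霜量 漂倒新停澍酉帆"""


def normalize(text):
    """Keep only CJK characters; drops spaces, punctuation and stray digits."""
    return "".join(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def similarity(reference, recognized):
    """Return a ratio between 0 and 1 describing how close the two texts are."""
    matcher = difflib.SequenceMatcher(None, normalize(reference), normalize(recognized))
    return matcher.ratio()


print("similarity: {:.2f}".format(similarity(REFERENCE, OCR_OUTPUT)))
```

A ratio of 1.0 would mean the recognized text matches the poem exactly, so the value gives a quick way to compare runs after changing the image or the settings.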

A single line of code can recognize the image, but some preparation is needed behind the scenes first:

  • We need two libraries: pytesseract and PIL (distributed today as the Pillow package)

  • We also need to install the recognition engine tesseract-ocr

Let's go through the installation, because only once all of these are installed can Python recognize the text in an image with one line of code.

1. Installation of pytesseract and PIL

You can install these two packages with pip. Note that PIL is installed under the package name Pillow; the import in code is still PIL.
- 1. Command-line installation
pip install Pillow
pip install pytesseract
- 2. If you use the PyCharm editor, you can install the packages directly from PyCharm.
Follow the steps below on the Settings page of PyCharm:
[Image: installing pytesseract from the PyCharm Settings page]
This installs pytesseract. To install PIL, search for Pillow (the package that provides PIL) in the third step above and click Install.
[Image: installing PIL from the PyCharm Settings page]
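A quick way to confirm that both libraries ended up in the interpreter you are actually running is to ask Python whether it can find them. This is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of either package.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "PIL" is the import name that the Pillow package provides.
missing = missing_modules(["PIL", "pytesseract"])
if missing:
    print("not installed:", ", ".join(missing))
else:
    print("PIL and pytesseract are both available")
```

If a package shows up as missing even though pip reported success, pip and the running interpreter probably belong to different Python installations.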

With the libraries installed, run the following code:

from PIL import Image
import pytesseract

# Recognize the simplified Chinese text in the picture
text = pytesseract.image_to_string(Image.open('denggao.jpeg'), lang='chi_sim')
print(text)

Running this code reports an error, because the recognition engine tesseract-ocr is not installed yet.

[Image: screenshot of the error message]
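Whether the engine is reachable can also be checked before calling pytesseract at all. The sketch below assumes the executable is named tesseract, which is the default name pytesseract looks for; the helper find_tesseract is hypothetical and uses only the standard library.

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(command)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install tesseract-ocr or configure its location")
else:
    print("tesseract found at", path)
```

A result of None here only means the engine is not on PATH. It may still be installed in a directory that has to be configured explicitly, as described in the next section.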

2. Installation of the recognition engine tesseract-ocr

  • 1. Download the installation packages below, then run the installer directly
    tesseract-ocr installation package
    Chinese language package (unzip and install)

After installing tesseract-ocr, do the following to enable Chinese recognition, because tesseract-ocr does not support Chinese by default.
[Image: screenshot of the steps]

  • 2. After installing tesseract-ocr, we still need to do some configuration
    In C:\Users\huxiu\AppData\Local\Programs\Python\Python35\Lib\site-packages\pytesseract, find pytesseract.py, open it, and make the following change:

# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'

You can also open pytesseract.py quickly through PyCharm.

[Images: opening pytesseract.py from PyCharm]
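As an alternative to editing the installed library file, pytesseract also lets a script set pytesseract.pytesseract.tesseract_cmd at runtime, which survives reinstalling the package. The sketch below is one way to do that, not the article's method: DEFAULT_TESSERACT repeats the install path used above, and the helpers pick_existing and configure_pytesseract are hypothetical names.

```python
import os

# Install location used in this article; adjust it if yours differs.
DEFAULT_TESSERACT = "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"


def pick_existing(candidates):
    """Return the first path in `candidates` that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=(DEFAULT_TESSERACT,)):
    """Point pytesseract at the first existing executable and return that path."""
    import pytesseract  # imported here so pick_existing works without it

    path = pick_existing(candidates)
    if path is not None:
        # Same setting as the edited line in pytesseract.py, applied at runtime.
        pytesseract.pytesseract.tesseract_cmd = path
    return path


# Usage (requires pytesseract to be installed):
# configure_pytesseract()
```

Calling configure_pytesseract() once at the top of the script, before image_to_string, has the same effect as the manual edit for that run only.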

Now all the configuration is complete. Run the code shown earlier again to parse the picture of Du Fu's poem Ascension (登高) into text.
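If the run still fails when lang='chi_sim' is passed, one thing worth checking is whether the Chinese language data is in place. Tesseract stores each language as a file named after its code with the extension .traineddata. The sketch below assumes the tessdata folder sits inside the install directory used in this article; TESSDATA_DIR and has_language are hypothetical names.

```python
import os

# Assumed location of the language data; adjust it to your installation.
TESSDATA_DIR = "C:/Program Files (x86)/Tesseract-OCR/tessdata"


def has_language(lang, tessdata_dir=TESSDATA_DIR):
    """Return True if <lang>.traineddata exists in the given tessdata folder."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


print("chi_sim installed:", has_language("chi_sim"))
```

If this prints False, copying the unzipped Chinese language file into that folder is the step to repeat.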
