How to install and configure tesseract-ocr 4.00 under Windows?
Recently I had to do text recognition and was not allowed to call other people's interfaces directly, so I turned to open source libraries. tesseract-ocr is an open source text recognition (OCR) engine that was originally developed at HP. With it we can quickly build a system that recognizes text in images. Because I develop on Windows, I need to install it on Windows.
Step 1: Download the installation package
I found an unofficial installation package. As far as I could see, only a 64-bit installer is offered: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe. You can run it directly after downloading, but remember your installation directory, because we will need it when configuring the environment variables later.
If you are not doing English image and text recognition, you need to download recognition packages in other languages.
Simplified Chinese character recognition package:
Traditional Chinese character recognition package:
Step 2: Install
Run the downloaded tesseract-ocr-setup-4.00.00dev.exe and click "Next" through the wizard to install.
Step 3: Configure environment variables
Note: My system is Windows 7. Other versions should be similar, and the procedure is the same as configuring the Java environment variables.
Copy your installation path. Mine is C:\Program Files (x86)\Tesseract-OCR.
Open "Control Panel\System and Security\System" and click "System Protection" to open the System Properties dialog.
On the Advanced tab, click "Environment Variables" to open the variable configuration dialog.
Add the installation path "C:\Program Files (x86)\Tesseract-OCR" to the PATH and Path variables. When adding it, use ";" to separate it from the preceding entries, and end it with ";" as well. The following is a sample of my configuration:
C:\Users\Administrator\AppData\Roaming\Composer\vendor\bin;C:\Users\Administrator\AppData\Roaming\npm;C:\Program Files (x86)\Tesseract-OCR;
After configuring, click Save.
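The rule for appending an entry to PATH can be illustrated with a small Python sketch. The helper name add_to_path is hypothetical and it only builds the string; it does not change any system setting. It assumes entries are separated by ";" and compares them case-insensitively, as Windows does.

```python
def add_to_path(path_value, new_dir, sep=";"):
    """Return path_value with new_dir appended, unless it is already listed."""
    # Split on the separator and drop empty entries left by stray ";".
    entries = [e.strip() for e in path_value.split(sep) if e.strip()]

    # Windows paths are case-insensitive, so compare in lower case
    # and ignore a trailing backslash.
    def norm(p):
        return p.rstrip("\\").lower()

    if norm(new_dir) not in [norm(e) for e in entries]:
        entries.append(new_dir)
    return sep.join(entries)


# Illustrative value only; your existing PATH will differ.
current = r"C:\Users\Administrator\AppData\Roaming\npm;"
print(add_to_path(current, r"C:\Program Files (x86)\Tesseract-OCR"))
```

Checking for an existing entry first avoids adding the same directory twice when the steps are repeated.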
Open a new command prompt and enter: tesseract -v. You should see the version information.
If an error occurs, the environment variable is probably not configured properly.
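The same check can be scripted. The following is a minimal Python sketch, not part of the original setup: the function name tesseract_version is hypothetical, and it assumes only that the executable is called tesseract and accepts -v, as used above.

```python
import shutil
import subprocess


def tesseract_version(exe="tesseract"):
    """Return the first line printed by `<exe> -v`, or None if it is not on PATH."""
    found = shutil.which(exe)
    if found is None:
        return None
    result = subprocess.run([found, "-v"], capture_output=True, text=True)
    # Some builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
if version is None:
    print("tesseract not found: check the PATH entry")
else:
    print(version)
```

A result of None usually points to the same cause as the error above: the installation directory is missing from PATH, or the terminal was opened before the variable was saved.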
At this point the installation is complete, but our system still cannot recognize Chinese. We need to download the Simplified Chinese and Traditional Chinese language packs (the addresses are given above). After downloading, just put them in the tessdata directory under the installation directory.
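To confirm that the language packs ended up in the right place, a short Python sketch can list what is installed. It assumes the packs are .traineddata files stored in a tessdata folder under the installation directory, and that the Chinese packs use the codes chi_sim and chi_tra. The function names are hypothetical.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes found as *.traineddata files."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


def missing_languages(tessdata_dir, wanted):
    """Return the wanted language codes that have no data file yet."""
    have = set(installed_languages(tessdata_dir))
    return [lang for lang in wanted if lang not in have]


# Assumed location; adjust to your own installation directory.
tessdata = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
print(missing_languages(tessdata, ["eng", "chi_sim", "chi_tra"]))
```

An empty list means all three packs are present. Running tesseract --list-langs in a terminal reports the same information from the engine's side.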
Additional: Because no global variable tells Tesseract where its language data lives, recognition fails when it is run from a different drive. To fix this, add one more entry to the environment variables.
System variables -> New:
Add a variable named TESSDATA_PREFIX. The variable value is again my installation path, C:\Program Files (x86)\Tesseract-OCR.
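When Tesseract is called from a script, the same variable can be set for a single run instead of system-wide. The Python sketch below only builds the command and the environment. The helper names and the D:\scans paths are hypothetical, and the prefix value follows the installation path used in this article.

```python
import os


def ocr_command(image, out_base, lang="eng"):
    """Build the argument list for `tesseract <image> <out_base> -l <lang>`."""
    return ["tesseract", image, out_base, "-l", lang]


def env_with_prefix(prefix, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set to prefix."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = prefix
    return env


# Hypothetical image on another drive, recognized as Simplified Chinese.
cmd = ocr_command(r"D:\scans\page.png", r"D:\scans\page", "chi_sim")
env = env_with_prefix(r"C:\Program Files (x86)\Tesseract-OCR")
print(cmd)
print(env["TESSDATA_PREFIX"])
# To actually run it: subprocess.run(cmd, env=env, check=True)
```

Copying the environment rather than modifying os.environ keeps the setting local to that one call, which is convenient when testing different data directories.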