How to build a document scanner in Python?

王林

Apr 26, 2023 pm 01:10 PM

Translator | Bugatti

Reviewer | Sun Shujuan

You may want to digitize documents to save physical space or to create backups. Either way, writing a program that converts photos of paper documents into a standard scanned format is a task Python handles well.

Using a combination of appropriate libraries, you can build a small application to digitize documents. Your program will take an image of a physical document as input, apply several image processing techniques to it, and output a scanned version of the input.

1. Prepare the environment

First of all, you should be familiar with the basics of Python, and you also need to know how to use the NumPy Python library.

Open any Python IDE and create two Python files. Name one main.py and the other transform.py. Then execute the following command on the terminal to install the required libraries.

pip install opencv-python imutils scikit-image numpy

You will use OpenCV-Python to read the input image and do most of the image processing, imutils to resize the input and output images, and scikit-image to threshold the image. NumPy will help you work with arrays.

Wait for the installation to complete and for the IDE to refresh the project skeleton. Once it has, you can start programming. The complete source code can be found in the GitHub repository.

2. Import the installed libraries

Open the main.py file and import the installed libraries. This lets you call their functions when you need them.

import cv2
import imutils
from skimage.filters import threshold_local
from transform import perspective_transform

Ignore the import error reported for perspective_transform for now. It will disappear once you have written the transform.py file.

3. Obtain and adjust the input size

Take a clear image of the document you want to scan. Make sure all four corners of the document and its contents are visible. Copy the image to the same folder where the program files are stored.

Pass the input image path to OpenCV. Make a copy of the original image, because you will need it for the perspective transformation. Divide the height of the original image by the height you want to resize it to. This ratio lets you scale the corner coordinates found on the resized image back to the original image later. Then resize the image, which keeps its aspect ratio, and display the result.

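# Load the input image from the given path
original_img = cv2.imread('sample.jpg')
if original_img is None:
    raise FileNotFoundError("Could not read 'sample.jpg'")
copy = original_img.copy()

# Ratio of the original height to the resized height (500 pixels);
# used later to scale the detected corners back to the original image
ratio = original_img.shape[0] / 500.0
img_resize = imutils.resize(original_img, height=500)

# Display the resized image
cv2.imshow('Resized image', img_resize)

# Wait for the user to press any key
cv2.waitKey(0)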

The program displays the resized image, whose height is now 500 pixels.
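To make the arithmetic behind the ratio concrete, here is a minimal sketch in plain Python. The helper names resized_dimensions and to_original are hypothetical and are not part of imutils or OpenCV, and the 3000 x 4000 photo size is an assumed example. The rounding may differ slightly from what imutils.resize does internally.

```python
def resized_dimensions(width, height, target_height=500):
    """Return (new_width, new_height, ratio) for a resize that keeps the aspect ratio."""
    # Same ratio as in main.py: original height divided by the target height
    ratio = height / float(target_height)
    new_width = int(round(width / ratio))
    return new_width, target_height, ratio


def to_original(point, ratio):
    """Scale a point found on the resized image back to original-image coordinates."""
    x, y = point
    return int(round(x * ratio)), int(round(y * ratio))


# Assumed example: a 3000 x 4000 pixel photo (width x height)
new_w, new_h, ratio = resized_dimensions(3000, 4000)
print(new_w, new_h, ratio)             # 375 500 8.0
print(to_original((100, 200), ratio))  # (800, 1600)
```

This is why the tutorial later multiplies the detected corner points by ratio: the corners are found on the small image, but the final scan is cut from the full-resolution copy.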

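4. Convert the resized image to grayscale

Convert the resized color image to grayscale. OpenCV loads images in BGR channel order, which is why the conversion flag is COLOR_BGR2GRAY. Many image processing operations, including the edge detection in the next step, work on single-channel images, which are also simpler and faster to process.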

gray_image = cv2.cvtColor(img_resize, cv2.COLOR_BGR2GRAY)
cv2.imshow('Grayed Image', gray_image)
cv2.waitKey(0)

Note the difference between the resized image and the grayscale image: the colored table in the photo now appears in shades of gray.
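As a rough illustration of what the grayscale conversion computes, the sketch below applies the standard luma weights (0.299 for red, 0.587 for green, 0.114 for blue) to a single BGR pixel. The function name bgr_to_gray is hypothetical, and OpenCV's own implementation works on whole arrays and may round slightly differently.

```python
def bgr_to_gray(pixel):
    """Convert one (blue, green, red) pixel to a gray level using the standard luma weights."""
    blue, green, red = pixel
    return round(0.299 * red + 0.587 * green + 0.114 * blue)


print(bgr_to_gray((255, 0, 0)))      # pure blue -> 29
print(bgr_to_gray((0, 0, 255)))      # pure red -> 76
print(bgr_to_gray((255, 255, 255)))  # white -> 255
```

Green contributes the most and blue the least, which roughly matches how bright each color appears to the human eye.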

5. Use an edge detector

Apply a Gaussian blur filter to the grayscale image to remove noise. Then call the OpenCV Canny function to detect the edges present in the image.

blurred_image = cv2.GaussianBlur(gray_image, (5, 5), 0)
edged_img = cv2.Canny(blurred_image, 75, 200)
cv2.imshow('Image edges', edged_img)
cv2.waitKey(0)

The edges are visible in the output. The ones you will work with are the edges of the document.
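The two numbers passed to cv2.Canny, 75 and 200, are the lower and upper hysteresis thresholds. As a simplified sketch of the idea (not OpenCV's implementation), the hypothetical helper below labels a gradient magnitude the way the thresholding stage of Canny treats it. The sample magnitudes are made up, and the handling of values exactly on a threshold is an assumption.

```python
def classify_gradient(magnitude, low=75, high=200):
    """Label one gradient magnitude as hysteresis thresholding would treat it."""
    if magnitude > high:
        return "strong"   # always kept as an edge
    if magnitude < low:
        return "discard"  # never kept as an edge
    return "weak"         # kept only if connected to a strong edge


samples = [30, 120, 240]  # made-up gradient magnitudes
print([classify_gradient(m) for m in samples])  # ['discard', 'weak', 'strong']
```

Lowering both thresholds makes the detector report more edges, which can help with low-contrast photos but also lets more background texture through.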

6. Find the largest contour

Detect the contours in the edge image. Sort them by area in descending order and keep only the five largest. Then loop over the sorted contours and approximate each one with a polygon. The first polygon with four vertices is taken to be the document.

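# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values
cnts, _ = cv2.findContours(edged_img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]

doc = None
for c in cnts:
    # Approximate the contour with a tolerance of 2% of its perimeter
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)

    # The first four-sided polygon is taken to be the document
    if len(approx) == 4:
        doc = approx
        break

if doc is None:
    raise RuntimeError("No four-sided contour found; try a clearer photo")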

A contour with four sides is likely to be the outline of the document.
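To see what the quantities in this step measure, here is a small plain-Python sketch that computes the perimeter and area of a polygon and the resulting approximation tolerance. The helper names are hypothetical and the 300 x 400 rectangle is an assumed stand-in for a detected page. cv2.arcLength and cv2.contourArea do the equivalent work on real contours.

```python
import math


def polygon_perimeter(points):
    """Sum of the side lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Assumed example: a page that appears as a 300 x 400 pixel rectangle
page = [(0, 0), (300, 0), (300, 400), (0, 400)]
peri = polygon_perimeter(page)
epsilon = 0.02 * peri  # tolerance handed to the polygon approximation
print(peri, polygon_area(page))  # 1400.0 120000.0
print(round(epsilon, 2))         # 28.0
```

With a tolerance of about 28 pixels, small wobbles along the paper's edge are smoothed away, so a slightly ragged outline still reduces to four corner points.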

7. Circle the four corners of the document outline

Circle the corners of the detected document outline. This will help you determine whether your program is able to detect the document in the image.

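p = []

for d in doc:
    # Convert to plain Python ints; some OpenCV builds may reject NumPy integer types here
    tuple_point = tuple(int(v) for v in d[0])
    cv2.circle(img_resize, tuple_point, 3, (0, 0, 255), 4)
    p.append(tuple_point)

cv2.imshow('Circled corner points', img_resize)
cv2.waitKey(0)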

The corners are circled in red on the resized color image.

Having detected the document, you now need to extract it from the image.

8. Use warp perspective to get the desired image

Perspective warping is a computer vision technique that corrects perspective distortion. It maps the image onto a different plane, so that a document photographed at an angle looks as if it was captured from directly above.

warped_image = perspective_transform(copy, doc.reshape(4, 2) * ratio)
warped_image = cv2.cvtColor(warped_image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Warped Image", imutils.resize(warped_image, height=650))
cv2.waitKey(0)

To get the warped image, you need to create a simple module that performs the perspective transformation.
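Under the hood, a perspective transformation is a 3 x 3 matrix applied to each pixel coordinate, followed by a division by the third component. The sketch below shows that mapping for a single point in plain Python. The function name apply_homography and both example matrices are made up for illustration; in the tutorial the matrix is computed from the document corners instead.

```python
def apply_homography(H, point):
    """Map an (x, y) point through a 3 x 3 perspective matrix H."""
    x, y = point
    # Third homogeneous component; dividing by it produces the perspective effect
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    new_x = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    new_y = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return new_x, new_y


# A pure shift: the bottom row is (0, 0, 1), so w is always 1
shift = [[1, 0, 10], [0, 1, 20], [0, 0, 1]]
print(apply_homography(shift, (5, 5)))  # (15.0, 25.0)

# A non-zero entry in the bottom row makes the scaling depend on x
tilt = [[1, 0, 0], [0, 1, 0], [0.001, 0, 1]]
print(apply_homography(tilt, (100, 50)))
```

In the second example, points further to the right are divided by a larger value and are pulled closer together, which is the kind of foreshortening a perspective warp can introduce or undo.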

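9. Create the transform module

This module orders the corner points of the document. It also maps the document image onto a different plane, changing the camera angle to an overhead view.

Open the transform.py file you created earlier and import the OpenCV and NumPy libraries.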

import numpy as np
import cv2

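The module will contain two functions. First, create a function that orders the coordinates of the document's corner points. The first coordinate will be the top-left corner, the second the top-right corner, the third the bottom-right corner, and the fourth the bottom-left corner.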

def order_points(pts):
    # Initialize the array of ordered coordinates
    rect = np.zeros((4, 2), dtype="float32")

    # Sum of x and y for each point
    s = pts.sum(axis=1)

    # The top-left point has the smallest sum
    rect[0] = pts[np.argmin(s)]

    # The bottom-right point has the largest sum
    rect[2] = pts[np.argmax(s)]

    # Difference y - x for each point: the top-right point has the
    # smallest difference and the bottom-left point has the largest
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]

    # Return the ordered coordinates
    return rect
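To trace the sum-and-difference heuristic by hand, the sketch below repeats the same logic in plain Python on four made-up corner points given in arbitrary order. The name order_points_plain is hypothetical and is only meant to make the arithmetic visible; the module itself uses the NumPy version above.

```python
def order_points_plain(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    sums = [x + y for x, y in points]   # smallest at top-left, largest at bottom-right
    diffs = [y - x for x, y in points]  # smallest at top-right, largest at bottom-left
    top_left = points[sums.index(min(sums))]
    bottom_right = points[sums.index(max(sums))]
    top_right = points[diffs.index(min(diffs))]
    bottom_left = points[diffs.index(max(diffs))]
    return [top_left, top_right, bottom_right, bottom_left]


# Made-up corners of a slightly skewed page, in arbitrary order
corners = [(400, 50), (60, 40), (420, 580), (30, 600)]
print(order_points_plain(corners))
# [(60, 40), (400, 50), (420, 580), (30, 600)]
```

Here the sums are 450, 100, 1000 and 630, and the differences are -350, -20, 160 and 570. The heuristic assumes the page is not rotated too far; for a page turned close to 45 degrees, two corners can have nearly equal sums or differences and the ordering becomes unreliable.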

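Next, create the second function, which computes the corner coordinates of the new image to produce an overhead view. It then computes the perspective transform matrix and returns the warped image.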

def perspective_transform(image, pts):
    # Order the corner points and unpack them individually
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # Width of the new image: the longer of the bottom edge
    # (bottom-right to bottom-left) and the top edge (top-right to top-left)
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    # Height of the new image: the longer of the right edge
    # (top-right to bottom-right) and the left edge (top-left to bottom-left)
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # Destination points that produce an overhead view
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")

    # Compute the perspective transform matrix
    transform_matrix = cv2.getPerspectiveTransform(rect, dst)

    # Apply the transform matrix
    warped = cv2.warpPerspective(image, transform_matrix, (maxWidth, maxHeight))

    # Return the warped image
    return warped

You have now created the transform module, and the import error for perspective_transform will disappear.

Note that the displayed image now shows the document from directly overhead.
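The size of the warped image comes from the lengths of the document's edges. The sketch below repeats that calculation in plain Python for one set of made-up corner points; output_size is a hypothetical helper that mirrors the width and height logic of perspective_transform without calling OpenCV.

```python
import math


def output_size(tl, tr, br, bl):
    """Width and height of the warped image, taken from the longer of each pair of opposite edges."""
    width = max(int(math.dist(br, bl)), int(math.dist(tr, tl)))
    height = max(int(math.dist(tr, br)), int(math.dist(tl, bl)))
    return width, height


# Made-up ordered corners of a skewed page
max_width, max_height = output_size(tl=(60, 40), tr=(400, 50), br=(420, 580), bl=(30, 600))
print(max_width, max_height)  # 390 560

# The four destination corners the page is stretched onto
dst = [(0, 0), (max_width - 1, 0), (max_width - 1, max_height - 1), (0, max_height - 1)]
print(dst)  # [(0, 0), (389, 0), (389, 559), (0, 559)]
```

Using the longer edge of each pair means no part of the page is shrunk during the warp; the shorter edge is stretched instead.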

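10. Apply adaptive thresholding and save the scanned output

In the main.py file, apply Gaussian adaptive thresholding to the warped image. This gives the warped image a scanned look. Save the scanned output to the folder that contains the program files.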

# Per-pixel threshold from an 11 x 11 Gaussian-weighted neighborhood, minus an offset of 10
T = threshold_local(warped_image, 11, offset=10, method="gaussian")
warped = (warped_image > T).astype("uint8") * 255
cv2.imwrite('scan.png', warped)

PNG is a lossless format, so saving the scan as a PNG file preserves the quality of the document.
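Adaptive thresholding compares each pixel with a threshold computed from its own neighborhood rather than with one global value. The sketch below is a deliberately simplified, one-dimensional version that uses a plain mean instead of Gaussian weights and clamps the window at the borders; local_threshold_row is a hypothetical helper and the pixel rows are made up. It is meant to show the idea, not to reproduce threshold_local.

```python
def local_threshold_row(row, block_size=3, offset=10):
    """Binarize a row of gray levels against the mean of each pixel's neighborhood minus an offset."""
    half = block_size // 2
    result = []
    for i, value in enumerate(row):
        # Neighborhood around pixel i, clamped at the borders
        window = row[max(0, i - half): i + half + 1]
        threshold = sum(window) / len(window) - offset
        result.append(255 if value > threshold else 0)
    return result


bright = [200, 200, 90, 200, 200]  # well-lit paper with one dark ink pixel
dim = [120, 120, 50, 120, 120]     # the same pattern in a shadowed area

print(local_threshold_row(bright))           # [255, 255, 0, 255, 255]
print(local_threshold_row(dim))              # [255, 255, 0, 255, 255]
print([255 if v > 150 else 0 for v in dim])  # [0, 0, 0, 0, 0]
```

The last line shows why a single global threshold (150 here) is a poor fit for photographed documents: the shadowed paper turns completely black, while the local threshold still separates ink from paper.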

11. Display the output

Display the image of the scanned document:

cv2.imshow("Final Scanned image", imutils.resize(warped, height=650))
cv2.waitKey(0)
cv2.destroyAllWindows()

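The program output is an overhead view of the scanned document.

12. How to keep improving in computer vision

Building a document scanner touches on several core areas of computer vision, which is a broad and complex field. To make progress in computer vision, work on projects that are both interesting and challenging.

You should also read more about how computer vision is being combined with current technologies. This keeps you up to date and gives you new ideas for the projects you work on.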

Original article: https://www.makeuseof.com/python-create-document-scanner/
