How to build a document scanner in Python?

王林 · 2023-04-26 13:10:11

​Translator | Bugatti

Reviewer | Sun Shujuan

You may want to digitize documents to save physical space or create backups. Whatever the reason, writing a program that converts photos of paper documents into a standard format is a task Python handles well.

Using a combination of appropriate libraries, you can build a small application to digitize documents. Your program will take an image of a physical document as input, apply several image processing techniques to it, and output a scanned version of the input.

1. Prepare the environment

First of all, you should be familiar with the basics of Python, and you also need to know how to use the NumPy Python library.

Open any Python IDE and create two Python files. Name one main.py and the other transform.py. Then execute the following command on the terminal to install the required libraries.

pip install opencv-python imutils scikit-image numpy

You will use OpenCV-Python to read the input image and do some image processing, imutils to resize the input and output images, and scikit-image to threshold the image. NumPy will help you work with arrays.

Wait for the installation to complete and for the IDE to update the project skeleton. Once the skeleton has been updated, you can start programming. The complete source code can be found in the GitHub repository.

2. Import the installed libraries

Open the main.py file and import the installed libraries. This lets you call their functions when you need them.

import cv2
import imutils
from skimage.filters import threshold_local
from transform import perspective_transform

For now, ignore the import error reported for perspective_transform. It will disappear once you have written the transform.py file.

3. Obtain and adjust the input size

Take a clear image of the document you want to scan. Make sure all four corners of the document and its contents are visible. Copy the image to the same folder where the program files are stored.

Pass the input image path to OpenCV. Make a copy of the original image, as you will need it during the perspective transformation. Divide the height of the original image by the height you want to resize it to (500 pixels here); you will use this ratio later to map the detected corner points back onto the full-size copy. Resize the image by height only, which preserves the aspect ratio, and display the result.

# Passing the image path
original_img = cv2.imread('sample.jpg')
copy = original_img.copy()

# Ratio of the original height to the resized height (500 pixels)
ratio = original_img.shape[0] / 500.0
img_resize = imutils.resize(original_img, height=500)

# Displaying output
cv2.imshow('Resized image', img_resize)

# Waiting for the user to press any key
cv2.waitKey(0)

The output of the above code is as follows:

The original image has now been resized to a height of 500 pixels.
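
The role of the ratio may be easier to see in isolation. The following is a minimal sketch of the arithmetic only, not part of the scanner; the helper names resize_dimensions and to_original_coords and the 3000x4000 photo size are hypothetical.

```python
def resize_dimensions(width, height, target_height=500):
    """Return (new_width, new_height, ratio) for an aspect-preserving resize."""
    # ratio > 1 means the original is larger than the resized image
    ratio = height / float(target_height)
    new_width = int(round(width / ratio))
    return new_width, target_height, ratio


def to_original_coords(point, ratio):
    """Map an (x, y) point found on the resized image back to the original."""
    x, y = point
    return (x * ratio, y * ratio)


# Hypothetical photo that is 3000 pixels wide and 4000 pixels high
new_w, new_h, ratio = resize_dimensions(3000, 4000)
print(new_w, new_h, ratio)

# A corner detected at (100, 250) on the small image, mapped back
print(to_original_coords((100, 250), ratio))
```

Detecting the document on the small image keeps processing fast, while multiplying the detected points by the ratio lets the final warp use the full-resolution copy.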

4. Convert the adjusted image to grayscale image

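Convert the resized color image to grayscale. OpenCV loads images in BGR channel order, which is why the conversion flag is cv2.COLOR_BGR2GRAY. Many image processing operations, including the edge detection used in the next step, work on grayscale images because a single channel is easier to process.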

gray_image = cv2.cvtColor(img_resize, cv2.COLOR_BGR2GRAY)
cv2.imshow('Grayed Image', gray_image)
cv2.waitKey(0)

Note the difference between the original image and the grayscale image.

Program output showing gray image on IDE

The colored table in the photo now appears in shades of gray.
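
To illustrate what the conversion does, here is a small NumPy sketch of the weighted sum that OpenCV documents for this conversion (Y = 0.299 R + 0.587 G + 0.114 B). The function name bgr_to_gray and the three-pixel test image are hypothetical, and the rounding here is an assumption that may differ slightly from OpenCV's internal arithmetic.

```python
import numpy as np


def bgr_to_gray(image_bgr):
    """Collapse the B, G, R channels into one channel with a weighted sum."""
    # Weights are listed in B, G, R order to match OpenCV's channel layout
    weights = np.array([0.114, 0.587, 0.299])
    gray = image_bgr.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# Hypothetical 1x3 image: one pure blue, one pure green, one pure red pixel
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(bgr_to_gray(pixels))
```

Green contributes the most and blue the least, which roughly follows how bright each color appears to the eye.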

5. Use an edge detector

Apply a Gaussian blur filter to the grayscale image to remove noise. Then call the OpenCV Canny function to detect the edges present in the image.

blurred_image = cv2.GaussianBlur(gray_image, (5, 5), 0)
edged_img = cv2.Canny(blurred_image, 75, 200)
cv2.imshow('Image edges', edged_img)
cv2.waitKey(0)

Edges are visible on the output.

The edges you will be working on are the edges of the document.
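
The two numbers passed to cv2.Canny (75 and 200) are the lower and upper hysteresis thresholds. The sketch below illustrates only that double-threshold idea on a made-up array of gradient magnitudes. It leaves out the gradient computation, non-maximum suppression, and edge tracking that the real detector performs; the function name classify_gradients and the exact handling of values equal to a threshold are assumptions.

```python
import numpy as np


def classify_gradients(magnitude, low=75, high=200):
    """Label each pixel: 2 = strong edge, 1 = weak candidate, 0 = discarded."""
    labels = np.zeros(magnitude.shape, dtype=np.uint8)
    labels[magnitude >= low] = 1   # at least the lower threshold: candidate
    labels[magnitude > high] = 2   # above the upper threshold: definite edge
    return labels


# Hypothetical gradient magnitudes
magnitude = np.array([[10, 80, 210],
                      [74, 200, 255]])
print(classify_gradients(magnitude))
```

In the full algorithm, a weak candidate is kept only if it connects to a strong edge, which suppresses isolated noise while preserving faint parts of the document border.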

6. Find the largest contour

Detect the contours in the edge image. Sort them by area in descending order and keep only the five largest. Then loop through the sorted contours and approximate each one with a polygon; the first approximation with four vertices is taken as the document.

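# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values
cnts, _ = cv2.findContours(edged_img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]

doc = None
for c in cnts:
    # Approximate the contour with a polygon, tolerance 2% of its perimeter
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)

    if len(approx) == 4:
        doc = approx
        break

# Stop early if no four-sided contour was found
if doc is None:
    raise SystemExit('No four-sided contour found in the image')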

A contour with four vertices is likely to be the document.
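
The selection logic can be sketched without OpenCV. The example below is an illustration under simplifying assumptions: the polygons are hypothetical and already approximated, and the area is computed with the shoelace formula rather than cv2.contourArea. The names polygon_area and largest_quadrilateral are made up for this sketch.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def largest_quadrilateral(polygons, keep=5):
    """Return the largest-area polygon with exactly four vertices, or None."""
    candidates = sorted(polygons, key=polygon_area, reverse=True)[:keep]
    for polygon in candidates:
        if len(polygon) == 4:
            return polygon
    return None


# Hypothetical, already-approximated polygons
shapes = [
    [(0, 0), (10, 0), (5, 8)],                  # triangle
    [(0, 0), (300, 0), (300, 400), (0, 400)],   # page-like quadrilateral
    [(0, 0), (20, 0), (20, 20), (0, 20)],       # small square
]
print(largest_quadrilateral(shapes))
```

Sorting by area first means the loop meets the page before any smaller four-sided shape, such as a box printed on the document.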

7. Circle the four corners of the document contour

Circle the corners of the detected document contour. This helps you check whether your program was able to detect the document in the image.

p = []

for d in doc:
    # Convert to plain Python ints, which cv2.circle accepts as a center point
    tuple_point = tuple(int(v) for v in d[0])
    cv2.circle(img_resize, tuple_point, 3, (0, 0, 255), 4)
    p.append(tuple_point)

cv2.imshow('Circled corner points', img_resize)
cv2.waitKey(0)

The corners are circled on the resized color image.

After detecting the document, you now need to extract it from the image.

8. Use warp perspective to get the desired image

Perspective warping is a computer vision technique that transforms an image to correct distortion. It maps the image onto a different plane, letting you view it from a different angle.

warped_image = perspective_transform(copy, doc.reshape(4, 2) * ratio)
warped_image = cv2.cvtColor(warped_image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Warped Image", imutils.resize(warped_image, height=650))
cv2.waitKey(0)

To get the warped image, you need to create a simple module that performs the perspective transformation.
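
A perspective transformation is described by a 3x3 matrix applied to points in homogeneous coordinates. The following NumPy sketch shows that mapping for individual points; the function name apply_homography and the example matrix are hypothetical and chosen only to make the effect visible.

```python
import numpy as np


def apply_homography(matrix, points):
    """Map (N, 2) points through a 3x3 perspective matrix."""
    points = np.asarray(points, dtype=np.float64)
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])   # (x, y) becomes (x, y, 1)
    mapped = homogeneous @ matrix.T           # each row becomes (x', y', w)
    return mapped[:, :2] / mapped[:, 2:3]     # divide by w to return to 2D


# Hypothetical matrix: the bottom row makes the scale depend on x
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.001, 0.0, 1.0]])
print(apply_homography(H, [[0, 0], [1000, 500]]))
```

The division by w is what distinguishes a perspective warp from a plain affine one: points farther along x are scaled down more, as they would be when a page tilts away from the camera.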

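9. Transform module

This module will order the document's corner points. It will also map the document image onto a different plane and change the camera angle to an overhead view.

Open the transform.py file you created earlier and import the OpenCV and NumPy libraries.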

import numpy as np
import cv2

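This module will contain two functions. First, create a function that orders the coordinates of the document's corner points. The first coordinate will be the top-left corner, the second the top-right, the third the bottom-right, and the fourth the bottom-left.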

def order_points(pts):
    # Initialize the array of ordered coordinates
    rect = np.zeros((4, 2), dtype="float32")

    # x + y for each point
    s = pts.sum(axis=1)

    # The top-left point has the smallest sum
    rect[0] = pts[np.argmin(s)]

    # The bottom-right point has the largest sum
    rect[2] = pts[np.argmax(s)]

    # y - x for each point: the top-right point has the smallest
    # difference, the bottom-left point the largest
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]

    # Return the ordered coordinates
    return rect
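
As a quick standalone check, the sketch below repeats the same sum and difference logic in condensed form and runs it on four hypothetical corner points given in arbitrary order.

```python
import numpy as np


def order_points(pts):
    # Same logic as the function above, condensed for a standalone check
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)            # x + y for each point
    diff = np.diff(pts, axis=1)    # y - x for each point
    rect[0], rect[2] = pts[np.argmin(s)], pts[np.argmax(s)]
    rect[1], rect[3] = pts[np.argmin(diff)], pts[np.argmax(diff)]
    return rect


# Hypothetical corners of a slightly tilted page, in arbitrary order
corners = np.array([[310, 420], [20, 400], [30, 15], [300, 25]], dtype="float32")
print(order_points(corners))
```

The heuristic assumes the page is roughly upright in the photo. If the document is rotated close to 45 degrees, two corners can have nearly equal sums or differences and the ordering may come out wrong.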

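Create a second function that computes the corner coordinates of the new image and produces an overhead view. It then computes the perspective transform matrix and returns the warped image.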

def perspective_transform(image, pts):
    # Order the corners, then unpack them individually
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # Width of the new image: the longer of the bottom edge
    # (br to bl) and the top edge (tr to tl)
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    # Height of the new image: the longer of the right edge
    # (tr to br) and the left edge (tl to bl)
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # Destination points that give an overhead view
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")

    # Compute the perspective transform matrix
    transform_matrix = cv2.getPerspectiveTransform(rect, dst)

    # Apply the transform matrix
    warped = cv2.warpPerspective(image, transform_matrix, (maxWidth, maxHeight))

    # Return the warped image
    return warped

You have now created the transform module. The error on the perspective_transform import will now disappear.

Note that the displayed image now shows an overhead view.
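
Four point pairs are enough to determine a perspective matrix, because each pair contributes two linear equations for the eight unknown entries. The NumPy sketch below solves that system directly. It is meant as an illustration of the idea, not as OpenCV's implementation; the function name solve_perspective_matrix and the corner values are hypothetical.

```python
import numpy as np


def solve_perspective_matrix(src, dst):
    """Solve the 3x3 matrix H (with H[2, 2] = 1) mapping four src points to dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    coeffs = np.linalg.solve(np.array(rows, dtype=np.float64),
                             np.array(rhs, dtype=np.float64))
    return np.append(coeffs, 1.0).reshape(3, 3)


# Hypothetical ordered corners (tl, tr, br, bl) and a 300x400 output rectangle
src = [(30, 15), (300, 25), (310, 420), (20, 400)]
dst = [(0, 0), (299, 0), (299, 399), (0, 399)]
H = solve_perspective_matrix(src, dst)

# Check: every source corner should land on its destination corner
for (x, y), target in zip(src, dst):
    u, v, w = H @ np.array([x, y, 1.0])
    print(np.round([u / w, v / w], 3), target)
```

This is also why the corners must be ordered consistently before the transform: pairing a source corner with the wrong destination corner would flip or twist the output.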

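10. Apply adaptive thresholding and save the scanned output

In the main.py file, apply Gaussian thresholding to the warped image. This gives the warped image a scanned look. Save the scanned image to the folder that contains the program files.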

T = threshold_local(warped_image, 11, offset=10, method="gaussian")
warped = (warped_image > T).astype("uint8") * 255
cv2.imwrite('./scan.png', warped)

Saving the scan in PNG format, which is lossless, preserves the document quality.
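
Adaptive thresholding compares each pixel with a threshold computed from its own neighborhood, so uneven lighting across the page does not darken whole regions. The sketch below shows the idea with a plain local mean instead of the Gaussian-weighted mean that method="gaussian" uses, so it is a simplified stand-in rather than a reimplementation of threshold_local. The function name local_mean_threshold, the reflect padding at the borders, and the synthetic page are assumptions.

```python
import numpy as np


def local_mean_threshold(gray, block_size=11, offset=10):
    """A pixel turns white if it exceeds its neighborhood mean minus offset."""
    pad = block_size // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    # Summed-area table with a leading row and column of zeros
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    k = block_size
    # Sum of every k x k window, one per output pixel
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    threshold = window_sum / (k * k) - offset
    return (gray > threshold).astype("uint8") * 255


# Hypothetical 20x20 page: brightness rises left to right, with one dark stroke
page = np.tile(np.linspace(160, 220, 20), (20, 1))
page[8:12, 5:15] = 30
scan = local_mean_threshold(page)
print(scan[10, 10], scan[2, 2])
```

A single global threshold would have to pick one cutoff for both the darker left side and the brighter right side of this page, whereas the local version follows the lighting gradient and isolates only the stroke.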

11. Display the output

Display the image of the scanned document:

cv2.imshow("Final Scanned image", imutils.resize(warped, height=650))
cv2.waitKey(0)
cv2.destroyAllWindows()

The program's output is an overhead view of the scanned document.

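12. How to advance in computer vision

Creating a document scanner touches on some core areas of computer vision, which is a broad and complex field. To advance in computer vision, you should work on projects that are both interesting and challenging.

You should also read more about how computer vision can be combined with current technologies. This keeps you informed and gives you new ideas for the projects you work on.

Original link: https://www.makeuseof.com/python-create-document-scanner/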

Statement:
This article is reproduced from 51cto.com. In case of any infringement, please contact admin@php.cn to have it removed.