How to build a document scanner in Python
Translator | Bugatti
Reviewer | Sun Shujuan
You may want to digitize documents to save physical space or to create backups. Either way, writing a program that converts photos of paper documents into a standard format is a task Python handles well.
Using a combination of appropriate libraries, you can build a small application to digitize documents. Your program will take an image of a physical document as input, apply several image processing techniques to it, and output a scanned version of the input.
First of all, you should be familiar with the basics of Python, and you also need to know how to use the NumPy Python library.
Open any Python IDE and create two Python files. Name one main.py and the other transform.py. Then execute the following command on the terminal to install the required libraries.
pip install opencv-python imutils scikit-image numpy
You will use OpenCV-Python to read the input image and do some image processing, imutils to resize the input and output images, and scikit-image to threshold the image. NumPy will help you work with arrays.
Wait for the installation to complete and for the IDE to update the project skeleton. Once the skeleton is updated, you can start programming. The complete source code can be found in the GitHub repository.
Open the main.py file and import the installed libraries. This lets you call their functions when necessary.
import cv2
import imutils
from skimage.filters import threshold_local
from transform import perspective_transform
Ignore the import error for perspective_transform for now. It disappears once you have finished writing the transform.py file.
Take a clear image of the document you want to scan. Make sure all four corners of the document and its contents are visible. Copy the image to the same folder where the program files are stored.
Pass the input image path to OpenCV. Make a copy of the original image, as you will need it during the perspective transformation. Divide the height of the original image by the height you want to resize it to; you will use this ratio later to scale the detected corner points back to the original image. Resizing by height alone with imutils keeps the aspect ratio. Finally, display the resized image.
# Pass the image path
original_img = cv2.imread('sample.jpg')
copy = original_img.copy()

# Ratio between the original height and the resized height (500 px)
ratio = original_img.shape[0] / 500.0
img_resize = imutils.resize(original_img, height=500)

# Display the output
cv2.imshow('Resized image', img_resize)
# Wait for the user to press any key
cv2.waitKey(0)
The output shows the original image resized to a height of 500 pixels.
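To make the resize arithmetic concrete, here is a minimal sketch in plain Python. The helper names resized_dimensions and to_original_coords are hypothetical and are not part of OpenCV or imutils; they only mirror how a height-based resize keeps the aspect ratio and how the stored ratio maps a point on the resized image back to the original.

```python
def resized_dimensions(orig_height, orig_width, target_height=500):
    """Return (new_width, new_height, ratio) for a height-based resize.

    Hypothetical helper for illustration only.
    """
    # Same ratio as in main.py: original height over target height
    ratio = orig_height / float(target_height)
    # Scale the width by the same factor so the aspect ratio is preserved
    new_width = int(orig_width / ratio)
    return new_width, target_height, ratio


def to_original_coords(point, ratio):
    """Map an (x, y) point on the resized image back to the original image."""
    x, y = point
    return x * ratio, y * ratio


# Example: a 1500 x 2000 (width x height) photo resized to a height of 500
new_w, new_h, ratio = resized_dimensions(2000, 1500)
print(new_w, new_h, ratio)
print(to_original_coords((100, 250), ratio))
```

This is why the program keeps both a copy of the original image and the ratio: detection runs on the small image, while the final warp uses the full-resolution copy.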
Convert the resized color image to grayscale. OpenCV loads images in BGR channel order, which is why the conversion uses COLOR_BGR2GRAY. Many image processing operations work on grayscale images because a single channel is simpler to process.
gray_image = cv2.cvtColor(img_resize, cv2.COLOR_BGR2GRAY)
cv2.imshow('Grayed Image', gray_image)
cv2.waitKey(0)
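The grayscale conversion is a weighted sum of the three color channels. The sketch below applies the weights documented for OpenCV's color-to-gray conversion (0.299 R + 0.587 G + 0.114 B) using NumPy. The function name bgr_to_gray is hypothetical, and the rounding here may differ slightly from what cv2.cvtColor produces.

```python
import numpy as np


def bgr_to_gray(img):
    """Convert an H x W x 3 BGR uint8 image to grayscale (illustrative only)."""
    # OpenCV stores channels in blue, green, red order
    b = img[..., 0].astype(np.float64)
    g = img[..., 1].astype(np.float64)
    r = img[..., 2].astype(np.float64)
    # Weighted sum: green contributes most, blue least
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# One row of four pixels: pure blue, pure green, pure red, white
pixels = np.array(
    [[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8
)
print(bgr_to_gray(pixels))
```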
Note the difference between the original image and the grayscale image: the colored table in the photo now appears in shades of gray.
Apply a Gaussian blur filter to the grayscale image to remove noise. Then call the OpenCV Canny function to detect the edges in the image.
blurred_image = cv2.GaussianBlur(gray_image, (5, 5), 0)
edged_img = cv2.Canny(blurred_image, 75, 200)
cv2.imshow('Image edges', edged_img)
cv2.waitKey(0)
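To illustrate why blurring suppresses noise before edge detection, here is a small one-dimensional sketch in plain Python. It is not OpenCV's implementation: the function names are hypothetical, sigma is fixed at 1.0 as an assumption (the call above passes 0, which lets OpenCV derive sigma from the kernel size), and edges are handled by repeating the border value for simplicity.

```python
import math


def gaussian_kernel_1d(size=5, sigma=1.0):
    """Return normalized Gaussian weights for an odd kernel size."""
    half = size // 2
    weights = [
        math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)
    ]
    total = sum(weights)
    # Normalize so the weights sum to 1 and overall brightness is preserved
    return [w / total for w in weights]


def blur_row(row, kernel):
    """Blur one row of pixel values, repeating the border values at the edges."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), len(row) - 1)
            acc += w * row[j]
        out.append(acc)
    return out


kernel = gaussian_kernel_1d()
# A single noisy spike is spread out and its peak is lowered
blurred = blur_row([0, 0, 100, 0, 0], kernel)
print([round(v, 2) for v in blurred])
```

An isolated spike like this would otherwise register as a strong edge, so smoothing it first leaves mostly the larger structures, such as the document border, for Canny to pick up.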
Edges are visible on the output.
The edges you will be working on are the edges of the document.
Detect the contours in the edge image. Sort them by area in descending order and keep only the five largest. Then loop through the sorted contours, approximate each one with a polygon, and stop at the first polygon that has four sides.
# OpenCV 4.x returns (contours, hierarchy)
cnts, _ = cv2.findContours(edged_img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]

doc = None
for c in cnts:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    # A polygon with four vertices is taken to be the document
    if len(approx) == 4:
        doc = approx
        break

if doc is None:
    raise RuntimeError('No four-sided contour found; try a clearer photo.')
A large contour with four sides is likely to be the document.
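The selection logic can be sketched without OpenCV. In the example below, each polygon is a plain list of vertices standing in for the output of cv2.approxPolyDP, and the shoelace formula stands in for cv2.contourArea. The function names are hypothetical and the shapes are made-up test data.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def largest_quadrilateral(polygons, keep=5):
    """Return the largest four-vertex polygon among the `keep` biggest, or None."""
    candidates = sorted(polygons, key=polygon_area, reverse=True)[:keep]
    for poly in candidates:
        if len(poly) == 4:
            return poly
    return None


triangle = [(0, 0), (10, 0), (5, 8)]
page = [(0, 0), (300, 0), (300, 400), (0, 400)]
pentagon = [(0, 0), (500, 0), (600, 250), (500, 500), (0, 500)]
small_quad = [(20, 20), (60, 20), (60, 50), (20, 50)]

# The pentagon is larger but is skipped because it does not have four vertices
print(largest_quadrilateral([triangle, page, pentagon, small_quad]))
```

Returning None when nothing matches makes the failure case explicit, which is useful when the photo does not show all four corners.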
Circle the corners of the detected document outline. This will help you determine whether your program is able to detect the document in the image.
p = []
for d in doc:
    # Convert to plain Python ints; some OpenCV versions reject NumPy integers here
    tuple_point = tuple(int(v) for v in d[0])
    cv2.circle(img_resize, tuple_point, 3, (0, 0, 255), 4)
    p.append(tuple_point)

cv2.imshow('Circled corner points', img_resize)
cv2.waitKey(0)
The output shows the corners circled on the resized color image.
After detecting the document, you now need to extract it from the image.
Perspective warping is a computer vision technique that corrects distortion by mapping an image onto a different plane, so that you can view it from a different angle.
warped_image = perspective_transform(copy, doc.reshape(4, 2) * ratio)
warped_image = cv2.cvtColor(warped_image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Warped Image", imutils.resize(warped_image, height=650))
cv2.waitKey(0)
To get the warped image, you need to create a simple module that performs the perspective transformation.
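This module will order the corner points of the document. It will also map the document image onto a different plane, changing the camera angle to an overhead view.
Open the transform.py file you created earlier and import the OpenCV and NumPy libraries.
import numpy as np
import cv2
This module will contain two functions. First, create a function that orders the coordinates of the document's corner points. The first coordinate will be the top-left corner, the second the top-right corner, the third the bottom-right corner, and the fourth the bottom-left corner.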
def order_points(pts):
    # Initialize the list of coordinates to be ordered
    rect = np.zeros((4, 2), dtype="float32")

    s = pts.sum(axis=1)
    # The top-left point has the smallest sum
    rect[0] = pts[np.argmin(s)]
    # The bottom-right point has the largest sum
    rect[2] = pts[np.argmax(s)]

    # Compute the difference (y - x) for each point: the top-right point
    # has the smallest difference, the bottom-left the largest
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]

    # Return the ordered coordinates
    return rect
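As a quick check of the ordering, the sketch below repeats order_points so that it runs on its own and feeds it four corners in scrambled order. The coordinates are made-up sample values.

```python
import numpy as np


def order_points(pts):
    """Order four points as top-left, top-right, bottom-right, bottom-left."""
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # smallest x + y: top-left
    rect[2] = pts[np.argmax(s)]  # largest x + y: bottom-right
    diff = np.diff(pts, axis=1)  # y - x for each point
    rect[1] = pts[np.argmin(diff)]  # smallest y - x: top-right
    rect[3] = pts[np.argmax(diff)]  # largest y - x: bottom-left
    return rect


# Corners given as bottom-right, top-left, bottom-left, top-right
corners = np.array([[420, 610], [40, 55], [35, 600], [430, 60]], dtype="float32")
ordered = order_points(corners)
print(ordered)
```

Note that this sum-and-difference heuristic assumes the document is roughly upright in the photo. For a page rotated close to 45 degrees, two corners can have nearly equal sums or differences, and the ordering may come out wrong.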
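Create a second function that computes the corner coordinates of the new image to obtain an overhead view. It then computes the perspective transform matrix and returns the warped image.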
def perspective_transform(image, pts):
    # Order the coordinates and unpack them individually
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # Compute the width of the new image: the larger of the distances
    # between bottom-right and bottom-left, and top-right and top-left
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    # Compute the height of the new image: the larger of the distances
    # between top-right and bottom-right, and top-left and bottom-left
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # Construct the set of destination points to obtain an overhead view
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")

    # Compute the perspective transform matrix
    transform_matrix = cv2.getPerspectiveTransform(rect, dst)

    # Apply the transform matrix
    warped = cv2.warpPerspective(image, transform_matrix, (maxWidth, maxHeight))

    # Return the warped image
    return warped
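cv2.getPerspectiveTransform returns a 3x3 matrix that maps the four source corners onto the four destination corners. As a rough sketch of the underlying math, and not OpenCV's actual implementation, the matrix can be found by solving eight linear equations with NumPy. The function names and the sample coordinates below are hypothetical.

```python
import numpy as np


def solve_homography(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) mapping src points to dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each point pair contributes one equation for u and one for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(
        np.array(rows, dtype=np.float64), np.array(rhs, dtype=np.float64)
    )
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(matrix, point):
    """Map one (x, y) point through the matrix, dividing by the third coordinate."""
    x, y = point
    u, v, w = matrix @ np.array([x, y, 1.0])
    return u / w, v / w


# Ordered document corners (tl, tr, br, bl) and the overhead rectangle they map to
src = [(40, 55), (430, 60), (420, 610), (35, 600)]
dst = [(0, 0), (389, 0), (389, 549), (0, 549)]
H = solve_homography(src, dst)
for s in src:
    print(s, "->", tuple(round(float(c), 3) for c in apply_homography(H, s)))
```

The division by the third coordinate is what distinguishes a perspective transform from a simple affine one; it lets parallel page edges that converge in the photo become parallel again in the output.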
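You have now created the transform module. The import error for perspective_transform will disappear.
Notice that the displayed image now has an overhead view.
In the main.py file, apply Gaussian thresholding to the warped image. This gives the warped image a scanned look. Save the scanned image to the folder that contains the program files.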
T = threshold_local(warped_image, 11, offset=10, method="gaussian")
warped = (warped_image > T).astype("uint8") * 255
cv2.imwrite('./scan.png', warped)
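Local thresholding compares each pixel with a threshold computed from its own neighborhood rather than with one global value, which is what keeps text readable under uneven lighting. The sketch below is a simplified stand-in for threshold_local: it uses a plain mean of the neighborhood instead of Gaussian weighting, and the function name and the tiny test image are hypothetical.

```python
import numpy as np


def local_mean_threshold(gray, block_size=3, offset=10):
    """Binarize an image using (neighborhood mean - offset) as the threshold.

    Simplified illustration: plain mean instead of Gaussian weights.
    """
    pad = block_size // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    out = np.zeros(gray.shape, dtype=np.uint8)
    height, width = gray.shape
    for y in range(height):
        for x in range(width):
            window = padded[y:y + block_size, x:x + block_size]
            threshold = window.mean() - offset
            # Pixels brighter than their local threshold become white
            out[y, x] = 255 if gray[y, x] > threshold else 0
    return out


# A bright 5 x 5 "page" with one dark "ink" pixel in the middle
page = np.full((5, 5), 200, dtype=np.uint8)
page[2, 2] = 50
scan = local_mean_threshold(page)
print(scan)
```

The offset plays the same role as in the main.py call: subtracting it from the local average keeps flat background regions from flickering between black and white.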
Saving the scan in PNG format preserves the document quality.
Display the image of the scanned document:
cv2.imshow("Final Scanned image", imutils.resize(warped, height=650))
cv2.waitKey(0)
cv2.destroyAllWindows()
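The following image shows the program's output, an overhead view of the scanned document.
Creating a document scanner touches on some core areas of computer vision, which is a broad and complex field. To make progress in computer vision, you should work on projects that are both interesting and challenging.
You should also read more about how computer vision is used with current technologies. This keeps you informed and gives you new ideas for the projects you work on.
Original article: https://www.makeuseof.com/python-create-document-scanner/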