Build a Text Extractor App with Python Code Under 30 Lines Using Gradio and Hugging Face
original post: https://baxin.netlify.app/build-text-extractor-python-under-30-lines/
Extracting text from images, known as Optical Character Recognition (OCR), is a valuable feature for applications in document processing, data extraction, and accessibility. In this guide, we will create an OCR app using Python libraries like pytesseract for OCR, Pillow for image processing, and Gradio for building an interactive UI. We’ll deploy this app on Hugging Face Spaces.
Before starting, you’ll need a Hugging Face account and basic familiarity with Docker.
To deploy on Hugging Face Spaces with required system dependencies, such as Tesseract for OCR, we need a Dockerfile that configures the environment.
Create a Dockerfile with the following content:
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.12

ENV PIP_ROOT_USER_ACTION=ignore

# Set the working directory in the container
WORKDIR $HOME/app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    tesseract-ocr \
    libtesseract-dev \
    libgl1-mesa-glx \
    libglib2.0-0

RUN pip install --upgrade pip

# Copy requirements and install dependencies
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the app code
COPY app.py ./

# Expose the port for Gradio
EXPOSE 7860

# Run the application
CMD ["python", "app.py"]
```
Next, create app.py, which holds the OCR function and the Gradio interface:

```python
import gradio as gr
import pytesseract
from PIL import Image
import os


def extract_text(image_path):
    # Gradio passes None when no image has been uploaded
    if not image_path:
        return "No image uploaded. Please upload an image."
    if not os.path.exists(image_path):
        return f"Error: File not found at {image_path}"
    try:
        img = Image.open(image_path)
        text = pytesseract.image_to_string(img)
        return text if text.strip() else "No text detected in the image."
    except Exception as e:
        return f"An error occurred: {str(e)}"


iface = gr.Interface(
    fn=extract_text,
    inputs=gr.Image(type="filepath", label="Upload an image"),
    outputs=gr.Textbox(label="Extracted Text"),
    title="Image Text Extractor",
    description="Upload an image and extract text from it using OCR."
)

# Listen on all interfaces, on the port exposed in the Dockerfile
iface.launch(server_name="0.0.0.0", server_port=7860)
```
Then create requirements.txt, which lists the Python dependencies:

```
gradio
pytesseract
Pillow
```
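The Dockerfile installs the Tesseract binary, while requirements.txt covers the Python packages, and pytesseract needs both to be present. The following is a minimal sketch of a pre-flight check, not part of the original guide. The function name `missing_dependencies` and the `REQUIRED_MODULES` mapping are hypothetical names introduced here for illustration, and the check uses only the standard library.

```python
import importlib.util
import shutil

# Maps each package in requirements.txt to the module name used to import it.
# The Pillow distribution is imported as "PIL".
REQUIRED_MODULES = {"gradio": "gradio", "pytesseract": "pytesseract", "Pillow": "PIL"}


def missing_dependencies(binary="tesseract", modules=REQUIRED_MODULES):
    """Return human-readable names of anything that cannot be found."""
    missing = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(binary) is None:
        missing.append(f"system binary: {binary}")
    for package, module in modules.items():
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(f"Python package: {package}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else "\n".join(problems))
```

Running this inside the container, or locally before pushing, should point to a missing dependency before the first upload fails with a less obvious error.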
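This setup includes three files: the Dockerfile, which installs Tesseract and the Python dependencies; app.py, which holds the OCR function and the Gradio interface; and requirements.txt, which lists the Python packages.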
With all files created, push them to your Hugging Face Space.
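The guide does not say how to push the files. A Space is a git repository, so a regular git push is the usual route. As an alternative that stays in Python, the sketch below uses `HfApi.upload_folder` from the `huggingface_hub` package. It assumes the Space already exists with the Docker SDK selected, that `huggingface_hub` is installed, and that an access token is saved locally. The helper names `missing_files` and `push_to_space` and the repo id shown in the usage note are placeholders introduced here.

```python
from pathlib import Path

# The three files created in this guide.
APP_FILES = ("Dockerfile", "app.py", "requirements.txt")


def missing_files(folder, names=APP_FILES):
    """Return the expected file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in names if not (root / name).is_file()]


def push_to_space(folder, repo_id):
    """Upload the app files to an existing Space.

    `repo_id` is a placeholder such as "your-username/your-space-name".
    """
    absent = missing_files(folder)
    if absent:
        raise FileNotFoundError(f"Missing files: {', '.join(absent)}")

    # Imported here so the file check above works without the package.
    from huggingface_hub import HfApi

    api = HfApi()  # uses the locally saved access token, if any
    api.upload_folder(
        folder_path=str(folder),
        repo_id=repo_id,
        repo_type="space",
        allow_patterns=list(APP_FILES),  # upload only the three app files
    )
```

A call would look like `push_to_space(".", "your-username/your-space-name")`, with the repo id replaced by your own. Checking for the files first means a forgotten requirements.txt is caught locally instead of surfacing later as a failed build.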