Turn Unstructured Emails to Actionable Data

In this tutorial, we’ll build a tool for the logistics industry. It automates the extraction of structured data from PDF attachments in emails (such as requests for quotes or shipping information sheets) so that the data can be used elsewhere in the workflow.

To make things easier to understand, let’s use Nova Logistics as an example—a fictional company specializing in transporting fragile electronics across various cities.

At Nova Logistics, customers email to request quotes for shipping items between cities, and they usually attach a PDF that contains all the necessary shipping details. Currently, the process is manual: someone at Nova has to open each email, download the attached PDF, read through it, and extract key information like the item names and quantities before calculating the shipping cost.

This can take hours, especially when there are multiple emails per day, each with lengthy PDF documents.

In this article, we’ll walk through building a tool to automate this entire process—from fetching the emails and extracting the PDF data to sending the extracted information to Google Sheets.

How It Works

  1. Poll Emails: First, we’ll set up a system that regularly checks the inbox for new emails. When an email is found, we’ll download its PDF attachment and apply a label to the email so it isn’t polled again in the future.
  2. Extract Data with Documind: We’ll pass the PDF as a URL to Documind, an open-source package that uses AI to extract structured data from documents. This will give us information like the item names, quantities, shipping details, weight, and more.
  3. Store and Use the Data: Finally, we’ll send the extracted data to Google Sheets, making it easy to view, track, and use for further calculations.
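
The three steps above can be sketched as a single polling cycle. This is only an outline under assumptions, not the article's code: `runCycle` and the four step functions it receives (`fetchPdfs`, `uploadPdf`, `extractData`, `appendToSheet`) are hypothetical names, passed in as arguments so the flow can be read without any API details.

```javascript
// One polling cycle: fetch new PDFs, upload each, extract data, store it.
// All four step functions are hypothetical and supplied by the caller.
async function runCycle({ fetchPdfs, uploadPdf, extractData, appendToSheet }) {
  const pdfs = await fetchPdfs(); // expected shape: [{ filename, buffer }, ...]
  const results = [];
  for (const pdf of pdfs) {
    try {
      const url = await uploadPdf(pdf);    // URL of the stored PDF
      const data = await extractData(url); // structured data from the PDF
      await appendToSheet(data);
      results.push({ filename: pdf.filename, ok: true });
    } catch (err) {
      // A failure on one PDF should not stop the rest of the batch.
      results.push({ filename: pdf.filename, ok: false, error: err.message });
    }
  }
  return results;
}
```

Calling `runCycle` from a timer such as `setInterval` would turn this into a poller. Catching errors per PDF is a design choice here, so that one malformed attachment does not block the others.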

What We’ll Need

To build this tool, we’ll need the following tools and services:

  • Gmail API: To fetch emails from the inbox.
  • Supabase: To upload and store the PDFs.
  • Documind: To extract structured data from the PDFs.
  • Google Sheets API: To store the extracted data and calculate quotes.
  • Nango: To manage user authentication.

Step 1: Initial setup

Before we start writing the code, we need to set up a few things. Don’t worry; I’ll guide you through each step.

1.1 Install Node.js

We’ll be using Node.js to run our code. If you don’t have Node.js installed, go to the Node.js website and download the latest version.

1.2 Install Required Libraries

Once Node.js is installed, we need to install the packages that will help us interact with Gmail, Google Sheets, Supabase, and Documind.

  1. Open a terminal or command prompt.
  2. Create a new folder for your project by running:

    mkdir nova
    cd nova
    
  3. Initialize the project:

    npm init -y
    
  4. Install the required packages:

    npm install googleapis @supabase/supabase-js documind dotenv @nangohq/node
    

1.3 Get API Credentials

Before we can start writing the code, you need to set up and get all the credentials to use the Google APIs (Gmail and Google Sheets), Supabase and Documind. Here’s a quick guide for each:

Google APIs

  1. Go to the Google Cloud Console.
  2. From the projects list, select a project or create a new one
  3. Enable Gmail API and Google Sheets API for your project:
    • Go to the API Library in the Cloud Console and search for "Gmail API" and "Google Sheets API". Click on each and enable them.
  4. Configure your consent screen:
    • Go to APIs & Services > OAuth consent screen.
    • Give your app a name.
    • Choose “External” as your audience type.
    • Fill out any other required fields.
  5. Create OAuth 2.0 credentials:
    • Go to APIs & Services > Credentials.
    • Click on Create Credentials and choose OAuth Client ID.
    • Choose “Web application” as the application type.
    • Copy your Client ID and Secret.
  6. To easily manage user OAuth across multiple platforms, I use Nango. You can check out their documentation on how to get started:
    • Log in to Nango and click on Configure New Integration.
    • Search for Google Mail in the list of integrations.
    • Add the Client ID and Secret you copied.
    • In the field for scopes, add https://www.googleapis.com/auth/gmail.readonly , https://www.googleapis.com/auth/gmail.modify and https://www.googleapis.com/auth/gmail.labels
    • Copy the callback URL for the integration and save.
    • Go back to Credentials on your Google console and add the callback URL as an authorized redirect URI.

Since we’re also using Google Sheets API, you can simply go through step 6 to create another integration on Nango. Search for the Google Sheets integration and use the same Client ID and Secret you copied. In the space for scopes, add https://www.googleapis.com/auth/spreadsheets

To publish your app, go to the OAuth consent screen in the Google console and click on the Publish button.

Supabase

  1. Sign up for a free account at Supabase.
  2. Create a new project and bucket for storing PDFs.
  3. Get the API URL and API Key from your project settings.

Step 2: Write the code

Now let’s write the code in small steps.

2.1 Add environment variables

Create a .env file to store the variables used throughout the code. Here’s an example:

SUPABASE_API_KEY=<supabase api key>
SUPABASE_URL=<supabase url>
OPENAI_API_KEY=<open ai api key>
NANGO_KEY=<nango secret key>

We’ll walk through how to get and use these variables further in the code.
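
Because a missing variable tends to surface later as a confusing API error, it can help to check the environment once at startup. The sketch below assumes the four variable names used in this article; `loadConfig` is a hypothetical helper, not part of dotenv.

```javascript
// In the real script, call require('dotenv').config() first so that the
// values in .env are copied into process.env.
const REQUIRED = ['SUPABASE_API_KEY', 'SUPABASE_URL', 'OPENAI_API_KEY', 'NANGO_KEY'];

// Returns the required settings, or throws an error naming every missing one.
function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(REQUIRED.map((name) => [name, env[name]]));
}
```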

2.2 Set up Gmail API and fetch emails

We’ll begin by using the Gmail API to fetch emails that don’t have the Processed label and contain attachments.

To retrieve the necessary access token, we’ll use Nango, which will automatically handle token refreshes if they expire, so you won’t need to worry about managing token lifecycles yourself.

All you need are:

  1. The Integration ID from the Gmail setup in Nango.
  2. The Connection ID for the user whose access token is needed.
  3. Your Nango secret key.

You can easily add a new connection directly through the Nango UI using your own Gmail account. Your secret key can be found in the environment settings section of the Nango dashboard.
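
As a rough sketch of the token step, the helper below asks Nango for the current access token. It assumes the `getToken(integrationId, connectionId)` method of `@nangohq/node` returns the token as a string for OAuth 2 connections; `getAccessToken` is a hypothetical name, and the Nango client is passed in so the function can be tried with a stub.

```javascript
// `nango` is expected to be created elsewhere with:
//   const { Nango } = require('@nangohq/node');
//   const nango = new Nango({ secretKey: process.env.NANGO_KEY });
async function getAccessToken(nango, integrationId, connectionId) {
  const token = await nango.getToken(integrationId, connectionId);
  if (typeof token !== 'string' || token.length === 0) {
    throw new Error(`No access token for connection "${connectionId}"`);
  }
  return token;
}

// The token can then back a googleapis client, for example:
//   const { google } = require('googleapis');
//   const auth = new google.auth.OAuth2();
//   auth.setCredentials({ access_token: token });
//   const gmail = google.gmail({ version: 'v1', auth });
```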

For simplicity, we’ll limit the results to just five emails at a time, and we’ll specifically filter to only fetch emails that have PDF attachments. From those, we’ll retrieve just the first attachment for processing. After downloading the attachment, we’ll mark the email as processed by applying a label, ensuring that it won't be fetched again in future polling cycles.
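
The fetch-and-label cycle described above might look like the following sketch. It is written against the googleapis Gmail client (`users.messages.list`, `get`, `attachments.get`, and `modify`), which is passed in as `gmail`. The names `fetchPdfs` and `findFirstPdfPart` are hypothetical, and `processedLabelId` is assumed to be the ID of an existing label named Processed.

```javascript
// Gmail search: PDF attachments only, skipping mail already labelled Processed.
const QUERY = 'has:attachment filename:pdf -label:Processed';

// Depth-first search of a message payload for the first PDF attachment part.
function findFirstPdfPart(part) {
  if (!part) return null;
  const isPdf = part.filename && part.filename.toLowerCase().endsWith('.pdf');
  if (isPdf && part.body && part.body.attachmentId) return part;
  for (const child of part.parts || []) {
    const found = findFirstPdfPart(child);
    if (found) return found;
  }
  return null;
}

// Downloads the first PDF of up to five unprocessed emails, then labels each email.
async function fetchPdfs(gmail, processedLabelId) {
  const list = await gmail.users.messages.list({ userId: 'me', q: QUERY, maxResults: 5 });
  const pdfs = [];
  for (const { id } of list.data.messages || []) {
    const msg = await gmail.users.messages.get({ userId: 'me', id });
    const part = findFirstPdfPart(msg.data.payload);
    if (part) {
      const att = await gmail.users.messages.attachments.get({
        userId: 'me', messageId: id, id: part.body.attachmentId,
      });
      // Attachment bytes arrive base64url-encoded.
      const buffer = Buffer.from(att.data.data, 'base64url');
      pdfs.push({ messageId: id, filename: part.filename, buffer });
    }
    // Label the message (even if no PDF part was found) so the next poll skips it.
    await gmail.users.messages.modify({
      userId: 'me', id, requestBody: { addLabelIds: [processedLabelId] },
    });
  }
  return pdfs;
}
```

Note that `modify` takes a label ID rather than a label name, while the search query uses the name. The `'base64url'` encoding requires a reasonably recent Node.js release.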

2.3 Upload to Supabase

Next, we need to upload the downloaded PDFs to Supabase. Make sure you replace the bucket name in the code with yours.
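
A minimal sketch of the upload step, using the storage methods of `@supabase/supabase-js` (`upload` and `getPublicUrl`). The function name `uploadPdf` and the bucket name `pdfs` are placeholders, and the bucket is assumed to be public so that the returned URL can be fetched without a signed link.

```javascript
// `supabase` is expected to be created elsewhere with:
//   const { createClient } = require('@supabase/supabase-js');
//   const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_API_KEY);
async function uploadPdf(supabase, { filename, buffer }, bucket = 'pdfs') {
  // Prefix with a timestamp so two emails with the same filename don't collide.
  const path = `${Date.now()}-${filename}`;
  const { error } = await supabase.storage
    .from(bucket)
    .upload(path, buffer, { contentType: 'application/pdf' });
  if (error) throw new Error(`Upload failed: ${error.message}`);

  // getPublicUrl only yields a reachable link when the bucket is public.
  const { data } = supabase.storage.from(bucket).getPublicUrl(path);
  return data.publicUrl;
}
```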

2.4 Extract data using Documind

Once the PDF is stored in Supabase, we’ll use Documind to extract the relevant data. Since Documind relies on OpenAI for processing, make sure your OpenAI API key is added to the .env file.

Documind works with schemas that you define to extract the structured data you need. We’ll go over schema definition shortly, but feel free to check the documentation for more details.
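
The extraction call can be wrapped in a small helper like the one below. This is a sketch under assumptions: `extract` is taken to be the `extract` function exported by the documind package, called with a `file` URL and a `schema`, and the result is assumed to carry the extracted fields on a `data` property. Check both against the Documind documentation. The wrapper name `extractData` is hypothetical.

```javascript
// `extract` is passed in rather than imported here, so the wrapper can be
// exercised with a stub. In the real script it would come from 'documind'.
async function extractData(extract, pdfUrl, schema) {
  const result = await extract({ file: pdfUrl, schema });
  // Assumption: the extracted fields live under result.data.
  if (!result || !result.data) {
    throw new Error(`No data extracted from ${pdfUrl}`);
  }
  return result.data;
}
```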

By this point, the .env file should define SUPABASE_API_KEY, SUPABASE_URL, OPENAI_API_KEY, and NANGO_KEY.

2.5 Send the extracted data to Google Sheets

After extracting the data from the PDF, we’ll send it to Google Sheets.

Before proceeding, ensure that your Google Sheet is set up and that you’ve created a connection with your account through Nango. If you haven’t already, here’s a template you can use to get started.
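
Writing to the sheet could be sketched as follows, using `spreadsheets.values.append` from the googleapis Sheets client (passed in as `sheets`). The field names (`customerName`, `origin`, `destination`, `items`, `weightKg`), the range `Sheet1!A:F`, and the helper names are all illustrative assumptions. The real column layout depends on your own sheet and schema.

```javascript
// Flatten one extracted quote request into one spreadsheet row per item.
function toRows(data) {
  return (data.items || []).map((item) => [
    data.customerName ?? '', data.origin ?? '', data.destination ?? '',
    item.name ?? '', item.quantity ?? '', item.weightKg ?? '',
  ]);
}

// `sheets` is expected to be google.sheets({ version: 'v4', auth }).
// Returns the number of rows appended.
async function appendToSheet(sheets, spreadsheetId, data, range = 'Sheet1!A:F') {
  const values = toRows(data);
  if (values.length === 0) return 0; // nothing to write
  await sheets.spreadsheets.values.append({
    spreadsheetId,
    range,
    valueInputOption: 'USER_ENTERED', // let Sheets parse numbers as numbers
    requestBody: { values },
  });
  return values.length;
}
```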

Step 3: Putting everything together

Now that we’ve written the individual functions, we need to bring everything together.

In this step, we’ll define the schema that Documind will use to extract the required data. This schema will guide the AI in identifying and structuring the relevant information from the PDFs.
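
To make the idea of a schema concrete, here is an illustrative one for a shipping quote request. It assumes Documind's schema format is an array of fields, each with a `name`, `type`, and `description`, with nested fields listed under `children`. The specific field names are hypothetical, and `checkSchema` is a small helper written for this example, not part of Documind.

```javascript
// Hypothetical schema for a shipping quote request.
const quoteSchema = [
  { name: 'customerName', type: 'string', description: 'Name of the customer requesting the quote' },
  { name: 'origin', type: 'string', description: 'City the shipment departs from' },
  { name: 'destination', type: 'string', description: 'City the shipment is delivered to' },
  {
    name: 'items', type: 'array', description: 'Items to be shipped',
    children: [
      { name: 'name', type: 'string', description: 'Item name' },
      { name: 'quantity', type: 'number', description: 'Number of units' },
      { name: 'weightKg', type: 'number', description: 'Weight per unit in kilograms' },
    ],
  },
];

// Returns a list of problems found in a schema; an empty list means it looks well formed.
function checkSchema(fields, path = '') {
  const problems = [];
  for (const field of fields) {
    const where = path + (field.name || '<unnamed>');
    for (const key of ['name', 'type', 'description']) {
      if (typeof field[key] !== 'string' || field[key] === '') {
        problems.push(`${where}: missing ${key}`);
      }
    }
    if (field.type === 'array' || field.type === 'object') {
      if (!Array.isArray(field.children) || field.children.length === 0) {
        problems.push(`${where}: ${field.type} needs children`);
      } else {
        problems.push(...checkSchema(field.children, `${where}.`));
      }
    }
  }
  return problems;
}
```

Specific descriptions matter here, since they are the main guidance the model gets about what each field means.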

Test the Code

The full source code is available on GitHub, along with a sample PDF for testing. However, you’re welcome to create and use your own documents as well. Simply clone the repository, modify the code to fit your requirements, and try it out for your own use case.
