Files
nusquama 1c441ec3a4 creation
2025-11-13 14:37:27 +01:00

34 KiB
Raw Blame History

Extract Invoice Data from Email to Google Sheets using GPT-4o AI Automation

https://n8nworkflows.xyz/workflows/extract-invoice-data-from-email-to-google-sheets-using-gpt-4o-ai-automation-4376

Extract Invoice Data from Email to Google Sheets using GPT-4o AI Automation

1. Workflow Overview

This workflow automates the extraction of invoice data from email attachments and organizes it into Google Sheets using AI-powered processing. It is designed to support accountants, finance teams, and small businesses by eliminating manual invoice data entry, streamlining invoice processing, and enhancing financial record-keeping.

The workflow consists of the following logical blocks:

  • 1.1 Email Monitoring & Attachment Filtering: Watches a specific Gmail label for new emails with attachments and filters those emails to ensure they contain invoice files.
  • 1.2 Invoice Data Extraction: Extracts text data from PDF invoice attachments.
  • 1.3 AI-Powered Invoice Analysis: Uses OpenAI GPT-4o-based AI to parse extracted text, identify invoice elements, and convert them into structured JSON data.
  • 1.4 Spreadsheet Creation & Organization: Creates a new Google Sheet per invoice batch, moves it to a designated Drive folder, and appends the parsed invoice data into the spreadsheet.

2. Block-by-Block Analysis

2.1 Email Monitoring & Attachment Filtering

  • Overview:
    This block continuously monitors Gmail inbox emails labeled for invoice processing, downloads attachments, and filters incoming emails to ensure they contain attachments before further processing.

  • Nodes Involved:

    • Gmail Trigger
    • Attachment Verification
  • Node Details:

    • Gmail Trigger

      • Type: Trigger node for Gmail
      • Configuration:
        • Triggers every minute
        • Watches emails with label ID corresponding to the custom invoice label ("Label_1393502052621954450")
        • Downloads attachments automatically
      • Inputs: None (trigger node)
      • Outputs: Email data including attachments (binary)
      • Potential Failures: Authentication errors (OAuth token expiration), Gmail API rate limits, label misconfiguration
      • Version: 1.2
    • Attachment Verification

      • Type: Filter node
      • Configuration:
        • Checks if the email item contains any binary data (attachments)
        • Condition: binary data must exist (strict object existence check)
      • Inputs: Gmail Trigger output
      • Outputs: Only emails containing attachments pass through
      • Edge Cases: Emails without attachments filtered out; failure if expression evaluation fails due to missing binary property
      • Version: 2.2

2.2 Invoice Data Extraction

  • Overview:
    Extracts text content from PDF attachments, preparing raw invoice data for AI analysis.

  • Nodes Involved:

    • Extract Invoice data
  • Node Details:

    • Extract Invoice data
      • Type: File extraction node (PDF operation)
      • Configuration:
        • Extracts text content from binary files of the incoming email attachment
        • Uses dynamic binary property name from the attachment verification node
      • Inputs: Filtered email with attachments (binary PDF files)
      • Outputs: Extracted text content from PDFs
      • Edge Cases: Failure when attachment is not a valid PDF, corrupted files, or scanned images without embedded text
      • Version: 1

2.3 AI-Powered Invoice Analysis

  • Overview:
    Processes extracted text through an AI agent that identifies key invoice fields and outputs structured JSON data describing invoice details.

  • Nodes Involved:

    • OpenAI Chat Model
    • Invoice AI Agent
  • Node Details:

    • OpenAI Chat Model

      • Type: Language model node (OpenAI GPT-4o-mini)
      • Configuration:
        • Uses GPT-4o-mini model
        • Outputs formatted as JSON objects
      • Credentials: OpenAI API key configured
      • Inputs: Text extracted from PDFs (connected via the Invoice AI Agent)
      • Outputs: AI-generated JSON with invoice data
      • Edge Cases: API key limits, rate limits, response format errors, partial or ambiguous invoice text leading to inaccurate extraction
      • Version: 1.2
    • Invoice AI Agent

      • Type: Langchain AI agent node
      • Configuration:
        • Receives raw text from extracted PDFs
        • Uses a system prompt defining the role as a financial advisor extracting invoice data
        • Outputs JSON containing predefined invoice keys (e.g., billed_to, invoice_number, items, tax details)
        • Output parser enabled for structured data extraction
      • Inputs: Extracted PDF text
      • Outputs: Structured invoice JSON data
      • Edge Cases: Misinterpretation of text, missing invoice fields, unexpected invoice formats
      • Version: 1.9

2.4 Spreadsheet Creation & Organization

  • Overview:
    Creates a timestamped Google Sheet for each invoice batch, moves it to a specified Google Drive folder, and appends the structured invoice data into the sheet.

  • Nodes Involved:

    • Create blank spreadsheet
    • Move spreadsheet in invoice folder
    • Preparing Final data
    • Final Spreadsheet with Invoice data
  • Node Details:

    • Create blank spreadsheet

      • Type: Google Sheets node
      • Configuration:
        • Creates a new spreadsheet titled with prefix "invoice_" plus current timestamp
        • Creates a sheet named "invoice_details"
      • Inputs: Triggered after AI agent outputs structured data
      • Outputs: Spreadsheet metadata including spreadsheet ID
      • Edge Cases: Google Sheets API quota limitations, failures in sheet creation, invalid naming conventions
      • Version: 4.5
    • Move spreadsheet in invoice folder

      • Type: Google Drive node
      • Configuration:
        • Moves the newly created spreadsheet to a specific Drive folder (ID: "1JIdajTJvK6gj4bRjniBJvHEJbQ1pn3AM", named "invoices")
        • Operates on the file ID received from the previous node
      • Inputs: Spreadsheet metadata from creation node
      • Outputs: Confirmation of file move
      • Edge Cases: Drive permission errors, invalid folder ID, API rate limits
      • Version: 3
    • Preparing Final data

      • Type: Code node (JavaScript)
      • Configuration:
        • Parses JSON output from the AI Agent node to convert it into a usable JavaScript object for spreadsheet insertion
        • Processes the first items JSON output string into an object
      • Inputs: Output of the AI Agent after file move
      • Outputs: Parsed JSON object for spreadsheet
      • Edge Cases: Malformed JSON causing parse errors, empty or null AI output
      • Version: 2
    • Final Spreadsheet with Invoice data

      • Type: Google Sheets node
      • Configuration:
        • Appends rows to the "invoice_details" sheet of the created spreadsheet
        • Uses auto-mapping for 25+ invoice-related fields such as billed_to, invoice_number, line items, tax details, payment info, and company contacts
        • Does not convert fields to strings by default, preserving original data types
      • Inputs: Parsed JSON from code node
      • Outputs: Confirmation of data append
      • Edge Cases: Mismatch between JSON keys and sheet columns, API limits, concurrent writes
      • Version: 4.5

3. Summary Table

Node Name Node Type Functional Role Input Node(s) Output Node(s) Sticky Note
Gmail Trigger Gmail Trigger Email monitoring and attachment download trigger None Attachment Verification AI Invoice Processor Agent: Email to Structured Data
Automatically extract, analyze, and organize invoice data from Gmail attachments into structured Google Sheets. Perfect for: Accountants, small businesses, finance teams.
Uses Gmail API, PDF extraction, OpenAI GPT-4, Google Sheets, Google Drive.
Requires Gmail OAuth2, OpenAI API Key, Google Sheets OAuth2, Google Drive OAuth2.
Gmail label setup required.
Workflow steps overview and pro tips included.
Customizable for approval workflows, OCR, duplicate detection, and payment reminders.
Attachment Verification Filter Filters emails to ensure attachments exist Gmail Trigger Extract Invoice data See above sticky note (covers entire workflow context).
Extract Invoice data Extract from File Extracts text from PDF attachments Attachment Verification Invoice AI Agent See above sticky note (covers entire workflow context).
Invoice AI Agent Langchain AI Agent Parses extracted text to structured invoice JSON Extract Invoice data Create blank spreadsheet See above sticky note (covers entire workflow context).
Create blank spreadsheet Google Sheets Creates new spreadsheet for invoice data Invoice AI Agent Move spreadsheet in invoice folder See above sticky note (covers entire workflow context).
Move spreadsheet in invoice folder Google Drive Moves new sheet to designated folder Create blank spreadsheet Preparing Final data See above sticky note (covers entire workflow context).
Preparing Final data Code Parses AI JSON output for sheet writing Move spreadsheet in invoice folder Final Spreadsheet with Invoice data See above sticky note (covers entire workflow context).
Final Spreadsheet with Invoice data Google Sheets Appends invoice data into Google Sheet Preparing Final data None See above sticky note (covers entire workflow context).
Sticky Note Sticky Note Documentation and workflow explanation None None AI Invoice Processor Agent: Email to Structured Data
Automatically extract, analyze, and organize invoice data from Gmail attachments into structured Google Sheets. Perfect for:
Accountants & bookkeepers — eliminate manual data entry from invoices
Small businesses — streamline invoice processing and tracking
Finance teams — automate accounts payable workflows
⚙️ What's Used: Gmail Trigger, File Extraction, AI Agent, OpenAI Chat Model, Google Sheets, Google Drive
📧 Gmail Setup Required: Label emails for invoice processing
📊 Extracted Data Fields: 25+ invoice elements including header, line items, tax, company info, payment info, addresses
💡 Pro Tips: Clear labeling, text-based PDFs, folder organization, AI prompt refinement, email filters
🛠️ Customize: Approval, accounting system integration, OCR, duplicate detection, payment reminders

4. Reproducing the Workflow from Scratch

  1. Create Gmail Trigger node:

    • Type: Gmail Trigger
    • Configure to poll every minute
    • Set filter to monitor emails with the Gmail label ID for invoices (create a dedicated Gmail label, e.g., "Invoice Processing")
    • Enable attachment download
    • Set OAuth2 credentials for Gmail
  2. Add Filter node (Attachment Verification):

    • Type: Filter
    • Condition: Check if binary property exists in incoming email item (ensures presence of attachments)
    • Connect Gmail Trigger output to this filter node
  3. Add Extract From File node (Extract Invoice data):

    • Type: Extract from File
    • Operation: PDF text extraction
    • Binary property name: dynamically set to the binary key(s) of the attachment from the previous node
    • Connect output of Attachment Verification node
  4. Add Langchain AI Agent node (Invoice AI Agent):

    • Type: Langchain AI Agent
    • Input: Text from Extract Invoice data node
    • Configure prompt with system message defining role as financial advisor extracting invoice JSON data
    • Use structured output parser enabled
    • Connect Extract Invoice data output
    • Configure OpenAI API credentials (GPT-4o-mini or GPT-4o model)
  5. Add Google Sheets node (Create blank spreadsheet):

    • Type: Google Sheets
    • Operation: Create Spreadsheet
    • Title: invoice_{{ $now }} (dynamic timestamp)
    • Create a sheet named "invoice_details"
    • Connect output of Invoice AI Agent node
    • Set Google Sheets OAuth2 credentials
  6. Add Google Drive node (Move spreadsheet in invoice folder):

    • Type: Google Drive
    • Operation: Move file
    • File ID: use spreadsheet ID from Create blank spreadsheet node output
    • Target folder ID: set to your designated invoice folder in Google Drive (e.g., folder ID "1JIdajTJvK6gj4bRjniBJvHEJbQ1pn3AM")
    • Connect Create blank spreadsheet output
    • Set Google Drive OAuth2 credentials
  7. Add Code node (Preparing Final data):

    • Type: Code (JavaScript)
    • Set mode: run once for each item
    • Code: parse JSON string output from Invoice AI Agent node into a JavaScript object for spreadsheet insertion. Example:
      let invoice_details = $('Invoice AI Agent').item.json.output;
      return JSON.parse(invoice_details);
      
    • Connect output of Move spreadsheet in invoice folder node
  8. Add Google Sheets node (Final Spreadsheet with Invoice data):

    • Type: Google Sheets
    • Operation: Append
    • Document ID: use spreadsheet ID from Create blank spreadsheet node
    • Sheet name: "invoice_details"
    • Columns: Map at least 25+ invoice-related fields (billed_to, invoice_number, item_0_description, tax fields, company info, payment info, addresses, etc.) with auto-mapping enabled
    • Connect output of Preparing Final data node
    • Use Google Sheets OAuth2 credentials
  9. Optional: Add Sticky Note node:

    • Add comprehensive documentation and instructions for users and maintainers

5. General Notes & Resources

Note Content Context or Link
Create and apply a dedicated Gmail label (e.g., "Invoice Processing") to all incoming invoice emails to enable workflow triggers. Gmail label configuration
Ensure invoice PDFs are text-based (not scanned images) to allow effective PDF text extraction. For scanned images, consider integrating OCR preprocessing before AI extraction. PDF extraction best practices
Google Drive folder for invoices must have appropriate permissions for workflow OAuth2 credentials. Create separate folders per accounting period if needed. Google Drive folder management
AI prompt can be refined to improve accuracy for specific invoice formats or languages. Test with sample invoices and adjust the prompt or parsing logic if necessary. AI prompt tuning
This workflow uses OpenAI GPT-4o-mini model; newer or more capable GPT-4o models can be used if available for better accuracy. OpenAI models documentation
Consider adding error handling nodes or notification mechanisms for failed steps or unexpected inputs to improve reliability. Workflow robustness
For scalability, add deduplication logic to avoid processing the same invoice multiple times. Duplicate invoice detection
Extend workflow to integrate with accounting software (e.g., QuickBooks, Xero, SAP) or payment reminder systems for end-to-end automation. System integration

Disclaimer: The provided description and nodes derive exclusively from an n8n automated workflow. The content complies strictly with applicable content policies and includes only legal, public data.