26 KiB
Automate Invoice Processing with AI Data Extraction, Google Drive & Sheets
Automate Invoice Processing with AI Data Extraction, Google Drive & Sheets
1. Workflow Overview
Purpose:
This n8n workflow automates the entire process of handling incoming PDF invoices stored in a designated Google Drive folder. It leverages AI-powered data extraction to parse critical invoice details, organizes invoices into a structured yearly/monthly folder hierarchy on Google Drive, renames files consistently, and logs all extracted data in a Google Sheets spreadsheet for centralized documentation.
Target Use Cases:
- Businesses managing numerous PDF invoices requiring automated data extraction
- Accounting teams seeking automated filing and documentation of invoices
- Workflows requiring integration of Google Drive, Google Sheets, and AI-based text extraction
Logical Blocks:
- 1.1 Monitoring & Input Reception: Detect new invoice PDFs in a Google Drive folder and queue them for processing
- 1.2 Text Extraction from PDF: Download PDF files and convert their contents into extractable text
- 1.3 AI-Based Information Extraction: Use AI (LangChain + OpenAI GPT-4) to extract structured invoice data from raw text
- 1.4 Folder Management: Check for and create year and month folders in Google Drive as needed
- 1.5 File Handling: Move the invoice to the correct folder and rename it based on extracted data
- 1.6 Documentation: Append extracted invoice data into a Google Sheets spreadsheet for record-keeping
- 1.7 Initialization for Folder Creation: (Separate sub-flow) Prepares lists of years and months and iterates to ensure folders exist
2. Block-by-Block Analysis
2.1 Monitoring & Input Reception
Overview:
This block triggers the workflow when a new file is created in a specified Google Drive folder. It then lists the files in the folder and prepares them for batch processing.
Nodes Involved:
- Google Drive Trigger
- Search files and folders
- Loop (splitInBatches)
Node Details:
-
Google Drive Trigger
- Type: Trigger node for Google Drive
- Configuration: Watches a specific folder (
Rechnungsablage) for new files created, polling every minute - Input: Event-based trigger (no input)
- Output: New file metadata
- Potential Failures: API limits, folder permission errors
-
Search files and folders
- Type: Google Drive node, file/folder search
- Configuration: Filters files inside the same folder as trigger
- Input: Trigger output
- Output: List of file/folder metadata for processing
- Edge Cases: Empty folder, API errors
-
Loop (splitInBatches)
- Type: SplitInBatches node to process files individually or in small batches
- Configuration: Default batch size (1) to handle each file sequentially
- Input: List of files from previous node
- Output: Single file per iteration
- Notes: Ensures workflow scales with multiple invoices
2.2 Text Extraction from PDF
Overview:
Downloads the PDF file from Google Drive and extracts its text content for AI processing.
Nodes Involved:
- GetFile
- ExtractFromPDF
Node Details:
-
GetFile
- Type: Google Drive node (download operation)
- Configuration: Uses file ID from batch item to download the PDF file
- Input: Batch item with file metadata
- Output: File binary data for extraction
- Edge Cases: File not found, permission denied, download failure
-
ExtractFromPDF
- Type: Extract from File node (PDF operation)
- Configuration: Extracts text content from the downloaded PDF file
- Input: Binary PDF from GetFile
- Output: JSON with extracted text under
textfield - Edge Cases: Corrupted PDF, scanned PDFs without OCR, extraction failure
2.3 AI-Based Information Extraction
Overview:
Leverages AI to parse and extract structured invoice details from the raw text extracted from PDFs.
Nodes Involved:
- Information Extractor
Node Details:
- Information Extractor
- Type: LangChain Information Extractor node integrated with OpenAI GPT-4
- Configuration:
- System prompt instructs AI to extract specific invoice fields with German formatting conventions (comma as decimal separator, date format YYYY-MM-DD)
- Extracts attributes: Company, Customer, Invoice Number, Invoice Date, Net Amount, VAT, Month (written), Year, Article count, Customer number
- Input: Text extracted from PDF
- Output: JSON object with extracted attributes under
outputkey - Key Expressions: Uses AI to interpret text; fallback behavior skips missing attributes
- Edge Cases: AI misunderstanding, incomplete extraction, incorrect formats
- Sub-workflow: Uses OpenAI Chat Model node internally (configured but not explicitly chained in connections)
2.4 Folder Management (Year and Month Folders)
Overview:
Checks for the existence of year and month folders in the designated Google Drive folder structure, creating them if they do not exist.
Nodes Involved:
- GetYearFolder
- GetMonthFolder
- Jahresordner schon vorhanden? (Year folder exists?)
- Ordner noch nicht vorhanden (Folder not present)
- Jahresordner erstellen (Create year folder)
- Monate als Liste (Months as list)
- Split Out / Split Out1
- Loop Over Items
- Monatsordner einfügen (Add month folder)
Node Details:
-
GetYearFolder
- Type: Google Drive node (search folder)
- Configuration: Queries for a folder named by the extracted year under main “Buchhaltung / Rechnungen” folder
- Input: Extracted year from AI output
- Output: Folder metadata if found
- Edge Cases: Folder not found, API errors
-
GetMonthFolder
- Type: Google Drive node (search folder)
- Configuration: Queries for a folder named by the extracted month under the year folder
- Input: Folder ID from GetYearFolder and month string from AI output
- Output: Month folder metadata if found
- Edge Cases: Folder missing, API errors
-
Jahresordner schon vorhanden?
- Type: Google Drive node with query string to check if year folder exists
- Input: Year from batch iteration
- Output: List of folders found
- Edge Cases: Empty results indicate folder absence
-
Ordner noch nicht vorhanden (If node)
- Type: If node
- Configuration: Checks if folder list is empty (i.e., folder does not exist)
- Output: Branches to create folders or skip
-
Jahresordner erstellen
- Type: Google Drive node (folder creation)
- Configuration: Creates year folder inside main folder if missing
- Input: Year string
- Output: Folder metadata of created folder
-
Monate als Liste
- Type: Set node
- Configuration: Creates an array of month names (German) for folder creation
- Output: Array of months
-
Split Out / Split Out1
- Type: SplitOut nodes
- Configuration: Splits arrays of years and months for iterative processing
-
Loop Over Items
- Type: SplitInBatches
- Configuration: Processes each year or month individually to create folders
-
Monatsordner einfügen
- Type: Google Drive node (folder creation)
- Configuration: Creates month folders inside corresponding year folder
- Input: Month name and parent year folder ID
Notes:
- This block runs in a sub-flow style, iterating over predefined years and months to ensure folder structure is in place and avoids duplication by checking existence first.
- Sticky Notes describe this as a safety measure to prevent duplicate folder creation.
2.5 File Handling (Move & Rename)
Overview:
Moves the processed invoice PDF into the correct year/month folder and renames the file according to a naming convention based on extracted data.
Nodes Involved:
- MoveFile
- UpdateFileName
Node Details:
-
MoveFile
- Type: Google Drive node (move file operation)
- Configuration: Moves the file (using original file ID) to the month folder resolved earlier
- Input: File ID from GetFile node, destination folder ID from GetMonthFolder
- Output: Updated file metadata with new location
- Edge Cases: Permission errors, file locked, folder missing
-
UpdateFileName
- Type: Google Drive node (update file metadata)
- Configuration: Renames file using template:
"{{ Customer }} \n{{ Month }} {{ Year }}" - Input: File ID from MoveFile, extracted attributes Customer, Month, Year
- Output: Updated file metadata with new name
- Edge Cases: Naming conflicts, invalid characters, update failure
2.6 Documentation
Overview:
Appends a new row to a Google Sheets spreadsheet with all extracted invoice data for centralized record keeping.
Nodes Involved:
- AddToOverview
Node Details:
- AddToOverview
- Type: Google Sheets node (append operation)
- Configuration: Maps extracted invoice attributes to predefined columns in a Google Sheets spreadsheet, including:
- Year, VAT, Customer, Month, Net Amount, Customer Number, Invoice Date, Invoice Number, Company Location
- Input: Extracted JSON output from Information Extractor
- Output: Confirmation of row append
- Edge Cases: Sheet not found, permission denied, schema mismatch
2.7 Initialization for Folder Creation
Overview:
Sets up the list of years and months to be created as folders and triggers the folder creation flow.
Nodes Involved:
- Jahre auswählen (Set Years)
- Monate als Liste (Set Months)
- Split Out (years)
- Split Out1 (months)
- Loop Over Items
Node Details:
-
Jahre auswählen
- Type: Set node
- Configuration: Hardcoded array of years
[2024, 2025, 2026] - Output: Array for folder creation iteration
-
Monate als Liste
- Already described above
-
Split Out / Split Out1 and Loop Over Items
- Used to sequentially process each year and each month for folder creation as described in 2.4
3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| Google Drive Trigger | Google Drive Trigger | Trigger on new file in folder | - | Search files and folders | |
| Search files and folders | Google Drive | List files in input folder | Google Drive Trigger | Loop | |
| Loop | SplitInBatches | Process files in batches | Search files and folders | GetFile | |
| GetFile | Google Drive | Download invoice PDF file | Loop | ExtractFromPDF | |
| ExtractFromPDF | ExtractFromFile (PDF) | Extract text from PDF | GetFile | Information Extractor | |
| Information Extractor | LangChain AI Extractor | Extract structured invoice data | ExtractFromPDF | GetYearFolder | |
| GetYearFolder | Google Drive | Search year folder | Information Extractor | GetMonthFolder | |
| GetMonthFolder | Google Drive | Search month folder | GetYearFolder | MoveFile | |
| MoveFile | Google Drive | Move invoice to correct folder | GetMonthFolder | UpdateFileName | |
| UpdateFileName | Google Drive | Rename invoice file | MoveFile | AddToOverview | |
| AddToOverview | Google Sheets | Append invoice data to sheet | UpdateFileName | Loop | |
| Jahre auswählen | Set | Define years list | - | Split Out | |
| Split Out | SplitOut | Split years array for iteration | Jahre auswählen | Loop Over Items | |
| Loop Over Items | SplitInBatches | Iterate over years/months | Split Out / Monatsordner einfügen | Jahresordner schon vorhanden? | |
| Jahresordner schon vorhanden? | Google Drive | Check if year folder exists | Loop Over Items | Ordner noch nicht vorhanden | |
| Ordner noch nicht vorhanden | If | Branch based on folder existence | Jahresordner schon vorhanden? | Jahresordner erstellen / Loop Over Items | |
| Jahresordner erstellen | Google Drive | Create year folder | Ordner noch nicht vorhanden | Monate als Liste | |
| Monate als Liste | Set | Define months list | Jahresordner erstellen | Split Out1 | |
| Split Out1 | SplitOut | Split months array for iteration | Monate als Liste | Monatsordner einfügen | |
| Monatsordner einfügen | Google Drive | Create month folder | Split Out1 | Loop Over Items | |
| Sticky Note | StickyNote | Workflow description & intro | - | - | 🧾 Automated Invoice Processing - n8n Workflow (Detailed description included) |
| Sticky Note1 | StickyNote | Title: Create folders for invoices | - | - | # Ordner für Rechnungen erstellen |
| Sticky Note2 | StickyNote | Text extraction stage title | - | - | ## Text aus PDF extrahieren |
| Sticky Note3 | StickyNote | AI extraction stage title | - | - | ## Informationsextrahierung mit KI |
| Sticky Note4 | StickyNote | Moving and renaming invoices | - | - | ## Rechnung in entsprechenden Ordner verschieben und umbennenen |
| Sticky Note5 | StickyNote | Documentation title | - | - | ## Dokumentieren |
| Sticky Note6 | StickyNote | Incoming invoice processing | - | - | # Eingehende Rechnungen verarbeiten |
| Sticky Note7 | StickyNote | Folder creation explanation | - | - | ## Jahres / Monatsordner erstellen (Folder creation safety and logic explanation) |
4. Reproducing the Workflow from Scratch
-
Create Google Drive Trigger node:
- Type: Google Drive Trigger
- Configure to watch the folder “Rechnungsablage” (folder ID:
1Zc6vLxnVqTX1oZbn5lj54bRntQaDOnqy) - Set event to "fileCreated"
- Poll every minute
-
Add Google Drive node "Search files and folders":
- Operation: List files/folders inside folder
Rechnungsablage - Connect trigger output to this node
- Operation: List files/folders inside folder
-
Add SplitInBatches node "Loop":
- No special config (default batch size 1)
- Connect output of search node to this node
-
Add Google Drive node "GetFile":
- Operation: Download file
- Use expression to get fileId from current batch item (
{{$json.id}}) - Connect Loop output to this node
-
Add ExtractFromFile node "ExtractFromPDF":
- Operation: PDF text extraction
- Connect GetFile output to this node
-
Add LangChain Information Extractor node "Information Extractor":
- Input: Use extracted text from previous node (
{{$json.text}}) - Configure system prompt to instruct AI to extract invoice data with German formatting (comma decimal separator, YYYY-MM-DD)
- Define attributes to extract: Company, Customer, Invoice Number, Invoice Date, Net Amount, VAT, Month (written), Year, Article count, Customer Number
- Connect ExtractFromPDF output to this node
- Input: Use extracted text from previous node (
-
Add Google Drive node "GetYearFolder":
- Operation: Search folder
- Folder ID: Main accounting folder
"Buchhaltung / Rechnungen"(folder ID:15m2uc0MJyvzv-QHvzXYu4Yfp4C2SG-ou) - Query string: Use extracted
Yearattribute from AI output - Connect Information Extractor output to this node
-
Add Google Drive node "GetMonthFolder":
- Operation: Search folder
- Folder ID: Use ID from GetYearFolder output
- Query string: Use extracted
Monthattribute from AI output - Connect GetYearFolder output to this node
-
Add Google Drive node "MoveFile":
- Operation: Move file
- File ID: Use from GetFile output
- Destination folder ID: Use from GetMonthFolder output
- Connect GetMonthFolder output to this node
-
Add Google Drive node "UpdateFileName":
- Operation: Update file metadata (rename)
- File ID: Use from MoveFile output
- New file name: Compose string from extracted Customer, Month, and Year (e.g.
{{Customer}} {{Month}} {{Year}}) - Connect MoveFile output to this node
-
Add Google Sheets node "AddToOverview":
- Operation: Append row
- Document ID: Your Google Sheets invoice overview spreadsheet
- Sheet Name: First sheet (gid=0)
- Map columns to extracted attributes: Year, VAT, Customer, Month, Net Amount, Customer Number, Invoice Date, Invoice Number, Company Location
- Connect UpdateFileName output to this node
-
Add Set node "Jahre auswählen":
- Define array of years to create folders for, e.g.,
[2024, 2025, 2026]
- Define array of years to create folders for, e.g.,
-
Add SplitOut node "Split Out":
- Split field:
jahrfor iteration
- Split field:
-
Add SplitInBatches node "Loop Over Items":
- To iterate over each year
-
Add Google Drive node "Jahresordner schon vorhanden?":
- Search for year folder inside main accounting folder
- Query string: current year
-
Add If node "Ordner noch nicht vorhanden":
- Condition: Check if search result is empty (folder absent)
-
Add Google Drive node "Jahresordner erstellen":
- Create folder with current year name inside main accounting folder
-
Add Set node "Monate als Liste":
- Define array of months in German (e.g.,
["Januar", "Februar", ... "Dezember"])
- Define array of months in German (e.g.,
-
Add SplitOut node "Split Out1":
- Split field:
monatefor iteration
- Split field:
-
Add Google Drive node "Monatsordner einfügen":
- Create month folder inside the newly created year folder
-
Connect nodes as per logic:
- Jahresordner erstellen → Monate als Liste → Split Out1 → Monatsordner einfügen → Loop Over Items (to iterate months)
- Loop Over Items → Jahresordner schon vorhanden? → Ordner noch nicht vorhanden → Jahresordner erstellen
- Handle "folder exists" branch to skip creation
-
Connect main invoice processing flow to folder management:
- Information Extractor → GetYearFolder → GetMonthFolder → MoveFile → UpdateFileName → AddToOverview → Loop
-
Credentials:
- Google Drive (OAuth2) with full access to input and output folders
- Google Sheets (OAuth2) with write access to the spreadsheet
- OpenAI API credentials configured for LangChain Information Extractor and Chat Model
-
Add Sticky Notes:
- Add notes describing workflow sections as per original for clarity
5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
| Workflow automates invoice processing including AI extraction, file organization, and documentation in Sheets | Overview sticky note within workflow |
| Folder structure must be established before running workflow (Main accounting folder with year/month subfolders) | See Sticky Note1 and Sticky Note7 |
| AI extraction uses German formatting conventions (comma decimal, date format YYYY-MM-DD) | Configuration in Information Extractor node |
| Workflow requires Google Drive and Sheets API credentials with proper permissions | Setup instructions in Sticky Note |
| OpenAI GPT-4 is used for reliable data extraction | Node: OpenAI Chat Model (LangChain integration) |
| Workflow includes error considerations: missing folders, incomplete PDFs, API limits | Described in Sticky Note |
| For further customization, add attributes in Information Extractor or modify filename templates | Refer to Sticky Note for customization tips |
| Google Sheet columns must match mapped fields exactly | See AddToOverview node configuration |
| Testing approach: upload test PDFs to input folder and monitor workflow execution and outputs | Suggested in workflow description sticky note |
| Official n8n documentation and Google API docs recommended for troubleshooting | External resources not linked but recommended for users |
Disclaimer: The provided text is derived exclusively from an automated workflow created with n8n, respecting all current content policies. It contains no illegal or protected data, only legal and public information.