Mask PII in documents for GDPR-safe AI processing with Postgres and Claude
1. Workflow Overview
This workflow implements a GDPR-oriented document-processing pipeline for PDF uploads. It extracts text from an uploaded document, detects personally identifiable information (PII), replaces detected values with tokens, stores originals in a Postgres vault, sends only masked text to an AI model, and optionally restores approved values afterward. It also writes compliance events to an audit log.
The intended use cases include:
- Safe AI summarization or extraction from documents containing personal data
- Privacy-preserving preprocessing before sending data to Claude
- Token-based de-identification with controlled re-injection
- Auditability for compliance programs
1.1 Document Intake and Baseline Configuration
The workflow starts from a webhook that accepts uploaded files. A Set node then adds runtime configuration such as documentId, confidence threshold, and Postgres table names.
1.2 OCR Text Extraction
The uploaded PDF is passed to an OCR/text extraction step that produces text used by all downstream PII detectors.
1.3 PII Detection Layer
Multiple parallel detection branches scan the extracted text:
- Regex/code-based email detection
- Regex/code-based phone detection
- Regex/code-based ID number detection
- AI-based physical address detection using a LangChain agent with Ollama
1.4 Detection Merge and Consolidation
All detection outputs are merged, then a code node attempts to resolve overlaps, remove duplicates, and produce a consolidated PII map.
1.5 Tokenization and Vault Storage
Detected PII values are converted into placeholder tokens, and original values are prepared for insertion into a Postgres vault table.
1.6 Masked Text Generation and Safety Gate
The workflow builds a masked version of the OCR text and checks whether masking succeeded. If masking fails, AI processing is blocked and an alert is sent.
1.7 AI Processing on Masked Data
If masking is successful, the masked text is sent to a Claude-based AI agent for structured document analysis. A structured output parser constrains the response format.
1.8 Controlled Re-Injection
The workflow analyzes the AI output for tokens, decides whether certain fields may be restored, queries the vault for original values, and attempts to restore approved values only.
1.9 Audit Logging
The final restoration output is written to a Postgres audit table for traceability.
2. Block-by-Block Analysis
2.1 Document Intake and Baseline Configuration
Overview
This block receives the uploaded PDF through an HTTP webhook and initializes workflow-level settings used later for vault and audit storage. It also creates a timestamp-based document identifier.
Nodes Involved
- Document Upload Webhook
- Workflow Configuration
Node Details
Document Upload Webhook
- Type and technical role: `n8n-nodes-base.webhook`; entry point for inbound HTTP document uploads.
- Configuration choices:
  - HTTP method: POST
  - Path: `gdpr-document-upload`
  - Response mode: `lastNode`
  - Raw body enabled
- Key expressions or variables used: none in configuration
- Input and output connections:
  - Input: none
  - Output: Workflow Configuration
- Version-specific requirements: type version 2.1
- Edge cases or potential failure types:
  - Invalid upload format
  - Missing binary file payload
  - If OCR expects binary and the webhook payload is malformed, downstream extraction fails
  - Response may hang if downstream nodes fail without error handling
- Sub-workflow reference: none
Workflow Configuration
- Type and technical role: `n8n-nodes-base.set`; enriches the incoming item with workflow variables.
- Configuration choices:
  - Adds: `documentId = {{$now.toISO()}}`, `confidenceThreshold = 0.8`, `vaultTable = "pii_vault"`, `auditTable = "pii_audit_log"`
  - Includes other incoming fields
- Key expressions or variables used: `$now.toISO()`
- Input and output connections:
  - Input: Document Upload Webhook
  - Output: OCR Extract Text
- Version-specific requirements: type version 3.4
- Edge cases or potential failure types:
  - A timestamp-based `documentId` may not be unique enough under high concurrency compared to UUIDs
  - If later nodes expect different field names, data mismatches can occur
- Sub-workflow reference: none
2.2 OCR Text Extraction
Overview
This block extracts text from the uploaded PDF and preserves source data. The extracted text becomes the canonical input for all PII detection branches.
Nodes Involved
- OCR Extract Text
Node Details
OCR Extract Text
- Type and technical role: `n8n-nodes-base.extractFromFile`; PDF text extraction/OCR-style extraction from the uploaded file content.
- Configuration choices:
  - Operation: `pdf`
  - Keep source: `both`
- Key expressions or variables used: none
- Input and output connections:
  - Input: Workflow Configuration
  - Outputs to: Email Detector, Phone Detector, ID Number Detector, Address Detector AI
- Version-specific requirements: type version 1.1
- Edge cases or potential failure types:
  - Fails if no binary PDF is present
  - OCR/text extraction quality may be poor for scanned or image-heavy PDFs
  - Large PDFs may increase runtime or memory use
  - Extracted text field naming consistency matters because code nodes read `text` or `extractedText`
- Sub-workflow reference: none
2.3 PII Detection Layer
Overview
This block runs multiple detectors in parallel on the OCR output. Three are regex-based code nodes, and one is an AI-powered address detector using Ollama plus a structured output parser.
Nodes Involved
- Email Detector
- Phone Detector
- ID Number Detector
- Address Detector AI
- Address Output Parser
- Ollama Chat Model
Node Details
Email Detector
- Type and technical role: `n8n-nodes-base.code`; regex-based email extraction from OCR text.
- Configuration choices:
  - Uses a global email regex
  - Emits objects with: `value`, `type = "email"`, `start_pos`, `end_pos`, `confidence = 1.0`
  - Returns the payload under `detections`
- Key expressions or variables used: `item.json.text`
- Input and output connections:
  - Input: OCR Extract Text
  - Output: Merge PII Detections
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - Can miss unusual but valid email formats
  - Can detect OCR-corrupted strings as emails
  - Uses `start_pos`/`end_pos`, but downstream consolidation expects `start`/`end`
- Sub-workflow reference: none
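A minimal sketch of what this Code node does, emitting the field names listed above (the exact regex is an assumption; the workflow's pattern is not shown):

```javascript
// Regex-based email detection over OCR text, emitting the same shape
// this node uses: value, type, start_pos, end_pos, confidence.
function detectEmails(text) {
  const emailRegex = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  const detections = [];
  let match;
  while ((match = emailRegex.exec(text)) !== null) {
    detections.push({
      value: match[0],
      type: "email",
      start_pos: match.index,
      end_pos: match.index + match[0].length,
      confidence: 1.0,
    });
  }
  return detections;
}
```

In the n8n Code node the result would be returned as `[{ json: { detections } }]` so the merge node receives it as a single item.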
Phone Detector
- Type and technical role: `n8n-nodes-base.code`; regex-based phone detection covering several common formats.
- Configuration choices:
  - Uses multiple regex patterns for international and domestic formats
  - Deduplicates values with a `Set`
  - Outputs under `detected_pii`, not `detections`
  - Includes `start_pos` and `end_pos`
- Key expressions or variables used: `item.json.text`
- Input and output connections:
  - Input: OCR Extract Text
  - Output: Merge PII Detections
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - Output schema is inconsistent with the other detectors
  - Because the downstream merger/consolidator reads `detections`, phone matches may be ignored
  - OCR spacing/noise may break the patterns
  - Possible false positives on numeric strings
- Sub-workflow reference: none
ID Number Detector
- Type and technical role: `n8n-nodes-base.code`; regex-based detection of SSN, PAN, driver's license, bank account numbers, and IBAN.
- Configuration choices:
  - Reads `extractedText` or `text`
  - Emits an array under `detections`
  - Uses subtype values such as `ssn`, `pan`, `drivers_license`, `bank_account`, `iban`
- Key expressions or variables used: `item.json.extractedText || item.json.text || ''`
- Input and output connections:
  - Input: OCR Extract Text
  - Output: Merge PII Detections
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - High false-positive risk for generic long numbers
  - Validation is minimal
  - Contains an odd SSN exclusion check against `'+1234567890'`, likely unintended
  - Uses `start_pos`/`end_pos`, but downstream consolidation expects `start`/`end`
- Sub-workflow reference: none
Address Detector AI
- Type and technical role: `@n8n/n8n-nodes-langchain.agent`; LLM agent used to detect physical addresses in text.
- Configuration choices:
  - Input text: `={{ $json.text }}`
  - Prompt type: defined directly
  - Uses a system message instructing the model to detect physical addresses and return positions plus confidence
  - Structured output parser enabled
- Key expressions or variables used: `$json.text`
- Input and output connections:
  - Main input: OCR Extract Text
  - AI language model input: Ollama Chat Model
  - AI output parser input: Address Output Parser
  - Main output: Merge PII Detections
- Version-specific requirements: type version 3; requires a LangChain-compatible n8n installation
- Edge cases or potential failure types:
  - The LLM may hallucinate addresses or positions
  - Output schema may not match what downstream consolidation expects
  - If `text` is empty, output quality collapses
  - Depends on a reachable Ollama instance and model availability
- Sub-workflow reference: none
Address Output Parser
- Type and technical role: `@n8n/n8n-nodes-langchain.outputParserStructured`; enforces a JSON schema for address detections.
- Configuration choices:
  - Manual JSON schema with root property `addresses`
  - Each address object includes: `value`, `type` (enum `address`), `start_pos`, `end_pos`, `confidence`
- Key expressions or variables used: none
- Input and output connections:
  - Output parser connection into Address Detector AI
- Version-specific requirements: type version 1.3
- Edge cases or potential failure types:
  - If the model output cannot be parsed, the agent node fails
  - The schema root uses `addresses`, not `detections`, causing a downstream mismatch unless normalized
- Sub-workflow reference: none
Ollama Chat Model
- Type and technical role: `@n8n/n8n-nodes-langchain.lmChatOllama`; local LLM backend for address detection.
- Configuration choices:
  - No explicit model shown in the parameters
- Key expressions or variables used: none
- Input and output connections:
  - Output to Address Detector AI as language model
- Version-specific requirements: type version 1; requires a running Ollama service accessible to n8n
- Edge cases or potential failure types:
  - Missing Ollama endpoint/model
  - Latency or model startup delays
  - Local model quality may be inconsistent
- Sub-workflow reference: none
2.4 Detection Merge and Consolidation
Overview
This block gathers outputs from the parallel detectors and tries to build a single conflict-resolved PII list. It is meant to remove duplicates and overlapping detections before tokenization.
Nodes Involved
- Merge PII Detections
- PII Consolidation & Conflict Resolver
Node Details
Merge PII Detections
- Type and technical role: `n8n-nodes-base.merge`; combines multiple detector outputs.
- Configuration choices:
  - Mode: `combine`
  - Combine by: `combineAll`
- Key expressions or variables used: none
- Input and output connections:
  - Inputs from: Email Detector, Phone Detector, ID Number Detector, Address Detector AI
  - Output: PII Consolidation & Conflict Resolver
- Version-specific requirements: type version 3.2
- Edge cases or potential failure types:
  - Multi-input semantics depend on n8n merge behavior; misalignment can produce an unexpected payload structure
  - Since the detector outputs are structurally inconsistent, the combined result may not be usable as intended
- Sub-workflow reference: none
PII Consolidation & Conflict Resolver
- Type and technical role: `n8n-nodes-base.code`; intended to normalize all detections, resolve overlaps, deduplicate, and emit a final PII map.
- Configuration choices:
  - Reads all incoming items
  - Collects only `item.json.detections`
  - Sorts on `a.start - b.start`
  - Resolves overlaps by confidence, then span length
  - Deduplicates by `type|value|start|end`
  - Creates `piiMap` with ids like `pii_1`
- Key expressions or variables used: `$input.all()`, `$input.first().json.extractedText || $input.first().json.text || ''`
- Input and output connections:
  - Input: Merge PII Detections
  - Output: Tokenization & Vault Storage
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - Major schema mismatch:
    - detectors emit `start_pos`/`end_pos`, not `start`/`end`
    - the phone detector emits `detected_pii`, not `detections`
    - the address parser root is `addresses`, not `detections`
  - As written, the overlap logic may fail or produce an incorrect sort order
  - `originalText` may not survive the merge reliably
- Sub-workflow reference: none
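One way to close the schema gap described above is to normalize every detector payload before sorting. A sketch of that assumed fix (the field names mirror the document; the normalization step is not the workflow's current code):

```javascript
// Normalize heterogeneous detector outputs (detections, detected_pii,
// addresses) into one list with start/end fields, then resolve
// overlaps by confidence and span length, and deduplicate.
function consolidate(items) {
  const all = [];
  for (const item of items) {
    const raw =
      item.json.detections ||
      item.json.detected_pii ||
      item.json.addresses ||
      (item.json.output || {}).addresses ||
      [];
    for (const d of raw) {
      all.push({
        value: d.value,
        type: d.type,
        start: d.start ?? d.start_pos,
        end: d.end ?? d.end_pos,
        confidence: d.confidence ?? 1.0,
      });
    }
  }
  all.sort((a, b) => a.start - b.start);
  const kept = [];
  for (const d of all) {
    const prev = kept[kept.length - 1];
    if (!prev || d.start >= prev.end) {
      kept.push(d);
      continue;
    }
    // Overlap: prefer higher confidence, then the longer span.
    const better =
      d.confidence > prev.confidence ||
      (d.confidence === prev.confidence &&
        d.end - d.start > prev.end - prev.start);
    if (better) kept[kept.length - 1] = d;
  }
  // Deduplicate by type|value|start|end.
  const seen = new Set();
  return kept.filter((d) => {
    const key = `${d.type}|${d.value}|${d.start}|${d.end}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

With this normalization in place, the sort on `a.start - b.start` and the `type|value|start|end` deduplication described above work on consistent fields regardless of which detector produced the match.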
2.5 Tokenization and Vault Storage
Overview
This block generates token placeholders for detected PII values and prepares the records to be stored in a Postgres vault. It is the core privacy-preserving step before AI access.
Nodes Involved
- Tokenization & Vault Storage
- Store Tokens in Vault
Node Details
Tokenization & Vault Storage
- Type and technical role: `n8n-nodes-base.code`; generates random token strings and vault records, and also builds a masked text copy.
- Configuration choices:
  - Expects `item.json.consolidatedPII`
  - Generates tokens like `<<TYPE_AB12>>`
  - Builds: `vaultRecords`, `tokenMap`, `maskedText`, `documentId`
- Key expressions or variables used: `item.json.consolidatedPII || []`, `item.json.originalText || ''`
- Input and output connections:
  - Input: PII Consolidation & Conflict Resolver
  - Output: Store Tokens in Vault
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - Critical field mismatch: the previous node outputs `piiMap`, not `consolidatedPII`
  - As configured, it likely generates zero tokens unless corrected
  - The random 4-hex token suffix has collision risk at larger volumes
  - Replacing by raw value can unintentionally replace repeated occurrences outside the intended positions
- Sub-workflow reference: none
Store Tokens in Vault
- Type and technical role: `n8n-nodes-base.postgres`; inserts token/original-value pairs into the Postgres vault table.
- Configuration choices:
  - Schema: `public`
  - Table name from `Workflow Configuration.vaultTable`
  - Column mapping: `type = {{$json.type}}`, `token = "YOUR_CREDENTIAL_HERE"` (placeholder/static value), `created_at`, `document_id`, `original_value`
- Key expressions or variables used:
  - `={{ $('Workflow Configuration').first().json.vaultTable }}`
  - Several `{{$json...}}` mappings
- Input and output connections:
  - Input: Tokenization & Vault Storage
  - Output: Generate Masked Text
- Version-specific requirements: type version 2.6; requires Postgres credentials
- Edge cases or potential failure types:
  - Appears misconfigured:
    - the token column is hardcoded to `"YOUR_CREDENTIAL_HERE"`
    - likely not iterating correctly over `vaultRecords`
  - If the table schema differs from the mapping, inserts fail
  - Missing insert/operation specifics may depend on default behavior
- Sub-workflow reference: none
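Because the Postgres node inserts one row per incoming item, a small Code node between tokenization and the insert could fan `vaultRecords` out into individual items. A sketch of that assumed fix:

```javascript
// Fan a single item carrying vaultRecords out into one n8n item per
// record, so the Postgres insert node writes one row per token.
function fanOutVaultRecords(item) {
  const records = item.json.vaultRecords || [];
  return records.map((record) => ({ json: record }));
}
```

Mapping the token column to `{{$json.token}}` instead of the current static placeholder would then insert the real token values.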
2.6 Masked Text Generation and Safety Gate
Overview
This block attempts to create the final masked text used for AI processing and verifies whether masking was successful. If not, the workflow stops AI exposure and sends an alert.
Nodes Involved
- Generate Masked Text
- Masking Success Check
- Block AI Processing
- Send Alert Notification
Node Details
Generate Masked Text
- Type and technical role: `n8n-nodes-base.code`; replaces original PII values in the OCR text using vault records.
- Configuration choices:
  - Reads OCR text directly from OCR Extract Text
  - Reads tokenized records from Store Tokens in Vault
  - Replaces each `original_value` with its `token`
  - Returns: `masked_text`, `original_text`, `token_count`, `masking_success`, `replacements`
- Key expressions or variables used: `$('OCR Extract Text').first().json`, `$('Store Tokens in Vault').all()`
- Input and output connections:
  - Input: Store Tokens in Vault
  - Output: Masking Success Check
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - If the vault inserts do not return `token` and `original_value`, masking fails
  - If the Postgres node output differs from the expected insert-return format, no replacements occur
  - Raw string replacement can replace unintended duplicate values
- Sub-workflow reference: none
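The replacement logic amounts to the following (the escaping helper is an assumed guard against regex metacharacters in PII values; the workflow's exact code is not shown):

```javascript
// Replace every occurrence of each original PII value with its token,
// then check that no original value is still present in the result.
function maskText(text, vaultRecords) {
  let masked = text;
  let replacements = 0;
  for (const { original_value, token } of vaultRecords) {
    // Escape regex metacharacters so raw values are matched literally.
    const escaped = original_value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    masked = masked.replace(new RegExp(escaped, "g"), () => {
      replacements += 1;
      return token;
    });
  }
  const masking_success = vaultRecords.every(
    (r) => !masked.includes(r.original_value)
  );
  return { masked_text: masked, replacements, masking_success };
}
```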
Masking Success Check
- Type and technical role: `n8n-nodes-base.if`; safety gate before AI processing.
- Configuration choices:
  - The condition checks whether `$('Generate Masked Text').item.json.masking_success` equals `true`
- Key expressions or variables used: `={{ $('Generate Masked Text').item.json.masking_success }}`
- Input and output connections:
  - Input: Generate Masked Text
  - True output: AI Processing (Masked Data)
  - False output: Block AI Processing
- Version-specific requirements: type version 2.3
- Edge cases or potential failure types:
  - If the expression fails or the item shape is unexpected, routing may misbehave
  - If zero PII is found, the current logic may treat masking as a failure depending on earlier output
- Sub-workflow reference: none
Block AI Processing
- Type and technical role: `n8n-nodes-base.set`; creates a blocked-status payload when masking is unsafe.
- Configuration choices:
  - Sets: `error = "Masking failed - AI processing blocked"`, `status = "BLOCKED"`, `requires_manual_review = true`
- Key expressions or variables used: none
- Input and output connections:
  - Input: Masking Success Check (false branch)
  - Output: Send Alert Notification
- Version-specific requirements: type version 3.4
- Edge cases or potential failure types:
  - Does not preserve original fields unless the default include behavior does so; the downstream alert expects fields that are not guaranteed to exist
- Sub-workflow reference: none
Send Alert Notification
- Type and technical role: `n8n-nodes-base.httpRequest`; sends a failure notification to an external endpoint.
- Configuration choices:
  - Method: POST
  - URL placeholder must be replaced
  - Sends a body with: `error_details = {{$json.error_details}}`, `document_id = {{$json.document_id}}`, `timestamp = {{$now.toISO()}}`
- Key expressions or variables used: `$json.error_details`, `$json.document_id`, `$now.toISO()`
- Input and output connections:
  - Input: Block AI Processing
  - Output: none
- Version-specific requirements: type version 4.3
- Edge cases or potential failure types:
  - The placeholder URL causes immediate failure until configured
  - `error_details` and `document_id` are not set by the previous node as currently written
  - External webhook/network/auth errors are possible
- Sub-workflow reference: none
2.7 AI Processing on Masked Data
Overview
This block sends the masked document to Claude for structured analysis. A structured parser constrains the result format so downstream logic can inspect it programmatically.
Nodes Involved
- AI Processing (Masked Data)
- AI Processing Model
- AI Output Parser
Node Details
AI Processing (Masked Data)
- Type and technical role: `@n8n/n8n-nodes-langchain.agent`; AI agent that processes only masked text.
- Configuration choices:
  - Input text: `={{ $json.masked_text }}`
  - System message instructs the model to preserve tokens like `<<EMAIL_7F3A>>`
  - Structured output parser enabled
- Key expressions or variables used: `$json.masked_text`
- Input and output connections:
  - Main input: Masking Success Check (true branch)
  - AI language model input: AI Processing Model
  - AI output parser input: AI Output Parser
  - Main output: Re-Injection Controller
- Version-specific requirements: type version 3; requires LangChain nodes support
- Edge cases or potential failure types:
  - The model may alter token formatting despite the prompt
  - If the masked text is empty, output utility drops
  - Anthropic credential or model access errors are possible
- Sub-workflow reference: none
AI Processing Model
- Type and technical role: `@n8n/n8n-nodes-langchain.lmChatAnthropic`; Claude chat model backend.
- Configuration choices:
  - Model: `claude-sonnet-4-5-20250929`
- Key expressions or variables used: none
- Input and output connections:
  - Output to AI Processing (Masked Data) as language model
- Version-specific requirements: type version 1.3; requires Anthropic API credentials; model availability depends on current n8n/Anthropic support
- Edge cases or potential failure types:
  - The model name may not exist in all environments
  - Rate limits, quota errors, auth issues
- Sub-workflow reference: none
AI Output Parser
- Type and technical role: `@n8n/n8n-nodes-langchain.outputParserStructured`; validates structured AI output.
- Configuration choices:
  - Manual schema with required fields `documentType` and `summary`, and optional `keyEntities`, `dates`, `amounts`, `processedData`
- Key expressions or variables used: none
- Input and output connections:
  - Output parser connection into AI Processing (Masked Data)
- Version-specific requirements: type version 1.3
- Edge cases or potential failure types:
  - If the model response is not valid against the schema, the agent step fails
  - Numeric coercion in `amounts.amount` may fail if the model returns strings
- Sub-workflow reference: none
2.8 Controlled Re-Injection
Overview
This block identifies whether masked tokens in the AI output should be restored, queries the vault, and attempts to replace authorized tokens with original PII values.
Nodes Involved
- Re-Injection Controller
- Retrieve Original Values
- Restore Original PII
Node Details
Re-Injection Controller
- Type and technical role: `n8n-nodes-base.code`; analyzes the AI output, extracts token references, and prepares re-injection metadata.
- Configuration choices:
  - Reads `aiOutput || json`
  - Looks for field permissions marked `restore` or `unmask`
  - Recursively scans objects for the token pattern `TOKEN_([A-Z_]+)_([a-f0-9-]+)`
- Key expressions or variables used: `$execution.id`
- Input and output connections:
  - Input: AI Processing (Masked Data)
  - Output: Retrieve Original Values
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - The token pattern does not match the earlier token format `<<TYPE_HASH>>`
  - `fieldPermissions` is never set upstream
  - Likely finds zero tokens even when the AI preserved the placeholders
- Sub-workflow reference: none
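A recursive scan that actually matches the `<<TYPE_XXXX>>` format produced earlier could look like this (an assumed correction; the workflow currently searches for a different `TOKEN_…` pattern):

```javascript
// Recursively walk the AI output and collect every <<TYPE_XXXX>>
// token, so re-injection can query the vault for each one.
function findTokens(value, found = new Set()) {
  if (typeof value === "string") {
    for (const m of value.matchAll(/<<[A-Z_]+_[0-9A-F]{4}>>/g)) {
      found.add(m[0]);
    }
  } else if (value && typeof value === "object") {
    for (const v of Object.values(value)) findTokens(v, found);
  }
  return [...found];
}
```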
Retrieve Original Values
- Type and technical role: `n8n-nodes-base.postgres`; selects original values from the vault by token.
- Configuration choices:
  - Operation: `select`
  - Return all: `true`
  - Table from `Workflow Configuration.vaultTable`
  - Where clause: column `token`, value `={{ $('Re-Injection Controller').item.json.token }}`
- Key expressions or variables used:
  - `={{ $('Workflow Configuration').first().json.vaultTable }}`
  - `={{ $('Re-Injection Controller').item.json.token }}`
- Input and output connections:
  - Input: Re-Injection Controller
  - Output: Restore Original PII
- Version-specific requirements: type version 2.6; requires Postgres credentials
- Edge cases or potential failure types:
  - Re-Injection Controller does not output a top-level `token` field
  - The query likely returns nothing
  - If multiple tokens should be fetched, the current configuration does not iterate them cleanly
- Sub-workflow reference: none
Restore Original PII
- Type and technical role: `n8n-nodes-base.code`; replaces tokens in approved fields with original values from the vault data.
- Configuration choices:
  - Reads the first input item as the AI output
  - Builds a token map from Retrieve Original Values
  - Replaces tokens matching the regex `\[TOKEN_[A-Z0-9]+\]`
  - Restores only if `allowed_for_reinjection` is truthy
- Key expressions or variables used: `$input.first().json`, `$('Retrieve Original Values').all()`
- Input and output connections:
  - Input: Retrieve Original Values
  - Output: Store Audit Log
- Version-specific requirements: type version 2
- Edge cases or potential failure types:
  - The token regex does not match the token format used earlier
  - Reads `pii_type` and `allowed_for_reinjection`, which are not inserted during vault storage
  - Likely performs no actual restoration
- Sub-workflow reference: none
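Restoration that lines up with the `<<TYPE_XXXX>>` format could be sketched as follows (an assumed correction; the `allowed` flag stands in for the `allowed_for_reinjection` column, which the vault insert would also need to populate):

```javascript
// Replace approved <<TYPE_XXXX>> tokens in a string with their vault
// originals, leaving unapproved tokens masked.
function restore(text, vaultRows) {
  const byToken = new Map(vaultRows.map((r) => [r.token, r]));
  return text.replace(/<<[A-Z_]+_[0-9A-F]{4}>>/g, (token) => {
    const row = byToken.get(token);
    return row && row.allowed ? row.original_value : token;
  });
}
```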
2.9 Audit Logging
Overview
This block writes compliance-related metadata to Postgres. It is intended to track masking and re-injection events.
Nodes Involved
- Store Audit Log
Node Details
Store Audit Log
- Type and technical role: `n8n-nodes-base.postgres`; inserts audit information into Postgres.
- Configuration choices:
  - Table from `Workflow Configuration.auditTable`
  - Schema: `public`
  - Mapped columns: `actor = "system"`, `timestamp = {{$json.timestamp}}`, `document_id = {{$json.document_id}}`, `token_count = {{$json.token_count}}`, `pii_types_detected = {{$json.pii_types_detected}}`, `ai_access_confirmed = true`, `re_injection_events = {{$json.re_injection_events}}`
- Key expressions or variables used: `={{ $('Workflow Configuration').first().json.auditTable }}`
- Input and output connections:
  - Input: Restore Original PII
  - Output: none
- Version-specific requirements: type version 2.6; requires Postgres credentials
- Edge cases or potential failure types:
  - The upstream node does not currently provide several of the mapped fields
  - The insert may fail if columns do not exist or types differ
  - Audit record quality remains incomplete unless the data mapping is corrected
- Sub-workflow reference: none
3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| Document Upload Webhook | n8n-nodes-base.webhook | HTTP entry point for document upload | Workflow Configuration | ## Document Upload A webhook receives uploaded documents. This entry point triggers the workflow and passes the file to the OCR step for text extraction. |
|
| Workflow Configuration | n8n-nodes-base.set | Adds runtime config values and table names | Document Upload Webhook | OCR Extract Text | ## GDPR-Safe AI Document Processing This workflow processes uploaded documents while protecting sensitive personal data. When a PDF is uploaded, OCR extracts the text and multiple detectors identify Personally Identifiable Information (PII) such as emails, phone numbers, ID numbers, and addresses. Detected PII is consolidated and replaced with secure tokens while the original values are stored in a Postgres vault. The AI model only processes the masked version of the document, ensuring sensitive information is never exposed. If required, a controlled re-injection mechanism can restore original values from the vault. All masking, AI access, and restoration events are recorded in an audit log. Setup Configure Postgres credentials. Create pii_vault and pii_audit_log tables. Connect an AI model. Send documents to the webhook. |
| OCR Extract Text | n8n-nodes-base.extractFromFile | Extracts text from uploaded PDF | Workflow Configuration | Email Detector, Phone Detector, ID Number Detector, Address Detector AI | ## OCR Text Extraction Extracts text from uploaded PDF files. |
| Email Detector | n8n-nodes-base.code | Detects email addresses via regex | OCR Extract Text | Merge PII Detections | ## PII Detection Layer Multiple detectors scan the document to identify sensitive information such as emails, phone numbers, ID numbers, and physical addresses. |
| Phone Detector | n8n-nodes-base.code | Detects phone numbers via regex | OCR Extract Text | Merge PII Detections | ## PII Detection Layer Multiple detectors scan the document to identify sensitive information such as emails, phone numbers, ID numbers, and physical addresses. |
| ID Number Detector | n8n-nodes-base.code | Detects ID-like numeric/alphanumeric patterns | OCR Extract Text | Merge PII Detections | ## PII Detection Layer Multiple detectors scan the document to identify sensitive information such as emails, phone numbers, ID numbers, and physical addresses. |
| Address Detector AI | @n8n/n8n-nodes-langchain.agent | Uses AI to detect physical addresses | OCR Extract Text, Ollama Chat Model, Address Output Parser | Merge PII Detections | ## Address Detection (AI) local ollama An AI model analyzes the OCR text to detect physical addresses that are harder to capture with regex patterns. |
| Address Output Parser | @n8n/n8n-nodes-langchain.outputParserStructured | Enforces structured address output schema | Address Detector AI | ## Address Detection (AI) local ollama An AI model analyzes the OCR text to detect physical addresses that are harder to capture with regex patterns. |
|
| Ollama Chat Model | @n8n/n8n-nodes-langchain.lmChatOllama | Local LLM backend for address detection | Address Detector AI | ## Address Detection (AI) local ollama An AI model analyzes the OCR text to detect physical addresses that are harder to capture with regex patterns. |
|
| Merge PII Detections | n8n-nodes-base.merge | Combines detector outputs | Email Detector, Phone Detector, ID Number Detector, Address Detector AI | PII Consolidation & Conflict Resolver | ## Merge Detection Results All detection outputs are merged into a single dataset. |
| PII Consolidation & Conflict Resolver | n8n-nodes-base.code | Resolves overlaps and deduplicates detections | Merge PII Detections | Tokenization & Vault Storage | ## IResolve Overlapping Detections Overlapping or duplicate PII detections are resolved. |
| Tokenization & Vault Storage | n8n-nodes-base.code | Generates placeholder tokens and vault records | PII Consolidation & Conflict Resolver | Store Tokens in Vault | ##Tokenization & Vault Storage Each detected PII value is replaced with a secure token such as: <<EMAIL_AB12>> The original values are stored securely in a Postgres vault table. |
| Store Tokens in Vault | n8n-nodes-base.postgres | Inserts token records into Postgres vault | Tokenization & Vault Storage | Generate Masked Text | ##Tokenization & Vault Storage Each detected PII value is replaced with a secure token such as: <<EMAIL_AB12>> The original values are stored securely in a Postgres vault table. |
| Generate Masked Text | n8n-nodes-base.code | Replaces original PII in text with tokens | Store Tokens in Vault | Masking Success Check | ##Tokenization & Vault Storage Each detected PII value is replaced with a secure token such as: <<EMAIL_AB12>> The original values are stored securely in a Postgres vault table. |
| Masking Success Check | n8n-nodes-base.if | Prevents AI processing if masking failed | Generate Masked Text | AI Processing (Masked Data), Block AI Processing | ## IMasking Safety Check Before AI processing, the workflow verifies that masking was successful. If masking fails, AI processing is blocked to prevent accidental exposure of sensitive information. |
| Block AI Processing | n8n-nodes-base.set | Produces blocked-status payload | Masking Success Check | Send Alert Notification | ## IMasking Safety Check Before AI processing, the workflow verifies that masking was successful. If masking fails, AI processing is blocked to prevent accidental exposure of sensitive information. |
| Send Alert Notification | n8n-nodes-base.httpRequest | Sends external alert on masking failure | Block AI Processing | ## IMasking Safety Check Before AI processing, the workflow verifies that masking was successful. If masking fails, AI processing is blocked to prevent accidental exposure of sensitive information. |
|
| AI Processing (Masked Data) | @n8n/n8n-nodes-langchain.agent | Runs masked document analysis via LLM | Masking Success Check, AI Processing Model, AI Output Parser | Re-Injection Controller | ## AI Processing (Masked Data) The masked document is sent to an AI model for analysis. Since sensitive data is replaced with tokens, the AI can safely summarize or extract structured information. |
| AI Processing Model | @n8n/n8n-nodes-langchain.lmChatAnthropic | Claude model backend | AI Processing (Masked Data) | | ## AI Processing (Masked Data) The masked document is sent to an AI model for analysis. Since sensitive data is replaced with tokens, the AI can safely summarize or extract structured information. |
| AI Output Parser | @n8n/n8n-nodes-langchain.outputParserStructured | Enforces structured AI extraction schema | AI Processing (Masked Data) | | ## AI Processing (Masked Data) The masked document is sent to an AI model for analysis. Since sensitive data is replaced with tokens, the AI can safely summarize or extract structured information. |
| Re-Injection Controller | n8n-nodes-base.code | Determines whether tokens should be restored | AI Processing (Masked Data) | Retrieve Original Values | ## PII Re-Injection Controller Analyzes AI output to determine whether specific tokens should be replaced with original values. Restoration follows defined permissions to control where sensitive data can appear. |
| Retrieve Original Values | n8n-nodes-base.postgres | Queries vault for original token values | Re-Injection Controller | Restore Original PII | ## Restore Original Values Original PII values are retrieved from the vault and restored only in approved fields. This ensures controlled access to sensitive data. |
| Restore Original PII | n8n-nodes-base.code | Replaces approved tokens with original values | Retrieve Original Values | Store Audit Log | ## Restore Original Values Original PII values are retrieved from the vault and restored only in approved fields. This ensures controlled access to sensitive data. |
| Store Audit Log | n8n-nodes-base.postgres | Inserts compliance record into audit table | Restore Original PII | | ## Compliance Audit Log All detection, masking, AI access, and restoration events are recorded in a Postgres audit table. This provides traceability and supports privacy compliance requirements. |
| Sticky Note | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note1 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note2 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note3 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note4 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note5 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note8 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note9 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note10 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note11 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note12 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note13 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
| Sticky Note14 | n8n-nodes-base.stickyNote | Documentation/annotation | | | |
4. Reproducing the Workflow from Scratch
Below is a practical rebuild sequence. It follows the current workflow closely, while also pointing out places where the original JSON is internally inconsistent and should be corrected during recreation.
Prerequisites
- Prepare an n8n instance with:
  - Postgres access
  - Anthropic credentials
  - Ollama available for local address detection
  - LangChain nodes enabled
- Create two Postgres tables:
  - `pii_vault`
  - `pii_audit_log`
Suggested pii_vault columns
- `token` text
- `original_value` text
- `type` text
- `document_id` text
- `created_at` timestamp
- optionally `allowed_for_reinjection` boolean
- optionally `pii_type` text
Suggested pii_audit_log columns
- `actor` text
- `timestamp` timestamp
- `document_id` text
- `token_count` integer
- `pii_types_detected` jsonb or text
- `ai_access_confirmed` boolean
- `re_injection_events` jsonb or text
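One possible DDL for the two tables, following the column suggestions above (types, defaults, and the primary key are assumptions, not taken from the original workflow JSON):

```sql
-- Vault for original PII values, keyed by token
CREATE TABLE IF NOT EXISTS pii_vault (
    token                   text PRIMARY KEY,
    original_value          text NOT NULL,
    type                    text NOT NULL,
    document_id             text NOT NULL,
    created_at              timestamp NOT NULL DEFAULT now(),
    allowed_for_reinjection boolean DEFAULT false,  -- optional re-injection policy flag
    pii_type                text                    -- optional finer-grained type
);

-- Append-only compliance audit trail
CREATE TABLE IF NOT EXISTS pii_audit_log (
    actor               text NOT NULL,
    "timestamp"         timestamp NOT NULL DEFAULT now(),
    document_id         text NOT NULL,
    token_count         integer,
    pii_types_detected  jsonb,
    ai_access_confirmed boolean,
    re_injection_events jsonb
);
```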
Build Steps
- Create a Webhook node
  - Name: `Document Upload Webhook`
  - Type: Webhook
  - Method: `POST`
  - Path: `gdpr-document-upload`
  - Response mode: `Last Node`
  - Enable raw body
  - Ensure your upload format provides a binary PDF consumable by the next node
- Create a Set node
  - Name: `Workflow Configuration`
  - Add fields:
    - `documentId` as expression `{{$now.toISO()}}`
    - `confidenceThreshold` as number `0.8`
    - `vaultTable` as string `pii_vault`
    - `auditTable` as string `pii_audit_log`
  - Keep incoming fields
- Connect `Document Upload Webhook` → `Workflow Configuration`
Create an Extract From File node
- Name:
OCR Extract Text - Operation:
PDF - Keep source:
both - Confirm it reads the uploaded PDF binary
- Name:
- Connect `Workflow Configuration` → `OCR Extract Text`
Build the PII detectors
- Create a Code node
  - Name: `Email Detector`
  - Paste the email detection script
  - Recommended correction: standardize the output to `detections` with fields `start`/`end` rather than `start_pos`/`end_pos`
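A minimal sketch of an email detector emitting the normalized contract (`detections`, `start`/`end`). The regex and the static confidence value are assumptions; inside an n8n Code node, `text` would come from the incoming item (e.g. `$input.first().json.text`) rather than a function argument:

```javascript
// Normalized email detector: emits { detections: [...] } with start/end offsets.
function detectEmails(text) {
  const emailRegex = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  const detections = [];
  let match;
  while ((match = emailRegex.exec(text)) !== null) {
    detections.push({
      type: 'EMAIL',
      value: match[0],
      start: match.index,                 // normalized field name, not start_pos
      end: match.index + match[0].length, // normalized field name, not end_pos
      confidence: 0.95,                   // assumed static confidence for regex hits
    });
  }
  return { detections };
}
```

The phone and ID detectors can follow the same shape, differing only in the regex and `type`.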
- Create a Code node
  - Name: `Phone Detector`
  - Paste the phone detection script
  - Recommended correction: change the output property from `detected_pii` to `detections`
  - Also normalize `start_pos`/`end_pos` to `start`/`end`
- Create a Code node
  - Name: `ID Number Detector`
  - Paste the ID detection script
  - Recommended correction: normalize `start_pos`/`end_pos` to `start`/`end`
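Because the detectors currently disagree on field names, a small shared normalizer (an assumption, not present in the original workflow) can map every branch onto the `detections`/`start`/`end` contract instead of editing each script:

```javascript
// Normalize any detector output to { detections: [{type, value, start, end, confidence}] }.
// Accepts the legacy shapes: `detected_pii` or `addresses` as the array key,
// and `start_pos`/`end_pos` as the offset fields.
function normalizeDetections(item) {
  const raw = item.detections || item.detected_pii || item.addresses || [];
  const detections = raw.map((d) => ({
    type: d.type || 'UNKNOWN',
    value: d.value,
    start: d.start !== undefined ? d.start : d.start_pos,
    end: d.end !== undefined ? d.end : d.end_pos,
    confidence: d.confidence !== undefined ? d.confidence : 0.5, // assumed default
  }));
  return { detections };
}
```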
- Create an Ollama Chat Model node
  - Name: `Ollama Chat Model`
  - Configure the local Ollama endpoint and model
  - Pick an instruction-following model appropriate for extraction
- Create a Structured Output Parser node
  - Name: `Address Output Parser`
  - Use a manual schema with an array of address objects
  - Recommended correction: return results under `detections` instead of `addresses`, or add a normalization step later
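A possible manual schema for the parser, returning addresses under `detections` so the consolidation step can consume them unchanged (the exact property names are assumptions consistent with the normalized contract above):

```json
{
  "type": "object",
  "properties": {
    "detections": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": { "type": "string", "enum": ["ADDRESS"] },
          "value": { "type": "string" },
          "start": { "type": "number" },
          "end": { "type": "number" },
          "confidence": { "type": "number" }
        },
        "required": ["type", "value"]
      }
    }
  },
  "required": ["detections"]
}
```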
- Create an AI Agent node
  - Name: `Address Detector AI`
  - Input text: `{{$json.text}}`
  - Enable output parser
  - System message: instruct the model to detect physical addresses with positions and confidence
  - Attach `Ollama Chat Model` as the language model and `Address Output Parser` as the parser
- Connect `OCR Extract Text` to all four detector branches:
  - `OCR Extract Text` → `Email Detector`
  - `OCR Extract Text` → `Phone Detector`
  - `OCR Extract Text` → `ID Number Detector`
  - `OCR Extract Text` → `Address Detector AI`
- Connect `Ollama Chat Model` → `Address Detector AI`
- Connect `Address Output Parser` → `Address Detector AI` via the parser connection
Merge and consolidate detections
- Create a Merge node
  - Name: `Merge PII Detections`
  - Mode: `Combine`
  - Combine by: `Combine All`
- Connect the detector outputs into `Merge PII Detections`:
  - Phone
  - ID number
  - Address AI
- Create a Code node
  - Name: `PII Consolidation & Conflict Resolver`
  - Paste the consolidation script
  - Required corrections:
    - make it read all supported detector formats
    - map `start_pos`/`end_pos` to `start`/`end`
    - include the address outputs
    - preserve `originalText`
  - Best target output: `consolidatedPII`, `originalText`, `documentId`
- Connect `Merge PII Detections` → `PII Consolidation & Conflict Resolver`
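The overlap resolution in the consolidation step can be sketched as: sort detections by position, and when two spans intersect, keep the one with higher confidence (the tie-breaking rule is an assumption, not taken from the original script):

```javascript
// Consolidate detections from all branches: resolve overlaps by keeping the
// higher-confidence detection when two spans intersect.
function consolidate(allDetections) {
  const sorted = [...allDetections].sort(
    (a, b) => a.start - b.start || b.confidence - a.confidence
  );
  const result = [];
  for (const d of sorted) {
    const last = result[result.length - 1];
    if (last && d.start < last.end) {
      // Overlap: keep whichever span has the higher confidence
      if (d.confidence > last.confidence) result[result.length - 1] = d;
    } else {
      result.push(d);
    }
  }
  return result;
}
```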
Tokenize and store vault entries
- Create a Code node
  - Name: `Tokenization & Vault Storage`
  - Paste the tokenization script
  - Required corrections:
    - make it read `piiMap`, or rename the prior output to `consolidatedPII`
    - output one item per vault record if you want direct Postgres inserts
  - Recommended token format: keep `<<TYPE_HASH>>`
- Connect `PII Consolidation & Conflict Resolver` → `Tokenization & Vault Storage`
Create a Postgres node
- Name:
Store Tokens in Vault - Credentials: your Postgres credential
- Operation: insert/upsert according to your schema
- Schema:
public - Table:
{{$('Workflow Configuration').first().json.vaultTable}} - Map columns:
token→ token from tokenization nodeoriginal_value→ original detected valuetype→ PII typedocument_id→ document IDcreated_at→ created timestamp
- Required correction:
- replace hardcoded
"YOUR_CREDENTIAL_HERE"with the actual token expression
- replace hardcoded
- Name:
- Connect `Tokenization & Vault Storage` → `Store Tokens in Vault`
Generate masked text and enforce safety
- Create a Code node
  - Name: `Generate Masked Text`
  - Paste the masking script
  - Ensure the Postgres node returns the inserted rows with `token` and `original_value`, or read from the tokenization node directly instead
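The masking step reduces to replacing every original value with its token, then verifying nothing slipped through. This sketch assumes the token/value records from the previous node; `masking_success` is false whenever any original value survives in the output:

```javascript
// Replace each original PII value with its token. Longest values go first so a
// short value that is a substring of a longer one cannot corrupt the longer match.
function maskText(text, records) {
  let masked = text;
  const byLength = [...records].sort(
    (a, b) => b.original_value.length - a.original_value.length
  );
  for (const r of byLength) {
    masked = masked.split(r.original_value).join(r.token);
  }
  // Safety gate: masking succeeded only if no original value remains
  const masking_success = byLength.every((r) => !masked.includes(r.original_value));
  return { masked_text: masked, masking_success };
}
```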
- Connect `Store Tokens in Vault` → `Generate Masked Text`
Create an IF node
- Name:
Masking Success Check - Condition:
- left value:
{{$('Generate Masked Text').item.json.masking_success}} - operation: equals
- right value:
true
- left value:
- Name:
- Connect `Generate Masked Text` → `Masking Success Check`
Create a Set node
- Name:
Block AI Processing - Add:
error = "Masking failed - AI processing blocked"status = "BLOCKED"requires_manual_review = true
- Recommended addition:
- pass through
documentIdand detailed failure reason
- pass through
- Name:
- Create an HTTP Request node
  - Name: `Send Alert Notification`
  - Method: `POST`
  - Replace the placeholder URL with your incident endpoint
  - Send body:
    - `error_details`
    - `document_id`
    - `timestamp = {{$now.toISO()}}`
  - Recommended correction: point the body expressions at fields actually present on the previous node's output
- Connect the `Masking Success Check` false branch → `Block AI Processing`
- Connect `Block AI Processing` → `Send Alert Notification`
Masked AI processing with Claude
- Create an Anthropic Chat Model node
  - Name: `AI Processing Model`
  - Credentials: Anthropic
  - Model: `claude-sonnet-4-5-20250929`
  - If unavailable, choose a currently supported Claude model in your environment
- Create a Structured Output Parser node
  - Name: `AI Output Parser`
  - Configure a manual schema with:
    - `documentType`
    - `summary`
    - optional `keyEntities`
    - optional `dates`
    - optional `amounts`
    - optional `processedData`
- Create an AI Agent node
  - Name: `AI Processing (Masked Data)`
  - Text input: `{{$json.masked_text}}`
  - Enable output parser
  - Add a system prompt instructing the model to preserve tokens exactly
  - Attach `AI Processing Model` and `AI Output Parser`
- Connect the `Masking Success Check` true branch → `AI Processing (Masked Data)`
- Connect `AI Processing Model` → `AI Processing (Masked Data)`
- Connect `AI Output Parser` → `AI Processing (Masked Data)`
Controlled re-injection
- Create a Code node
  - Name: `Re-Injection Controller`
  - Paste the provided script
  - Required corrections:
    - the token regex must match the token format you actually use; if using `<<EMAIL_AB12>>`, update the matching accordingly
    - define or inject `fieldPermissions` if a re-injection policy is required
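Token extraction against the `<<TYPE_HASH>>` format, with an illustrative `fieldPermissions` policy, could look like this. The policy map is hypothetical, since the original workflow references `fieldPermissions` without defining it:

```javascript
// Matches tokens like <<EMAIL_AB12>> (type in caps/underscores, 4 hex chars).
const TOKEN_RE = /<<[A-Z_]+_[0-9A-F]{4}>>/g;

// For each string field in the AI output, list its tokens and whether the
// (hypothetical) fieldPermissions policy allows restoring them in that field.
function planReinjection(aiOutput, fieldPermissions) {
  const plan = [];
  for (const [field, value] of Object.entries(aiOutput)) {
    if (typeof value !== 'string') continue;
    const tokens = value.match(TOKEN_RE) || [];
    for (const token of tokens) {
      plan.push({ field, token, allowed: fieldPermissions[field] === true });
    }
  }
  return plan;
}
```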
- Connect `AI Processing (Masked Data)` → `Re-Injection Controller`
Create a Postgres node
- Name:
Retrieve Original Values - Operation:
Select - Schema:
public - Table:
{{$('Workflow Configuration').first().json.vaultTable}} - Query by
token - Required correction:
- iterate over extracted tokens, not
$('Re-Injection Controller').item.json.tokenunless that field is explicitly created
- iterate over extracted tokens, not
- Name:
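Rather than one query per token, the vault lookup can be done in a single round trip over all extracted tokens (a sketch; in n8n the array and document ID would be supplied as query parameters):

```sql
-- Fetch all vault rows for the tokens found in the AI output in one query.
-- $1 is a text[] parameter, e.g. ARRAY['<<EMAIL_AB12>>', '<<PHONE_9C3D>>']
SELECT token, original_value, type, allowed_for_reinjection
FROM pii_vault
WHERE token = ANY($1)
  AND document_id = $2;
```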
- Connect `Re-Injection Controller` → `Retrieve Original Values`
Create a Code node
- Name:
Restore Original PII - Paste the restore script
- Required correction:
- token regex must match actual token syntax
- if approval flags are needed, store
allowed_for_reinjectionin vault or policy config - ensure the node receives both AI output and vault query results in a compatible structure
- Name:
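Restoration then becomes a guarded replace: only tokens that both exist in the vault results and carry the approval flag are swapped back. This sketch assumes the row shape from the vault query above, including the optional `allowed_for_reinjection` column:

```javascript
// Replace approved tokens in the AI output text with their original values.
// `vaultRows` are rows from the vault lookup: {token, original_value, type, allowed_for_reinjection}.
function restorePII(text, vaultRows) {
  let restored = text;
  const events = [];
  for (const row of vaultRows) {
    if (!row.allowed_for_reinjection) continue; // policy gate: skip unapproved tokens
    if (restored.includes(row.token)) {
      restored = restored.split(row.token).join(row.original_value);
      events.push({ token: row.token, type: row.type, restored: true });
    }
  }
  return { restored_text: restored, re_injection_events: events };
}
```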
- Connect `Retrieve Original Values` → `Restore Original PII`
Audit logging
- Create a Postgres node
  - Name: `Store Audit Log`
  - Credentials: Postgres
  - Schema: `public`
  - Table: `{{$('Workflow Configuration').first().json.auditTable}}`
  - Map:
    - `actor = "system"`
    - `timestamp`
    - `document_id`
    - `token_count`
    - `pii_types_detected`
    - `ai_access_confirmed = true`
    - `re_injection_events`
  - Required correction: ensure the prior node outputs these fields or derive them before insertion
- Connect `Restore Original PII` → `Store Audit Log`
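If the prior nodes do not already emit the audit columns, a small Code node before `Store Audit Log` can derive them. Field names follow the suggested `pii_audit_log` schema; which inputs are available at this point is an assumption:

```javascript
// Derive one audit-log row from data gathered earlier in the workflow.
function buildAuditRecord(documentId, detections, reInjectionEvents) {
  return {
    actor: 'system',
    timestamp: new Date().toISOString(),
    document_id: documentId,
    token_count: detections.length,
    // Deduplicated list of PII types, serialized for a jsonb/text column
    pii_types_detected: JSON.stringify([...new Set(detections.map(d => d.type))]),
    ai_access_confirmed: true,
    re_injection_events: JSON.stringify(reInjectionEvents),
  };
}
```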
Add the documentation notes
- Create Sticky Notes matching the original sections:
- GDPR-Safe AI Document Processing
- Document Upload
- OCR Text Extraction
- PII Detection Layer
- Address Detection (AI, local Ollama)
- Merge Detection Results
- Resolve Overlapping Detections
- Tokenization & Vault Storage
- Masking Safety Check
- AI Processing (Masked Data)
- PII Re-Injection Controller
- Restore Original Values
- Compliance Audit Log
Important implementation corrections
To make this workflow actually operate end-to-end, apply these fixes during reproduction:
- Normalize all detector outputs
  - Use the same output key, preferably `detections`
  - Use the same positional fields, preferably `start` and `end`
- Align the consolidation output with the tokenization input
  - Either change the consolidation output to `consolidatedPII`, or change the tokenization node to read `piiMap`
- Fix vault insertion
  - Map `token` to the actual token value
  - Ensure one database row is produced per vault record
- Unify the token format everywhere
  - The current workflow uses conflicting formats: `<<TYPE_HASH>>`, `TOKEN_TYPE_ID`, and `[TOKEN_X]`
  - Pick one and update: token generation, AI prompt examples, re-injection token extraction, the restore regex, and the Postgres lookup
- Provide a field-level re-injection policy
  - `fieldPermissions` is referenced but never defined
  - Add a Set or Code node before re-injection if this policy matters
- Ensure the audit fields exist
  - Add a transformation node before `Store Audit Log` if needed
5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
| GDPR-Safe AI Document Processing: This workflow processes uploaded documents while protecting sensitive personal data. When a PDF is uploaded, OCR extracts the text and multiple detectors identify Personally Identifiable Information (PII) such as emails, phone numbers, ID numbers, and addresses. Detected PII is consolidated and replaced with secure tokens while the original values are stored in a Postgres vault. The AI model only processes the masked version of the document, ensuring sensitive information is never exposed. If required, a controlled re-injection mechanism can restore original values from the vault. All masking, AI access, and restoration events are recorded in an audit log. | Workflow purpose |
| Setup: Configure Postgres credentials. Create `pii_vault` and `pii_audit_log` tables. Connect an AI model. Send documents to the webhook. | Deployment/setup |
| Address Detection (AI) local ollama: An AI model analyzes the OCR text to detect physical addresses that are harder to capture with regex patterns. | Local Ollama integration |
| Example token shown in note: `<<EMAIL_AB12>>` | Tokenization convention shown in design notes |
| The workflow contains no sub-workflow nodes and only one explicit entry point: `Document Upload Webhook`. | Architecture note |
| The current JSON is conceptually strong but operationally inconsistent; normalization of detector outputs and token formats is required for reliable production use. | Implementation note |