mirror of
https://github.com/khoaliber/n8nworkflows.xyz.git
synced 2026-04-24 18:59:15 +00:00
352 lines
24 KiB
Markdown
352 lines
24 KiB
Markdown
Manipulate PDF with Adobe developer API
|
||
|
||
https://n8nworkflows.xyz/workflows/manipulate-pdf-with-adobe-developer-api-2424
|
||
|
||
|
||
# Manipulate PDF with Adobe developer API
|
||
|
||
### 1. Workflow Overview
|
||
|
||
This workflow serves as a comprehensive, generic wrapper to interact with the Adobe PDF Services API, enabling various PDF manipulations such as splitting, combining, OCR, page operations, and content extraction. It encapsulates the multi-step Adobe API process, which includes authentication, asset registration, file upload, query execution, result polling, and downloading the transformed PDF or related data.
|
||
|
||
**Target Use Cases:**
|
||
- Automating PDF transformations using Adobe’s API from other workflows or manual triggers.
|
||
- Extracting clean PDF content for AI and Retrieval-Augmented Generation (RAG) systems, e.g., extracting tables as images for improved AI recognition.
|
||
- Developers needing a reusable, modular integration with Adobe PDF Services.
|
||
|
||
**Logical Blocks:**
|
||
- **1.1 Input Reception & Setup:** Triggering the workflow and preparing input data and query parameters.
|
||
- **1.2 Authentication:** Obtaining a temporary access token from Adobe for API calls.
|
||
- **1.3 Asset Registration & Upload:** Registering a new PDF asset and uploading the file to Adobe’s cloud storage.
|
||
- **1.4 Query Processing:** Sending the transformation request to Adobe and managing asynchronous processing.
|
||
- **1.5 Result Handling:** Polling for completion, downloading the output, and forwarding results back.
|
||
|
||
---
|
||
|
||
### 2. Block-by-Block Analysis
|
||
|
||
#### 1.1 Input Reception & Setup
|
||
**Overview:**
|
||
This block receives the initial trigger to start the workflow and prepares the Adobe API query parameters alongside input PDF data.
|
||
|
||
**Nodes Involved:**
|
||
- When clicking ‘Test workflow’ (Manual Trigger)
|
||
- Load a test pdf file (Dropbox node)
|
||
- Adobe API Query (Set node)
|
||
- Query + File (Merge node)
|
||
|
||
**Node Details:**
|
||
- **When clicking ‘Test workflow’**
|
||
- Type: Manual Trigger
|
||
- Role: Entry point for manual testing.
|
||
- Inputs: None
|
||
- Outputs: Triggers the "Load a test pdf file" and "Adobe API Query" nodes.
|
||
- Failure modes: None significant; manual trigger.
|
||
|
||
- **Load a test pdf file**
|
||
- Type: Dropbox (Download operation)
|
||
- Role: Downloads a test PDF file from Dropbox for processing.
|
||
- Configuration: Path set to a specific PDF in Dropbox, OAuth2 authentication used.
|
||
- Inputs: Triggered by manual trigger node.
|
||
- Outputs: Binary PDF data.
|
||
- Failure modes: OAuth2 token expiry, file not found, network errors.
|
||
|
||
- **Adobe API Query**
|
||
- Type: Set node
|
||
- Role: Prepares the JSON payload and the target endpoint for the Adobe API request.
|
||
- Configuration:
|
||
- `endpoint`: set to `"extractpdf"` (example use case).
|
||
- `json_payload`: JSON object specifying what to extract, here “tables” and “text”.
|
||
- Inputs: None directly (manual trigger)
|
||
- Outputs: Passes JSON data downstream.
|
||
- Failure modes: Expression evaluation failure if JSON malformed.
|
||
|
||
- **Query + File**
|
||
- Type: Merge (combine by position)
|
||
- Role: Combines the query parameters and the PDF file binary data into one data object for subsequent processing.
|
||
- Inputs: From "Adobe API Query" and "Load a test pdf file" nodes.
|
||
- Outputs: Merged JSON + binary object.
|
||
- Failure modes: Mismatched input positions or missing inputs.
|
||
|
||
---
|
||
|
||
#### 1.2 Authentication
|
||
**Overview:**
|
||
Obtains an OAuth access token from Adobe necessary for all subsequent API calls.
|
||
|
||
**Nodes Involved:**
|
||
- Authenticartion (get token) (HTTP Request)
|
||
|
||
**Node Details:**
|
||
- **Authenticartion (get token)**
|
||
- Type: HTTP Request (POST)
|
||
- Role: Exchanges client credentials for a temporary access token.
|
||
- Configuration:
|
||
- URL: `https://pdf-services.adobe.io/token`
|
||
- Content-Type: `application/x-www-form-urlencoded`
|
||
- Uses a custom HTTP credential containing client_id and client_secret in the body.
|
||
- Inputs: Triggered after input preparation.
|
||
- Outputs: JSON object containing `access_token`.
|
||
- Failure modes: Invalid credentials, network errors, token expiration.
|
||
- Credential note: Requires a "Custom Auth" credential with body parameters client_id and client_secret.
|
||
|
||
---
|
||
|
||
#### 1.3 Asset Registration & Upload
|
||
**Overview:**
|
||
Registers a new asset to host the PDF on Adobe servers and uploads the PDF binary data to this asset.
|
||
|
||
**Nodes Involved:**
|
||
- Create Asset (HTTP Request)
|
||
- Query + File + Asset information (Merge node)
|
||
- Upload PDF File (asset) (HTTP Request)
|
||
|
||
**Node Details:**
|
||
- **Create Asset**
|
||
- Type: HTTP Request (POST)
|
||
- Role: Registers a new asset with Adobe by sending a POST request with mediaType "application/pdf".
|
||
- Configuration:
|
||
- URL: `https://pdf-services.adobe.io/assets`
|
||
- Auth: Header with `Authorization: Bearer {{access_token}}`
|
||
- Sends JSON body `{ "mediaType": "application/pdf" }`
|
||
- Inputs: Access token from Authentication node.
|
||
- Outputs: JSON response containing asset metadata including assetID and uploadUri.
|
||
- Failure modes: Auth errors, rate limits, malformed requests.
|
||
|
||
- **Query + File + Asset information**
|
||
- Type: Merge (combine by position)
|
||
- Role: Combines the original query + file data with the newly created asset info.
|
||
- Inputs: From "Query + File" and "Create Asset" nodes.
|
||
- Outputs: Merged object including assetID and uploadUri for upload.
|
||
- Failure modes: Input mismatch, missing asset info.
|
||
|
||
- **Upload PDF File (asset)**
|
||
- Type: HTTP Request (PUT)
|
||
- Role: Uploads the PDF binary data to the asset’s upload URI.
|
||
- Configuration:
|
||
- URL: dynamic, based on `uploadUri` from asset registration.
|
||
- Content-Type: binaryData
|
||
- Input data field: binary PDF file data from merged input.
|
||
- Inputs: Merged data containing uploadUri and binary file.
|
||
- Outputs: Confirmation of upload success.
|
||
- Failure modes: Upload failures, network issues, invalid uploadUri.
|
||
|
||
---
|
||
|
||
#### 1.4 Query Processing
|
||
**Overview:**
|
||
Submits the requested PDF transformation query to Adobe, waits for processing, and polls for completion.
|
||
|
||
**Nodes Involved:**
|
||
- Process Query (HTTP Request)
|
||
- Wait 5 second (Wait node)
|
||
- Try to download the result (HTTP Request)
|
||
- Switch (Switch node)
|
||
|
||
**Node Details:**
|
||
- **Process Query**
|
||
- Type: HTTP Request (POST)
|
||
- Role: Calls the Adobe API operation endpoint for the requested transformation.
|
||
- Configuration:
|
||
- URL: `https://pdf-services.adobe.io/operation/{{endpoint}}` where `endpoint` is dynamic.
|
||
- Body: JSON combining `assetID` and the original `json_payload`.
|
||
- Auth: Bearer token header.
|
||
- Specifies full HTTP response to get headers (used for polling).
|
||
- Inputs: Merged query + file + asset data.
|
||
- Outputs: HTTP response with 'location' header for polling result URL.
|
||
- Failure modes: Auth errors, invalid payload, server errors.
|
||
|
||
- **Wait 5 second**
|
||
- Type: Wait node
|
||
- Role: Delay between polling attempts to avoid overloading Adobe.
|
||
- Configuration: 5 seconds delay.
|
||
- Inputs: Chain from Process Query or Switch node.
|
||
- Outputs: Triggers next polling attempt.
|
||
- Failure modes: None likely.
|
||
|
||
- **Try to download the result**
|
||
- Type: HTTP Request (GET)
|
||
- Role: Polls the URL in 'location' header for processing status or final result.
|
||
- Configuration:
|
||
- URL: dynamic from `Process Query` response header location.
|
||
- Auth: Bearer token header.
|
||
- Inputs: After wait node.
|
||
- Outputs: JSON containing status and possibly download URL.
|
||
- Failure modes: 404 if not ready, network errors, token expiry.
|
||
|
||
- **Switch**
|
||
- Type: Switch node
|
||
- Role: Routes output based on status field (`in progress`, `failed`, or else).
|
||
- Configuration: Checks JSON `status` field to determine next step.
|
||
- Inputs: From download attempt node.
|
||
- Outputs:
|
||
- If "in progress": loops back to Wait 5 second (poll again).
|
||
- If "failed" or other: forwards response to origin workflow (end).
|
||
- Failure modes: Missing or unexpected status fields.
|
||
|
||
---
|
||
|
||
#### 1.5 Result Handling
|
||
**Overview:**
|
||
Forwards the final output or error back to the calling workflow or user.
|
||
|
||
**Nodes Involved:**
|
||
- Forward response to origin workflow (Set node)
|
||
|
||
**Node Details:**
|
||
- **Forward response to origin workflow**
|
||
- Type: Set node
|
||
- Role: Passes through the final JSON data from Adobe API to the outer workflow or caller.
|
||
- Configuration: Includes all fields as is without modification.
|
||
- Inputs: From Switch node on failure or success completion.
|
||
- Outputs: Final output of the workflow.
|
||
- Failure modes: None significant; pass-through node.
|
||
|
||
---
|
||
|
||
### 3. Summary Table
|
||
|
||
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|
||
|------------------------------|---------------------|-------------------------------------|-----------------------------------------|----------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|
|
||
| When clicking ‘Test workflow’ | Manual Trigger | Manual workflow start trigger | — | Load a test pdf file, Adobe API Query | |
|
||
| Load a test pdf file | Dropbox | Downloads test PDF from Dropbox | When clicking ‘Test workflow’ | Query + File | |
|
||
| Adobe API Query | Set | Prepares endpoint and payload JSON | When clicking ‘Test workflow’ | Query + File | |
|
||
| Query + File | Merge | Merges query and PDF file data | Adobe API Query, Load a test pdf file | Authenticartion (get token), Query + File + Asset information | |
|
||
| Authenticartion (get token) | HTTP Request | Retrieves Adobe API access token | Query + File, Execute Workflow Trigger | Create Asset | See Sticky Note3: Custom Auth credential with client_id and client_secret in body. |
|
||
| Create Asset | HTTP Request | Registers new PDF asset on Adobe | Authenticartion (get token) | Query + File + Asset information | |
|
||
| Query + File + Asset information | Merge | Combines query, file, and asset info | Query + File, Create Asset | Upload PDF File (asset) | |
|
||
| Upload PDF File (asset) | HTTP Request | Uploads PDF binary to Adobe asset | Query + File + Asset information | Process Query | |
|
||
| Process Query | HTTP Request | Sends transformation query to Adobe | Upload PDF File (asset) | Wait 5 second | |
|
||
| Wait 5 second | Wait | Delays for 5 seconds between polls | Process Query, Switch | Try to download the result | Sticky Note2: "Wait for file do be processed" |
|
||
| Try to download the result | HTTP Request | Polls for query result or status | Wait 5 second | Switch | |
|
||
| Switch | Switch | Routes based on processing status | Try to download the result | Wait 5 second (if in progress), Forward response to origin workflow (if failed or complete) | |
|
||
| Forward response to origin workflow | Set | Passes final result to caller | Switch | — | |
|
||
| Execute Workflow Trigger | Execute Workflow Trigger | Allows external workflows to call this wrapper | — | Authenticartion (get token), Query + File + Asset information | |
|
||
| Sticky Note | Sticky Note | Documentation and guidance | — | — | See section 5 for full sticky note contents |
|
||
| Sticky Note1 | Sticky Note | Development testing note | — | — | |
|
||
| Sticky Note2 | Sticky Note | Explains wait node | — | — | "Wait for file do be processed" |
|
||
| Sticky Note3 | Sticky Note | Details credential for token request| — | — | Custom Auth credential example with client_id and client_secret body |
|
||
| Sticky Note4 | Sticky Note | Details credential for API queries | — | — | Header Auth credential example with X-API-Key |
|
||
| Sticky Note5 | Sticky Note | Explains workflow input format | — | — | Examples of endpoint and json_payload formats |
|
||
|
||
---
|
||
|
||
### 4. Reproducing the Workflow from Scratch
|
||
|
||
1. **Create Manual Trigger Node:**
|
||
- Node type: Manual Trigger
|
||
- Purpose: Start workflow manually for testing.
|
||
|
||
2. **Add Dropbox node to download test PDF:**
|
||
- Node type: Dropbox
|
||
- Operation: Download
|
||
- Path: Set to desired PDF file path in Dropbox.
|
||
- Authentication: OAuth2 with Dropbox credentials.
|
||
|
||
3. **Add Set node to prepare Adobe API query:**
|
||
- Node type: Set
|
||
- Fields:
|
||
- `endpoint`: string, e.g., `"extractpdf"`
|
||
- `json_payload`: JSON object with desired extraction params, e.g., `{ "renditionsToExtract": ["tables"], "elementsToExtract": ["text","tables"] }`
|
||
|
||
4. **Merge node to combine query and PDF file:**
|
||
- Node type: Merge
|
||
- Mode: Combine by position (mergeByPosition)
|
||
- Inputs: Connect outputs of Set node and Dropbox node.
|
||
|
||
5. **Add HTTP Request node for Adobe Authentication:**
|
||
- Node type: HTTP Request
|
||
- Method: POST
|
||
- URL: `https://pdf-services.adobe.io/token`
|
||
- Content-Type: `application/x-www-form-urlencoded`
|
||
- Authentication: Use a **Custom Auth** credential configured as:
|
||
```json
|
||
{
|
||
"headers": {
|
||
"Content-Type": "application/x-www-form-urlencoded"
|
||
},
|
||
"body": {
|
||
"client_id": "YOUR_CLIENT_ID",
|
||
"client_secret": "YOUR_CLIENT_SECRET"
|
||
}
|
||
}
|
||
```
|
||
- Input: Connect from Merge node.
|
||
|
||
6. **Add HTTP Request node to Create Asset:**
|
||
- Node type: HTTP Request
|
||
- Method: POST
|
||
- URL: `https://pdf-services.adobe.io/assets`
|
||
- Authentication: Use a **Header Auth** credential with header `Authorization: Bearer {{access_token}}` (access_token from previous node).
|
||
- Body: JSON `{ "mediaType": "application/pdf" }`
|
||
- Input: Connect from Authentication node.
|
||
|
||
7. **Add Merge node to combine query+file with asset info:**
|
||
- Node type: Merge
|
||
- Mode: Combine by position
|
||
- Inputs: From "Query + File" merge node and "Create Asset" node.
|
||
|
||
8. **Add HTTP Request node to Upload PDF File:**
|
||
- Node type: HTTP Request
|
||
- Method: PUT
|
||
- URL: Dynamic from asset info `uploadUri`
|
||
- Content-Type: binaryData
|
||
- Input Data Field: Set to binary PDF file data
|
||
- Input: Connect from previous Merge node.
|
||
|
||
9. **Add HTTP Request node to Process Query:**
|
||
- Node type: HTTP Request
|
||
- Method: POST
|
||
- URL: `https://pdf-services.adobe.io/operation/{{endpoint}}` (endpoint from merged data)
|
||
- Authentication: Header with Bearer token.
|
||
- Body: JSON merging `assetID` and `json_payload`.
|
||
- Input: Connect from Upload PDF node.
|
||
|
||
10. **Add Wait node:**
|
||
- Node type: Wait
|
||
- Duration: 5 seconds
|
||
- Input: Connect from Process Query node.
|
||
|
||
11. **Add HTTP Request node to Try Download Result:**
|
||
- Node type: HTTP Request
|
||
- Method: GET
|
||
- URL: From `location` header of Process Query response
|
||
- Authentication: Bearer token header
|
||
- Input: Connect from Wait node.
|
||
|
||
12. **Add Switch node:**
|
||
- Node type: Switch
|
||
- Check JSON field `status`
|
||
- Condition branches:
|
||
- If `in progress`: Connect back to Wait node (loop).
|
||
- If `failed` or others: Forward output to next node.
|
||
|
||
13. **Add Set node to Forward response:**
|
||
- Node type: Set
|
||
- Purpose: Pass final output downstream or back to calling workflow.
|
||
- Input: Connect from Switch node.
|
||
|
||
14. **Optional:** Add Execute Workflow Trigger node if you want this workflow callable from others.
|
||
|
||
15. **Create Credentials:**
|
||
- Custom Auth credential for token request with client_id and client_secret in body.
|
||
- Header Auth credential for other API calls with `X-API-Key` header (same as client_id).
|
||
|
||
---
|
||
|
||
### 5. General Notes & Resources
|
||
|
||
| Note Content | Context or Link |
|
||
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||
| Adobe API Wrapper workflow documentation and official API docs: https://developer.adobe.com/document-services/docs/overview/pdf-services-api/howtos/ and https://developer.adobe.com/document-services/docs/overview/pdf-extract-api/gettingstarted/ | Sticky Note at workflow start |
|
||
| Credential setup for token request requires creating a Custom Auth credential with client_id and client_secret in the body, Content-Type `application/x-www-form-urlencoded`. | Sticky Note3 |
|
||
| Credential setup for all other Adobe API queries requires Header Auth credential with `X-API-Key` header set to client_id. | Sticky Note4 |
|
||
| Workflow input expects an object with `endpoint` string, `json_payload` object (excluding assetID), and PDF binary data as input. Examples provided for `splitpdf` and `extractpdf` endpoints. | Sticky Note5 |
|
||
| Typical use case: extracting tables as images from PDFs to forward to AI systems for improved recognition accuracy. | Workflow overview section |
|
||
| The workflow is designed to be used as a sub-workflow via Execute Workflow node or manually triggered for testing. | Node "Execute Workflow Trigger" presence |
|
||
| Rate limits: Adobe free tier allows up to 500 PDF operations per month. | Workflow description |
|
||
| Polling delay is set to 5 seconds to avoid overwhelming Adobe API during processing. | Sticky Note2 |
|
||
|
||
---
|
||
|
||
This document fully describes the workflow’s logic, node configuration, and integration points to enable seamless reproduction, modification, and error anticipation. |