Manipulate PDF with Adobe developer API
https://n8nworkflows.xyz/workflows/manipulate-pdf-with-adobe-developer-api-2424
1. Workflow Overview
This workflow serves as a comprehensive, generic wrapper to interact with the Adobe PDF Services API, enabling various PDF manipulations such as splitting, combining, OCR, page operations, and content extraction. It encapsulates the multi-step Adobe API process, which includes authentication, asset registration, file upload, query execution, result polling, and downloading the transformed PDF or related data.
Target Use Cases:
- Automating PDF transformations using Adobe’s API from other workflows or manual triggers.
- Extracting clean PDF content for AI and Retrieval-Augmented Generation (RAG) systems, e.g., extracting tables as images for improved AI recognition.
- Developers needing a reusable, modular integration with Adobe PDF Services.
Logical Blocks:
- 1.1 Input Reception & Setup: Triggering the workflow and preparing input data and query parameters.
- 1.2 Authentication: Obtaining a temporary access token from Adobe for API calls.
- 1.3 Asset Registration & Upload: Registering a new PDF asset and uploading the file to Adobe’s cloud storage.
- 1.4 Query Processing: Sending the transformation request to Adobe and managing asynchronous processing.
- 1.5 Result Handling: Polling for completion, downloading the output, and forwarding results back.
2. Block-by-Block Analysis
1.1 Input Reception & Setup
Overview:
This block receives the initial trigger to start the workflow and prepares the Adobe API query parameters alongside input PDF data.
Nodes Involved:
- When clicking ‘Test workflow’ (Manual Trigger)
- Load a test pdf file (Dropbox node)
- Adobe API Query (Set node)
- Query + File (Merge node)
Node Details:
- When clicking ‘Test workflow’
- Type: Manual Trigger
- Role: Entry point for manual testing.
- Inputs: None
- Outputs: Triggers the "Load a test pdf file" and "Adobe API Query" nodes.
- Failure modes: None significant; manual trigger.
- Load a test pdf file
- Type: Dropbox (Download operation)
- Role: Downloads a test PDF file from Dropbox for processing.
- Configuration: Path set to a specific PDF in Dropbox, OAuth2 authentication used.
- Inputs: Triggered by manual trigger node.
- Outputs: Binary PDF data.
- Failure modes: OAuth2 token expiry, file not found, network errors.
- Adobe API Query
- Type: Set node
- Role: Prepares the JSON payload and the target endpoint for the Adobe API request.
- Configuration:
- endpoint: set to "extractpdf" (example use case).
- json_payload: JSON object specifying what to extract, here “tables” and “text”.
- Inputs: None directly (manual trigger)
- Outputs: Passes JSON data downstream.
- Failure modes: Expression evaluation failure if JSON malformed.
- Query + File
- Type: Merge (combine by position)
- Role: Combines the query parameters and the PDF file binary data into one data object for subsequent processing.
- Inputs: From "Adobe API Query" and "Load a test pdf file" nodes.
- Outputs: Merged JSON + binary object.
- Failure modes: Mismatched input positions or missing inputs.
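To make the data shape concrete, here is a minimal Python sketch of what this block produces. It is an illustration only: the function names `build_query` and `merge_by_position` are hypothetical, the item layout (a `json` part plus a `binary` part) is an assumption about how n8n represents items, and the PDF bytes are a placeholder. The payload values are the example ones used in this document.

```python
from typing import Any


def build_query(endpoint: str, json_payload: dict[str, Any]) -> dict[str, Any]:
    """Mimic the "Adobe API Query" Set node: one item holding endpoint and payload."""
    if "assetID" in json_payload:
        # The workflow adds assetID itself after asset registration.
        raise ValueError("json_payload must not contain assetID")
    return {"json": {"endpoint": endpoint, "json_payload": json_payload}}


def merge_by_position(left: list[dict], right: list[dict]) -> list[dict]:
    """Mimic "Query + File": pair item i of one input with item i of the other."""
    if len(left) != len(right):
        # Simplification: this sketch treats unequal inputs as an error.
        raise ValueError("inputs must have the same number of items")
    merged = []
    for a, b in zip(left, right):
        merged.append({
            "json": {**a.get("json", {}), **b.get("json", {})},
            "binary": {**a.get("binary", {}), **b.get("binary", {})},
        })
    return merged


query_items = [build_query(
    "extractpdf",
    {"renditionsToExtract": ["tables"], "elementsToExtract": ["text", "tables"]},
)]
# Placeholder bytes standing in for the file downloaded from Dropbox.
file_items = [{"json": {}, "binary": {"data": b"%PDF-1.7 placeholder"}}]
merged_items = merge_by_position(query_items, file_items)
```

The merged item carries both the query fields and the binary file, which is what the later upload and processing steps read from.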
1.2 Authentication
Overview:
Obtains an OAuth access token from Adobe necessary for all subsequent API calls.
Nodes Involved:
- Authenticartion (get token) (HTTP Request)
Node Details:
- Authenticartion (get token)
- Type: HTTP Request (POST)
- Role: Exchanges client credentials for a temporary access token.
- Configuration:
- URL: https://pdf-services.adobe.io/token
- Content-Type: application/x-www-form-urlencoded
- Uses a custom HTTP credential containing client_id and client_secret in the body.
- Inputs: Triggered after input preparation.
- Outputs: JSON object containing access_token.
- Failure modes: Invalid credentials, network errors, token expiration.
- Credential note: Requires a "Custom Auth" credential with body parameters client_id and client_secret.
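Outside n8n, the same token exchange could look roughly like the sketch below, which uses only the Python standard library. The URL, content type, and body fields come from the configuration above; the function names are hypothetical, and reading `access_token` from the JSON response follows the node's described output.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_URL = "https://pdf-services.adobe.io/token"


def build_token_request(client_id: str, client_secret: str) -> Request:
    """Build the form-encoded POST that exchanges credentials for a token."""
    body = urlencode(
        {"client_id": client_id, "client_secret": client_secret}
    ).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return access_token (performs a network call)."""
    with urlopen(build_token_request(client_id, client_secret)) as response:
        return json.load(response)["access_token"]
```

Building the request separately from sending it keeps the credentials handling easy to inspect without contacting Adobe.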
1.3 Asset Registration & Upload
Overview:
Registers a new asset to host the PDF on Adobe servers and uploads the PDF binary data to this asset.
Nodes Involved:
- Create Asset (HTTP Request)
- Query + File + Asset information (Merge node)
- Upload PDF File (asset) (HTTP Request)
Node Details:
- Create Asset
- Type: HTTP Request (POST)
- Role: Registers a new asset with Adobe by sending a POST request with mediaType "application/pdf".
- Configuration:
- URL: https://pdf-services.adobe.io/assets
- Auth: Header with Authorization: Bearer {{access_token}}
- Sends JSON body { "mediaType": "application/pdf" }
- Inputs: Access token from Authentication node.
- Outputs: JSON response containing asset metadata including assetID and uploadUri.
- Failure modes: Auth errors, rate limits, malformed requests.
- Query + File + Asset information
- Type: Merge (combine by position)
- Role: Combines the original query + file data with the newly created asset info.
- Inputs: From "Query + File" and "Create Asset" nodes.
- Outputs: Merged object including assetID and uploadUri for upload.
- Failure modes: Input mismatch, missing asset info.
- Upload PDF File (asset)
- Type: HTTP Request (PUT)
- Role: Uploads the PDF binary data to the asset’s upload URI.
- Configuration:
- URL: dynamic, based on uploadUri from asset registration.
- Content-Type: binaryData
- Input data field: binary PDF file data from merged input.
- Inputs: Merged data containing uploadUri and binary file.
- Outputs: Confirmation of upload success.
- Failure modes: Upload failures, network issues, invalid uploadUri.
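The two requests in this block can be sketched in Python as follows. The assets URL, the Bearer header, the X-API-Key header, and the JSON body are taken from this document. Everything else is an assumption made for illustration: the function names, sending `Content-Type: application/pdf` on the upload, and sending no Authorization header to the upload URI.

```python
import json
from urllib.request import Request

ASSETS_URL = "https://pdf-services.adobe.io/assets"


def api_headers(access_token: str, client_id: str) -> dict[str, str]:
    """Headers shared by the authenticated Adobe calls in this workflow."""
    return {"Authorization": f"Bearer {access_token}", "X-API-Key": client_id}


def build_create_asset_request(access_token: str, client_id: str) -> Request:
    """POST that registers a new PDF asset; the response should carry
    assetID and uploadUri."""
    headers = {**api_headers(access_token, client_id),
               "Content-Type": "application/json"}
    body = json.dumps({"mediaType": "application/pdf"}).encode("utf-8")
    return Request(ASSETS_URL, data=body, method="POST", headers=headers)


def build_upload_request(upload_uri: str, pdf_bytes: bytes) -> Request:
    """PUT that sends the raw PDF bytes to the uploadUri from asset creation."""
    # Assumption: the media type is sent as the content type and no
    # Authorization header is attached to this request.
    return Request(upload_uri, data=pdf_bytes, method="PUT",
                   headers={"Content-Type": "application/pdf"})
```

The `uploadUri` value is only known after the first request succeeds, which is why the workflow merges the asset information back into the item before uploading.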
1.4 Query Processing
Overview:
Submits the requested PDF transformation query to Adobe, waits for processing, and polls for completion.
Nodes Involved:
- Process Query (HTTP Request)
- Wait 5 second (Wait node)
- Try to download the result (HTTP Request)
- Switch (Switch node)
Node Details:
- Process Query
- Type: HTTP Request (POST)
- Role: Calls the Adobe API operation endpoint for the requested transformation.
- Configuration:
- URL: https://pdf-services.adobe.io/operation/{{endpoint}} where endpoint is dynamic.
- Body: JSON combining assetID and the original json_payload.
- Auth: Bearer token header.
- Specifies full HTTP response to get headers (used for polling).
- Inputs: Merged query + file + asset data.
- Outputs: HTTP response with 'location' header for polling result URL.
- Failure modes: Auth errors, invalid payload, server errors.
- Wait 5 second
- Type: Wait node
- Role: Delay between polling attempts to avoid overloading Adobe.
- Configuration: 5 seconds delay.
- Inputs: Chain from Process Query or Switch node.
- Outputs: Triggers next polling attempt.
- Failure modes: None likely.
- Try to download the result
- Type: HTTP Request (GET)
- Role: Polls the URL in 'location' header for processing status or final result.
- Configuration:
- URL: dynamic from Process Query response header location.
- Auth: Bearer token header.
- Inputs: After wait node.
- Outputs: JSON containing status and possibly download URL.
- Failure modes: 404 if not ready, network errors, token expiry.
- Switch
- Type: Switch node
- Role: Routes output based on status field (in progress, failed, or else).
- Configuration: Checks JSON status field to determine next step.
- Inputs: From download attempt node.
- Outputs:
- If "in progress": loops back to Wait 5 second (poll again).
- If "failed" or other: forwards response to origin workflow (end).
- Failure modes: Missing or unexpected status fields.
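The wait, poll, and switch loop described above can be sketched in Python as shown here. The status value "in progress" and the 5 second delay come from this document. The rest is assumed for illustration: the function names, the flat merge of assetID into the payload (one reading of "combining"), and the `max_attempts` cap, which the workflow itself does not have. The status lookup is passed in as a function so the loop can be exercised without network access.

```python
import time
from typing import Any, Callable


def build_operation_body(asset_id: str, json_payload: dict[str, Any]) -> dict[str, Any]:
    """Combine assetID with the caller's payload for the operation request."""
    return {"assetID": asset_id, **json_payload}


def poll_until_done(
    fetch_status: Callable[[], dict[str, Any]],
    wait_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], Any] = time.sleep,
) -> dict[str, Any]:
    """Repeat wait -> fetch -> check until the status is no longer "in progress"."""
    for _ in range(max_attempts):
        sleep(wait_seconds)          # "Wait 5 second"
        result = fetch_status()      # "Try to download the result"
        if result.get("status") != "in progress":  # "Switch"
            return result            # failed or finished: hand back to caller
    raise TimeoutError(f"still in progress after {max_attempts} attempts")
```

In real use, `fetch_status` would issue the GET against the URL from the `location` header and return the parsed JSON. The attempt cap is a design choice of this sketch so that a stuck job cannot loop forever.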
1.5 Result Handling
Overview:
Forwards the final output or error back to the calling workflow or user.
Nodes Involved:
- Forward response to origin workflow (Set node)
Node Details:
- Forward response to origin workflow
- Type: Set node
- Role: Passes through the final JSON data from Adobe API to the outer workflow or caller.
- Configuration: Includes all fields as is without modification.
- Inputs: From Switch node on failure or success completion.
- Outputs: Final output of the workflow.
- Failure modes: None significant; pass-through node.
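Because the Switch node sends both failed and finished jobs to this pass-through node, a caller has to inspect the forwarded response itself. The following Python sketch shows one way to do that. It relies only on the `status` field and the "failed" value mentioned in this document; the exception class and function name are hypothetical.

```python
from typing import Any


class AdobeJobFailed(RuntimeError):
    """Raised when the forwarded response reports a failed job."""


def ensure_success(result: dict[str, Any]) -> dict[str, Any]:
    """Return the response unchanged unless it is missing a status or failed."""
    status = result.get("status")
    if status is None:
        # Matches the "missing or unexpected status fields" failure mode.
        raise KeyError("response has no status field")
    if status == "failed":
        raise AdobeJobFailed(f"Adobe job failed: {result}")
    return result
```

Any status other than "failed" is treated as success here, mirroring the Switch node's "or else" branch.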
3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| When clicking ‘Test workflow’ | Manual Trigger | Manual workflow start trigger | — | Load a test pdf file, Adobe API Query | |
| Load a test pdf file | Dropbox | Downloads test PDF from Dropbox | When clicking ‘Test workflow’ | Query + File | |
| Adobe API Query | Set | Prepares endpoint and payload JSON | When clicking ‘Test workflow’ | Query + File | |
| Query + File | Merge | Merges query and PDF file data | Adobe API Query, Load a test pdf file | Authenticartion (get token), Query + File + Asset information | |
| Authenticartion (get token) | HTTP Request | Retrieves Adobe API access token | Query + File, Execute Workflow Trigger | Create Asset | See Sticky Note3: Custom Auth credential with client_id and client_secret in body. |
| Create Asset | HTTP Request | Registers new PDF asset on Adobe | Authenticartion (get token) | Query + File + Asset information | |
| Query + File + Asset information | Merge | Combines query, file, and asset info | Query + File, Create Asset | Upload PDF File (asset) | |
| Upload PDF File (asset) | HTTP Request | Uploads PDF binary to Adobe asset | Query + File + Asset information | Process Query | |
| Process Query | HTTP Request | Sends transformation query to Adobe | Upload PDF File (asset) | Wait 5 second | |
| Wait 5 second | Wait | Delays for 5 seconds between polls | Process Query, Switch | Try to download the result | Sticky Note2: "Wait for file do be processed" |
| Try to download the result | HTTP Request | Polls for query result or status | Wait 5 second | Switch | |
| Switch | Switch | Routes based on processing status | Try to download the result | Wait 5 second (if in progress), Forward response to origin workflow (if failed or complete) | |
| Forward response to origin workflow | Set | Passes final result to caller | Switch | — | |
| Execute Workflow Trigger | Execute Workflow Trigger | Allows external workflows to call this wrapper | — | Authenticartion (get token), Query + File + Asset information | |
| Sticky Note | Sticky Note | Documentation and guidance | — | — | See section 5 for full sticky note contents |
| Sticky Note1 | Sticky Note | Development testing note | — | — | |
| Sticky Note2 | Sticky Note | Explains wait node | — | — | "Wait for file do be processed" |
| Sticky Note3 | Sticky Note | Details credential for token request | — | — | Custom Auth credential example with client_id and client_secret body |
| Sticky Note4 | Sticky Note | Details credential for API queries | — | — | Header Auth credential example with X-API-Key |
| Sticky Note5 | Sticky Note | Explains workflow input format | — | — | Examples of endpoint and json_payload formats |
4. Reproducing the Workflow from Scratch
- Create Manual Trigger Node:
- Node type: Manual Trigger
- Purpose: Start workflow manually for testing.
- Add Dropbox node to download test PDF:
- Node type: Dropbox
- Operation: Download
- Path: Set to desired PDF file path in Dropbox.
- Authentication: OAuth2 with Dropbox credentials.
- Add Set node to prepare Adobe API query:
- Node type: Set
- Fields:
- endpoint: string, e.g., "extractpdf"
- json_payload: JSON object with desired extraction params, e.g., { "renditionsToExtract": ["tables"], "elementsToExtract": ["text","tables"] }
- Merge node to combine query and PDF file:
- Node type: Merge
- Mode: Combine by position (mergeByPosition)
- Inputs: Connect outputs of Set node and Dropbox node.
- Add HTTP Request node for Adobe Authentication:
- Node type: HTTP Request
- Method: POST
- URL: https://pdf-services.adobe.io/token
- Content-Type: application/x-www-form-urlencoded
- Authentication: Use a Custom Auth credential configured as: { "headers": { "Content-Type": "application/x-www-form-urlencoded" }, "body": { "client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET" } }
- Input: Connect from Merge node.
- Add HTTP Request node to Create Asset:
- Node type: HTTP Request
- Method: POST
- URL: https://pdf-services.adobe.io/assets
- Authentication: Send the header Authorization: Bearer {{access_token}} (access_token from previous node), together with the Header Auth credential that supplies the X-API-Key header.
- Body: JSON { "mediaType": "application/pdf" }
- Input: Connect from Authentication node.
- Add Merge node to combine query+file with asset info:
- Node type: Merge
- Mode: Combine by position
- Inputs: From "Query + File" merge node and "Create Asset" node.
- Add HTTP Request node to Upload PDF File:
- Node type: HTTP Request
- Method: PUT
- URL: Dynamic from asset info uploadUri
- Content-Type: binaryData
- Input Data Field: Set to binary PDF file data
- Input: Connect from previous Merge node.
- Add HTTP Request node to Process Query:
- Node type: HTTP Request
- Method: POST
- URL: https://pdf-services.adobe.io/operation/{{endpoint}} (endpoint from merged data)
- Authentication: Header with Bearer token.
- Body: JSON merging assetID and json_payload.
- Input: Connect from Upload PDF node.
- Add Wait node:
- Node type: Wait
- Duration: 5 seconds
- Input: Connect from Process Query node.
- Add HTTP Request node to Try Download Result:
- Node type: HTTP Request
- Method: GET
- URL: From location header of Process Query response
- Authentication: Bearer token header
- Input: Connect from Wait node.
- Add Switch node:
- Node type: Switch
- Check JSON field status
- Condition branches:
- If in progress: Connect back to Wait node (loop).
- If failed or others: Forward output to next node.
- Add Set node to Forward response:
- Node type: Set
- Purpose: Pass final output downstream or back to calling workflow.
- Input: Connect from Switch node.
- Optional: Add Execute Workflow Trigger node if you want this workflow callable from others.
- Create Credentials:
- Custom Auth credential for token request with client_id and client_secret in body.
- Header Auth credential for other API calls with X-API-Key header (same value as client_id).
5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
| Adobe API Wrapper workflow documentation and official API docs: https://developer.adobe.com/document-services/docs/overview/pdf-services-api/howtos/ and https://developer.adobe.com/document-services/docs/overview/pdf-extract-api/gettingstarted/ | Sticky Note at workflow start |
| Credential setup for token request requires creating a Custom Auth credential with client_id and client_secret in the body, Content-Type application/x-www-form-urlencoded. | Sticky Note3 |
| Credential setup for all other Adobe API queries requires Header Auth credential with X-API-Key header set to client_id. | Sticky Note4 |
| Workflow input expects an object with endpoint string, json_payload object (excluding assetID), and PDF binary data as input. Examples provided for splitpdf and extractpdf endpoints. | Sticky Note5 |
| Typical use case: extracting tables as images from PDFs to forward to AI systems for improved recognition accuracy. | Workflow overview section |
| The workflow is designed to be used as a sub-workflow via Execute Workflow node or manually triggered for testing. | Node "Execute Workflow Trigger" presence |
| Rate limits: Adobe free tier allows up to 500 PDF operations per month. | Workflow description |
| Polling delay is set to 5 seconds to avoid overwhelming Adobe API during processing. | Sticky Note2 |
This document fully describes the workflow’s logic, node configuration, and integration points to enable seamless reproduction, modification, and error anticipation.