https://n8nworkflows.xyz/workflows/manipulate-pdf-with-adobe-developer-api-2424

# Manipulate PDF with Adobe developer API

### 1. Workflow Overview

This workflow serves as a comprehensive, generic wrapper to interact with the Adobe PDF Services API, enabling various PDF manipulations such as splitting, combining, OCR, page operations, and content extraction. It encapsulates the multi-step Adobe API process, which includes authentication, asset registration, file upload, query execution, result polling, and downloading the transformed PDF or related data.

**Target Use Cases:**

- Automating PDF transformations using Adobe’s API from other workflows or manual triggers.
- Extracting clean PDF content for AI and Retrieval-Augmented Generation (RAG) systems, e.g., extracting tables as images for improved AI recognition.
- Developers needing a reusable, modular integration with Adobe PDF Services.

**Logical Blocks:**

- **1.1 Input Reception & Setup:** Triggering the workflow and preparing input data and query parameters.
- **1.2 Authentication:** Obtaining a temporary access token from Adobe for API calls.
- **1.3 Asset Registration & Upload:** Registering a new PDF asset and uploading the file to Adobe’s cloud storage.
- **1.4 Query Processing:** Sending the transformation request to Adobe and managing asynchronous processing.
- **1.5 Result Handling:** Polling for completion, downloading the output, and forwarding results back.

---

### 2. Block-by-Block Analysis

#### 1.1 Input Reception & Setup

**Overview:** This block receives the initial trigger to start the workflow and prepares the Adobe API query parameters alongside input PDF data.

**Nodes Involved:**

- When clicking ‘Test workflow’ (Manual Trigger)
- Load a test pdf file (Dropbox node)
- Adobe API Query (Set node)
- Query + File (Merge node)

**Node Details:**

- **When clicking ‘Test workflow’**
  - Type: Manual Trigger
  - Role: Entry point for manual testing.
  - Inputs: None
  - Outputs: Triggers the "Load a test pdf file" and "Adobe API Query" nodes.
  - Failure modes: None significant; manual trigger.
- **Load a test pdf file**
  - Type: Dropbox (Download operation)
  - Role: Downloads a test PDF file from Dropbox for processing.
  - Configuration: Path set to a specific PDF in Dropbox, OAuth2 authentication used.
  - Inputs: Triggered by manual trigger node.
  - Outputs: Binary PDF data.
  - Failure modes: OAuth2 token expiry, file not found, network errors.
- **Adobe API Query**
  - Type: Set node
  - Role: Prepares the JSON payload and the target endpoint for the Adobe API request.
  - Configuration:
    - `endpoint`: set to `"extractpdf"` (example use case).
    - `json_payload`: JSON object specifying what to extract, here “tables” and “text”.
  - Inputs: None directly (manual trigger)
  - Outputs: Passes JSON data downstream.
  - Failure modes: Expression evaluation failure if JSON malformed.
- **Query + File**
  - Type: Merge (combine by position)
  - Role: Combines the query parameters and the PDF file binary data into one data object for subsequent processing.
  - Inputs: From "Adobe API Query" and "Load a test pdf file" nodes.
  - Outputs: Merged JSON + binary object.
  - Failure modes: Mismatched input positions or missing inputs.

---

#### 1.2 Authentication

**Overview:** Obtains an OAuth access token from Adobe necessary for all subsequent API calls.

**Nodes Involved:**

- Authenticartion (get token) (HTTP Request)

**Node Details:**

- **Authenticartion (get token)**
  - Type: HTTP Request (POST)
  - Role: Exchanges client credentials for a temporary access token.
  - Configuration:
    - URL: `https://pdf-services.adobe.io/token`
    - Content-Type: `application/x-www-form-urlencoded`
    - Uses a custom HTTP credential containing client_id and client_secret in the body.
  - Inputs: Triggered after input preparation.
  - Outputs: JSON object containing `access_token`.
  - Failure modes: Invalid credentials, network errors, token expiration.
  - Credential note: Requires a "Custom Auth" credential with body parameters client_id and client_secret.

---

#### 1.3 Asset Registration & Upload

**Overview:** Registers a new asset to host the PDF on Adobe servers and uploads the PDF binary data to this asset.

**Nodes Involved:**

- Create Asset (HTTP Request)
- Query + File + Asset information (Merge node)
- Upload PDF File (asset) (HTTP Request)

**Node Details:**

- **Create Asset**
  - Type: HTTP Request (POST)
  - Role: Registers a new asset with Adobe by sending a POST request with mediaType "application/pdf".
  - Configuration:
    - URL: `https://pdf-services.adobe.io/assets`
    - Auth: Header with `Authorization: Bearer {{access_token}}`
    - Sends JSON body `{ "mediaType": "application/pdf" }`
  - Inputs: Access token from Authentication node.
  - Outputs: JSON response containing asset metadata including assetID and uploadUri.
  - Failure modes: Auth errors, rate limits, malformed requests.
- **Query + File + Asset information**
  - Type: Merge (combine by position)
  - Role: Combines the original query + file data with the newly created asset info.
  - Inputs: From "Query + File" and "Create Asset" nodes.
  - Outputs: Merged object including assetID and uploadUri for upload.
  - Failure modes: Input mismatch, missing asset info.
- **Upload PDF File (asset)**
  - Type: HTTP Request (PUT)
  - Role: Uploads the PDF binary data to the asset’s upload URI.
  - Configuration:
    - URL: dynamic, based on `uploadUri` from asset registration.
    - Content-Type: binaryData
    - Input data field: binary PDF file data from merged input.
  - Inputs: Merged data containing uploadUri and binary file.
  - Outputs: Confirmation of upload success.
  - Failure modes: Upload failures, network issues, invalid uploadUri.

---

#### 1.4 Query Processing

**Overview:** Submits the requested PDF transformation query to Adobe, waits for processing, and polls for completion.
**Nodes Involved:**

- Process Query (HTTP Request)
- Wait 5 second (Wait node)
- Try to download the result (HTTP Request)
- Switch (Switch node)

**Node Details:**

- **Process Query**
  - Type: HTTP Request (POST)
  - Role: Calls the Adobe API operation endpoint for the requested transformation.
  - Configuration:
    - URL: `https://pdf-services.adobe.io/operation/{{endpoint}}` where `endpoint` is dynamic.
    - Body: JSON combining `assetID` and the original `json_payload`.
    - Auth: Bearer token header.
    - Specifies full HTTP response to get headers (used for polling).
  - Inputs: Merged query + file + asset data.
  - Outputs: HTTP response with 'location' header for polling result URL.
  - Failure modes: Auth errors, invalid payload, server errors.
- **Wait 5 second**
  - Type: Wait node
  - Role: Delay between polling attempts to avoid overloading Adobe.
  - Configuration: 5 seconds delay.
  - Inputs: Chain from Process Query or Switch node.
  - Outputs: Triggers next polling attempt.
  - Failure modes: None likely.
- **Try to download the result**
  - Type: HTTP Request (GET)
  - Role: Polls the URL in 'location' header for processing status or final result.
  - Configuration:
    - URL: dynamic from `Process Query` response header location.
    - Auth: Bearer token header.
  - Inputs: After wait node.
  - Outputs: JSON containing status and possibly download URL.
  - Failure modes: 404 if not ready, network errors, token expiry.
- **Switch**
  - Type: Switch node
  - Role: Routes output based on status field (`in progress`, `failed`, or else).
  - Configuration: Checks JSON `status` field to determine next step.
  - Inputs: From download attempt node.
  - Outputs:
    - If "in progress": loops back to Wait 5 second (poll again).
    - If "failed" or other: forwards response to origin workflow (end).
  - Failure modes: Missing or unexpected status fields.

---

#### 1.5 Result Handling

**Overview:** Forwards the final output or error back to the calling workflow or user.
**Nodes Involved:**

- Forward response to origin workflow (Set node)

**Node Details:**

- **Forward response to origin workflow**
  - Type: Set node
  - Role: Passes through the final JSON data from Adobe API to the outer workflow or caller.
  - Configuration: Includes all fields as is without modification.
  - Inputs: From Switch node on failure or success completion.
  - Outputs: Final output of the workflow.
  - Failure modes: None significant; pass-through node.

---

### 3. Summary Table

| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| When clicking ‘Test workflow’ | Manual Trigger | Manual workflow start trigger | — | Load a test pdf file, Adobe API Query | |
| Load a test pdf file | Dropbox | Downloads test PDF from Dropbox | When clicking ‘Test workflow’ | Query + File | |
| Adobe API Query | Set | Prepares endpoint and payload JSON | When clicking ‘Test workflow’ | Query + File | |
| Query + File | Merge | Merges query and PDF file data | Adobe API Query, Load a test pdf file | Authenticartion (get token), Query + File + Asset information | |
| Authenticartion (get token) | HTTP Request | Retrieves Adobe API access token | Query + File, Execute Workflow Trigger | Create Asset | See Sticky Note3: Custom Auth credential with client_id and client_secret in body. |
| Create Asset | HTTP Request | Registers new PDF asset on Adobe | Authenticartion (get token) | Query + File + Asset information | |
| Query + File + Asset information | Merge | Combines query, file, and asset info | Query + File, Create Asset | Upload PDF File (asset) | |
| Upload PDF File (asset) | HTTP Request | Uploads PDF binary to Adobe asset | Query + File + Asset information | Process Query | |
| Process Query | HTTP Request | Sends transformation query to Adobe | Upload PDF File (asset) | Wait 5 second | |
| Wait 5 second | Wait | Delays for 5 seconds between polls | Process Query, Switch | Try to download the result | Sticky Note2: "Wait for file do be processed" |
| Try to download the result | HTTP Request | Polls for query result or status | Wait 5 second | Switch | |
| Switch | Switch | Routes based on processing status | Try to download the result | Wait 5 second (if in progress), Forward response to origin workflow (if failed or complete) | |
| Forward response to origin workflow | Set | Passes final result to caller | Switch | — | |
| Execute Workflow Trigger | Execute Workflow Trigger | Allows external workflows to call this wrapper | — | Authenticartion (get token), Query + File + Asset information | |
| Sticky Note | Sticky Note | Documentation and guidance | — | — | See section 5 for full sticky note contents |
| Sticky Note1 | Sticky Note | Development testing note | — | — | |
| Sticky Note2 | Sticky Note | Explains wait node | — | — | "Wait for file do be processed" |
| Sticky Note3 | Sticky Note | Details credential for token request | — | — | Custom Auth credential example with client_id and client_secret body |
| Sticky Note4 | Sticky Note | Details credential for API queries | — | — | Header Auth credential example with X-API-Key |
| Sticky Note5 | Sticky Note | Explains workflow input format | — | — | Examples of endpoint and json_payload formats |

---

### 4. Reproducing the Workflow from Scratch

1.
   **Create Manual Trigger Node:**
   - Node type: Manual Trigger
   - Purpose: Start workflow manually for testing.
2. **Add Dropbox node to download test PDF:**
   - Node type: Dropbox
   - Operation: Download
   - Path: Set to desired PDF file path in Dropbox.
   - Authentication: OAuth2 with Dropbox credentials.
3. **Add Set node to prepare Adobe API query:**
   - Node type: Set
   - Fields:
     - `endpoint`: string, e.g., `"extractpdf"`
     - `json_payload`: JSON object with desired extraction params, e.g., `{ "renditionsToExtract": ["tables"], "elementsToExtract": ["text","tables"] }`
4. **Merge node to combine query and PDF file:**
   - Node type: Merge
   - Mode: Combine by position (mergeByPosition)
   - Inputs: Connect outputs of Set node and Dropbox node.
5. **Add HTTP Request node for Adobe Authentication:**
   - Node type: HTTP Request
   - Method: POST
   - URL: `https://pdf-services.adobe.io/token`
   - Content-Type: `application/x-www-form-urlencoded`
   - Authentication: Use a **Custom Auth** credential configured as:
     ```json
     {
       "headers": {
         "Content-Type": "application/x-www-form-urlencoded"
       },
       "body": {
         "client_id": "YOUR_CLIENT_ID",
         "client_secret": "YOUR_CLIENT_SECRET"
       }
     }
     ```
   - Input: Connect from Merge node.
6. **Add HTTP Request node to Create Asset:**
   - Node type: HTTP Request
   - Method: POST
   - URL: `https://pdf-services.adobe.io/assets`
   - Authentication: Use a **Header Auth** credential with header `Authorization: Bearer {{access_token}}` (access_token from previous node).
   - Body: JSON `{ "mediaType": "application/pdf" }`
   - Input: Connect from Authentication node.
7. **Add Merge node to combine query+file with asset info:**
   - Node type: Merge
   - Mode: Combine by position
   - Inputs: From "Query + File" merge node and "Create Asset" node.
8. **Add HTTP Request node to Upload PDF File:**
   - Node type: HTTP Request
   - Method: PUT
   - URL: Dynamic from asset info `uploadUri`
   - Content-Type: binaryData
   - Input Data Field: Set to binary PDF file data
   - Input: Connect from previous Merge node.

9.
   **Add HTTP Request node to Process Query:**
   - Node type: HTTP Request
   - Method: POST
   - URL: `https://pdf-services.adobe.io/operation/{{endpoint}}` (endpoint from merged data)
   - Authentication: Header with Bearer token.
   - Body: JSON merging `assetID` and `json_payload`.
   - Input: Connect from Upload PDF node.
10. **Add Wait node:**
    - Node type: Wait
    - Duration: 5 seconds
    - Input: Connect from Process Query node.
11. **Add HTTP Request node to Try Download Result:**
    - Node type: HTTP Request
    - Method: GET
    - URL: From `location` header of Process Query response
    - Authentication: Bearer token header
    - Input: Connect from Wait node.
12. **Add Switch node:**
    - Node type: Switch
    - Check JSON field `status`
    - Condition branches:
      - If `in progress`: Connect back to Wait node (loop).
      - If `failed` or others: Forward output to next node.
13. **Add Set node to Forward response:**
    - Node type: Set
    - Purpose: Pass final output downstream or back to calling workflow.
    - Input: Connect from Switch node.
14. **Optional:** Add Execute Workflow Trigger node if you want this workflow callable from others.
15. **Create Credentials:**
    - Custom Auth credential for token request with client_id and client_secret in body.
    - Header Auth credential for other API calls with `X-API-Key` header (same as client_id).

---

### 5.
General Notes & Resources

| Note Content | Context or Link |
|---|---|
| Adobe API Wrapper workflow documentation and official API docs: https://developer.adobe.com/document-services/docs/overview/pdf-services-api/howtos/ and https://developer.adobe.com/document-services/docs/overview/pdf-extract-api/gettingstarted/ | Sticky Note at workflow start |
| Credential setup for token request requires creating a Custom Auth credential with client_id and client_secret in the body, Content-Type `application/x-www-form-urlencoded`. | Sticky Note3 |
| Credential setup for all other Adobe API queries requires Header Auth credential with `X-API-Key` header set to client_id. | Sticky Note4 |
| Workflow input expects an object with `endpoint` string, `json_payload` object (excluding assetID), and PDF binary data as input. Examples provided for `splitpdf` and `extractpdf` endpoints. | Sticky Note5 |
| Typical use case: extracting tables as images from PDFs to forward to AI systems for improved recognition accuracy. | Workflow overview section |
| The workflow is designed to be used as a sub-workflow via Execute Workflow node or manually triggered for testing. | Node "Execute Workflow Trigger" presence |
| Rate limits: Adobe free tier allows up to 500 PDF operations per month. | Workflow description |
| Polling delay is set to 5 seconds to avoid overwhelming Adobe API during processing.
| Sticky Note2 |

---

This document fully describes the workflow’s logic, node configuration, and integration points to enable seamless reproduction, modification, and error anticipation.
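
For readers who want to reason about the query-processing loop outside n8n, the following Python sketch mirrors the "Process Query" body construction and the "Wait 5 second" → "Try to download the result" → "Switch" cycle. It is only an illustration under stated assumptions: the function names `build_operation_body` and `poll_until_finished` are hypothetical, the flat merge of `assetID` with `json_payload` is inferred from the note that `json_payload` excludes `assetID`, the `"done"` status in the demo is a placeholder for any non-`in progress` value, and the `max_attempts` cap is an addition that the workflow itself does not have. No network calls are made; the status fetcher is injected so the logic can be exercised with stub data.

```python
import time
from typing import Any, Callable, Dict


def build_operation_body(asset_id: str, json_payload: Dict[str, Any]) -> Dict[str, Any]:
    """Combine the uploaded asset's ID with the caller's json_payload.

    Assumption: the two are merged into one flat JSON object.
    """
    return {"assetID": asset_id, **json_payload}


def poll_until_finished(
    fetch_status: Callable[[], Dict[str, Any]],
    delay_seconds: float = 5.0,
    max_attempts: int = 60,  # hypothetical safety cap, not part of the workflow
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Wait, fetch the job status, and repeat while it is still 'in progress'.

    fetch_status stands in for the GET request against the 'location'
    header returned by the operation call.
    """
    for _ in range(max_attempts):
        sleep(delay_seconds)                         # "Wait 5 second" node
        response = fetch_status()                    # "Try to download the result" node
        if response.get("status") != "in progress":  # "Switch" node
            return response                          # forwarded to the caller as is
    raise TimeoutError(f"job still in progress after {max_attempts} polls")


# Demo with stubbed responses instead of real HTTP calls.
body = build_operation_body(
    "example-asset-id",  # placeholder value
    {"renditionsToExtract": ["tables"], "elementsToExtract": ["text", "tables"]},
)
stub_responses = iter(
    [{"status": "in progress"}, {"status": "in progress"}, {"status": "done"}]
)
result = poll_until_finished(lambda: next(stub_responses), sleep=lambda _s: None)
print(body)
print(result)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without credentials; in a real client, `fetch_status` would wrap an authenticated GET request, and a response without a `status` field would be returned to the caller unchanged, matching the Switch node's fallback branch.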