
Structured formats import

Structured formats import lets you import and process non-visual documents such as JSON or XML files. It extracts the information from these files and also renders a minimalistic PDF representation that makes manual review easier.

Installation

Structured formats import is a webhook maintained by Rossum. To use it, follow these steps:

  1. Log in to your Rossum account.
  2. Navigate to Extensions → My extensions.
  3. Click on Create extension.
  4. Fill in the following fields:
    1. Name: Structured formats import
    2. Trigger events: Document content: Initialize, Started, Updated
    3. Extension type: Webhook
    4. URL (see below)
  5. Click Create the webhook.
  6. Fill in the Configuration field (see Configuration examples).

Environment      Webhook URL
EU1 Ireland      https://elis.task-manager.rossum-ext.app/api/v1/tasks/structured-formats-import
EU2 Frankfurt    https://shared-eu2.task-manager.rossum-ext.app/api/v1/tasks/structured-formats-import
US east coast    https://us.task-manager.rossum-ext.app/api/v1/tasks/structured-formats-import
Japan Tokyo      https://shared-jp.task-manager.rossum-ext.app/api/v1/tasks/structured-formats-import
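
If you set up the extension in several environments, it can help to keep the webhook URLs in one place. The following is a minimal sketch that only reproduces the environment list above; the names `WEBHOOK_HOSTS`, `WEBHOOK_PATH`, and `webhook_url` are hypothetical helpers made up for this illustration and are not part of any Rossum library.

```python
# Hypothetical helper: maps each environment listed above to its webhook URL.
WEBHOOK_PATH = "/api/v1/tasks/structured-formats-import"

WEBHOOK_HOSTS = {
    "EU1 Ireland": "elis.task-manager.rossum-ext.app",
    "EU2 Frankfurt": "shared-eu2.task-manager.rossum-ext.app",
    "US east coast": "us.task-manager.rossum-ext.app",
    "Japan Tokyo": "shared-jp.task-manager.rossum-ext.app",
}


def webhook_url(environment: str) -> str:
    """Return the webhook URL to paste into the URL field of the extension."""
    try:
        host = WEBHOOK_HOSTS[environment]
    except KeyError:
        known = ", ".join(sorted(WEBHOOK_HOSTS))
        raise ValueError(
            f"Unknown environment {environment!r}; expected one of: {known}"
        ) from None
    return f"https://{host}{WEBHOOK_PATH}"


if __name__ == "__main__":
    print(webhook_url("EU2 Frankfurt"))
```

The lookup raises `ValueError` for an unknown environment name instead of silently building a URL that does not exist.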

Basic usage

Work in progress

We're still working on this part and would love to hear your thoughts! Feel free to share your feedback or submit a pull request. Thank you! 🙏

Available configuration options

{
  // Various independent configurations that can be conditionally triggered via `trigger_condition`:
  "configurations": [
    {
      "trigger_condition": {
        "file_type": "xml"
      },

      // Optional. Selectors at which the original XML/JSON file should be split into smaller ones:
      "split_selectors": ["/RecordLabel/Productions/Production"],

      // Fields to be extracted from the source file and assigned to the given datapoints:
      "fields": [
        {
          "schema_id": "document_id",

          // If multiple selectors are specified, they serve as a fallback list.
          "selectors": ["./Metadata/ID"]
        }
      ],

      // Optional specification of the original PDF file that should be extracted from the source
      // file (base64 encoded):
      "pdf_file": {
        "name_selectors": [
          "cac:AdditionalDocumentReference/cac:Attachment/cbc:EmbeddedDocumentBinaryObject/@filename"
        ],

        // Content should be base64 encoded:
        "content_selectors": [
          "cac:AdditionalDocumentReference/cac:Attachment/cbc:EmbeddedDocumentBinaryObject"
        ]
      }
    }
    // …
  ]
}
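
To make the roles of `split_selectors` and the `selectors` fallback list more concrete, here is a rough Python sketch of the idea using only the standard library. It is not the webhook's actual implementation. The sample XML, the extra `./Metadata/DocumentID` selector, and the function names are all invented for illustration. The sketch also assumes that the first split selector with any match is the one used, which the configuration reference above does not confirm.

```python
import xml.etree.ElementTree as ET

# Invented sample document matching the selectors used in the configuration above.
SAMPLE_XML = """
<RecordLabel>
  <Productions>
    <Production><Metadata><ID>A-001</ID></Metadata></Production>
    <Production><Metadata><DocumentID>B-002</DocumentID></Metadata></Production>
  </Productions>
</RecordLabel>
"""

CONFIGURATION = {
    "split_selectors": ["/RecordLabel/Productions/Production"],
    "fields": [
        {
            "schema_id": "document_id",
            # The second selector is a made-up fallback for this example.
            "selectors": ["./Metadata/ID", "./Metadata/DocumentID"],
        }
    ],
}


def split_document(root, split_selectors):
    """Return the sub-elements to process separately (assumption: first matching selector wins)."""
    # ElementTree cannot evaluate absolute paths on an element, so the root is
    # wrapped in a holder and the absolute selector is evaluated relative to it.
    holder = ET.Element("holder")
    holder.append(root)
    for selector in split_selectors:
        path = "." + selector if selector.startswith("/") else selector
        parts = holder.findall(path)
        if parts:
            return parts
    return [root]  # nothing matched: treat the whole file as one document


def extract_fields(element, fields):
    """Try each selector in order and keep the first non-empty value."""
    values = {}
    for field in fields:
        values[field["schema_id"]] = None
        for selector in field["selectors"]:
            match = element.find(selector)
            if match is not None and match.text and match.text.strip():
                values[field["schema_id"]] = match.text.strip()
                break
    return values


def process(xml_text, configuration):
    root = ET.fromstring(xml_text.strip())
    parts = split_document(root, configuration.get("split_selectors", []))
    return [extract_fields(part, configuration["fields"]) for part in parts]


if __name__ == "__main__":
    for values in process(SAMPLE_XML, CONFIGURATION):
        print(values)
```

In this sample, the first `Production` is resolved by `./Metadata/ID`. The second has no `ID` element, so its value comes from the fallback selector.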