
Export pipelines

After documents are processed in Rossum, the extracted data typically needs to be exported to downstream systems. For this purpose, the Rossum team has prepared an "Export pipeline".

Additional info for Rossum employees

Please visit the following restricted link to learn more: https://rossumai.atlassian.net/l/cp/t2we9106

Components of Export pipelines

The export pipeline consists of the following components chained together:

  1. Custom format templating purge, a cleaning mechanism that clears data left over from previous exports and so prepares the pipeline for a new export.
  2. Custom format templating, which renders the extracted data in the format required for export.
  3. REST API export, which sends the prepared data to a REST API and stores the reply.
  4. Data value extractor, which extracts important values from the API reply (e.g. the downstream document ID or the HTTP status code) and stores them in the annotation object.
  5. Export evaluator, which decides whether the export succeeded or failed.
  6. Finally, SFTP export, which uploads the prepared data to SFTP or S3 file storage.
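To show how data flows through the chain, here is a minimal sketch in plain Python. It does not use any Rossum API. Each component is modeled as a hypothetical function that reads and writes a shared dict standing in for the annotation object. The key names, the mocked API reply, and the file name are all illustrative assumptions.

```python
from typing import Callable

# Hypothetical stand-in for the annotation object shared by all extensions.
State = dict


def purge(state: State) -> None:
    # Templating purge: drop anything left over from a previous export run.
    for key in ("rendered", "api_reply", "extracted", "export_ok", "uploaded_as"):
        state.pop(key, None)


def render_template(state: State) -> None:
    # Custom format templating: build the payload from extracted fields.
    fields = state["fields"]
    state["rendered"] = {"invoice_id": fields["invoice_id"], "total": fields["total"]}


def call_rest_api(state: State) -> None:
    # REST API export (mocked): pretend the downstream system accepted the data.
    doc_id = "DS-" + state["rendered"]["invoice_id"]
    state["api_reply"] = {"status_code": 201, "body": {"id": doc_id}}


def extract_values(state: State) -> None:
    # Data value extractor: keep only what later steps need.
    reply = state["api_reply"]
    state["extracted"] = {
        "status_code": reply["status_code"],
        "downstream_id": reply["body"].get("id"),
    }


def evaluate(state: State) -> None:
    # Export evaluator: success means a 200 or 201 status code.
    state["export_ok"] = state["extracted"]["status_code"] in (200, 201)


def upload(state: State) -> None:
    # SFTP export (mocked): record the file name that would be uploaded.
    state["uploaded_as"] = f"{state['rendered']['invoice_id']}.json"


PIPELINE: list[Callable[[State], None]] = [
    purge, render_template, call_rest_api, extract_values, evaluate, upload,
]


def run_pipeline(state: State) -> State:
    # Run every component in order over the same shared state.
    for step in PIPELINE:
        step(state)
    return state
```

For example, `run_pipeline({"fields": {"invoice_id": "INV-1", "total": "100.00"}, "export_ok": False})` first purges the stale `export_ok` value and then ends with `export_ok` set to `True`.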

How to use Export pipelines

All components of an Export pipeline are typically connected using the standard extension chaining mechanism, "run-after". Several example extension chains are shown below:
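A run-after chain is a dependency graph in which each extension names the extension(s) it must follow. The sketch below uses Python's standard `graphlib` module to derive an execution order from such a mapping. It only illustrates the ordering idea. The extension names are hypothetical and do not reflect Rossum's actual hook configuration format.

```python
from graphlib import TopologicalSorter

# Hypothetical extension names; each maps to the extensions it runs after.
RUN_AFTER: dict[str, set[str]] = {
    "templating_purge": set(),
    "custom_format_templating": {"templating_purge"},
    "rest_api_export": {"custom_format_templating"},
    "data_value_extractor": {"rest_api_export"},
    "export_evaluator": {"data_value_extractor"},
}


def execution_order(run_after: dict[str, set[str]]) -> list[str]:
    # TopologicalSorter takes node -> predecessors, which matches "run after".
    # It raises graphlib.CycleError if the chain loops back on itself.
    return list(TopologicalSorter(run_after).static_order())
```

Because the chain above is linear, `execution_order(RUN_AFTER)` yields the purge first and the evaluator last.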

Simple SFTP export pipeline

  1. Custom format templating prepares the extracted data in the desired format.
  2. SFTP export stores the data on an SFTP server (or in S3).
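The two-step SFTP chain can be illustrated with a minimal sketch. It assumes a CSV target format and line-item field names chosen only for this example. A local directory stands in for the SFTP server or S3 bucket, so no real transfer happens and no Rossum or SFTP library is used.

```python
import csv
import io
import tempfile
from pathlib import Path

# Hypothetical target columns for the exported file.
FIELDNAMES = ["description", "quantity", "amount"]


def render_csv(line_items: list[dict[str, str]]) -> str:
    # Templating step: turn extracted line items into CSV text.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(line_items)
    return buffer.getvalue()


def store(content: str, target_dir: Path, filename: str) -> Path:
    # Stand-in for the SFTP/S3 upload: write the file into a local directory.
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    path.write_text(content, encoding="utf-8")
    return path


if __name__ == "__main__":
    items = [{"description": "Widget", "quantity": "2", "amount": "10.00"}]
    with tempfile.TemporaryDirectory() as tmp:
        print(store(render_csv(items), Path(tmp) / "outbox", "export.csv").read_text())
```

In a real setup, `store` would be replaced by the SFTP export extension, and `render_csv` by whatever template the custom format templating extension is configured with.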

Simple API export pipeline

  1. (Optional) Pipeline cleaning removes data from previous exports (relevant for debugging).
  2. Custom format templating prepares the extracted data in the desired format.
  3. REST API export calls an external API service and sends it the prepared data. The extension also stores the returned values, including the status code.
  4. Data value extractor stores the needed information in the document (e.g. the status code).
  5. Export evaluator decides, based on a condition, whether the export was successful (e.g. the status code is 200 or 201).
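Steps 4 and 5 can be illustrated with a minimal sketch. It assumes the stored API reply is a JSON string shaped like `{"status_code": ..., "body": "<JSON text>"}` and that the downstream document ID sits under `id`. Both are hypothetical conventions for this example, not the actual format Rossum stores.

```python
import json

# Status codes treated as a successful export, as in the example condition above.
SUCCESS_CODES: frozenset[int] = frozenset({200, 201})


def extract_values(raw_reply: str) -> dict:
    # Data value extractor: pull the status code and downstream ID from the reply.
    reply = json.loads(raw_reply)
    body = json.loads(reply.get("body") or "{}")  # an empty body yields no ID
    return {"status_code": int(reply["status_code"]), "document_id": body.get("id")}


def evaluate_export(extracted: dict, success_codes: frozenset[int] = SUCCESS_CODES) -> bool:
    # Export evaluator: the export succeeds only for an accepted status code.
    return extracted["status_code"] in success_codes
```

Keeping the success condition in a single set makes it easy to adjust when a downstream API signals success with other codes, such as 202.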

Complex API export pipeline

Work in progress

We're still working on this part and would love to hear your thoughts! Feel free to share your feedback or submit a pull request. Thank you! 🙏