Configuration examples
Detect duplicates based on fields and relations
This is probably the most common use case for duplicate detection. It detects duplicates based on the following:
- Content of the document_id field
- Content of the supplier_id field (since two suppliers can use the same numbering scheme)
- Binary content of the files ("relation")
If a duplicate is detected, the is_rossum_duplicate field is set to true.
Recommended configuration of the is_rossum_duplicate datapoint:
{
  "rir_field_names": [],
  "constraints": {
    "required": true
  },
  "score_threshold": 0.0,
  "default_value": "false",
  "category": "datapoint",
  "id": "is_rossum_duplicate",
  "label": "is_rossum_duplicate",
  "hidden": true,
  "disable_prediction": true,
  "type": "enum",
  "can_export": false,
  "ui_configuration": {
    "type": "data",
    "edit": "disabled"
  },
  "options": [
    {
      "value": "true",
      "label": "true"
    },
    {
      "value": "false",
      "label": "false"
    }
  ]
}
{
  "configurations": [
    {
      "logic": [
        {
          "rules": [
            {
              "id": 1,
              "attribute": "field",
              "field_schema_id": "document_id"
            },
            {
              "id": 2,
              "attribute": "field",
              "field_schema_id": "supplier_id"
            },
            {
              "id": 3,
              "attribute": "relation"
            }
          ],
          "scope": {
            "object": "queue",
            "statuses": [
              "failed_import",
              "split",
              "to_review",
              "reviewing",
              "in_workflow",
              "confirmed",
              "rejected",
              "exporting",
              "exported",
              "failed_export",
              "postponed",
              "deleted"
            ]
          },
          "actions": [
            {
              "type": "fill_field",
              "field_to_fill": "is_rossum_duplicate",
              "value_to_fill": "true"
            }
          ],
          "matching_flow": ["1and2", "3"]
        }
      ],
      "trigger_events": ["annotation_content"],
      "trigger_actions": ["initialize", "started", "user_update", "updated"]
    }
  ]
}
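To make the interplay of the rules and matching_flow easier to follow, here is a minimal Python sketch of how such a configuration could be evaluated. It is not the extension's actual implementation. It assumes that each matching_flow entry is an alternative (any one entry is enough) and that rule ids joined by "and" must all match, which is consistent with the description above but not confirmed by it. The document dictionaries with "fields" and "file_bytes" keys, the helper names, and the SHA-256 comparison standing in for the binary check are all hypothetical.

```python
import hashlib


def rule_matches(rule, doc_a, doc_b):
    """Return True if the two documents agree on the attribute named by the rule."""
    if rule["attribute"] == "field":
        key = rule["field_schema_id"]
        value_a = doc_a["fields"].get(key)
        value_b = doc_b["fields"].get(key)
        # Assumption: an empty or missing field never counts as a match.
        return bool(value_a) and value_a == value_b
    if rule["attribute"] == "relation":
        # Hypothetical stand-in for the binary comparison: compare content hashes.
        digest_a = hashlib.sha256(doc_a["file_bytes"]).hexdigest()
        digest_b = hashlib.sha256(doc_b["file_bytes"]).hexdigest()
        return digest_a == digest_b
    raise ValueError(f"unknown attribute: {rule['attribute']}")


def is_duplicate(logic, doc_a, doc_b):
    """Assumed semantics: any entry may match; ids joined by 'and' must all match."""
    rules_by_id = {str(rule["id"]): rule for rule in logic["rules"]}
    for entry in logic["matching_flow"]:
        rule_ids = entry.split("and")  # "1and2" -> ["1", "2"], "3" -> ["3"]
        if all(rule_matches(rules_by_id[rule_id], doc_a, doc_b) for rule_id in rule_ids):
            return True
    return False


logic = {
    "rules": [
        {"id": 1, "attribute": "field", "field_schema_id": "document_id"},
        {"id": 2, "attribute": "field", "field_schema_id": "supplier_id"},
        {"id": 3, "attribute": "relation"},
    ],
    "matching_flow": ["1and2", "3"],
}

# Same document number, but a different supplier and a different file.
invoice_a = {"fields": {"document_id": "INV-1", "supplier_id": "S1"}, "file_bytes": b"aaa"}
invoice_b = {"fields": {"document_id": "INV-1", "supplier_id": "S2"}, "file_bytes": b"bbb"}
print(is_duplicate(logic, invoice_a, invoice_b))
```

Under these assumptions, the two sample invoices are not flagged, because they share a document number but come from different suppliers and have different file content.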
Later, users can decide what to do with the is_rossum_duplicate information. Typically, a warning or error message is displayed.
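One possible way to show such a message is a small serverless function that reads the flag and returns a warning. The sketch below is illustrative only: it assumes the function receives the annotation content as a tree of nodes under payload["annotation"]["content"], with "schema_id", "category", "children", and "content" keys, and that returning a "messages" list without a datapoint id produces a document-wide message. The find_datapoint helper and the message text are hypothetical.

```python
def find_datapoint(nodes, schema_id):
    """Depth-first search of the annotation content tree for a datapoint by schema_id."""
    for node in nodes:
        if node.get("category") == "datapoint" and node.get("schema_id") == schema_id:
            return node
        found = find_datapoint(node.get("children") or [], schema_id)
        if found is not None:
            return found
    return None


def rossum_hook_request_handler(payload):
    """Return a warning message when is_rossum_duplicate is set to "true"."""
    content = payload["annotation"]["content"]
    flag = find_datapoint(content, "is_rossum_duplicate")
    value = ((flag or {}).get("content") or {}).get("value")

    messages = []
    if value == "true":
        # No datapoint id is given because the flag field is hidden, so the
        # message is meant to apply to the whole document (assumption).
        messages.append({
            "type": "warning",
            "content": "This document looks like a duplicate of an existing one.",
        })
    return {"messages": messages}
```

Changing "type" to "error" would turn the warning into a blocking message, if that is the desired behavior.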