Description
Summary
When converting a PDF to a DoclingDocument using docling-java, the resulting JSON is missing the mandatory furniture field as defined in the DoclingDocument schema. This makes the document non-compliant with the schema and breaks downstream processing — for example, converting the document to Markdown fails because the furniture root node is absent.
Environment
- docling-java version: 0.4.7
- docling-serve version: v1.14.3
- Input file: https://dserver.bundestag.de/brd/2024/0266-24B.pdf
Steps to Reproduce
- Use docling-java to convert a PDF via the ConvertDocumentRequest API with OutputFormat.JSON and includeImages(false).
- Retrieve the DoclingDocument from response.getDocument().getJsonContent().
- Serialize the DoclingDocument to JSON using Jackson ObjectMapper.
- Observe that the furniture field is absent from the output JSON.
Expected Behavior
The serialized JSON should include the furniture field, as it is a mandatory part of the DoclingDocument schema. When converting the same PDF using the docling-serve web UI (backed by the same docling-serve instance), the furniture field is correctly present:
"furniture": {
"self_ref": "#/furniture",
"parent": null,
"children": [],
"content_layer": "furniture",
"meta": null,
"name": "_root_",
"label": "unspecified"
}
Actual Behavior
The JSON produced by docling-java is missing the furniture field entirely. The document starts directly with body after origin:
{
"schema_name": "DoclingDocument",
"version": "1.9.0",
"name": "4085-original",
"origin": { ... },
"body": { ... }
// ← no "furniture" field
}
Please see the diff of the DoclingDocuments generated via the docling-serve web UI and docling-java in the attached screenshot.
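For anyone who wants to reproduce the comparison without the screenshot, here is a minimal sketch that diffs the top-level keys of the two exports. The helper name top_level_key_diff is hypothetical, and the two JSON strings are trimmed stand-ins based on the snippets above, not the full documents; in practice you would load the two exported files instead.

```python
import json


def top_level_key_diff(reference: dict, candidate: dict) -> dict:
    """Return the top-level keys that appear in only one of the two documents."""
    ref_keys, cand_keys = set(reference), set(candidate)
    return {
        "missing_in_candidate": sorted(ref_keys - cand_keys),
        "extra_in_candidate": sorted(cand_keys - ref_keys),
    }


# Trimmed stand-in for the JSON exported from the docling-serve web UI (assumption).
serve_ui_json = """
{
  "schema_name": "DoclingDocument",
  "version": "1.9.0",
  "name": "4085-original",
  "origin": {},
  "furniture": {"self_ref": "#/furniture", "children": [], "name": "_root_"},
  "body": {}
}
"""

# Trimmed stand-in for the JSON serialized from docling-java (assumption).
docling_java_json = """
{
  "schema_name": "DoclingDocument",
  "version": "1.9.0",
  "name": "4085-original",
  "origin": {},
  "body": {}
}
"""

diff = top_level_key_diff(json.loads(serve_ui_json), json.loads(docling_java_json))
print(diff)
# {'missing_in_candidate': ['furniture'], 'extra_in_candidate': []}
```

Only top-level keys are compared here, which is enough to surface the missing furniture node; nested differences would need a recursive comparison.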
