Overview
This model uses a lightweight Vision-Language Model (VLM) to convert document images into a structured, hierarchical JSON format.
By modeling documents as visual-semantic hierarchies, the model provides a unified representation that captures both layout and meaning, enabling intelligent document understanding and automation.
Key Capabilities:
- Hierarchical Representation: Each document is modeled as a tree, where nodes correspond to layout elements (such as titles, paragraphs, tables, and form fields) and branches represent their visual and logical relationships. This unified format accommodates complex structures, including multi-column layouts, nested forms, and irregular content blocks.
- Preserved Reading Order with Layout Integrity: Unlike traditional OCR pipelines, this model maintains the correct reading flow while preserving layout fidelity. Multi-column documents, embedded tables, and nested components are interpreted and organized so that semantic coherence and visual context remain intact.
- Semantic Relationship Modeling: The hierarchical structure supports explicit parent-child relationships that link document elements semantically. For instance, in structured forms, field labels (e.g., "Name:") are directly linked to their values ("John Doe"), and entire sections can be grouped under relevant headers. This enables precise extraction of not just content but also context and dependencies.
This flexible and extensible representation makes the system adaptable to diverse document types, supporting use cases in document parsing, semantic search, data extraction, and downstream automation.
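The parent-child linkage described above can be pictured with a small tree sketch. The listing does not document the model's output schema, so the field names `label`, `text`, and `children`, the lowercase label values, and the "Patient Information" header are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One layout element. Field names are assumptions, not the model's schema."""
    label: str  # hypothetical values: "header", "question", "answer", "other"
    text: str
    children: List["Node"] = field(default_factory=list)


def from_dict(data: dict) -> Node:
    """Build a Node tree from a nested dict with the assumed keys."""
    return Node(
        label=data["label"],
        text=data["text"],
        children=[from_dict(child) for child in data.get("children", [])],
    )


def question_answer_pairs(node: Node) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) for each answer nested directly under a question."""
    if node.label == "question":
        for child in node.children:
            if child.label == "answer":
                yield node.text, child.text
    for child in node.children:
        yield from question_answer_pairs(child)


# Illustrative document: a header grouping two question nodes with their answers.
example = {
    "label": "header",
    "text": "Patient Information",
    "children": [
        {"label": "question", "text": "Name:",
         "children": [{"label": "answer", "text": "John Doe"}]},
        {"label": "question", "text": "Date of Birth:",
         "children": [{"label": "answer", "text": "March 5, 1982"}]},
    ],
}

tree = from_dict(example)
pairs = list(question_answer_pairs(tree))
```

Because each answer is stored as a child of its question, extracting label-value pairs becomes a plain tree walk rather than a geometric matching step over bounding boxes.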
IMPORTANT USAGE INFORMATION:
After subscribing to this product and creating a SageMaker endpoint, billing occurs on an HOURLY BASIS for as long as the endpoint is running.
- Charges apply even if the endpoint is idle and not actively processing requests.
- To stop charges, you MUST DELETE the endpoint in your SageMaker console.
- Simply stopping requests will NOT stop billing.
Deleting the endpoint when you are finished ensures you are billed only for the time it was running.
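The listing describes deleting the endpoint in the SageMaker console. As an alternative, a minimal sketch of the same step using the boto3 SageMaker client is shown here; the helper name `delete_endpoint_and_wait` and the endpoint name in the usage comment are hypothetical, and the client is passed in so the helper can be exercised without AWS access.

```python
def delete_endpoint_and_wait(sagemaker_client, endpoint_name: str) -> None:
    """Delete a SageMaker endpoint and block until the deletion has completed.

    `sagemaker_client` is expected to be a boto3 client created with
    boto3.client("sagemaker").
    """
    # Request deletion of the endpoint.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    # Poll until SageMaker reports that the endpoint no longer exists.
    waiter = sagemaker_client.get_waiter("endpoint_deleted")
    waiter.wait(EndpointName=endpoint_name)


# Usage (requires boto3 and configured AWS credentials; the name is a placeholder):
#   import boto3
#   delete_endpoint_and_wait(boto3.client("sagemaker"), "my-document-vlm-endpoint")
```

Waiting for the deletion to finish is optional; it is included here so a script can confirm the endpoint is gone before it exits.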
Highlights
- Recognized Element Types:
  * Headers - Section titles and document headings
  * Questions - Form labels such as "Name:" or "Date of Birth:"
  * Answers - Corresponding values like "Jane Doe" or "March 5, 1982"
  * Other - Unstructured content, including paragraphs and descriptive text
  * Parent-Child Relationships - Hierarchical links that capture structure and context
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g4dn.12xlarge Inference (Batch), Recommended | Model inference on the ml.g4dn.12xlarge instance type, batch mode | $23.76 |
| ml.g4dn.12xlarge Inference (Real-Time), Recommended | Model inference on the ml.g4dn.12xlarge instance type, real-time mode | $23.76 |
| ml.g5.xlarge Inference (Batch) | Model inference on the ml.g5.xlarge instance type, batch mode | $5.94 |
| ml.g5.2xlarge Inference (Batch) | Model inference on the ml.g5.2xlarge instance type, batch mode | $5.94 |
| ml.g5.4xlarge Inference (Batch) | Model inference on the ml.g5.4xlarge instance type, batch mode | $5.94 |
| ml.g5.12xlarge Inference (Batch) | Model inference on the ml.g5.12xlarge instance type, batch mode | $23.76 |
| ml.g5.xlarge Inference (Real-Time) | Model inference on the ml.g5.xlarge instance type, real-time mode | $5.94 |
| ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $5.94 |
| ml.g5.4xlarge Inference (Real-Time) | Model inference on the ml.g5.4xlarge instance type, real-time mode | $5.94 |
| ml.g5.12xlarge Inference (Real-Time) | Model inference on the ml.g5.12xlarge instance type, real-time mode | $23.76 |
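
Because billing is per host per hour, a rough estimate is the listed rate multiplied by hours and by host count. The sketch below uses only the rates in the table above; the function name `estimate_cost` is hypothetical, whole hours are assumed for simplicity, and any charges not shown in the table are outside its scope.

```python
from decimal import Decimal

# Listed rates in USD per host per hour, copied from the pricing table.
HOURLY_RATE_USD = {
    "ml.g4dn.12xlarge": Decimal("23.76"),
    "ml.g5.xlarge": Decimal("5.94"),
    "ml.g5.2xlarge": Decimal("5.94"),
    "ml.g5.4xlarge": Decimal("5.94"),
    "ml.g5.12xlarge": Decimal("23.76"),
}


def estimate_cost(instance_type: str, hours: int, hosts: int = 1) -> Decimal:
    """Return listed rate x hours x hosts for one instance type.

    Decimal is used instead of float so that currency amounts stay exact.
    """
    if hours < 0 or hosts < 1:
        raise ValueError("hours must be >= 0 and hosts must be >= 1")
    return HOURLY_RATE_USD[instance_type] * Decimal(hours) * Decimal(hosts)


# An idle ml.g5.xlarge endpoint left running for one day on a single host.
one_day_idle = estimate_cost("ml.g5.xlarge", hours=24)
```

The same arithmetic applies whether or not the endpoint serves any requests, since charges accrue for as long as it is running.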
Vendor refund policy
No refunds are possible.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
This application leverages a lightweight Vision-Language Model (VLM) to transform document images into a structured, hierarchical JSON representation. By representing documents as hierarchies, we unlock a powerful and versatile way to capture layout and meaning in a unified format.
Additional details
Inputs
- Summary
Input Format
Chat Completion
Example Payload
Online Image Example
{
  "model": "/opt/ml/model",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "url": "https://raw.githubusercontent.com/JohnSnowLabs/visual-nlp-workshop/7f5eec01dd96897dccb064d1e42a4ef2e90083a0/jupyter/data/funsd/83823750.png"
        }
      ]
    }
  ]
}
For additional parameters:
Offline Image Example (Base64)
{
  "model": "/opt/ml/model",
  "messages": [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": "data:image/jpeg;base64,..."
        }
      ]
    }
  ]
}
Reference:
Important Notes:
- Model Path Requirement: Always set "model": "/opt/ml/model" (SageMaker's fixed model location).
- Input MIME type
- application/json
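A minimal sketch of building the Base64 payload and sending it as `application/json` follows. The payload mirrors the offline example without the optional system message. The helper names `build_payload` and `invoke` are hypothetical, the client is assumed to be a boto3 `sagemaker-runtime` client, and parsing the response body as JSON is an assumption, since the listing does not describe the output format.

```python
import base64
import json

MODEL_PATH = "/opt/ml/model"  # fixed value required by the listing


def build_payload(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in a chat-completion payload as a Base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_PATH,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": f"data:{mime_type};base64,{encoded}",
                    }
                ],
            }
        ],
    }


def invoke(runtime_client, endpoint_name: str, payload: dict) -> dict:
    """Send the payload to a real-time endpoint and decode the reply.

    Assumes the endpoint returns a JSON body; adjust if it does not.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Body"].read())


# Usage (requires boto3 and AWS credentials; file and endpoint names are placeholders):
#   import boto3
#   with open("form.jpg", "rb") as f:
#       payload = build_payload(f.read())
#   result = invoke(boto3.client("sagemaker-runtime"), "my-document-vlm-endpoint", payload)
```

Passing the client in as an argument keeps the payload logic separate from AWS access, which makes it straightforward to check the request shape locally before deploying.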
Support
Vendor support
For any assistance, please reach out to support@johnsnowlabs.com.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.