AWS Architecture Blog

Revolutionizing agricultural knowledge management using a multi-modal LLM: A reference architecture

Handwritten documents are still an important form of data capture in agribusiness. Paper-based handwritten documents can be the result of business culture, lack of internet connectivity, lack of mobile devices or computers, or environmental conditions in the field or in an industrial setting. Because of the physical nature of the document, transcription into a digital system for enterprise reporting might be delayed or never happen at all, leaving critical information unavailable. Using generative AI, scanned handwritten notes can be transcribed and analyzed, establishing automated workflows for product procurement, the supply chain, and entry into customer relationship management (CRM), enterprise resource planning (ERP), and farm management information systems (FMIS).

Multi-modal large language models (LLMs) are transforming the agriculture industry by integrating diverse data types such as text, images, video, and audio. This approach enhances AI’s understanding and decision-making in farming contexts. For example, a multi-modal LLM can analyze images to identify crop issues, then generate targeted recommendations for irrigation or pest control. Combining handwritten documents and satellite imagery with the power of LLMs can lead to better crop analytics and better yields.

In this blog post, we introduce a reference architecture that offers an intelligent document digitization solution that converts handwritten notes, scanned documents, and images into editable, searchable, and accessible formats. Powered by Anthropic’s Claude 3 on Amazon Bedrock, the solution uses the sophisticated vision capabilities of LLMs to process a wide range of visual formats, preserving the original formatting while extracting text, tables, and images. This enables businesses to digitize their knowledge bases, facilitate seamless collaboration, and integrate the digitized content into their existing digital workflows, enhancing productivity and unlocking the full potential of their information assets.

A comprehensive solution and reference architecture

This reference architecture helps agricultural companies automatically capture, analyze, and process handwritten notes and images containing data and reports generated by individuals working in farm fields. It shows how to create an end-to-end solution that ingests these documents as images using Amazon Bedrock. Downstream systems such as CRM, ERP, and FMIS can consume the processed information to make better data-driven decisions.

The solution uses Anthropic’s Claude 3 multi-modal model hosted on Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Claude is Anthropic’s state-of-the-art LLM that offers important features for enterprises such as advanced reasoning, generating text from images, code generation, and multilingual processing. Claude 3 models have sophisticated vision capabilities and can process a wide range of visual formats, including photos, charts, graphs, and technical diagrams. Alternatively, you can use other models that support vision tasks, such as Llama 3.2 11B and 90B.
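To make the model interaction concrete, the following is a minimal sketch of sending a scanned note to Claude 3 through the Amazon Bedrock Runtime API using the Anthropic Messages request format. The model ID, file name, and prompt are illustrative assumptions; check the Amazon Bedrock console for the model identifiers available in your Region.

```python
import base64
import json

# Assumed model ID for illustration; verify availability in your Region.
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def build_vision_request(image_bytes: bytes, prompt: str,
                         media_type: str = "image/jpeg") -> str:
    """Build an Anthropic Messages API body pairing an image with a text prompt."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-06-01",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("utf-8")}},
                {"type": "text", "text": prompt},
            ],
        }],
    })

def transcribe_image(image_bytes: bytes, prompt: str) -> str:
    """Invoke the model and return its text response.

    Requires AWS credentials and Bedrock model access, so the boto3
    import is deferred to keep the request builder testable offline.
    """
    import boto3
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(modelId=MODEL_ID,
                                   body=build_vision_request(image_bytes, prompt))
    return json.loads(response["body"].read())["content"][0]["text"]
```

A prompt such as "Transcribe the handwritten text in this note, preserving any tables" steers the model toward format-preserving extraction.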

The following diagram illustrates the reference solution.

The process includes the following steps:

  1. A field worker uploads handwritten notes in an image format using a static website on their mobile device. The static website is accessed through Amazon CloudFront and hosted in Amazon Simple Storage Service (Amazon S3).
  2. The worker is securely authenticated using Amazon Cognito.
  3. After the worker is authenticated, the uploaded handwritten notes are sent to Amazon Bedrock for processing using Amazon API Gateway.
  4. An AWS Lambda function stores and reads the image from Amazon S3. It sends the uploaded image and associated prompt information to Anthropic’s Claude 3 hosted in Amazon Bedrock.
  5. Anthropic’s Claude 3 processes the image. It recognizes the handwritten text and analyzes the converted text based on the given prompt.
  6. The converted digital text and analyzed information provided by Anthropic’s Claude 3 are stored in Amazon DynamoDB for further downstream processing.
  7. The field worker uses an app to access the converted digital text and newly processed information stored in Amazon DynamoDB through API Gateway.
  8. The processed information is published to Amazon Simple Notification Service (Amazon SNS) and is consumed by downstream systems.
  9. The field worker’s location details and processed image information are consumed by two different Amazon Simple Queue Service (Amazon SQS) queues to be stored in downstream systems.
  10. The downstream systems can include CRM, ERP, and FMIS.
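The processing core of steps 4 through 8 can be sketched as a single Lambda handler. This is a minimal illustration, not the post's implementation: the event shape, table name, and SNS topic are assumptions, and error handling is omitted for brevity.

```python
import base64
import json

def build_record(note_id: str, transcription: str) -> dict:
    """Shape the DynamoDB item that stores the digitized note (step 6)."""
    return {
        "noteId": {"S": note_id},
        "transcription": {"S": transcription},
    }

def handler(event, context):
    """Lambda sketch for steps 4-8; boto3 import deferred for offline testing."""
    import boto3
    s3 = boto3.client("s3")
    bedrock = boto3.client("bedrock-runtime")
    dynamodb = boto3.client("dynamodb")
    sns = boto3.client("sns")

    # Step 4: read the uploaded handwritten note from Amazon S3.
    image = s3.get_object(Bucket=event["bucket"], Key=event["key"])["Body"].read()

    # Step 5: ask Claude 3 to recognize and analyze the handwriting.
    body = json.dumps({
        "anthropic_version": "bedrock-2023-06-01",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": [
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/jpeg",
                "data": base64.b64encode(image).decode("utf-8")}},
            {"type": "text",
             "text": "Transcribe this handwritten field note and summarize any action items."},
        ]}],
    })
    result = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0", body=body)
    text = json.loads(result["body"].read())["content"][0]["text"]

    # Step 6: persist the digitized text for downstream processing.
    dynamodb.put_item(TableName="FieldNotes",
                      Item=build_record(event["key"], text))

    # Step 8: fan out to downstream systems (CRM, ERP, FMIS) via Amazon SNS.
    sns.publish(TopicArn=event["topicArn"],
                Message=json.dumps({"noteId": event["key"]}))
    return {"statusCode": 200}
```

In practice the API Gateway integration (step 3) would pass the S3 object key and topic ARN in the event, and the two SQS queues in step 9 would subscribe to the SNS topic.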

Additionally, using this solution, geospatial information such as GPS and GIS information can be sent to the FMIS. This can help farmers in many ways including crop monitoring, soil health and nutrient management, pest control, water management, farm mapping, and much more.
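One way to carry that geospatial context to the FMIS queue in step 9 is a GeoJSON payload on the SQS message. The queue URL and field names below are illustrative assumptions.

```python
import json

def build_geo_message(note_id: str, lat: float, lon: float) -> str:
    """Geospatial payload for the FMIS queue; GeoJSON orders coordinates lon, lat."""
    return json.dumps({
        "noteId": note_id,
        "location": {"type": "Point", "coordinates": [lon, lat]},
    })

def send_location(queue_url: str, note_id: str, lat: float, lon: float) -> None:
    """Publish the worker's location to the downstream SQS queue (step 9)."""
    import boto3  # deferred so the builder above is testable offline
    boto3.client("sqs").send_message(
        QueueUrl=queue_url,
        MessageBody=build_geo_message(note_id, lat, lon))
```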

Best practices and implementation guidelines

To implement a production-ready system, it’s important to consider the following best practices.

Responsible AI: Deployment of customer-facing generative AI solutions raises concerns about responsible AI practices. To mitigate risks such as biased outputs, exposure of sensitive information, or misuse for malicious purposes, it’s crucial to implement robust safeguards and validation mechanisms. Amazon Bedrock Guardrails is a set of tools and services provided by AWS that you can use to implement safeguards and responsible AI practices when building applications with generative AI models.
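A configured guardrail can be attached directly to model invocations via the Bedrock Runtime API's guardrail parameters. The guardrail identifier and version below are placeholders you would replace with values from your own Amazon Bedrock Guardrails configuration.

```python
def guarded_invoke_kwargs(model_id: str, body: str,
                          guardrail_id: str, guardrail_version: str) -> dict:
    """Keyword arguments for invoke_model with a guardrail attached.

    guardrailIdentifier/guardrailVersion tell Bedrock to screen the
    request and response; trace="ENABLED" returns intervention details.
    """
    return {
        "modelId": model_id,
        "body": body,
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "trace": "ENABLED",
    }

def invoke_with_guardrail(model_id: str, body: str,
                          guardrail_id: str, guardrail_version: str):
    import boto3  # deferred so the kwargs builder is testable offline
    client = boto3.client("bedrock-runtime")
    return client.invoke_model(
        **guarded_invoke_kwargs(model_id, body, guardrail_id, guardrail_version))
```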

Security: Follow secure coding practices throughout the development lifecycle to minimize vulnerabilities. Protect your web applications from common exploits by integrating with AWS WAF. The OWASP Top 10 for Large Language Model Applications is a set of guidelines that address the unique security risks associated with generative AI solutions. It covers vulnerabilities such as model inversion, membership inference, and adversarial attacks—all of which can compromise the confidentiality, integrity, and availability of LLMs.

Observability: Monitor all layers of a generative AI solution, including the application, prompt, LLM, knowledge base, and the response provided by the LLM. You can monitor health and performance using Amazon CloudWatch.
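Beyond the service metrics CloudWatch collects automatically, you can publish custom metrics for the LLM layer, such as per-invocation latency. The namespace and metric name here are illustrative assumptions.

```python
def build_latency_metric(model_id: str, latency_ms: float) -> dict:
    """A custom CloudWatch metric datum for model invocation latency."""
    return {
        "MetricName": "ModelInvocationLatency",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Value": latency_ms,
        "Unit": "Milliseconds",
    }

def publish_metric(datum: dict, namespace: str = "DocDigitization") -> None:
    """Send the datum to CloudWatch; boto3 import deferred for offline testing."""
    import boto3
    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=[datum])
```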

LLMOps: Implementing LLM operations (LLMOps) will help you scale your generative AI solutions. See FMOps/LLMOps: Operationalize generative AI and differences with MLOps for additional information.

Conclusion

In this post, we introduced a reference architecture for an intelligent document digitization solution in agriculture. This system uses Amazon Bedrock and the multi-modal capabilities of LLMs such as Anthropic’s Claude 3 to transform handwritten notes and multi-modal data into searchable, digital formats. We explored how this architecture bridges the gap between traditional field documentation and modern digital systems, enhancing data accessibility and decision-making in agribusiness.

The possibilities for customization and expansion are vast. For specific use cases, you can fine-tune the multi-modal model on your unique agricultural business data. You can also implement a combination of multi-modal processing and a specialized knowledge base using Amazon Bedrock Knowledge Bases, further enhancing the system’s accuracy and relevance.


About the Authors

Nitin Eusebius

Nitin Eusebius is a Principal Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architectures, and AI/ML. He is deeply passionate about exploring the possibilities of next generation cloud architectures and generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Nipun Chagari

Nipun Chagari is a Sr Manager, Solutions Architecture based in the Bay Area, CA. Nipun leads next generation cloud architectures and generative AI initiatives, providing technical advisory to enterprise customers. He helps organizations adopt Serverless technology to modernize applications and achieve business objectives. Apart from work, he enjoys volleyball, cooking and traveling with his family.

Tamil Sambasivam

Tamil Sambasivam is a Solutions Architect and AI/ML Specialist at AWS. She helps enterprise customers solve their business problems by recommending the right AWS solutions. Her strong background in information technology (24+ years of experience) helps customers strategize, develop, and modernize their workloads in the AWS Cloud. In her spare time, Tamil likes to travel and garden.

Santiago Flores Kanter

Santiago Flores Kanter is a Senior Solutions Architect at AWS specializing in machine learning and serverless. Based in Denver, Colorado, he designs and supports cloud solutions for digital-native customers.