AWS Public Sector Blog
Accelerating workflows with generative AI: EPA’s document processing journey
The United States Environmental Protection Agency (EPA)’s “Powering the Great American Comeback” initiative includes a pillar to make the U.S. the artificial intelligence (AI) capital of the world. EPA’s Office of Chemical Safety and Pollution Prevention (OCSPP) and Office of Pesticide Programs (OPP) are putting those words into action. At the AWS Summit Washington, DC 2025, scientists from the AWS Generative AI Innovation Center (GenAIIC) and EPA presented two proof-of-concepts (POCs) that can improve time to market and save millions of taxpayer dollars.
EPA’s mission is to protect human health and the environment, which includes protection from risks posed by pesticides and chemicals in commerce. In 2024, EPA and the AWS GenAIIC built the POCs to demonstrate efficiency gains that they can achieve by leveraging generative AI to reduce manual work and help EPA scientists to evaluate studies. In this post, we highlight how two EPA leaders are reimagining EPA’s mission delivery with intelligent document processing (IDP), powered by Amazon Web Services (AWS).
AI for science
EPA has reported its intent to use AI to assist staff as opposed to make policy decisions. When using AI for scientific work, it is important to ensure scientific integrity of the results. The AWS GenAIIC and EPA approached this with:
- Transparency: The system clearly displays confidence scores for AI-generated content like text extraction and highlights when responses have been human-reviewed.
- Verification tools: EPA scientists can ask questions to a chatbot integrated with the document to verify generated responses (e.g., “What was the sample size in this experiment?” or “How was randomization performed?”).
- Complete human control: Scientists review all AI-generated content and can modify or override any responses before accepting them.
- Responsible AI practices: The solution implements Amazon Bedrock Guardrails to prevent AI hallucination and ensure scientific accuracy.
Accelerating EPA’s chemical risk assessments
EPA evaluates thousands of chemicals, and bases policy recommendations on research studies conducted by research institutions, industry, government, and other sources of studies. A key step is to evaluate the quality of the studies based on criteria that is specific to the goal of an assessment. Today, two EPA scientists must evaluate criteria including, but not limited to, reporting quality, selection and performance, variable control, exposure methods, results presentation, and overall study confidence. This step has traditionally been a labor-intensive and time-consuming process requiring experts to review evidence and address criteria. Each criterion gets a judgement of good, adequate, deficient, or critically deficient. If the two scientists disagree on a judgement, a third scientist is brought in to break the tie. This process is labor- and time-intensive. With limited staff and mandates to assess more chemicals, EPA needed a more efficient approach without sacrificing scientific rigor.
Sean Watford, senior environmental data and systems scientist at EPA, sought an alternative. The AWS GenAIIC and Watford built a POC to evaluate studies based on predetermined criteria. Scientists from the AWS GenAIIC worked with EPA scientists to understand the specific EPA mission need, keep scientists in control, measure accuracy, and qualify the solution’s efficiency gains and return on investment (ROI).
Watford aimed to increase the rate at which his team could provide high-quality results for chemical assessments. The traditional manual review process was creating bottlenecks in their ability to meet deadlines. A new process could transform a task that currently takes an average of one hour per person to complete into a task that takes a few minutes. The AI could handle the laborious document analysis while the scientists maintain complete control over the final scientific determinations.
Here is how the chemical assessment process could work:
- IDP: When research papers are uploaded to Amazon Simple Storage Service (Amazon S3), Amazon Textract extracts and processes the text while preserving layout information. The new process was run on studies which had already been evaluated by EPA scientists, without any pre-training of the foundation models (FMs) on Amazon Bedrock.
- Automated evaluation: Using Amazon Bedrock with Anthropic Claude 3.7 Sonnet, the system generates responses to nine standard evaluation criteria that scientists historically had to produce manually. The criteria are defined in and retrieved from EPA Health Assessment Workspace Collaborative (HAWC), which is a collaborative platform for environmental research and repository for evidence collected in EPA science assessments.
- Human-in-the-loop (HITL) review: Scientists review AI-generated responses through an intuitive interface, where they can verify information by directly querying the document using a complementary chatbot that allows scientists to ask questions about specific sections of the research paper.
- Quality control: Scientists maintain complete control over the process, accepting or modifying the AI-generated evaluations before finalizing them for publication to HAWC. For this experiment, the model did well where the criteria were clearly defined. Where model inferences were not the same as previous human reviews of the same document, it was typically found that the prompt could be improved to facilitate more accurate judgement.
- Multi-document intelligence: The solution includes multi-document chat capability across sets of papers, allowing scientists to find patterns and insights across multiple studies.
- Seamless integration: The system is extendible to work with existing EPA workflows and systems to ensure a smooth transition for scientists without disrupting established processes.
- Cost: The pipeline processes studies in batches to reduce the cost of Amazon Bedrock inferences.
This solution can help EPA to improve key metrics, including:
- 85 percent reduction in processing time: Tasks that previously took months can now be completed in hours or days. The EPA can evaluate more chemicals in less time, and increase public health protections, like identifying chemicals that may disproportionately affect vulnerable populations.
- 85 percent accuracy rate: The AI-generated evaluations achieve high accuracy when compared to manually derived results. The models provide more consistent evaluations across chemical assessments, and EPA scientists can redirect their expertise toward complex questions that require human judgment.
- Cost-effective solution: Processing 250 research documents with nine evaluation prompts costs approximately $40 in Amazon Bedrock usage, which is less than one hour of staff time. In comparison, the traditional method takes roughly 500 hours from EPA scientists to do manually. The EPA makes effective use of taxpayer dollars by gaining efficiency and reducing labor costs.
Watford considered the experiment a success. It captured the performance baselines for the tool and proved that FMs can assist EPA scientists to accelerate mission delivery. Watford plans to continue to improve the tool by optimizing the study quality criteria language to get better AI-generated outputs.
Cooperative federalism: EPA’s efforts to streamline FIFRA applications
The Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) empowers EPA to regulate pesticides in the United States. Companies in the pesticide industry need to get EPA’s approval to manufacture and market their products by demonstrating they do not present undue risk to human health or the environment. OPP receives more than ten thousand applications for regulatory approval per year, and 20 percent of these applications are fee for service (FFS)—meaning pesticide companies pay EPA, and EPA is subject to the Pesticide Registration Improvement Act (PRIA) statutory deadlines. OPP has a statutory deadline of four to 36 months to process and approve these applications.
OPP faces challenges to meet these statutory deadlines due to staffing shortages, staff retention, and cumbersome document review processes. Today, when a company submits a PRIA application to OPP, they submit a large volume of scientific studies related to the health and environmental exposure risk of their product. OPP scientists rigorously evaluate the studies for their adherence to guidelines prior to granting approvals. The first step is for OPP to create a data evaluation record (DER) for each study, and then review it for accuracy. Due to the volume of work, when done manually, the DER creation step can take four months, which is more than half of statutory review time.
In 2024, the AWS GenAIIC and Daniel Schoeff, senior advisor at EPA OPP, sought to solve these challenges. The POC inputs the health and safety studies, and outputs the DER in seconds. The DER is comprised of extracted text and literature summaries, which can be done quickly and at a fraction of the cost with Amazon Textract and Amazon Bedrock. The POC also boosts OPP scientist productivity with intelligent search capabilities. Schoeff said this could save EPA four months of waiting time per PRIA case, millions of dollars, and prevents companies from encountering competitive disadvantages due to uncertainty and costly waiting periods in EPA’s processes.
The GenAIIC demonstrated that Amazon Bedrock can make a four-month process happen in seconds, while reducing cost by 99 percent and improving staff satisfaction. By eliminating the need for manual data entry and associated costs, the solution not only pays for itself but also helps EPA to improve its on-time performance.
The FIFRA IDP pipeline uses S3, Amazon Textract, Anthropic Claude FMs on Amazon Bedrock, Amazon Titan embeddings models, Amazon DynamoDB, and Amazon OpenSearch Service to search with Amazon Bedrock Knowledge Bases. Here is how the new FIFRA process works:
- IDP: A batch job gets documents from an S3 bucket, extracts text with Amazon Textract, and provides the guidelines in the prompt to Claude 3.7 on Amazon Bedrock.
- Vector embeddings: The DERs are converted to vector format with Amazon Titan embeddings models, and the values are stored in Amazon OpenSearch Service. This enables intelligent search capabilities for OPP scientists to ask detailed questions about company, product, and ingredient history.
- Intelligent search: Amazon Bedrock Knowledge Bases allows OPP scientists to use Retrieval Augmented Generation (RAG)-based search via a search engine. This pulls back relevant results, a summary of the answers, and hyperlinks to internal source citations. This enables OPP scientists to search the extracted text, find documents faster, and get answers to questions across the FIFRA dataset.
- HITL with large language model (LLM)-as-a-judge: EPA analyzed extraction and summarization results. OPP compared manual review and LLM-as-a-judge methods to verify accuracy of the DERs created in the first step of the process by the IDP pipeline. The LLM-as-a-judge was deemed to be more accurate, consistent, and cost effective than human review. This means EPA might consider LLM-as-a-judge for the second step to verify accuracy of the DER, which can save additional time and money.
- Seamless integration: OPP is streamlining its FIFRA workflow tool. AWS has worked with OPP to define how to make the system extendible to add IDP in a later phase of the project. In addition, the tool can be configured to use existing guidelines and any new guideline criteria and EPA rules.
This solution can help EPA to improve key metrics, including:
- 99 percent potential reduction in processing time: Tasks that previously took four months now take seconds.
- High accuracy rate: The AI-generated evaluations achieve high accuracy. The documents tested were of low- and medium-complexity batches.
- Cost-effective solution: Schoeff mentioned that this process will reduce cost of the DER creation process by about 99 percent.
Schoeff said that this was successful, and the next step is to get Amazon Bedrock approved with EPA’s IT team to be used for this use case. He hopes this will be put into production in Federal Fiscal Year 2026.
Federal government innovation with generative AI
EPA’s implementation demonstrates how U.S. federal agencies can responsibly integrate generative AI into mission-critical workflows. The key lessons from this project include:
- Start with the mission need: Focus on specific, high-impact processes where AI can provide measurable benefits.
- Implement human-in-the-loop design: Keep domain experts in control while letting AI handle repetitive tasks.
- Measure and iterate: Quantify accuracy and efficiency gains to demonstrate value and guide improvements.
- Consider cost-effectiveness: Modern AI solutions can provide substantial ROI by freeing up skilled staff for higher-value, mission-critical work, and automating tasks with IDP.
Conclusion
The EPA’s collaboration with AWS demonstrates how generative AI can transform government operations to better serve the public. By automating labor-intensive aspects of document assessments, while maintaining scientific rigor, EPA is setting a new standard for how federal agencies can responsibly leverage AI to fulfill their missions more effectively.
Protecting public health and safety is of paramount importance for the EPA. By streamlining chemical risk assessments, EPA can make faster, more informed decisions to safeguard our communities and environment.
To learn more about how AWS is shaping the future of organizations with generative AI capabilities, refer to AWS Generative AI Innovation Center.
Disclaimer: The US EPA and its employees did not contribute to the writing of this post. EPA and its employees do not endorse any commercial products, services, or enterprises.