AWS for Industries

Generative AI enabled Medical Coding on AWS

Medical coding is a vital, yet intricate, process in the healthcare industry—it ensures accurate reimbursement for services rendered and is necessary for compliance with regulatory guidelines. However, traditional manual coding methods can be time-consuming, error-prone, and resource-intensive, leading to inefficiencies, bad outcomes, and potential revenue losses. With healthcare organizations processing thousands of patient encounters daily, these challenges can significantly impact financial performance, patient experience, and operational efficiency.

Provider organizations typically lose approx. $210,000 annually due to under-billing, with a study showing only 9.3 percent of eligible Medicare patients being billed for Transitional Care Management, despite 43.3 percent of the patients being qualified. Common revenue leaks stem from underestimating evaluation and management (E/M) levels and missed preventive service billing. A systematic review in BMC Primary Care (2023) confirms a substantial gap between smoking cessation service delivery and reimbursement claims. Evidence shows that while most eligible patients receive cessation services, only about one-third of these services result in submitted claims for reimbursement. This represents a significant missed revenue opportunity for healthcare providers.

How generative AI can help

Foundation models, such as large language models (LLMs), are transforming this landscape by offering unprecedented capabilities to analyze clinical documentation, interpret medical terminology, and assign appropriate codes with greater speed and accuracy than conventional methods. Amazon Web Services (AWS) provides a comprehensive suite of services that enable healthcare organizations to implement AI-powered solutions (including medical coding solutions) that can adapt to the complexity of modern healthcare environments.

We’ll explore how AWS services like Amazon Bedrock, AWS HealthLake, and AWS HealthScribe can be leveraged to build a complete medical coding solution that improves coding accuracy. These services can also accelerate the revenue cycle, reduce administrative burden on clinicians, and help verify regulatory compliance. We’ll demonstrate a practical architecture that healthcare organizations, of any size, can implement to modernize their medical coding operations.

Challenges

The process of medical coding is inherently complex, requiring a deep understanding of clinical terminology, anatomy, and medical procedures. Each patient encounter, whether routine or specialized, must be meticulously translated into a precise set of codes that accurately reflect the services provided and the patient’s diagnosis and past clinical history. The AMA, and other governing bodies, have established intricate guidelines and rules for coding, which must be strictly adhered to for compliance and reimbursement purposes.

Consider a patient presenting with chest pain, a straightforward scenario. However, the nuances involved in accurately coding this encounter are vast. Is the chest pain acute or chronic? Is it related to an underlying condition, such as coronary artery disease or gastroesophageal reflux disease (GERD)? Were diagnostic tests performed, and what were the results? These details, along with the patient’s medical history, medications, and any procedures or treatments administered, must be carefully documented, and coded precisely.

The complexity compounds when patients present with multiple conditions. A diabetic patient with heart disease and depression—an increasingly common trilogy in modern medicine—requires a sophisticated understanding of how these conditions interact and influence billing hierarchies. The AMA coding guidelines recognize ~68,000 diagnosis codes and ~8,000 procedure codes, with countless permutations between them.

Moreover, the regulatory environment adds another layer of complexity. Medicare’s ever-evolving rules, commercial payers’ varying requirements, and the constant threat of audits create a perfect storm of administrative burden. A single miscoded claim can trigger a cascade of investigations, potentially leading to accusations of fraud—even when the error was entirely unintentional.

Figure 1 shows a completed CMS 1500 health insurance claim form with patient information. The form contains various sections filled out including personal information, insurance details, and medical codes. Three key areas are highlighted with green boxes and explanatory callouts: “Diagnosis codes need to be mapped to billing codes” pointing to the diagnosis section, “Right modifiers need to be added” pointing to procedure code 97804, and “Calculate the relative value units (RVUs) for each service rendered” pointing to a value of 2. This form illustrates the complexity of medical coding and the relationship between diagnosis and billing codes that AI-enabled solutions would need to process.Figure 1 – an example CMS 1500 form, a paper claim form for billing healthcare services

Capturing these nuances falls on the shoulders of healthcare providers, who must meticulously document their clinical observations, findings, and treatment plans in the patient’s subjective, objective, assessment, and plan (SOAP) notes and electronic health record (EHR). Obtaining a comprehensive patient history, including past medical conditions, surgeries, and family history, is crucial for accurate coding and ensuring continuity of care. However, this process can be time-consuming and prone to errors, especially in high-volume healthcare settings.

Once the clinical encounter is documented, the challenge of coding begins—deriving the correct diagnosis codes (ICD-10) codes from the unstructured progress note, then mapping the diagnosis codes to the appropriate billing (CPT/HCPCS) codes, and apply critical modifiers as necessary. This information must then be accurately transcribed onto a CMS-1500 claim form (in ANSI ASC X12N 837P format) as shown in Figure 1. This ensures that every detail, from the patient’s demographics to the specific procedure codes, is correctly represented. Any discrepancies or omissions in this process can lead to rejections, delayed reimbursements, or even compliance violations.

Solution overview

Automated medical coding with LLMs on AWS involves two main workflows: the inference workflow and an optional training workflow.

Inference workflow
The inference workflow comes into play when a clinical encounter is completed, and the need for medical coding arises. This workflow orchestrates the seamless integration of various AWS services, leveraging a trained LLM model to generate accurate and compliant medical codes based on the patient’s clinical data and encounter details.

Figure 2 depicts a complete inference architecture for AI-enabled medical coding on AWS. The workflow is visualized as a five-stage process flowing from left to right: 1) Clinical Encounter (showing doctor and patient interaction with EHR), 2) Clinical Data to fast healthcare interoperability resources (FHIR) (using AWS HealthScribe to process clinical notes and AWS HealthLake to store FHIR data), 3) Automated Medical Coding (highlighted in a red box containing AWS Step Functions Workflow that orchestrates AWS Lambda and Amazon Bedrock for code generation), 4) Filled Out Medical Claim Form (showing CMS-1500 forms in ANSI ASC X12N 837P format stored in Amazon Simple Storage Service (Amazon S3)), and 5) Claim Submitted to Payor (showing final submission to the claims management system). The diagram illustrates how AWS services work together to transform clinical encounters into properly coded insurance claims.Figure 2 – Complete inference architecture for AI-enabled medical coding on AWS

The inference workflow comes into play when a clinical encounter is completed, and medical coding is required. Here is how it unfolds:

  1. Clinical encounter completed: A patient’s clinical encounter is documented in the EHR system, capturing all relevant details using AWS HealthScribe. Details include SOAP notes and structured data fields.
  2. Clinical Data to fast healthcare interoperability resources (FHIR) data standard: The EHR backend sends the clinical data in the HL7 FHIR format to AWS HealthLake, where it is persisted for further processing.
  3. Automated medical coding: An AWS Step Functions workflow involving AWS Lambda and an Amazon Bedrock inference endpoint is triggered. The workflow first queries AWS HealthLake to retrieve the patient’s comprehensive clinical history and details about the EHR encounter. The information then passes to the Amazon Bedrock inference endpoint, where the desired LLM model is activated in a zero-shot prompt. Amazon Bedrock allows access to a wide range of multiple LLMs and other foundation models through simple API calls—enabling users to choose and change between the models that are best for each use case. The model produces a completed CMS-1500 form in ANSI ASC X12N 837P format, which is saved to Amazon Simple Storage Service (Amazon S3), and the S3 object URI is sent back to asynchronously.
  4. Human review (optional): Medical coding specialists can review and validate the AI-generated codes before final submission, ensuring accuracy and compliance with coding guidelines. This human-in-the-loop validation can be configured based on organizational requirements and confidence thresholds.
  5. Completed medical claims form submission: Retrieves the completed ANSI ASC X12N 837P form from Amazon S3. The claim is then submitted to the Payor using a clearing house or AWS B2B Data Interchange.

Training workflow (optional, recommended)
Although pre-trained LLMs today show good promise, Healthcare Provider organizations who need to optimize performance and accuracy can leverage a prompt engineering (few-shot) and fine-tuning workflow. By leveraging their organization’s data, users can improve performance and accuracy to meet their unique needs, Providers with data science/machine learning expertise can achieve significantly better results.

The training workflow focuses on further fine-tuning an LLM, tailoring it specifically for the task of medical coding. This process involves acquiring and curating your organizational datasets of real-world clinical data, coding guidelines, and historical claims data. This serves as the foundation for training the model to understand the nuances of medical coding accurately. This fine-tuned model can be married with prompt engineering techniques, such as few-shot prompting.

Figure 3 illustrates the training architecture for AI-enabled medical coding on AWS, organized in three sequential phases. The diagram shows the complete workflow from data acquisition to deployment. Phase 1 (Data Acquisition) includes ingestion of longitudinal patient records (structured clinical data, unstructured SOAP notes, and past claims) and referential data sets (medical coding guidelines) into Amazon S3. Phase 2 (Model Development) showcases Amazon Bedrock, where a foundation model (for example, Amazon Nova) is selected, fine-tuned with training data, and enhanced with knowledge bases to create a specialized Medical Coding LLM. The process includes validation, quality assurance, and evaluation steps. Phase 3 (Inference Endpoint) demonstrates how the trained model is provisioned for throughput and connected to an Automated Medical Billing Inference Agent, with a feedback loop for continuous improvement based on usage. Numbered yellow circles (1-9) indicate the sequential workflow steps throughout the architecture.Figure 3 – Training architecture for AI-enabled medical coding on AWS

The training workflow focuses on developing and fine-tuning the LLM model for accurate medical coding. The process involves several steps:

  1. Data acquisition: The first step is to gather and prepare the necessary data for training the model. This includes real-world training data sets consisting of structured clinical data (HL7 FHIR), unstructured clinical data (SOAP notes), past claims submissions (837), and remittance data (835). Additionally, referential data sets (such as the AMA Coding Guidelines) are required.
  2. Model development: With the data acquired, the next step is to develop and train the LLM model. This involves ingesting the real-world training data sets and the AMA Coding Guidelines into Amazon S3. Then, using Amazon Bedrock, an LLM (such as Amazon Nova) is selected due to its ability to be fine-tuned, and a fine-tuning job is set up with the training data from Amazon S3. Amazon Bedrock Knowledge Bases are also set up with the AMA Coding Guidelines to provide managed retrieval-augmented generation (RAG). After training the model, it undergoes rigorous evaluation, validation, and quality assurance leveraging Amazon Bedrock model evaluation capabilities. Based on the results, the model may be retrained for further improvement before being provisioned in Amazon Bedrock for inference, along with leveraging prompt engineering techniques to improve model performance.
  3. Inference endpoint setup: Once the model is trained and validated, an inference endpoint is set up and integrated into the workflow. The model is then fine-tuned based on real-world usage data, providing continuous improvement and adaptation to changing coding requirements.

Solution demonstration

The true power of AI-enabled medical coding is best demonstrated through a real-world scenario. We’ll walk through a primary care use case, starting from a progress note captured by the physician during an encounter.

82-year-old female with progressive memory problems over 5-10 years, frequently forgets conversations and asks same questions repeatedly. Mother had dementia with auditory/visual hallucinations. Patient denies head trauma, strokes, dizziness, numbness, or speech/visual/muscular issues. Unable to draw clock face. No recent labs or imaging. Referred to neurology for dementia evaluation. Ordered TSH, B12, and started Aricept 5 mg qhs.

As shown in Figure 4, the solution uses AWS HealthScribe to perform entity detection, mapping out relationships between various conditions, medications and tests discussed during the encounter.

Figure 4 – AI based entity mapping of progress noteFigure 4 shows an AI-based entity mapping of a clinical progress note for dementia evaluation. The diagram displays how the AI system identifies and categorizes key clinical information against a white background. Patient demographics (82-year-old female) are identified, along with symptoms (progressive memory problems over 5-10 years, frequently forgets conversations, asks same questions repeatedly), family history (mother had dementia with auditory/visual hallucinations), and negations of conditions (no head trauma, strokes, dizziness, numbness, or speech/visual/muscular issues). It also displays clinical findings (unable to draw clock face), diagnostic workup (no recent labs or imaging, referral to neurology for dementia evaluation, ordered TSH and B12), and any medications (Aricept 5mg qhs). Each entity is color-coded and labeled with its clinical category (symptom, diagnosis, negation, test name, and more.) with relationship arrows showing connections between related entities (such as quality descriptors, frequency, and dosage information).Figure 4 – AI based entity mapping of progress note

This analysis helps the solution craft a comprehensive SOAP note, which is then fed into the LLM with a request to generate a CMS-1500 form in ANSI ASC X12N 837P format.

Following is a demonstration video, showcasing a complete workflow, from patient encounter all the way to generating a CMS-1500 form in ANSI ASC X12N 837P format with proper ICD-10 to CPT/HCPCS code mappings.

Demonstration video

Figure 5 – AWS generative AI Medical Bill Coding demonstration

Results

The implementation of AI-enabled medical coding demonstrates a successful generation of the CMS-1500 form in ANSI ASC X12N 837P format with appropriate CPT/HCPCS codes and mapped with the corresponding ICD-10 codes from the clinical encounter. Although the LLM model demonstrated good accuracy (Figure 6), we highly recommend using the afore mentioned techniques that can further improve outputs. Model fine-tuning using historical clinical data and CMS-1500 form in ANSI ASC X12N 837P format mappings has been shown to improve code selection precision. Additionally, expanding the collection of few-shot examples can strengthen the model’s pattern recognition capabilities across diverse clinical scenarios. The integration of Amazon Bedrock Knowledge Bases with referential data provides an additional layer of validation against established coding standards.

Figure 6 shows an AI-generated CMS-1500 health insurance claim form created using zero-shot prompting. The form is filled out for patient Samantha Johnson with her personal information, including address (456 Maple Lane, Oakville) and insurance ID. The medical details include a diagnosis code G30.9 (Alzheimer’s disease, unspecified) and procedure code 96125 (standardized cognitive performance testing) with a charge of $103.36. The service date is listed as 12/4/2024, and the referring provider is Brannon Mason. This example demonstrates the AI system’s capability to automatically generate a properly formatted CMS-1500 form with appropriate diagnostic and billing codes mapped correctly without requiring specific examples during training.Figure 6 – AI generated CMS-1500 form (zero-shot prompting used)

To confirm optimal performance and continuous improvement, a human-in-the-loop verification process should be implemented. This oversight mechanism validates the recommended codes and also generates valuable feedback data. The collected feedback can be systematically incorporated into the RAG system, creating a self-improving cycle that progressively enhances the LLM’s ability to generate more accurate CPT codes. This combination of automated processing and human verification strikes an effective balance between efficiency and accuracy in medical coding operations.

Conclusion

The implementation of AI-enabled medical coding using LLMs on AWS has the potential to transform revenue cycle management operations in healthcare. By leveraging advanced natural language processing capabilities, healthcare providers can streamline the medical coding process, verifying accurate and compliant billing, while reducing the administrative burden on clinical staff.

This solution presents a significant opportunity for health systems to optimize their revenue cycle management operations, prevent under-billing and over-billing scenarios, maximize reimbursements, and maintain compliance with regulatory guidelines. Additionally, the cost efficiencies achieved through AI-enabled medical coding can be reinvested into enhancing patient experiences, upgrading facilities, or investing in innovative medical technologies, benefiting both organizations and patients.

Reach out to an AWS Representative to learn more about the underlying technology and start your revenue cycle management journey today.

Further reading

Umair Khalid

Umair Khalid

Umair Khalid is a Sr. Industry Product Architect for Healthcare & Life Sciences in AWS Applied AI team, architecting solutions that transform patient care. With 15+ years in clinical workflows, patient services, and health informatics, he bridges technical capabilities with business outcomes across the HCLS value chain. Umair harnesses AI and cloud technologies to address critical healthcare challenges and improve patient outcomes.

Dr. E Hanh Le

Dr. E Hanh Le

Dr. E Hanh Le is a physician executive and product leader at Amazon, driving innovation at the intersection of healthcare and technology. With 15+ years of experience in product development across AWS for Healthcare & Life Sciences, Alexa Health & Wellness, Healthline Media, and Epocrates/Athenahealth, she leverages her background as a Primary Care physician to create impactful healthcare solutions.

Laks Sundararajan

Laks Sundararajan

Laks Sundararajan is a seasoned Enterprise Architect helping companies reset, transform and modernize their IT, digital, cloud, data and insight strategies. A proven leader with significant expertise around Generative AI, Digital, Cloud and Data/Analytics Transformation, Laks is a Sr. Solutions Architect with Healthcare and Life Sciences (HCLS).

Brannon Mason

Brannon Mason

Brannon Mason is a Sr Software Engineer in AWS Applied AI team. He works across industry verticals to unlock value-add outcomes for customers by building generative AI and agentic workflow differentiated software. Brannon sits in the sweet spot that blends product, science, and software to both prototype and scale services to production.