From beta to breakthrough: Scaling health AI from POC to production for everyday impact

Generative AI is redefining preventive health by enabling personalized, actionable, and dynamic experiences that bridge the gap between traditional clinical and episodic advice, helping individuals make healthier living choices every day. As health systems globally expand from “sick care” to include “well care,” these emerging technologies hold immense potential to engage, motivate, and empower individuals to manage their health proactively. Despite this, transitioning AI-powered solutions from proof of concept (POC) to production is no small feat. On top of navigating technical and nontechnical challenges typical of any platform that is put into production, the process requires careful design, stakeholder and user validation, and a relentless focus on the end user’s experience founded in evidence-based behavioral health.

In this post, we discuss Health Kaki, a generative AI health companion using the Amazon Bedrock foundation model (FM), and highlight how the team navigated the challenges of scaling from concept to real-world implementation.

Health Kaki: A health companion, just for me

Health Kaki (derived from the Malay word kaki, meaning “buddy or companion”) is a generative AI platform designed to empower individuals to take control of their health choices and lifestyle through personalized digital engagement. The Health Kaki POC was codeveloped by Synapxe and Temus with support from Amazon Web Services (AWS) and input from Singapore’s Health Promotion Board (HPB) for Singapore’s Ministry of Health (MOH), to help enable HealthierSG’s health plan.

The platform harnesses the power of AI tools using Amazon Bedrock and Anthropic’s Claude 3.5 Sonnet large language model (LLM) to generate personalized diet and exercise plans from the rich resources and information across Singapore public health’s ecosystem. These plans align with users’ health goals, cultural preferences, and clinical recommendations.

What makes this so powerful is that technology today can enable near infinite permutations of choices and information to empower healthier living.

For example, users can enter the natural language prompt:

I want to do Meatless Monday today. Health Kaki, change this recommended dish from chicken to tofu…oh and adjust the cooking times and nutrition information so I know my macros are on point.

And Health Kaki produces a personalized diet plan and recommendations, as shown in the following screenshot.

Figure 1. Health Kaki’s personalized diet plan and recommendations

Or, the user can enter this prompt:

Health Kaki, I pulled a late night at work yesterday, can you find a yoga class near me? And by the way, does that class qualify for a PAssion Card discount for this class in my local Community Centre?

And Health Kaki provides a personalized plan and recommendations for exercise, as shown in the following screenshot.

Figure 2. Health Kaki’s personalized exercise plan and recommendations

The following screenshot shows an exchange of questions from a user and responses from Health Kaki on the app. The chatbot feature allows users to inquire about diet and exercise, and update their preferences.

Figure 3. Health Kaki assistant

In short, the long-sought objective of the right intervention, to the right person, at the right time is finally on the horizon with the innovations in LLM and generative AI.

Designing the solution right, from day 1

It’s tempting to jump excitedly into new technologies and start experimenting. After all, isn’t that what innovation and part of being agile is about? We agree there is value and a time and place for that. But if the true objective is to scale an AI solution from POC to production then a different approach is more likely to succeed.

Instead, before spending time building a solution or letting the current capabilities of today’s technology guide your solution, focus on really understanding the needs, current processes, and constraints to determine the essential features and functions that will make the solution effective. Consider the following factors:

Viability: One of the critical aspects of transitioning a POC to production is validating its viability, which is founded upon what problem you’re solving. That is, how big is the problem and how does the solution address the underlying cause of the issue? The market is rife with digital apps and platforms offering generic health and disease management solutions. The vast majority have failed, sometimes quite spectacularly. This is even more important in the context of LLM and generative AI, where costs to develop, run, and operate have fewer precedents. Costs increase significantly as a product scales and individual engagement rises.
Evaluation and testing: LLMs and generative AI are emerging and evolving at a speed we haven’t seen in the past. Therefore, a clear understanding of system performance under varying conditions is required—and those conditions will change as the innovation progresses from POC to proof of value to production. Health Kaki used scenario testing, metric-based evaluations, and iterative validation to refine its features and verify reliability. Benchmarks like faithfulness scores and answer relevancy provided actionable insights into model outputs. With each advancement towards production, the level of rigor and types of testing evolves and increases. Only with this nuanced approach, rather than a one-size-fits-all testing for safety, can the solution strike the right balance between innovation and safety.
Audience: Move beyond out-of-date concepts of users and user acceptance testing (UAT). Consider who is really using the solution. Technologists need to appreciate that, in health decision-making and the resulting actions, there isn’t a singular user. For example, clinicians can make health recommendations (such as to reduce sugar intake), patients can agree that is an important goal, an insurer or government can fund the service, but a caregiver could be the one that decides what a family eats for dinner. Newer technologies might also elicit fear of the unknown or safety among any of those stakeholders, which also needs to be considered. Therefore, a digital solution in health must embrace these complex and potentially seemingly contradictory points of view in evaluation.

To amplify that complexity, in user testing, patients might say that they don’t need this level of detail on how to reduce sugar intake. But, when they are unexpectedly diagnosed with an acute health condition or several health conditions, they could change their minds. Our health and life situations change and therefore so does what users value and need.

Overcoming technical challenges

Like with any innovation, there are a multitude of challenges, known unknowns and, more importantly, unknown unknowns. In developing Health Kaki, some of the technical challenges, listed below, were addressed by employing advanced prompting techniques and integrating a contextualized knowledge base. Rigorous model evaluations using benchmarks like faithfulness scores and large model systems organization (LMSYS) leaderboards showed Anthropic’s Claude 3.5 Sonnet model consistently delivered reliable and context-aware outputs.

The team faced numerous technical challenges when developing Health Kaki, including:

Data quality and diversity: A cornerstone of successful AI scaling is maintaining the availability of high-quality, diverse datasets that reflect the population’s nuances. For Health Kaki, this meant addressing Singapore’s unique cultural and dietary landscape, including halal and vegetarian dietary habits, traditional Chinese medicine influences, and varied exercise preferences. To tackle this, the team employed a hybrid human-AI approach. Data from reliable sources like the Health Promotion Board was enriched with metadata using LLMs. Human experts validated and refined this data for cultural relevance and contextual accuracy. This rigorous process laid the foundation for generating tailored recommendations that resonated with users.
Infrastructure and experience design: Deploying AI solutions at scale demands a careful balance between robust infrastructure and seamless user experience. For example, generative AI systems, while powerful, can introduce computational delays that can impact real-time responsiveness. Health Kaki overcame this by adopting progressive loading strategies. These strategies provided users with engaging intermediate content while personalized outputs were being processed. Extensive user testing validated this approach, with participants appreciating the thoughtful UX design and clear progress indicators that minimized perceived wait times.
Tools that balance accuracy and scalability: Choosing the right models and tools is essential. The demonstrated ability of Anthropic’s Claude 3.5 Sonnet to balance personalization, scalability, and accuracy made it a cornerstone of Health Kaki’s architecture. Metrics such as ease of use, processing speed, and flexibility were analyzed to understand the models’ ability to handle evolving requirements and integrate seamlessly into the Health Kaki platform.
Consistency in personalization: For generative AI to build trust, recommendations must be both highly personalized and consistently accurate. Maintaining this balance across interactions posed a significant challenge for Health Kaki.

Guardrails

Implementing robust guardrails was a critical aspect of the Health Kaki project. These guardrails serve as essential safeguards, helping AI-generated health recommendations remain appropriate, safe, and trustworthy. The Health Kaki team approached this challenge by collaborating closely with subject matter experts to define a comprehensive set of parameters. These guardrails encompass various aspects, including dietary considerations, exercise guidelines, health condition precautions, and lifestyle factors. By integrating these safeguards into the core of the recommendation engine, Health Kaki can deliver personalized wellness plans that are not only engaging, but also align with each user’s unique health profile and needs. This approach demonstrates a commitment to responsible AI deployment in health technology, balancing innovation with user safety and overall efficacy.

From POC to impact: A blueprint for success

Scaling AI healthcare solutions demands more than technical prowess; it requires a comprehensive approach that integrates user-centric design, strong engineering capabilities, and iterative validation. Health Kaki’s success in navigating these complexities underscores the importance of collaboration, careful planning, and a relentless focus on the complex web of user needs.

As the project moves forward, the insights and learnings from continuous user validation will continue to shape the future development and expansion of the platform. The team’s focus on scalability, modular design, and continuous improvement positions Health Kaki to evolve alongside emerging AI technologies, ultimately driving an inclusive solution that can inspire and empower a wide demographic of residents to take steps every day towards living healthier and happier.

Try Amazon Bedrock today and learn how to get started building your own generative AI application.

AWS Public Sector Blog

From beta to breakthrough: Scaling health AI from POC to production for everyday impact

Health Kaki: A health companion, just for me

Designing the solution right, from day 1

Overcoming technical challenges

Guardrails

From POC to impact: A blueprint for success

Resources

Follow

Learn

Resources

Developers

Help