AWS for Industries

Building a multi-tenant FHIR server with AWS HealthLake

In today’s rapidly evolving healthcare technology landscape, managing patient data securely and efficiently is crucial. We’ll explore the implementation of multi-tenant FHIR (Fast Healthcare Interoperability Resources) solutions using Amazon Web Services (AWS) HealthLake, a robust service for healthcare organizations seeking to scale their data management capabilities.

Defining key terms

Before diving into technical details, let's define the terminology and the parameters to consider when deciding how to implement multi-tenancy in AWS HealthLake.

  • FHIR is the healthcare data standard defining both data formats and REST APIs for exchanging healthcare information.
  • Substitutable Medical Applications Reusable Technologies (SMART) on FHIR extends FHIR by providing the security and authorization framework for user authentication and application integration. (Both terms are illustrated in the sketch following this list.)
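To make these terms concrete, here is a minimal FHIR Patient resource and the kind of SMART scope a client application would request. This is a sketch with illustrative values only.

# A minimal FHIR Patient resource, shown as a Python dict of its JSON form.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "1980-01-01"
}

# SMART on FHIR expresses permissions as OAuth 2.0 scopes; for example, a
# user-facing app requesting read access to Patient resources would ask for:
smart_scope = "user/Patient.read"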

When we talk about cost, it encompasses both AWS service charges and development overhead, while technical complexity refers to the engineering expertise and infrastructure needed for each multi-tenant approach. We'll also examine how security requirements (including data isolation, access controls, and audit capabilities) can shape architectural decisions while maintaining compliance.

Overview of AWS HealthLake multi-tenancy

AWS HealthLake is a FHIR-native, fully managed service that provides complete support for FHIR R4 transactions. It enables healthcare organizations to store, analyze, and securely share health data at scale while handling server maintenance and compliance requirements.

When working with a FHIR data store, a tenant represents an entity used to logically or physically group FHIR resources. The definition of a tenant adapts to each use case. It could represent healthcare organizations, enterprises, business units, clinics, or hospitals. Multi-tenancy builds on this concept: it is a solution architecture that allows data from multiple tenants to be stored while meeting specific data isolation and access control requirements. These requirements typically include considerations around physical versus logical data isolation, data encryption, and various access control patterns.

A key aspect of multi-tenant architecture is its ability to abstract the underlying implementation details from users of the solution. This abstraction commonly manifests as a single FHIR endpoint that serves all clients, while managing the complexity of data storage and tenant isolation behind the scenes.

Developers architecting multi-tenant solutions with AWS HealthLake must balance data isolation requirements, access control patterns, and cost considerations. The robust security features and FHIR capabilities of AWS HealthLake create a solid foundation for implementing scalable, maintainable, and compliant multi-tenant solutions tailored to specific use case requirements.

We’ll examine key considerations and best practices for building multi-tenant healthcare applications using AWS HealthLake. Our aim is to help you navigate common challenges and make informed architectural decisions when implementing these solutions.

What is an AWS HealthLake data store?

Before discussing tenancy with AWS HealthLake, it helps to understand what an AWS HealthLake data store actually is. An AWS HealthLake data store is a HIPAA-eligible transactional FHIR server that is fully managed by AWS and encrypts data using either an AWS-owned AWS Key Management Service (AWS KMS) key or a customer managed KMS key.

The data is accessible using FHIR API calls through a FHIR base URL that the service provides automatically. As part of each data store, AWS HealthLake also populates the FHIR data into Apache Iceberg tables governed by AWS Lake Formation. You can then run SQL queries on the FHIR data, where each FHIR resource has its own table, using Amazon Athena, Amazon Redshift, or AWS Glue.
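For example, once the Iceberg tables are in place you can issue a SQL query against the patient table through Amazon Athena. This is a minimal sketch; the database name, workgroup, and output location are placeholders you would replace with the names AWS Lake Formation exposes for your data store.

import boto3

athena = boto3.client('athena')

# 'healthlake_db' and the S3 output location below are hypothetical.
response = athena.start_query_execution(
    QueryString="SELECT id, gender, birthdate FROM patient LIMIT 10",
    QueryExecutionContext={'Database': 'healthlake_db'},
    WorkGroup='primary',
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'}
)
print(response['QueryExecutionId'])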

When considering multi-tenancy, it is important to decide if you will use a unique AWS HealthLake data store for each tenant or a single AWS HealthLake data store to store all tenants. This decision may be strongly impacted by data isolation and encryption requirements, along with the overall long-term projected solution costs. These topics are discussed for each of the design patterns.

Authorization considerations: SMART compared with SigV4

When designing your multi-tenant AWS HealthLake FHIR server, you will need to consider how you will authorize user access to the data. AWS HealthLake supports two forms of authentication, SMART on FHIR and AWS Signature Version 4 (SigV4). SMART on FHIR is an industry-standard authentication and authorization protocol built on OAuth 2.0. It is designed specifically for healthcare applications to securely access FHIR resources using JSON Web Tokens (JWTs) with standardized claims and scopes.

SigV4, on the other hand, is a proprietary AWS request signing process that secures API calls across AWS services by cryptographically signing requests with access keys. This makes it the default authentication method for AWS HealthLake API access.

While SMART on FHIR is optimal for clinical applications due to its standardized approach to resource-level permissions, SigV4 is better suited for applications where you plan to handle user permissions outside of your FHIR server. The choice between these authentication methods often depends on your application’s needs. SMART on FHIR can be used for broader healthcare system compatibility and user-centric access control, while SigV4 can be used for tight AWS integration and programmatic access patterns.
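As a point of reference, here is a minimal sketch of a SigV4-signed FHIR read using botocore; the data store endpoint is a placeholder. A SMART on FHIR client would instead present an OAuth 2.0 bearer token obtained from its identity provider.

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

# Hypothetical data store endpoint; substitute your own FHIR base URL.
url = "https://healthlake.us-east-1.amazonaws.com/datastore/EXAMPLE_ID/r4/Patient/example"

session = boto3.Session()
request = AWSRequest(method="GET", url=url)

# Sign the request with the caller's AWS credentials for the healthlake service.
SigV4Auth(session.get_credentials(), "healthlake", session.region_name).add_auth(request)

response = URLLib3Session().send(request.prepare())
print(response.status_code)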

Tenancy options

Option A: Full tenant separation with different AWS KMS keys
Option A provides full tenant separation with uniquely encrypted data stores. This is achieved by creating a distinct AWS HealthLake data store for each tenant or customer, with each data store encrypted by its own unique AWS KMS key.

  • Advantages: This architecture ensures that protected health information (PHI) and other sensitive data from different customers remain completely isolated. Each tenant's data is stored in a separate FHIR-compliant data store with its own encryption boundary. The unique AWS KMS key for each data store provides an additional layer of security and compliance: even if one tenant's key is compromised, other tenants' data remains secure and inaccessible. There is also no risk of one tenant's increased use of the system negatively impacting another tenant, as each tenant is isolated in its own AWS HealthLake instance.
  • Considerations: This architecture costs the most to implement because each AWS HealthLake data store and AWS KMS key adds additional cost. You also need to maintain a lookup table mapping each tenant to its AWS HealthLake URL and AWS KMS key so that every request to your FHIR server uses the appropriate AWS resources. If you require a single AWS HealthLake FHIR URL, then you will need to put the AWS HealthLake data stores behind an Amazon API Gateway and router (explained further in the "To route or not" section).
  • Recommended usage: This option is recommended for customers that require complete isolation of tenants and a distinct encryption key for each. A sketch of provisioning such a data store follows this list.
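The following is a minimal sketch of creating one per-tenant data store with boto3; the tenant name and AWS KMS key ARN are placeholders.

import boto3

healthlake = boto3.client('healthlake')

# Hypothetical tenant name and customer managed key ARN.
response = healthlake.create_fhir_datastore(
    DatastoreName='tenant-acme',
    DatastoreTypeVersion='R4',
    SseConfiguration={
        'KmsEncryptionConfig': {
            'CmkType': 'CUSTOMER_MANAGED_KMS_KEY',
            'KmsKeyId': 'arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID'
        }
    }
)

# Record the endpoint in your tenant lookup table (see "To route or not").
print(response['DatastoreEndpoint'])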

Figure 1 – Option A architecture of a multi-tenant AWS HealthLake

Option B: Single data store with application-level logical separation
Option B implements multi-tenancy using a single AWS HealthLake data store that contains data for all tenants. Unlike Option A, which provides physical data separation, this approach relies on application-level logic. It enforces tenant isolation through data filtering by using AWS HealthLake-enabled FHIR tags and FHIR security labels. Access controls confirm tenants can only retrieve their own resources. All data is encrypted using the data store's single AWS KMS key.

  • Advantages: The primary benefit of this approach is its cost-effectiveness, particularly when managing multiple, smaller tenants. Operating a single data store significantly reduces infrastructure costs and streamlines overall management. Organizations benefit from reduced operational overhead and more efficient resource utilization across their tenant base, as resources are shared among all tenants rather than duplicated across multiple data stores. The essential trade-off is savings on per-data-store hourly charges in exchange for the physical data isolation that multiple data stores would provide.
  • Considerations: Several important factors must be weighed when evaluating this option. The use of a single data store creates a broader security risk radius compared to the physical separation of Option A: all tenant data is accessible through one endpoint and protected by a shared encryption key. This architecture requires additional processing overhead for tenant-based filtering, and the performance impact will vary depending on specific use cases and data volumes. Application-level security controls and the filtering logic become especially critical in this model, as they serve as the primary mechanism for enforcing tenant isolation. You will also need to set up usage limits for each tenant so that no single tenant can monopolize system resources, keeping the service fair and responsive for all users.
  • Recommended usage: This option is particularly well-suited for organizations managing multiple smaller tenants, where cost optimization takes priority over physical data isolation. It's an excellent choice when the hourly data store costs of multiple instances would significantly impact operational expenses. Organizations considering this approach should have strong application-level security controls in place, take extra care in implementing and testing the filtering logic, and be comfortable with logical, rather than physical, tenant separation. For specific cost considerations, review the current AWS HealthLake pricing details. A sketch of the tenant-tagging scheme follows this list.
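To illustrate the tagging scheme, here is what a tenant-tagged resource and a tenant-scoped search look like; the tag system URI and tenant code are illustrative values.

# A Patient resource stamped with its owning tenant in meta.tag. The router
# injects this tag on Create/Update and filters on it for reads.
patient = {
    "resourceType": "Patient",
    "meta": {
        "tag": [{
            "system": "http://example.org/fhir/tenant-id",  # hypothetical system URI
            "code": "123456789"
        }]
    },
    "name": [{"family": "Doe", "given": ["Jane"]}]
}

# Searches are scoped to a tenant with the _tag search parameter, e.g.:
#   GET /Patient?_tag=http://example.org/fhir/tenant-id|123456789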

Figure 2 – Multi-tenant architecture with logical separation

To route or not

A router placed in front of AWS HealthLake becomes necessary in two specific scenarios:

  1. When managing multiple FHIR AWS HealthLake data stores
  2. When there’s a need to filter and modify FHIR responses

This router can be implemented either as an AWS Lambda function or as a container behind an Amazon API Gateway, utilizing Amazon DynamoDB as a lookup table for efficient routing decisions. The DynamoDB implementation enables dynamic routing configurations that can be updated without code changes, while also supporting sophisticated JSON filtering within the FHIR tree before client response delivery.
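A minimal sketch of that DynamoDB lookup follows, assuming a hypothetical table named fhir-tenant-routing keyed on tenant_id.

import boto3

dynamodb = boto3.resource('dynamodb')

# Hypothetical routing table: partition key 'tenant_id'; each item stores the
# data store endpoint and, for Option A, the tenant's AWS KMS key ARN.
table = dynamodb.Table('fhir-tenant-routing')

def lookup_tenant(tenant_id: str) -> dict:
    """Return the routing record for a tenant, or raise if it is unknown."""
    item = table.get_item(Key={'tenant_id': tenant_id}).get('Item')
    if not item:
        raise ValueError(f"Unknown tenant: {tenant_id}")
    return item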

The FHIR request router follows this logical flow:

  1. Receive and validate the incoming request
  2. Extract and verify the tenant identifier from the URL
  3. If using Option B (Application logic separation through a single data store), process based on FHIR operation type:
    • Create: Add meta.tag with tenant ID for ownership or reject if tenant ID tag exists
    • Batch: Process each resource in the bundle according to its specific method
    • History/Read/VRead: Forward without modifications
    • Update:
      • Insert tenant ID meta.tag for ownership tracking or reject if the tag exists
      • Implement a conditional update to verify tenant ownership
        • For example: _tag=__TenantIDTag|123456789
    • Delete:
      • Verify resource ownership through a preliminary read operation before executing delete
  4. Route the request with its body to the appropriate HealthLake data store
  5. Process the response, replacing HealthLake URLs with the API Gateway URL
  6. For Option B implementations, handle the response:
    1. For individual resources or bundled responses:
      1. Remove resources not belonging to the requesting tenant
      2. Strip tenant ID tags from meta.tag attributes
  7. Apply optional FHIR JSON structure filtering using allow or deny lists

The following Python code demonstrates this implementation for test environments, using HealthLake data store names for routing decisions (Note: intended for demonstration only).

import json
import logging
from urllib.parse import quote_plus

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    """
    Main handler function for processing FHIR API requests through AWS HealthLake
    """
    try:
        # Validate input
        if not event or not isinstance(event, dict):
            raise ValueError("Invalid event object")
        
        required_keys = ["requestContext", "httpMethod", "path", "headers"]
        if not all(key in event for key in required_keys):
            raise ValueError("Missing required event parameters")

        # Extract request details
        full_url = construct_full_url(event)
        apigwrootdomain = event["requestContext"].get("domainName")
        method = event["httpMethod"]
        tenant_id = event["path"].split('/')[1]
        apigwstage = event["requestContext"].get("stage")

        logger.info(f"Processing request for tenant: {tenant_id}")

        # Get HealthLake endpoint
        # NOTE: This is only for demo purposes because of service limits on datastore name lookups.
        # In production, swap this section with a call to DynamoDB/another database for pulling the tenant URL.
        try:
            healthlake = boto3.client('healthlake')
            response = healthlake.list_fhir_datastores(
                Filter={'DatastoreName': tenant_id},
                MaxResults=1
            )

            if not response.get("DatastorePropertiesList"):
                raise ValueError(f"No datastore found for tenant: {tenant_id}")

            hlendpoint = response["DatastorePropertiesList"][0]["DatastoreEndpoint"]
            hlendpoint = hlendpoint.replace("https://", "").rstrip('/')

        except Exception as e:
            logger.error(f"HealthLake error: {str(e)}")
            return error_response(404, "Datastore not found")

        # Construct endpoint URL
        endpoint = f"https://{full_url.replace(apigwrootdomain, hlendpoint).replace(tenant_id, '').replace('//', '/')}"

        # If Option B, Application logic separation, prepare the request
        # here adding the tenant id tag with the appropriate logic
        # based on the request type

        # Create, SigV4-sign, and send the request
        try:
            # Only the content type is forwarded; passing the client's Host
            # and Authorization headers through would invalidate the signature.
            request = AWSRequest(
                method=method,
                url=endpoint,
                data=event.get('body'),
                headers={'Content-Type': event.get('headers', {}).get('Content-Type', 'application/json')}
            )

            # Sign with SigV4, the default authentication for HealthLake.
            boto_session = boto3.Session()
            SigV4Auth(boto_session.get_credentials(), 'healthlake',
                      boto_session.region_name).add_auth(request)

            session = URLLib3Session()
            response = session.send(request.prepare())
            
        except Exception as e:
            logger.error(f"Request error: {str(e)}")
            return error_response(500, "Failed to process request")

        # Process response
        try:
            filtered_response = response.text

            # If Option B, Application logic separation, prepare the response:
            # filter based on the requesting tenant ID, store the result in
            #     filtered_response,
            # and remove the tenant ID tag from the meta.tag attribute
            # (see the helper sketch after this handler)

            cleaned_response = filtered_response
            cleaned_response = cleaned_response.replace(
                hlendpoint, apigwrootdomain + "/" + apigwstage + "/" + tenant_id)

            return {
                'statusCode': response.status_code,
                'body': cleaned_response,
                'headers': {
                    'Content-Type': 'application/json'
                }
            }

        except Exception as e:
            logger.error(f"Response processing error: {str(e)}")
            return error_response(500, "Error processing response")

    except Exception as e:
        logger.error(f"Unexpected error: {str(e)}")
        return error_response(500, "Internal server error")

def error_response(status_code, message):
    """
    Helper function to create standardized error responses
    """
    return {
        'statusCode': status_code,
        'body': json.dumps({'error': message}),
        'headers': {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'
        }
    }

def construct_full_url(event):
    """
    Constructs a full URL from an AWS API Gateway event
    """
    try:
        if not event.get('headers', {}).get('Host'):
            raise ValueError("Missing host in headers")

        host = event['headers']['Host']
        path = event['path'].strip('/')
        query_params = event.get('queryStringParameters', {})
        
        if query_params:
            query_string = '&'.join([f"{k}={quote_plus(str(v))}" for k, v in query_params.items()])
            return f"{host}/{path}?{query_string}"
        
        return f"{host}/{path}"

    except Exception as e:
        logger.error(f"URL construction error: {str(e)}")
        raise
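The handler above leaves the Option B steps as placeholder comments. One possible shape for those helpers is sketched below, assuming a tenant tag system URI of your choosing; it is illustrative, not a hardened implementation.

# Hypothetical system URI identifying tenant tags.
TENANT_TAG_SYSTEM = "http://example.org/fhir/tenant-id"

def add_tenant_tag(resource, tenant_id):
    """Stamp a FHIR resource with its owning tenant (flow step 3)."""
    tags = resource.setdefault("meta", {}).setdefault("tag", [])
    if any(t.get("system") == TENANT_TAG_SYSTEM for t in tags):
        raise ValueError("Resource already carries a tenant tag")
    tags.append({"system": TENANT_TAG_SYSTEM, "code": tenant_id})
    return resource

def filter_bundle(bundle, tenant_id):
    """Drop entries owned by other tenants, then strip tenant tags (flow step 6)."""
    def owned_by_tenant(entry):
        tags = entry.get("resource", {}).get("meta", {}).get("tag", [])
        return any(t.get("system") == TENANT_TAG_SYSTEM
                   and t.get("code") == tenant_id for t in tags)

    if bundle.get("resourceType") == "Bundle":
        bundle["entry"] = [e for e in bundle.get("entry", []) if owned_by_tenant(e)]
        for entry in bundle["entry"]:
            meta = entry["resource"]["meta"]
            meta["tag"] = [t for t in meta["tag"]
                           if t.get("system") != TENANT_TAG_SYSTEM]
    return bundle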

Conclusion

We explored key considerations for building multi-tenant healthcare applications using AWS HealthLake, focusing on data segregation, tenant isolation, and access control patterns. We discussed strategies for implementing these principles, including the use of separate HealthLake data stores, Amazon API Gateway with router logic, and Amazon DynamoDB for efficient routing.

By following these best practices, healthcare organizations can build scalable, secure, and compliant multi-tenant applications that improve patient care and streamline operations. We encourage you to start building with AWS HealthLake today.

Contact an AWS representative to learn how we can help accelerate your business.

Mirza Baig

Mirza Baig is a Principal Solutions Architect in Health AI, primarily focused on driving adoption of Health AI solutions. Prior to joining Amazon, Mirza held technical and leadership roles in software development, data foundations, cybersecurity, and network engineering with large organizations including Envision Healthcare, Cisco, and the Executive Office of the President of the United States, among others.

Brian Warwick

Brian is a Principal Solutions Architect supporting global AWS Partners who build healthcare solutions on AWS. Brian is passionate about helping customers leverage the latest in technology in order to transform the healthcare industry.

Harvey Ruback

Harvey Ruback is a Senior Partner Solution Architect on the Healthcare and Life Sciences team at Amazon Web Services. He has over 25 years of professional software development and architecture experience in a range of industries including speech recognition, aerospace, healthcare, and life sciences. When not working with customers, he enjoys spending time with his family and friends, exploring his new home state of New York, and working on his wife’s never-ending list of home projects.