Containers
Migrate to Amazon EKS: Data plane cost modeling with Karpenter and KWOK
When migrating Kubernetes clusters to Amazon Elastic Kubernetes Service (Amazon EKS), organizations typically follow three phases: assessment, mobilize, and migrate and modernize. The assessment phase involves evaluating technical feasibility for Amazon EKS workloads, analyzing current Kubernetes environments, identifying compatibility issues, estimating costs, and determining timelines with business impact considerations. During the mobilize phase, organizations create detailed migration plans, establish EKS environments with proper networking and security, train teams, and develop testing procedures. The final migrate and modernize phase involves transferring applications and data, validating functionality, implementing cloud-centered features, optimizing resources and costs, and enhancing observability to fully use AWS capabilities.
One of the most significant challenges organizations face during the process is cost estimation, which happens in the assessment phase.
Karpenter is an open source Kubernetes node autoscaler that efficiently provisions just-in-time compute resources to match workload demands. Unlike traditional autoscalers, Karpenter directly integrates with cloud providers to make intelligent, real-time decisions about instance types, availability zones, and capacity options. It evaluates pod requirements and constraints to select optimal instances, considering factors such as CPU, memory, price, and availability.
Karpenter can consolidate workloads for cost efficiency and rapidly scale from zero to handle sudden demand spikes. It supports both spot and on-demand instances, and automatically terminates nodes when they’re no longer needed, optimizing cluster resource utilization and reducing cloud costs.
Karpenter uses the concept of Providers to interact with different infrastructure platforms for provisioning and managing compute resources. KWOK (Kubernetes WithOut Kubelet) is a toolkit that simulates data plane nodes without allocating actual infrastructure, and can be used as a provider to create lightweight testing environments that enable developers to validate provisioning decisions, try various (virtual) instance types, and debug scaling behaviors.
This post describes an innovative approach: using KWOK to help simulate migrations to Amazon EKS. By using KWOK as a provider, you can observe how Karpenter would allocate resources for your workloads without launching actual Amazon Elastic Compute Cloud (Amazon EC2) instances. This method replicates current resource requirements and scheduling patterns in a virtual environment, providing clear visibility into the types and quantities of nodes that would be needed in Amazon EKS. The simulation helps organizations build more accurate cost estimates and develop focused migration plans, while reducing infrastructure expenses during the assessment phase.
In this blog post, we demonstrate how to mimic a Kubernetes migration to Amazon EKS using Karpenter and KWOK. By creating a test environment, backing it up, restoring it in a new EKS cluster, and analyzing Karpenter’s node provisioning decisions, we show how to estimate compute costs before progressing to the mobilize and migrate and modernize phases.
Solution overview
When migrating workloads to Amazon EKS, organizations can choose from several replication strategies, each offering distinct advantages depending on your application architecture, downtime tolerance, and operational requirements.
For this demonstration, we focus on the backup and restore methodology. Our scenario consists of a source cluster representing an existing production environment and a destination cluster for workload migration. While in real-world migrations the source cluster is typically hosted either on premises or on another cloud provider, we use Amazon EKS for both source and destination clusters for practicality. We implement backup and restore using Velero, an open source tool designed for Kubernetes cluster backup, disaster recovery, and migration.
As shown in the preceding diagram, the process includes backing up resources from the source cluster to Amazon Simple Storage Service (Amazon S3), creating a destination cluster with Karpenter and KWOK enabled, and restoring workloads from the backup. This approach will reveal Karpenter’s node provisioning decisions in response to restored workloads, providing valuable insights into the required instance types, quantities, and configurations for production EKS migrations.
Solution walkthrough
Our solution walkthrough guides you through setting up the simulation environment, migrating a sample workload, and analyzing the results to estimate the cost of your EKS compute resources.
Prerequisites
To use Karpenter with KWOK, you need to build a custom Karpenter image. This image must be built on an amd64 architecture to help ensure compatibility with the m5.xlarge instances in our destination cluster’s bootstrap node group. This node group contains core cluster components, including the Karpenter controller. Once deployed, Karpenter dynamically provisions and manages separate instances for your workloads, based on scheduling demands and node requirements. While m5 instances are used here for demonstration, in a real-world scenario you should choose instance types that best fit your specific use case. Building on the same architecture helps ensure that the Karpenter image functions correctly in our simulated environment.
- To simplify setup, we created a few scripts that provision an EC2 instance with the proper permissions and the required dependencies:
- In a production environment, control traffic to your EC2 instance using security groups and use least-privilege permissions.
- SSH into your instance:
- Install dependencies:
Step 1: Create the source EKS cluster
Run the following commands to create the source cluster:
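If the original commands are unavailable, the source cluster can be described with an eksctl configuration along the following lines. This is an illustrative sketch only: the cluster name, region, and node sizes are placeholders, not values from the original walkthrough.

```yaml
# Illustrative eksctl ClusterConfig for the source cluster.
# All names, the region, and node sizes are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: source-cluster
  region: us-east-1
managedNodeGroups:
  - name: workers
    instanceType: m5.large
    desiredCapacity: 2
```

Saving this as source-cluster.yaml and running eksctl create cluster -f source-cluster.yaml would provision the cluster and update your kubeconfig.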
Step 2: Deploy an example workload
You now need to deploy a representative workload. For example, you can use the Guestbook application from the tutorials section of the Kubernetes documentation: a basic multi-tier web application consisting of a single-instance Redis database to store guestbook entries, plus multiple web frontend instances.
A summary of commands is provided here for your convenience:
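One possible sequence for deploying the Guestbook example is sketched below; the manifest URLs follow the upstream Kubernetes tutorial, so verify them against the current documentation before use.

```shell
# Deploy the Guestbook sample application (Redis leader/followers plus
# a web frontend) from the Kubernetes documentation examples.
kubectl apply -f https://k8s.io/examples/application/guestbook/redis-leader-deployment.yaml
kubectl apply -f https://k8s.io/examples/application/guestbook/redis-leader-service.yaml
kubectl apply -f https://k8s.io/examples/application/guestbook/redis-follower-deployment.yaml
kubectl apply -f https://k8s.io/examples/application/guestbook/redis-follower-service.yaml
kubectl apply -f https://k8s.io/examples/application/guestbook/frontend-deployment.yaml
kubectl apply -f https://k8s.io/examples/application/guestbook/frontend-service.yaml
```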
- Check the deployed pods:
Expected output:
- Scale the frontend deployment and confirm that the pod count increases:
Expected output:
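If you deployed the Guestbook example, the check and scale steps might look like the following; the replica count is an arbitrary example value, and the labels follow the upstream tutorial.

```shell
# List the Guestbook pods; all should reach the Running state.
kubectl get pods

# Scale out the frontend (replica count is an example value).
kubectl scale deployment frontend --replicas=5

# Watch the additional frontend pods appear.
kubectl get pods -l app=guestbook,tier=frontend
```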
Step 3: Extract cluster configuration with Velero
Velero is an open source tool for backing up and restoring Kubernetes cluster resources and persistent volumes. For our estimation approach, we use Velero to capture the complete configuration of the source cluster, export all deployments, statefulsets, daemonsets, and other workload definitions, and preserve resource requests and limits that will inform our estimation. This will give us a snapshot of our cluster’s resource requirements.
Velero consists of a command line interface (CLI) for initiating backups and restores, and a server component that runs as a Kubernetes controller to handle backup operations. When you run a backup command, the CLI creates a Backup object in Kubernetes; the server’s BackupController then validates it, queries the API server for the specified resources, and uploads the collected data to object storage.
- First, create an Amazon S3 bucket to act as your object storage:
- Create the necessary Velero user and associated permissions to access the bucket (in a production environment, restrict privileges according to the principle of least privilege):
- Then create a file to host the credentials:
- Install the Velero CLI:
- Install the Velero server in the source cluster:
- Check the Deployment:
- Confirm the backup location’s PHASE is Available:
- Trigger the backup:
- Check the backup status:
Expected output: Completed.
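The Velero steps above can be sketched as follows. This is a minimal, self-contained outline: the bucket name, region, plugin version, and backup name are all placeholders, and the credential values must come from the IAM user you created.

```shell
BUCKET=<your-backup-bucket>   # placeholder bucket name
REGION=us-east-1              # placeholder region

# Object storage for the backups
aws s3 mb s3://$BUCKET --region $REGION

# Credentials file for the Velero IAM user created earlier
cat > credentials-velero <<EOF
[default]
aws_access_key_id=<VELERO_ACCESS_KEY_ID>
aws_secret_access_key=<VELERO_SECRET_ACCESS_KEY>
EOF

# Install the Velero server components into the source cluster
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket $BUCKET \
  --backup-location-config region=$REGION \
  --secret-file ./credentials-velero

# Verify the Deployment and the backup location, then run the backup
kubectl get deployment velero -n velero
velero backup-location get
velero backup create source-cluster-backup
velero backup get
```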
Step 4: Create the destination EKS cluster
Run the following commands to create the destination cluster:
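If the original commands are unavailable, an illustrative eksctl configuration for the destination cluster is sketched below; names, the region, and sizes are placeholders. The m5.xlarge bootstrap node group matches the prerequisites discussed earlier.

```yaml
# Illustrative eksctl ClusterConfig for the destination cluster.
# All names and the region are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: destination-cluster
  region: us-east-1
managedNodeGroups:
  - name: bootstrap            # hosts core components, including the Karpenter controller
    instanceType: m5.xlarge
    desiredCapacity: 2
```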
For security considerations when setting up your cluster in a production environment, see the Security in Amazon EKS guide.
Step 5: Deploy Karpenter with the KWOK provider
In this setup, you install both Karpenter and KWOK on your EKS cluster and configure Karpenter to use the KWOK provider rather than the standard Amazon EC2 provider. This integration enables Karpenter to respond to pod scheduling demands by creating virtual nodes, eliminating the need for actual EC2 instance provisioning. The approach combines these technologies to create a lightweight, cost-effective simulation environment.
- Create an image repository on Amazon Elastic Container Registry (Amazon ECR), where the Karpenter-KWOK image will be pushed to:
- KWOK is configured to provide a set of simulated instance types that mirror real EC2 instances. These instance types are defined in the Karpenter repo, in the kwok/cloudprovider/instance_types.json file, which contains a structured array of instance specifications:
The file contains multiple instance types, each with potentially multiple offerings that vary by pricing, availability zone, and capacity type (spot or on-demand). This configuration allows KWOK to simulate the instance diversity that Karpenter would interact with in a real environment.
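To illustrate the structure described above, an entry might take roughly the following shape. The field names here are assumptions based on the description in this post; check kwok/cloudprovider/instance_types.json in the Karpenter repo for the actual schema.

```json
[
  {
    "name": "m5.xlarge",
    "architecture": "amd64",
    "resources": { "cpu": "4", "memory": "16Gi" },
    "offerings": [
      { "zone": "us-east-1a", "capacityType": "on-demand", "price": 0.192 },
      { "zone": "us-east-1a", "capacityType": "spot", "price": 0.077 }
    ]
  }
]
```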
In this example, you will use the m4 and m5 instance families to restore your backup, so you need to update the instance_types.json file with up-to-date information. For your convenience, we provide the get_instance_details.py script, which builds a new JSON file with up-to-date details (note that it’s your responsibility to verify the accuracy of the information, particularly instance pricing, to help ensure reliable compute cost estimations).
- Run the following commands to update instance_types.json:
- Install the build toolchain:
- Install the KWOK controller in the destination cluster:
- Build and deploy a new version of Karpenter, with KWOK provider support:
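The KWOK and Karpenter deployment steps might look like the sketch below. The KWOK version is a placeholder (check the KWOK releases page), and the Karpenter build step is an assumption based on the repo’s ko-based tooling; Make targets vary between releases, so consult the Makefile in the kubernetes-sigs/karpenter repository before running.

```shell
# Install the KWOK controller and its default stages into the cluster.
KWOK_REPO=kubernetes-sigs/kwok
KWOK_VERSION=v0.6.0   # placeholder; use the latest release
kubectl apply -f "https://github.com/${KWOK_REPO}/releases/download/${KWOK_VERSION}/kwok.yaml"
kubectl apply -f "https://github.com/${KWOK_REPO}/releases/download/${KWOK_VERSION}/stage-fast.yaml"

# Build and deploy Karpenter with the KWOK provider from the
# kubernetes-sigs/karpenter repo checkout (amd64 build machine).
export KO_DOCKER_REPO=<your-ecr-repository-uri>   # placeholder ECR repo
make apply   # builds the controller image with ko and deploys it
```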
- To tell Karpenter to use the selected instances, create a new NodePool and the KWOKNodeClass (a custom resource definition representing KWOK virtual nodes, similar to a regular Karpenter NodeClass):
In this example NodePool, we use the m4 and m5 instance families for demonstration purposes. In a real-life scenario, you would select instance types that best match your workload requirements, considering factors such as performance needs, cost constraints, and specific compute characteristics. See the Amazon EKS documentation for Karpenter setup best practices, and specifically for NodePool configuration.
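A minimal NodePool and KWOKNodeClass pair might look like the following sketch. The API groups, versions, and the instance-family requirement key are assumptions based on Karpenter conventions; check the KWOK provider documentation in the Karpenter repo for the exact names.

```yaml
# Illustrative only: API versions and the requirement key are assumptions.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: kwok-demo
spec:
  template:
    spec:
      requirements:
        - key: karpenter.kwok.sh/instance-family   # assumed label key
          operator: In
          values: ["m4", "m5"]
      nodeClassRef:
        group: karpenter.kwok.sh
        kind: KWOKNodeClass
        name: default
---
apiVersion: karpenter.kwok.sh/v1alpha1
kind: KWOKNodeClass
metadata:
  name: default
```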
Step 6: Restore the backup
- You can now install Velero in the destination cluster:
- Check that you can access the backup from the source cluster:
Expected output:
- You should now taint all existing real nodes, so that the backed-up workload is deployed on KWOK-provided nodes only:
- You’re ready to restore the backup:
Expected output:
- Check the deployed pods:
Expected output:
- Inspect the nodes:
Expected output:
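The restore steps above can be sketched as follows. The bucket, region, backup name, and node group label value are placeholders that should mirror what you used in the earlier steps.

```shell
BUCKET=<your-backup-bucket>   # bucket used for the Velero backup in Step 3
REGION=us-east-1              # placeholder region

# Install Velero in the destination cluster against the same bucket
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket $BUCKET \
  --backup-location-config region=$REGION \
  --secret-file ./credentials-velero

# The backup taken from the source cluster should be listed
velero backup get

# Taint the real nodes so restored pods land on KWOK nodes only;
# the node group name is a placeholder for your bootstrap group.
kubectl taint nodes -l eks.amazonaws.com/nodegroup=<bootstrap-nodegroup> \
  simulation=kwok-only:NoSchedule

# Restore, then inspect the pods and the (virtual) nodes
velero restore create --from-backup <your-backup-name>
kubectl get pods
kubectl get nodes
```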
You can see that Karpenter, together with KWOK, created a virtual node (named bold-haibt-162796501 in the preceding example) with instance type m5.xlarge to support our demo application.
Note that pods are not actually running on the virtual nodes: KWOK interacts with the Kubernetes API server by creating fake node objects that appear as real nodes to the control plane. These simulated nodes report their status, capacity, and other attributes just like real nodes would, but they don’t exist as physical or virtual machines. For scheduling, KWOK allows the standard Kubernetes scheduler to function normally. When you create a pod, the scheduler assigns it to one of the fake nodes based on regular scheduling constraints (resource requests, node selectors, taints and tolerations, and so on). After being scheduled, KWOK automatically updates the pod’s status to Running without running any containers.
You can explore resource utilization within the virtual (and real) nodes with eks-node-viewer (preinstalled on your build machine).
Run:
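One possible invocation is shown below; the flag is taken from the eks-node-viewer README, but verify it against your installed version.

```shell
# Display CPU and memory utilization per node, including KWOK nodes
eks-node-viewer --resources cpu,memory
```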
Expected output:
By analyzing KWOK nodes and their associated costs, you can now estimate the compute resource requirements for your workload running on Amazon EKS.
While this simulation provides valuable insights, it’s important to understand its limitations:
- It doesn’t account for network latency or I/O performance, which can impact real-world application behavior.
- The simulation might not fully replicate complex interdependencies between services in your current environment.
- It doesn’t factor in costs for managed services or data transfer that might be part of your overall Amazon EKS implementation.
- The approach assumes that your current resource allocation is optimal, which might not always be the case.
Clean up
Clean up the resources created within the build instance:
Exit the SSH session:
Terminate the build instance:
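If the original cleanup scripts are unavailable, the teardown can be sketched as follows; all names and the instance ID are placeholders for the resources you created in this walkthrough.

```shell
# Delete both clusters created for the simulation
eksctl delete cluster --name source-cluster
eksctl delete cluster --name destination-cluster

# Remove the backup bucket (this also deletes the Velero backups)
aws s3 rb s3://<your-backup-bucket> --force

# Remove the ECR repository holding the Karpenter-KWOK image
aws ecr delete-repository --repository-name <your-karpenter-kwok-repo> --force

# Terminate the build instance
aws ec2 terminate-instances --instance-ids <build-instance-id>
```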
Conclusion
By combining Karpenter and KWOK, organizations can get an indication of their AWS resource requirements before committing to a full migration. This approach reduces the risk of over- or under-provisioning and provides concrete data for budgeting and capacity planning.
This method represents just one example of how modern cloud-centered tools can be combined in innovative ways to solve complex migration challenges. As you plan your journey to Amazon EKS, consider incorporating this estimation technique into your assessment phase for more predictable outcomes.
About the authors
Riccardo Freschi is a Senior Solutions Architect at AWS who specializes in Modernization. He helps partners and customers transform their IT landscapes by designing and implementing modern cloud-native architectures on AWS. His focus areas include container-based applications on Kubernetes, cloud-native development, and establishing modernization strategies that drive business value.