AWS Storage Blog

Optimizing stateful storage lifecycle on AWS with Kubernetes and Salesforce

Managing storage resources efficiently in cloud environments is a challenge for organizations of all sizes. As businesses scale their operations, they often accumulate unused storage volumes that continue to generate costs without providing value. This ‘orphaned’ storage problem is particularly acute in containerized environments, where the complexity of storage lifecycle management can lead to oversight and inefficiency. Salesforce came up with a solution for this problem after identifying millions of dollars in potential annual savings from unused storage resources within the Hyperforce infrastructure.

In this joint post from AWS and Salesforce, we demonstrate how to implement effective storage lifecycle management using AWS and Kubernetes, showcasing real-world implementations that have the potential to save millions of dollars. We show how to configure StatefulSet Persistent Volume Claim (PVC) auto-delete features, understand storage class policies, and implement automated cleanup solutions that can transform your storage management practices. Whether you’re running a small deployment or operating at Salesforce’s scale, these practices help you maintain better control over your storage resources while making sure that critical data remains protected when needed.

The challenge of orphaned Amazon EBS volumes and cost

Before Kubernetes 1.27, no built-in mechanism was enabled by default to automatically delete Persistent Volume Claims (PVCs) and their associated Persistent Volumes (PVs) when a StatefulSet was deleted (the capability existed only as an opt-in alpha feature from Kubernetes 1.23). Although Kubernetes 1.27 made this capability available by default, the default retention behavior remains unchanged, so PVCs, PVs, and underlying storage resources (such as Amazon EBS volumes) are still orphaned after StatefulSet deletion unless you opt in. These orphaned resources lead to unnecessary costs for businesses. Figure 1 details a Kubernetes StatefulSet’s storage model. Each Pod of the StatefulSet connects to a unique Persistent Volume Claim, which maps to a Persistent Volume and physical storage, illustrating the persistent one-to-one relationship.

Figure 1: How StatefulSets manage storage

Figure 2 shows the default behavior of deleting a StatefulSet. The Persistent Volume Claims and Persistent Volumes remain, potentially leading to orphaned EBS volumes.

Figure 2: Consequences of deleting StatefulSets without proper cleanup

At Salesforce, we noticed that teams were usually unaware of these orphaned volumes, leading to millions of dollars in costs from unused storage. We then identified ways to potentially save millions of dollars annually within the Hyperforce infrastructure.

Salesforce solution

By implementing a two-pronged strategy, Salesforce is projected to save millions of dollars in orphaned storage costs in Hyperforce.

1. Custom automation (pre-Kubernetes 1.27)

Salesforce deployed a custom platform-level solution: an automated job in our fleet that periodically deletes orphaned PVCs based on both Amazon CloudWatch and Kubernetes resource checks. This solution alone saved Salesforce millions in costs. We enabled it only in lower, non-production environments to avoid deleting critical data in production environments without input from StatefulSet owners.

Before taking this to production clusters, we considered implementing a more complex StatefulSet owner control mechanism, but instead waited for the native Kubernetes auto-delete feature that offers this fine-grained control.

2. Native Kubernetes auto-delete feature

Through close collaboration between AWS and Salesforce engineering teams, the StatefulSet PVC auto-delete feature was successfully validated during its beta phase, and is now generally available in Kubernetes 1.32 on Amazon Elastic Kubernetes Service (Amazon EKS). This new feature enables StatefulSet owners to automatically delete PVCs when a StatefulSet is deleted or scaled down.

This feature has two limitations. It doesn’t clean up preexisting orphaned volumes, and it doesn’t help when StatefulSet owners initially set the PVC retention policy to Retain and then decide, after the StatefulSet is deleted or scaled down, that the PVC is no longer needed. This is where a two-pronged approach that includes custom-built automation has been so effective.

Understand the Kubernetes auto-delete feature

The StatefulSet PVC lifecycle levers provided within Kubernetes offer different possible behaviors that are useful to Salesforce Hyperforce customers. For more information about this feature and other behaviors, see the official deep dive from Kubernetes.

1. PVCs not deleted (default)

By default, the PVC retention policy of a StatefulSet preserves the behavior that existed before the auto-delete feature was introduced: no PVCs are deleted.

persistentVolumeClaimRetentionPolicy:
   whenDeleted: Retain
   whenScaled: Retain

This behavior means that the PVCs of the StatefulSet aren’t deleted even when the StatefulSet is deleted, leading to orphaned PVCs. This behavior is recommended if the PVCs associated with the StatefulSet hold critical data that you need regardless of whether the StatefulSet Pod that claims them is still present.

2. PVCs deleted when StatefulSet is deleted, but not when scaled down

Your PVC retention policy can be changed to automatically delete the PVCs of the StatefulSet when the StatefulSet is deleted, while keeping the PVC if the StatefulSet is scaled down.

persistentVolumeClaimRetentionPolicy:
   whenDeleted: Delete
   whenScaled: Retain

This behavior is recommended for workloads that need full cleanup when a StatefulSet is deleted, but need to retain PVCs during scale-down events that may happen during patching, autoscaling, or other use cases.

3. PVCs deleted when StatefulSet is deleted and when scaled down

Your PVC retention policy can be set to Delete such that the PVCs associated with the StatefulSet get deleted both during scale down and deletion:

persistentVolumeClaimRetentionPolicy:
   whenDeleted: Delete
   whenScaled: Delete

At Salesforce, we expect that most stateful services running on the Hyperforce platform may benefit from this behavior when the data is no longer needed beyond the lifetime of the replica. This setting can also lead to the greatest savings if your Storage Class reclaimPolicy is set to Delete.

4. PVCs deleted when StatefulSet is scaled down, but not when deleted

Your PVC retention policy can also be changed to automatically delete the PVCs of the StatefulSet when the StatefulSet is scaled down, while keeping the PVC if the StatefulSet is deleted.

persistentVolumeClaimRetentionPolicy:
   whenDeleted: Retain
   whenScaled: Delete

Although Salesforce doesn’t foresee many internal teams using this, it might be useful for workloads that don’t want to re-use PVCs during scale up, but want the data to persist during deletion. Read this Kubernetes post for possible scenarios where this might be useful.

Implementation example

Add these settings under spec in the Helm chart template for your StatefulSet.

apiVersion: apps/v1
kind: StatefulSet
metadata:
   name: statefulset1
   namespace: namespace1
spec:
   persistentVolumeClaimRetentionPolicy:
       whenDeleted: Delete
       whenScaled: Delete
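
To show where the policy sits relative to the rest of the manifest, the following is a minimal sketch of a complete StatefulSet. Only the persistentVolumeClaimRetentionPolicy block and the metadata names come from the example above. The labels, container image, volume name, storage size, headless Service name, and the StorageClass name ebs-gp3-delete are hypothetical placeholders that you would replace with your own values.

```yaml
# Sketch only: everything except persistentVolumeClaimRetentionPolicy
# and the metadata names is an illustrative placeholder.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset1
  namespace: namespace1
spec:
  serviceName: statefulset1          # assumes a headless Service with this name exists
  replicas: 3
  selector:
    matchLabels:
      app: statefulset1
  # Applies to the PVCs created from volumeClaimTemplates below.
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete              # remove PVCs when the StatefulSet is deleted
    whenScaled: Delete               # remove PVCs of replicas removed by scale down
  template:
    metadata:
      labels:
        app: statefulset1
    spec:
      containers:
        - name: app
          image: busybox:1.36        # placeholder image
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-gp3-delete   # hypothetical StorageClass name
        resources:
          requests:
            storage: 10Gi
```

The retention policy only governs PVCs that the StatefulSet controller creates from volumeClaimTemplates, which is why the sketch keeps both blocks in the same spec.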

Important note on Storage Classes

Although the Kubernetes auto-delete feature allows users to set PVC deletion policies during StatefulSet deletion or scale-down events, what happens to the associated PVs and EBS volumes depends on the reclaimPolicy of the Storage Class setting.

  • Delete (default): Whenever the PVC is deleted, the associated PV and EBS volume are deleted as well, leading to cost savings.
  • Retain: Whenever the PVC is deleted, the associated PV and EBS volume aren’t automatically deleted, which still leads to orphaned EBS volumes, regardless of the persistentVolumeClaimRetentionPolicy.

Figure 3 shows the impact of Kubernetes StorageClass and StatefulSet PersistentVolumeClaim reclaim policies. Retain in either results in orphaned volumes, and Delete in both leads to cost savings.

Figure 3: Understand the difference between ‘Retain’ and ‘Delete’ Reclaim policies to manage costs effectively

To realize cost savings through automated deletion of EBS volumes, both the Storage Class reclaimPolicy and the persistentVolumeClaimRetentionPolicy must be set to Delete.
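
As an illustration of the Storage Class side of this pairing, the following is a minimal sketch of a StorageClass for the Amazon EBS CSI driver with reclaimPolicy set explicitly. The name ebs-gp3-delete and the gp3 volume type are assumptions chosen for the example, not values from the Salesforce deployment.

```yaml
# Sketch only: the name and volume type are illustrative assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-delete               # hypothetical name
provisioner: ebs.csi.aws.com         # Amazon EBS CSI driver
parameters:
  type: gp3                          # assumed EBS volume type
reclaimPolicy: Delete                # PV and EBS volume are removed when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```

A PV takes its reclaim policy from the Storage Class at the time it is provisioned, so PVs that already exist keep the policy they were created with.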

Alternative solutions

Beyond Salesforce’s two-pronged solution, you could consider a periodic AWS Lambda function or manual deletion.

  • Lambda function: Automate orphaned EBS volume cleanup. Read more in the AWS post.
  • Manual deletion: If you expect orphaned PVCs only rarely, or only in a few clusters or workloads, then use kubectl directly in your cluster to delete the PVCs manually.

FAQs

What happens to my data when a PVC is deleted?

When a PVC is deleted, what happens to the associated PV and the data stored on the underlying EBS volume depends on your Storage Class reclaimPolicy. With reclaimPolicy: Delete, the corresponding PV and EBS volume are deleted as well. With reclaimPolicy: Retain, the PV and EBS volume remain and must be cleaned up separately. Make sure that you have appropriate backups for any critical data before deleting a PVC.

What happens to my existing orphaned volumes after enabling this Auto-Delete feature?

Enabling this feature helps clean up any future PVCs, but doesn’t help clean up preexisting ones.

Can I use the StatefulSet Auto-Delete feature with any storage classes?

Yes, the StatefulSet auto-delete feature works with any storage class that supports dynamic provisioning.

Conclusion

The approach from Salesforce demonstrates proactive stateful storage management in Kubernetes on AWS. Combining custom automation with the native StatefulSet PVC auto-delete feature provides a way to automatically manage the lifecycle of PVCs, reducing the risk of orphaned volumes and unnecessary costs. Evaluate these strategies to optimize your deployments and save money. To learn more about Amazon Elastic Block Store (Amazon EBS), visit the Amazon EBS User Guide.

Anuj Butail

Anuj Butail is a Principal Solutions Architect at AWS. He is based out of San Francisco and helps customers in San Francisco and Silicon Valley design and build large-scale applications on AWS. He has expertise in the area of AWS, edge services, and containers. He enjoys playing tennis, reading, and spending time with his family.

Krishna Sarabu

Krishna Sarabu is a Senior Database Engineer with AWS. He focuses on containers, application modernization, infrastructure, and open-source database engines Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. He enjoys working with users to help design, deploy, and optimize relational database workloads on AWS.

Sana Jawad

Sana has 16 years of experience and leads a global, cross-functional team building a substrate-agnostic, multi-tenant Kubernetes Platform at Salesforce. Her team delivers Kubernetes Platform-as-a-Service, allowing engineers to run microservices without managing infrastructure complexity. The platform, one of the largest in the industry, supports Salesforce's Hyperforce transformation with thousands of clusters powering products like Sales, Service, Commerce, MuleSoft, and Tableau. It emphasizes scalability, observability, and developer agility using technologies like Kubernetes, Knative, Karpenter, Terraform, Argo, and Spinnaker.

Sanya Nijhawan

Sanya Nijhawan is a Senior Software Engineer at Salesforce, where she is instrumental in architecting and developing scalable solutions for the Salesforce Hyperforce compute platform, one of the largest multi-tenant Kubernetes Platform-as-a-Service offerings in the industry. With over 5 years of experience in software engineering, she has a proven track record of designing and implementing solutions that drive significant efficiency gains and millions of dollars in cost savings. Sanya holds a Bachelor of Science in Computer Science from Yale University and is passionate about building impactful solutions.