Introducing Seekable OCI Parallel Pull mode for Amazon EKS

Containerization has transformed how customers build and deploy modern cloud native applications, offering unparalleled benefits in portability, scalability, and operational efficiency. Containers provide integrated dependency management and enable a standard distribution and deployment model for any workload. With Amazon Elastic Kubernetes Service (Amazon EKS), Kubernetes has emerged as a go-to solution for customers running large-scale containerized workloads that need to efficiently scale to meet evolving needs. However, one persistent challenge continues to impact specific deployment and scaling aspects of Kubernetes workload operations. Container image pulls, particularly when working with large and complex container images, can directly impact the responsiveness and agility of your systems. With the growth of AI/ML workloads, where we see particularly large images, this directly impacts operations as images may take several minutes to pull and prepare.

In our recent Under the Hood post for EKS Ultra Scale Clusters, we briefly touched on our evolving solution for this problem, Seekable OCI (SOCI) Parallel Pull. In this post, we’ll explain how container image pulls work and how they impact deployment and scaling operations, we’ll dive deeper into how SOCI parallel pull works, and finally show how it can help you improve image pull performance with your workloads on Amazon EKS.

Introducing Parallel Pull mode for the SOCI snapshotter

As average container image sizes have grown in recent years, container startup performance has become a critical element of modern cloud native system performance. Image pull and preparation can account for more than 75% of total startup time for new and scaling workloads. This challenge is particularly acute with the rise of AI/ML workloads on Amazon EKS. These workloads have driven significant growth in container image sizes, where images are commonly tens of gigabytes in size. Standard image pulling processes are not designed for this image size scale, and do not effectively use available system resources. This creates unnecessary delays that impact application responsiveness and resource efficiency.

Although SOCI’s existing lazy loading technology allows containers to start without downloading entire images, many users prefer upfront loading for AI/ML workloads due to the nature of these images. AI/ML containers typically need large libraries and SDKs, and are often bundled with data, model files, and extensive dependencies that are accessed immediately upon startup. This makes the complete image download inevitable regardless of lazy loading capabilities.

SOCI Parallel Pull introduces a new parallel pull mode that addresses this fundamental performance limitation by introducing configurable parallelization across both download and unpacking phases. SOCI Parallel Pull uses concurrent and memory-efficient HTTP range requests for layer downloads and parallel unpacking across layers to achieve significant improvements in image pull times on multi-core systems. In turn, this reduces the container cold start penalty that affects everything from serverless functions to batch processing workloads.

Understanding image pulls

Container image pulls consist of two distinct phases, each with unique I/O characteristics and system resource requirements. The layer fetch step involves downloading compressed image layers from a container registry over the network. Broadly adopted implementations use a single network connection to sequentially read each layer, which cannot saturate the entire network bandwidth even with the help of parallel downloads across multiple layers. This single-connection limitation creates a bottleneck where available bandwidth remains underutilized, leading to long pull times. This is especially inefficient when pulling very large image layers in high-bandwidth environments, where resources sit idle while waiting for the layer downloads to complete.

The layer unpacking step follows, where downloaded compressed layers are decompressed and extracted to form the container’s filesystem. This phase is typically CPU and disk I/O intensive, involving decompression algorithms (gzip, zstd) and numerous small file writes to construct the layered filesystem. Layers are unpacked serially one at a time. This typically leads to an I/O bottleneck, and the CPU resources of the host are underutilized by leaving multiple CPU cores idle.

The combination of these sequential operations means that neither network bandwidth during downloads nor compute resources during unpacking are fully utilized. These bottlenecks are particularly pronounced in modern cloud environments where high-bandwidth networks and multi-core instances are in use, yet image pulling operations fail to leverage these resources effectively. As a result, container startup times can negatively impact new workload deployments and cluster scaling operations, creating a fundamental mismatch between infrastructure capabilities and container runtime performance.

SOCI Parallel Pull details

SOCI Parallel Pull Mode transforms both phases of image pulls through configurable parallelization strategies. For layer fetch optimization, SOCI establishes multiple concurrent HTTP connections per layer, effectively multiplying download throughput beyond the single-connection limitation. SOCI Parallel Pull Mode splits large layers into chunks and downloads them simultaneously across multiple connections to saturate your available network bandwidth and dramatically reduce download times. This approach is particularly effective for the large layers common in AI/ML workloads, where a single layer can be several gigabytes.

For layer unpacking optimization, SOCI Parallel Pull Mode introduces parallel unpacking across multiple layers simultaneously. Instead of waiting for each layer to be fully unpacked before starting the next, SOCI uses your available CPU cores to decompress and extract multiple layers concurrently. This parallel processing approach transforms the traditionally I/O-bound unpacking phase into a CPU-optimized operation that scales with the number of available cores. The unpacking parallelization is carefully orchestrated to maintain filesystem consistency while maximizing throughput.

SOCI Parallel Pull Mode employs a dual-threshold control system with configurable parameters for both download concurrency and unpacking parallelism. This granular control provides great flexibility for you to finely tune SOCI Parallel Pull Mode’s behavior to meet your performance requirements and environment conditions.

Performance consideration

SOCI Parallel Pull Mode’s effectiveness depends on careful configuration tuning based on your image characteristics and system resources. The dual-threshold control system provides two key parameters: download concurrency (connections to the registry) and unpacking parallelism (concurrent layer processing). These settings allow you to optimize performance based on your specific infrastructure and workload requirements. Determining these settings is an important part of getting the most out of the solution for your specific use case and requirements.

Performance benefits scale with image complexity and size: Small, lightweight images see minimal improvement because traditional methods already handle them efficiently. However, larger and more complex images—particularly those common in AI/ML workloads with extensive dependencies leading to large layer sizes—experience substantial performance gains. The benefits become increasingly pronounced with these very large layers, and as layer count and overall image size grows. This makes SOCI Parallel Pull particularly valuable for containerized AI/ML applications and other workloads with large, complex images.

Resource utilization follows predictable patterns: SOCI Parallel Pull trades significantly reduced pull times for higher network, CPU, and storage utilization, using parallel processing to maximize image pull throughput. Unlike alternative approaches that buffer data in memory, SOCI writes directly to disk to maintain consistent and efficient memory usage regardless of parallelism settings. This design requires adequate storage performance to handle the increased I/O operations, making high-performance storage configurations recommended for optimal results.

SOCI Parallel Pull in action

To see SOCI Parallel Pull Mode in action, we demonstrate two deployments using an Amazon Deep Learning Container (DLC) with vLLM, with a container image size of ~10 GB. This example is best run on an instance that can support high network and storage IO requirements such as m6i.8xlarge.

To get the most impact from the solution, you should configure an optimized Amazon Elastic Block Store (Amazon EBS) volume with a throughput of 1000 MiB/s and IOPs of 16K to enable parallel pull to run effectively without being bound by storage constraints. For instances with instance store high performance NVMe disks, we recommend binding SOCI’s root dir on those NVMe disks in favor of the optimized Amazon EBS configuration.

Recent versions of the Amazon EKS Optimized AMI for Amazon Linux 2023 and Bottlerocket have SOCI Parallel Pull Mode built in with configuration support for both operating systems. Alternatively, you can install and configure SOCI from the upstream releases repository. In both operating systems, SOCI can be configured through Amazon Elastic Compute Cloud (Amazon EC2) userdata, enabling you to have different configurations per-node and workload if you choose.

Bottlerocket has added SOCI Parallel Pull configuration into their native configuration, as shown in the following example for Bottlerocket OS. For more information, go to the upstream documentation.

[settings.container-runtime]
snapshotter = "soci"
[settings.container-runtime-plugins.soci-snapshotter]
pull-mode = "parallel-pull-unpack"
[settings.container-runtime-plugins.soci-snapshotter.parallel-pull-unpack]
max-concurrent-downloads-per-image = 10
concurrent-download-chunk-size = "16mb"
max-concurrent-unpacks-per-image = 10
discard-unpacked-layers = true

# bind container resources and SOCI snapshotter root dir (/var/lib/soci-snapshotter) to instance store fast NVMe disks
[settings.bootstrap-commands.k8s-ephemeral-storage]
commands = [
    ["apiclient", "ephemeral-storage", "init"],
    ["apiclient", "ephemeral-storage" ,"bind", "--dirs", "/var/lib/containerd", "/var/lib/kubelet", "/var/log/pods", "/var/lib/soci-snapshotter"]
]
essential = true
mode = "always"

In this configuration, you are enabling SOCI Parallel Pull Mode, setting up some basic tunings for download concurrency (layer fetch step) and unpack concurrency (layer unpack step), and setting the layer download chunk size to 16 MB.

For instances that have instance store, we bind SOCI’s root dir (/var/lib/soci-snapshotter) to use those fast NVMe disks. This both increases performance and reduces cost for those instances that support it.

We can use two identical deployment workloads to compare image pull characteristics with and without SOCI Parallel Pull enabled and configured. For a full example and walkthrough using Karpenter, visit our SOCI snapshotter Karpenter blueprint.

Our baseline workload specification is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  labels:
    app: vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      securityContext:
        runAsUser: 1000
        runAsNonRoot: true
      containers:
      - name: vllm
        image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2
        command: ["bash", "-c"]
        args: ["trap 'exit 0' TERM; sleep 9999 & wait"]
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
      nodeSelector:
        intent: non-soci-snapshotter
        kubernetes.io/arch: amd64
        node.kubernetes.io/instance-type: m6i.8xlarge

And our workload that uses SOCI Parallel Pull is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-soci-br
  labels:
    app: vllm-soci-br
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-soci-br
  template:
    metadata:
      labels:
        app: vllm-soci-br
    spec:
      securityContext:
        runAsUser: 1000
        runAsNonRoot: true    
      containers:
      - name: vllm
        image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2
        command: ["bash", "-c"]
        args: ["trap 'exit 0' TERM; sleep 9999 & wait"]
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL        
      nodeSelector:
        intent: soci-snapshotter
        kubernetes.io/arch: amd64
        node.kubernetes.io/instance-type: m6i.8xlarge

You can review and compare the overall image pull time between the two configurations by looking at the Pulled event on each Pod.

$ kubectl describe pod/vllm-59bfb6f86c-7jh4p | grep Pulled
  Normal   Pulled            2m39s  kubelet            Successfully pulled image "763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2" 
  in 1m52.876s (1m52.876s including waiting). Image size: 10778400361 bytes.

We can see that for the baseline deployment using the default containerd configuration, we observe a pull time of 1m52.876s.

$ kubectl describe pod/vllm-soci-br-74b59cc4bd-7klqx | grep Pulled
  Normal  Pulled     3m29s  kubelet            Successfully pulled image "763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2" 
  in 45.121s (45.121s including waiting). Image size: 10778400361 bytes.

In comparison, when using SOCI Parallel Pull Mode we observe a pull time of 45.121s. This yields nearly 60% acceleration in pull time, resulting in nearly 150% time reduction in end-to-end image pull time.

Tuning configuration

As discussed previously, SOCI Parallel Pull Mode is controlled by several key configuration settings. Although the default values align with containerd’s standard configuration to ensure stability and safety, you can adjust these parameters to optimize performance based on your specific needs. Understanding these configuration elements and tuning them to your use case is necessary to realize the most benefit.

max_concurrent_downloads: Sets the maximum number of concurrent downloads allowed across all images. The default is -1 (unlimited).

max_concurrent_downloads_per_image: Limits the maximum concurrent downloads per individual image. The default is 3. For images hosted on Amazon Elastic Container Registry (Amazon ECR), we recommend setting this to 10–20.

concurrent_download_chunk_size: Specifies the size of each download chunk when pulling image layers in parallel. The default is empty, which uses the size of the layer. For Amazon ECR, we recommend a chunk size of “16mb”.

max_concurrent_unpacks: Sets the maximum number of concurrent layer unpacking operations system-wide. The default is -1 (unlimited).

max_concurrent_unpacks_per_image: Sets the limit for concurrent unpacking of layers per image. The default is 1. We recommend tuning this to match the average of your large layers (layers greater than 1 GB) count per image across all images that you may pull to a given node. We found that a good number for most AI/ML container images is 10.

discard_unpacked_layers: Controls whether to retain layer blobs after unpacking. Enabling this can reduce disk space usage and speed up pull times. The default is false. If you are not planning on using multiple snapshotters or pushing images from the nodes, then setting this to true is safe and beneficial.

Benchmark

We have conducted a benchmark to show how different SOCI Parallel Pull configurations impact image pull performance as well as infrastructure and resource usage.

The benchmark uses the same image from our example above, an Amazon DLCs vLLM image, 763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2, which is about 10 GB in size. The benchmark was run with m6i.8xlarge instances using two different EBS volume configurations for both Bottlerocket and Amazon Linux 2023. We can see that Bottlerocket gives a slight performance improvement, because it uses a different decompression library.

Configuration shorthand:

default = containerd implementation
max_concurrent_downloads_per_image = mcdpi
max_concurrent_unpacks_per_image = mcupi
concurrent_download_chunk_size = cdcs

Getting started with SOCI Parallel Pull Mode

SOCI Parallel Pull Mode is available as part of the SOCI snapshotter for containerd. You can enable it in your Amazon EKS clusters or any other containerd-based environments by configuring the appropriate snapshotter settings. The feature integrates seamlessly with your existing container workflows while delivering substantial performance improvements for larger images.

For more information such as detailed configuration guidance and best practices, refer to the SOCI documentation, the AI/ML section of the EKS Best Practices Guide, and the Karpenter Blueprint using SOCI snapshotter parallel pull, where you can find comprehensive setup instructions and tuning recommendations for your specific use cases.

SOCI Parallel Pull represents a significant advancement and next step in container image pulling technology, addressing a critical performance bottleneck that affects modern cloud-native applications. With this solution, you can parallelize both download and efficient unpacking operations to achieve faster container startup times. This directly improves new deployment workloads and cluster scaling operations on Amazon EKS and in other containerd-based environments.

As your containerized workloads continue to grow in complexity and scale, particularly in AI/ML domains where large images are becoming the norm, SOCI Parallel Pull Mode provides the performance foundation needed to maintain application responsiveness and operational efficiency. We encourage you to explore this capability in your environments and experience the benefits of accelerated container startup firsthand.

About the authors

Jesse Butler is a Principal Product Manager for Amazon EKS, helping customers build with Kubernetes and cloud native technologies on AWS.

Henry Wang is a Senior Software Development Engineer at the Container Runtime Team. He is interested in container technologies, and an active contributor to open source projects like containerd.

Erez Zarum is a Senior Startups Solutions Architect at AWS. Erez is passionate about Containers and the AI/ML landscape, and his unique approach empowers Startups to accelerate AI/ML workloads on Amazon EKS.

Containers

Introducing Seekable OCI Parallel Pull mode for Amazon EKS

Introducing Parallel Pull mode for the SOCI snapshotter

Understanding image pulls

SOCI Parallel Pull details

Performance consideration

SOCI Parallel Pull in action

Tuning configuration

Benchmark

Getting started with SOCI Parallel Pull Mode

About the authors

Resources

Follow

Learn

Resources

Developers

Help