AWS Quantum Technologies Blog
Accelerating the Quantum Toolbox in Python (QuTiP) with cuQuantum on AWS
Simulating quantum systems on classical computers remains a computational challenge, because the resources required scale exponentially with the size of the system being simulated. This scaling is at the core of the motivation to build a quantum computer. Quantum computers have the potential to surpass even the most powerful supercomputers when simulating quantum systems from nature. They are also expected to provide speedups for certain complex problems, with applications ranging from cryptography to large-scale optimization. However, solving practically relevant problems will require algorithms that execute billions of operations across hundreds of thousands of qubits. Current quantum hardware is far from meeting these demands: qubit counts remain limited and operation error rates remain high. Scientists and engineers worldwide are working on improving the design of quantum computing components and operations to enable large-scale quantum computation.
Designing high-performance classical simulations of quantum devices is an active field of research that sits at the intersection of experimental and computational sciences. Researchers often simulate a small part of the system of interest to obtain results in a reasonable amount of time, e.g., focusing on low-excitation states and a few qubits or couplers. However, scientists are finding that a more comprehensive understanding of the physical systems is essential to improving hardware performance and reducing error rates. This requires including additional physics, such as interactions with other qubits on the device or the impact of the higher-energy states of the system. These more realistic simulations, however, are time-consuming.
In this post, researchers from the Theory of Superconducting Quantum Devices group at the Institut Quantique, Université de Sherbrooke, together with NVIDIA and Amazon Web Services (AWS), describe a collaboration to increase the performance of open quantum system simulations on classical compute resources. The team achieved this by integrating NVIDIA cuQuantum with the Quantum Toolbox in Python (QuTiP), enabling GPU-accelerated simulations of quantum device dynamics. Using Amazon Elastic Compute Cloud (Amazon EC2) instances accelerated by NVIDIA GPUs, we demonstrate simulation speedups of up to 4,000x, making it possible to model large-scale, multi-qubit systems with improved efficiency.
The qutip-cuquantum Plugin
QuTiP is an open-source toolkit for simulating the dynamics of open quantum systems. It aims to provide user-friendly and efficient numerical simulations of a wide variety of Hamiltonians, including those with arbitrary time dependence, that are commonly found in physics applications such as quantum optics, trapped ions, superconducting circuits, and quantum nanomechanical resonators. NVIDIA cuQuantum is a software development kit (SDK) of optimized libraries and tools that accelerate quantum computing emulations at both the circuit and device level.
To expand the reach of computational experiments, our teams have integrated cuQuantum with QuTiP via the qutip-cuquantum plugin, available on PyPI. The plugin leverages a new cuQuantum library, cuDensityMat, designed to accelerate analog quantum dynamics solvers; it accelerates time evolution under the Schrödinger equation and the Lindblad master equation. cuDensityMat provides primitives for accelerating existing dynamics frameworks, such as NVIDIA CUDA-Q Dynamics and now QuTiP, as well as custom-built solvers. It offers low-level functionality for defining arbitrary pure or mixed quantum states, defining many-body operators and superoperators, and computing the action of operators and superoperators on a state. It also supports gradients and multi-GPU, multi-node simulations for easy scaling.
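For reference, the Lindblad master equation mentioned above has the following standard textbook form. The symbols are conventional notation chosen here for illustration and are not taken from the plugin or cuDensityMat documentation: ρ is the density matrix, H the Hamiltonian, and L_k the collapse (jump) operators with their rates absorbed into them.

```latex
\frac{d\rho}{dt} = -\frac{i}{\hbar}\,[H, \rho]
  + \sum_k \left( L_k \rho L_k^\dagger
  - \frac{1}{2} \left\{ L_k^\dagger L_k, \rho \right\} \right)
```

With all L_k set to zero, only the commutator term remains, which for a pure state is equivalent to evolution under the Schrödinger equation.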
Results
To benchmark the performance of the qutip-cuquantum plugin, we simulate a superconducting transmon qubit capacitively coupled to a resonator, which is driven by a microwave pulse. The system is operated in the dispersive regime, where the qubit–resonator coupling strength is much smaller than their frequency detuning. This configuration is the standard approach for qubit measurement in superconducting processors. In principle, increasing the microwave drive amplitude can shorten measurement times. In practice, however, this often induces unwanted qubit-state transitions, exciting the system far beyond the computational subspace. Accurately capturing these effects in simulation requires including many qubit and resonator states, which enlarges the Hilbert space of the system and in turn makes simulating its dynamics computationally demanding. Here we report simulation results for systems with 512 resonator states and either 32 or 64 qubit states, with the full Hamiltonian constructed using QuTiP.
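To give a sense of scale, the short calculation below works out the Hilbert-space dimension and the size of a single density matrix for the two benchmark configurations. It assumes dense storage in double-precision complex numbers (16 bytes per entry); the actual storage layout used by the solver is not stated in this post, and the helper names are illustrative.

```python
# Back-of-envelope state size for the benchmark dimensions quoted in the text.
# Assumption: the density matrix is stored densely as complex128 (16 bytes per entry).
BYTES_PER_COMPLEX128 = 16


def hilbert_dim(n_resonator, n_qubit):
    """Dimension of the joint resonator-qubit Hilbert space (tensor product)."""
    return n_resonator * n_qubit


def dense_density_matrix_gib(dim):
    """Memory for one dim x dim complex128 density matrix, in GiB."""
    return dim * dim * BYTES_PER_COMPLEX128 / 2**30


for n_qubit in (32, 64):
    dim = hilbert_dim(512, n_qubit)
    size = dense_density_matrix_gib(dim)
    print(f"{n_qubit} qubit states: dim = {dim}, one density matrix = {size:.0f} GiB")
# 32 qubit states: dim = 16384, one density matrix = 4 GiB
# 64 qubit states: dim = 32768, one density matrix = 16 GiB
```

These figures cover one copy of the state only. A time integrator generally keeps several arrays of this size in memory alongside the operators, so the working footprint is a multiple of the numbers printed here.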

Figure 1 – Circuit diagram of the system simulated, where the transmon qubit (green) is capacitively coupled to the resonator (blue) with an external drive.
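As a rough illustration of the kind of model shown in Figure 1, the sketch below builds a driven resonator coupled to a transmon in stock QuTiP and solves the Lindblad master equation on the CPU. It is a simplified stand-in, not the full Hamiltonian used for the benchmarks: the transmon is approximated as a weakly anharmonic (Kerr) oscillator, the system is written in a frame rotating at the drive frequency under the rotating-wave approximation, and every parameter value and truncation size is a placeholder chosen so the example runs in seconds. It does not show how to activate the qutip-cuquantum plugin.

```python
import math

import numpy as np
import qutip

# Tiny truncations so this runs quickly on a CPU. The benchmarks in the text
# use 512 resonator states and 32 or 64 qubit states.
N_RES, N_QUBIT = 8, 3

# Illustrative placeholder parameters (angular frequencies in rad/ns, time in ns).
delta_r = 0.0                     # resonator detuning from the drive
delta_q = 2 * math.pi * (-1.0)    # transmon detuning from the drive
alpha = 2 * math.pi * (-0.2)      # transmon anharmonicity
g = 2 * math.pi * 0.1             # coupling, much smaller than |delta_q - delta_r|
eps = 2 * math.pi * 0.002         # drive amplitude on the resonator
kappa = 2 * math.pi * 0.005       # resonator photon loss rate

# Annihilation operators on the joint resonator (x) transmon space.
a = qutip.tensor(qutip.destroy(N_RES), qutip.qeye(N_QUBIT))
b = qutip.tensor(qutip.qeye(N_RES), qutip.destroy(N_QUBIT))

# Simplified rotating-frame Hamiltonian (assumption: Kerr-oscillator transmon, RWA).
H = (
    delta_r * a.dag() * a
    + delta_q * b.dag() * b
    + 0.5 * alpha * b.dag() * b.dag() * b * b
    + g * (a.dag() * b + a * b.dag())
    + eps * (a + a.dag())
)

# Start with the resonator in vacuum and the transmon in its ground state.
psi0 = qutip.tensor(qutip.basis(N_RES, 0), qutip.basis(N_QUBIT, 0))
tlist = np.linspace(0.0, 50.0, 101)

# Lindblad evolution with resonator decay; track the resonator photon number.
result = qutip.mesolve(
    H, psi0, tlist,
    c_ops=[math.sqrt(kappa) * a],
    e_ops=[a.dag() * a],
)
photons = result.expect[0]

print(f"Hilbert-space dimension: {H.shape[0]}")
print(f"Resonator photon number at t = {tlist[-1]:.0f} ns: {photons[-1]:.4f}")
```

Raising `N_RES`, `N_QUBIT`, and the drive amplitude `eps` toward the regime described in the text is what makes the problem expensive, since the density matrix grows with the square of the Hilbert-space dimension.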
The team ran simulations on AWS using P4de, P5, and P5en instances, accelerated by NVIDIA A100, NVIDIA H100, and NVIDIA H200 GPUs, respectively. For the system with 32 qubit states, benchmarks show a 725x speedup on a single H200 GPU compared with CPU-only simulations on an Hpc7a instance. Using multiple H200 GPUs yielded additional speedups of 1.2x, 2.2x, and 3.7x for 2, 4, and 8 GPUs, respectively. The simulations with 64 qubit states required 8 H200 GPUs because of their memory requirements and showed a 4,000x speedup on a P5en instance. Finally, we observed improvements across GPU generations, with speedups of 1.5x and 1.9x when moving from P4de to P5 and to P5en instances, respectively.
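The multi-GPU figures can be put in context with a little arithmetic. The sketch below assumes that the "additional" speedups are measured relative to a single H200 GPU, so that they multiply the single-GPU speedup over the CPU baseline; that reading is an interpretation of the numbers above, and the function names are illustrative.

```python
# Speedups quoted in the text for the system with 32 qubit states.
SINGLE_H200_VS_CPU = 725.0
MULTI_GPU_VS_SINGLE = {2: 1.2, 4: 2.2, 8: 3.7}


def combined_speedup(n_gpus):
    """Speedup over the CPU baseline, assuming the multi-GPU gain is
    relative to one H200 and therefore multiplies the single-GPU speedup."""
    return SINGLE_H200_VS_CPU * MULTI_GPU_VS_SINGLE[n_gpus]


def parallel_efficiency(n_gpus):
    """Fraction of ideal linear scaling achieved with n_gpus GPUs."""
    return MULTI_GPU_VS_SINGLE[n_gpus] / n_gpus


for n in sorted(MULTI_GPU_VS_SINGLE):
    print(
        f"{n} GPUs: about {combined_speedup(n):.0f}x vs CPU, "
        f"parallel efficiency {parallel_efficiency(n):.0%}"
    )
```

Under that assumption, adding GPUs keeps reducing runtime for this problem size, with efficiency falling from roughly 60% on 2 GPUs to under 50% on 8, which is the usual pattern when a fixed-size problem is spread across more devices.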
These advancements open the door to studying multi-qubit dynamics and interactions involving highly excited states, which is critical for optimizing operations such as readout and two-qubit gates. With these new tools, the team is now able to explore transmon physics at scales that were previously out of reach, marking an important step toward the next generation of quantum hardware. The integration of GPU acceleration into QuTiP also ensures that these capabilities are available to the broader research community.