Google Cloud boosts its AI capabilities with A3 supercomputers powered by Nvidia H100 GPUs

Google announced new A3 supercomputers with Nvidia H100 GPUs as a new option for its virtual machines (VMs), built to train and run advanced artificial intelligence (AI) and machine learning (ML) models.

The A3 supercomputers combine Nvidia's advanced H100 GPUs with state-of-the-art networking technologies to give customers some of the most powerful GPU instances on the market. They are the first Google Cloud VMs to use direct GPU-to-GPU communication, bypassing the CPU.

The A3 VMs are purpose-built for AI, being designed to train and run large language models (LLMs), generative AI, and diffusion models.

Google Cloud provides two robust VM options tailored for customers who want to develop intricate ML models without the hassle of maintenance or customization.

These VMs are not yet generally available, but they are expected to reach public release later this year.

Both can scale to tens of thousands of highly interconnected GPUs using Google's cutting-edge networking technologies.

Nvidia H100 Tensor Core GPU

The A3 supercomputers employ Infrastructure Processing Units (IPUs) to facilitate direct communication between GPUs, bypassing the CPU. This significantly enhances communication speed and efficiency.

They use Google’s Jupiter data center network, which can reconfigure its bandwidth and topology on demand to match the workload, giving the network a great deal of flexibility.

In terms of computing power, the A3 supercomputers deliver up to 26 exaflops of AI performance, enabling customers to train large models faster and at lower cost. An exaflop is a billion billion (10^18) floating-point operations per second.
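To put 26 exaflops in perspective, here is a rough back-of-the-envelope sketch. The ~3.14×10^23 FLOP training-compute estimate for a GPT-3-scale model is an outside assumption from published estimates, not a figure from this announcement, and real training runs achieve only a fraction of theoretical peak:

```python
# Back-of-the-envelope: how long would a GPT-3-scale training run take
# at the A3 cluster's peak AI throughput? Peak figures are theoretical;
# real runs achieve only a fraction of this.

peak_flops_per_second = 26e18    # 26 exaflops = 26 * 10^18 FLOP/s
gpt3_training_flops = 3.14e23    # published estimate for GPT-3 (assumption)

seconds = gpt3_training_flops / peak_flops_per_second
print(f"~{seconds / 3600:.1f} hours at theoretical peak")
```

Even at a realistic 30-50% of peak utilization, that is on the order of hours rather than weeks.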

Key components

The next-generation A3 GPU supercomputers consist of:

  • 8 H100 GPUs based on Nvidia’s Hopper architecture, which lets each GPU perform roughly 3x more calculations than Nvidia’s previous-generation A100 GPUs.
  • 3.6 terabytes/second (TB/s) bandwidth between its GPUs. This means that A3 can transfer 3.6TB/s of data between its GPUs, which is like sending thousands of movies every second.
  • Latest 4th Gen Intel Xeon Scalable processors. These processors are flexible and can be easily adapted to different computational needs.
  • 2TB of host memory using 4800 MHz DDR5 DIMMs. These DIMMs perform 4.8 billion data transfers per second.
  • 10x higher networking bandwidth enabled by Google’s IPUs (Infrastructure Processing Units). These are devices that can transfer data faster between GPUs across different servers.

H100 compute improvement summary: 132 streaming multiprocessors (SMs, the GPU’s units of computation), new Tensor Cores (2x faster), a new Transformer Engine (a further 2x improvement on transformer workloads), and increased clock frequency (1.3x performance).
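As a sanity check on the roughly 3x figure, the per-component gains above can simply be multiplied. A minimal sketch in Python; the A100’s SM count of 108 is an assumption taken from Nvidia’s public specs, not from this article:

```python
# Multiply H100's per-component gains over the A100.
# A100 SM count (108) is an assumption from Nvidia's public specs.
sm_gain = 132 / 108     # more streaming multiprocessors: ~1.22x
tensor_core_gain = 2.0  # new Tensor Cores: 2x faster
clock_gain = 1.3        # higher clock frequency: 1.3x

raw_speedup = sm_gain * tensor_core_gain * clock_gain
print(f"~{raw_speedup:.1f}x raw compute vs. A100")

# The Transformer Engine adds a further ~2x on transformer workloads.
print(f"~{raw_speedup * 2.0:.1f}x on transformer models")
```

The product of the raw gains lands just above 3x, matching the claim in the bullet list, with the Transformer Engine pushing transformer-specific workloads higher still.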

How you can use the A3 supercomputers

If you are a customer, you can use them in either of the following ways:

– Running them yourself on Google Compute Engine and Google Kubernetes Engine (GKE). You can choose from different configurations of A3 VMs based on your needs and budget, and work with the latest foundation models while benefiting from features like autoscaling, workload orchestration, and automatic upgrades.

– Using them on Google Cloud AI Platform as a managed service. You can access the A3 supercomputer through the AI Platform Training and Prediction services, which handle infrastructure provisioning, scaling, and monitoring for you. You can also use the AI Platform Notebooks service to create and run Jupyter notebooks on A3 VMs.

The A3 supercomputer from Google Cloud is a cutting-edge computing solution for advanced ML tasks, helping users train and deploy complex ML models faster and more efficiently.

If you want to try the A3 supercomputer for your own ML projects, you can sign up for the Preview waitlist.

Learn more:

Release announcement: “Announcing A3 supercomputers with NVIDIA H100 GPUs, purpose-built for AI” (on Google Cloud)
