Meta is investing in its next generation of AI infrastructure, from custom chips and data centers to supercomputers and software tools.
Meta has developed MSVP (Meta Scalable Video Processor), a chip that processes video; MTIA (Meta Training and Inference Accelerator), a custom chip for running AI workloads; and RSC (Research SuperCluster), a supercomputer built for AI research.
Meta is also working on a new, AI-optimized data center design that supports its current products as well as future AI workloads.
The company’s vision is to create the “metaverse”, a virtual world where people can connect, create, and explore. Achieving this requires powerful AI infrastructure that can handle massive amounts of data.
In Meta’s view, custom hardware is the best choice for its AI applications, because it provides the compute power and efficiency needed at the company’s huge scale of operations.
MSVP (Meta Scalable Video Processor)
MSVP (Meta Scalable Video Processor) is Meta’s first in-house-developed ASIC (Application-Specific Integrated Circuit) designed for video transcoding.
According to Meta, people on Facebook spend 50% of their time watching videos. Every day, more than 4 billion videos are watched on Facebook on different devices including phones, laptops, and TVs.
Uploaded videos are compressed (encoded) for storage and delivery, and decompressed (decoded) whenever someone watches them.
This process uses different video coding formats including H.264, VP9, and AV1.
MSVP is scalable and programmable and can handle all of these formats. It can also help create new kinds of video content for Meta’s apps, including AI-generated content.
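To make the idea of transcoding concrete, here is a minimal software sketch using ffmpeg; the file names, encoder choices, and settings are illustrative, not Meta’s pipeline. This is the kind of decode/scale/encode work MSVP performs in dedicated silicon far more efficiently.

```python
# Illustrative software transcoding with ffmpeg; MSVP does this kind of work
# in dedicated hardware. Paths and encoder settings are made up.
import subprocess

def transcode(src: str, dst: str, codec: str, height: int) -> None:
    """Re-encode `src` into `dst` with the given video codec and output height."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src,                    # decode the uploaded file
            "-vf", f"scale=-2:{height}",  # downscale, preserving aspect ratio
            "-c:v", codec,                # e.g. libx264, libvpx-vp9, libaom-av1
            "-an",                        # drop audio to keep the sketch focused on video
            dst,
        ],
        check=True,
    )

# Produce H.264 and VP9 renditions of the same upload.
transcode("upload.mp4", "out_h264_720p.mp4", "libx264", 720)
transcode("upload.mp4", "out_vp9_720p.webm", "libvpx-vp9", 720)
```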
MSVP is controlled by a host server connected over a PCIe interface, which allows fast data transfers. The host tells MSVP how to handle each video stream: decode, encode, preprocess, or transcode it.
The chip copies the video streams from host memory into its own on-board LPDDR memory, which is power-efficient and sits close to the processing engines. After completing the video processing tasks, MSVP copies the results back to host memory.
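Meta has not published a host-side API for MSVP, so the following is purely a hypothetical sketch of how a host might describe a job for a PCIe-attached accelerator; the `MsvpJob` structure and `submit` function are invented for illustration.

```python
# Hypothetical host-side job descriptor for a PCIe-attached video accelerator.
# MsvpJob and submit() are invented; no real MSVP driver API is assumed here.
from dataclasses import dataclass

@dataclass
class MsvpJob:
    op: str                      # "decode", "encode", "preprocess", or "transcode"
    input_bitstream: bytes       # compressed video sitting in host memory
    output_codec: str = "h264"   # target format for encode/transcode
    output_height: int = 1080    # target resolution

def submit(job: MsvpJob) -> bytes:
    """Placeholder for a driver call that would DMA the input over PCIe into the
    chip's LPDDR memory, run the requested operation, and copy the result back."""
    raise NotImplementedError("no real MSVP driver is assumed in this sketch")

# A real pipeline would build one job per operation and hand it to the driver:
job = MsvpJob(op="transcode", input_bitstream=b"...", output_codec="av1", output_height=720)
```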
A practical example of MSVP’s working pipeline:
- A user uploads a 4K video to Facebook or Instagram using their smartphone.
- The video is sent to a data center where it is processed by MSVP.
- MSVP encodes the video into multiple bitstreams with different formats, resolutions, and quality levels, such as H.264, H.265, AV1, 1080p, 720p, 480p, etc.
- MSVP also applies some AI-based enhancements to the video, such as noise reduction, stabilization, or face detection.
- The encoded bitstreams are stored in a distributed storage system and delivered to the viewers through a content delivery network (CDN).
- Depending on their device and network conditions, viewers’ players select the most suitable bitstream for playback, as sketched below.
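As a rough illustration of that last step, here is a small sketch of how a player might pick among the renditions MSVP produced; the rendition ladder and bitrate thresholds are made-up values.

```python
# Illustrative adaptive-bitrate selection: pick the best rendition the viewer's
# device and network can handle. The rendition ladder below is invented.
RENDITIONS = [
    # (height, approx. bitrate in kbit/s, codec), ordered best-first
    (2160, 16000, "av1"),
    (1080, 5000, "h264"),
    (720, 2500, "h264"),
    (480, 1000, "h264"),
]

def pick_rendition(screen_height: int, bandwidth_kbps: float) -> tuple[int, str]:
    """Return (height, codec) of the highest-quality rendition that fits."""
    for height, bitrate, codec in RENDITIONS:
        if height <= screen_height and bitrate <= bandwidth_kbps:
            return height, codec
    height, _, codec = RENDITIONS[-1]  # fall back to the lowest rung
    return height, codec

print(pick_rendition(screen_height=1080, bandwidth_kbps=3000))  # -> (720, 'h264')
```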
MTIA (Meta Training and Inference Accelerator)
MTIA (Meta Training and Inference Accelerator) is a custom chip family designed by Meta for running AI models on PyTorch.
MTIA, which is expected to be available in 2025, offers more memory and compute power than CPUs for workloads such as content understanding, Feeds, generative AI, and ads ranking. It can be used for both training and inference of AI models.
Meta will use both MTIA chips and GPUs to run its AI models, depending on the type and requirements of each model.
MTIA chips are specialized for recommendation workloads, while GPUs are more general-purpose and can handle a variety of AI workloads, such as generative AI or content understanding.
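To give a sense of what a recommendation-style workload looks like in PyTorch, here is a toy model with an embedding table for sparse ID features and an MLP for dense features. The layer sizes are arbitrary, and the device is left as a standard CPU/GPU choice since no public MTIA backend is assumed here.

```python
# Toy recommendation-style model: sparse ID features go through an embedding
# table, dense features through an MLP. Sizes are illustrative only.
import torch
import torch.nn as nn

class TinyRecModel(nn.Module):
    def __init__(self, num_ids: int = 10_000, emb_dim: int = 16, dense_dim: int = 8):
        super().__init__()
        self.emb = nn.EmbeddingBag(num_ids, emb_dim, mode="sum")  # sparse ID features
        self.mlp = nn.Sequential(                                  # dense features + interaction
            nn.Linear(emb_dim + dense_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, ids: torch.Tensor, dense: torch.Tensor) -> torch.Tensor:
        pooled = self.emb(ids)                                     # one pooled vector per example
        return torch.sigmoid(self.mlp(torch.cat([pooled, dense], dim=1)))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyRecModel().to(device)
ids = torch.randint(0, 10_000, (4, 5), device=device)             # 4 examples, 5 IDs each
dense = torch.randn(4, 8, device=device)
print(model(ids, dense).shape)                                     # torch.Size([4, 1])
```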
Next-Gen Data Center
Meta is designing a new data center that will support liquid-cooled AI hardware and a high-performance network connecting thousands of AI chips, so it can run both current and future generations of AI hardware.
The data center will be specifically AI-optimized for both training and inference.
Research SuperCluster (RSC) AI supercomputer
Research SuperCluster (RSC) is a supercomputer that was built to train the next generation of large AI models.
It consists of 16,000 NVIDIA A100 GPUs, processors specialized for the highly parallel computations that AI training requires.
RSC has been used for research projects such as training LLaMA, Meta’s large language model.
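Training at this scale is data-parallel across many GPUs. The sketch below shows the standard PyTorch DistributedDataParallel pattern with a placeholder model and loss, launched via torchrun; it is a minimal stand-in for what runs on thousands of GPUs, not Meta’s actual training code.

```python
# Minimal data-parallel training sketch (launch with:
#   torchrun --nproc_per_node=<num_gpus> this_file.py).
# Model, data, and hyperparameters are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                  # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(512, 512).cuda(rank), device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):                          # each rank sees its own data shard
        x = torch.randn(32, 512, device=rank)
        loss = model(x).pow(2).mean()                # placeholder loss
        opt.zero_grad()
        loss.backward()                              # gradients are all-reduced across GPUs
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```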
To power its AI applications, Meta uses PyTorch, the open source AI framework it created in 2016 and has since developed with the AI community. PyTorch is now governed under the Linux Foundation, and Meta will keep supporting and building on it.
To boost its developers’ efficiency, Meta created CodeCompose, a generative AI–based coding assistant comparable to GitHub Copilot.
Conclusion
Meta is building the next generation of its AI infrastructure, from custom chips to supercomputers, and AI is a crucial part of its long-term vision for the “metaverse”.
This strategic approach highlights Meta’s intention to offer “better personalization; safer, fairer products and richer experiences” to users and advertisers across Meta’s apps.