AI Research SuperCluster will help build its metaverse
Meta on Monday announced its AI Research SuperCluster (RSC). Its goal is to accelerate Artificial Intelligence (AI) research by building newer and better AI models. The company hopes to use it not only to fulfill its metaverse ambitions. It also hopes to help improve the identification of harmful content, it said in a blog post.
“We can help advance research to perform downstream tasks such as identifying harmful content on our platforms as well as research into embodied AI and multimodal AI to help improve user experiences on our family of apps. We believe this is the first time performance, reliability, security, and privacy have been tackled at such a scale,” said Meta.
“RSC will help Meta’s AI researchers build better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools and more. Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform — the metaverse, where AI-driven applications and products will play an important role,” said Meta.
A separate post described the compute hardware comprising the new cluster.
“RSC today comprises a total of 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs — with each A100 GPU being more powerful than the V100 used in our previous system,” said Meta Technical Program Manager Kevin Lee and Software Engineer Shubho Sengupta.
They also offered details about the networking and storage infrastructure.
“Each DGX communicates via an NVIDIA Quantum 1600 Gb/s InfiniBand two-level Clos fabric that has no oversubscription. RSC’s storage tier has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade,” they wrote.
18 months from idea to delivery – Nvidia
Nvidia separately confirmed its involvement in the project. In its blog post it offered more details about the new supercomputing installation. The company said it took 18 months from conception to delivery.
Nvidia and Meta previously collaborated, when Meta was still Facebook, on a first-generation AI high-performance cluster (HPC).
“Meta built the first generation of this infrastructure for AI research with 22,000 NVIDIA V100 Tensor Core GPUs that handles 35,000 AI training jobs a day,” said Nvidia.
Nvidia claims Meta’s early benchmarks show the RSC training natural language processing (NLP) models three times faster. Computer visi jobs run 20x faster. And they’re just getting started.
“In a second phase later this year, RSC will expand to 16,000 GPUs that Meta believes will deliver a whopping 5 exaflops of mixed precision AI performance. And Meta aims to expand RSC’s storage system to deliver up to an exabyte of data at 16 terabytes per second,” said Nvidia.