Facebook recently unveiled Zion, its “next generation” hardware platform for AI model training, along with two custom ASIC (application-specific integrated circuit) chips: Kings Canyon for AI inference and Mount Shasta for video transcoding. These new designs target AI training, AI inference, and video transcoding, workloads whose computing demands are growing sharply and which represent rapidly expanding service categories for Facebook, motivating the switch to purpose-built hardware.

From contemporary AI hardware to next-generation AI hardware

Facebook has long deployed AI models at scale across its services, making more than 100 trillion predictions and more than 6 billion language translations per day. The image recognition models Facebook uses to identify and classify content are trained on more than 3.5 billion images. These AI-based services help users communicate every day and provide them with unique, personalized experiences.

Facebook’s in-house AI platform, FBLearner, manages its current AI model pipeline. FBLearner includes tools for storing features, managing training workflows, and managing inference engines. Facebook also designs hardware based on the Open Compute Project (OCP); this hardware works alongside FBLearner so that Facebook developers can deploy models quickly and at scale.

With the most pressing scaling needs addressed, Facebook continues to invest in R&D. The ultimate goal is a reliable hardware design for the future that is vendor-transparent while continuing to reflect Facebook’s disaggregated design philosophy of maximizing execution efficiency. Facebook’s answer is its next generation of training and inference hardware platforms.

AI training with Zion

Zion is Facebook’s next-generation, high-capacity unified training platform, built to take on ever-higher computing loads. It is designed to handle many different neural network models efficiently, including CNNs, LSTMs, and sparse neural networks. The platform offers high memory capacity, high bandwidth, and flexible high-speed internal interconnects, providing substantial computing power for Facebook’s key workloads.

Zion is built around Facebook’s new vendor-transparent OCP Accelerator Module (OAM). With OAM, Facebook can buy hardware from vendors such as AMD, Habana, Graphcore, Intel, and Fidelity, as long as they build to the Open Compute Project (OCP) open standards. This not only helps those vendors innovate faster, it also lets Facebook scale freely across different hardware platforms and across servers within the same rack, connected simply through a rack-level network switch. Even as Facebook’s AI training loads grow larger and more complex, the Zion platform can scale with them.

Specifically, Facebook’s Zion system consists of three parts: an eight-socket CPU server, the OCP accelerator modules, and a platform board that hosts eight OCP accelerator modules.

▲ Left: the modular server board, each holding two CPUs. Right: four boards with eight CPUs form the eight-socket server.

▲ Left: an OCP accelerator module; eight of these modules are installed on one platform board. Right: the platform with eight accelerator chips.


▲ Zion platform internal module wiring diagram.

The Zion platform decouples the system’s memory, compute, and network components so that each can be scaled independently. Its eight-socket CPU platform provides a very large DDR memory pool for jobs that need high memory capacity, such as the embedding tables of sparse neural networks. Dense CNNs and sparse neural networks, by contrast, are more sensitive to bandwidth and computing power, so acceleration relies mainly on the OCP accelerator modules wired to each CPU.

The system includes two high-speed fabrics: one interconnecting all the CPUs and another interconnecting all the accelerators. Because the accelerators have high memory bandwidth but low memory capacity, Facebook’s engineers make efficient use of the total memory capacity by partitioning the model and its memory: frequently accessed data is stored in accelerator memory, while infrequently accessed data is kept in the CPUs’ DDR memory. Computation and communication across all CPUs and accelerators are balanced and carried out over the high-speed and lower-speed interconnects.
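To make the placement idea concrete, here is a minimal Python sketch of this kind of frequency-based partitioning. It is purely illustrative and not Facebook’s implementation; the row counts, the capacity figure, and the synthetic access log are assumptions made up for the example.

```python
import random
from collections import Counter

NUM_ROWS = 100_000           # total embedding rows in a hypothetical model
ACCEL_CAPACITY_ROWS = 1_000  # rows that fit in the accelerator's fast memory

# Synthetic, skewed access log: a few rows are looked up far more often.
access_log = [int(random.paretovariate(1.2)) % NUM_ROWS for _ in range(50_000)]

def partition_rows(access_log, accel_capacity):
    """Rank rows by observed lookup frequency and split them across memory tiers."""
    freq = Counter(access_log)
    ranked = [row for row, _ in freq.most_common()]
    hot = set(ranked[:accel_capacity])   # frequently accessed -> accelerator memory
    cold = set(ranked[accel_capacity:])  # rarely accessed -> CPU DDR pool
    return hot, cold

hot_rows, cold_rows = partition_rows(access_log, ACCEL_CAPACITY_ROWS)
print(f"{len(hot_rows)} rows pinned to accelerator memory, "
      f"{len(cold_rows)} rows kept in DDR")
```

In a real system the “hot” set would be refreshed as access patterns drift, but the core trade-off is the same: spend the scarce high-bandwidth memory on the data that is touched most often.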

AI inference with Kings Canyon

Alongside the growing AI training load, the AI inference load is also increasing rapidly. For its next-generation designs, Facebook is collaborating with companies such as Esperanto, Habana, Intel, Marvell, and Qualcomm to develop custom ASIC chips that are easy to scale and deploy. The Kings Canyon chip supports both INT8 (8-bit integer) computation, which emphasizes inference speed, and FP16 (half-precision floating point) computation, which offers higher accuracy.
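To illustrate the INT8-versus-FP16 trade-off in software terms, below is a small PyTorch sketch showing both reduced-precision paths on an ordinary model. It is not Kings Canyon’s toolchain; the toy model, the layer sizes, and the use of PyTorch’s post-training dynamic quantization are assumptions chosen only to demonstrate the precision trade-off.

```python
import copy
import torch
import torch.nn as nn

# Stand-in model for an inference workload (not one of Facebook's models).
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).eval()

# INT8 path via post-training dynamic quantization: weights are stored as
# 8-bit integers and activations are quantized on the fly, trading a little
# accuracy for higher throughput and lower memory bandwidth.
model_int8 = torch.quantization.quantize_dynamic(
    copy.deepcopy(model), {nn.Linear}, dtype=torch.qint8
)

# FP16 path: half-precision weights keep floating-point range and more
# accuracy; in practice this path would run on accelerator hardware.
model_fp16 = copy.deepcopy(model).half()

x = torch.randn(1, 512)
with torch.no_grad():
    y_int8 = model_int8(x)  # INT8 inference runs on the CPU's quantized kernels
```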

Kings Canyon chips are mounted on M.2 form-factor modules; six of these modules are installed on each Glacier Point v2 board; finally, two Glacier Point v2 boards and two single-socket servers together make up a complete Yosemite server.

Facebook’s video transcoding ASIC chip Mount Shasta also uses this arrangement.

Summary

Judging from Facebook’s illustrations and descriptions, it appears that only the AI training platform Zion is already in use; the AI inference chip Kings Canyon, the video transcoding chip Mount Shasta, and their related hardware have not yet been shown in physical form. But Facebook is confident in these designs. Going forward, it will disclose all of the designs and related specifications through OCP to enable broader collaboration, and will work with its current partners to improve the software and hardware design of the entire system.