Battle of Interconnection Protocols: Data Centers, HPC, AI, and More
The rapid advancement of generative AI has exponentially increased the demand for compute and storage, widening the long-standing gap between processing speed and memory bandwidth. The AIGC era calls for greater capacity in data transfer between the two: wider bandwidth and faster data paths.
PCI Express (PCIe) is the most common high-performance I/O protocol. However, its tree topology and finite device-numbering space keep it from scaling into a large fabric. The problem worsened with the widespread adoption of NVMe, which consumes many PCIe lanes and strains an already limited supply. PCIe switches can relieve the lane shortage, but they do not solve the cap on device IDs.
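To make that numbering cap concrete: each PCIe function is addressed by a 16-bit bus/device/function (BDF) tuple, which bounds a single segment at 256 × 32 × 8 identifiers. The small illustrative C snippet below (not taken from any real driver) shows the encoding and the resulting hard ceiling.

    #include <stdint.h>
    #include <stdio.h>

    /* A PCIe function is identified by a 16-bit BDF tuple:
     * 8 bits of bus, 5 bits of device, 3 bits of function.
     * One PCI segment therefore addresses at most
     * 256 * 32 * 8 = 65,536 functions, switches included. */
    static uint16_t bdf_encode(uint8_t bus, uint8_t dev, uint8_t fn)
    {
        return (uint16_t)(bus << 8 | (dev & 0x1F) << 3 | (fn & 0x07));
    }

    int main(void)
    {
        uint16_t bdf = bdf_encode(0x3A, 0x00, 0x1); /* e.g. 3a:00.1 */
        printf("bus %02x dev %02x fn %x -> 0x%04x (max IDs per segment: %d)\n",
               bdf >> 8, (bdf >> 3) & 0x1F, bdf & 0x7, bdf, 256 * 32 * 8);
        return 0;
    }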
Furthermore, PCIe's design has two fundamental flaws: isolated memory address spaces and no support for cache-coherent transactions. PCIe was designed with its own address space separate from the CPU's, so base address registers are needed to translate between the two. This does not prevent the CPU and a PCIe device from exchanging data, but because the PCIe transaction layer has no cache-coherency semantics, a PCIe device cannot cache data from the CPU's address domain, adding latency to every exchange.
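As a concrete illustration of that separate address space, the following Linux-specific sketch maps a device's BAR0 into a process through sysfs (the device path is hypothetical, and the program needs root). The mapping is plain uncached MMIO: CPU and device can exchange data through it, but neither side can cache the other's memory, which is exactly the gap CXL's coherency protocol closes.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical device at 0000:3a:00.0; resource0 is its BAR0.
         * The BAR lives in PCI memory space and is translated into the
         * CPU's address space here -- but as an uncached MMIO window,
         * with no cache coherency on either side. */
        const char *bar = "/sys/bus/pci/devices/0000:3a:00.0/resource0";
        int fd = open(bar, O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        printf("register 0 reads 0x%08x\n", regs[0]); /* uncached MMIO read */

        munmap((void *)regs, 4096);
        close(fd);
        return 0;
    }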
To address these issues, Intel and industry partners introduced the Compute Express Link (CXL) protocol in 2019 to accelerate interconnects between CPUs, GPUs, FPGAs, and other heterogeneous devices. CXL is built on PCIe: devices attach through the PCIe bus and gain coherent connectivity to the CPU. It can be viewed as an upgraded PCIe, and it reuses the existing PCIe ports found on most general-purpose CPUs, GPUs, and FPGAs. By decoupling compute from memory to form a memory pool, CXL allocates memory resources on demand, raising data-center efficiency. As a young technology, CXL has been updated almost annually.
Building on CXL 1.0, CXL 2.0 introduces a crucial feature: memory pooling. NVMe drives have eased the storage bottleneck, but their throughput and latency remain far from DRAM's, so they cannot substitute for memory. With the growing demand for high-speed I/O in AI/ML and other fields, pooling becomes the natural choice. CXL 2.0's switched architecture breaks the restriction that physical memory belongs to a single server: a pooled memory device can serve multiple hosts, with capacity assigned across system devices on demand. CXL has since been upgraded to version 3.0, which doubles the bandwidth and supports more complex topologies, such as multi-level switching that interconnects hundreds of servers and lets them share the same memory region.
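One way this pooling surfaces in practice: on Linux, CXL-attached memory typically appears as a CPU-less NUMA node, so software can consume pooled capacity through ordinary NUMA APIs. Below is a minimal sketch using libnuma; treating node 2 as the CXL expander is an assumption that would need checking with numactl -H on the actual machine.

    #include <numa.h>    /* link with -lnuma */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        /* Assumption: the CXL memory expander shows up as NUMA node 2
         * (a CPU-less node); verify with `numactl -H`. */
        int cxl_node = 2;
        size_t len = 64UL << 20; /* 64 MiB */

        char *buf = numa_alloc_onnode(len, cxl_node);
        if (!buf) { fprintf(stderr, "allocation failed\n"); return 1; }

        memset(buf, 0, len); /* fault the pages in on the CXL node */
        printf("64 MiB backed by NUMA node %d (CXL expander)\n", cxl_node);

        numa_free(buf, len);
        return 0;
    }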
Gen-Z, for its part, was another major contender in data centers, HPC, AI, and similar scenarios alongside CXL. It emerged chiefly to cover deficiencies CXL initially left open: rack-level hierarchy beyond the server node, long-distance transmission, and large-scale topology interconnects. Notably, in 2022 the Gen-Z Consortium transferred its specifications and assets to the CXL Consortium, consolidating the two efforts under one protocol.
NVIDIA has also introduced its independently developed NVLink technology, which provides high bandwidth for connecting NVIDIA GPUs. NVLink supports memory sharing between GPUs and lowers inter-GPU communication latency, optimizing large-scale parallel computing. It can carry both CPU-GPU and GPU-GPU links. In addition, NVIDIA built the NVLink Switch, which connects up to 16 GPUs in a fully switched fabric, albeit at a high price.
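In CUDA, an NVLink connection between two GPUs is exposed as peer-to-peer access: once enabled, copies travel directly over the link instead of staging through host memory. A minimal sketch using the CUDA runtime's C API follows (compile with nvcc; it assumes GPUs 0 and 1 are peer-connected).

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int can_access = 0;
        /* Ask whether device 0 can address device 1's memory directly
         * (true when they share an NVLink or PCIe P2P path). */
        cudaDeviceCanAccessPeer(&can_access, 0, 1);
        if (!can_access) {
            fprintf(stderr, "no P2P path between GPU 0 and GPU 1\n");
            return 1;
        }

        size_t len = 256UL << 20; /* 256 MiB */
        void *src, *dst;

        cudaSetDevice(0);
        cudaMalloc(&src, len);
        cudaSetDevice(1);
        cudaMalloc(&dst, len);

        cudaDeviceEnablePeerAccess(0, 0); /* GPU1 may access GPU0's memory */
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0); /* GPU0 may access GPU1's memory */

        /* Direct GPU-to-GPU copy; over NVLink this bypasses host DRAM. */
        cudaMemcpyPeer(dst, 1, src, 0, len);
        cudaDeviceSynchronize();
        printf("copied %zu bytes GPU0 -> GPU1 peer-to-peer\n", len);

        cudaFree(src);
        cudaSetDevice(1);
        cudaFree(dst);
        return 0;
    }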
CXL excels at computation-centric data movement in data centers, artificial intelligence, and scientific computing, offering greater flexibility and performance across heterogeneous devices, while NVLink is used mainly to connect NVIDIA GPUs and shines in graphics processing and deep learning.
Earlier, to bridge the memory gap between CPUs and devices, and between devices themselves, IBM pioneered the Coherent Accelerator Processor Interface (CAPI). Because of IBM's small share of data-center hardware and waning influence, CAPI saw little adoption and later evolved into OpenCAPI. Arm and other vendors then backed the Cache Coherent Interconnect for Accelerators (CCIX), another open coherent I/O platform. In rough order of release: CAPI → Gen-Z → CCIX → NVLink → CXL.
Although the road to eliminating the bottleneck between processors and memory has no end, we can foresee that as CXL matures and memory resources are fully pooled, the form of servers will change fundamentally in the near future, with memory and processors disaggregated into independent enclosures.