Shenzhen Gooxi Digital Intelligence Technology Co., Ltd.


An Overview of 8-GPU Server Interconnection Technologies

Release time: 2024-11-06

Today, one of the most popular AI server models on the market is the 8-GPU server. In practical use, these machines rely on multi-GPU parallel computing to process large workloads quickly, accelerating both deep learning model training and inference. Their graphics processing capability also supports real-time rendering for gaming applications. Thanks to these strengths in AI training, inference, machine learning, and cloud gaming, 8-GPU servers have become a standout choice.

Choosing Between Direct and Expanded Connection Models for 8-GPU Servers

One of the first decisions when selecting an 8-GPU server is whether to choose a direct-connection model or an expanded-connection model. 8-GPU servers generally come with motherboards that route a large number of PCIe lanes to support simultaneous high-speed data transfer. However, the number of PCIe lanes a CPU provides is finite, so in practice some GPUs may not get a full-width link to the CPU unless the lanes are expanded.

Common Types of 8-GPU Servers

I. CPU-GPU Interconnections in Standard GPU Servers

Direct Connection Model

Gooxi's AMD Milan platform 4U 8-GPU AI server, for example, uses direct connections. It is equipped with two third-generation AMD EPYC processors, each offering 128 PCIe lanes. Because the three xGMI links between the CPUs consume part of each processor's lanes, the pair exposes a total of 160 PCIe lanes to devices. Eight double-width GPUs at x16 each occupy 128 of these, leaving 32 lanes for other components such as network or RAID cards.
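The lane arithmetic above can be checked with a short sketch. The per-CPU lane count, the number of xGMI links, and the GPU count come from the text; the figure of 16 lanes per xGMI link is an assumption chosen because it reproduces the stated total of 160, and the variable names are illustrative only.

```python
# Lane budget for a dual-socket direct-connection server (illustrative sketch).
LANES_PER_CPU = 128        # from the article
XGMI_LINKS = 3             # from the article
LANES_PER_XGMI_LINK = 16   # assumption: each xGMI link takes one x16 lane group
GPU_COUNT = 8
LANES_PER_GPU = 16         # one x16 slot per double-width GPU


def device_lanes(sockets: int = 2) -> int:
    """PCIe lanes left for devices after the inter-socket links are set aside."""
    per_cpu = LANES_PER_CPU - XGMI_LINKS * LANES_PER_XGMI_LINK
    return sockets * per_cpu


total = device_lanes()
gpu_lanes = GPU_COUNT * LANES_PER_GPU
spare = total - gpu_lanes
print(f"device lanes: {total}, GPU lanes: {gpu_lanes}, spare: {spare}")
```

Under these assumptions the spare lanes equal two x16 slots, which matches the 32 lanes the text reserves for network or RAID cards.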

Expansion Model

Gooxi's Intel Whitley platform 4U 10-GPU AI server is an example of the expanded-connection model. It uses two third-generation Intel® Xeon® processors, each providing 64 PCIe lanes, for a total of 128. Eight double-width GPUs at x16 already require all 128 lanes (16 × 8 = 128), and 10 GPUs would need 160 (16 × 10 = 160), more than the CPUs can supply. The system therefore employs two PCIe switch chips to fan out the CPU lanes, improving the server's PCIe scalability. This also makes PCIe slots available for additional network or RAID cards, meeting varied user needs across complex applications.
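A minimal sketch of why the switch chips are needed follows. The lane counts per CPU and per GPU come from the text. The switch layout in the second function (one x16 uplink per switch, five GPUs per switch) is a hypothetical example for illustration, not Gooxi's documented design.

```python
def lane_shortfall(cpu_count: int, lanes_per_cpu: int,
                   gpu_count: int, lanes_per_gpu: int = 16) -> int:
    """Lanes the GPUs need beyond what the CPUs can supply directly."""
    supply = cpu_count * lanes_per_cpu
    demand = gpu_count * lanes_per_gpu
    return max(0, demand - supply)


def switched_budget(cpu_lanes: int, switches: int, uplink_width: int,
                    gpus_per_switch: int, lanes_per_gpu: int = 16):
    """Spare CPU lanes and downstream:upstream ratio for a switched layout.

    Hypothetical layout: each switch takes one uplink of `uplink_width`
    lanes from a CPU and fans out to `gpus_per_switch` GPUs.
    """
    spare_cpu_lanes = cpu_lanes - switches * uplink_width
    oversubscription = gpus_per_switch * lanes_per_gpu / uplink_width
    return spare_cpu_lanes, oversubscription


print(lane_shortfall(2, 64, 8))    # 8 GPUs fit exactly, with nothing to spare
print(lane_shortfall(2, 64, 10))   # 10 GPUs exceed the CPU lanes
print(switched_budget(128, switches=2, uplink_width=16, gpus_per_switch=5))
```

The trade-off the sketch exposes is that a switch frees CPU lanes for other cards, but GPUs behind the same switch share its uplink whenever they talk to the CPU.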

II. Three Topology Types with Switch Connections

When the GPUs sit behind two PCIe switches, the switches can be attached to the CPUs in different ways: each switch to its own CPU, both switches to the same CPU, or one switch to a CPU with the second switch chained behind the first. These options are commonly referred to as the balance, common, and cascade topologies, respectively.
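To make the difference between the three topologies concrete, the sketch below models each one as a small graph and counts the links a transfer crosses between two GPUs on different switches. The wiring used here (balance: one switch per CPU; common: both switches on CPU0; cascade: the second switch behind the first) is the usual reading of these names and is stated as an assumption, as are the node names and the four-GPUs-per-switch split.

```python
from collections import deque

# Assumed wiring for each topology; node names are illustrative.
TOPOLOGIES = {
    "balance": [("CPU0", "CPU1"), ("CPU0", "SW0"), ("CPU1", "SW1")],
    "common":  [("CPU0", "CPU1"), ("CPU0", "SW0"), ("CPU0", "SW1")],
    "cascade": [("CPU0", "CPU1"), ("CPU0", "SW0"), ("SW0", "SW1")],
}


def build_graph(topology: str) -> dict:
    """Undirected adjacency map with GPU0-3 on SW0 and GPU4-7 on SW1."""
    links = list(TOPOLOGIES[topology])
    for i in range(8):
        links.append((f"GPU{i}", "SW0" if i < 4 else "SW1"))
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops(graph: dict, src: str, dst: str) -> int:
    """Number of links on the shortest path, found by breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{dst} is not reachable from {src}")


for name in TOPOLOGIES:
    g = build_graph(name)
    print(name, hops(g, "GPU0", "GPU1"), hops(g, "GPU0", "GPU4"))
```

Hop count is only a rough proxy for latency, since it ignores link widths and contention, but it shows why cascade keeps cross-switch GPU traffic off the CPUs while balance spreads CPU-to-GPU bandwidth across both sockets.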

NVLink GPU-to-GPU Interconnections

NVLink enables direct GPU-to-GPU interconnections, providing significantly higher bandwidth and lower latency than PCIe or CPU-to-CPU UPI links, which makes it well suited to efficient inter-GPU communication.

HGX Super GPU Module Internal Topology

NVIDIA HGX is designed for large-scale computing. It combines an array of eight SXM-socketed GPUs with an interconnect backplane and NVLink switches, giving the GPUs high-bandwidth links to one another, and it serves as the GPU module in NVIDIA DGX and HGX systems.

Ascend 8-GPU Internal Interconnections

The Ascend model, which hosts eight NPUs rather than GPUs, differs from general-purpose servers in its four-CPU design. In this configuration, each CPU provides 40 PCIe 4.0 lanes (PCIe 4.0 x40) and connects to two NPUs, and the HCCS link provides high-efficiency direct connections across nodes.

III. Comparing Direct Connection and Switch-Connected Models

In the 8-GPU direct-connection model, traffic to a GPU attached to the other socket must pass through both CPUs (CPU0→CPU1→GPU). This introduces some latency, but the design is a cost-effective solution for inference and cloud computing. The expanded-connection model, on the other hand, uses switch chips to improve PCIe scalability and signal transmission, making it well suited to scenarios where low-latency multi-GPU communication is essential, such as large-model training.

Among the numerous 8-GPU server options on the market, Gooxi's AMD Milan platform 4U 8-GPU AI server strikes a balance between performance and cost. Its GPU communication bandwidth can reach 17.22 GB/s, which helps large-model training speeds. Built on third-generation AMD CPUs, it stands out for its cost-effectiveness. (Feel free to contact us for pricing!)
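For context on the 17.22 GB/s figure, the sketch below compares it with the theoretical one-direction bandwidth of a PCIe 4.0 x16 link. The PCIe 4.0 signaling rate and encoding are standard values; treating the quoted figure as a one-direction measurement over a single x16 link is an assumption, since the measurement method is not stated here.

```python
# Theoretical PCIe 4.0 bandwidth per direction (illustrative sketch).
GT_PER_SECOND = 16.0            # PCIe 4.0 raw rate per lane, in GT/s
ENCODING_EFFICIENCY = 128 / 130 # 128b/130b line encoding
LANES = 16                      # one x16 GPU slot
MEASURED_GBPS = 17.22           # figure quoted in the article

per_lane_gbps = GT_PER_SECOND * ENCODING_EFFICIENCY / 8   # GB/s per lane
x16_gbps = per_lane_gbps * LANES
fraction = MEASURED_GBPS / x16_gbps                       # assumes one x16 link

print(f"theoretical x16: {x16_gbps:.2f} GB/s")
print(f"quoted figure as share of theoretical: {fraction:.1%}")
```

Real transfers never reach the theoretical rate because of protocol overhead, and paths that cross the CPU-to-CPU link lose more, so the comparison is only a rough upper bound.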
