In the past month, Softbank released test results from an experiment demonstrating high-performance integration of GPUs and RAN. I appreciate their transparency in making the data public. Here are my initial observations:
- The performance of their cluster is impressive. They crowded twenty 4T4R radios into a test area of roughly 100m x 400m, and reached 1.48 Gbps peak total throughput in 100 MHz of spectrum. That’s almost 15 bps/Hz.
- Softbank reports that during the test, the centralized server (2 Grace Hopper GH200 ‘superchips’) consumed 500W of DC power on average. That’s remarkable and surprisingly low. Good job by NVIDIA, shutting down cores and other resources not used for this RAN workload.
- The ability to handle peak throughput was good. The test involved streaming HD video to 100 smartphones simultaneously. When the streaming sessions started, all smartphones were buffering at the same time, resulting in high peak throughput demand. Afterward, the buffers were full, and the throughput settled down to a much lower level. Softbank didn’t report on the DC power consumption during the peak.
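The headline numbers in the first observation are easy to check. Here is a minimal Python sketch of the arithmetic, using only the figures quoted above. The per-radio and per-area values are my own derived illustrations (Softbank did not report them), and the area is the "roughly 100m x 400m" estimate, so treat those two as approximate.

```python
# Figures quoted in the observations above.
PEAK_THROUGHPUT_BPS = 1.48e9   # 1.48 Gbps peak total throughput
BANDWIDTH_HZ = 100e6           # 100 MHz of spectrum
NUM_RADIOS = 20                # 4T4R radios in the cluster
AREA_M2 = 100 * 400            # approximate test area (assumed exactly 100 m x 400 m)


def spectral_efficiency(throughput_bps, bandwidth_hz):
    """Aggregate spectral efficiency in bps/Hz."""
    return throughput_bps / bandwidth_hz


cluster_se = spectral_efficiency(PEAK_THROUGHPUT_BPS, BANDWIDTH_HZ)

# Derived illustrations, not reported by Softbank.
per_radio_mbps = PEAK_THROUGHPUT_BPS / NUM_RADIOS / 1e6
area_density_gbps_per_km2 = (PEAK_THROUGHPUT_BPS / 1e9) / (AREA_M2 / 1e6)

print(f"Cluster spectral efficiency: {cluster_se:.1f} bps/Hz")
print(f"Average peak share per radio: {per_radio_mbps:.0f} Mbps")
print(f"Approximate area capacity: {area_density_gbps_per_km2:.0f} Gbps/km^2")
```

The first figure is the "almost 15 bps/Hz" quoted above. The other two only show how dense this cluster is.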
Okay, so the test was a great way to show high performance in a RAN cluster. The next step is to determine what it means. I have several thoughts here:
First of all, Softbank set this test up as a centralized RAN configuration with one central DU server and 20 remote RUs. This allowed Softbank to utilize Distributed MIMO processing, which has been known to increase spectral efficiency by 2-3x in previous products and trials. The D-MIMO aspect of performance is the most important reason for the high spectral efficiency, not the use of GPUs. I have seen other tests with spectral efficiency in the range of 12-20 bps/Hz using ASICs instead of GPUs.
If we remove the impact of D-MIMO from the spectral efficiency equation, the performance here is similar to 5G massive MIMO networks, with peak spectral efficiency around 5-8 bps/Hz. So my first conclusion is that GPUs can successfully perform at the same level of capacity as ASIC-based or CPU-based solutions.
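As a rough check on that range, the sketch below divides the measured spectral efficiency by the 2-3x D-MIMO gain mentioned earlier. Treating D-MIMO as a single multiplicative factor is a simplifying assumption on my part, and the function name is just for illustration.

```python
# Measured cluster spectral efficiency from the test: 1.48 Gbps in 100 MHz.
MEASURED_SE_BPS_PER_HZ = 1.48e9 / 100e6

# D-MIMO gain range quoted from previous products and trials (2-3x).
DMIMO_GAIN_LOW = 2.0
DMIMO_GAIN_HIGH = 3.0


def se_without_dmimo(measured_se, dmimo_gain):
    """Back out D-MIMO, assuming it acts as a simple multiplier (a simplification)."""
    return measured_se / dmimo_gain


# A smaller assumed gain leaves more efficiency attributed to the baseline.
baseline_high = se_without_dmimo(MEASURED_SE_BPS_PER_HZ, DMIMO_GAIN_LOW)
baseline_low = se_without_dmimo(MEASURED_SE_BPS_PER_HZ, DMIMO_GAIN_HIGH)

print(f"Baseline without D-MIMO: {baseline_low:.1f} to {baseline_high:.1f} bps/Hz")
```

That works out to roughly 4.9 to 7.4 bps/Hz, which is where the comparison with massive MIMO comes from.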
Second, I would point out that the power consumption reported by Softbank is the average power consumption, not the peak power required during a period of peak throughput. I suspect that there is some sleight-of-hand here, as the two GH200 ‘superchips’ can consume up to 2kW of power. That’s still not too bad for a dense cluster of 20 radios, at only 100W per RU for the DU processing.
Supporting higher peak DC power draw means that operators would need bigger AC/DC converters, and possibly big air conditioners or even liquid cooling.
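To put numbers on the gap between average and peak, here is a short Python sketch of the per-RU power budget. The 500W average is Softbank's reported figure. The 2kW peak is my upper-bound estimate for two GH200s, not a measurement from this test.

```python
AVG_SERVER_POWER_W = 500     # reported average DC power for the central server
PEAK_SERVER_POWER_W = 2000   # assumed upper bound for two GH200s (not measured)
NUM_RUS = 20                 # remote radio units served by the one DU server

avg_power_per_ru_w = AVG_SERVER_POWER_W / NUM_RUS
peak_power_per_ru_w = PEAK_SERVER_POWER_W / NUM_RUS

# How much larger the power and cooling plant must be if it is sized
# for the peak rather than the average.
peak_to_average_ratio = PEAK_SERVER_POWER_W / AVG_SERVER_POWER_W

print(f"Average DU power per RU: {avg_power_per_ru_w:.0f} W")
print(f"Peak DU power per RU:    {peak_power_per_ru_w:.0f} W")
print(f"Peak-to-average ratio:   {peak_to_average_ratio:.0f}x")
```

Under these assumptions, the power supply and cooling have to be sized for about four times the average load that Softbank highlights.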
Third, Softbank presents the low average DC power as evidence that this cluster is “cost effective”. That may be true on the OPEX side, with the surprisingly high energy efficiency at low throughput. But I have serious doubts on the CAPEX side. If I compare the cost of two GH200s to twenty ASIC-based DUs, the GPUs still look expensive at more than triple the cost.
At the end of the day, I conclude that this test is an excellent way to demonstrate high performance for a high density cluster. For a football stadium or airport, the centralized GH200 and D-MIMO could be a good choice. In that kind of crowded environment, running fiber for high numbers of radios would be possible and cost-effective, and the ultra-high density of D-MIMO drives the centralization anyway.
On the other hand, I don’t think that the GH200 approach passes the ‘low cost’ test for a widespread mobile network, especially in places where fiber is more difficult and expensive to deploy. There are also business-model and operational challenges with telcos offering AI services. I’ve published some detailed thoughts on this topic here.
For the broad market, I believe that we will see integrations of GPU cores with ASICs. No operator wants to pay for two GH200s (two Hopper GPUs, 144 Arm CPU cores, and all of the memory and other support that comes along with them). Licensing the GPU technology and dropping one or two cores into an ASIC can be far less expensive. We can get many of the benefits of AI inference, efficient RAN processing, and high capacity with a more surgical application of GPUs where they’re needed.