AI Chips in 2024: NVIDIA, MLPerf benchmarks, Huang’s law, and competition
What we learned about AI chips in 2024 by keeping track of NVIDIA’s latest announcements, talking to industry experts, and scanning news and analyses
Exploring AI chips has been a pastime, as well as a popular theme in Orchestrate all the Things articles. In 2023, we felt like we fell somewhat behind on that… but then again, does that matter? Doesn’t NVIDIA still reign supreme – a $1 trillion valuation, more than 80% market share, H100s selling like hot cakes and breaking all records? Well, yes, but… not so fast.
After having the chance to pick the brain of Evan Sparks, Chief Product Officer of AI at HPE, at the AI Chips episode of our “What’s New in AI” series with O’Reilly, sit in on a couple of NVIDIA’s press conferences, and scan a ton of news and analyses so you don’t have to, we have a more nuanced view to share on AI chips in 2024. Here’s what’s going on and how it’s likely to affect AI going forward.
NVIDIA breaks MLPerf benchmark records
Let’s start with the news. Yesterday, NVIDIA announced its results from the latest round of MLPerf submissions. MLPerf is the de facto standard in AI workload benchmarks, and as new AI workloads emerge, MLPerf keeps adding to its suite. With generative AI taking off over the last year, MLPerf has added Gen AI workloads to its arsenal.
Having previously added a benchmark that uses a portion of the full GPT-3 data set to train a Large Language Model (LLM), the latest addition to MLPerf is a training benchmark based on the Stable Diffusion text-to-image model. NVIDIA aced both of these, as well as a few more. Intel and Google also boast big AI training gains.
NVIDIA Eos — an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking — completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes.
That’s a nearly 3x gain from 10.9 minutes, the record NVIDIA set when the test was introduced less than six months ago. By extrapolation, Eos could now train that LLM in just eight days, 73x faster than a prior state-of-the-art system using 512 A100 GPUs. As for the Stable Diffusion benchmark, it took 1,024 NVIDIA Hopper architecture GPUs 2.5 minutes to complete it.
But that’s not all. As NVIDIA notes, the company was the only one to run all MLPerf tests, demonstrating the fastest performance and the greatest scaling in each of the nine benchmarks. In MLPerf HPC, a separate benchmark for AI-assisted simulations on supercomputers, H100 GPUs delivered up to twice the performance of NVIDIA A100 Tensor Core GPUs in the last HPC round.
Options for training AI models
Now, let’s start unpacking these results. The first thing to note is the various dimensions of scale. When Eos was first announced, it featured 4,608 H100s. Today, it features 10,752. But NVIDIA is not the only one to leverage Eos-level scale and performance.
As the company notes, a full-stack platform of innovations in accelerators, systems and software was used by both Eos and Microsoft Azure in the latest round. Azure did not submit in all categories, but in the GPT-3 benchmark where both submitted, results were practically identical. And Azure’s instance is commercially available too.
What’s more, the scaling efficiency for Eos was north of 80%. Ideally, doubling the number of GPUs would double performance. Getting 80% of that at this scale is quite a feat. NVIDIA attributed this to its stack – the combination of hardware, software, and networking.
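To make the scaling-efficiency claim concrete, here is a minimal sketch. The helper function and the input figures are illustrative assumptions of ours, not NVIDIA’s published methodology: efficiency is simply the achieved speedup divided by the ideal speedup you would get if adding GPUs scaled linearly.

```python
def scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    """Achieved speedup divided by the ideal (linear) speedup."""
    ideal_speedup = scaled_gpus / base_gpus
    actual_speedup = base_minutes / scaled_minutes
    return actual_speedup / ideal_speedup

# Illustrative configuration only: a 10.9-minute run on 3,584 GPUs
# vs. a 3.9-minute run on 10,752 GPUs. With perfect (100%) scaling,
# 3x the GPUs would cut the time to exactly one third.
eff = scaling_efficiency(3584, 10.9, 10752, 3.9)
print(f"{eff:.0%}")  # 93%
```

With these inputs the result lands comfortably “north of 80%”; real-world efficiency depends heavily on interconnect and software, which is exactly the point NVIDIA makes about its stack.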
One takeaway here is that “Huang’s law”, the moniker used to describe the performance and scale-up that NVIDIA GPUs achieve, still seems to be in effect. But perhaps the real question is who should care, and why.
That kind of scale is not something anyone but the hyperscalers could normally handle, even if they wanted to. NVIDIA H100 GPUs are in short supply despite costing around $30K each. As the State of AI in 2023 report notes, organizations are in a stockpiling race. But there is good news as well.
First off, NVIDIA chips have a remarkably long lifetime: about five years from launch to peak popularity. The NVIDIA V100, released in 2017, is still the most commonly used chip in AI research. This suggests the A100, released in 2020, could peak in 2026, when the V100 is likely to hit its trough.
Plus, it’s questionable whether training a new Gen AI model from scratch is something most organizations will need to do. The majority of organizations will probably either only use pre-trained Gen AI models packaged under the hood to power applications, or choose to use something like ChatGPT over an API. Both of these options require exactly zero GPUs.
The flip side, of course, is that both of these options also provide zero autonomy and safety. But even for organizations that choose to develop in-house Gen AI, training something from scratch is probably not what makes the most sense for most. Taking an off-the-shelf open source Gen AI model and customizing it via fine-tuning or RAG (Retrieval Augmented Generation) is way faster and easier, and only requires a fraction of the compute.
How NVIDIA competitors may catch up
Either way, the long view here is that scaling up the way NVIDIA does makes more powerful AI models possible in less time. We can expect results to trickle down, whether that means more powerful GPT-like models, open source models, or derivative applications.
But there’s another set of questions to consider here. Is NVIDIA’s dominance a good thing for the industry? Can, and should, it last? What is the competition up to? And why should the rest of the world care?
As I and others have been noting, NVIDIA’s dominance is based not just on its hardware, but on the entirety of its stack. Furthermore, as analyst Dylan Patel has noted, NVIDIA also leverages a set of business tactics around supply chain management, sales strategies and bundling which few others are able to replicate. But that does not mean that the competition is idling either.
As far as supercomputers and scaling up go, NVIDIA’s Eos is definitely not the only game in town. As Sparks mentioned, Intel’s Aurora, featuring 60,000 of its own Ponte Vecchio GPUs, is about to come online. Plus, there are many other supercomputers in the world featuring a range of chips and architectures from different makers, and they are all capable of doing high-performance floating point arithmetic.
NVIDIA has an edge due to the fact that it was the first to focus on AI workloads, but each of its aspiring competitors has a roadmap to catch up. Until recently we used to think that CUDA, NVIDIA’s software layer, was the company’s biggest moat.
As Patel notes, many machine learning frameworks have come and gone, but most have relied heavily on leveraging NVIDIA’s CUDA and performed best on NVIDIA GPUs. However, with the arrival of PyTorch 2.0 and OpenAI’s Triton, NVIDIA’s dominant position in this field, mainly due to its software moat, is being disrupted. These frameworks make it easier for NVIDIA’s competition to build their own stack.
Of course, as Patel adds in a different note outlining NVIDIA’s own plan to stay ahead of the pack, NVIDIA isn’t sitting on its hands. While NVIDIA is extremely successful, it is also one of the most paranoid firms in the industry, with CEO Jensen Huang embodying the spirit of Andy Grove. It’s no accident that NVIDIA highlighted that it currently employs twice as many software engineers as hardware engineers.
“Success breeds complacency. Complacency breeds failure. Only the paranoid survive.”
Competition, scale, performance, and TCO
Patel goes as far as to question some of NVIDIA’s tactics, which is something we don’t have an opinion on. What we can say is that even though NVIDIA’s relentlessness does not let it grow complacent, having any single vendor hold over 80% market share for very long is not very healthy. It will probably be a good thing for everyone to see the competition catch up.
At this point, hyperscalers, incumbent competitors such as AMD and Intel, as well as a flock of upstarts are all working on their own custom AI chips for 2024 and beyond. It’s estimated that NVIDIA has a 1000% margin on H100s, which are also in short supply. No wonder that everyone wants to have a piece of the action and/or grow their autonomy. For consumers, more competition will mean more choice and autonomy, as well as better performance and prices.
For the time being however, NVIDIA is still the undisputed leader – albeit with a footnote or two. When asked to directly compare NVIDIA’s MLPerf results with Intel’s Gaudi, for example, Dave Salvator, director of product marketing in NVIDIA’s Accelerated Computing Group, pointed out two things. First, the Gaudi submissions were nowhere near the 10K-GPU scale. Second, NVIDIA’s results were about 2x better on a normalized basis. Others, such as analyst Karl Freund, however, consider Gaudi2 a credible alternative.
Footnote #1: MLPerf is a widely acclaimed benchmark in the industry. Like all benchmarks, however, it’s not perfect. As Sparks noted, one crucial element missing from MLPerf is pricing. While it’s understandable that incorporating pricing in any benchmark is tricky for a number of reasons, it also means that results need to be put in context. For example, as per Patrick Kennedy’s analysis, Intel’s Gaudi2 has 4x better performance per dollar than NVIDIA’s H100.
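Putting benchmark results in pricing context is simple arithmetic; a sketch follows. The figures and chip names below are placeholders of ours, not actual list prices or MLPerf scores:

```python
def perf_per_dollar(throughput, price):
    """Benchmark throughput (e.g. samples/sec) per dollar of hardware."""
    return throughput / price

# Hypothetical throughput and price figures, for illustration only.
chips = {
    "chip_a": perf_per_dollar(throughput=100.0, price=30_000),
    "chip_b": perf_per_dollar(throughput=50.0, price=10_000),
}

# A chip that is slower in absolute terms can still win on
# performance per dollar if it is cheap enough.
best = max(chips, key=chips.get)
print(best)  # chip_b
```

This is the kind of normalization that raw MLPerf leaderboards do not do for you, which is why analyses like Kennedy’s add useful context.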
Footnote #2: Performance alone is rarely the only metric that matters to prospective buyers. More often than not, what matters most is the performance to cost ratio: how much does it cost to perform a certain operation within a certain timeframe. To arrive at that metric, the total cost of ownership (TCO) for AI chips should be factored in. That is a complex exercise that requires deep expertise.
A big part of the TCO for AI chips is inference, i.e. the use of trained AI models in production. Training an AI model is typically a costly and complex endeavor. Inference may be simpler in comparison, but it typically constitutes the bulk of a model’s lifetime and operational cost.
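A back-of-the-envelope model illustrates why inference tends to dominate lifetime cost. All figures here are hypothetical assumptions, not vendor numbers:

```python
def lifetime_cost(training_cost, inference_cost_per_day, days_in_production):
    """TCO for one model: a one-off training run plus ongoing inference."""
    return training_cost + inference_cost_per_day * days_in_production

# Hypothetical figures: a $2M training run, then two years in
# production serving requests at $10K/day of inference compute.
total = lifetime_cost(2_000_000, 10_000, 730)
inference_share = (10_000 * 730) / total
print(f"total=${total:,}, inference share={inference_share:.0%}")
```

Even with these made-up numbers, inference accounts for the large majority of the total, which is why chips and systems optimized specifically for inference are attracting so much attention.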
Training and inference workloads have different characteristics. This means that a system that does well at training does not necessarily do equally well at inference. Case in point – when Salvator was asked to comment on Eos performance on inference, he referred attendees to future briefings. Meanwhile, people are building new systems focused on inference, while others are trying to make the most of existing ones.
NVIDIA just showcased that its leadership does not show signs of waning in the immediate future. However, that’s not necessarily a good thing for the rest of the world. The competition is there, and the chance to catch up is there too, distant as it may seem at this point. AI chips in 2024 will be something to keep an eye on. In any case, how benchmark highlights translate to actual impact, usability and TCO for organizations aspiring to develop and use AI is not linear.