The Latest and Greatest AI Chips
How Fast and How Much?
We've all heard of Artificial Intelligence (AI) and the computers needed to run it. They
consume massive amounts of power but can accomplish near-miraculous data processing tasks.
But how fast they are, and how much power they draw, are numbers most of us don't know. Let's look
under the hood to get some idea of what's inside.
The latest and greatest
NVIDIA chip, the Blackwell GB200, is a key component of the NVIDIA GB200
NVL72, a multi-node, liquid-cooled, rack-scale system that comprises 600,000
parts, including 72 Blackwell GPUs packaged as 36 GB200s. It's designed to deliver 720
petaflops* for training and 1.4 exaflops* for inferencing. (And it reportedly
weighs 3,000 pounds, about the same as a mid-sized car.) Nvidia's Blackwell AI chip is expected to cost between
$30,000 and $40,000 per unit, according to CEO Jensen Huang. This price is competitive with current Nvidia AI chips.
Nvidia's Blackwell
generation boosts power consumption even further, with the B200 consuming up to
1,200W, and the GB200 (which combines two B200 GPUs and one Grace CPU) expected
to consume 2,700W (watts, a rate of energy use, so no "per hour" is needed). To put 2,700W in perspective, note that a home
A/C unit normally draws about 3,000W while it is running.
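Since a watt measures how fast energy is used, turning the figures above into an amount of energy requires a running time. Here is a minimal sketch of that conversion. The function name and the 24-hour, full-load running time are illustrative assumptions, not measured values.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kilowatt-hours used by a device drawing power_watts for hours."""
    return power_watts * hours / 1000.0


GB200_WATTS = 2700    # figure quoted above for one GB200
HOME_AC_WATTS = 3000  # rough figure quoted above for a home A/C unit
HOURS = 24            # assumption: running flat out for a full day

gb200_daily = energy_kwh(GB200_WATTS, HOURS)
ac_daily = energy_kwh(HOME_AC_WATTS, HOURS)

print(f"GB200:    {gb200_daily:.1f} kWh per day")
print(f"Home A/C: {ac_daily:.1f} kWh per day")
```

Under those assumptions, one GB200 running all day uses about 64.8 kWh, slightly less than the 72.0 kWh of an A/C unit that never cycles off.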
Conclusion. AI chips are big, heavy, and fast, and they consume
vast amounts of electrical power. It is no wonder that modern data centers
require their own power-generating plants (also huge).
NOTES.
1. For reference, a typical
PC CPU chip weighs about 1.0 ounce. So the 3,000-pound NVL72 system weighs as much as 3,000 x 16
= 48,000 PC CPU chips. At $500 per CPU chip, that many chips would cost about $24 million.
2. For reference, a single
Blackwell chip is roughly 10,000 times as fast as a typical desktop
PC. Typical supercomputers run at speeds of 100 petaflops, which is
faster than a single Blackwell chip, but not by orders of magnitude. These speeds are almost beyond imagination, certainly mine.
3. Intel claims its Gaudi 3 chip is about 1.5
times faster than Nvidia's H100, the predecessor of the Blackwell. It should be launched later in 2024. If you
know anything about Antoni Gaudí's architecture in Spain, you will be amused.
4. The new Tesla Data Center in Austin,
TX will employ up to 85,000 Blackwell chips by the end of 2024.
5. It is easy to understand why China
wants to take over Taiwan, where these chips are manufactured, and to take it
intact rather than bomb it to smithereens (which would be easy). It's just as easy to see why the USA (and the
rest of the world) should try to prevent this. Whoever thought chips would be
more important than bombs in warfare?
6. All the information, except for a few
comparisons, in this report is available on the Internet.
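The weight comparison in Note 1 is a simple unit conversion, sketched below. The 3,000-pound system weight, the 1-ounce CPU, and the $500 price come from this report; the function and variable names are illustrative only.

```python
OUNCES_PER_POUND = 16


def cpu_equivalents(system_pounds: float, cpu_ounces: float = 1.0) -> float:
    """Number of CPU chips that together weigh as much as the system."""
    return system_pounds * OUNCES_PER_POUND / cpu_ounces


SYSTEM_POUNDS = 3000  # reported weight of the rack-scale system
CPU_PRICE_USD = 500   # rough price of one PC CPU chip, as assumed in Note 1

chips = cpu_equivalents(SYSTEM_POUNDS)
cost = chips * CPU_PRICE_USD

print(f"{chips:,.0f} CPU chips, costing about ${cost:,.0f}")
```

This gives 48,000 chips and $24,000,000, matching the figures in Note 1.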
*Petaflop: a unit of computing speed equal to one
thousand million million (10^15) floating-point operations
per second. An exaflop (10^18) is 1,000 times faster than a petaflop. A
FLOP is a single floating-point operation, such as multiplying two floating-point numbers (not just integers,
but all kinds of numbers, where an exponent is needed to indicate magnitude, in
the range from the tiniest to the largest).
P.S. When I did supercomputing some years back, chip speed was measured in GIGAFLOPS. These new chips are millions of times faster.
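To make the jump from gigaflops to petaflops and exaflops concrete, here is a small sketch that compares the units by their powers of ten. The table and the `ratio` helper are illustrative names, not part of any library.

```python
# Floating-point operations per second represented by each unit.
FLOPS_PER_UNIT = {
    "gigaflops": 10**9,
    "teraflops": 10**12,
    "petaflops": 10**15,
    "exaflops": 10**18,
}


def ratio(faster: str, slower: str) -> int:
    """How many times larger the first unit is than the second."""
    return FLOPS_PER_UNIT[faster] // FLOPS_PER_UNIT[slower]


print("petaflops vs gigaflops:", ratio("petaflops", "gigaflops"))
print("exaflops vs petaflops: ", ratio("exaflops", "petaflops"))
```

One petaflop is a million gigaflops, and one exaflop is a thousand petaflops.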