Om Malik is a San Francisco-based writer, photographer, and investor.
DeepSeek, a company associated with High-Flyer, an $8 billion Chinese hedge fund, changed the AI narrative when it claimed OpenAI-like capabilities for a mere $6 million. The company challenged conventional thinking about artificial intelligence and its seemingly insatiable need for big, bad data centers and billions of dollars in infrastructure and chip investments.
This is not the first time a company has thought differently about infrastructure and figured out a way to do more with less. In the not-so-distant past, Juniper Networks and Google followed the same recipe and changed networking and infrastructure in the process. Will DeepSeek’s approach do the same? My bet is that it will.
DeepSeek’s approach reinforces a point often overlooked in today’s discussions: chip costs fall by roughly a factor of three every year, while algorithmic improvements have accelerated from roughly doubling to quadrupling annually. Taken together, these trends mean anyone can now pursue innovative architectural design and efficient training strategies, making it possible to develop high-performing AI models without relying on the most advanced and expensive hardware.
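To see how quickly those two trends compound, here is a minimal arithmetic sketch (my own illustration, not from the piece) that multiplies the rough figures above, a 3x annual drop in hardware cost and a 4x annual gain in algorithmic efficiency, to estimate the effective cost of a fixed AI capability over time:

```python
# Illustrative sketch: compound the two rough trends cited above.
# The rates (3x cheaper hardware, 4x more efficient algorithms per year)
# are the article's ballpark figures, used here purely for arithmetic.

def effective_cost(initial_cost: float, years: int,
                   hw_factor: float = 3.0, algo_factor: float = 4.0) -> float:
    """Cost to reach the same capability after `years`, if hardware gets
    hw_factor times cheaper and algorithms get algo_factor times more
    efficient each year. The two gains multiply."""
    return initial_cost / ((hw_factor * algo_factor) ** years)

# Example: a training run costing $6 million today.
for year in range(4):
    print(year, round(effective_cost(6_000_000, year), 2))
# → 0 6000000.0
#   1 500000.0
#   2 41666.67
#   3 3472.22
```

If both trends hold, the same capability gets roughly 12x cheaper every year, which is why a result that once required billions in infrastructure can plausibly be reproduced for single-digit millions a couple of years later.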
These optimizations remind us that we’re still in the early phase of the AI journey, and despite all the hoopla and hype, many opportunities to innovate remain. DeepSeek has shown that you don’t need the absolute cutting-edge, most expensive AI models for most real-world applications. Instead, what’s required are efficient, well-engineered products that can solve complex problems.
Continue reading on CrazyStupidTech.com
The broader point I make in my longer piece is that DeepSeek lets people think about cheaper inference, and inference is a bigger part of the AI puzzle. The chief executives of two of the largest hyperscalers, Microsoft and Amazon, made the same point in their Q4 2024 earnings calls. Here is what they said:
Andy Jassy, Amazon CEO:
Sometimes people assume that if you’re able to decrease the cost of any type of technology component—in this case, we’re really talking about inference—it will somehow lead to less total spending on technology. We have never seen that to be the case.
We saw the same pattern with cloud computing when we launched AWS in February 2006, offering S3 object storage for $0.15 per gigabyte and compute for $0.10 per hour, both of which are much lower now, many years later. At the time, people thought companies would spend a lot less money on infrastructure technology.
What actually happened was that companies spent less per unit of infrastructure, which was highly beneficial for their businesses. However, they then became excited about what else they could build, things they previously considered cost-prohibitive, and ultimately ended up spending much more in total on technology. I believe the same thing will happen with AI. The cost of inference will substantially decrease. What you heard in the last couple of weeks about DeepSeek is just one piece of this, but everyone is working on it. I believe the cost of inference will meaningfully decline, making it much easier for companies to integrate inference and generative AI into all their applications.
Satya Nadella, Microsoft CEO:
There’s Moore’s Law that’s working in hyperdrive. Then on top of that, there are the AI scaling laws, both the pre-training and the inference-time compute, that compound, and that’s all software. You should think of what I said in my remarks, which we have observed for a while, which is 10x improvements per cycle just because of all the software optimizations on inference. And so that’s what you see. And then to that, I think DeepSeek has had some real innovations.
At the end of the day, if you think about it, right, what was the big lesson learned from client-server to cloud? More people bought servers, except it was called cloud. And so when token prices fall, inference computing prices fall, that means people can consume more, and there’ll be more apps written.
And it’s interesting to see that when I referenced these models that are pretty powerful, it’s unimaginable to think that here we are in sort of beginning of ’25 where on the PC, you can run a model that requires a pretty massive cloud infrastructure. So that type of optimizations means AI will be much more ubiquitous. And so therefore, for a hyperscaler like us, a PC platform provider like us, this is all good news as far as I’m concerned.
My piece makes the same points with much more context. You can read it on CrazyStupidTech.com.
Updated on Feb 6, 2025 @ 6:15 p.m. PST