
Not Always Bigger


On May 8, O’Reilly Media will be hosting Coding with AI: The End of Software Development as We Know It—a live virtual tech conference spotlighting how AI is already supercharging developers, boosting productivity, and providing real value to their organizations. If you’re in the trenches building tomorrow’s development practices today and interested in speaking at the event, we’d love to hear from you by March 12. You can find more information and our call for presentations here. Just want to attend? Register for free here.


A few weeks ago, DeepSeek shocked the AI world by releasing DeepSeek R1, a reasoning model with performance on a par with OpenAI’s o1 and GPT-4o models. The surprise wasn’t so much that DeepSeek managed to build a good model—although, at least in the United States, many technologists haven’t taken seriously the abilities of China’s technology sector—but the estimate that the training cost for R1 was only about $5 million. That’s roughly 1/10th what it cost to train OpenAI’s most recent models. Furthermore, the cost of inference—using the model—is roughly 1/27th the cost of using OpenAI.1 That was enough to shock the stock market in the US, wiping nearly $600 billion off GPU chipmaker NVIDIA’s market capitalization.



DeepSeek’s licensing was surprisingly open, and that also sent shock waves through the industry: The source code and weights are under the permissive MIT License, and the developers have published a reasonably thorough paper about how the model was trained. As far as I know, this openness is unique among reasoning models; none of OpenAI’s o3, Gemini 2.0, Claude 3.7, or Alibaba’s QwQ is comparably open. While the meaning of “open” for AI is under debate (for example, QwQ claims to be “open,” but Alibaba has only released relatively small parts of the model), R1 can be modified, specialized, hosted on other platforms, and built into other systems.

R1’s release has provoked a blizzard of arguments and discussions. Did DeepSeek report its costs accurately? I wouldn’t be surprised to find out that DeepSeek’s low inference cost was subsidized by the Chinese government. Did DeepSeek “steal” training data from OpenAI? Maybe; Sam Altman has said that OpenAI won’t sue DeepSeek for violating its terms of service. Altman certainly knows the PR value of hinting at “theft,” but he also knows that law and PR aren’t the same. A legal argument would be difficult, given that OpenAI’s terms of service state, “As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain all ownership rights in Input and (b) own all Output. We hereby assign to you all our right, title, and interest, if any, in and to Output.” Finally, the most important question: Open source software enabled the vast software ecosystem that we now enjoy; will open AI lead to a flourishing AI ecosystem, or will it still be possible for a single vendor (or nation) to dominate? Will we have open AI or OpenAI? That’s the question we really need to answer. Meta’s Llama models have already done much to open up the AI ecosystem. Is AI now “out of the (proprietary) box,” permanently and irrevocably?

DeepSeek isn’t the only organization challenging our ideas about AI. We’re already seeing new models that were built on R1—and they were even less expensive to train. Since DeepSeek’s announcement, a research group at Berkeley released Sky-T1-32B-Preview, a small reasoning model that cost under $450 to train. It’s based on Alibaba’s Qwen2.5-32B-Instruct. Even more recently, a group of researchers released s1, a 32B reasoning model that, according to one estimate, cost only $6 to train. The developers of s1 employed a neat trick: Rather than using a large training set consisting of reasoning samples, they carefully pruned the set down to 1,000 samples and forced s1 to spend more time on each example. Pruning the training set no doubt required a lot of human work—and none of these estimates include the cost of human labor—but it suggests that the cost of training useful models is coming down, way down. Other reports claim similarly low costs for training reasoning models. That’s the point: What happens when the cost of training AI goes to near-zero? What happens when AI developers aren’t beholden to a small number of well-funded companies spending tens or hundreds of millions training proprietary models?
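
To make the idea concrete, here is a rough sketch of that kind of test-time scaling: generate an answer, then append a continuation cue and make the model keep reasoning before it commits. It assumes a chat model loaded through Hugging Face Transformers; the tiny stand-in model, the “Wait,” cue, and the two-pass loop are illustrative assumptions, not the s1 team’s exact recipe.

```python
# Rough sketch: force a model to spend more time on one example by generating
# a first draft, then appending a continuation cue and generating again.
# Illustrative only; not the s1 authors' exact recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in; s1 itself is a 32B model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

question = "A train travels 120 km in 90 minutes. What is its average speed in km/h?"
prompt = tok.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

def generate(text, max_new_tokens=256):
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

draft = generate(prompt)                          # first pass, normal budget
extended = generate(prompt + draft + "\nWait,")   # second pass, forced to reconsider
print(draft, "\n---\n", extended)
```

The specific cue doesn’t matter much; the point is that better answers can sometimes be bought with inference-time compute and a small, carefully curated training set rather than a larger training budget.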

Furthermore, running a 32B model is well within the capabilities of a reasonably well-equipped laptop. It will spin your fans; it will be slow (minutes rather than seconds); and you’ll probably need 64 GB of RAM—but it will work. The same model will run in the cloud at a reasonable cost without specialized servers. These smaller “distilled” models can run on off-the-shelf hardware without expensive GPUs. And they can do useful work, particularly if fine-tuned for a specific application domain. Spending a little money on high-end hardware will bring response times down to the point where building and hosting custom models becomes a realistic option. The biggest bottleneck will be expertise.
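
Here’s what that looks like in practice: a minimal sketch that loads a 4-bit quantized build of a distilled 32B model through the llama-cpp-python bindings and runs it on the CPU. The file name is a placeholder for whatever quantized GGUF build you download, and the context size and thread count are starting points to tune for your machine.

```python
# Minimal sketch: run a quantized 32B model on a laptop CPU via llama-cpp-python.
# The GGUF file name is a placeholder; a 4-bit quantized 32B model is roughly 20 GB.
from llama_cpp import Llama

llm = Llama(
    model_path="./deepseek-r1-distill-qwen-32b-q4_k_m.gguf",  # placeholder path
    n_ctx=8192,     # context window
    n_threads=8,    # CPU threads; tune for your machine
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the trade-offs of running a 32B model locally."}],
    max_tokens=512,
    temperature=0.6,
)
print(resp["choices"][0]["message"]["content"])
```

On a laptop CPU, expect answers in minutes rather than seconds, as noted above; modest GPU offload or a cloud instance brings that down considerably.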

We’re on the cusp of a new generation of reasoning models that are inexpensive to train and operate. DeepSeek and similar models have commoditized AI, and that has big implications. I’ve long suspected that OpenAI and the other major players have been playing an economic game. On one end of the market, they are pushing up the cost of training to keep other players from entering the market. Nothing is more discouraging than the idea that it will take tens of millions of dollars to train a model and billions of dollars to build the infrastructure necessary to operate it. On the other end, charges for using the service (inference) appear to be so low that it looks like classic “blitzscaling”: offering services below cost to buy the market, then raising prices once the competitors have been driven out. (Yes, it’s naive, but I think we all look at $60/million tokens and say, “That’s nothing.”) We’ve seen this model with services like Uber. And while we know little that’s concrete about OpenAI’s finances, everything we’ve seen suggests that they’re far from profitable2—a clear sign of blitzscaling. And if competitors can offer inference at a fraction of OpenAI’s price, raising prices to profitable levels will be impossible.

What about computing infrastructure? The US is proposing investing $500B in data centers for artificial intelligence, an amount that some commentators have compared to the US’s investment in the interstate highway system. Is more computing power necessary? I don’t want to rush to the conclusion that it isn’t necessary or advisable. But that’s a question complicated by the existence of low-cost training and inference. If the cost of building models goes down drastically, more organizations will build models; if the cost of inference goes down drastically, and that drop is reflected in consumer pricing, more people will use AI. The net result might be an increase in training and inference. That’s Jevons paradox. A reduction in the cost of a commodity may cause an increase in use large enough to increase the resources needed to produce the commodity. It’s not really a paradox when you think about it.
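
A toy calculation, with numbers invented purely for illustration, shows why: if the cost per query falls 10x while usage grows 30x, total spending on compute still triples.

```python
# Toy illustration of Jevons paradox: made-up numbers, not a forecast.
old_cost_per_query = 0.010   # dollars of compute per query
new_cost_per_query = 0.001   # 10x cheaper after commoditization

old_queries = 1_000_000      # daily usage at the old price
new_queries = 30_000_000     # usage grows 30x as cheap AI shows up everywhere

old_spend = old_cost_per_query * old_queries   # $10,000/day
new_spend = new_cost_per_query * new_queries   # $30,000/day

print(f"Cost per query fell {old_cost_per_query / new_cost_per_query:.0f}x, "
      f"but total compute spend rose {new_spend / old_spend:.1f}x")
```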

Jevons paradox has a big impact on what kind of data infrastructure is needed to support the growing AI industry. The best approach to building out data center technology necessarily depends on how those data centers are used. Are they supporting a small number of wealthy companies in Silicon Valley? Or are they open to a new army of software developers and software users? Are they a billionaire’s toy for achieving science fiction’s goal of human-level intelligence? Or are they designed to enable practical work that’s highly distributed, both geographically and technologically? The data centers you build so that a small number of companies can allocate millions of A100 GPUs are going to be different from the data centers you build to facilitate thousands of companies serving AI applications to millions of individual users. I fear that OpenAI, Oracle, and the US government want to build the former, when we really need more of the latter. Infrastructure as a service (IaaS) is well understood and widely accepted by enterprise IT groups. Amazon Web Services, Microsoft Azure, Google Cloud, and many smaller competitors offer hosting for AI applications. All of these—and other cloud providers—are planning to expand their capacity in anticipation of AI workloads.

Before making a massive investment in data centers, we also need to think about opportunity cost. What else could be done with half a trillion dollars? What other opportunities will we miss because of this investment? And when will the investment pay off? These are questions we don’t know how to answer yet—and probably won’t until we’re several years into the project. Whatever answers we may guess right now are made problematic by the possibility that scaling to bigger compute clusters is the wrong approach. Although it’s counterintuitive, there are good reasons to believe that training a model in logic should be easier than training it in human language. As more research groups succeed in training models quickly, and at low cost, we have to wonder whether data centers designed for inference rather than training would be a better investment. And these are not the same. If our needs for reasoning AI can be satisfied by models that can be trained for a few million dollars—and possibly much less—then grand plans for general superhuman artificial intelligence are headed in the wrong direction and will cause us to miss opportunities to build the infrastructure that’s really needed for widely available inference. The infrastructure that’s needed will allow us to build a future that’s more evenly distributed (with apologies to William Gibson). A future that includes smart devices, many of which will have intermittent connectivity or no connectivity, and applications that we are only beginning to imagine.

This is disruption—no doubt disruption that’s unevenly distributed (for the time being), but that’s the nature of disruption. This disruption undoubtedly means that we’ll see AI used more widely, both by new startups and established companies. Invencion’s Off Kilter. blog points to a new generation of “garage AI” startups, startups that aren’t dependent on eye-watering infusions of cash from venture capitalists. When AI becomes a commodity, it decouples real innovation from capital. Innovation can return to its roots as making something new, not spending lots of money. It can be about building sustainable businesses around human value rather than monetizing attention and “engagement”—a process that, we’ve seen, inevitably results in enshittification, which inherently requires Meta-like scale. It allows AI’s value to diffuse throughout society rather than remaining “already here…just not evenly distributed yet.” The authors of Off Kilter. write:

You will not beat an anti-human Big Tech monopolist by you, too, being anti-human, for you do not have its power. Instead, you will win by being its opposite, its alternative. Where it seeks to force, you must seduce. Thus, the GarageAI firm of the future must be relentlessly pro-human in all facets, from its management style to its product experience and approach to market, if it is to succeed.

What does “relentlessly pro-human” mean? We can start by thinking about the goal of “general intelligence.” I’ve argued that none of the advances in AI have taught us what intelligence is—they’ve helped us understand what intelligence is not. Back in the 1990s, when Deep Blue beat chess champion Garry Kasparov, we learned that chess isn’t a proxy for intelligence. Chess is something that intelligent people can do, but the ability to play chess isn’t a measure of intelligence. We learned the same thing when AlphaGo beat Lee Sedol—upping the ante by playing a game with even more imposing combinatorics doesn’t fundamentally change anything. Nor does the use of reinforcement learning to train the model rather than a rule-based approach.

What distinguishes humans from machines—at least in 2025—is that humans can want to do something. Machines can’t. AlphaGo doesn’t want to play Go. Your favorite code generation engine doesn’t want to write software, nor does it feel any reward from writing software successfully. Humans want to be creative; that’s where human intelligence is grounded. Or, as William Butler Yeats wrote, “I must lie down where all the ladders start / In the foul rag and bone shop of the heart.” You may not want to be there, but that’s where creation starts—and creation is the reward.

That’s why I’m dismayed when I see someone like Mikey Shulman, founder of Suno (an AI-based music synthesis company), say, “It’s not really enjoyable to make music now. . . . It takes a lot of time, it takes a lot of practice, you need to get really good at an instrument or really good at a piece of production software. I think the majority of people don’t enjoy the majority of the time they spend making music.” Don’t get me wrong—Suno’s product is impressive, and I’m not easily impressed by attempts at music synthesis. But anyone who can say that people don’t enjoy making music or learning to play instruments has never talked to a musician. Nor have they appreciated the fact that, if people really didn’t want to play music, professional musicians would be much better paid. We wouldn’t have to say, “Don’t quit the day job,” or be paid $60 for an hour-long gig that requires two hours of driving and untold hours of preparation. The reason musicians are paid so poorly, aside from a few superstars, is that too many people want the job. The same is true for actors, painters, sculptors, novelists, poets—any creative occupation. Why does Suno want to play in this market? Because they think they can grab a share of the commoditized music market with noncommoditized (expensive) AI, with the expense of model development providing a “moat” that deters competition. Two years ago, a leaked Google document questioned whether a moat was possible for any company whose business model relied on scaling language models to even greater sizes. We’re seeing that play out now: The deep meaning of DeepSeek is that the moat represented by scaling is disappearing.

The real question for “relentlessly pro-human” AI is: What kinds of AI aid human creativity? The market for tools to help musicians create is relatively small, but it exists; plenty of musicians pay for software like Finale to help write scores. Deep Blue may not want to play chess, but its success spawned many products that people use to train themselves to play better. If AI is a relatively inexpensive commodity, the size of the market doesn’t matter; specialized products that assist humans in small markets become economically feasible.

AI-assisted programming is now widely practiced, and can give us another look at what “relentlessly human” might mean. Most software developers get their start because they enjoy the creativity: They like programming; they like making a machine do what they want it to do. With that in mind, the real metric for coding assistants isn’t the lines of code that they produce; it’s whether programming becomes more enjoyable and the products that software developers build become more usable. Taking the fun part of the job away while leaving software developers stuck with debugging and testing is a disincentive. We won’t have to worry about programmers losing their jobs; they won’t want their jobs if the creativity disappears. (We will have to worry about who will perform the drudgery of debugging if we have a shortage of well-trained software developers.) But helping developers reason about the human process they are trying to model so they can do a better job of understanding the problems they need to solve—that’s pro-human. As is eliminating the dull, boring parts that go with every job: writing boilerplate code, learning how to use libraries you will probably never need again, writing musical scores with paper and pen. The goal is to enable human creativity, not to limit or eliminate it. The goal is collaboration rather than domination.

Right now, we’re at an inflection point, a point of disruption. What comes next? What (to quote Yeats again) is “slouching towards Bethlehem”? We don’t know, but there are some conclusions that we can’t avoid:

  • There will be widespread competition among groups building AI models. Competition will be international; regulations about who can use what chip won’t stop it.
  • Models will vary greatly in size and capabilities, from a few million parameters to trillions. Many small models will only serve a single use case, but they will serve that use case very well.
  • Many of these models will be open, to one extent or another. Open source, open weights, and open data are already preventing AI from being limited to a few wealthy players.

While there are many challenges to overcome—latency being the greatest of them—small models that can be embedded in other systems will, in the long run, be more useful than massive foundation/frontier models.

The big question, then, is how these models will be used. What happens when AI diffuses through society? Will we finally get “relentlessly human” applications that enrich our lives, that enable us to be more creative? Or will we become further enmeshed in a war for our attention (and productivity) that quashes creativity by offering endless shortcuts? We’re about to find out.


Footnotes

  1. $2.19 per million output tokens for R1 versus $60 per million output tokens for OpenAI o1.
  2. $5B in losses for 2024, expected to rise to $14B in 2026 according to sacra.com.


