SCOTT DETROW, HOST:
I just asked ChatGPT to write an introduction to a radio segment about artificial intelligence. My prompt - write a 30-second introduction for a radio news segment. The topic of the segment - how after years of promise and sky-high expectations, there are suddenly doubts about whether the technology will hit a ceiling. Here's part of what we got.
(Reading) For years, it was hailed as the future - a game-changer destined to reshape industries, redefine daily life and break boundaries we haven't even imagined. But now the once-limitless promise of this breakthrough technology is facing new scrutiny. Experts are asking, have we hit a ceiling?
So that was ChatGPT. Handing the wheel back to humans - MIT put out a report this past week throwing cold water on the value of AI in the workplace. Consumers were disappointed by the newest version of ChatGPT released earlier this month. OpenAI CEO Sam Altman floated the idea of an AI bubble, and tech stocks took a dip.
We're going to talk about all this with Cal Newport, a contributing writer for The New Yorker and a computer science professor at Georgetown. Welcome.
CAL NEWPORT: Thanks for having me.
DETROW: Let's just start with ChatGPT in the latest version. Was it really that disappointing?
NEWPORT: It's a great piece of technology, but it was not a transformative piece of technology, and that's what we had been promised ever since GPT-4 came out, which is, the next major model was going to be the next major leap, and GPT-5 just wasn't that.
DETROW: One of the things you pointed out in your recent article is that there have been voices saying, it's not a given that it's always going to be exponential leaps, and they were really drowned out in recent years. And kind of the prevailing thinking was, of course it's always going to be leaps and bounds until we have superhuman intelligence.
NEWPORT: And the reason why they were drowned out is that we did have those leaps at first. So there was an actual curve. It came out in a paper in 2020 that showed, this is how fast these models will get better as we make them larger, and GPT-3 and GPT-4 fell right on those curves. So we had a lot of confidence in the AI industry that, yeah, if we keep getting bigger, we're going to keep moving up this very steep curve. But sometime after GPT-4, the progress fell off that curve and got a lot flatter.
DETROW: ChatGPT is the leader. It is the most high-profile of all of these models out there, so obviously, this is a big data point. But what are you looking at to get a sense of, is this just one blip, or what is the bigger picture here?
NEWPORT: This is an issue across all large language models. Essentially, the idea that simply making the model bigger and training it longer is going to make it much smarter - that has stopped working across the board. We first started noticing this around late 2023, early 2024. All of the major large language models right now have shifted to another way of getting better. They're focusing on what I call post-training improvements, which are more focused and more incremental, and all major models from all major AI companies are focused on this more incremental approach to improvement right now.
DETROW: I want to talk about that in a moment. First, I want to get your thoughts on this other big headline from recent days. This MIT report - the headline that was all over the place was, 95% of generative AI pilots at companies are failing - 95%. Do you find that number surprising?
NEWPORT: I don't find that number surprising at all. What we were hoping was going to happen with AI in the workplace was the agentic revolution, which was this idea that maybe language models would get good enough that we could give them control of software, and then they could start doing lots of stuff for us in the business context. But the models aren't good enough for that. They hallucinate. They're not super reliable. They make mistakes or make odd behavior. And so these tools we've been building on top of language models - as soon as we leave very narrow applications where language models are very good, these more general business tools, they're just not very reliable yet.
DETROW: You're talking about hopes, and a lot of these companies have hopes, and a lot of investors have hopes. But there's been a lot of people who've been really freaked out about all of this, whether it means job security, whether it means some of the more, you know, high-flung, sci-fi type views of what happens down the line with AI. Do you think a slowdown is necessarily good news for people who are worried, or do you think this continues to be the focus in so many industries, and it will continue to take more and more, et cetera?
NEWPORT: I think it's good news for those who are worried about, let's say, the next five years.
DETROW: OK.
NEWPORT: I think this idea, like, Dario Amodei floated that we could have up to 20% unemployment, that we could have up to 50% of all new white-collar jobs being automated in the near future - the technology is not there, and we do not have a route for it to get there in the near future. The farther future is a different question, but I do not think those scenarios of doom we've been hearing over the last six months or so - I think right now, they're seeming unrealistic.
DETROW: You mentioned post-training before. You had a great metaphor for it, involving cars. Can you walk us through that?
NEWPORT: Well, there's two ways of improving a language model. The first way is making it bigger, training it longer. This is what's called pre-training. This is what gives you the basic capabilities of your model. Then you have this other way of improving them, which we can think of as post-training, which is a way of souping up or improving the capabilities they already have. So if pre-training gives you, like, a car, post-training soups up the car. And what has happened is we've turned our attention in the industry away from pre-training and towards post-training, so less trying to build a much better car and more focused on trying to get more performance out of the car we already have.
DETROW: How much is this leading to broad-scale rethinking of what comes next? Or is it just kind of tweaking the current approach to how these models get better and better?
NEWPORT: I think it's almost a crisis moment for AI companies because the capital expenditure required to build these massive models is astonishingly large. And in order to make a huge amount of money from these technologies, you need hugely lucrative applications. How are we going to make enough revenue to justify hundreds of billions of dollars of capital expenditures that's required to train these models?
DETROW: What does this mean in the immediate term for people who have already started to use AI in their everyday lives at work, at home? Does that continue? Do you think that we hit kind of a bubble there? Like, what comes next on the small consumer scale, do you think?
NEWPORT: I think we're going to get a lot more effort on product market fit. So instead of just having this focus on making the models bigger and bigger and maybe you just access them through a chat interface, now we're going to have to have a lot more attention on building bespoke tools on top of these foundation models for specific use cases. So I actually think the footprint in regular users' lives is going to get more useful because you might get a tool that's more custom fit for your particular job.
There's still plenty of things to be worried about. Language models, as we have them today, can do all sorts of things that are a pain. It's generating slop for the internet. It makes it much easier to have persuasive misinformation. The fraud possibilities are explosive. All of these things are negatives. But I'll probably just get some better tools in the near future as just an average user. That's not necessarily so bad.
DETROW: That is Cal Newport, author and professor of computer science at Georgetown University. Thanks for coming in.
NEWPORT: Thank you.
(SOUNDBITE OF MUSIC) Transcript provided by NPR, Copyright NPR.
NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.