(Italian version here)
«Artificial Intelligence (AI) is going to reproduce human intelligence. AI will eliminate disease. AI is the single biggest, most important invention in human history. You've likely heard it all—but probably none of these things are true».
This is the opening of a special issue titled Hype Correction, subtitled It's time to reset expectations, published in December 2025 by MIT Technology Review, the magazine of the Massachusetts Institute of Technology and one of the most authoritative sources on scientific and technological research in the United States and worldwide.
I had anticipated back in April 2025 that things were heading in this direction, a trajectory now confirmed by the analysis of such a prestigious publication.
You certainly remember how the launch of ChatGPT at the end of 2022, based on the GPT-3.5 model (GPT stands for Generative Pre-trained Transformer), captured the world's attention, prompting both private and public entities worldwide to invest heavily in the LLM (Large Language Model) technology that underpins Generative AI (GenAI) systems. Many believed this was the path toward Artificial General Intelligence (AGI): an intelligence similar to human intelligence but even more powerful, versatile, and tireless, which would free us from labor and find solutions to all our problems. However, after three years of continuous promises, the launch of GPT-5 in August 2025, perceived as merely incremental rather than revolutionary, began to make the exaggeration behind that vision increasingly evident.
In October 2025, I pointed out the first signs of a course correction already underway.
Summarizing what emerged from the various reports and studies conducted in 2025, four key elements explain the current state of affairs.
- LLM-based systems are not the path to AGI. This has been stated, among others, by Yann LeCun, who served as Meta's chief scientist until November 2025 and left the company precisely because he disagreed with its continued insistence on LLMs. Ilya Sutskever, former Chief Scientist and co-founder of OpenAI (the company behind ChatGPT), has observed that LLMs' ability to generalize, that is, to extract general principles and apply lessons learned during training to new situations, is far more limited than that of human beings. In January 2026, The Atlantic (one of the oldest and most prestigious magazines in the US) published an article declaring "Large language models don't 'learn' – they copy," based on work by researchers from Stanford and Yale, who succeeded in getting four of the most widely used GenAI systems to reproduce nearly entire books, or very large portions of them. A scientific review by researchers from Caltech and Stanford, also published in January 2026, highlighted that even the most recent models, including those presented as "capable of reasoning," actually have significant problems reasoning correctly.
- GenAI systems remain prone to hallucinations (i.e., making things up) at a rate estimated between 15% and 25%, an unacceptable level for most consequential decisions and interactions, in both personal and professional life. This is especially troubling because, unlike a human being, these systems are incapable of self-correcting through experience. Any ordinary worker may make mistakes at first but usually learns and improves. This does not and cannot happen with GenAI systems, precisely because they are based on an essentially statistical learning of language, namely on how frequently words appear near one another (a minimal illustration of this principle follows this list), and they lack causal reasoning capabilities. The special issue cited at the beginning observes how surprising it is that this approach managed to create artificial systems that produce human-like expressions when prompted with any question, but the fact that we perceive them as intelligent is our own projection. See my first two articles on the subject from March 2023 and April 2023.
- For routine tasks, GenAI systems can outperform the average person, but because they fail to deliver expert-level performance reliably in real-world contexts, they have not managed to drive meaningful productivity gains at the enterprise level. We were misled when we saw successive versions of these systems pass professional qualification exams, but it later became clear that such performance was largely due to having “memorized” all available test materials in those fields rather than to any genuine understanding of their core concepts. In the words of Andrej Karpathy (inventor of the popular term vibe coding, which we will return to in a future article), these are "versatile but shallow and error-prone" tools capable of helping ordinary people accomplish things they would otherwise need an expert for (such as getting the gist of a legal or medical document), but not easily integrated into a productive workflow.
- Certainly, the majority of people now use GenAI systems daily, both personally and professionally, but in most cases they do so free of charge, given that at least a dozen companies make them available. The upshot is that, after a cumulative $600 billion in investment between 2021 and 2025, there is still no viable business model — and this is prompting investors to rethink their positions. It is no coincidence that talk of a bubble began circulating in 2025, including from prominent industry figures such as Sundar Pichai, CEO of Alphabet (Google's parent company), in November of that year. Daron Acemoglu, 2024 Nobel laureate in Economics, analyzed the influence of the entire AI sector on the US economy through 2035 and concluded that only about 5% of tasks will be effectively performed by AI, and GDP will increase by only 1.1% to 1.8%. One capability still lacking is the ability, given a specific work situation, to reliably provide context-dependent information to solve emerging problems. Indeed, in January 2026, a Washington Post article reported that «economic data shows the technology largely has not replaced workers», and the Remote Labor Index analysis, conducted jointly by the Center for AI Safety and Scale AI, confirmed Acemoglu's predictions for now: on average, only 2.5% of jobs posted on a platform offering paid tasks to independent workers were successfully completed by leading GenAI systems. Additionally, also in January 2026, a survey by Apollo Global Management (one of the world's largest investment management firms) of CFOs (Chief Financial Officers) showed that the majority of them in 2025 «are seeing no impact from AI on labor productivity, decision-making speed, customer satisfaction or time spent on high value-added tasks».
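To make the "statistical learning of language" mentioned above concrete, here is a deliberately minimal sketch in Python: a bigram model that picks each next word purely in proportion to how often it followed the previous word in its training text. Modern LLMs are transformer neural networks trained on vastly larger contexts, so this illustrates only the frequency principle, not their actual implementation; the toy corpus and the generate function are invented for this example.

```python
# Minimal bigram "language model": fluent-looking output from pure
# word-frequency statistics, with no notion of meaning or truth.
# The corpus and all names here are illustrative, not from any real system.
import random
from collections import Counter, defaultdict

corpus = (
    "the patient has a fever . the patient needs rest . "
    "the doctor sees the patient . the doctor prescribes rest ."
).split()

# For every word, count which words follow it and how often.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def generate(start, length=8):
    """Sample each next word in proportion to its observed frequency."""
    words = [start]
    for _ in range(length):
        candidates = follows[words[-1]]
        if not candidates:  # no observed continuation for this word
            break
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the doctor sees the patient . the patient needs"
```

Even at this toy scale the output reads grammatically, yet the model has no way of knowing whether "the doctor prescribes rest" is true in any given case. Scaled up by many orders of magnitude, that same gap between fluency and grounding is where hallucinations come from.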
None of this means GenAI tools are useless, far from it. They do augment our cognitive capabilities, provided we scrutinize their outputs carefully. They are extremely useful for carrying out routine tasks in areas we already master (so that we can catch and correct any mistakes). A very recent example is their use as evaluators of the scientific rigor of theoretical computer science papers: according to 81% of the authors, the feedback helped increase the clarity and readability of their papers. They will certainly continue to improve, although achieving major leaps in quality will require integrating them with systems based on a symbolic approach, and it is far from clear when this will happen.
There is still a long road ahead. What do you think?
-- The original version (in Italian) was published by "StartMAG" on 16 February 2026.