Abstract

Do today’s predictive models understand, or do they simulate understanding with increasing fidelity? This note maps the contemporary debate over whether next-token prediction is a sufficient basis for general intelligence. It is organised around four influential positions and the three questions on which they actually divide: reasoning, grounding, and compression. The debate is the open frontier that the connectionist paradigm reaches but does not settle.

The systems at the centre of this argument are the large language models traced in the history of Transformers: decoder-only networks trained to predict the next token. Their fluency has shifted the question from whether they are useful, which is no longer in doubt, to something harder: what kind of intelligence, if any, a system trained only to predict can possess.

Four positions

The contemporary discussion is anchored by a small number of clearly stated views. The disagreement is genuine and runs deep, because each position rests on a different account of what intelligence is.

Researcher | Position | Core claim
--- | --- | ---
Geoffrey Hinton | Digital emergentism | A sufficiently scaled predictor must develop internal representations that approximate conceptual understanding; high-fidelity prediction requires a structured internal model of the world.
Yann LeCun | World-model realism | Autoregressive token prediction is inherently limited; intelligence needs architectures (such as JEPA) that learn structured world models capable of causal reasoning and planning.
Jürgen Schmidhuber | Compressionism | Intelligence is optimal compression of environmental regularities; autonomy further requires intrinsic motivation and active exploration.
Emily Bender | Linguistic skepticism | LLMs map linguistic forms to other forms; apparent reasoning may be a statistical artefact rather than grounded semantic understanding.

Where the disagreement actually lies

Beneath the labels, the positions divide on three concrete questions.

Reasoning: emergent, or architectural?

A model can be given extra computation at inference time to refine its output before committing to it. The two camps read this differently. To the emergentist, such deliberation extends predictive modelling, letting the system explore latent conceptual paths before answering. To the structuralist, it is heuristic search layered on top of pattern recognition, and genuine reasoning would require an explicit world model able to simulate consequences rather than refine text continuations. The disagreement is whether multi-step reasoning is an emergent effect of scale or a sign of architectural insufficiency.
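One simple form of inference-time computation can be sketched as best-of-N selection: draw several candidate outputs, score each, and commit to the best. The sketch below is a toy under stated assumptions. `propose` and `score` are hypothetical stand-ins (a random number generator and a distance to a fixed target), not a real model or verifier, and nothing here claims to be how any particular system deliberates.

```python
import random

def propose(rng):
    """Hypothetical stand-in for sampling one candidate answer from a model."""
    return rng.randint(0, 100)

def score(candidate, target=42):
    """Hypothetical verifier: higher is better (closer to the target)."""
    return -abs(candidate - target)

def best_of_n(n, seed=0):
    """Spend more inference-time compute by drawing n candidates, keep the best."""
    rng = random.Random(seed)
    candidates = [propose(rng) for _ in range(n)]
    return max(candidates, key=score)

# With a fixed seed the first draw is shared, so a larger budget can only
# match or improve the selected score.
quick = best_of_n(1)
deliberate = best_of_n(16)
print(score(quick), score(deliberate))
```

Both readings fit this sketch. The emergentist sees the extra samples as exploring more of what the predictor already represents; the structuralist notes that the loop is search wrapped around an unchanged generator, with no model of consequences anywhere in it.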

Grounding: text, or world?

Biological agents learn through continuous sensorimotor interaction, causal intervention, and feedback. A language model trained on a static corpus does not. The critique from grounding is that a relatively small amount of embodied, closed-loop interaction may yield deeper causal understanding than vast quantities of passively read text. The concern is not data volume but the absence of a loop between action and consequence. This sharpens into a single question: is intelligence rooted in statistical structure, or in embodied engagement with reality?
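The gap between passive observation and closed-loop interaction can be illustrated with a toy simulation. This is a minimal sketch under assumptions of my own choosing: a hidden variable `z` drives both `x` and `y`, and `x` has no causal effect on `y`. All names are hypothetical, and the example says nothing about language models directly; it only shows why acting on a variable can reveal what watching it cannot.

```python
import random

def simulate(n, seed=0, intervene=None):
    """Toy world: hidden z drives both x and y; x itself never affects y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        # Passive data: x simply mirrors z. Intervention: the agent sets x itself.
        x = z if intervene is None else intervene
        y = z
        rows.append((x, y))
    return rows

def p_y_given_x_true(rows):
    """Fraction of rows with y true among those where x is true."""
    ys = [y for x, y in rows if x]
    return sum(ys) / len(ys)

observed = p_y_given_x_true(simulate(10_000))                    # reading a static record
intervened = p_y_given_x_true(simulate(10_000, intervene=True))  # acting and watching
print(observed, intervened)
```

In the passive record, `y` is true whenever `x` is, so `x` looks like a perfect predictor. Once the agent sets `x` by its own action, `y` is true only about half the time, which exposes the correlation as non-causal.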

Compression: prediction as abstraction

From the standpoint of algorithmic information theory, a system that predicts data accurately must have internalised the regularities that generate it, so prediction implies compression, and compression implies abstraction. This is the strongest formal argument that a good predictor is doing more than memorising. Its limit is also clear: compression accounts for understanding the world, not for wanting anything in it. Without intrinsic curiosity, goal-directed exploration, and self-generated learning, a powerful predictor remains a passive one, which is precisely the gap that separates it from an agent.
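The link between prediction and compression can be made concrete with code lengths: a symbol predicted with probability p costs about -log2(p) bits under an ideal code, so a better predictor yields a shorter encoding. The sketch below is a simplification under stated assumptions. It uses Shannon code length as a computable stand-in for the algorithmic-information notion, fits a bigram model on the very text it then encodes, and ignores the cost of describing the model itself. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def uniform_model(alphabet):
    """Predictor that has learned nothing: every symbol is equally likely."""
    p = 1.0 / len(alphabet)
    return lambda context, symbol: p

def bigram_model(text, alphabet):
    """Predictor that has internalised which symbol tends to follow which."""
    counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1

    def prob(context, symbol):
        c = counts[context]
        # Add-one smoothing keeps every probability strictly positive.
        return (c[symbol] + 1) / (sum(c.values()) + len(alphabet))

    return prob

def code_length_bits(text, prob):
    """Ideal code length: sum of -log2 p(symbol | previous symbol).

    The first symbol is skipped, so both models are charged for the same symbols.
    """
    return sum(-math.log2(prob(prev, cur)) for prev, cur in zip(text, text[1:]))

text = "ab" * 10
alphabet = sorted(set(text))
uniform_bits = code_length_bits(text, uniform_model(alphabet))
bigram_bits = code_length_bits(text, bigram_model(text, alphabet))
print(uniform_bits, bigram_bits)
```

On this alternating string the uniform predictor needs 19 bits for the 19 predicted symbols, while the bigram predictor needs roughly 2.5. The saving comes entirely from having captured the regularity that generates the data, which is the sense in which prediction implies compression.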

The convergence: hybrid systems

The sharp dichotomy between prediction and structured world-modelling is already softening in practice. Multimodal foundation models integrate vision, language, and action; tool-augmented models add external memory and planning; latent-space architectures such as JEPA predict in representation space rather than over tokens; and agentic systems couple language models to reinforcement learning and environment interaction. The likely resolution is not the victory of one paradigm but an architectural synthesis in which predictive modelling, structured representation, and embodied interaction coexist. This is the same combinatorial turn seen across the paradigms of AI.

Two readings of the same evidence

The debate ultimately reduces to two interpretations of identical results.

The emergentist reading holds that intelligence arises gradually from sufficiently powerful predictive systems, and that at scale the distinction between simulating understanding and possessing it becomes functionally irrelevant. The structuralist reading holds that intelligence depends on specific architectural properties (causal world models, hierarchical planning, embodiment, and goal-directed interaction), without which a predictor reproduces intelligent behaviour without instantiating its mechanism.

What remains open

The “Hinton versus LeCun” disagreement is not really about model scaling. It is about the nature of knowledge and representation. That predictive models are powerful is settled empirically. What is not settled is whether predictive competence alone constitutes intelligence, or whether intelligence requires structured engagement with a world. The question sits at the intersection of machine learning, cognitive science, and the philosophy of mind, and it is, for now, genuinely undecided.