Can Machines Dream of New Mathematics?
An Expedition into the Mind of Humans, Logic, and Language Models
In a world of pure formalism, kings would have risen not by blood or sword — but by solving equations.
Before the advent of computation, at the beginning of the 16th century, aristocrats wielded algebraic cunning like knights wielded swords. One such duel: find two numbers whose sum and product are both 2. A small riddle, easily expressible:

\(x + y = 2, \qquad xy = 2\)
A simple substitution, \(y = 2 - x\), reveals the quadratic:

\(x^2 - 2x + 2 = 0\)
But here, mathematics balks. The discriminant, \(b^2 - 4ac = (-2)^2 - 4(1)(2) = -4\), is negative.
What, pray, is the square root of -1?
To the most brilliant ancient minds, this was an abomination; to rigorous thought, an impossibility.
Instead, Renaissance minds chose rebellion: they imagined a solution. They posited an entity, i, satisfying:

\(i^2 = -1\)
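With that single symbol, the riddle's "impossible" answers materialize: the quadratic formula gives \(x = \frac{2 \pm \sqrt{-4}}{2} = 1 \pm i\), and indeed \((1 + i) + (1 - i) = 2\) while \((1 + i)(1 - i) = 1 - i^2 = 2\).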
In a single act of creative defiance, they birthed complex numbers — a conceptual leap across the void where formal deduction feared to tread.
Thus, complex numbers were not deduced — they were conjured. Summoned not by logical inevitability, but by an imaginative leap across an abyss where no rule could tread.
Mathematics, in that moment, revealed its true nature: it is not the passive following of laws, but the active forging of new realms where none existed.
Mathematics was not just calculation. It was creation.
The pattern is actually recursive.
When Euclid built geometry upon axiomatic pillars, he believed the edifice eternal. More than two millennia later, Riemann and Lobachevsky shattered one postulate — and with it, the tyranny of flatness. In one stroke, parallel lines met, space bent, and "truth" revealed itself as contingent, relative, pliable.
Mathematics evolves not by obedience, but by insurrection.
And so the central question emerges: if the essence of mathematics is creative rebellion, can machines — born to execute rules — ever become mathematicians?
A machine can solve:

\(x^2 - 2x + 2 = 0\)

It can announce that the roots are \(1 \pm i\).

But would it dare invent i without precedent?
Would it dare imagine spaces where logic itself collapses and reforms?
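To be concrete about the first half, here is a minimal sketch assuming SymPy is available: the machine announces the complex roots instantly, but only because i was handed to it centuries after a human imagined it.

```python
# A minimal sketch of the "solving" half, assuming SymPy is installed
# (pip install sympy). The solver happily returns complex roots, but only
# because the imaginary unit I is already built into its ontology.
from sympy import symbols, solve

x = symbols("x")
roots = solve(x**2 - 2*x + 2, x)   # the duel's quadratic: x^2 - 2x + 2 = 0
print(roots)                       # [1 - I, 1 + I] (order may vary)
```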
The limits of any formal system, Gödel taught us, lie not in failure but in fecundity: every sufficiently rich consistent system is incomplete, inviting transcendence.
Human mathematicians do not merely discover new theorems; they invent new worlds.
True intelligence — whether silicon or flesh — must not only follow rules, but also create new ones, and succeed in doing so.
Not merely solve problems, but pose impossible ones — and then, against all formal reason, leap across.
Mathematics is not a monument to certainty. It is an eternal rebellion against the silence of the void.
And here lies our starting point: if human mathematics stems from rebellion against the rules, can machines — built to follow rules — ever truly replace mathematicians?
The Machinery of Thought: Logic, Layers, and Limitations
On the Architecture of Reasoning
Before we can assess machines and their capabilities, we must first understand the architecture of human reasoning. Mathematics is not a monolithic activity; it is a multilayered, evolving landscape, each layer building on the one before it.
First-order logic concerns propositions about objects in a domain. For example, "For all objects x, if x is a triangle, then x has three sides."
Second-order logic speaks about properties, sets, and relations themselves. For instance, "There exists a set S such that an object x is a triangle if and only if x belongs to S."
Higher-order logics generalize this pattern, allowing quantification over sets of sets, properties of properties, and so on, climbing infinitely.
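Written symbolically (the predicate names are chosen just for this illustration), the two example statements read:

\(\forall x\, (\mathrm{Triangle}(x) \rightarrow \mathrm{HasThreeSides}(x)) \qquad \text{versus} \qquad \exists S\, \forall x\, (\mathrm{Triangle}(x) \leftrightarrow x \in S)\)

The first quantifies only over individual objects; the second quantifies over a set of objects, and that is the step up the ladder.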
Humans intuitively navigate these layers of logic and abstraction. When mathematicians introduced complex numbers, they did not merely solve an equation; they expanded the ontology of mathematics itself, inventing a new domain.
This act of expanding our conceptual toolkit allowed us to understand phenomena that were previously unimaginable.
For example, consider the birth of the imaginary unit i, where:

\(i^2 = -1\)

This was not just a trivial computational trick.
It represented the creation of an entirely new mathematical object, enabling the construction of the field of complex numbers:

\(\mathbb{C} = \{\, a + bi : a, b \in \mathbb{R} \,\}\)

where \(\mathbb{R}\) denotes the real numbers.
Complex numbers provide insight into phenomena such as wave behavior, electrical circuits, and quantum mechanics. The invention of i was, and remains, a profound ontological expansion within mathematics.
Gödel, Incompleteness, and the Limits of Formal Systems
David Hilbert dreamed of a complete, consistent, and decidable formal system for all of mathematics. However, Kurt Gödel shattered this hope with his incompleteness theorems, which revealed inherent limits of formal mathematical systems.
First Incompleteness Theorem:
"Any consistent formal system F rich enough to express elementary arithmetic contains true statements that cannot be proven within F."
Formally, there exists a statement G such that

\(F \nvdash G \quad \text{and} \quad F \nvdash \lnot G\)

meaning G is undecidable within F.
Second Incompleteness Theorem:
"A formal system cannot demonstrate its own consistency."
This revealed the ultimate irony: no sufficiently rich system, however carefully constructed, can be both consistent and complete, let alone certify its own consistency. A system that attempts to be both complete and self-verifying will inevitably run into contradiction.
Gödel’s work illuminated the limitations of formalism — the realization that truth transcends formal systems. What a system can prove does not exhaust what is true, as certain truths remain inaccessible within the boundaries of any particular system.
Computation, Church-Turing Thesis, and Beyond
Parallel to Gödel’s insights, Hilbert posed another monumental question: the Entscheidungsproblem, or decision problem. He asked whether there could exist a mechanical procedure — an algorithm — that could decide the truth of any mathematical statement.
Alonzo Church and Alan Turing answered definitively, once and for all: No.
Church introduced the λ-calculus, a formalism that could express computation. Turing introduced what we now call the Turing Machine, a simple yet universal model of computation. Despite its simplicity, the Turing Machine encapsulates the essence of computability. A Turing Machine is formally defined as a 7-tuple (a small Python sketch of one such machine follows the component list below):

\(M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})\)
where:
Q is a finite set of states,
Σ is the input alphabet,
Γ is the tape alphabet, with Σ ⊆ Γ,
δ:Q×Γ→Q×Γ×{L,R} is the transition function,
q0 is the initial state,
qaccept and qreject are special halting states.
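To make the definition tangible, here is a minimal, illustrative sketch in Python of one such machine (a toy under the definition above, not a general-purpose simulator).

```python
# A toy sketch of the 7-tuple above (illustrative, not a general simulator):
# a two-state machine that accepts exactly the unary strings of even length.
def run_turing_machine(tape, delta, q0, q_accept, q_reject, blank="_"):
    tape = list(tape) + [blank]           # tape contents, padded with one blank
    state, head = q0, 0                   # start in q0 with the head at the left
    while state not in (q_accept, q_reject):
        state, written, move = delta[(state, tape[head])]   # apply δ
        tape[head] = written
        head += 1 if move == "R" else -1
        if head == len(tape):             # grow the tape on demand
            tape.append(blank)
    return state == q_accept

# δ : Q × Γ → Q × Γ × {L, R}, here checking the parity of the input length.
delta = {
    ("even", "1"): ("odd",    "1", "R"),
    ("odd",  "1"): ("even",   "1", "R"),
    ("even", "_"): ("accept", "_", "R"),
    ("odd",  "_"): ("reject", "_", "R"),
}

print(run_turing_machine("1111", delta, "even", "accept", "reject"))  # True
print(run_turing_machine("111",  delta, "even", "accept", "reject"))  # False
```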
Their combined insight crystallized into the Church-Turing Thesis:
"Everything that can be effectively computed can be computed by a Turing Machine."
However, this insight, profound as it is, pertains only to computation — the manipulation of existing symbols according to well-defined rules. The creative reasoning involved in expanding mathematical domains is a different and far more complex process. Through their work, humans have not merely computed; we have created new realities, new possible worlds within mathematics.
For instance, the introduction of non-Euclidean geometries, topology, category theory, and the abstract structures of modern mathematics was not a matter of computing truths that were already there. It was an act of ontological expansion, the creation of new entities and systems within mathematics itself.
A Way Beyond Computation
Thus, to assess artificial intelligence and machine reasoning properly, we must distinguish between:
Computation — the mechanical manipulation of symbols according to fixed rules.
Creative Reasoning — the human capacity to invent new frameworks, to push the boundaries of knowledge, and to create new mathematical objects and structures.
Machines may one day compute as well or better than humans. However, whether they can ever invent mathematics, in the sense that the real and complex numbers, topology, or category theory were invented, remains an open question. The fundamental limitation of computation, as revealed by Gödel, Church, and Turing, is that machines operate within the confines of predefined rules and symbols. They are bound by the ontological limitations of their design.
Humans, on the other hand, navigate effortlessly across layers of abstraction. A mathematician inventing complex numbers was not simply solving equations — they were expanding the very ontology of mathematics. In their work, they were shifting between layers of thought, introducing new kinds of objects, new dimensions of reality.
Gödel’s incompleteness theorems exploded Hilbert’s dream of formal completeness, revealing that no formal system could ever be both consistent and complete about itself. This understanding laid bare the profound limits of formalism in describing the universe of mathematical truths.
Similarly, the work of Church and Turing in formalizing computation tackled the Entscheidungsproblem and revealed the limits of algorithmic decision-making. Their insight — that there is no general algorithm for determining all mathematical truths — marked the advent of computability theory.
Ultimately, Turing Machines are a brilliant abstraction of computation, universally powerful yet fundamentally limited. They can compute anything that is computable, but they cannot invent new objects, new mathematical structures, or new worlds within mathematics.
Thus was born the Church-Turing Thesis:
"Everything computable can be computed by a Turing Machine."
But this is not the full story. What humans have accomplished with the invention of complex numbers, topology, and category theory is not merely computation. It is ontological expansion — the creation of new objects, new realms of possibility. In mathematics, as in the world, we find that the most profound moments arise not from computation but from invention.
Machines, Programming Paradigms, and the Shape of Minds
Suppose you wanted to build a mathematician.
Would you code them as an imperative program? ("Do X, then Y.")
As a procedural script? ("Apply this method recursively.")
As an object-oriented system? ("Define mathematical objects as classes with attributes and methods.")
Or perhaps, as a functional language like Haskell, where computation is pure, immutable, and referentially transparent?
Each programming paradigm captures one style of thinking:
Imperative: Linear, command-based. (early calculators)
Procedural: Modular, reusable operations. (structured proofs)
Object-Oriented: World-building. (algebraic structures as classes)
Functional: Pure transformation. (category theory, homological algebra)
Logic programming: Search and inference. (Prolog, automated theorem proving)
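As a toy illustration (every name here is invented for the sketch), the same trivial computation can be written in three of the styles above; the paradigm shapes the thought, not the answer.

```python
# A toy contrast (all names invented for this sketch): the same computation,
# summing the squares of a list, in three of the styles above.

# Imperative: a sequence of commands mutating local state.
def sum_squares_imperative(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

# Functional: a pure transformation with no mutation.
def sum_squares_functional(xs):
    return sum(map(lambda x: x * x, xs))

# Object-oriented: the data is an object that knows how to answer the question.
class NumberList:
    def __init__(self, xs):
        self.xs = list(xs)

    def sum_of_squares(self):
        return sum(x * x for x in self.xs)

data = [1, 2, 3, 4]
assert sum_squares_imperative(data) == 30
assert sum_squares_functional(data) == 30
assert NumberList(data).sum_of_squares() == 30
```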
But mathematics is not any one paradigm. It is meta-paradigmatic — it shifts the paradigm itself.
Just as humans invented imaginary numbers when none existed, or mapped numbers onto points when it helped, mathematicians switch and build new modes of thought as needed.
No LLM today — no transformer, no GPT, no diffusion model — natively invents new modes. They operate within a statistical structure. Tokens are woven into likely sequences, but the ontological leap — the birth of i, of Galois theory, of non-Euclidean geometries — is absent.
LLMs are great cartographers of existing territory, but not explorers of the truly unknown.
A Case Study: Reasoning, Abstraction, and the Missing Fire
Consider a really simple prompt:
Find the altitude of a right triangle with hypotenuse 4 and leg 2.
An LLM might succeed. It recalls Pythagoras:

\(a^2 + b^2 = c^2\)
And then it proceeds to solve it. Fine.
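On the usual reading that "altitude" means the altitude to the hypotenuse, the routine steps are:

\(b = \sqrt{4^2 - 2^2} = 2\sqrt{3}, \qquad h = \frac{ab}{c} = \frac{2 \cdot 2\sqrt{3}}{4} = \sqrt{3}\)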
But now, encouraged by the previous result, you push it harder and ask:
Invent a new mathematical structure between groups and rings that satisfies X and Y properties.
It hesitates. It starts by regurgitating known algebraic structures: monoids, semirings, modules, and so on.
Because it was never trained to invent — merely to interpolate between familiar points.
Similarly, ask an LLM:
What happens if we redefine i not as the square root of -1, but as an operator on a different algebra?
It will try to "guess" from prior training, but not freely explore beyond its horizon.
The key cognitive acts — abstraction beyond known paradigms, ontology shift, meta-layer reasoning — are outside the training regime.
Can Machines Replace Mathematicians?
Machines are pretty powerful.
They simulate reasoning with uncanny precision. They formalize deduction, traverse symbolic landscapes, and retrieve deep stores of knowledge in milliseconds.
They are astonishing tools — faithful replicators of what has been said and skilled manipulators of what is known.
But mathematical invention — real, generative, world-bending invention — is more than deduction.
It is a dance across conceptual layers:
From first-order logic, where we reason about objects and relations
To second-order structures, where we reason about properties and sets themselves, building richer ontologies
Through abstraction, where disparate domains converge — mapping algebra onto geometry, or number theory into topology
And into meta-mathematics, where we ask: What counts as proof? What counts as logic? — and invent new rules entirely.
Here, the mathematician is not a user of logic.
She (or he) is an architect of new logics, redefining what it means to reason at all.
Today's language models — even the most advanced, like GPT-4o — are stunning.
They compose, translate, deduce, imitate. They suggest connections, complete patterns, echo ideas.
But they are assistants, not revolutionaries.
They amplify creativity but rarely originate it; they can simulate insight, but they do not yet stand at the edge of the unknown and leap.
A Turing Machine can compute all that is computable.
But it cannot conceive what lies beyond its formal system.
Perhaps that is the final word on mathematical thought:
It is not bound by the syntax of its time.
It conceives what does not yet exist — and then gives it form.
Not just a mirror of the world, but a forge.
The Halting Horizon: What Machines Can't Know
In the 1930s, a young Alan Turing sat with a problem that, at first sight, seemed innocent enough.
In its distilled form, it asked:
Could there be a general procedure — a mechanical one — that tells you whether any given computer program will finish running or loop forever?
It turns out that no such procedure can exist.
What Turing discovered wasn't just a clever puzzle or a technical footnote. It was a real revelation. He had found a limit — a boundary in logic itself. A place where computation, no matter how refined, must fail. And that’s a real game-changer.
This became known as the Halting Problem, and it drew the first bright line in the landscape of computability. It essentially told us:
There are questions no algorithm can answer.
Not because they’re hard. Not because they require more time or faster hardware.
But because they are formally undecidable — they exist beyond the reach of mechanical certainty.
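The shape of that discovery can be sketched in a few lines of Python; the `halts` oracle below is hypothetical by construction, which is the whole point.

```python
# The shape of Turing's diagonal argument, sketched in Python.
# `halts` is a hypothetical oracle that cannot actually be written;
# that impossibility is exactly the point.
def halts(program, argument):
    """Pretend this returns True iff program(argument) eventually halts."""
    raise NotImplementedError("No such general procedure can exist.")

def paradox(program):
    # If the oracle says program(program) halts, loop forever;
    # if it says program(program) loops forever, halt at once.
    if halts(program, program):
        while True:
            pass
    return "halted"

# Now ask: does paradox(paradox) halt? Either answer contradicts the oracle,
# so the assumed `halts` cannot exist.
```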
Today, we live in a world totally saturated with computation. Our phones talk, write, and reason. Large Language Models can finish our sentences most of the time, sketch outlines of proofs, even mimic the cadence of mathematicians.
But the Halting Problem still haunts these systems.
Because at their core, no matter how dazzling their outputs, LLMs are machines.
Machines that manipulate symbols, follow patterns, and remain forever bounded by what is Turing-computable.
They can simulate thought, but they cannot transcend their own substrate.
And this is where the contrast with human thinkers becomes sharp.
When Gödel showed that some truths can’t be proven within a system, mathematicians didn’t stop — they invented new systems.
When classical logic broke under the strain of paradox, we built new logics, new languages, new foundations. We didn’t just compute — we conceived.
That’s the quiet, yet underestimated superpower of the human mind:
It knows when to step outside the frame and, more importantly, how to do so productively.
It’s not that we’re faster, or smarter in every dimension. But when the system breaks, we imagine a bigger one, or we repair the problem at a different layer. When the program loops, we ask a better question.
The Halting Problem remains undecidable — not a failure we might someday correct, but a truth we have learned to live with.
And maybe that’s the real frontier: not what machines can compute, but what we can understand beyond computation.
Language Models and First-Order Reasoning
Today's Large Language Models — GPT-4o among them — are remarkable.
At their core, they are statistical engines, exquisitely trained to approximate the structure and flow of human language.
They excel at syntactic mimicry, completing thoughts with uncanny fluidity, and performing surface-level deduction with increasing accuracy.
In the language of logic, we might say they operate comfortably within the domain of first-order reasoning.
They manipulate well-formed formulas, instantiate variables, apply basic inference rules, and respond fluidly to prompts embedded in rigid logical syntax.
Given a coherent set of assumptions, they can often produce valid consequences.
They can sketch outlines of proofs, suggest plausible lemmas, and even generate clever analogies that resemble mathematical intuition.
But here's the crux: first-order logic has boundaries.
It is constrained to reasoning about individual objects, using fixed quantifiers and clearly defined relations.
It is effective for formal systems where the structure is entirely known in advance — where all the pieces are on the board, and the rules are explicit.
Mathematical creativity, however, rarely stays within those lines.
Why Language Models Struggle with Higher-Order Thought
The real frontier lies in higher-order abstraction:
Reasoning not just about objects, but about the properties of those objects.
Not just about formulas, but about entire systems of reasoning.
Not just about what is, but about what could be, what must be added, or what must be redefined for consistency to hold.
This is the world of second-order logic, category theory, model theory, and meta-mathematics.
A domain where logic bends back on itself — where mathematicians debate not only the truth of statements, but the validity of the frameworks used to define truth in the first place.
And this is where the scaffolding begins to wobble for language models.
Because while an LLM can emulate higher-order discourse — stringing together words about abstraction, axioms, or topoi — it lacks a grounded semantic understanding of what those abstractions are.
It cannot internally construct a new logic.
It cannot intentionally shift from one formal system to another in response to a contradiction.
It can’t say, as humans once did in the wake of Gödel, “This foundation isn’t enough. Let’s build a new one that fits better.”
This is not to diminish what LLMs can do: their ability to simulate structured reasoning has already transformed education, programming, and idea generation.
But we must be clear-eyed:
Simulation is not true invention.
Inference is not insight.
And fluency is not foundational understanding.
In mathematics, the truly creative act is not in applying known rules — it is in questioning them, extending them, and sometimes replacing them entirely.
That act, as of now, remains uniquely human.
Beyond Proofs: The Act of Conceptual Creation
Real mathematical revolutions are rarely about "solving" an existing problem inside an existing framework.
They are about changing the framework itself.
The invention of category theory was not a solution to an equation.
It was the realization that mathematics could be better understood as a web of relationships between structures, rather than as isolated objects.
The birth of non-Euclidean geometries was not the completion of a calculation.
It was the recognition that the parallel postulate could be negated — that new, consistent universes of geometry could exist.
In these acts of creation, the question itself changes.
And more crucially: the space of allowed moves changes.
No existing AI or LLM has the architecture to propose a new metalanguage, to redefine the game it is playing.
At most, it can rearrange within the linguistic space it was trained on.
Conceptual creation requires stepping out of the space — a move not easily reducible to pattern completion.
Programming, Proofs, and the Deep Structure of Reasoning
There is a deep connection between programming and proofs.
The Curry-Howard correspondence tells us that a program is a proof, and a proof is a program.
In this light, we can view LLMs as program synthesizers:
given a description, they attempt to "prove" a response by synthesizing an answer.
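A glimpse of that correspondence, sketched illustratively in Python's type hints: the type of a function is a proposition, and the function body is its proof.

```python
# A small, purely illustrative sketch of the Curry-Howard reading in Python's
# type hints: a function of this type is a "proof" of modus ponens, i.e. from
# A together with (A implies B) we may conclude B.
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def modus_ponens(premise: A, implication: Callable[[A], B]) -> B:
    # The only way to produce a B from these ingredients is to apply the
    # implication to the premise: the program is the proof.
    return implication(premise)
```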
Yet this analogy hides an important asymmetry.
Human mathematicians do not merely write programs.
They invent new types, new logics, new type theories when needed.
When Gödel discovered his incompleteness theorems, he didn’t just operate within arithmetic. He encoded statements about arithmetic inside arithmetic, via arithmetization — inventing a new meta-level of self-reference.
This meta-creativity — the ability to see the system from outside, to redefine the representation itself —
remains the domain of human intelligence.
Current LLMs remain inside the system they were trained on.
Toward a Theory of Mathematical Intelligence
If we ever build an AI that can replace mathematicians — not just assist them — it won’t be a larger language model. It will need a fundamentally different kind of mind.
Such a system would go beyond computation and mimicry. It would embody the generative, self-questioning, and world-building nature of true mathematical thought. It would need:
Meta-Representation
The ability to reason about formal systems, not just within them — inventing new logical frameworks when existing ones fail.
Self-Referential Abstraction
The capacity to reflect on its own reasoning:
“This system can’t resolve the paradox. A new axiom is needed.”
This isn’t recursion — it’s self-directed evolution.
Non-Computable Creativity
To move beyond the limits of algorithmic reasoning — stepping outside formal systems to create new ones from scratch.
This would require a qualitative leap:
A blend of symbolic reasoning, meta-learning, and reflective cognition.
Because in the end, mathematicians aren’t just problem solvers.
They’re architects of new problems, new worlds — and new ways of thinking.
What do you think? Is a new kind of AI possible — an explorer, not just a cartographer?
Or will the frontier of mathematics remain forever a human domain?