Transformers as Effective Fields: From Quantum Physics to AI

Changqing Fu (CEREMADE, Paris Dauphine University - PSL and Paris AI Institute (PRAIRIE))

Mon Feb 23, 08:00-09:00

Abstract: Approximation and algebraic theories are not yet sufficient to prove the optimality of Transformers: it is known that even shallow infinite-width neural networks are universal approximators, and that ReLU networks lie within the class of rational functions under the tropical (max-plus) algebra. However, these facts still cannot explain the effectiveness of Transformers, since a constructive proof of their specific form is missing.
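As a quick numerical illustration of the max-plus claim (my own sketch, not part of the talk): in the tropical semiring, "addition" is max and "multiplication" is +, so a small ReLU network computing |x| coincides exactly with the tropical polynomial max(x, -x).

```python
# Illustrative sketch (not from the talk): a tiny ReLU network and the
# tropical (max-plus) polynomial it computes. In the max-plus semiring,
# "addition" is max and "multiplication" is ordinary +, so max(x, -x)
# is a tropical polynomial in x.

def relu(x):
    return max(x, 0.0)

def relu_net(x):
    # Two hidden ReLU units with output weights (1, 1):
    # f(x) = ReLU(x) + ReLU(-x) = |x|
    return relu(x) + relu(-x)

def tropical_poly(x):
    # tropical sum of the monomials x and -x
    return max(x, -x)

for x in [-2.0, -0.5, 0.0, 1.5, 3.0]:
    assert relu_net(x) == tropical_poly(x)  # both equal |x|
```

Deeper ReLU networks analogously compute differences of such max-plus polynomials, i.e. tropical rational functions.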

In this talk, we propose a novel theory to fully classify all possible neural networks and argue that linear/softmax Transformers are optimal under several minimal axioms. To model the reasoning process, we treat the neural ODE as the geodesics of some canonical field, where time represents layer depth. To model the interaction among concepts, we pass from the vector flow to the matrix flow, denoted as $\bm X$, whose rows are tokens and columns are neurons. The Transformer is then a natural consequence:

Linear Attention: the first interaction term under left unitary invariance.

Softmax Attention: the entropic regularization of the field under left permutation invariance.

Two-Layer ReLU Network: the projected gradient flow, where the feasible set is conic and permutation invariant.

Gated Activation Network: the minimal nonlinear non-interactive field under left permutation invariance.

Sparse Attention*: the token-pairwise non-commutative correction that leads to a mask on the attention matrix.
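The attention variants above can be sketched numerically (a minimal illustration under my own choice of shapes and random weights, not the talk's construction): rows of $\bm X$ are tokens and columns are neurons; linear attention uses the score matrix $(\bm X W_q)(\bm X W_k)^\top$ directly as the interaction term, softmax attention normalizes it row-wise (the entropic regularization), and sparse attention applies a mask to the attention matrix, shown here with a causal mask.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 4, 8                      # X: rows are tokens, columns are neurons
X = rng.standard_normal((n_tokens, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)

# Linear attention: the (scaled) score matrix acts directly as the interaction term
linear_out = scores @ (X @ Wv)

# Softmax attention: entropic (row-wise) normalization of the same scores
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
softmax_out = weights @ (X @ Wv)

# Sparse attention: a mask on the attention matrix (here, a causal mask)
mask = np.tril(np.ones((n_tokens, n_tokens), dtype=bool))
masked = np.where(mask, scores, -np.inf)
sparse_w = np.exp(masked - masked.max(axis=1, keepdims=True))
sparse_w /= sparse_w.sum(axis=1, keepdims=True)
sparse_out = sparse_w @ (X @ Wv)

assert np.allclose(weights.sum(axis=1), 1.0)   # rows are probability simplices
assert np.allclose(sparse_w[0, 1:], 0.0)       # first token attends only to itself
```

All three outputs live on the same matrix flow; only the treatment of the pairwise score matrix differs.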

In conclusion, we provide a theoretical proof of why the “bitter lesson” holds, a theoretical guarantee for the architectural path taken by Transformers, and a paradigm for studying the interpretability of intelligence.

Topics: computation and language, machine learning, robotics, mathematical physics

Audience: learners



Tropical mathematics and machine learning

Series comments: Tropical mathematics, machine learning, category theory and anything tech+math are welcome.

Organizer: Eric Dolores-Cuenca*
*contact for this listing
