BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Changqing Fu (CEREMADE\, Paris Dauphine University - PSL and Paris
  AI Institute (PRAIRIE))
DTSTART:20260223T080000Z
DTEND:20260223T090000Z
DTSTAMP:20260517T052530Z
UID:TropicalmathandML/31
DESCRIPTION:Title: Transformers as Effective Fields: From Quantum Physics
  to AI (https://researchseminars.org/talk/TropicalmathandML/31/)\nby Cha
 ngqing Fu (CEREMADE\, Paris Dauphine University - PSL and Paris AI Insti
 tute (PRAIRIE)) as part of Tropical mathematics and machine learning\n\n
 \nAbstract\nApproximation and algebraic theories are not yet
  sufficient to prove the optimality of Transformers: it is known that
  even shallow infinite-width neural networks are approximately
  universal\, and ReLU networks lie within the class of rational
  functions under tropical (max-plus) algebra. However\, these facts
  still cannot explain the effectiveness of Transformers\, since a
  constructive proof of their form is missing.\n\nIn this talk\, we
  propose a novel theory to fully classify all possible neural networks
  and argue that linear/softmax Transformers are optimal under several
  minimal axioms. To model the reasoning process\, we treat the neural
  ODE as the geodesic flow of a canonical field\, where time represents
  layer depth. To model the interaction among concepts\, we pass from
  the vector flow to the matrix flow\, denoted $\\bm X$\, whose rows
  are tokens and whose columns are neurons. The Transformer is then a
  natural consequence:\n\nLinear Attention: the first interaction term
  under left unitary invariance.\n\nSoftmax Attention: the entropic
  regularization of the field under left permutation invariance.\n\nTwo
 -Layer ReLU Network: the projected gradient flow\, where the feasible
  set is conic and permutation invariant.\n\nGated Activation Network:
  the minimal nonlinear non-interactive field under left permutation
  invariance.\n\nSparse Attention*: the token-pairwise non-commutative
  correction that leads to a mask on the attention matrix.\n\nIn
  conclusion\, we provide a theoretical proof of why the “bitter
  lesson” holds\, a theoretical guarantee for the technical path taken
  by Transformers\, and a paradigm to study the interpretability of
  intelligence.\n
LOCATION:https://researchseminars.org/talk/TropicalmathandML/31/
END:VEVENT
END:VCALENDAR
