Reiss Lecture: Six Principles For Evaluating Cognitive Capabilities in AI Models

Melanie Mitchell (Santa Fe Institute)

Thu May 14, 16:15-17:15

Abstract: Modern AI systems have exceeded human performance on many benchmarks meant to evaluate general cognitive capacities. However, benchmark performance often does a poor job of predicting these capacities in real-world settings. In this article I describe several evaluation issues that can cause this mismatch, and propose six principles, inspired by developmental and comparative psychology, that need to be adopted to enable rigorous evaluation of AI systems. These principles are illustrated by case studies from the psychology and AI literature.

astrophysics; computational biology; computational engineering, finance, and science; general mathematics; nonlinear sciences; computational physics; fluid dynamics; general physics

Audience: researchers in the topic


Northwestern Applied Mathematics Seminar

Organizer: Hermann Riecke*
*contact for this listing
