An Open Datacenter Dataset for AI Enabled Optimization

Siddharth Samsi and Vijay Gadepally (MIT-Lincoln Lab)

02-Apr-2021, 16:00-17:00

Abstract: The first step in training an AI model is to get the right data. To apply AI to data center optimization problems, such as identifying faults in servers, energy systems, or cooling systems before they become critical, the MIT Lincoln Laboratory Supercomputing Center is developing a state-of-the-art dataset. The dataset contains rich information, including physical information about building management; system information such as scheduler and filesystem logs; and node-level information such as utilization, memory, GPU activity (both job-level statistics and time-series monitoring collected via NVIDIA’s DCGM tool), and energy utilization. In this talk, we will describe the dataset, detail how developers can get access to it, and discuss a number of open problems associated with datacenter analytics.

Computer science, Mathematics

Audience: researchers in the topic


Computational Research in Boston and Beyond Seminar (CRIBB)

Curator: Shirley Entzminger*
*contact for this listing
