
November 18, 2015

SC15 Birds-of-a-Feather Session:

Fresco: An Open Failure Data Repository for Dependability Research and Practice

This BoF will unveil a recently awarded NSF-supported effort to build an open failure data repository that enables data-driven resiliency research for large-scale computing clusters from design, monitoring, and operational perspectives. To address the dearth of large publicly available datasets, we have started this three-year project to create a repository of system configuration, usage, and failure information from large computing systems. We have seeded the effort with data collected from a large Purdue computing cluster over a six-month period. In this session we seek to collect requirements for a larger, multi-institution repository and to demonstrate the usage and data analytics tools for the current repository.

October 6, 2015

The Purdue and Illinois teams met in Urbana, IL, to kick off the CRI project.

September 20, 2015

Our proposal for a Birds-of-a-Feather (BoF) session, "Fresco: An Open Failure Data Repository for Dependability Research and Practice," has been accepted by the Supercomputing 2015 (SC15) conference, to be held in Austin, TX, Nov. 15-20, 2015.

September 18, 2015

NSF Project Seeks to Improve Supercomputer Reliability

Link to HPCWire article

August 26, 2015

Purdue supercomputer usage and failure data research could make supercomputers even more super

Link to the article