November 18, 2015
SC15 Birds-of-a-Feather Session:
Fresco: An Open Failure Data Repository for Dependability Research and Practice
This BoF will unveil a recently awarded NSF-supported effort to build an open failure data repository that enables data-driven resiliency research for large-scale computing clusters from design, monitoring, and operational perspectives. To address the dearth of large publicly available datasets, we have started a three-year project to create a repository of system configuration, usage, and failure information from large computing systems. We have seeded the effort with data collected from a large Purdue computing cluster over a six-month period. In this session, we seek to collect requirements for a larger, multi-institution repository and to demonstrate the usage and data analytics tools for the current repository.
October 6, 2015
The Purdue and Illinois teams met in Urbana, IL, to kick off the CRI project.
September 20, 2015
Our proposal for a Birds-of-a-Feather (BoF) session, "Fresco: An Open Failure Data Repository for Dependability Research and Practice," has been accepted by the Supercomputing 2015 conference, to be held in Austin, TX, Nov. 15-20, 2015.
September 18, 2015
NSF Project Seeks to Improve Supercomputer Reliability
August 26, 2015
Purdue supercomputer usage and failure data research could make supercomputers even more super