Reliability-aware runtime adaption through a statically generated task schedule

Date
2016
Publisher
University of Delaware
Abstract
Device scaling, the increasing number of components on a single chip, varying environmental conditions, and aging effects have brought severe reliability challenges that impose tight constraints on the operation of a system. To cope with these challenges, this thesis proposes a reliability-aware scheduling framework that combines static and dynamic analysis to improve overall system resiliency to different kinds of faults (i.e., intermittent, transient, and permanent). The static analysis technique employs genetic algorithms to optimize overall system reliability by treating the Reliability Level (RL) as an intermediate scheduling dimension and creating a task-to-RL mapping. This allows the RL-to-core mapping to be adapted efficiently at runtime as fault rates vary, while the task-to-RL mapping is reused unchanged. The dynamic analysis tracks the faults appearing in each core and measures the time correlation of those faults to update the RL-to-core mapping. The proposed reliability-aware framework is implemented in a state-of-the-art runtime system, DARTS, to show quantitatively the advantages of using the overall framework on existing multicore platforms. Experimental results show that the proposed technique delivers up to a 30% improvement in application execution time and up to a 72% reduction in faults occurring at runtime.
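
The two-level mapping described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `build_rl_to_core` and `core_for_task`, the sample tasks and cores, and the simple rule of ranking cores by raw fault count. The thesis itself derives the task-to-RL mapping with genetic algorithms and updates the RL-to-core mapping from the time correlation of faults, neither of which is reproduced here. The sketch only shows why the static mapping can stay fixed while the runtime mapping changes.

```python
# Hypothetical sketch: names, data, and the ranking rule are assumptions,
# not the algorithm used in the thesis.

def build_rl_to_core(fault_counts):
    """Map reliability levels to cores; RL 0 goes to the core with the
    fewest observed faults (ties broken by core name)."""
    ranked = sorted(fault_counts, key=lambda core: (fault_counts[core], core))
    return {rl: core for rl, core in enumerate(ranked)}

def core_for_task(task, task_to_rl, rl_to_core):
    """Resolve a task to a core through its reliability level."""
    return rl_to_core[task_to_rl[task]]

# Static phase result (assumed given): task -> reliability level.
task_to_rl = {"t0": 0, "t1": 2, "t2": 1}

# Runtime phase: observed fault counts per core at two points in time.
faults_before = {"core0": 0, "core1": 3, "core2": 1}
faults_after = {"core0": 5, "core1": 3, "core2": 1}

rl_to_core = build_rl_to_core(faults_before)
placement_before = {t: core_for_task(t, task_to_rl, rl_to_core) for t in task_to_rl}

# Only the RL-to-core mapping is rebuilt; task_to_rl is reused as-is.
rl_to_core = build_rl_to_core(faults_after)
placement_after = {t: core_for_task(t, task_to_rl, rl_to_core) for t in task_to_rl}

print(placement_before)
print(placement_after)
```

In this toy run, the task assigned to the most reliable level moves away from `core0` once that core starts accumulating faults, without the static schedule being recomputed.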