Multigrain parallelism: bridging coarse-grain parallel programming and fine-grain event-driven multithreading

Date
2017
Publisher
University of Delaware
Abstract
The recent evolution of the hardware landscape, aimed at producing high-performance computing systems capable of reaching extreme-scale performance, has reignited interest in fine-grain multithreading, particularly at the intranode level. Novel fine-grain Program Execution Models (PXMs) are being proposed and have shown promising results in this quest. Similarly, fine-grain constructs are being integrated into popular programming models, such as OpenMP. Nevertheless, classical coarse-grain constructs are still heavily present in numerous scientific applications, sometimes used in conjunction with fine-grain directives. This forces the OpenMP runtime to support both models of execution, potentially reducing the advantages obtained when executing an application in a fully fine-grain environment.

In order to evaluate the advantages offered by novel fine-grain PXMs in the execution of scientific applications and to help programmers transition to their use, this dissertation presents a multigrain parallel programming environment that allows using OpenMP's simple and familiar interface to generate fine-grain multithreaded applications that run on top of a fine-grain event-driven PXM. The major component of this environment is omp2cd, a multigrain parallel compiler based on clang that automatically determines the granularity of the fine-grain tasks in the output application at different levels of parallelism. It bases this decision on the directives used by the developer, the parallelism available in the input program, and compiler hints in the form of OpenMP clause extensions that guide the compiler in the presence of resource constraints, such as power and resilience. The multigrain analysis and optimizations designed to determine the size and number of the fine-grain tasks are presented, as well as how these tasks are grouped into asynchronous functions for increased locality of shared data. The strategies and compiler techniques used to support the semantics and syntax defined by the different versions of the OpenMP standard are also described.

Experimental results with a set of scientific benchmarks show that fine-grain applications generated by and run on the multigrain parallel programming environment, with two runtimes implementing a fine-grain event-driven PXM, can outperform OpenMP for data-intensive workloads with irregular and dynamic parallelism, while remaining competitive for more regular and structured applications.
Keywords
Applied sciences, Codelets, Dataflow, Extreme-scale performance, Fine-grain multithreading, Fine-grain program execution model, Multigrain parallel compiler