High performance sparse Fast Fourier Transform using OpenACC

Date
2017
Journal Title
Journal ISSN
Volume Title
Publisher
University of Delaware
Abstract
The Sparse Fast Fourier Transform (sFFT) is a recent algorithm developed by Hassanieh et al. at MIT for Discrete Fourier Transforms on signals with a sparse frequency domain. A reference implementation of the algorithm exists and proves that the sFFT can be faster than modern FFT libraries for signals of sparse nature. The algorithm has been parallelized using multiple approaches, such as over multicore by Cheng et al. over GPGPUs using CUDA by Cheng et al. and optimized for serial execution using SSE intrinsics by Schumacher et al. ☐ While the increase in number of cores and memory bandwidth on modern architectures provide an opportunity to improve performance through sophisticated parallel algorithm design, the sFFT is inherently complex, embarrassingly parallel, and numerous challenges need to be addressed to deliver the optimal performance. In this Masters Thesis, we employ a high-level directive-based parallel programming model, OpenACC to create a performance portable sFFT code. We call our implementation, ACCsFFT. Our implementation can target heterogeneous platforms consisting of x86 or Power Processors integrated with GPUs. The performance of our implementation is compared against existing parallel implementations that have used either low-level or proprietary software on CPUs and GPUs. ☐ Several optimizations are proposed and implemented in ACCsFFT. The performance is also compared against the highly optimized parallel FTW library. We also target GPUs from different families, old to the most modern hardware, to verify if the algorithm is performance portable, scalable and reproducible across generations. ☐ We deliver a high-level parallel sparse FFT library capable of running one version of the code in a serial manner on CPU or in parallel on multicore, GPGPUs and other architectures that OpenACC currently supports. A programmer would only need to change input parameters of the algorithm for the different runs.
Description
Keywords
Applied sciences, GPU, HPC, Irregular algorithm, OpenACC, Parallel, Sparse Fast Fourier Transform
Citation