Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches


Authors	FILIPOVIČ Jiří, PLHÁK Jan, STŘELÁK David
Year of publication	2015
Type	Article in Proceedings
Conference	Proceedings of the IEEE International Conference on High Performance Computing & Simulation
MU Faculty or unit	Faculty of Informatics
DOI	https://doi.org/10.1109/HPCSim.2015.7237020
Field	Informatics
Keywords	RMSD; GPU; code optimization; cache
Description	In this paper, we introduce GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared with a multithreaded CPU implementation, we reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is therefore compute-bound. Alongside a conventional implementation using shared memory, we implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5% and 91.6% of the shared-memory performance on Fermi and Maxwell, respectively. We identified several performance pitfalls related to cache blocking in compute-bound codes and suggested optimization techniques to improve performance.
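The description uses dRMSD without defining it. Under the standard definition (assumed here, not quoted from the paper), two conformations A and B of the same N-atom molecule are compared as dRMSD = sqrt( 2 / (N(N-1)) * sum over i<j of (dA_ij - dB_ij)^2 ), where d_ij is the Euclidean distance between atoms i and j. Because only internal distances are compared, the structures need no prior superposition. The minimal C++ sketch below computes this sequentially. Its tiled loop is a CPU analogue of the cache-blocking idea mentioned above. The tile size and all names (`Vec3`, `dist`, `drmsd`, `tile`) are hypothetical, and this is not the authors' GPU kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// One atom position in 3D space (hypothetical type for this sketch).
struct Vec3 { double x, y, z; };

// Euclidean distance between two atoms of the same conformation.
static double dist(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// dRMSD between conformations A and B (same atoms, same order).
// Pairs (i, j) with i < j are visited in square tiles of tile x tile atoms,
// so the coordinates touched by one tile form a small, cache-friendly working
// set. The default tile size is an arbitrary assumption, not from the paper.
double drmsd(const std::vector<Vec3>& A, const std::vector<Vec3>& B,
             std::size_t tile = 64) {
    const std::size_t n = A.size();
    if (B.size() != n) throw std::invalid_argument("conformations differ in size");
    if (n < 2 || tile == 0) return 0.0;
    double sum = 0.0;
    // Only tiles with bj >= bi are visited, so each unordered pair counts once.
    for (std::size_t bi = 0; bi < n; bi += tile) {
        for (std::size_t bj = bi; bj < n; bj += tile) {
            const std::size_t iEnd = std::min(bi + tile, n);
            const std::size_t jEnd = std::min(bj + tile, n);
            for (std::size_t i = bi; i < iEnd; ++i) {
                // On the diagonal tile, start at i + 1; elsewhere bj > i already.
                for (std::size_t j = std::max(bj, i + 1); j < jEnd; ++j) {
                    const double d = dist(A[i], A[j]) - dist(B[i], B[j]);
                    sum += d * d;
                }
            }
        }
    }
    const double pairs = 0.5 * static_cast<double>(n) * static_cast<double>(n - 1);
    return std::sqrt(sum / pairs);
}
```

Each tile performs up to tile² distance comparisons while reading at most 4 × tile atom positions (two atom ranges in each of A and B). This high data reuse illustrates why the description calls dRMSD compute-bound. For example, scaling a three-atom right triangle with unit legs by a factor of 2 changes the three distances by 1, 1 and sqrt(2), which gives dRMSD = sqrt(4/3).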
Related projects	Zaměstnáním nejlepších mladých vědců k rozvoji mezinárodní spolupráce (Employing the best young scientists to develop international cooperation)