Papers

Peer-reviewed
2018

Fast convolution Kernels on Pascal GPU with high memory efficiency

Simulation Series
  • Qiong Chang
  • Masaki Onishi
  • Tsutomu Maruyama

Volume
50
Issue
4
Start page
24
End page
35
Publication type
Research paper (international conference proceedings)

© 2018 Society for Modeling & Simulation International (SCS). Convolution is widely used in many fields, especially in CNNs. Because of the rapid growth of training data for CNNs, GPUs have been used to accelerate them, and memory-efficient algorithms have attracted attention for their high performance. In this paper, we propose two convolution kernels, for single-channel and multi-channel convolution respectively. Both methods achieve high performance by efficiently hiding the access latency of the global memory and by achieving a high ratio of floating-point Fused Multiply-Add (FMA) operations per datum fetched from the global memory. Compared to the latest cuDNN library, developed by Nvidia to accelerate deep-learning computation, our kernels are on average 2.6x faster for the single-channel case and 1.4x faster for the multi-channel case.
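For readers unfamiliar with the memory-reuse idea the abstract refers to, the following is a minimal, generic CUDA sketch of a tiled single-channel 2D convolution, not the kernels proposed in the paper. It only illustrates the general principle of staging an input tile (plus halo) in shared memory so that each value fetched from global memory feeds many FMA operations. The kernel name conv2d_tiled, the tile size, the filter radius, and the use of constant memory for the filter are assumptions made for this sketch.

// Generic tiled 2D convolution sketch (single channel, assumed 5x5 filter).
// Each thread block stages a (TILE + 2*RADIUS)^2 input tile in shared memory,
// so every global-memory fetch is reused by up to 25 FMAs.
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 2   // filter radius, i.e. a 5x5 filter (assumption for this sketch)
#define TILE   16  // edge length of the output tile computed by one thread block

__constant__ float d_filter[(2 * RADIUS + 1) * (2 * RADIUS + 1)];

__global__ void conv2d_tiled(const float* in, float* out, int width, int height)
{
    // Shared-memory tile including the halo required by the filter.
    __shared__ float tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    int ox = blockIdx.x * TILE + threadIdx.x;   // output pixel coordinates
    int oy = blockIdx.y * TILE + threadIdx.y;

    // Cooperative load: each thread loads one or more tile elements,
    // clamping out-of-range coordinates to the image border.
    for (int ty = threadIdx.y; ty < TILE + 2 * RADIUS; ty += TILE) {
        for (int tx = threadIdx.x; tx < TILE + 2 * RADIUS; tx += TILE) {
            int ix = (int)(blockIdx.x * TILE) + tx - RADIUS;
            int iy = (int)(blockIdx.y * TILE) + ty - RADIUS;
            if (ix < 0) ix = 0; else if (ix >= width)  ix = width - 1;
            if (iy < 0) iy = 0; else if (iy >= height) iy = height - 1;
            tile[ty][tx] = in[iy * width + ix];
        }
    }
    __syncthreads();

    if (ox >= width || oy >= height) return;

    // Accumulate entirely from shared memory: many FMAs per global fetch.
    float acc = 0.0f;
    for (int fy = 0; fy < 2 * RADIUS + 1; ++fy)
        for (int fx = 0; fx < 2 * RADIUS + 1; ++fx)
            acc = fmaf(tile[threadIdx.y + fy][threadIdx.x + fx],
                       d_filter[fy * (2 * RADIUS + 1) + fx], acc);

    out[oy * width + ox] = acc;
}

int main()
{
    const int W = 256, H = 256;
    const int fsize = (2 * RADIUS + 1) * (2 * RADIUS + 1);
    float h_filter[fsize];
    for (int i = 0; i < fsize; ++i) h_filter[i] = 1.0f / fsize;  // box filter

    float *d_in, *d_out;
    cudaMalloc(&d_in,  W * H * sizeof(float));
    cudaMalloc(&d_out, W * H * sizeof(float));
    cudaMemset(d_in, 0, W * H * sizeof(float));
    cudaMemcpyToSymbol(d_filter, h_filter, fsize * sizeof(float));

    dim3 block(TILE, TILE);
    dim3 grid((W + TILE - 1) / TILE, (H + TILE - 1) / TILE);
    conv2d_tiled<<<grid, block>>>(d_in, d_out, W, H);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

The paper's contribution lies in how aggressively this latency hiding and data reuse is carried out on the Pascal architecture for both single-channel and multi-channel convolution; the sketch above only shows the baseline shared-memory tiling pattern that such optimizations build on.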

Links
Scopus
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85055288126&origin=inward
Scopus Citedby
https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85055288126&origin=inward
IDs
  • ISSN : 0735-9276
  • SCOPUS ID : 85055288126
