Papers

Peer-reviewed
2018

Fast convolution Kernels on Pascal GPU with high memory efficiency

Simulation Series
  • Qiong Chang
  • Masaki Onishi
  • Tsutomu Maruyama

Volume
50
Issue
4
Start page
24
End page
35
Publication type
Research paper (international conference proceedings)

© 2018 Society for Modeling & Simulation International (SCS). Convolution is widely used in many fields, especially in CNNs. Because of the rapid growth of training data for CNNs, GPUs have been used to accelerate them, and memory-efficient algorithms have attracted attention for their high performance. In this paper, we propose two convolution kernels, for single-channel and multi-channel convolution respectively. Both methods achieve high performance by efficiently hiding the access latency of the global memory and by achieving a high ratio of floating-point Fused Multiply-Add (FMA) operations per datum fetched from the global memory. Compared to the latest cuDNN library, developed by Nvidia to accelerate deep-learning computation, our kernels are on average 2.6x faster for the single-channel case and 1.4x faster for the multi-channel case.
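For readers unfamiliar with the memory-reuse idea the abstract refers to, the following is a minimal, generic CUDA sketch of a tiled single-channel 2D convolution, not the kernels proposed in the paper. It only illustrates the general principle of staging an input tile (plus halo) in shared memory so that each value fetched from global memory feeds many FMA operations. The kernel name conv2d_tiled, the tile size, the filter radius, and the use of constant memory for the filter are assumptions made for this sketch.

// Generic tiled 2D convolution sketch (single channel, assumed 5x5 filter).
// Each thread block stages a (TILE + 2*RADIUS)^2 input tile in shared memory,
// so every global-memory fetch is reused by up to 25 FMAs.
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 2   // filter radius, i.e. a 5x5 filter (assumption for this sketch)
#define TILE   16  // edge length of the output tile computed by one thread block

__constant__ float d_filter[(2 * RADIUS + 1) * (2 * RADIUS + 1)];

__global__ void conv2d_tiled(const float* in, float* out, int width, int height)
{
    // Shared-memory tile including the halo required by the filter.
    __shared__ float tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    int ox = blockIdx.x * TILE + threadIdx.x;   // output pixel coordinates
    int oy = blockIdx.y * TILE + threadIdx.y;

    // Cooperative load: each thread loads one or more tile elements,
    // clamping out-of-range coordinates to the image border.
    for (int ty = threadIdx.y; ty < TILE + 2 * RADIUS; ty += TILE) {
        for (int tx = threadIdx.x; tx < TILE + 2 * RADIUS; tx += TILE) {
            int ix = (int)(blockIdx.x * TILE) + tx - RADIUS;
            int iy = (int)(blockIdx.y * TILE) + ty - RADIUS;
            if (ix < 0) ix = 0; else if (ix >= width)  ix = width - 1;
            if (iy < 0) iy = 0; else if (iy >= height) iy = height - 1;
            tile[ty][tx] = in[iy * width + ix];
        }
    }
    __syncthreads();

    if (ox >= width || oy >= height) return;

    // Accumulate entirely from shared memory: many FMAs per global fetch.
    float acc = 0.0f;
    for (int fy = 0; fy < 2 * RADIUS + 1; ++fy)
        for (int fx = 0; fx < 2 * RADIUS + 1; ++fx)
            acc = fmaf(tile[threadIdx.y + fy][threadIdx.x + fx],
                       d_filter[fy * (2 * RADIUS + 1) + fx], acc);

    out[oy * width + ox] = acc;
}

int main()
{
    const int W = 256, H = 256;
    const int fsize = (2 * RADIUS + 1) * (2 * RADIUS + 1);
    float h_filter[fsize];
    for (int i = 0; i < fsize; ++i) h_filter[i] = 1.0f / fsize;  // box filter

    float *d_in, *d_out;
    cudaMalloc(&d_in,  W * H * sizeof(float));
    cudaMalloc(&d_out, W * H * sizeof(float));
    cudaMemset(d_in, 0, W * H * sizeof(float));
    cudaMemcpyToSymbol(d_filter, h_filter, fsize * sizeof(float));

    dim3 block(TILE, TILE);
    dim3 grid((W + TILE - 1) / TILE, (H + TILE - 1) / TILE);
    conv2d_tiled<<<grid, block>>>(d_in, d_out, W, H);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

The paper's contribution lies in how aggressively this latency hiding and data reuse is carried out on the Pascal architecture for both single-channel and multi-channel convolution; the sketch above only shows the baseline shared-memory tiling pattern that such optimizations build on.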

Links
Scopus
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85055288126&origin=inward
Scopus Citedby
https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85055288126&origin=inward
IDs
  • ISSN : 0735-9276
  • SCOPUS ID : 85055288126
