MISC

June 30, 2017

Auto-Tuning on NUMA and many-core environments with an FDM code

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
  • Takahiro Katagiri
  • Satoshi Ohshima
  • Masaharu Matsumoto

First page
1399
Last page
1407
Language
English
Publication type
Rapid communication, short report, research note, etc. (scientific journal)
DOI
10.1109/IPDPSW.2017.27
Publisher
Institute of Electrical and Electronics Engineers Inc.

In this paper, we focus on auto-tuning (AT) performance on nonuniform memory access (NUMA) and many-core architectures. Code from the finite difference method (FDM) is selected to evaluate AT performance, and results on the Xeon Phi (Knights Landing, KNL) for four combinations of memory modes (FLAT and CACHE) and cluster modes (QUADRANT and SNC4) yielded the following findings: (1) The KNL memory mode did not affect overall performance, except in FLAT-SNC4. The difference in execution time between the CACHE mode and the FLAT mode was only 0.99%. (2) Hyper-threading (HT) technology worked well, yielding speedups of 1.86x (baseline) and 1.50x (with AT). (3) Varying the hybrid MPI/OpenMP execution was very effective for KNL. The maximum speedup factors were 2.16x in the baseline and 2.91x with AT. (4) AT with code selection remained a powerful tool, even on KNL. AT yielded speedups of up to 1.64x. Moreover, there was room for a further 1.31x speedup by adapting AT to the fastest execution.

Links
DOI
https://doi.org/10.1109/IPDPSW.2017.27
Identifiers
  • DOI : 10.1109/IPDPSW.2017.27
  • ISSN : 2164-7062
  • SCOPUS ID : 85028080944
