Paper

Peer-reviewed
December 2019

Skew-Aware Collective Communication for MapReduce Shuffling

IEICE Transactions on Information and Systems
  • Daikoku, Harunobu
  • Kawashima, Hideyuki
  • Tatebe, Osamu

Volume
E102-D
Number
12
Start page
2389
End page
2399
Language
English
Publication type
Research paper (academic journal)
DOI
10.1587/transinf.2019PAP0019
Publisher
IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

This paper proposes and examines three in-memory shuffling methods designed to address problems in MapReduce shuffling caused by skewed data. Coupled Shuffle Architecture (CSA) employs a single pairwise all-to-all exchange to shuffle both blocks, which are the units of shuffle transfer, and meta-blocks, which contain the metadata of the corresponding blocks. Decoupled Shuffle Architecture (DSA) separates the shuffling of meta-blocks from that of blocks and applies a different all-to-all exchange algorithm to each, attempting to mitigate the impact of stragglers in strongly skewed distributions. Decoupled Shuffle Architecture with Skew-Aware Meta-Shuffle (DSA w/ SMS) autonomously determines the proper placement of blocks based on the memory consumption of each worker process. This approach targets extremely skewed situations in which some worker processes could exceed their node's memory limit. This study evaluates implementations of the three shuffling methods in our prototype in-memory MapReduce engine, which employs high-performance interconnects such as InfiniBand and Intel Omni-Path. Our results suggest that DSA w/ SMS is the only viable solution for extremely skewed data distributions. We also present a detailed investigation of the performance of CSA and DSA in various skew situations.
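
The abstract does not spell out how DSA w/ SMS chooses block placements, so the following is only a minimal sketch of the general idea of memory-aware placement, not the authors' algorithm. It assumes that block sizes are known from the meta-blocks before any block is moved, and that a simple greedy rule redirects a block to the least-loaded worker whenever its default destination would exceed a per-worker memory limit. The names `place_blocks`, `meta_blocks`, and `mem_limit` are hypothetical.

```python
from typing import Dict, List, Tuple


def place_blocks(
    meta_blocks: List[Tuple[int, int]],
    num_workers: int,
    mem_limit: int,
) -> Dict[int, List[int]]:
    """Assign block indices to workers without exceeding mem_limit.

    meta_blocks holds one (default_destination, size_in_bytes) pair per
    block. It stands in for the metadata that is shuffled before the
    blocks themselves (an assumption of this sketch, not the paper's format).
    """
    used = [0] * num_workers
    placement: Dict[int, List[int]] = {w: [] for w in range(num_workers)}

    # Visit the largest blocks first so they are less likely to be stranded.
    order = sorted(range(len(meta_blocks)), key=lambda i: -meta_blocks[i][1])

    for idx in order:
        dest, size = meta_blocks[idx]
        if used[dest] + size > mem_limit:
            # Default destination would overflow: try the least-loaded worker.
            dest = min(range(num_workers), key=lambda w: used[w])
            if used[dest] + size > mem_limit:
                raise MemoryError(f"block {idx} ({size} B) fits on no worker")
        used[dest] += size
        placement[dest].append(idx)
    return placement


if __name__ == "__main__":
    # Two blocks hash to worker 0, which can only hold one of them.
    skewed = [(0, 60), (0, 50), (1, 10)]
    print(place_blocks(skewed, num_workers=2, mem_limit=100))
```

In this sketch a block keeps its default destination whenever it fits, so unskewed inputs are placed exactly as a plain hash-partitioned shuffle would place them; only overflowing blocks are redirected.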

Links
DOI
https://doi.org/10.1587/transinf.2019PAP0019
Identifiers
  • DOI : 10.1587/transinf.2019PAP0019
  • ISSN : 1745-1361
