November 1, 2018
Porting DDalphaAMG solver to K computer
- Language
- Publication type
- Institutional technical report, technical report, preprint, etc.
We port the Domain-Decomposed-alpha-AMG (DDalphaAMG) solver to the K computer.
Each node of the system has 8 cores and 16 GB of memory, with a theoretical peak
of 128 GFlops (82,944 nodes in total). Its features, as many as 256 registers per
core and a byte/Flop ratio as large as 0.5, require tuning different from that
used on other machines. To use more registers, we change some of the data
structures and rewrite matrix-vector operations with intrinsics. The performance
improves by more than a factor of two for twelve solves including the setup. The
efficiency is still about 5% after the optimization, lower than the 22% of a
previously tuned mixed-precision solver for the K computer. The throughput is,
however, more than two times better for a physical point configuration.
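
The hardware figures quoted in the abstract can be turned into per-core and per-node numbers with simple arithmetic. The sketch below is illustrative only: it assumes that "efficiency" means sustained performance divided by the theoretical peak of a node, and that the byte/Flop ratio refers to memory bandwidth relative to peak floating-point rate. The abstract states neither definition explicitly, and the names used here are hypothetical.

```python
# Figures quoted in the abstract.
CORES_PER_NODE = 8
NODE_PEAK_GFLOPS = 128.0
NODES = 82_944
BYTES_PER_FLOP = 0.5


def sustained_gflops(efficiency, peak=NODE_PEAK_GFLOPS):
    """Sustained rate per node.

    ASSUMPTION: efficiency = sustained rate / theoretical peak.
    """
    return efficiency * peak


# Derived quantities (not stated in the abstract, only implied by it).
core_peak_gflops = NODE_PEAK_GFLOPS / CORES_PER_NODE      # per-core peak
node_bandwidth_gbs = NODE_PEAK_GFLOPS * BYTES_PER_FLOP    # implied GB/s per node
system_peak_pflops = NODE_PEAK_GFLOPS * NODES / 1e6       # whole machine

ddalphaamg_gflops = sustained_gflops(0.05)       # about 5% after optimization
mixed_precision_gflops = sustained_gflops(0.22)  # previously tuned solver

if __name__ == "__main__":
    print(f"peak per core:          {core_peak_gflops:.1f} GFlops")
    print(f"implied node bandwidth: {node_bandwidth_gbs:.1f} GB/s")
    print(f"system peak:            {system_peak_pflops:.2f} PFlops")
    print(f"DDalphaAMG sustained:   {ddalphaamg_gflops:.1f} GFlops/node")
    print(f"mixed-precision solver: {mixed_precision_gflops:.2f} GFlops/node")
```

Under these assumptions the multigrid solver sustains a lower floating-point rate per node than the mixed-precision solver, which is compatible with the better throughput reported in the abstract only if it needs correspondingly fewer operations per solve.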
- ID information
- arXiv ID : arXiv:1811.00355