Paper

Peer-reviewed
2016

Low Latency and Resource-aware Program Composition for Large-scale Data Analysis

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
  • Masahiro Tanaka
  • Kenjiro Taura
  • Kentaro Torisawa

First page
325
Last page
330
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1109/CCGrid.2016.88
Publisher
IEEE

Large-scale data analysis has recently grown in importance in a wide variety of areas, such as natural language processing, sensor data analysis, and scientific computing. Such an analysis application typically reuses existing programs as components and is often required to continuously process new data with low latency while processing large-scale data on distributed computation nodes. However, existing frameworks for combining programs into a parallel data analysis pipeline (e.g., a workflow) are plagued by the following issues: (1) Most frameworks are oriented toward high-throughput batch processing, which leads to high latency. (2) A specific composition language is often imposed, and/or a specific structure such as a simple unidirectional dataflow among the constituent tasks. (3) A program used as a component often takes a long time to start up because of heavy initialization work, which is referred to as the startup overhead. Our solution to these problems is a remote procedure call (RPC)-based composition, achieved by our middleware Rapid Service Connector (RaSC). RaSC can easily wrap an ordinary program and make it accessible as an RPC service, called a RaSC service. Using component programs as RaSC services enables us to integrate them into one program with low latency, without being restricted to a specific workflow language or dataflow structure. In addition, a RaSC service masks the startup overhead of a component program by keeping the component program's processes alive across RPC requests. We also propose an architecture that automatically manages the number of processes to maximize throughput. Experimental results showed that our approach excels in overall throughput as well as latency, despite its RPC overhead. We also showed that our approach can adapt to runtime changes in throughput requirements.
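The core idea described above, wrapping an existing command-line program as a long-lived RPC service so that its startup cost is paid only once rather than per request, can be illustrated with a minimal sketch. The sketch below is not the actual RaSC API: the wrapped command, port, and method name are illustrative assumptions, and Python's standard xmlrpc and subprocess modules stand in for RaSC's own middleware.

    # Minimal sketch (NOT the RaSC API): expose a line-oriented command-line tool
    # as an RPC service, keeping one worker process alive across RPC requests so
    # its initialization (startup overhead) is paid only once.
    import subprocess
    from xmlrpc.server import SimpleXMLRPCServer

    # Hypothetical component program; "stdbuf -oL" forces line-buffered output so
    # each request line yields one reply line immediately.
    WORKER_CMD = ["stdbuf", "-oL", "tr", "a-z", "A-Z"]

    class PersistentWorker:
        """Keep the wrapped program running and feed it one request per line."""
        def __init__(self, cmd):
            self.proc = subprocess.Popen(
                cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                text=True, bufsize=1)          # line-buffered text pipes

        def call(self, line):
            # One request = one input line; the reply is the next output line.
            self.proc.stdin.write(line.rstrip("\n") + "\n")
            self.proc.stdin.flush()
            return self.proc.stdout.readline().rstrip("\n")

    worker = PersistentWorker(WORKER_CMD)
    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(worker.call, "process")  # component exposed as an RPC method
    server.serve_forever()

A client in another process could then call, for example, xmlrpc.client.ServerProxy("http://localhost:8000").process("hello"). Unlike this single-worker sketch, the architecture proposed in the paper additionally manages a pool of such worker processes and automatically adjusts their number at runtime to maximize throughput.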

Links
DOI
https://doi.org/10.1109/CCGrid.2016.88
DBLP
https://dblp.uni-trier.de/rec/conf/ccgrid/TanakaTT16
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000382529800042&DestApp=WOS_CPL
URL
http://dblp.uni-trier.de/db/conf/ccgrid/ccgrid2016.html#conf/ccgrid/TanakaTT16
ID information
  • DOI : 10.1109/CCGrid.2016.88
  • ISSN : 2376-4414
  • DBLP ID : conf/ccgrid/TanakaTT16
  • Web of Science ID : WOS:000382529800042
