Paper

Peer-reviewed
2016

Low Latency and Resource-aware Program Composition for Large-scale Data Analysis

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
  • Masahiro Tanaka
  • Kenjiro Taura
  • Kentaro Torisawa

First page
325
Last page
330
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1109/CCGrid.2016.88
Publisher
IEEE

Large-scale data analysis has recently grown in importance in a wide variety of areas, such as natural language processing, sensor data analysis, and scientific computing. Such an analysis application typically reuses existing programs as components and is often required to continuously process new data with low latency while processing large-scale data on distributed computation nodes. However, existing frameworks for combining programs into a parallel data analysis pipeline (e.g., a workflow) are plagued by the following issues: (1) Most frameworks are oriented toward high-throughput batch processing, which leads to high latency. (2) A specific composition language is often imposed, and/or a specific structure such as a simple unidirectional dataflow among the constituent tasks. (3) A program used as a component often takes a long time to start up because of heavy initialization work, which is referred to as the startup overhead. Our solution to these problems is a remote procedure call (RPC)-based composition, achieved by our middleware Rapid Service Connector (RaSC). RaSC can easily wrap an ordinary program and make it accessible as an RPC service, called a RaSC service. Using component programs as RaSC services enables us to integrate them into one program with low latency, without being restricted to a specific workflow language or dataflow structure. In addition, a RaSC service masks the startup overhead of a component program by keeping the component program's processes alive across RPC requests. We also propose an architecture that automatically manages the number of processes to maximize throughput. Experimental results showed that our approach excels in overall throughput as well as latency, despite its RPC overhead. We also showed that our approach can adapt to runtime changes in throughput requirements.
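The core idea described above, wrapping an existing command-line program as a long-lived RPC service so that its startup cost is paid only once rather than per request, can be illustrated with a minimal sketch. The sketch below is not the actual RaSC API: the wrapped command, port, and method name are illustrative assumptions, and Python's standard xmlrpc and subprocess modules stand in for RaSC's own middleware.

    # Minimal sketch (NOT the RaSC API): expose a line-oriented command-line tool
    # as an RPC service, keeping one worker process alive across RPC requests so
    # its initialization (startup overhead) is paid only once.
    import subprocess
    from xmlrpc.server import SimpleXMLRPCServer

    # Hypothetical component program; "stdbuf -oL" forces line-buffered output so
    # each request line yields one reply line immediately.
    WORKER_CMD = ["stdbuf", "-oL", "tr", "a-z", "A-Z"]

    class PersistentWorker:
        """Keep the wrapped program running and feed it one request per line."""
        def __init__(self, cmd):
            self.proc = subprocess.Popen(
                cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                text=True, bufsize=1)          # line-buffered text pipes

        def call(self, line):
            # One request = one input line; the reply is the next output line.
            self.proc.stdin.write(line.rstrip("\n") + "\n")
            self.proc.stdin.flush()
            return self.proc.stdout.readline().rstrip("\n")

    worker = PersistentWorker(WORKER_CMD)
    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(worker.call, "process")  # component exposed as an RPC method
    server.serve_forever()

A client in another process could then call, for example, xmlrpc.client.ServerProxy("http://localhost:8000").process("hello"). Unlike this single-worker sketch, the architecture proposed in the paper additionally manages a pool of such worker processes and automatically adjusts their number at runtime to maximize throughput.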

Links
DOI
https://doi.org/10.1109/CCGrid.2016.88
DBLP
https://dblp.uni-trier.de/rec/conf/ccgrid/TanakaTT16
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000382529800042&DestApp=WOS_CPL
URL
http://dblp.uni-trier.de/db/conf/ccgrid/ccgrid2016.html#conf/ccgrid/TanakaTT16
ID information
  • DOI : 10.1109/CCGrid.2016.88
  • ISSN : 2376-4414
  • DBLP ID : conf/ccgrid/TanakaTT16
  • Web of Science ID : WOS:000382529800042
