2011年2月15日
Web Tracking Site Detection Based on Temporal Link Analysis and Automatic Blacklist Generation (特集 分散処理とネットワークサービス) -- (ネットワークセキュリティ)
情報処理学会論文誌
- ,
- ,
- 巻
- 52
- 号
- 2
- 開始ページ
- 633
- 終了ページ
- 644
- 記述言語
- 英語
- 掲載種別
- 出版者・発行元
- 情報処理学会
Web tracking sites or Web bugs are potential but serious threats to users' privacy during Web browsing. Web sites and their associated advertising sites surreptitiously gather the profiles of visitors and possibly abuse or improperly expose them, even if visitors are unaware their profiles are being utilized. In order to prevent such sites in a corporate network, most companies employ filters that rely on blacklists, but these lists are insufficient. In this paper, we propose Web tracking sites detection and blacklist generation based on temporal link analysis. Our proposal analyzes traffic at the network gateway so that it can monitor all tracking sites in the administrative network. The proposed algorithm constructs a graph between sites and their visited time in order to characterize each site. Then, the system classifies suspicious sites using machine learning. We confirm that public black lists contain at most 22-70% of the known tracking sites respectively. The machine learning can identify the blacklisted sites with true positive rate, 62-73%, which is more accurate than any single blacklist. Although the learning algorithm falsely identified 15% of unlisted sites, 96% of these are verified to be unknown tracking sites by means of a manual labeling. These unknown tracking sites can serve as good candidates for an entry of a new backlist.Web tracking sites or Web bugs are potential but serious threats to users' privacy during Web browsing. Web sites and their associated advertising sites surreptitiously gather the profiles of visitors and possibly abuse or improperly expose them, even if visitors are unaware their profiles are being utilized. In order to prevent such sites in a corporate network, most companies employ filters that rely on blacklists, but these lists are insufficient. In this paper, we propose Web tracking sites detection and blacklist generation based on temporal link analysis. Our proposal analyzes traffic at the network gateway so that it can monitor all tracking sites in the administrative network. The proposed algorithm constructs a graph between sites and their visited time in order to characterize each site. Then, the system classifies suspicious sites using machine learning. We confirm that public black lists contain at most 22-70% of the known tracking sites respectively. The machine learning can identify the blacklisted sites with true positive rate, 62-73%, which is more accurate than any single blacklist. Although the learning algorithm falsely identified 15% of unlisted sites, 96% of these are verified to be unknown tracking sites by means of a manual labeling. These unknown tracking sites can serve as good candidates for an entry of a new backlist.
- リンク情報
- ID情報
-
- ISSN : 1882-7764
- CiNii Articles ID : 110008507902
- CiNii Books ID : AN00116647