MISC

2011年2月15日

Web Tracking Site Detection Based on Temporal Link Analysis and Automatic Blacklist Generation (特集 分散処理とネットワークサービス) -- (ネットワークセキュリティ)

情報処理学会論文誌
  • Akira Yamada
  • ,
  • Masanori Hara
  • ,
  • Yutaka Miyake

52
2
開始ページ
633
終了ページ
644
記述言語
英語
掲載種別
出版者・発行元
情報処理学会

Web tracking sites or Web bugs are potential but serious threats to users' privacy during Web browsing. Web sites and their associated advertising sites surreptitiously gather the profiles of visitors and possibly abuse or improperly expose them, even if visitors are unaware their profiles are being utilized. In order to prevent such sites in a corporate network, most companies employ filters that rely on blacklists, but these lists are insufficient. In this paper, we propose Web tracking sites detection and blacklist generation based on temporal link analysis. Our proposal analyzes traffic at the network gateway so that it can monitor all tracking sites in the administrative network. The proposed algorithm constructs a graph between sites and their visited time in order to characterize each site. Then, the system classifies suspicious sites using machine learning. We confirm that public black lists contain at most 22-70% of the known tracking sites respectively. The machine learning can identify the blacklisted sites with true positive rate, 62-73%, which is more accurate than any single blacklist. Although the learning algorithm falsely identified 15% of unlisted sites, 96% of these are verified to be unknown tracking sites by means of a manual labeling. These unknown tracking sites can serve as good candidates for an entry of a new backlist.Web tracking sites or Web bugs are potential but serious threats to users' privacy during Web browsing. Web sites and their associated advertising sites surreptitiously gather the profiles of visitors and possibly abuse or improperly expose them, even if visitors are unaware their profiles are being utilized. In order to prevent such sites in a corporate network, most companies employ filters that rely on blacklists, but these lists are insufficient. In this paper, we propose Web tracking sites detection and blacklist generation based on temporal link analysis. Our proposal analyzes traffic at the network gateway so that it can monitor all tracking sites in the administrative network. The proposed algorithm constructs a graph between sites and their visited time in order to characterize each site. Then, the system classifies suspicious sites using machine learning. We confirm that public black lists contain at most 22-70% of the known tracking sites respectively. The machine learning can identify the blacklisted sites with true positive rate, 62-73%, which is more accurate than any single blacklist. Although the learning algorithm falsely identified 15% of unlisted sites, 96% of these are verified to be unknown tracking sites by means of a manual labeling. These unknown tracking sites can serve as good candidates for an entry of a new backlist.

リンク情報
CiNii Articles
http://ci.nii.ac.jp/naid/110008507902
CiNii Books
http://ci.nii.ac.jp/ncid/AN00116647
URL
http://id.ndl.go.jp/bib/024165425
URL
http://id.nii.ac.jp/1001/00072828/
ID情報
  • ISSN : 1882-7764
  • CiNii Articles ID : 110008507902
  • CiNii Books ID : AN00116647

エクスポート
BibTeX RIS