Paper

Peer-reviewed
2020

Video Caption Dataset for Describing Human Actions in Japanese.

Proceedings of The 12th Language Resources and Evaluation Conference (LREC)
  • Yutaro Shigeto
  • Yuya Yoshikawa
  • Jiaqing Lin
  • Akikazu Takeuchi

Start page
4664
End page
4670
Language
Publication type
Research paper (international conference proceedings)
Publisher
European Language Resources Association

In recent years, automatic video caption generation has attracted
considerable attention. This paper focuses on the generation of Japanese
captions for describing human actions. While most currently available video
caption datasets have been constructed for English, there is no equivalent
Japanese dataset. To address this, we constructed a large-scale Japanese video
caption dataset consisting of 79,822 videos and 399,233 captions. Each caption
in our dataset describes a video in the form of "who does what and where." To
describe human actions, it is important to identify the details of a person,
place, and action. Indeed, when we describe human actions, we usually mention
the scene, person, and action. In our experiments, we evaluated two caption
generation methods to obtain benchmark results. Further, we investigated
whether those generation methods could specify "who does what and where."

Links
DBLP
https://dblp.uni-trier.de/rec/conf/lrec/ShigetoYLT20
arXiv
http://arxiv.org/abs/arXiv:2003.04865
URL
https://www.aclweb.org/anthology/2020.lrec-1.574/
URL
https://dblp.uni-trier.de/conf/lrec/2020
URL
https://dblp.uni-trier.de/db/conf/lrec/lrec2020.html#ShigetoYLT20
IDs
  • DBLP ID : conf/lrec/ShigetoYLT20
  • arXiv ID : arXiv:2003.04865
