Video Caption Dataset for Describing Human Actions in Japanese.

Proceedings of the 12th Language Resources and Evaluation Conference (LREC)

Yutaro Shigeto
Yuya Yoshikawa
Jiaqing Lin
Akikazu Takeuchi

First page: 4664
Last page: 4670
Language:
Publication type: Research paper (international conference proceedings)
Publisher: European Language Resources Association

In recent years, automatic video caption generation has attracted
considerable attention. This paper focuses on the generation of Japanese
captions for describing human actions. While most currently available video
caption datasets have been constructed for English, there is no equivalent
Japanese dataset. To address this, we constructed a large-scale Japanese video
caption dataset consisting of 79,822 videos and 399,233 captions. Each caption
in our dataset describes a video in the form of "who does what and where." To
describe human actions, it is important to identify the details of a person,
place, and action. Indeed, when we describe human actions, we usually mention
the scene, person, and action. In our experiments, we evaluated two caption
generation methods to obtain benchmark results. Further, we investigated
whether those generation methods could specify "who does what and where."

Links

DBLP: https://dblp.uni-trier.de/rec/conf/lrec/ShigetoYLT20
arXiv: http://arxiv.org/abs/arXiv:2003.04865
URL: https://www.aclweb.org/anthology/2020.lrec-1.574/
URL: https://dblp.uni-trier.de/conf/lrec/2020
URL: https://dblp.uni-trier.de/db/conf/lrec/lrec2020.html#ShigetoYLT20

Identifiers

DBLP ID: conf/lrec/ShigetoYLT20
arXiv ID: arXiv:2003.04865

