November 2020
A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction.
Findings of the Association for Computational Linguistics: EMNLP 2020
- Start page: 267
- End page: 280
- Language: English
- DOI: 10.18653/v1/2020.findings-emnlp.26
- Publisher: Association for Computational Linguistics
Existing approaches for grammatical error correction (GEC) largely rely on
supervised learning with manually created GEC datasets. However, there has been
little focus on verifying and ensuring the quality of the datasets, and on how
lower-quality data might affect GEC performance. We indeed found that there is
a non-negligible amount of "noise" where errors were inappropriately edited or
left uncorrected. To address this, we designed a self-refinement method where
the key idea is to denoise these datasets by leveraging the prediction
consistency of existing models, and outperformed strong denoising baseline
methods. We further applied task-specific techniques and achieved
state-of-the-art performance on the CoNLL-2014, JFLEG, and BEA-2019 benchmarks.
We then analyzed the effect of the proposed denoising method, and found that
our approach leads to improved coverage of corrections and facilitated fluency
edits which are reflected in higher recall and overall performance.
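The abstract describes the denoising idea only at a high level: keep a training pair when existing models consistently reproduce its reference correction. Below is a minimal, hypothetical sketch of that filtering idea in Python; the function name `denoise`, the `correct(src)` model interface, and the `min_agreement` threshold are all illustrative assumptions, not the paper's actual procedure (see the DOI above for the method itself).

```python
# Hypothetical sketch of consistency-based filtering for GEC training data.
# Assumption: each model is a callable that maps a source sentence to its
# corrected sentence; the paper's real self-refinement method differs in detail.

from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, reference correction)

def denoise(pairs: List[Pair],
            models: List[Callable[[str], str]],
            min_agreement: float = 0.5) -> List[Pair]:
    """Keep a (source, reference) pair only if a sufficient fraction
    of existing models reproduce the reference when correcting the source."""
    kept = []
    for src, ref in pairs:
        votes = sum(1 for correct in models if correct(src) == ref)
        if votes / len(models) >= min_agreement:
            kept.append((src, ref))
    return kept
```

In this toy version, pairs whose references no model agrees with (e.g. uncorrected or inappropriately edited targets, the "noise" the abstract mentions) are dropped rather than relabeled.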
- Links
- DOI
- https://doi.org/10.18653/v1/2020.findings-emnlp.26
- DBLP
- https://dblp.uni-trier.de/rec/conf/emnlp/MitaKKSI20
- arXiv
- http://arxiv.org/abs/arXiv:2010.03155
- URL
- https://www.aclweb.org/anthology/2020.findings-emnlp.26/
- URL
- https://dblp.uni-trier.de/conf/emnlp/2020f
- URL
- https://dblp.uni-trier.de/db/conf/emnlp/emnlp2020f.html#MitaKKSI20
- IDs
- DOI: 10.18653/v1/2020.findings-emnlp.26
- DBLP ID: conf/emnlp/MitaKKSI20
- arXiv ID: arXiv:2010.03155