Lectures and oral presentations

International conference
May 2020

Automatic Evaluation of Voice Severity using Deep Neural Network

The Voice Foundation's VIRTUAL VOICE SYMPOSIUM: Care of the Professional Voice
Shunsuke Hidaka, Yogaku Lee, Kohei Wakamiya, Takashi Nakagawa, Tokihiko Kaburagi

Dates
May 27-31, 2020
Language
English
Conference type
Venue
Online

Introduction: Perceptual evaluation of voice quality (e.g., the GRBAS scale or CAPE-V) is widely used in laryngological practice. However, this method suffers from a lack of reproducibility caused by inter- and intra-rater variability, and how to improve the reliability of judgement has long been a topic of discussion among clinicians.

Objective: The purpose of this study was to address this inherent problem of perceptual evaluation by building an automatic evaluation system. By construction, automatic evaluation is reproducible (i.e., reliable). In addition, the system was required to output meaningful judgements (i.e., to be valid).

Methods: We constructed a deep neural network (DNN) that estimated all scores of the GRBAS scale. The DNN was composed of bidirectional GRUs and fully connected layers. As the acoustic feature, we compared the spectrogram and the mel-spectrogram of speech samples of the sustained vowel /a/. The dataset for supervised learning consisted of 3118 samples, and all ground-truth labels were given by an otolaryngologist.

Results: The performance of the system was measured in terms of accuracy and a statistical agreement index, Cohen's linearly weighted kappa. Five-fold cross-validation showed an accuracy of 60% on average. The kappa scores for G, B, A, and S were "moderate," and that for R was "fair." For all GRBAS items, performance was higher when the mel-spectrogram was used.

Conclusions: Our study showed the feasibility of automatic evaluation. To indicate how valid the system's performance is, future studies could investigate inter- and intra-rater variability on our dataset.
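The Methods paragraph names the overall architecture (bidirectional GRUs followed by fully connected layers, operating on mel-spectrogram input) and the Results paragraph names the agreement metric (Cohen's linearly weighted kappa), but the abstract gives no implementation details. The PyTorch sketch below is therefore only a minimal illustration of that kind of pipeline; the layer sizes, temporal pooling, input shape, and labels are assumptions, not the authors' actual model or data.

```python
import torch
import torch.nn as nn
from sklearn.metrics import cohen_kappa_score


class GRBASEstimator(nn.Module):
    """Bidirectional GRU over mel-spectrogram frames, followed by fully
    connected layers predicting an ordinal score (0-3) for each of the five
    GRBAS items. All sizes are illustrative assumptions."""

    def __init__(self, n_mels=80, hidden=128, n_items=5, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(n_mels, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, 128),
            nn.ReLU(),
            nn.Linear(128, n_items * n_classes),
        )
        self.n_items, self.n_classes = n_items, n_classes

    def forward(self, mel):
        # mel: (batch, frames, n_mels); mel-spectrogram extraction from the
        # sustained /a/ recordings is assumed to happen upstream.
        out, _ = self.gru(mel)            # (batch, frames, 2 * hidden)
        pooled = out.mean(dim=1)          # average pooling over time (assumed)
        logits = self.fc(pooled)
        return logits.view(-1, self.n_items, self.n_classes)


# Toy forward pass on 8 random stand-ins for 300-frame mel-spectrograms.
model = GRBASEstimator()
mel = torch.randn(8, 300, 80)
pred = model(mel).argmax(dim=-1)          # (8, 5) predicted GRBAS scores

# Agreement with clinician labels, as in the abstract, can be quantified with
# Cohen's linearly weighted kappa; the labels below are random placeholders.
true_g = torch.randint(0, 4, (8,)).tolist()
print(cohen_kappa_score(true_g, pred[:, 0].tolist(), weights="linear"))
```

In a real experiment the kappa would be computed per GRBAS item over the held-out fold of the five-fold cross-validation rather than on placeholder labels as above.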