Lectures and oral presentations

International conference
May 2020

Automatic Evaluation of Voice Severity using Deep Neural Network

The Voice Foundation's VIRTUAL VOICE SYMPOSIUM: Care of the Professional Voice
Shunsuke Hidaka, Yogaku Lee, Kohei Wakamiya, Takashi Nakagawa, Tokihiko Kaburagi

Dates
May 27-31, 2020
Language
English
Conference type
Venue
Online

Introduction: Perceptual evaluation of voice quality (e.g., the GRBAS scale or CAPE-V) is widely used in laryngological practice. However, this method suffers from a lack of reproducibility caused by inter- and intra-rater variability, and how to improve the reliability of judgement has long been a topic of discussion among clinicians.

Objective: The purpose of this study was to address this inherent problem of perceptual evaluation by building an automatic evaluation system. By construction, automatic evaluation is reproducible (i.e., reliable). In addition, the system was required to output meaningful judgements (i.e., to be valid).

Methods: We constructed a deep neural network (DNN) that estimated all scores of the GRBAS scale. The DNN was composed of bidirectional GRUs and fully connected layers. As the acoustic feature, we compared the spectrogram and the mel-spectrogram of speech samples of the sustained vowel /a/. The dataset for supervised learning consisted of 3118 samples, and all ground-truth labels were given by an otolaryngologist.

Results: The performance of the system was measured in terms of accuracy and a statistical agreement index, Cohen's linearly weighted kappa. Five-fold cross-validation showed an accuracy of 60% on average. The kappa scores for G, B, A, and S were "moderate," and that for R was "fair." For all GRBAS items, performance was higher when the mel-spectrogram was used.

Conclusions: Our study showed the feasibility of automatic evaluation. To indicate how valid the system's performance is, future studies could investigate inter- and intra-rater variability on our dataset.
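The Methods paragraph names the overall architecture (bidirectional GRUs followed by fully connected layers, operating on mel-spectrogram input) and the Results paragraph names the agreement metric (Cohen's linearly weighted kappa), but the abstract gives no implementation details. The PyTorch sketch below is therefore only a minimal illustration of that kind of pipeline; the layer sizes, temporal pooling, input shape, and labels are assumptions, not the authors' actual model or data.

```python
import torch
import torch.nn as nn
from sklearn.metrics import cohen_kappa_score


class GRBASEstimator(nn.Module):
    """Bidirectional GRU over mel-spectrogram frames, followed by fully
    connected layers predicting an ordinal score (0-3) for each of the five
    GRBAS items. All sizes are illustrative assumptions."""

    def __init__(self, n_mels=80, hidden=128, n_items=5, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(n_mels, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, 128),
            nn.ReLU(),
            nn.Linear(128, n_items * n_classes),
        )
        self.n_items, self.n_classes = n_items, n_classes

    def forward(self, mel):
        # mel: (batch, frames, n_mels); mel-spectrogram extraction from the
        # sustained /a/ recordings is assumed to happen upstream.
        out, _ = self.gru(mel)            # (batch, frames, 2 * hidden)
        pooled = out.mean(dim=1)          # average pooling over time (assumed)
        logits = self.fc(pooled)
        return logits.view(-1, self.n_items, self.n_classes)


# Toy forward pass on 8 random stand-ins for 300-frame mel-spectrograms.
model = GRBASEstimator()
mel = torch.randn(8, 300, 80)
pred = model(mel).argmax(dim=-1)          # (8, 5) predicted GRBAS scores

# Agreement with clinician labels, as in the abstract, can be quantified with
# Cohen's linearly weighted kappa; the labels below are random placeholders.
true_g = torch.randint(0, 4, (8,)).tolist()
print(cohen_kappa_score(true_g, pred[:, 0].tolist(), weights="linear"))
```

In a real experiment the kappa would be computed per GRBAS item over the held-out fold of the five-fold cross-validation rather than on placeholder labels as above.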