1995
SPEED INVARIANT SPEECH RECOGNITION USING VARIABLE VELOCITY DELAY-LINES NEURAL NETWORKS
- Volume: 8
- Issue: 2
- Start page: 167
- End page: 177
- Language: English
- DOI: 10.1016/0893-6080(94)00069-X
- Publisher: PERGAMON-ELSEVIER SCIENCE LTD
A neural network model for speech recognition is proposed, based on neurophysiological findings about the auditory system. The first stage of the system is a feature-extracting module that models the auditory pathway between the cochlea and the auditory cortex. This module extracts constant-frequency (CF), FM-ascending (FM-A), and FM-descending (FM-D) components. The second stage is a recognition module that performs time-distortion-invariant recognition without discarding information about the relative lengths of the features. This module consists of a main block, which produces the recognition results, and two subblocks, which monitor the speed of the input pattern. Each block is a neocognitron-like network whose first layer consists of variable-velocity delay lines. The propagation velocities of the delay lines in the upper and lower subblocks are, respectively, faster and slower than that of the main block. These velocities are controlled so that the duration of each feature on the main block's delay line equals the duration of a similar feature in a training pattern. This velocity control is accomplished by comparing the outputs of the two subblocks. The propagation velocities of the three delay lines are variable, but their ratios are kept constant. The system was simulated on a computer and trained on several Japanese words. After training, it recognized each of the words correctly regardless of the speed at which they were spoken.
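The abstract does not give the control law, so the following Python sketch only illustrates the idea of steering the main delay line's velocity by comparing the outputs of a faster and a slower subblock while keeping the velocity ratios fixed. Everything concrete here is a hypothetical assumption, not the paper's method: the bell-shaped `match_score` standing in for a subblock's output, the multiplicative update in `adapt_velocity`, the parameters `sigma`, `ratio`, and `gain`, and the convention that the training pattern was learned at velocity 1.0, so its feature occupied `template_length` on the line.

```python
import math


def match_score(length: float, template_length: float, sigma: float = 0.5) -> float:
    """Hypothetical subblock output: a bell-shaped response (in log scale) that
    peaks when a feature's spatial length on the delay line equals the length
    it had during training (template_length)."""
    x = math.log(length / template_length)
    return math.exp(-x * x / (2.0 * sigma * sigma))


def adapt_velocity(
    duration: float,
    template_length: float,
    v: float = 1.0,
    ratio: float = 1.2,
    gain: float = 1.0,
    steps: int = 500,
) -> float:
    """Adjust the main block's propagation velocity v until a feature of the
    given temporal duration occupies template_length on the main delay line.

    The upper subblock runs at v * ratio and the lower at v / ratio, so the
    velocity ratios stay constant while v itself changes."""
    for _ in range(steps):
        # Spatial length of the feature on each line = velocity * duration.
        s_up = match_score(v * ratio * duration, template_length)
        s_low = match_score(v / ratio * duration, template_length)
        # Hypothetical control law: if the faster line matches better, speed up;
        # if the slower line matches better, slow down; stop when they balance.
        v *= math.exp(gain * (s_up - s_low))
    return v


if __name__ == "__main__":
    # Assumed training: feature of duration 1.0 learned at velocity 1.0.
    # Input spoken twice as fast: the feature now lasts only 0.5 time units.
    v = adapt_velocity(duration=0.5, template_length=1.0)
    print(f"main velocity {v:.4f}, feature length on main line {v * 0.5:.4f}")
```

Because the assumed score is symmetric in log length, the two subblock outputs balance only when the feature's length on the main delay line equals its training length, which is the condition the abstract describes. With these defaults, the run above settles at a main velocity of about 2.0, so the feature again occupies a length of about 1.0.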
- ID information
- DOI : 10.1016/0893-6080(94)00069-X
- ISSN : 0893-6080
- Web of Science ID : WOS:A1995QN84400001