
AUTHOR(S):

Orken Mamyrbayev, Dina Oralbekova, Mohamed Othman, Tolganay Turdalykyzy, Bagashar Zhumazhanov, Kuralai Mukhsina

 

TITLE

Investigation of Insertion-based Speech Recognition Method


ABSTRACT

End-to-end models have entered the field of speech recognition, replacing traditional and hybrid systems. Modern end-to-end models typically generate the output sequence from left to right, applying an autoregressive function during decoding. It has not yet been shown that this decoding order is the best choice for speech-to-text technology. In addition, such models consider only previous information when predicting the next output, so they do not handle cases where the preceding speech was slurred. We therefore apply the insertion method, which generates the output non-autoregressively and in arbitrary order. In this work, a model based on the insertion method and connectionist temporal classification was trained for Kazakh speech recognition. The experiments showed that this model improves the quality of Kazakh speech recognition.
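
As an illustration of the two ideas named in the abstract, the following is a minimal toy sketch, not the authors' model. It assumes a hypothesis is built from (slot, token) insertion operations and that connectionist temporal classification output is post-processed by merging repeated labels and dropping blanks. The function names `apply_insertions` and `ctc_collapse`, the blank symbol `_`, and the example word are assumptions chosen for illustration; no acoustic model or learned insertion policy is involved.

```python
from itertools import groupby


def apply_insertions(ops):
    """Build a token sequence from (slot, token) insertion operations.

    `slot` is the index in the current partial hypothesis at which the
    token is inserted, so tokens may arrive in any order.
    """
    hyp = []
    for slot, token in ops:
        if not 0 <= slot <= len(hyp):
            raise ValueError(f"slot {slot} is outside the hypothesis")
        hyp.insert(slot, token)
    return hyp


def ctc_collapse(frame_labels, blank="_"):
    """Collapse a frame-level CTC path: merge repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


# Left-to-right order: every token is appended at the end.
left_to_right = [(0, "с"), (1, "ә"), (2, "л"), (3, "е"), (4, "м")]

# Arbitrary order: the middle token first, then tokens on either side.
arbitrary = [(0, "л"), (0, "с"), (2, "м"), (1, "ә"), (3, "е")]

# A hypothetical frame-level CTC path for the same word.
ctc_path = ["с", "с", "_", "ә", "л", "л", "_", "е", "м", "м"]

print("".join(apply_insertions(left_to_right)))
print("".join(apply_insertions(arbitrary)))
print("".join(ctc_collapse(ctc_path)))
```

Both insertion orders yield the same final sequence, which is the property that lets an insertion-based decoder choose its generation order freely. In a real system the order and the tokens would be predicted by the trained model rather than listed by hand.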

KEYWORDS

automatic speech recognition, end-to-end, insertion-based, connectionist temporal classification, Transformer

 

Cite this paper

Orken Mamyrbayev, Dina Oralbekova, Mohamed Othman, Tolganay Turdalykyzy, Bagashar Zhumazhanov, Kuralai Mukhsina. (2022) Investigation of Insertion-based Speech Recognition Method. International Journal of Signal Processing, 7, 32-35

 

Copyright © 2022. The author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.