Automatic Slovene speech recognition using deep neural networks
DOI:
https://doi.org/10.31449/upinf.53
Keywords:
machine learning, deep neural networks, speech recognition, speech technologies, natural language processing
Abstract
Deep neural networks have recently become the predominant approach to automatic speech recognition, replacing classical acoustic modelling with GMMs and HMMs and n-gram language models. For the recognition of spoken Slovene, we developed and tested several architectures of time-delay neural networks (TDNNs) and long short-term memory (LSTM) networks for both the acoustic and the language model in the Kaldi toolkit. We used a large lexicon of about one million words. TDNNs achieved the best results on continuous speech, with a word error rate (WER) of 27.16%. Preliminary results suggest better performance than Google’s speech-to-text model, but further testing is needed for a statistically valid comparison.
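WER, the metric behind the 27.16% figure, is conventionally defined as (S + D + I) / N. Here S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment of the recognizer's output against a reference transcript of N words. The Python sketch below illustrates that standard definition only. It is not the scoring code used in this work, and the function name `word_error_rate` and the Slovene example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # match or substitution
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match / substitution
            )

    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one deletion ("je") and one substitution
    # ("sončen" -> "sončni") against a 5-word reference.
    print(word_error_rate("danes je lep sončen dan", "danes lep sončni dan"))
```

Running the example prints 0.4, i.e. a WER of 40%. Because insertions are counted, WER can exceed 100%: a one-word reference recognized as three unrelated words scores 3.0.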