Automatic Slovene speech recognition using deep neural networks

Authors

  • Matej Ulčar Univerza v Ljubljani, Fakulteta za računalništvo in informatiko
  • Simon Dobrišek Univerza v Ljubljani, Fakulteta za računalništvo in informatiko
  • Marko Robnik-Šikonja Univerza v Ljubljani, Fakulteta za računalništvo in informatiko

DOI:

https://doi.org/10.31449/upinf.53

Keywords:

machine learning, deep neural networks, speech recognition, speech technologies, natural language processing

Abstract

Deep neural networks have recently become the predominant approach to automatic speech recognition, replacing classical acoustic modelling with GMM and HMM models and n-gram language models. For the recognition of spoken Slovene, we developed and tested several architectures of time-delay neural networks and long short-term memory neural networks for both the acoustic and the language model in the Kaldi toolkit. We used a large lexicon containing about a million words. Time-delay neural networks achieved the best results on continuous speech, with a word error rate (WER) of 27.16%. Preliminary results show better performance than Google's speech-to-text model, but more testing is needed for a statistically valid comparison.
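For readers unfamiliar with the WER criterion mentioned above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the article; the authors' evaluation was done in Kaldi, not with this code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not the authors' evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of four reference words is dropped by the recognizer.
example_wer = word_error_rate("danes je lep dan", "danes lep dan")
```

Because insertions are counted against the reference length, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.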

Published

2019-09-27

How to Cite

[1]
Ulčar, M., Dobrišek, S. and Robnik-Šikonja, M. 2019. Automatic Slovene speech recognition using deep neural networks. Applied Informatics. 27, 3 (Sep. 2019). DOI:https://doi.org/10.31449/upinf.53.

Issue

Section

Scientific articles