Approaches to Inverse Text Normalization: A Review
DOI:
https://doi.org/10.31449/upinf.249
Keywords:
automatic speech recognition, inverse text normalization, review, text denormalization
Abstract
Modern automatic speech recognition systems effectively convert spoken language into written text. However, they often produce only a raw transcript without properly formatted numbers, dates, and time expressions, which reduces readability and usability. Denormalization, also known as inverse text normalization, addresses these issues by transforming the transcript into a standardized written form. This article provides a systematic review and analysis of the main approaches to denormalization, which fall into three categories: rule-based, neural, and hybrid. Rule-based approaches typically rely on finite-state machines, while neural approaches utilize neural networks; hybrid approaches combine elements of both. Rule-based approaches achieve high accuracy but tend to overlook the context of the text, whereas neural approaches take context into account but require large amounts of training data. Hybrid approaches offer a balanced solution that harnesses the strengths of both. This work contributes to understanding the challenges of denormalization and to improving the efficiency of denormalization systems.
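To make the rule-based category concrete, the sketch below rewrites simple spoken-form numbers as digits. It is illustrative only and not taken from the reviewed article: production rule-based systems are typically implemented as (weighted) finite-state transducers, whereas this sketch uses a plain hand-written token rewriter, and all function names and mapping tables are hypothetical.

# Minimal illustrative sketch of rule-based inverse text normalization.
# Hypothetical example code; production rule-based systems typically use
# weighted finite-state transducers rather than this hand-written rewriter.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
    "fourteen": 14, "fifteen": 15, "sixteen": 16,
    "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

def denormalize_numbers(text: str) -> str:
    """Rewrite simple spoken numbers (0-99) as digits, e.g.
    'twenty three' -> '23'. Context is deliberately ignored,
    which is the characteristic weakness of pure rules."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit word: 'twenty' + 'three' -> 23.
            if i + 1 < len(tokens) and UNITS.get(tokens[i + 1].lower(), 10) < 10:
                value += UNITS[tokens[i + 1].lower()]
                i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(tokens[i])
        i += 1
    return " ".join(out)

print(denormalize_numbers("the meeting starts at twenty three past ten"))
# -> 'the meeting starts at 23 past 10'
print(denormalize_numbers("one of them was born in nineteen ninety"))
# -> '1 of them was born in 19 90'

The second example shows the context problem the abstract describes: "one of them" becomes "1 of them", and "nineteen ninety" becomes "19 90" rather than 1990, because the rules fire on surface tokens without regard to the surrounding context.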
Published
2025-06-17
How to Cite
Vezočnik, M. and Bajec, M. 2025. Approaches to Inverse Text Normalization: A Review. Applied Informatics. 33, 1 (Jun. 2025). DOI: https://doi.org/10.31449/upinf.249.
Section
Review scientific articles