Detecting Idiomatic Multi-Word Expressions Using Word Embeddings
DOI:
https://doi.org/10.31449/upinf.63
Keywords:
multi-word expressions, natural language processing, text mining, word embeddings
Abstract
Idioms pose problems for many natural language processing tasks because they are hard for computers to detect. Detecting such expressions and correctly determining their meaning remains an open problem. In recent years, several methods for constructing contextual word embeddings have been proposed; these can distinguish different meanings of the same word based on its context, so they should be well suited to detecting idioms. Current approaches, however, either do not use embeddings or use non-contextual embeddings. We show that contextual word embeddings can be used to differentiate between literal and idiomatic word use. We extract various features (e.g., the contextual vectors and the distance to the mean contextual vector for each word) and show that they can be useful for detecting idiomatic expressions in the GloWbE corpus of English texts.
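As a rough illustration of the "distance to the mean contextual vector" feature mentioned in the abstract, the sketch below computes, for every occurrence of a word, how far its contextual vector lies from the mean vector of all occurrences of that word. This is only a minimal sketch under stated assumptions: the abstract does not say which distance is used, so cosine distance is an assumption here; the function names are hypothetical; and the three-dimensional vectors are hand-made toy values standing in for real contextual embeddings.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric; the abstract does not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_to_mean_features(occurrences):
    """Map (word, contextual_vector) pairs to (word, distance_to_word_mean) pairs.

    The mean is taken over all occurrences of the same word, and the result
    keeps the input order so each distance lines up with its occurrence.
    """
    by_word = defaultdict(list)
    for word, vector in occurrences:
        by_word[word].append(vector)
    means = {word: mean_vector(vectors) for word, vectors in by_word.items()}
    return [(word, cosine_distance(vector, means[word]))
            for word, vector in occurrences]


# Toy data (hypothetical): three similar "literal" contexts of a word and one
# occurrence whose vector points in a very different direction.
toy_occurrences = [
    ("bucket", [1.0, 0.1, 0.0]),
    ("bucket", [0.9, 0.2, 0.0]),
    ("bucket", [1.0, 0.0, 0.1]),
    ("bucket", [0.0, 0.1, 1.0]),
]

features = distance_to_mean_features(toy_occurrences)
for word, distance in features:
    print(f"{word}: {distance:.3f}")
```

In this toy setup the last occurrence ends up farthest from the mean, which is the kind of signal such a feature could give a downstream classifier. With real data, the vectors would come from a contextual embedding model rather than being written by hand.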