Continual Learning with Superposition in Transformers
DOI: https://doi.org/10.31449/upinf.183

Keywords: deep learning, continual learning, machine learning, superposition, transformer, text classification

Abstract
In many machine learning applications, such as healthcare and weather forecasting, new data arrives continuously, and practitioners need systems that can keep learning from this new information, especially when not all data can be stored indefinitely. The main challenge in continual learning is the tendency of neural models to forget previously learned information as they are trained on new data. To reduce this forgetting, our continual learning method uses superposition with binary contexts, which require negligible additional memory. We focus on transformer-based neural networks and compare our approach with several prominent continual learning methods on a set of natural language processing classification tasks. On average, our method achieves the best results, with gains of 4.6% in AUROC (area under the receiver operating characteristic curve) and 3.0% in AUPRC (area under the precision-recall curve).
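To make the idea of superposition with binary contexts concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) applied to a single linear layer: a shared weight matrix is viewed through a fixed, random ±1 context per task, so each task's view costs only one sign bit per weight. In the paper, the idea is applied to transformer-based networks rather than a standalone linear map.

```python
# Minimal sketch of superposition with binary contexts (illustrative only):
# each task multiplies the shared weight matrix element-wise by a fixed
# random +/-1 context, giving every task its own "view" of the weights
# while the contexts add only negligible memory.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, n_tasks = 8, 4, 3
W = rng.normal(scale=0.1, size=(d_out, d_in))                     # shared (trainable) weights
contexts = rng.choice([-1.0, 1.0], size=(n_tasks, d_out, d_in))   # fixed binary context per task

def forward(x, task_id):
    """Forward pass through the superposed linear layer for a given task."""
    W_task = W * contexts[task_id]   # task-specific view of the shared weights
    return W_task @ x

x = rng.normal(size=d_in)
print(forward(x, task_id=0))
print(forward(x, task_id=1))         # same weights, different binary context
```

Because the contexts are (approximately) uncorrelated, gradient updates made through one task's context interfere only weakly with the other tasks' views, which is what reduces forgetting in this family of methods.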