[Marek Petrovič]: One Bit at a Time: Impact of Quantisation on NMT Robustness 17. 3. 2022
Abstract
Quantization is one of the methods for making neural networks faster and smaller, and it has been studied thoroughly for a variety of models. It has already been shown that the BERT model can be quantized for integer-only inference with a 3x speed-up, and several papers evaluate the quantization of Transformer models for NMT. The drawback of quantization is a possible decrease in accuracy. Quantization Aware Training addresses this by preparing the model for quantization, simulating quantization effects already during training; this may also act as a regularizer for NMT models. In our work, we want to explore the available quantization modes and their effect on NMT models' inference speed and memory efficiency, with a special focus on domain robustness (regularization effects).
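To make the quantization noise discussed above concrete, below is a minimal sketch (an illustration, not the method used in the talk) of symmetric per-tensor int8 quantization of a weight matrix. The round-trip error it prints is the same effect that Quantization Aware Training exposes to the model during training via "fake quantization"; the function names and the symmetric scheme are assumptions for illustration only.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a symmetric per-tensor scale."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights; the rounding error is the
    quantization noise a model must tolerate at inference time."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max quantization error:", np.abs(w - w_hat).max())
```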