Seminar on Machine Learning, Information Retrieval, and Scientific Visualization

[Šárka Ščavnická]: Multimodal Question Answering 13. 4. 2023

Abstract

Document question answering aims to provide users with accurate and efficient answers to their queries, thereby improving access to relevant information. The task has become increasingly important in recent years because of the vast amount of digital information available on the web.

In this talk, we present the first document question-answering model trained on Czech invoices and discuss different ways to ensure that the model can answer previously unseen questions, such as searching the text for new entity types it has not been trained on.

Slides

Presentation recordings

Readings

[1] Xu, Yiheng, et al. “LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding.” arXiv preprint arXiv:2104.08836 (2021).