Laboratory of Electronic and Multimedia Applications (Research Section)
[Martin Geletka]: Visual Document Understanding 14. 4. 2022
Abstract
We will present the individual tasks in the area of Visual Document Understanding. We will illustrate these theoretical tasks with the practical needs of an Intelligent Back Office. We will also describe interesting approaches that combine information from images and text.
Presentation
Visual Document Understanding
Slides presented at the seminar on April 14, 2022
2022-04-14-geletka.mp4
Recording of Martin Geletka's lecture, 14. 4. 2022
Readings
- Xu, Y. (2020). LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding (https://arxiv.org/abs/2012.14740)
- Kim, G. (2022). Donut: Document Understanding Transformer without OCR (https://arxiv.org/abs/2111.15664)
- Li, M. (2021). TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (https://arxiv.org/abs/2109.10282)