👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization

[Dávid Meluš, Šárka Ščavnická]: Intelligent Back Office: Work-in-Progress Thesis Reports (9. 11. 2023)

[Dávid Meluš]: Enhancing Quality of Optical Character Recognition for Financial Document Processing

Abstract

Many easily accessible, general-purpose Optical Character Recognition (OCR) systems with high accuracy are now available. However, certain applications require even higher precision, which calls for fine-tuning these OCR tools to specific domains.
In this talk, we will explore techniques for fine-tuning OCR models to a specific domain when training data is limited. We will also look at how these methods apply in practice to our use case, the processing of invoice documents within our pipeline.
Lastly, we will discuss our current findings and outcomes.

Slides

TBA

Readings

[1] GELETKA, Martin, Mikuláš BANKOVIČ, Dávid MELUŠ, Šárka ŠČAVNICKÁ, Michal ŠTEFÁNIK and Petr SOJKA. Information Extraction from Business Documents. In Aleš Horák, Pavel Rychlý, Adam Rambousek (eds.). Recent Advances in Slavonic Natural Language Processing (RASLAN 2022). Brno: Tribun EU, 2022. pp. 35-46. ISBN 978-80-263-1752-4. https://nlp.fi.muni.cz/raslan/2022/paper18.pdf

[Šárka Ščavnická]: CIVQA - Czech Invoice Visual Question Answering

Abstract

In recent years, Document Intelligence, also called Document AI or Intelligent
Document Understanding, has become increasingly popular across multiple
industries, resulting in a large amount of research in this area. Document AI
transforms how businesses and organizations process, store, and analyze vast
amounts of data. Document Visual Question Answering (DVQA) is a part of
Document AI that seeks to obtain knowledge from a document's visual and textual
elements in order to answer questions. The questions may relate to different
parts of the examined document, not only its text; for example, they may refer
to inserted images, tables, and forms, or to the overall arrangement of the
text. We decided to create the CIVQA datasets and models to make this promising
approach to invoice processing available to other people.

Slides

Readings


[1] DING et al. V-Doc: Visual Questions Answers With Documents. CVPR 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Ding_V-Doc_Visual_Questions_Answers_With_Documents_CVPR_2022_paper.pdf

Recording (both lectures)

Catering

Camembert.