Seminar group 02 of the course Laboratory of Electronic and Multimedia Applications

[Mikuláš Bankovič & Dalibor Bačovský] Application of SR for OCR and Hyphenation in Subword LMs 15. 4. 2021

[Mikuláš Bankovič] Application of Super-Resolution in Optical Character Recognition

Abstract

The goal of this presentation is to briefly introduce super-resolution models and show their benefit for optical character recognition. First, I will describe the datasets, OCR engines, and super-resolution models involved, along with their settings in our experiments. Then I will discuss my results and conclusions about super-resolution.

Readings

Single Image Super Resolution: SRCNN and ESPCN
A talk by Mikuláš Bankovič at PV211 on October 22, 2020
Seminar, 15. 4. 2021, 10:00
Lecture by Mikuláš Bankovič from 2021-04-15

Application of super-resolution on OCR of historical documents
A talk by Mikuláš Bankovič at PV173 on April 14, 2021

[Dalibor Bačovský] Hyphenation in Subword Language Models

Abstract

Unsupervised word embeddings are useful for various NLP tasks. The fastText model uses subword n-grams to enhance the word embeddings. The choice of subword n-gram size is usually overlooked, as is the question of why some values might perform better than others. It is also unclear whether there are useful models other than those based on subword n-grams.
In this presentation, we'll show different subword representations based on BPE, word segmentation/hyphenation, and methods based on stemming/lemmatization. We'll also present statistics on why some models might perform better than others.
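
As a rough illustration of the subword n-grams mentioned above, the following minimal sketch extracts character n-grams from a word wrapped in the boundary markers `<` and `>`, as described in the paper listed in the readings below. The function name `char_ngrams` and its parameters are hypothetical and are not taken from the talk or from the fastText code base. The default range of 3 to 6 follows the paper. The paper additionally keeps the whole wrapped word as a special sequence, which this sketch omits.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, for n_min <= n <= n_max.

    The word is first wrapped in '<' and '>' so that prefixes and
    suffixes are distinguishable from word-internal n-grams.
    (Illustrative helper; the name and signature are assumptions.)
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Trigrams of "where", the example used in the paper.
print(char_ngrams("where", 3, 3))
# → ['<wh', 'whe', 'her', 'ere', 're>']
```

Changing `n_min` and `n_max` changes how many subword units each word contributes, and that is the parameter whose effect the abstract says is usually overlooked.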

Readings

Enriching Word Vectors with Subword Information
A 2017 paper by Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomáš Mikolov


2021-04-15-PV174-02-Bacovsky.mp4
Lecture by Dalibor Bačovský from 2021-04-15