Seminar group 02 of the course Laboratory of Electronic and Multimedia Applications

[Mikuláš Bankovič & Dalibor Bačovský] Application of SR for OCR and Hyphenation in Subword LMs 15. 4. 2021

[Mikuláš Bankovič] Application of Super-Resolution in Optical Character Recognition


The goal of this presentation is to briefly introduce super-resolution models and show their benefit for optical character recognition. First, I will describe the datasets, OCR engines, and super-resolution models involved, along with their settings in our experiments. Then I will discuss my results and conclusions about super-resolution.


Single Image Super Resolution: SRCNN and ESPCN
A talk by Mikuláš Bankovič at PV211 on October 22, 2020
Seminar, 15. 4. 2021, 10:00
Lecture by Mikuláš Bankovič from 2021-04-15

Application of super-resolution on OCR of historical documents
A talk by Mikuláš Bankovič at PV173 on April 14, 2021

[Dalibor Bačovský] Hyphenation in Subword Language Models


Unsupervised word embeddings are useful for various NLP tasks. The fastText model uses subword n-grams to enhance the word embeddings. The choice of subword n-gram size is usually overlooked, as is the question of why some values might perform better than others, and whether there are useful models other than those based on subword n-grams.
In this presentation, we'll show different subword representations based on BPE, word segmentation/hyphenation, and methods based on stemming/lemmatization. We'll also present statistics on why some models might perform better than others.
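
As a rough illustration of the subword n-grams mentioned above, the following minimal sketch extracts character n-grams from a word wrapped in the boundary markers `<` and `>`, as described in the paper listed below. The function name `char_ngrams` and its parameters are hypothetical and chosen only for this example; it is not the fastText implementation and it omits the hashing of n-grams into buckets.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word` after wrapping it in < and >.

    The default range 3..6 follows the sizes reported in the paper;
    the function itself is only an illustrative sketch.
    """
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Trigrams of "where": ['<wh', 'whe', 'her', 'ere', 're>']
print(char_ngrams("where", 3, 3))
```

Changing `n_min` and `n_max` changes which subword units a word shares with other words, which is the n-gram size question the talk addresses.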


Enriching Word Vectors with Subword Information
A 2017 paper by Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomáš Mikolov

Lecture by Dalibor Bačovský from 2021-04-15