👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization

[Ondřej Sojka]: Transfer Learning of Slavic Syllabification for Hyphenation Patterns

Abstract

Hyphenation patterns play a vital role in the readability and aesthetics of typeset text, especially for Slavic languages. Current hyphenation systems for many Slavic languages are outdated, sometimes relying on manually created patterns with limited effectiveness. We explore transfer learning of syllabic hyphenation patterns across multiple Slavic languages to develop improved, data-driven hyphenation systems.

By using the International Phonetic Alphabet (IPA) as an intermediary, this research transfers hyphenation patterns between related Slavic languages, creating a unified set of IPA-based rules. These IPA patterns are then used to generate language-specific hyphenation patterns for each target language. The proposed approach aims to develop reliable hyphenation patterns using machine learning methods, improving syllabification across multiple languages.
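
The idea of using IPA as an intermediary can be illustrated with a minimal Python sketch. It is not the speaker's implementation: it assumes Liang-style patterns (digits between letters, odd values allow a break), and the grapheme-to-IPA tables `TOY_UK` and `TOY_CS`, the single pattern `1ma`, and all function names are hypothetical toy values chosen for illustration. The sketch only shows how one shared IPA pattern can mark break points in words of two languages; the resulting hyphenated words could then serve as training data for language-specific patterns.

```python
import re

# Hypothetical toy grapheme-to-IPA tables (illustration only, not real G2P).
TOY_UK = {"м": "m", "а": "a"}
TOY_CS = {"m": "m", "a": "a", "á": "aː"}


def parse_pattern(pattern):
    """Split a Liang-style pattern such as '1ma' into letters and weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight sits before letters[pos]
        else:
            pos += 1
    return letters, weights


def break_points(ipa, patterns):
    """Return indices i where a break is allowed before ipa[i]."""
    text = "." + ipa + "."  # dots mark the word boundaries
    scores = [0] * (len(text) + 1)  # scores[j]: weight before text[j]
    for letters, weights in map(parse_pattern, patterns):
        start = text.find(letters)
        while start != -1:
            for k, w in enumerate(weights):
                scores[start + k] = max(scores[start + k], w)
            start = text.find(letters, start + 1)
    # Only inner positions count; an odd weight allows a break.
    return [i for i in range(1, len(ipa)) if scores[i + 1] % 2 == 1]


def to_ipa(word, table):
    """Transcribe a word and record where each grapheme starts in the IPA."""
    ipa, starts = "", []
    for ch in word:
        starts.append(len(ipa))
        ipa += table[ch]
    return ipa, starts


def hyphenate(word, table, ipa_patterns):
    """Project IPA break points back onto the orthographic word."""
    ipa, starts = to_ipa(word, table)
    allowed = set(break_points(ipa, ipa_patterns))
    cuts = [j for j in range(1, len(word)) if starts[j] in allowed]
    pieces, prev = [], 0
    for j in cuts:
        pieces.append(word[prev:j])
        prev = j
    pieces.append(word[prev:])
    return "-".join(pieces)


print(hyphenate("мама", TOY_UK, ["1ma"]))  # ма-ма
print(hyphenate("máma", TOY_CS, ["1ma"]))  # má-ma
```

Keeping the start index of each grapheme lets a break found in the IPA string be mapped back even when one letter expands to several IPA symbols, as with the toy entry for `á`.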

Although the work is ongoing, early results indicate promising improvements, particularly for Ukrainian. The new patterns are intended to be practical and easy to reproduce, ultimately contributing to better text layout quality for Slavic languages.


Readings


Lecture recordings