Machine learning and natural language processing
1. Course overview. Overview of text pre-processing
Evaluation
20 p. poster
50 p. project
30 p. final exam (obligatory, must obtain at least 15 p.)
Grading scale: <50 F, <60 E, <70 D, <80 C, <90 B, >=90 A; course credit (zápočet): >= 45 p.
A Python notebook supporting the lecture part on typical NLP pipelines:
Additional readings:
Guidelines for publications to be selected as your poster topics:
- Any influential (or at least potentially influential) journal article or conference paper dealing with an ML topic in NLP is acceptable. The actual topic is up to you.
- For journals, Q1 ranking in at least one journal field (and at least Q2 in others, if applicable) is a good indicator that corresponding journal articles may be influential. The following link can be used to determine the quartile ranking of journals: https://www.scimagojr.com/journalrank.php
- Some examples: Journal of Machine Learning Research, Artificial Intelligence
- For conferences, rank A is a good indicator that the corresponding conference papers may be influential. The following link can be used to determine the ranking of conferences: http://portal.core.edu.au/conf-ranks/
- Some examples: AAAI, COLING, ACL, NeurIPS
- Alternatively, the influence of a publication may be judged by its citation count: if a publication has at least 1000 citations on Google Scholar, it is very likely influential even if it comes from a venue that is not highly ranked (sometimes it does not even have to be peer-reviewed).
- Some examples: "Bag of tricks for efficient text classification", "Efficient estimation of word representations in vector space" (some of the most influential papers related to word embeddings, published on arXiv)
QUESTIONS AND TASKS:
Natural language (pre)processing techniques and their relevance for building machine learning models applicable to text
Bag of words representation of text - pros and cons
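As a starting point for the two topics above, here is a minimal sketch of a pre-processing step followed by a bag-of-words representation. It is not the course notebook's implementation: the function names (preprocess, bag_of_words, to_vector), the tiny stop-word list, and the two example sentences are all assumptions made up for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def preprocess(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(text):
    """Count how often each remaining token occurs; word order is discarded."""
    return Counter(preprocess(text))


def to_vector(bag, vocabulary):
    """Turn a bag into a fixed-length count vector over a shared vocabulary."""
    return [bag[term] for term in vocabulary]


# Two made-up sentences with opposite meanings but the same words.
docs = ["The dog bites the man.", "The man bites the dog."]
bags = [bag_of_words(d) for d in docs]

# Shared vocabulary: every distinct token seen in any document, sorted.
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]

print(vocabulary)
print(vectors)
```

On the plus side, the representation is simple and yields fixed-length numeric vectors that standard machine learning models can consume. On the minus side, both example sentences map to the same vector, because word order (and with it much of the meaning) is lost.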