Machine learning and natural language processing

1. Course overview, a sample text (pre)processing pipeline

Influential (or at least potentially influential) journal articles or conference papers dealing with an ML topic in NLP are acceptable. The actual topic is up to you.
For journals, Q1 ranking in at least one journal field (and at least Q2 in others, if applicable) is a good indicator that corresponding articles may be influential. The following link can be used to determine the quartile ranking of journals: https://www.scimagojr.com/journalrank.php
- Some examples: Journal of Machine Learning Research, Artificial Intelligence
For conferences, rank A is a good indicator that corresponding papers may be influential. The following link can be used to determine the ranking of conferences: http://portal.core.edu.au/conf-ranks/
- Some examples: AAAI, COLING, ACL, NeurIPS
Alternatively, a publication's influence may be judged by its citation count: if it has at least 1000 citations on Google Scholar, it is very likely influential even if it comes from a venue that is not highly ranked (sometimes it doesn't even have to be peer-reviewed).

Last but not least, the publication should come with code and data; otherwise, achieving the reproducibility you'll need for your projects may be rather tricky.

Natural language (pre)processing techniques and their relevance for building machine learning models applicable to text
Bag of words representation of text - pros and cons
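
To make the two topics above concrete, here is a minimal sketch of a preprocessing step followed by a bag-of-words representation, using only the Python standard library. The function names (preprocess, bag_of_words), the tiny stop-word list and the two example sentences are assumptions made for illustration; they are not taken from the course materials, and a real pipeline would more likely rely on an established NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw documents."""
    tokenized = [preprocess(d) for d in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One count per vocabulary term; absent terms get 0.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

# Two sentences with opposite meanings but identical word counts.
docs = ["The dog bites the man.", "The man bites the dog."]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Both sentences end up with the same vector, which illustrates the main drawback of the representation: word order, and with it part of the meaning, is discarded. On the plus side, the vectors are simple to compute and can be fed directly to most standard machine learning models.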
