Machine learning and natural language processing

1. Course overview. Overview of text pre-processing

Evaluation

  • 20 p. poster

  • 50 p. project

  • 30 p. final exam (obligatory, must obtain at least 15 p.)

Grading: <50 F, <60 E, <70 D, <80 C, <90 B, >=90 A; course credit (zápočet): >= 45 p.


A Python notebook supporting the lecture part on typical NLP pipelines:

Additional readings:


Guidelines for publications to be selected as your poster topics:

  • Any influential (or at least potentially influential) journal article or conference paper dealing with an ML topic in NLP is acceptable. The actual topic is up to you.
  • For journals, Q1 ranking in at least one journal field (and at least Q2 in others, if applicable) is a good indicator that corresponding journal articles may be influential. The following link can be used to determine the quartile ranking of journals: https://www.scimagojr.com/journalrank.php
    • Some examples: Journal of Machine Learning Research, Artificial Intelligence
  • For conferences, rank A is a good indicator that the corresponding conference papers may be influential. The following link can be used to determine the ranking of conferences: http://portal.core.edu.au/conf-ranks/
    • Some examples: AAAI, COLING, ACL, NeurIPS
  • Alternatively, a publication's influence may be judged by its citation count: a publication with at least 1000 citations on Google Scholar is very likely influential even if its venue is not highly ranked (sometimes it does not even have to be peer-reviewed).


QUESTIONS AND TASKS:

  • Natural language (pre)processing techniques and their relevance for building machine learning models applicable to text

  • Bag-of-words representation of text: pros and cons
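
As a starting point for the two topics above, here is a minimal sketch of a pre-processing step followed by a bag-of-words representation, using only the Python standard library. The function names (preprocess, bag_of_words), the tiny stop-word list, and the two example sentences are assumptions made for illustration; they are not taken from the course notebook.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "on", "and"}


def preprocess(text):
    """Lowercase, tokenize into alphabetic runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw documents."""
    tokenized = [preprocess(d) for d in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # One count per vocabulary term; absent terms count as 0.
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = ["The dog bit the man.", "The man bit the dog."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['bit', 'dog', 'man']
print(vectors)  # [[1, 1, 1], [1, 1, 1]]
```

The two sentences mean different things but receive identical vectors, which illustrates one commonly cited drawback of the representation: word order is discarded. On the other side, the vectors are simple to compute and have a fixed length, so they can be passed directly to standard machine learning models.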