Machine learning and natural language processing

1. Course overview, a sample text (pre)processing pipeline

Pa164 lecture 01
PDF for download

A Python notebook supporting the part of the introductory lecture that deals with a sample text (pre)processing pipeline:

Additional readings:


Recommended criteria for selecting publications for your posters and projects:

  • Influential (or at least potentially influential) journal articles or conference papers dealing with an ML topic in NLP are acceptable. The actual topic is up to you.
  • For journals, Q1 ranking in at least one journal field (and at least Q2 in others, if applicable) is a good indicator that corresponding articles may be influential. The following link can be used to determine the quartile ranking of journals: https://www.scimagojr.com/journalrank.php
    • Some examples: Journal of Machine Learning Research, Artificial Intelligence
  • For conferences, rank A is a good indicator that corresponding papers may be influential. The following link can be used to determine the ranking of conferences: http://portal.core.edu.au/conf-ranks/
  • Alternatively, the influence of a publication may be judged by its citation count: if a publication has at least 1000 citations on Google Scholar, it is very likely influential even if it comes from a venue that is not highly ranked (sometimes it does not even have to be peer-reviewed).
  • Last but not least, the publication should come with code and data; otherwise, reproducing its results, which you will need to do for your projects, may be rather tricky.


QUESTIONS AND TASKS:

  • Natural language (pre)processing techniques and their relevance for building machine learning models applicable to text

  • Bag of words representation of text - pros and cons
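
As a starting point for thinking about both topics above, here is a minimal sketch of a preprocessing step followed by a bag-of-words representation, using only the Python standard library. It is not the code from the lecture notebook. The names `STOP_WORDS`, `preprocess`, and `bag_of_words`, the tiny stop-word list, and the two example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "is", "on", "and", "of"}


def preprocess(text):
    """Lowercase the text, keep runs of ASCII letters as tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


docs = ["The dog bites the man.", "The man bites the dog."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['bites', 'dog', 'man']
print(vectors)  # [[1, 1, 1], [1, 1, 1]]
```

The two sentences mean different things but receive identical vectors, which illustrates one drawback of the representation: word order is discarded. On the other hand, the vectors are simple, fixed-length, and directly usable as input to standard machine learning models.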