Machine learning and natural language processing
1. Course overview, a sample text (pre)processing pipeline
Pa164 lecture 01
A Python notebook supporting the introductory lecture part dealing with a sample text (pre)processing pipeline:
Additional readings:
Recommended criteria for selecting publications for your posters and projects:
- An influential (or at least potentially influential) journal article or conference paper dealing with an ML topic in NLP is acceptable. The actual topic is up to you.
- For journals, Q1 ranking in at least one journal field (and at least Q2 in others, if applicable) is a good indicator that corresponding articles may be influential. The following link can be used to determine the quartile ranking of journals: https://www.scimagojr.com/journalrank.php
- Some examples: Journal of Machine Learning Research, Artificial Intelligence
- For conferences, rank A is a good indicator that corresponding papers may be influential. The following link can be used to determine the ranking of conferences: http://portal.core.edu.au/conf-ranks/
- Alternatively, a publication's influence may be judged by its citation count: if it has at least 1000 citations on Google Scholar, it is very likely influential even if it comes from a venue that is not highly ranked (sometimes it doesn't even have to be peer-reviewed).
- Some examples: Bag of tricks for efficient text classification, Efficient estimation of word representations in vector space (some of the most influential papers related to word embeddings, published on arXiv)
- Last but not least, the publication should come with code and data; otherwise, reproducing its results for your projects may be rather tricky.
QUESTIONS AND TASKS:
Natural language (pre)processing techniques and their relevance for building machine learning models applicable to text
Bag of words representation of text - pros and cons
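As a starting point for the two topics above, here is a minimal sketch of a preprocessing step followed by a bag-of-words representation, using only the Python standard library. It is not the code from the lecture notebook: the function names `preprocess` and `bag_of_words`, the tiny `STOP_WORDS` set, and the two example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "of"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw strings."""
    tokenized = [preprocess(d) for d in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One count per vocabulary word; Counter returns 0 for unseen words.
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


docs = ["The dog bites the man.", "The man bites the dog."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['bites', 'dog', 'man']
print(vectors)  # [[1, 1, 1], [1, 1, 1]]
```

The two sentences mean different things but receive identical vectors, which illustrates one commonly cited drawback of the representation: word order is discarded. On the plus side, the vectors are simple to compute and have a fixed length, so they can be fed directly to standard classifiers.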