Natural Language Processing: Summary
Pavel Rychlý, 11 Dec 2023

Problems with NLP
- Zipf's law
- ambiguity
- variability

Approaches
- symbolic (rule-based): no data available
- statistical
- neural (deep learning): huge data available

Statistical NLP
- counts
- keywords
- collocations, multi-word units
- language modeling

Language Modeling
- probability of sentences, chain rule
- n-grams, Markov assumption: p(W) = ∏_i p(w_i | w_{i−2}, w_{i−1})
- maximum-likelihood estimation gives zero probabilities
- smoothing
- evaluation using cross entropy, perplexity
(a bigram sketch with add-one smoothing appears after the recipe slide below)

Text Classification
- applications
- Naive Bayes classifier (a sketch appears after the recipe slide below)
- evaluation: precision, recall, accuracy

Continuous Space Representation
- words as vectors, word embeddings
- methods of learning vectors
- evaluation of word embeddings
- optional homework: Stability of word embeddings

HW: Stability of word embeddings
Choose one or more methods for creating word embeddings (word2vec, FastText, GloVe, ...), run the training on the same data with different parameters (and/or numbers of epochs), and evaluate the stability of the results. Stability can be computed in several ways:
1. How many pairwise similarities stay the same. This can be computed on the whole vocabulary or on a sample (for example: 10 words each with frequencies from [100, 400, 1600, 6400, 25600, ...]).
2. Percentage of changes in analogy tasks. The same overall score does not imply the same successful analogy items; count how many items switch between successful and unsuccessful.
3. Percentage of changes in the Outlier Detection task.
(a nearest-neighbour stability sketch appears after the recipe slide below)

Neural Networks
- structure of NN
- matrix representation
- activation functions
- NN training: stochastic gradient descent, backpropagation
- sub-word tokenization
- optional homework: subword coverage

Recurrent NN
- language modeling using NN
- training RNN
- problems in training RNN
- LSTM
- bidirectional, multi-layer RNN

Simple NLP using NN
- Named Entity Recognition (NER)
- language modeling
- training
- evaluation
- optional homework: NN for adding accents

Machine translation
- sequence-to-sequence RNN
- attention
- decoding, beam search
- MT evaluation: BLEU, ChrF++

Transformers
- encoder, decoder
- positional encoding
- attention
- structure
(a scaled dot-product attention sketch appears after the recipe slide below)

Pretrained models
- encoder only, decoder only, encoder-decoder
- training objectives
- BERT, GPT, T5

Question Answering
- QA types
- usage
- reading comprehension
- applying NN for QA

Recipe for Training NN
NN training fails silently.
1. Become one with the data
2. Set up the end-to-end training/evaluation skeleton + get dumb baselines
3. Overfit
4. Regularize
5. Tune
6. Squeeze out the juice
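The sketches below illustrate a few of the techniques summarized above; all function names, toy corpora, and parameters are illustrative placeholders, not course material. First, for the Language Modeling slide: a minimal bigram model (the first-order version of the Markov assumption given above) with add-one smoothing, evaluated by perplexity.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams for a bigram language model."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed p(w | w_prev); avoids the zero
    probabilities that pure maximum-likelihood estimation assigns
    to unseen bigrams."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(tokens, unigrams, bigrams):
    """Perplexity = 2 ** cross-entropy (bits per token)."""
    vocab_size = len(unigrams)
    log_prob = sum(
        math.log2(bigram_prob(w_prev, w, unigrams, bigrams, vocab_size))
        for w_prev, w in zip(tokens, tokens[1:]))
    return 2 ** (-log_prob / (len(tokens) - 1))

train = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the rug".split()
uni, bi = train_bigram(train)
print(perplexity(test, uni, bi))
```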
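For the Text Classification slide, a minimal multinomial Naive Bayes classifier with add-one smoothing; the toy documents and labels are made up.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect class priors and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
        vocab.update(doc)
    return class_counts, word_counts, vocab

def predict_nb(doc, class_counts, word_counts, vocab):
    """Return argmax_c [log p(c) + sum_w log p(w | c)]."""
    n_docs = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c, n_c in class_counts.items():
        total = sum(word_counts[c].values())
        lp = math.log(n_c / n_docs)
        for w in doc:
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = [["good", "great"], ["bad", "awful"], ["good", "film"]]
labels = ["pos", "neg", "pos"]
print(predict_nb(["great", "film"], *train_nb(docs, labels)))  # -> pos
```

Precision, recall, and accuracy from the slide would then be computed from such predictions on held-out data.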
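For the embedding-stability homework, one possible way to operationalize criterion 1: compare the top-k cosine nearest-neighbour sets of sampled words across two training runs. This assumes both runs share an identically ordered vocabulary; the random matrices below merely stand in for real word2vec/FastText/GloVe outputs.

```python
import numpy as np

def nearest_neighbors(emb, word_ids, k=10):
    """Top-k cosine neighbours (as index sets) for each query word."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = norm[word_ids] @ norm.T           # cosine similarity to all words
    order = np.argsort(-sims, axis=1)        # descending similarity
    return [set(row[row != w][:k]) for row, w in zip(order, word_ids)]

def stability(emb_a, emb_b, word_ids, k=10):
    """Mean overlap (0..1) of k-NN sets between two training runs."""
    nn_a = nearest_neighbors(emb_a, word_ids, k)
    nn_b = nearest_neighbors(emb_b, word_ids, k)
    return float(np.mean([len(a & b) / k for a, b in zip(nn_a, nn_b)]))

# placeholder data: two "runs" that differ by small noise
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1000, 50))
emb_b = emb_a + rng.normal(scale=0.01, size=emb_a.shape)
sample = rng.choice(1000, size=10, replace=False)
print(stability(emb_a, emb_b, sample))
```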
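For the Transformers slide, the standard scaled dot-product attention, Attention(Q, K, V) = softmax(Q·Kᵀ/√d_k)·V, in plain NumPy; the shapes are toy values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, one head, no batching."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries, dimension 8
K = rng.normal(size=(4, 8))   # 4 keys
V = rng.normal(size=(4, 8))   # 4 values
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

In a decoder, a causal mask (adding -inf to future positions of the scores before the softmax) prevents attending to later tokens.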
Where to start
- Hugging Face
  - models: code + pre-trained weights, ready to use
  - datasets: sometimes with evaluation
- transformers library: very complex; 3 implementations: Jax, PyTorch, TensorFlow
(a minimal pipeline example appears at the end of this summary)

Pre-trained models
- OpenLLM
- llama.cpp: runs the LLaMA model using 4-bit integer quantization on a MacBook
- optimizations: float16, bfloat16, quantization

Training from scratch
- nanoGPT: easy to read, minimal dependencies
- nanoT5: trains T5 on a 1×A100 GPU in less than 24 hours
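As a starting point for the Hugging Face ecosystem from the "Where to start" slide, a minimal example of the transformers pipeline API. The checkpoint name is one public sentiment model, not something prescribed by the course; any task-appropriate checkpoint works.

```python
from transformers import pipeline

# downloads the model on first use; runs locally afterwards
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Hugging Face models are ready to use."))
# -> [{'label': 'POSITIVE', 'score': ...}]
```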