Natural Language Processing: Summary
Pavel Rychlý, 11 Dec 2023

Problems with NLP
- Zipf's law
- ambiguity
- variability

Approaches
- symbolic (rule-based): no data available
- statistical
- neural (deep learning): huge data available

Statistical NLP
- counts
- keywords
- collocations, multi-word units
- language modeling

Language Modeling
- probability of sentences, chain rule
- n-grams, Markov assumption: p(W) = ∏_i p(w_i | w_{i−2}, w_{i−1})
- maximum-likelihood estimation gives zero probabilities
- smoothing
- evaluation using cross entropy, perplexity
(a bigram sketch with add-one smoothing appears after the recipe slide below)

Text Classification
- applications
- Naive Bayes classifier (a sketch appears after the recipe slide below)
- evaluation: precision, recall, accuracy

Continuous Space Representation
- words as vectors, word embeddings
- methods of learning vectors
- evaluation of word embeddings
- optional homework: Stability of word embeddings

HW: Stability of word embeddings
Choose one or more methods for creating word embeddings (word2vec, FastText, GloVe, ...), run the training on the same data with different parameters (and/or numbers of epochs), and evaluate the stability of the results. Stability can be computed in several ways:
1. How many pairwise similarities stay the same. This can be computed on the whole vocabulary or on a sample (for example: 10 words each with frequencies from [100, 400, 1600, 6400, 25600, ...]).
2. Percentage of changes in analogy tasks. The same overall score does not imply the same successful analogy items; count how many items switch between successful and unsuccessful.
3. Percentage of changes in the Outlier Detection task.
(a nearest-neighbour stability sketch appears after the recipe slide below)

Neural Networks
- structure of NN
- matrix representation
- activation functions
- NN training: stochastic gradient descent, backpropagation
- sub-word tokenization
- optional homework: subword coverage

Recurrent NN
- language modeling using NN
- training RNN
- problems in training RNN
- LSTM
- bidirectional, multi-layer RNN

Simple NLP using NN
- Named Entity Recognition (NER)
- language modeling
- training
- evaluation
- optional homework: NN for adding accents

Machine translation
- sequence-to-sequence RNN
- attention
- decoding, beam search
- MT evaluation: BLEU, ChrF++

Transformers
- encoder, decoder
- positional encoding
- attention
- structure
(a scaled dot-product attention sketch appears after the recipe slide below)

Pretrained models
- encoder only, decoder only, encoder-decoder
- training objectives
- BERT, GPT, T5

Question Answering
- QA types
- usage
- reading comprehension
- applying NN for QA

Recipe for Training NN
NN training fails silently.
1. Become one with the data
2. Set up the end-to-end training/evaluation skeleton + get dumb baselines
3. Overfit
4. Regularize
5. Tune
6. Squeeze out the juice
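The sketches below illustrate a few of the techniques summarized above; all function names, toy corpora, and parameters are illustrative placeholders, not course material. First, for the Language Modeling slide: a minimal bigram model (the first-order version of the Markov assumption given above) with add-one smoothing, evaluated by perplexity.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams for a bigram language model."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed p(w | w_prev); avoids the zero
    probabilities that pure maximum-likelihood estimation assigns
    to unseen bigrams."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(tokens, unigrams, bigrams):
    """Perplexity = 2 ** cross-entropy (bits per token)."""
    vocab_size = len(unigrams)
    log_prob = sum(
        math.log2(bigram_prob(w_prev, w, unigrams, bigrams, vocab_size))
        for w_prev, w in zip(tokens, tokens[1:]))
    return 2 ** (-log_prob / (len(tokens) - 1))

train = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the rug".split()
uni, bi = train_bigram(train)
print(perplexity(test, uni, bi))
```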
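For the Text Classification slide, a minimal multinomial Naive Bayes classifier with add-one smoothing; the toy documents and labels are made up.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect class priors and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
        vocab.update(doc)
    return class_counts, word_counts, vocab

def predict_nb(doc, class_counts, word_counts, vocab):
    """Return argmax_c [log p(c) + sum_w log p(w | c)]."""
    n_docs = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c, n_c in class_counts.items():
        total = sum(word_counts[c].values())
        lp = math.log(n_c / n_docs)
        for w in doc:
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = [["good", "great"], ["bad", "awful"], ["good", "film"]]
labels = ["pos", "neg", "pos"]
print(predict_nb(["great", "film"], *train_nb(docs, labels)))  # -> pos
```

Precision, recall, and accuracy from the slide would then be computed from such predictions on held-out data.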
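For the embedding-stability homework, one possible way to operationalize criterion 1: compare the top-k cosine nearest-neighbour sets of sampled words across two training runs. This assumes both runs share an identically ordered vocabulary; the random matrices below merely stand in for real word2vec/FastText/GloVe outputs.

```python
import numpy as np

def nearest_neighbors(emb, word_ids, k=10):
    """Top-k cosine neighbours (as index sets) for each query word."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = norm[word_ids] @ norm.T           # cosine similarity to all words
    order = np.argsort(-sims, axis=1)        # descending similarity
    return [set(row[row != w][:k]) for row, w in zip(order, word_ids)]

def stability(emb_a, emb_b, word_ids, k=10):
    """Mean overlap (0..1) of k-NN sets between two training runs."""
    nn_a = nearest_neighbors(emb_a, word_ids, k)
    nn_b = nearest_neighbors(emb_b, word_ids, k)
    return float(np.mean([len(a & b) / k for a, b in zip(nn_a, nn_b)]))

# placeholder data: two "runs" that differ by small noise
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1000, 50))
emb_b = emb_a + rng.normal(scale=0.01, size=emb_a.shape)
sample = rng.choice(1000, size=10, replace=False)
print(stability(emb_a, emb_b, sample))
```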
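For the Transformers slide, the standard scaled dot-product attention, Attention(Q, K, V) = softmax(Q·Kᵀ/√d_k)·V, in plain NumPy; the shapes are toy values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, one head, no batching."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries, dimension 8
K = rng.normal(size=(4, 8))   # 4 keys
V = rng.normal(size=(4, 8))   # 4 values
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

In a decoder, a causal mask (adding -inf to future positions of the scores before the softmax) prevents attending to later tokens.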
Where to start
- Hugging Face
  - models: code + pre-trained weights, ready to use
  - datasets: sometimes with evaluation
- transformers library: very complex; 3 implementations: Jax, PyTorch, TensorFlow
(a minimal pipeline example appears at the end of this summary)

Pre-trained models
- OpenLLM
- llama.cpp: runs the LLaMA model using 4-bit integer quantization on a MacBook
- optimizations: float16, bfloat16, quantization

Training from scratch
- nanoGPT: easy to read, minimal dependencies
- nanoT5: trains T5 on a 1×A100 GPU in less than 24 hours
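As a starting point for the Hugging Face ecosystem from the "Where to start" slide, a minimal example of the transformers pipeline API. The checkpoint name is one public sentiment model, not something prescribed by the course; any task-appropriate checkpoint works.

```python
from transformers import pipeline

# downloads the model on first use; runs locally afterwards
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Hugging Face models are ready to use."))
# -> [{'label': 'POSITIVE', 'score': ...}]
```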