# Assignment 4
#
# The goal of this exercise is to produce a working script, so it is sufficient
# to upload a script file. You may use the script provided in the lesson.
#
# The data set consists of 100 lemmatized newspaper articles from the daily
# MF Dnes (folder "ta_task_2"). The texts are lemmatized and consist only of
# nouns, adjectives, and verbs; there is no additional pre-processing. The
# corpus contains 21,106 individual words, and the mean article length is
# 211 words.
#
# 1. Prepare R for the analysis (1 pt)
#    a. Use commands to set the proper working directory
#    b. Use commands to load the required packages
# 2. Load the texts and create a corpus (2 pts)
# 3. Tokenize the texts into individual words, bigrams, and trigrams (2 pts)
# 4. Create document-feature matrices (2 pts)
# 5. Obtain the most frequent words, bigrams, and trigrams. Set a reasonable
#    minimal frequency (explore the available functions and their outputs) (2 pts)
# 6. Create a wordcloud of the most frequent words (1 pt)
# Bonus: Pick 3 meaningful keywords from the most frequent words (step 5) and
#    create a keywords-in-context output (object) for each of them (2 pts)
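# The steps above could be sketched with the quanteda ecosystem roughly as
# follows. This is only a sketch, not the official solution: the working
# directory, the "*.txt" file pattern, the frequency thresholds, and the
# example keyword "vlada" are assumptions and must be adapted to the actual
# data in the "ta_task_2" folder.

```r
# 1. Prepare R for the analysis
setwd("~/ta_task_2")              # assumed path -- adjust to your machine
library(readtext)                 # reading plain-text files
library(quanteda)                 # corpus, tokens, dfm
library(quanteda.textstats)       # textstat_frequency()
library(quanteda.textplots)       # textplot_wordcloud()

# 2. Load the texts and create a corpus
txts <- readtext("*.txt")         # assumed file pattern
corp <- corpus(txts)

# 3. Tokenize into words, bigrams, and trigrams
toks  <- tokens(corp)
toks2 <- tokens_ngrams(toks, n = 2)
toks3 <- tokens_ngrams(toks, n = 3)

# 4. Create document-feature matrices
dfm1 <- dfm(toks)
dfm2 <- dfm(toks2)
dfm3 <- dfm(toks3)

# 5. Most frequent words, bigrams, trigrams; trim by a minimal frequency
topfeatures(dfm1, 20)
textstat_frequency(dfm2, n = 20)
textstat_frequency(dfm3, n = 20)
dfm1_trim <- dfm_trim(dfm1, min_termfreq = 10)   # threshold is an assumption

# 6. Wordcloud of the most frequent words
textplot_wordcloud(dfm1_trim, max_words = 100)

# Bonus: keywords-in-context for chosen keywords (hypothetical example word)
kw1 <- kwic(toks, pattern = "vlada", window = 5)
```

# A threshold such as min_termfreq = 10 is a starting point only; inspect the
# textstat_frequency() output first and pick a cutoff that keeps the frequency
# list meaningful for this corpus.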