# Assignment 4
#
# The goal of this exercise is to produce a working script, so it is sufficient
# to upload a script file. You may use the script provided in the lesson.
#
# The data set consists of 100 lemmatized newspaper articles from the daily
# MF Dnes (folder "ta_task_2"). The texts are lemmatized and consist only of
# nouns, adjectives, and verbs; there is no additional pre-processing. The
# corpus contains 21,106 individual words, and the mean article length is
# 211 words.
#
# 1. Prepare R for the analysis (1 pt)
#    a. Use commands to set the proper working directory
#    b. Use commands to load the required packages
# 2. Load the texts and create a corpus (2 pts)
# 3. Tokenize the texts into individual words, bigrams, and trigrams (2 pts)
# 4. Create document-feature matrices (2 pts)
# 5. Obtain the most frequent words, bigrams, and trigrams. Set a reasonable
#    minimal frequency (explore the available functions and their outputs) (2 pts)
# 6. Create a wordcloud of the most frequent words (1 pt)
# Bonus: Pick 3 meaningful keywords from the most frequent words (step 5) and
#    create a keywords-in-context output (object) for each of them (2 pts)
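# The steps above could be sketched with the quanteda ecosystem roughly as
# follows. This is only a sketch, not the official solution: the working
# directory, the "*.txt" file pattern, the frequency thresholds, and the
# example keyword "vlada" are assumptions and must be adapted to the actual
# data in the "ta_task_2" folder.

```r
# 1. Prepare R for the analysis
setwd("~/ta_task_2")              # assumed path -- adjust to your machine
library(readtext)                 # reading plain-text files
library(quanteda)                 # corpus, tokens, dfm
library(quanteda.textstats)       # textstat_frequency()
library(quanteda.textplots)       # textplot_wordcloud()

# 2. Load the texts and create a corpus
txts <- readtext("*.txt")         # assumed file pattern
corp <- corpus(txts)

# 3. Tokenize into words, bigrams, and trigrams
toks  <- tokens(corp)
toks2 <- tokens_ngrams(toks, n = 2)
toks3 <- tokens_ngrams(toks, n = 3)

# 4. Create document-feature matrices
dfm1 <- dfm(toks)
dfm2 <- dfm(toks2)
dfm3 <- dfm(toks3)

# 5. Most frequent words, bigrams, trigrams; trim by a minimal frequency
topfeatures(dfm1, 20)
textstat_frequency(dfm2, n = 20)
textstat_frequency(dfm3, n = 20)
dfm1_trim <- dfm_trim(dfm1, min_termfreq = 10)   # threshold is an assumption

# 6. Wordcloud of the most frequent words
textplot_wordcloud(dfm1_trim, max_words = 100)

# Bonus: keywords-in-context for chosen keywords (hypothetical example word)
kw1 <- kwic(toks, pattern = "vlada", window = 5)
```

# A threshold such as min_termfreq = 10 is a starting point only; inspect the
# textstat_frequency() output first and pick a cutoff that keeps the frequency
# list meaningful for this corpus.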