Data visualization: Text


The first step
•Text has to be turned into data
•Quantitative coding
•Human coding
•"Automatic" (machine-processed) coding
•Almost always includes some human decisions
•Frequency analysis (word clouds)
•Sentiment analysis: is the text positive or negative?
•Topic models: what topics are in the text(s)?
•Readability analysis: how complex is the text?
•Analysis of "positions"
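To make the idea of automatic coding concrete, here is a minimal sketch of dictionary-based sentiment scoring. The word lists POSITIVE and NEGATIVE and the function name sentiment_score are hypothetical and exist only for illustration; a real analysis would rely on a published, validated lexicon. Choosing that lexicon is exactly the kind of human decision that automatic coding still involves.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
# Picking the word lists is itself a human coding decision.
POSITIVE = {"good", "great", "happy", "excellent"}
NEGATIVE = {"bad", "poor", "sad", "terrible"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / number of words.

    The result lies between -1 (only negative words) and
    +1 (only positive words); 0.0 is returned for empty text.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return (positive_hits - negative_hits) / len(words)


if __name__ == "__main__":
    print(sentiment_score("The food was great and the service was excellent"))
    print(sentiment_score("A sad story with a terrible ending"))
```

A score above zero suggests a mostly positive text and a score below zero a mostly negative one. The sketch ignores negation ("not good") and context, which is one source of noise in automatic coding.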

•Allocating recording units to substantive categories
•Classifying each coded unit of text from the sample according to the category scheme
•Coding scheme
•Defined before coding (deductive)
•Developed during coding (inductive)
•Unit: page, paragraph, sentence, quasi-sentence

Quasi-sentence
•An argument or phrase that is the verbal expression of one idea or issue
•One sentence may contain several ideas; one idea may be spread over several sentences.
•Example: I am going to buy bread, milk and an apple.
•This splits into three quasi-sentences:
•I am going to buy bread.
•I am going to buy milk.
•I am going to buy an apple.

Text elements


Problems
•Language
•Tenses, adjectives, male/female forms
•Common words (stop words)
•Reliability of human coding
•Different coders understand the same text differently
•Noise in automatic coding
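The reliability of human coding can be checked by letting two coders code the same units and comparing the results. The sketch below computes simple percent agreement and Cohen's kappa, one common chance-corrected measure; the slides do not prescribe a particular measure, so this choice is an assumption. The category labels and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that both coders pick the same
    # category if each coded at random with their own category shares.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical codes for six quasi-sentences.
    coder_a = ["economy", "economy", "welfare", "welfare", "economy", "welfare"]
    coder_b = ["economy", "welfare", "welfare", "welfare", "economy", "economy"]
    print(percent_agreement(coder_a, coder_b))
    print(cohens_kappa(coder_a, coder_b))
```

Kappa is lower than raw agreement because part of the agreement would occur even if the coders assigned categories at random.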

Word cloud


Word cloud procedure
•Find a text
•Lemmatize it

Lemmatization
•https://lindat.mff.cuni.cz/services/morphodita/
•Paste the text into the input box on that page
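Once the lemmatizer has produced its output table, the lemmas can also be pulled out with a few lines of code. This is only a sketch: it assumes the table was copied as tab-separated text with one token per line and the columns form, lemma, tag, which may not match the layout you actually get. The function name and the sample rows (including the tags) are made up for illustration.

```python
def lemmas_from_table(output_text):
    """Collect the lemma column from tab-separated 'form<TAB>lemma<TAB>tag' rows.

    Assumption: one token per line, lemma in the second column.
    Lines with fewer than two columns (e.g. blank lines between
    sentences) are skipped.
    """
    lemmas = []
    for line in output_text.splitlines():
        columns = line.split("\t")
        if len(columns) < 2:
            continue
        lemmas.append(columns[1].lower())
    return lemmas


if __name__ == "__main__":
    # Made-up sample rows; real output may use different tags.
    sample = "I\tI\tPRP\nam\tbe\tVBP\ngoing\tgo\tVBG\n\nApples\tapple\tNNS\n"
    print(lemmas_from_table(sample))
```

Working with lemmas means that "am", "is" and "was" are all counted as "be", and "apple" and "apples" as one word.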

Count and clean in Excel
•Copy the "output" table into Excel (the XML export does not work well)
•Use Subtotal or a pivot table to count the words
•Delete stop words
•Words that do not distinguish one cloud from another
•the, a, an, it, I, we, they, their, have, for, in, …
•Save as CSV with the column names:
•weight word color url
•weight is the count; color and url may be empty
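The same counting and cleaning steps can be sketched in code instead of Excel. The example below counts words, drops the stop words listed above, and writes a CSV with the columns weight, word, color, url. The function names are hypothetical, and the semicolon delimiter is an assumption: check which separator the word cloud tool expects and change the delimiter argument if needed.

```python
import csv
import io
from collections import Counter

# Stop words taken from the list above; extend it for your own texts.
STOPWORDS = {"the", "a", "an", "it", "i", "we", "they", "their", "have", "for", "in"}


def count_words(lemmas, stopwords=STOPWORDS):
    """Count each word (case-insensitive) and drop the stop words."""
    counts = Counter(word.lower() for word in lemmas)
    for stopword in stopwords:
        counts.pop(stopword, None)
    return counts


def write_wordcloud_csv(counts, handle, delimiter=";"):
    """Write rows 'weight, word, color, url', most frequent word first.

    The delimiter is an assumption; color and url are left empty.
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["weight", "word", "color", "url"])
    for word, weight in counts.most_common():
        writer.writerow([weight, word, "", ""])


if __name__ == "__main__":
    lemmas = ["the", "bread", "milk", "bread", "I", "apple"]
    buffer = io.StringIO()  # replace with open("cloud.csv", "w", newline="")
    write_wordcloud_csv(count_words(lemmas), buffer)
    print(buffer.getvalue())
```

Keeping the stop word list in one place makes it easy to apply the same cleaning to every text, so that the resulting clouds stay comparable.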

https://www.wordclouds.com/


•The cloud can also be made directly from the text, without processing in Excel
•This gives little control over stop words
•Do not use garish colors, odd shapes or mixed text orientations
•Use the same font for all words
