02. Data objects
Vít Gabrhel
R101
2019-09-29

Outline
1. Vector
2. Factor
3. Data Frame

What is an object?

Data
Source: FiveThirtyEight, "A Complete Catalog Of Every Time Someone Cursed Or Bled Out In A Quentin Tarantino Movie" by Oliver Roeder (Dec. 9, 2015, filed under Word Count). The data are available on GitHub.

Vector
A vector is a simple data object of any length that holds values of a single type.

    c("Reservoir Dogs", "Pulp Fiction", "Inglorious Basterds")
    ## [1] "Reservoir Dogs" "Pulp Fiction" "Inglorious Basterds"
    c(421, 469, 51)
    ## [1] 421 469  51

If you mix types, every element is coerced to the most general one, here character:

    c(421, "Reservoir Dogs", "death", FALSE, 10)
    ## [1] "421" "Reservoir Dogs" "death" "FALSE"
    ## [5] "10"

Creating and naming a vector object
Number of curse words per film:

    Words_Movie = c(421, 469, 57, 51)

What is what, or naming a vector:

    names(Words_Movie) = c("Reservoir Dogs", "Pulp Fiction", "Kill Bill 1", "Inglorious Basterds")

Selecting value(s) from a vector, by position or by name:

    Words_Movie[c(1, 4)]
    ## Reservoir Dogs Inglorious Basterds
    ##            421                  51
    Words_Movie[c("Reservoir Dogs", "Inglorious Basterds")]
    ## Reservoir Dogs Inglorious Basterds
    ##            421                  51

Vector arithmetic
Adding vectors (element by element):

    Hell = c(12, 5, 3, 4)
    Goddamn = c(10, 28, 7, 8)
    Spirituality = Hell + Goddamn

Sum of the values in a vector:

    Words_N <- sum(Spirituality)
    Words_N
    ## [1] 77

Logical operators
< for less than
> for greater than
<= for less than or equal to
>= for greater than or equal to
== for equal to each other
!= for not equal to each other

In which films were more curse words spoken than the average across the four films?

    Words_Movie > mean(Words_Movie)
    ##      Reservoir Dogs        Pulp Fiction         Kill Bill 1
    ##                TRUE                TRUE               FALSE
    ## Inglorious Basterds
    ##               FALSE

Comparing two named vectors element by element:

    names(Hell) = c("Reservoir Dogs", "Pulp Fiction", "Kill Bill 1", "Inglorious Basterds")
    names(Goddamn) = c("Reservoir Dogs", "Pulp Fiction", "Kill Bill 1", "Inglorious Basterds")
    Hell[c(1, 4)] > Goddamn[c(1, 4)]
    ##      Reservoir Dogs Inglorious Basterds
    ##                TRUE               FALSE
    Hell[c("Reservoir Dogs", "Inglorious Basterds")] != Goddamn[c("Reservoir Dogs", "Inglorious Basterds")]
    ##      Reservoir Dogs Inglorious Basterds
    ##                TRUE                TRUE
    names(Spirituality) = c("Reservoir Dogs", "Pulp Fiction", "Kill Bill 1", "Inglorious Basterds")

Were "hell" and "goddamn" said more than 50 times in total in Pulp Fiction?

    PulpFiction_Celkem <- Spirituality[c(2)] > 50
    PulpFiction_Celkem
    ## Pulp Fiction
    ##        FALSE

Which word was heard more often in each film, "Hell" or "Goddamn"? (TRUE means "Goddamn".)

    Hell < Goddamn
    ##      Reservoir Dogs        Pulp Fiction         Kill Bill 1
    ##               FALSE                TRUE                TRUE
    ## Inglorious Basterds
    ##                TRUE

Factor

    Filmy = c("Kill Bill 1", "Reservoir Dogs", "Inglorious Basterds", "Pulp Fiction")
    class(Filmy)
    ## [1] "character"

Nominal categories

    Factor_Filmy = as.factor(Filmy)
    class(Factor_Filmy)
    ## [1] "factor"

To set the order of the levels, pass them to factor(). Assigning them with levels(Factor_Filmy) <- ... would only rename the existing (sorted) levels and so attach the wrong film names to the data:

    Factor_Filmy <- factor(Filmy, levels = c("Reservoir Dogs", "Pulp Fiction", "Kill Bill 1", "Inglorious Basterds"))

Ordinal categories

    Factor_Filmy <- factor(Filmy, ordered = TRUE,
                           levels = c("Reservoir Dogs", "Pulp Fiction", "Kill Bill 1", "Inglorious Basterds"))

Data Frame
A data frame is the data table as we know it from data analysis:
• A data frame has the variables of a data set as columns and the observations as rows.

We already know something about curse words in Tarantino's films. But what about the number of deaths? We will look at Pulp Fiction, Inglorious Basterds and Django Unchained together with the number of characters who die in each.
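The definition above puts observations in rows and variables in columns. As a minimal sketch (not the lecture's own code), the death and curse-word counts this section works with could be stored in that orientation, one row per film; the object name `Tarantino` and its column names are hypothetical. With films as rows, each variable is a plain numeric column, so summaries need no conversion.

```r
# Hypothetical layout: one row per film (observation), one column per variable.
# Counts: deaths 7, 48, 47 and curse words 469, 58, 262.
Tarantino <- data.frame(
  Film   = c("Pulp Fiction", "Inglorious Basterds", "Django Unchained"),
  Deaths = c(7, 48, 47),
  Words  = c(469, 58, 262)
)

str(Tarantino)                      # 3 observations of 3 variables
mean(Tarantino$Deaths)              # a column is already a numeric vector: 34
Tarantino$Words / Tarantino$Deaths  # curse words per death, film by film
```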
To that we add the known number of curse words in each film (the first value is the number of deaths, the second the number of curse words):

    Pulp_Fiction = c(7, 469)
    Inglorious_Basterds = c(48, 58)
    Django_Unchained = c(47, 262)
    Filmy <- data.frame(Pulp_Fiction, Inglorious_Basterds, Django_Unchained)
    View(Filmy) # opens a preview of the table directly in RStudio

Manipulating rows/columns

    colnames(Filmy) <- c("Pulp Fiction", "Inglorious Basterds", "Django Unchained")
    rownames(Filmy) <- c("Deaths", "Words")
    View(Filmy)
    rowSums(Filmy)
    ## Deaths  Words
    ##    102    789
    colSums(Filmy)
    ##        Pulp Fiction Inglorious Basterds    Django Unchained
    ##                 476                 106                 309

How do you add a column/row to a data frame?
• With cbind() / rbind()
We group the films by the decade they were made in (1990s, 2000s and 2010s), coded 0, 1 and 2:

    Period = c(0, 1, 2)
    Filmy_Period <- rbind(Filmy, Period)
    rownames(Filmy_Period) <- c("Deaths", "Words", "Period")

How do you list the objects currently in the workspace?

    ls()
    ##  [1] "Django_Unchained"    "Factor_Filmy"        "Filmy"
    ##  [4] "Filmy_Period"        "Goddamn"             "Hell"
    ##  [7] "Inglorious_Basterds" "Period"              "Pulp_Fiction"
    ## [10] "PulpFiction_Celkem"  "Spirituality"        "Words_Movie"
    ## [13] "Words_N"

How do you select specific elements from a data frame?
• As with vectors, you use square brackets [ ] to select one or more elements from a data frame.
• Whereas vectors have one dimension, data frames have two. Use a comma to separate what you select from the rows from what you select from the columns. For example:
  ◦ Filmy_Period[1, 2] selects the element in the first row and second column.
  ◦ Filmy_Period[1:3, 2:3] returns a data frame with rows 1, 2, 3 and columns 2 and 3.
• To select all elements of a row or a column, leave the position after or before the comma empty, respectively:
  ◦ Filmy_Period[, 1] selects all elements of the first column.
  ◦ Filmy_Period[1, ] selects all elements of the first row.
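A minimal, self-contained sketch that rebuilds `Filmy_Period` from the values given above and checks what each indexing form returns. The name-based selection and the `drop = FALSE` argument are standard base R additions not used in the lecture. Note the asymmetry: a single column is simplified to a plain vector, while a single row stays a one-row data frame.

```r
# Rebuild the table from the lecture's values
Pulp_Fiction <- c(7, 469)
Inglorious_Basterds <- c(48, 58)
Django_Unchained <- c(47, 262)
Filmy <- data.frame(Pulp_Fiction, Inglorious_Basterds, Django_Unchained)
colnames(Filmy) <- c("Pulp Fiction", "Inglorious Basterds", "Django Unchained")
rownames(Filmy) <- c("Deaths", "Words")
Filmy_Period <- rbind(Filmy, c(0, 1, 2))
rownames(Filmy_Period) <- c("Deaths", "Words", "Period")

Filmy_Period[1, 2]                          # row 1, column 2: 48
Filmy_Period[1:3, 2:3]                      # data frame with rows 1-3, columns 2-3
Filmy_Period[, 1]                           # whole first column, simplified to a vector
Filmy_Period[1, ]                           # whole first row, still a data frame
Filmy_Period["Words", "Django Unchained"]   # same idea using names: 262
Filmy_Period[, 1, drop = FALSE]             # keep the column as a data frame
```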
What was the average number of deaths across the films?

    Mean_Dead = as.numeric(Filmy_Period[1, ]) # a row is a one-row data frame, so convert it to a numeric vector
    mean(Mean_Dead)
    ## [1] 34

What is the Tarantino index (here, the number of curse words per death) for Inglorious Basterds?

    Dead_Curse = data.frame(Filmy_Period[1:2, 2])
    Dead_Curse[2, 1] / Dead_Curse[1, 1] # Words / Deaths
    ## [1] 1.208333

Intermezzo

Loading a data frame from R's built-in data sets with data()

    data(USArrests)
    View(USArrests)
    ??USArrests # searches the help pages; ?USArrests opens the data set's help page directly

How do you find your way around a data frame?

    head(USArrests) # shows the first observations of a data frame
    tail(USArrests) # shows the last observations of a data frame
    str(USArrests)  # structure of the data

Selecting elements

    USArrests[1:3, 3]           # rows 1-3 of the 3rd column (UrbanPop)
    USArrests[1:3, "UrbanPop"]  # the same, selected by column name
    USArrests$UrbanPop          # the whole UrbanPop column
    USArrests[1, "UrbanPop"]    # a single element

Subsets

    subset(USArrests, UrbanPop < 50)

Sorting

    order(USArrests$Murder) # row positions in ascending order of Murder, not the sorted data itself
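Because `order()` returns row positions rather than sorted data, you sort a data frame by using those positions as the row index. A minimal sketch on the built-in `USArrests` data; the object names `idx`, `by_murder` and `rural` are hypothetical, while `decreasing = TRUE` and the `select` argument of `subset()` are standard base R.

```r
data(USArrests)

# order() gives row positions, smallest Murder value first
idx <- order(USArrests$Murder)
head(idx)

# Index the rows with those positions to get the sorted data frame
by_murder <- USArrests[idx, ]
head(by_murder, 3)

# Highest murder rates first
head(USArrests[order(USArrests$Murder, decreasing = TRUE), ], 3)

# subset() with a row condition and a choice of columns
rural <- subset(USArrests, UrbanPop < 50, select = c(Murder, UrbanPop))
head(rural)
```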