Install (update) ggplot2 to version 2.2.0 (released 14/11/2016):
https://blog.rstudio.org/2016/11/14/ggplot2-2-2-0/
Changes from 2.1.X:
- Subtitles and captions.
- A large rewrite of the facetting system.
- Improved theme options.
- Better stacking.
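A minimal way to install or update the package from CRAN and then check that the installed version is at least 2.2.0. This assumes a standard CRAN setup; the install line is commented out so that it is not rerun every time.

```r
# install.packages("ggplot2")  # run once to install or update ggplot2 from CRAN
library(ggplot2)

# packageVersion() returns a comparable version object
ggplot2_ok <- packageVersion("ggplot2") >= "2.2.0"
ggplot2_ok
```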
ggplot2
(There is a package similar in spirit, ggvis, which is oriented toward interactive graphics.)
ggplot2 docs website: http://docs.ggplot2.org/current/

To illustrate ggplot2 basics we will use the data set diamonds, which contains data on tens of thousands of stones. We will use a random sample of 500 of them for speed and clarity.
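The slides do not show how the sample was drawn. A minimal sketch, assuming dplyr's sample_n() and an arbitrary seed (both are assumptions, so the rows printed below, which come from the original sample, will differ). In recent dplyr versions slice_sample(n = 500) does the same job.

```r
library(ggplot2)  # the full data set lives in ggplot2::diamonds
library(dplyr)    # %>% and sample_n()

set.seed(2016)    # arbitrary seed (assumption): the seed used for the slides is unknown
# Random sample of 500 stones; it masks the full data set under the same name
diamonds <- ggplot2::diamonds %>% sample_n(500)
diamonds
```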
## # A tibble: 500 × 10
##    carat     cut color clarity depth table price     x     y     z
##    <dbl>   <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1   0.50   Ideal     E     VS2  62.5    57  1629  5.10  5.04  3.17
## 2   0.32   Ideal     D     VS2  61.0    57   972  4.46  4.42  2.71
## 3   1.01   Ideal     F     SI2  61.5    56  4458  6.42  6.49  3.97
## 4   1.13 Premium     F     VS2  59.8    61  6822  6.81  6.76  4.06
## 5   1.60 Premium     I     VS2  60.2    58  9784  7.63  7.56  4.57
## 6   0.44    Good     H     SI1  63.5    57   733  4.82  4.85  3.07
## 7   0.71   Ideal     D     SI1  60.8    56  2863  5.80  5.77  3.52
## 8   0.36   Ideal     E     VS2  61.7    55   742  4.57  4.60  2.83
## 9   1.09   Ideal     F    VVS2  62.1    56 10246  6.55  6.59  4.08
## 10  0.58   Ideal     E     VS2  62.3    54  1809  5.43  5.39  3.37
## # ... with 490 more rows
Each figure produced by ggplot2 consists of three basic components:
All three components are controlled independently.
The data visualization is composed of one or many overlapping layers. The final figure is a union of multiple layers, each of which adds one quality to the figure.
In our schematic example, the first layer contains a scatter plot and the second a smoothing line.
The following sequence of figures illustrates the concept of building a figure layer by layer. Our goal is a smoothed scatter plot of the weight (carat) and price (price) of the stones in the sample.
First, the basic layer is generated by calling the ggplot() function. The basic layer is just an "empty blackboard":
diamonds %>% ggplot(data = ., mapping = aes(x = carat, y = price))
In the second layer we add points that represent individual stones (using the geom_point() function):
diamonds %>% ggplot(data = ., mapping = aes(x = carat, y = price)) + geom_point()
The last layer in the example adds a smoothing curve to the figure (using the geom_smooth() function):
diamonds %>% ggplot(data = ., mapping = aes(x = carat, y = price)) + geom_point() + geom_smooth()
Notice that the smoothing curve is actually drawn over the points! The order of layers really matters.
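A small sketch of the same two layers added in reverse order, so that the points end up on top of the curve. It uses the full ggplot2::diamonds rather than the 500-stone sample and only inspects the layer list instead of drawing the figure.

```r
library(ggplot2)

# Same layers as above, reversed: the smoothing curve is drawn first, the points on top of it
p_reversed <- ggplot(ggplot2::diamonds, aes(x = carat, y = price)) +
  geom_smooth() +
  geom_point()

# Layers are stored (and drawn) in the order in which they were added
layer_geoms <- sapply(p_reversed$layers, function(l) class(l$geom)[1])
layer_geoms  # "GeomSmooth" "GeomPoint"
```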
We used geom_*() functions to add layers. These functions are actually shortcuts for the more verbose layer().
For example, geom_point() is equivalent to:
layer( data = NULL, mapping = NULL, geom = "point", stat = "identity", position = "identity" )
It is very rare to call layer() directly.
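To check the equivalence claimed above, one can build both versions and compare the geom, stat and position objects stored in the resulting layer. A minimal sketch using the full ggplot2::diamonds; it compares classes only, not every parameter.

```r
library(ggplot2)

base <- ggplot(ggplot2::diamonds, aes(x = carat, y = price))

p_shortcut <- base + geom_point()                                                      # the shortcut
p_verbose  <- base + layer(geom = "point", stat = "identity", position = "identity")   # the verbose call

l1 <- p_shortcut$layers[[1]]
l2 <- p_verbose$layers[[1]]

# Both layers use the same geom, stat and position classes
same_parts <- c(
  geom     = identical(class(l1$geom),     class(l2$geom)),
  stat     = identical(class(l1$stat),     class(l2$stat)),
  position = identical(class(l1$position), class(l2$position))
)
same_parts
```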
Each layer() argument refers to a property of a layer:
- data: specified in the layer or inherited from the ggplot() call. (It is not necessary to specify data in the ggplot() call if they are specified for each layer separately.)
- mapping: specified with the aes() function. aes() assigns data (specific columns from the input data frame) to qualities (aesthetics) of the geom.

Think about simple scatterplots. How many dimensions (stone qualities) can be plotted in a simple scatterplot?
See ggplot2::diamonds:
## # A tibble: 500 × 10
##   carat     cut color clarity depth table price     x     y     z
##   <dbl>   <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.50   Ideal     E     VS2  62.5    57  1629  5.10  5.04  3.17
## 2  0.32   Ideal     D     VS2  61.0    57   972  4.46  4.42  2.71
## 3  1.01   Ideal     F     SI2  61.5    56  4458  6.42  6.49  3.97
## 4  1.13 Premium     F     VS2  59.8    61  6822  6.81  6.76  4.06
## 5  1.60 Premium     I     VS2  60.2    58  9784  7.63  7.56  4.57
## # ... with 495 more rows
ggplot(
  data = diamonds,
  mapping = aes(
    x = carat,      # X-coordinate
    y = price,      # Y-coordinate
    color = cut,    # color of the point border
    fill = color,   # color of the point filling
    size = table    # size of the point
  )
) +
  geom_point(
    shape = 21,     # shape of the points (a filled circle with a border)
    alpha = 0.2,    # transparency of the points
    stroke = 2      # thickness of the border
  )
I have cheated a bit: legends are "turned on" by default.
The example uses all aesthetics available for geom_point(). One can notice two important things from the example:
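- Data and mapping can be inherited from the initial ggplot() call. See that geom_point does not specify the data or the mapping used.
- A specification given in layer()/geom_*() overrides the specification given in the initial ggplot() call.

Further properties of a layer:
- geom: the geometric object used to display the data; point is used in the example discussed above.
- stat: the statistical transformation of the data, e.g. the smoothing in geom_smooth() from the example above.

Remember that the geom_*() functions are shortcuts for specific calls of layer(), which may differ in any parameter, not only in the geom parameter.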
We can demonstrate this on the default settings of geom_point() and geom_jitter().
Default geom_point():
layer( data = NULL, mapping = NULL, geom = "point", stat = "identity", position = "identity" )
Default geom_jitter():
layer( data = NULL, mapping = NULL, geom = "point", stat = "identity", position = "jitter" )
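A short sketch confirming that the two shortcuts differ only in the position adjustment. The choice of mtcars with cyl and gear is an illustrative assumption: many cars share the same combination, so plain points overlap and jittering spreads them out.

```r
library(ggplot2)

p_point  <- ggplot(mtcars, aes(x = cyl, y = gear)) + geom_point()
p_jitter <- ggplot(mtcars, aes(x = cyl, y = gear)) + geom_jitter(width = 0.2, height = 0.2)

# Same geom ...
class(p_point$layers[[1]]$geom)[1]       # "GeomPoint"
class(p_jitter$layers[[1]]$geom)[1]      # "GeomPoint"
# ... different position adjustment
class(p_point$layers[[1]]$position)[1]   # "PositionIdentity"
class(p_jitter$layers[[1]]$position)[1]  # "PositionJitter"
```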
Plot a scatterplot which depicts the following qualities of cars (datasets::mtcars):
- wt: weight (1000 lbs)
- qsec: 1/4 mile time
- am: transmission (0 = automatic, 1 = manual)

mtcars %<>% as_tibble()
mtcars %>% print(n=5)
## # A tibble: 32 × 11
##     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  21.0     6   160   110  3.90 2.620 16.46     0     1     4     4
## 2  21.0     6   160   110  3.90 2.875 17.02     0     1     4     4
## 3  22.8     4   108    93  3.85 2.320 18.61     1     1     4     1
## 4  21.4     6   258   110  3.08 3.215 19.44     1     0     3     1
## 5  18.7     8   360   175  3.15 3.440 17.02     0     0     3     2
## # ... with 27 more rows
You can map variables to different aesthetics. It is up to you to make it look nice.
ggplot( data = mtcars, mapping = aes(x = wt, y = qsec, colour = factor(am)) ) + geom_point()
ggplot() + geom_point( data = mtcars, mapping = aes(x = wt, y = qsec, colour = factor(am)) )
Both pieces of code produce identical figures. Why?
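One way to verify the claim (not to answer the "why") is layer_data(), which returns the data of a layer after all computations, i.e. exactly what will be drawn. A minimal sketch using base mtcars; comparing with all.equal() is a choice made here to tolerate irrelevant attribute differences.

```r
library(ggplot2)

p1 <- ggplot(data = mtcars, mapping = aes(x = wt, y = qsec, colour = factor(am))) +
  geom_point()
p2 <- ggplot() +
  geom_point(data = mtcars, mapping = aes(x = wt, y = qsec, colour = factor(am)))

# The computed layer data (positions, colours, groups, ...) are the same
same_layer_data <- isTRUE(all.equal(layer_data(p1), layer_data(p2)))
same_layer_data
```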
geom_*() functions
ggplot2 allows users to construct many figure types in countless colors, sizes, etc. The following slides provide a basic overview of the most common figure types and options.
The first thing that usually catches a researcher's eye is the distribution of a variable. To plot it, we use different tools for discrete and continuous variables.
geom_bar()
For a discrete variable it is crucial to see the frequency of the observed options. For example, the data set diamonds contains the column color, which holds the evaluation of the color of the included stones from D (best) to J (worst). We can get the frequencies using table():
diamonds %$% table(color)
## color
##   D   E   F   G   H   I   J
##  70  85  87 100  84  54  20
A common way to visualize frequencies of a discrete variable is a bar plot.
geom_bar() can be used to produce various bar plots. The basic (default) setting returns the distribution of a discrete variable (a "histogram"), where the height of a bar is equal to the number of observations.
geom_bar() understands the following aesthetics: x (required), alpha, colour, fill, linetype, size. As an example, we can plot a bar plot of the stones' color distribution.
diamonds %>% ggplot( aes( x = color ) ) + geom_bar( stat = "count", position = "stack" )
geom_bar() returned the number of cases at each x position. The numbers were supplied by the function stat_count(), which processes the data for geom_bar(stat = "count"). It is not common (but it is possible) to use stat_*() functions directly, as each of them is associated with some geom_*() function.
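A sketch showing that stat_count() can be called directly (its default geom is a bar) and that the counts it computes match table(). It uses the full ggplot2::diamonds, and layer_data() is used to read the computed statistics; rows are sorted by x just to be safe about ordering.

```r
library(ggplot2)

d <- ggplot2::diamonds

p_geom <- ggplot(d, aes(x = color)) + geom_bar()     # geom with its default stat ("count")
p_stat <- ggplot(d, aes(x = color)) + stat_count()   # stat with its default geom ("bar")

counts_geom <- layer_data(p_geom)
counts_stat <- layer_data(p_stat)
counts_geom <- counts_geom[order(counts_geom$x), ]
counts_stat <- counts_stat[order(counts_stat$x), ]

# Both agree with the frequencies returned by table()
freq <- as.vector(table(d$color))
all(counts_geom$count == freq)
all(counts_stat$count == freq)
```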
Only the mandatory aesthetic x is used in the example. However, one can use more than one aesthetic.
diamonds %>% ggplot( aes( x = color, fill = cut ) ) + geom_bar()
diamonds %>% ggplot( aes( x = color, fill = color ) ) + geom_bar()
This example shows that it is possible to map one variable to multiple aesthetics.
A histogram is designed for plotting the distribution of observed values of a continuous variable. First, the continuous scale of observed values is divided into intervals (bins), and then the number of observations in each bin is counted and plotted.
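The two steps can be sketched by hand in base R. The bin edges (every 1000 USD from 0 to 20000) are an illustrative assumption; ggplot2 places its bin boundaries differently by default, so these counts need not match geom_histogram() exactly.

```r
prices <- ggplot2::diamonds$price

# Hypothetical bin edges, wide enough to cover every observed price
breaks <- seq(0, 20000, by = 1000)

# Step 1: assign each price to an interval (bin)
bins <- cut(prices, breaks = breaks, include.lowest = TRUE)
# Step 2: count the observations in each bin
counts <- table(bins)

length(counts)                 # 20 bins
sum(counts) == length(prices)  # every observation falls into exactly one bin
```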
A basic histogram can be created using geom_histogram(). It uses the same aesthetics as geom_bar().
diamonds %>% ggplot(aes(x=price)) + geom_histogram()
geom_histogram() allows the user to change the size (argument binwidth, default NULL) or the number (argument bins, default 30) of bins. If binwidth is set, bins is ignored.
It is recommended to experiment a little with binwidth (or bins) to find the optimal bin size (number of bins).
diamonds %>% ggplot(aes(x=price)) + geom_histogram(binwidth = 1000)
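A sketch of what the two arguments control, read back from the computed layer data (full ggplot2::diamonds; the values 1000 and 10 are arbitrary choices for illustration).

```r
library(ggplot2)

d <- ggplot2::diamonds

# binwidth fixes the width of every bin ...
by_width <- layer_data(ggplot(d, aes(x = price)) + geom_histogram(binwidth = 1000))
# ... while bins fixes how many bins there are
by_count <- layer_data(ggplot(d, aes(x = price)) + geom_histogram(bins = 10))

width_ok <- isTRUE(all.equal(by_width$xmax - by_width$xmin, rep(1000, nrow(by_width))))
width_ok                        # every bin is 1000 wide
nrow(by_count)                  # 10 bins
sum(by_width$count) == nrow(d)  # no observation is lost
```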
geom_histogram() displays observed values, but sometimes it is useful to estimate the true density from the sample. geom_density() delivers a kernel density estimate of the distribution of the x variable.
geom_density() understands the following aesthetics: x, y, alpha, colour, fill, linetype, size, and weight.
diamonds %>% ggplot( aes(x=price) ) + geom_density()
The density estimate is delivered to geom_density() by the stat_density() function, which uses kernel = "gaussian" by default.
To use a different kernel, you can call stat_density() directly:
diamonds %>%
  ggplot(aes(x = price)) +
  stat_density(
    kernel = "optcosine"  # see density() help
  )
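A sketch comparing the default Gaussian kernel with the "optcosine" kernel on the full ggplot2::diamonds. The estimate is evaluated on a grid of n = 512 points by default, and changing the kernel changes the estimated values.

```r
library(ggplot2)

d <- ggplot2::diamonds

dens_gauss <- layer_data(ggplot(d, aes(x = price)) + stat_density())
dens_opt   <- layer_data(ggplot(d, aes(x = price)) + stat_density(kernel = "optcosine"))

nrow(dens_gauss)                                          # 512 evaluation points
isTRUE(all.equal(dens_gauss$density, dens_opt$density))   # FALSE: the kernel changes the estimate
```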
How can it be that stat_density() produces a layer (geom)?! The relationship between geom_*() and stat_*() functions is actually a bit complicated. Most geom_*() functions are associated with a stat function. On the other hand, stat_*() functions are associated with a geom function.
See the default settings of geom_histogram() and the related stat_bin():
geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...,
               binwidth = NULL, bins = NULL, na.rm = FALSE, show.legend = NA,
               inherit.aes = TRUE)

stat_bin(mapping = NULL, data = NULL, geom = "bar", position = "stack", ...,
         binwidth = NULL, bins = NULL, center = NULL, boundary = NULL,
         closed = c("right", "left"), pad = FALSE, na.rm = FALSE,
         show.legend = NA, inherit.aes = TRUE)
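The pairing can be inspected on the layer objects themselves: a geom_*() call creates a layer with a default stat, and a stat_*() call creates a layer with a default geom. A minimal sketch; the class names are those used by ggplot2's ggproto objects.

```r
library(ggplot2)

# geom_*() -> default stat
class(geom_histogram()$stat)[1]   # "StatBin"
class(geom_density()$stat)[1]     # "StatDensity"
# stat_*() -> default geom
class(stat_bin()$geom)[1]         # "GeomBar"
class(stat_density()$geom)[1]     # "GeomArea"
```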
group
It might be important to see differences in density for different groups of stones (e.g. different cuts). ggplot2 provides multiple techniques for this task. The first one can be called grouping. Groups are specified by the argument group in aes(). The argument group expects a variable that divides the variable x into groups; it could be a discrete (or logical) variable. The task specified in the following geom_*() call is performed for each group separately, and the outcomes are plotted into the same figure.
diamonds %>% ggplot( aes(x=price, group = cut) ) + geom_density()
Hm, the outcome is hard to read. Let's tune it following the same concept of grouping. If we assign the grouping variable cut to different aesthetics (which geom_density() understands!), the result becomes clear to read and even breathtaking:
diamonds %>%
  ggplot(aes(x = price,
             fill = cut,
             color = cut  # Only one aesthetic would suffice. It just looks cool.
  )) +
  geom_density(
    alpha = 0.2  # Just to make it even cooler and easier to read.
  )
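Mapping a discrete variable to fill (or color) defines the groups implicitly, so no explicit group aesthetic is needed. A sketch on the full ggplot2::diamonds counting the groups in the computed layer data:

```r
library(ggplot2)

d <- ggplot2::diamonds

p <- ggplot(d, aes(x = price, fill = cut)) + geom_density(alpha = 0.2)
dens <- layer_data(p)

length(unique(dens$group))  # one density estimate per level of cut
nlevels(d$cut)              # 5
```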
Box and violin plots are designed for comparing distributions of a variable across groups. The more common box plot can be created using geom_boxplot(), which plots a stylized distribution for each group.
geom_boxplot() understands the following aesthetics:
- x (grouping variable)
- ymax (upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR)
- ymin (lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR)
- lower, middle, upper (first quartile, median, third quartile)
- alpha, colour, fill, linetype, shape, size, weight
diamonds %>%
  ggplot(aes(x = cut,    # Grouping variable
             y = price)  # Variable to be plotted
  ) +
  geom_boxplot()
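A sketch checking the quantile-based aesthetics against base R on the full ggplot2::diamonds: middle should be the median of price within each cut, and lower the first quartile (R's default quantile type).

```r
library(ggplot2)

d <- ggplot2::diamonds

box <- layer_data(ggplot(d, aes(x = cut, y = price)) + geom_boxplot())
box <- box[order(box$x), ]  # one row per level of cut, in level order

medians <- tapply(d$price, d$cut, median)
q1      <- tapply(d$price, d$cut, quantile, probs = 0.25)

isTRUE(all.equal(box$middle, as.numeric(medians)))  # middle = median
isTRUE(all.equal(box$lower,  as.numeric(q1)))       # lower = first quartile
```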
ggplot2 provides tools for visualizing 2D data distributions that are analogous to the 1D functions described above.
ggplot2 has two geom_*() functions analogous to geom_histogram() that display the distribution of observed combinations of two variables. Both of them split the plane into smaller areas and show the number of observations in each of them: geom_bin2d() splits the plane into rectangles and geom_hex() into hexagons.
geom_bin2d()
geom_bin2d() understands the following aesthetics: x, y, and fill. geom_hex() understands colour, fill, and size as well.
diamonds %>% ggplot( aes(x = carat, y = price) ) + geom_bin2d()
geom_hex()
diamonds %>% ggplot( aes(x = carat, y = price) ) + geom_hex()
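As with a 1D histogram, every observation falls into exactly one area, so the counts over all rectangles add up to the number of stones. A sketch with geom_bin2d() on the full ggplot2::diamonds (bins = 30 is an arbitrary choice); empty rectangles are dropped from the computed data.

```r
library(ggplot2)

d <- ggplot2::diamonds

cells <- layer_data(ggplot(d, aes(x = carat, y = price)) + geom_bin2d(bins = 30))

sum(cells$count) == nrow(d)  # every stone is counted exactly once
```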
geom_density2d()
A kernel estimate of the 2D distribution is also available via geom_density2d(). The function draws contours of the estimated distribution.
geom_density2d() understands the following aesthetics: x, y, alpha, colour, linetype, and size.
diamonds %>% ggplot( aes(x = carat, y = price) ) + geom_density2d()
geom_point()
The basic tool for visualizing the relationship between two continuous variables is without doubt a scatter plot. You can plot it using geom_point().
It is possible to add more information on individual observations using the many aesthetics available. However, individual observations will always be, at least to some extent, "anonymous". geom_text() and geom_label() allow the user to replace the shape representing an observation by text (a string) defined in the required aesthetic label.
geom_text() and geom_label()
Both geom_ functions understand the aesthetics label, x, y, alpha, angle, colour, family, fontface, hjust, lineheight, size, and vjust.
As an example, we can draw a scatter plot of the relationship between price and weight where each stone has a label depicting its clarity. As this type of plot is generally more suitable for data sets with a low number of observations, we will further reduce the sub-sample from the diamonds data set.
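The slides do not show how sub_diamonds (used in the check_overlap example) was created. A minimal sketch, assuming 50 stones drawn with dplyr's sample_n() from the full ggplot2::diamonds and an arbitrary seed; print p_text and p_label to compare the two geoms.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # arbitrary seed (assumption)
# Hypothetical sub-sample: 50 stones is an assumed size
sub_diamonds <- ggplot2::diamonds %>% sample_n(50)

base    <- ggplot(sub_diamonds, aes(x = carat, y = price, label = clarity))
p_text  <- base + geom_text()   # plain text labels
p_label <- base + geom_label()  # text with a rectangle behind it
```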
As is apparent from the figures, the difference between geom_text() and geom_label() is purely aesthetic: geom_label() draws a rectangle behind the text. geom_label() is also considerably slower.
It is also clear that this type of plot very often suffers from over-plotting. As far as I know, ggplot2 does not provide any automatic, intelligent way to solve it. It is possible to manually adjust the position of labels (see help) or to use check_overlap = TRUE, which is a dirty way.
sub_diamonds %>% ggplot(aes( x = carat, y = price, label = clarity )) + geom_text( check_overlap = TRUE )
check_overlap = TRUE just suppresses the plotting of labels that would overlap with already plotted text. Oh, dear.
geom_smooth()
geom_smooth() returns a smoothed line. It supports multiple smoothing methods: lm, glm, gam, loess, and rlm (given in method). The default method differs according to the number of observations: with method = "auto", loess is used for fewer than 1,000 observations and gam otherwise. It also returns a confidence interval around the smooth. Plotting of the confidence interval can be suppressed by setting se = FALSE.
geom_smooth() understands the aesthetics x, y, alpha, colour, fill, linetype, size, and weight.
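A sketch of what the linear smoother computes, on the full ggplot2::diamonds: with method = "lm" and se = FALSE, the curve is evaluated at n = 80 points and should coincide with predictions from lm() fitted directly. The formula is given explicitly to avoid the informational message.

```r
library(ggplot2)

d <- ggplot2::diamonds

p <- ggplot(d, aes(x = carat, y = price)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
smooth <- layer_data(p)

fit  <- lm(price ~ carat, data = d)
pred <- unname(predict(fit, newdata = data.frame(carat = smooth$x)))

nrow(smooth)                          # 80 evaluation points
isTRUE(all.equal(smooth$y, pred))     # same line as lm() itself
```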
diamonds %>% ggplot( aes( x = carat, y = price ) ) + geom_smooth( fill = "pink", colour = "red" )
Notice that you can easily plot a smoothing curve without plotting the actual observations.
You can also compare multiple smoothing methods by simply adding multiple layers:
diamonds %>% ggplot( aes(x = carat,y = price) ) + geom_point(alpha = 0.4) + geom_smooth(method = "lm", colour = "red", fill = "pink") + geom_smooth(method = "loess", colour = "green", fill = "lightgreen")
ggplot2 contains some tools for investigating the relationship between three variables. geom_raster(), geom_tile(), and geom_rect() provide similar functionality for plotting rectangles, which is useful when plotting a surface on a plane.
geom_raster() is the fastest of the three, and together with data on the estimated density of Old Faithful Geyser eruptions we will use it to demonstrate the use of rectangles.
print(faithfuld, n=5)
## # A tibble: 5,625 × 3
##   eruptions waiting     density
##       <dbl>   <dbl>       <dbl>
## 1  1.600000      43 0.003216159
## 2  1.647297      43 0.003835375
## 3  1.694595      43 0.004435548
## 4  1.741892      43 0.004977614
## 5  1.789189      43 0.005424238
## # ... with 5,620 more rows
faithfuld %>% ggplot(aes(x = waiting, y = eruptions, fill = density)) + geom_raster()
The second figure is created with the option interpolate = TRUE, which delivers nicer output.
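The code for the interpolated figure is not shown; a minimal sketch, assuming it differs from the previous call only in the interpolate argument. The last line reads the parameter back from the stored layer.

```r
library(ggplot2)

# Same raster as above, but the cells are interpolated when drawn
p_smooth <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster(interpolate = TRUE)

p_smooth$layers[[1]]$geom_params$interpolate  # TRUE
```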
Rectangles might be a bit difficult to combine with other geoms. In this case, one can use geom_contour(), which displays the contours of a 3D surface in 2D.
In the following task we want to combine the estimated density of eruptions stored in the table faithfuld with the actual observations from the table faithful.
The first way is to combine both data sets into one table and plot the figure using one layer for the density estimates and another for the actual observations, both drawing on the same table.
However, ggplot2 allows the user to specify a different data set for each layer. We need to:
- use faithfuld as the default data in the ggplot() call,
- set faithful as the data of the layer with the actual observations.

ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_contour(aes(z = density)) +
  geom_point(data = faithful)
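To see that each layer really works with its own data, one can count the rows in the computed data of the second (point) layer: it should contain the 272 observed eruptions from faithful, not the grid from faithfuld.

```r
library(ggplot2)

p <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_contour(aes(z = density)) +  # layer 1: plot-level data (faithfuld)
  geom_point(data = faithful)       # layer 2: overrides the data with faithful

nrow(layer_data(p, 2)) == nrow(faithful)  # TRUE: 272 observations
```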
If the names in faithful were different (e.g. obs_eruptions and obs_waiting), we would need to rewrite it in the following fashion (which would lead to an identical outcome):
ggplot(data = faithfuld, aes( x = waiting, y = eruptions ) ) + geom_contour(aes(z = density)) + geom_point(data = faithful, aes( x = obs_waiting, y = obs_eruptions ))
Many datasets have a spatial dimension that is useful to visualize on a map. There are two different cases:
The first thing you need is a map. Some maps are available in R packages:
- maps (a few maps)
- cshapes (an extremely useful package with historical maps)

However, these packages provide only very basic maps. You will need more than that. You can find a lot of maps on the internet, but they are not constructed for direct use in ggplot2. You need special tools for loading them and converting them into a data.frame (ggplot2 can process only data.frames).
I recommend using data in so-called ESRI shapefiles. Tools for loading them are in the package rgdal. (rgdal is just a frontend: it needs the GDAL library to be installed.)
You will need especially two functions from rgdal:
- readOGR(), which requires the user to specify the source file (directory) and the map layer name. (One shapefile can contain multiple layers.)
- ogrListLayers(), which returns the list of layers available in a shapefile.

Here you can download Czech Republic shapefiles with many layers: https://www.arcdata.cz/produkty/geograficka-data/arccr-500 (License!)
readOGR returns a special spatial S4 class, which you need to transform into a data.frame. (Un)fortunately, there is a method for it implemented in broom::tidy(). (We will talk about broom later; it is also part of the tidyverse.)
broom::tidy() is an alternative to, and future replacement of, ggplot2::fortify().
Download and unpack ArcČR 500. You will eventually get a directory with a lot of stuff (almost 200 files):
dir("./data/ArcCR500_v33.gdb/") %>% head
## [1] "a0000000a.gdbindexes" "a0000000a.gdbtable"   "a0000000a.gdbtablx"
## [4] "a0000000a.spx"        "a0000000b.gdbindexes" "a0000000b.gdbtable"
There is no way to do anything with it without specialized tools.
First we need to know which layers are in the shapefile (we need to feed the layer name into readOGR()):
library(rgdal)
ogrListLayers("./data/ArcCR500_v33.gdb/")
##  [1] "Hranice"                "Zeleznice"
##  [3] "SidlaBody"              "SidlaPlochy"
##  [5] "VyskoveKoty"            "Silnice_2015"
##  [7] "BazinyARaseliniste"     "VodniPlochy"
##  [9] "ZeleznicniStanice"      "VodniToky"
## [11] "Letiste"                "Lesy"
## [13] "ChranenaUzemi"          "Vrstevnice"
## [15] "KladyTopografickychMap" "KladyZakladnichMap"
## [17] "SouradnicovaSitJTSK"    "ZemepisnaSitETRS89"
## [19] "ZemepisnaSitWGS84"      "Silnice_2016"
## attr(,"driver")
## [1] "OpenFileGDB"
## attr(,"nlayers")
## [1] 20
Let's assume that one might be a rail-nut:
readOGR("./data/ArcCR500_v33.gdb/","Zeleznice") -> Zeleznice
## OGR data source with driver: OpenFileGDB
## Source: "./data/ArcCR500_v33.gdb/", layer: "Zeleznice"
## with 3525 features
## It has 5 fields
Zeleznice %>% class()
## [1] "SpatialLinesDataFrame"
## attr(,"package")
## [1] "sp"
Zeleznice %>% typeof()
## [1] "S4"
We need to get a data.frame out of Zeleznice. We will use tidy():
library(broom)
Zeleznice %>% tidy() %>% as_tibble() -> Zeleznice_df
print(Zeleznice_df, n=2)
## # A tibble: 20,284 × 6
##        long      lat order  piece  group    id
##       <dbl>    <dbl> <int> <fctr> <fctr> <chr>
## 1 -823494.8 -1070256     1      1    1.1     1
## 2 -823275.9 -1070347     2      1    1.1     1
## # ... with 2.028e+04 more rows
You can see two major problems here:
- long and lat do not look like WGS84 coordinates! You are damn right: it is the S-JTSK coordinate system.

Plot it using geom_path():
Zeleznice_df %>% ggplot( aes(x=long, y=lat, group=id) ) + geom_path()
geom_path() connects the points in the order in which they appear in the data…
Zeleznice@data %>%
  mutate(id = row_number() %>% as.character()) %>%
  left_join(Zeleznice_df, .) -> Zeleznice_df
print(Zeleznice_df, n=5)
## # A tibble: 20,284 × 11
##        long      lat order  piece  group    id ELEKTRIFIKACE KATEGORIE
##       <dbl>    <dbl> <int> <fctr> <fctr> <chr>         <int>     <int>
## 1 -823494.8 -1070256     1      1    1.1     1             1         2
## 2 -823275.9 -1070347     2      1    1.1     1             1         2
## 3 -822975.1 -1070429     3      1    1.1     1             1         2
## 4 -822726.4 -1070459     4      1    1.1     1             1         2
## 5 -822536.8 -1070516     5      1    1.1     1             1         2
## # ... with 2.028e+04 more rows, and 3 more variables: KOLEJNOST <int>,
## #   ROZCHODNOST <int>, SHAPE_Length <dbl>
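The join above attaches the attribute table to every coordinate of the matching line. A self-contained sketch of the same idea on hypothetical toy tables (coords and attrs stand in for the tidied coordinates and Zeleznice@data); like the code above, it assumes that row i of the attribute table describes the line with id "i".

```r
library(dplyr)

# Hypothetical coordinates of two lines, identified by id
coords <- tibble(
  long = c(0, 1, 2, 0, 1),
  lat  = c(0, 1, 0, 2, 2),
  id   = c("1", "1", "1", "2", "2")
)
# Hypothetical attribute table: one row per line
attrs <- tibble(ELEKTRIFIKACE = c(1L, 0L)) %>%
  mutate(id = as.character(row_number()))  # assumption: row i <-> id "i"

joined <- left_join(coords, attrs, by = "id")
joined$ELEKTRIFIKACE  # 1 1 1 0 0: each point inherits the attributes of its line
```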
Now you can distinguish electrified and non-electrified railways (ELEKTRIFIKACE) in the picture:
Zeleznice_df %>% ggplot( aes(x=long, y=lat, group=id, color = factor(ELEKTRIFIKACE)) ) + geom_path() + coord_fixed()
It would feel natural to add borders to the figure, but we do not have them in the railway layer.
So we need to read a different layer from a different shapefile and put things together.
ogrListLayers("./data/AdministrativniCleneni_v13.gdb/")
##  [1] "ZakladniSidelniJednotkyBody"
##  [2] "UzemneTechnickeJednotkyBody"
##  [3] "UzemneTechnickeJednotkyPolygony"
##  [4] "KatastralniUzemiBody"
##  [5] "KatastralniUzemiPolygony"
##  [6] "MestskeObvodyAMestskeCastiBody"
##  [7] "MestskeObvodyAMestskeCastiPolygony"
##  [8] "CastiObceBody"
##  [9] "CastiObcePolygony"
## [10] "ObceBody"
## [11] "ObcePolygony"
## [12] "ObceSPoverenymUrademBody"
## [13] "ObceSPoverenymUrademPolygony"
## [14] "ObceSRozsirenouPusobnostiBody"
## [15] "ObceSRozsirenouPusobnostiPolygony"
## [16] "OkresyBody"
## [17] "OkresyPolygony"
## [18] "KrajeBody"
## [19] "KrajePolygony"
## [20] "StatBod"
## [21] "StatPolygon"
## [22] "ZakladniSidelniJednotkyPolygony"
## attr(,"driver")
## [1] "OpenFileGDB"
## attr(,"nlayers")
## [1] 22
Read and transform the data:
readOGR("./data/AdministrativniCleneni_v13.gdb/","StatPolygon") %>% tidy() %>% as_tibble() -> CR
## OGR data source with driver: OpenFileGDB
## Source: "./data/AdministrativniCleneni_v13.gdb/", layer: "StatPolygon"
## with 1 features
## It has 30 fields
Zeleznice_df %>% ggplot( aes(x=long, y=lat, group=id, color = factor(ELEKTRIFIKACE)) ) + geom_path() + geom_path( data = CR ) + coord_fixed()
…and we would end up with an error. Can you tell me why?
Zeleznice_df %>%
  ggplot(aes(x = long, y = lat)) +
  geom_path(aes(group = id, color = factor(ELEKTRIFIKACE))) +
  geom_path(data = CR, aes(group = id), color = "black") +
  coord_fixed()
Transformed data from a GPX file produced by Strava.com:
load("data/run_slides.Rdata")
print(run, n=5)
## # A tibble: 476 × 4
##        lon      lat   ele                 time
##      <dbl>    <dbl> <dbl>                <chr>
## 1 16.59912 49.24200 243.4 2016-08-10T16:58:51Z
## 2 16.59916 49.24204 242.7 2016-08-10T16:58:53Z
## 3 16.59922 49.24207 242.2 2016-08-10T16:58:55Z
## 4 16.59926 49.24210 242.1 2016-08-10T16:58:57Z
## 5 16.59930 49.24214 242.3 2016-08-10T16:58:59Z
## # ... with 471 more rows
run %>% ggplot( aes(x = lon, y = lat) ) + geom_line()
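Heart-shaped nonsense… geom_line() connects the points in order of the x variable (here lon), not in the order in which they were recorded.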
run %>% ggplot( aes(x = lon, y = lat) ) + geom_path()
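The difference can be seen on a hypothetical four-point "track" going around a square: geom_line() re-sorts the points by x before connecting them, while geom_path() keeps the order of the rows. layer_data() shows the order in which the points will be connected.

```r
library(ggplot2)

# Hypothetical track: (0,0) -> (1,0) -> (1,1) -> (0,1)
track <- data.frame(lon = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))

line_pts <- layer_data(ggplot(track, aes(x = lon, y = lat)) + geom_line())
path_pts <- layer_data(ggplot(track, aes(x = lon, y = lat)) + geom_path())

line_pts$x  # 0 0 1 1: sorted by x
path_pts$x  # 0 1 1 0: original order
```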
run %>% ggplot( aes(x = lon, y = lat, color = ele) ) + geom_path()
library(ggmap)
lon <- mean(run$lon)
lat <- mean(run$lat)
get_map(location = c(lon, lat), zoom = 16) -> m1
ggmap(m1) -> p
p + geom_path( data = run, aes( x = lon, y = lat ), color = "red", size = 1 ) -> p
p
We will use simulated data from an experiment. The file HW_trial_data.Rdata contains a table trial_data with columns x, control, and treatment holding the observed data:
## Source: local data frame [1,000 x 3]
## Groups: <by row>
##
## # A tibble: 1,000 × 3
##            x   control treatment
##        <dbl>     <dbl>     <dbl>
## 1 -2.6075287 -6.254996 -5.861075
## 2 -0.8257511 -5.875684 -2.458799
## 3 -0.6754555 -2.628165 -2.956889
## # ... with 997 more rows
x is an exogenous variable, and the values in control and treatment are the responses in the control and treatment groups.
I want you to plot a figure like this one:
Everything you need is in this presentation, although some tidyr functions might be useful.