2016

Previously on AVED…

Basics of ggplot2

Each figure produced by ggplot2 consists of three basic components:

  1. Data visualization, composed of one or more overlapping layers. (done)
  2. Elements that allow us to read the visualization: scales (colors, sizes, etc.), legends, and axes.
  3. Data-unrelated elements that define the general appearance of the resulting figure: fonts, grids, background colors, etc.

Scales, legends, and axes

Let's listen to Hadley (Wickham, 2016):

Scales control the mapping from data to aesthetics. They take your data and turn it into something that you can see, like size, colour, position or shape. Scales also provide the tools that let you read the plot: the axes and legends. Formally, each scale is a function from a region in data space (the domain of the scale) to a region in aesthetic space (the range of the scale). The axis or legend is the inverse function: it allows you to convert visual properties back to data.

Scales

We haven't mentioned scales so far, but we have been using them all along. Recall the simplified example from the introduction:

ggplot(
  data = diamonds,
  mapping = 
    aes(
      x = carat,      # X-coordinate
      y = price,      # Y-coordinate
      color = cut,    # color of the point
      size = table    # size of the point
    )
) + 
  geom_point()

Scales

This example specifies only the aesthetic mappings, i.e. which variables should be mapped to which aesthetics. How the mapping is carried out is controlled by scales.

Scales

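In the initial example the scales were left at their default settings, so the call was equivalent to the one below. (Strictly speaking, cut is an ordered factor, for which recent ggplot2 versions default to scale_color_ordinal() rather than scale_color_discrete().)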

ggplot(
  data = diamonds,
  mapping = 
    aes(
      x = carat,      # X-coordinate
      y = price,      # Y-coordinate
      color = cut,    # color of the point
      size = table    # size of the point
    )
) + 
  geom_point() +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_discrete() +
  scale_size_continuous()

Scales

We can use this example to demonstrate some options available via scales:

ggplot(
  data = diamonds,
  mapping = 
    aes(
      x = carat,      # X-coordinate
      y = price,      # Y-coordinate
      color = cut,    # color of the point
      size = table    # size of the point
    )
) + 
  geom_point() -> p

p + scale_x_continuous(trans = "log")         # Log-transformed x axis
p + scale_y_continuous(limits = c(500,1000))  # Limits on the y axis
p + scale_color_brewer(palette = "Paired")    # Colors
p + scale_size_continuous(name = "Size")      # Legend title

Scales: examples

Scales: examples

Scales: Discrete and continuous

The underlying constructors of the various scale_*() functions are discrete_scale() and continuous_scale(). They have many arguments; the most important ones (from the user's point of view) are:

  • name – name of the scale (printed as the legend or axis title)
  • breaks – positions of major breaks
  • minor_breaks – positions of minor breaks (continuous scales only)
  • labels – labels shown at the breaks
  • limits – limits of the scale (e.g. the axis range)
  • trans – transformation of the scale (log, …)
  • position – position of the axis (new in ggplot2 2.2.0)
  • guide – type of guide used to display the scale (e.g. legend or colorbar)
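A minimal sketch of several of these arguments on a continuous y scale, using the diamonds data. The particular breaks, labels, and limits are arbitrary choices for illustration; layer_scales() is used only to peek at the trained scale, and the scale's internal methods (get_breaks(), get_labels()) may differ between ggplot2 versions.

```r
library(ggplot2)

# Scatter plot of price (USD) against carat
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

p_scaled <- p +
  scale_y_continuous(
    name   = "Price (USD)",               # axis title
    breaks = c(0, 5000, 10000, 15000),    # positions of major breaks
    labels = c("0", "5k", "10k", "15k"),  # one label per break
    limits = c(0, 20000)                  # wide enough for all prices
  )

# Inspect the y scale after ggplot2 has trained it on the data
y_scale <- layer_scales(p_scaled)$y
y_scale$get_breaks()
y_scale$get_labels()
```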

See help for more details…

Scales

The names of scales are systematic. They consist of three elements joined by _:

  1. scale
  2. the name of the aesthetic (x, y, color, size, alpha, …)
  3. the name of the scale (discrete, continuous, brewer, hue,…)

Scales are added to a ggplot() call in the same manner as layers, i.e. with +. This is a bit misleading: for layers, + really adds an additional layer, but for scales it replaces the default (or previously defined) scale for the same aesthetic.

Arguments given to a scale_*() function via ... are passed on to the constructor functions.

Scales

ggplot(
  data = diamonds,
  mapping = 
    aes(
      x = carat,
      y = price,
      color = cut
    )) +
  geom_point() -> p
  
p + scale_color_brewer(palette = "Paired") -> p1

p + scale_color_brewer(palette = "Paired") + 
    scale_color_brewer(palette = "Set3") -> p2
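To check that the second brewer scale really replaces the first one (rather than being combined with it), we can look at the colors ggplot2 actually uses. A small sketch: layer_data() returns the data of the built layer, and since cut has five levels, each brewer palette contributes five colors.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point()

# Adding a second colour scale replaces the first one;
# ggplot2 prints a message saying so.
p2 <- p +
  scale_color_brewer(palette = "Paired") +
  scale_color_brewer(palette = "Set3")

# Colours actually used in the built plot (one per level of cut)
used <- unique(layer_data(p2)$colour)
used
```

All five colors come from Set3; none of the Paired colors survive.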

Scales

Colors

Colors deserve special attention: color scales drive the mapping to the aesthetics fill and color. There are four gradient-based methods of mapping for continuous variables and several methods for discrete ones. Let's start with continuous color scales.

Colors: Continuous color scales (continuous, gradient)

  1. scale_fill_continuous() is the default option and is identical to scale_fill_gradient(). It lets the user set the colors for low and high values. ggplot2 will accept any choice of colors, but it is very difficult to come up with a combination that is easy to read for the human eye and brain, so you should prefer ready-made choices.
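Conceptually, a continuous color scale does two things: it rescales the data to [0, 1] and then looks up a color on a palette running from low to high. A minimal sketch of this two-step mapping with functions from the scales package, which ggplot2 builds on; to our understanding seq_gradient_pal() is the palette behind scale_*_gradient(). The density values are made up for illustration.

```r
library(scales)

# Hypothetical density values, e.g. three cells of the geyser raster
density <- c(0.002, 0.010, 0.036)

# Step 1: map the data range onto [0, 1]
x01 <- rescale(density)

# Step 2: interpolate colours between `low` and `high`
pal  <- seq_gradient_pal(low = "white", high = "black")
cols <- pal(x01)
cols
```

The smallest value gets the low color, the largest gets the high color, and everything in between is interpolated.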

Colors: examples

Example of geom_raster() using ggplot2::faithfuld data:

faithfuld %>% 
  ggplot(
    aes(
      x = waiting,
      y = eruptions,
      fill = density
    )
  ) + 
  geom_raster(
    interpolate = TRUE
  ) -> geyser

Example of geom_point() using ggplot2::diamonds data:

ggplot(
  data = diamonds,
  mapping = 
    aes(
      x = carat,
      y = price,
      color = table
    )) +
  geom_point() -> stones

Colors: examples

Default, manual, using munsell package

geyser + scale_fill_gradient()

geyser + scale_fill_gradient(low="white", high="black")

geyser + scale_fill_gradient(
  low = munsell::mnsl("5G 9/2"),
  high = munsell::mnsl("5G 6/8")
)

stones + scale_color_gradient()

stones + scale_color_gradient(low="white", high="black")

stones + scale_color_gradient(
  low = munsell::mnsl("5G 9/2"),
  high = munsell::mnsl("5G 6/8")
)

Colors: examples

Default, manual, using munsell package

Colors: Continuous color scales (gradient2)

  1. scale_fill_gradient2() creates a diverging scale by combining two color gradients: from low to mid and from mid to high. The user can specify the data value at which the mid color is reached via midpoint (default midpoint = 0).
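The midpoint changes how the data are rescaled: the midpoint is placed exactly at 0.5, and only the side farther from it reaches the end of the color range. A minimal sketch with scales::rescale_mid() and scales::div_gradient_pal(), which we assume to be (close to) what scale_*_gradient2() uses internally; the values are hypothetical.

```r
library(scales)

values   <- c(-2, 0, 4)   # hypothetical data, not symmetric around 0
midpoint <- 0

# The midpoint lands exactly at 0.5; the side farther from it reaches 1
x01 <- rescale_mid(values, mid = midpoint)
x01   # 0.25 0.50 1.00

pal  <- div_gradient_pal(low = "blue", mid = "white", high = "red")
cols <- pal(x01)
cols
```

Note that -2 maps to 0.25, so it gets a pale blue, not full blue: the gradient is symmetric around the midpoint in data units.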

Colors: Continuous color scales

faithfuld$density %>% median -> mid

geyser + scale_fill_gradient2(midpoint = mid)
geyser + scale_fill_gradient2(midpoint = mid, 
                              low = "blue", 
                              high = "red", 
                              mid = "white")

diamonds$table %>% median -> mid

stones + scale_color_gradient2(midpoint = mid)
stones + scale_color_gradient2(midpoint = mid,
                               low = "blue",
                               high = "red",
                               mid = "white")

Colors: Continuous color scales

Colors: Continuous color scales (gradientn)

  1. scale_fill_gradientn() lets you use an n-color gradient specified by a vector in the argument colours (or colors). Use it only if there is a strong reason to. I also recommend using gradients prepared by experts; see the following examples.
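A sketch of how an n-color gradient is built, assuming scales::gradient_n_pal() behaves like the palette used by scale_*_gradientn(): the colors sit at equally spaced positions of the rescaled data, and colors in between are interpolated from their neighbors.

```r
library(scales)

# The 7-colour palette from grDevices used in the example below
cols7 <- grDevices::terrain.colors(7)

# Interpolating palette over all seven colours; they sit at equally
# spaced positions 0, 1/6, ..., 1 of the rescaled data
pal <- gradient_n_pal(cols7)

ends   <- pal(c(0, 1))  # first and last colour of the gradient
centre <- pal(0.5)      # 0.5 = 3/6, i.e. the position of the 4th colour
```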

Colors: Continuous color scales

The function terrain.colors(n) from grDevices generates a gradient (palette) of n colors:

geyser + scale_fill_gradientn(colours = terrain.colors(7))

A similar function from the colorspace package:

geyser + scale_fill_gradientn(colours = colorspace::heat_hcl(7))

…and from the viridis package, which provides palettes that work well for color-blind people, and even its own function scale_fill_viridis():

geyser + scale_fill_gradientn(colours = viridis::viridis(7))

Colors: Continuous color scales

Colors: Continuous color scales (distiller)

  1. scale_fill_distiller() applies ColorBrewer colors (see http://colorbrewer2.org/) to continuous data by interpolating between them. It lets the user choose from three types of palettes: seq (sequential), div (diverging), or qual (qualitative). ggplot2 can and will use qual even for continuous data, but there is no reason to do so.

A particular palette can be selected by its name (see the website) or by its number. You can also change the direction of the palette using direction = 1 or direction = -1 (the default for scale_fill_distiller() is direction = -1).
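What direction does can be seen on the underlying ColorBrewer colors. A small sketch, assuming scale_fill_distiller() takes its colors from scales::brewer_pal() before interpolating them: direction = -1 simply reverses the palette.

```r
library(scales)

# Five colours of the sequential ColorBrewer palette YlOrRd, original order
forward <- brewer_pal(type = "seq", palette = "YlOrRd", direction = 1)(5)

# direction = -1 returns the same colours in reverse order
backward <- brewer_pal(type = "seq", palette = "YlOrRd", direction = -1)(5)

forward
backward
```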

Colors: Continuous color scales

geyser + scale_fill_distiller(type = "seq", palette = "YlOrRd", direction = 1)
geyser + scale_fill_distiller(type = "seq", palette = "Oranges", direction = 1)
geyser + scale_fill_distiller(type = "div", palette = "BrBG", direction = 1)

geyser + scale_fill_distiller(type = "seq", palette = "YlOrRd", direction = -1)
geyser + scale_fill_distiller(type = "seq", palette = "Oranges", direction = -1)
geyser + scale_fill_distiller(type = "div", palette = "BrBG", direction = -1)

Colors: Continuous color scales

Colors: Discrete color scales

Example of geom_point() using ggplot2::diamonds data:

diamonds %>% 
  ggplot(aes(
    x = carat,
    y = price,
    color = color
  )) +
  geom_point() + 
  theme_classic() + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank()
  ) -> stones_color

Colors: Discrete color scales

Example of geom_bar() using ggplot2::diamonds data:

diamonds %>% 
  ggplot(aes(
    x = color,
    fill = color
  )) +
  geom_bar() + 
  theme_classic() + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank()
  ) -> stones_fill

Colors: Discrete color scales (hue)

  1. The default color scheme is scale_fill_hue(), which picks evenly spaced hues around the HCL color wheel. HCL is the color model used by ggplot2 here: a color is defined by three components, hue (h, [0, 360]), chroma (c), and luminance (l, [0, 100]). scale_fill_hue() returns evenly spaced hues with equal chroma and luminance.
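The idea behind the hue palette can be sketched by hand with grDevices::hcl(). even_hues() below is a hypothetical helper, not part of ggplot2; its defaults start = 15, c = 100, l = 65 mirror the defaults of scale_*_hue() (h = c(0, 360) + 15), but the exact colors may differ slightly from ggplot2's own implementation.

```r
library(ggplot2)

# Hypothetical helper: n hues evenly spaced around the full HCL wheel,
# all with the same chroma (c) and luminance (l)
even_hues <- function(n, start = 15, c = 100, l = 65) {
  hues <- (start + (seq_len(n) - 1) * 360 / n) %% 360
  grDevices::hcl(h = hues, c = c, l = l)
}

pal7 <- even_hues(nlevels(diamonds$color))   # 7 levels: D to J
pal7

# Use the hand-made palette via a manual scale
p_hue <- ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point() +
  scale_color_manual(values = pal7)
```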

Colors: Discrete color scales

The user can control the range of hues as well as the values of chroma and luminance. See the example:

stones_color + scale_color_hue()
stones_color + scale_color_hue(c = 50, l = 10)
stones_color + scale_color_hue(h = c(100,200))

stones_fill + scale_fill_hue()
stones_fill + scale_fill_hue(c = 50, l = 10)
stones_fill + scale_fill_hue(h = c(100,200))

Colors: Discrete color scales

Colors: Discrete color scales (brewer)

It is very difficult to find good colors by tuning hues. Therefore it is worth trying some ready-made palettes.

  1. scale_fill_brewer() lets the user apply palettes from http://colorbrewer2.org/ (qualitative ones, but also sequential and diverging ones).

H.W. recommends the qualitative palettes Set1 and Dark2 for points and Set2, Pastel1, Pastel2, and Accent for areas. Sequential palettes also make sense for ordered categories.
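Each ColorBrewer palette has a maximum number of colors; if a variable has more levels than that, some levels end up without a distinct color. A small sketch using the palette table shipped with the RColorBrewer package to list the qualitative palettes that can cover the 7 levels of diamonds$color.

```r
library(RColorBrewer)

# Qualitative ColorBrewer palettes and the maximum number of colours each has
qual <- brewer.pal.info[brewer.pal.info$category == "qual", ]
qual

# diamonds$color has 7 levels (D to J); keep palettes that can cover them
n_levels <- nlevels(ggplot2::diamonds$color)
enough <- rownames(qual)[qual$maxcolors >= n_levels]
enough
```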

Colors: Discrete color scales

stones_color + scale_color_brewer(type = "qual", palette = "Set1")
stones_color + scale_color_brewer(type = "qual", palette = "Pastel2")
stones_color + scale_color_brewer(type = "seq", palette = "YlOrRd")

stones_fill + scale_fill_brewer(type = "qual", palette = "Set1")
stones_fill + scale_fill_brewer(type = "qual", palette = "Pastel2")
stones_fill + scale_fill_brewer(type = "seq", palette = "YlOrRd")

Colors: Discrete color scales

Colors: Discrete color scales (grey)

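  1. scale_fill_grey() provides a black-and-white palette for discrete data. start and end set the grey levels of the first and last category on a scale from 0 (black) to 1 (white); the defaults start = 0.2 and end = 0.8 run from dark to light.

stones_color + scale_color_grey()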
stones_color + scale_color_grey(start = 0.5, end = 1)
stones_color + scale_color_grey(start = 0, end = 1)

stones_fill + scale_fill_grey()
stones_fill + scale_fill_grey(start = 0.5, end = 1)
stones_fill + scale_fill_grey(start = 0, end = 1)
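The meaning of start and end can be checked on the palette itself. A sketch assuming scales::grey_pal() is the palette used by scale_*_grey(); it also shows why end = 1 is risky on a white background: the last category becomes pure white.

```r
library(scales)

# Grey shades between `start` and `end`, where 0 = black and 1 = white
default_greys <- grey_pal(start = 0.2, end = 0.8)(3)  # scale_*_grey() defaults: dark to light
full_greys    <- grey_pal(start = 0, end = 1)(3)      # black, mid grey, white

default_greys
full_greys
```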

Colors: Discrete color scales

Colors: Discrete color scales

  1. scale_fill_manual() and scale_colour_manual() allow the user to define their own palette or to use a palette from a different package.

  2. scale_fill_identity() and scale_colour_identity() use the values of the mapped variable directly, i.e. the variable must already contain valid colors.
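A minimal sketch contrasting the two approaches on hypothetical toy data: with a manual scale we supply the mapping from levels to colors, while with an identity scale the column col already holds color names and is used as is.

```r
library(ggplot2)

# Hypothetical toy data: `col` already holds valid R colour names
df <- data.frame(
  x   = 1:3,
  y   = c(2, 1, 3),
  grp = c("a", "b", "c"),
  col = c("red", "darkgreen", "navy")
)

# Manual scale: our own mapping from levels of `grp` to colours
p_manual <- ggplot(df, aes(x, y, colour = grp)) +
  geom_point(size = 4) +
  scale_colour_manual(values = c(a = "orange", b = "purple", c = "grey40"))

# Identity scale: the values of `col` are used as colours directly
p_identity <- ggplot(df, aes(x, y, colour = col)) +
  geom_point(size = 4) +
  scale_colour_identity()
```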

Scales: Beyond colors

There are more aesthetics than fill and color, and there are specialized scale functions for them too. Their names use the same suffixes:

  • continuous – default for continuous data
  • discrete – default for discrete data
  • identity – uses directly values given in a scaled variable
  • manual – allows user to define his/her own rules

Positioning

Four things drive the positioning of observations on the page: scale transformations, position adjustments, faceting, and coordinate systems. Scale transformations were discussed above; the other three are described below.

Position adjustments

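The layer() function (and therefore every geom_*() function) has a position argument with the following options:

  • identity – no position adjustment (default for most geom_*() functions)
  • jitter – add random noise to points to avoid overplotting
  • dodge – avoid overlapping by placing objects side by side
  • stack – put overlapping objects on top of each other
  • fill – stack overlapping objects and normalize each stack to height 1
  • nudge – shift all objects by a fixed x and y distance (typically used for text labels)
  • jitterdodge – combines jitter and dodge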

Jittering

Jittering is a technique for avoiding overplotting, especially in scatter plots. The coordinates of each observation are shifted randomly within specified limits.

The most common way to jitter is geom_jitter(), a shortcut for geom_point(position = "jitter"):

geom_jitter(mapping = NULL, data = NULL, stat = "identity",
  position = "jitter", width = NULL, height = NULL,...)

  • width : Amount of horizontal jitter. The jitter is added in both positive and negative directions, so the total spread is twice the value specified here. Defaults to 40% of the resolution of the data.
  • height : Amount of vertical jitter. The jitter is added in both positive and negative directions, so the total spread is twice the value specified here. Defaults to 40% of the resolution of the data.
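A small sketch confirming that width and height bound the random shifts, using hypothetical data of 50 observations stacked on a single spot. The seed argument of position_jitter() (available in newer ggplot2 versions) makes the jitter reproducible.

```r
library(ggplot2)

# Hypothetical data: 50 observations at exactly the same point (1, 2)
df <- data.frame(x = rep(1, 50), y = rep(2, 50))

p <- ggplot(df, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.2, seed = 42))

jittered <- layer_data(p)
range(jittered$x)   # within 1 +/- 0.1
range(jittered$y)   # within 2 +/- 0.2
```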

Jittering

Recall an example from the introduction:

diamonds %>%
  ggplot(data = ., mapping = aes(x = x, y = y)) +
    # x -- length of stones (mm)
    # y -- width of stones (mm)
  geom_point(
    color = "black"
  ) +
  geom_jitter(
    color = "blue",
    width = 1,
    height = 1,
    alpha = 0.3
  )

Jittering

Stack, dodge and fill

These options are commonly used in bar plots to put geoms (bars) on top of each other, next to each other, or to show the shares of categories. See the example:

mean.price <- diamonds$price %>% mean

diamonds %>% 
  mutate(
    high.price = price > mean.price
  ) %>% 
  ggplot(
    aes(
      x = cut,
      fill = high.price
    )
  ) + scale_fill_brewer(name = "High price", 
                        type = "qual", palette = "Set2") -> p

Stack, dodge and fill

p +
geom_bar(
    position = "stack" # default value
  )

The resulting figure does not give a clear idea of the ratio of low- to high-price stones in each category. You can get a clear picture by setting position = "fill".

Stack, dodge and fill

p +
geom_bar(
    position = "fill"
  )

This figure nicely shows the shares within groups (cuts), but it does not allow a comparison of counts among groups. For that purpose, use position = "dodge".
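The difference between "stack" and "fill" is visible in the built bar heights. A sketch using the same high-price indicator as above; bar_tops() is a hypothetical helper, and unclass() is used because newer ggplot2 versions store discrete x positions in a special numeric class.

```r
library(ggplot2)
library(dplyr)

# Same indicator as above: is the price above the overall mean price?
stones <- diamonds %>%
  mutate(high.price = price > mean(price))

p_stack <- ggplot(stones, aes(x = cut, fill = high.price)) +
  geom_bar(position = "stack")
p_fill <- ggplot(stones, aes(x = cut, fill = high.price)) +
  geom_bar(position = "fill")

# Hypothetical helper: height of the top of each bar, one value per cut
bar_tops <- function(plot) {
  d <- layer_data(plot)
  tapply(d$ymax, unclass(d$x), max)
}

bar_tops(p_stack)  # number of stones per cut
bar_tops(p_fill)   # always 1: shares within each cut
```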

Stack, dodge and fill

p +
geom_bar(
    position = "dodge"
  )

Faceting

ggplot2 allows you to break a single figure into multiple "facets". See the example:

Faceting

There are two functions that can arrange faceting for you:

  • facet_wrap() (used on the previous slide) takes a variable or a combination of multiple variables and creates a "subfigure" (panel) for each level or combination of levels.
  • facet_grid() creates a matrix of panels defined by row and column faceting variables.

Faceting

facet_wrap()

facet_wrap(facets, nrow = NULL, ncol = NULL, scales = "fixed",
  shrink = TRUE, labeller = "label_value", as.table = TRUE,
  switch = NULL, drop = TRUE, dir = "h", strip.position = "top")

Arguments:

  • facets – either a formula or a character vector.

You can get identical results in the example with + facet_wrap(~cut) and + facet_wrap("cut"). You will learn about formulas in the "Econometrics in R" lecture.

  • nrow and ncol – number of rows and columns
  • scales – All subfigures have identical scales by default. You can change this behavior using argument scales with options fixed (default), free, free_x, and free_y.
  • strip.position – position of the strip labels: "top" (default), "bottom", "left", or "right"

Faceting

facet_grid()

facet_grid(facets, margins = FALSE, scales = "fixed", space = "fixed",
  shrink = TRUE, labeller = "label_value", as.table = TRUE,
  switch = NULL, drop = TRUE)

Arguments:

  • facets – a formula with rows on the LHS and columns on the RHS
  • margins – if TRUE, adds an extra row and column of "(all)" panels that pool the observations across rows/columns
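A quick way to see what margins = TRUE does is to count panels. A sketch on hypothetical toy data with 2 row levels and 3 column levels; PANEL is the panel index that ggplot2 stores in the built layer data.

```r
library(ggplot2)

# Hypothetical toy data covering every combination of two facet variables
df <- expand.grid(row_var = c("a", "b"), col_var = c("x", "y", "z"))
df$value <- seq_len(nrow(df))

p <- ggplot(df, aes(x = value, y = value)) + geom_point()

# 2 rows x 3 columns = 6 panels
n_plain <- nlevels(layer_data(p + facet_grid(row_var ~ col_var))$PANEL)
# margins = TRUE adds an "(all)" row and column: 3 x 4 = 12 panels
n_margins <- nlevels(layer_data(p + facet_grid(row_var ~ col_var, margins = TRUE))$PANEL)
c(n_plain, n_margins)
```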

Faceting

Run examples

diamonds %>%
    sample_n(500) %>% 
    ggplot(data = ., mapping = aes(x = carat, y = price)) +
    geom_point() -> p

p + facet_wrap(~cut, ncol = 3)
p + facet_wrap("cut", ncol = 3, scales = "free")
p + facet_wrap(c("cut","color"))

p + facet_grid(cut ~ color)
p + facet_grid(cut ~ color, margins = TRUE)

Faceting

Change labels

To change the strip labels you can:

  1. Modify your data (e.g. change factor labels): sometimes efficient, but always a dirty solution
  2. Use the labeller argument

Faceting

Change labels with labeller

labeller is a function that takes the values of the faceting variables and returns the text displayed in the panel strips. as_labeller() turns a named character vector into such a function.

You can use labeller to change strip labels – see example:

aux <- c(
    "Fair" = "Fair cut",
    "Good" = "Good cut",
    "Very Good" = "Very good cut",
    "Premium" = "Premium cut",
    "Ideal" = "Ideal cut"
)

p + facet_wrap("cut", labeller = as_labeller(aux))

Faceting

Change labels with labeller

Coordinate systems: Linear coordinate systems

ggplot2 supports both linear and non-linear coordinate systems. In most cases we use the default coord_cartesian(), a linear system in which the position of an element is given by its x and y coordinates.

The following example uses a simple scatter plot to illustrate the default properties of coord_cartesian(). It uses data from VGAMdata on arrow shots; each shot is described by its X and Y coordinates.

Coordinate systems: Linear coordinate systems

## # A tibble: 126 × 3
##         X     Y archer
##     <dbl> <dbl>  <dbl>
## 1   24.14 -9.55      1
## 2   28.55  6.57      1
## 3    3.97  0.46      1
## 4   28.57 26.84      1
## 5   -3.43  8.57      1
## 6    9.68 16.33      1
## 7   -5.95 20.73      1
## 8   17.32  4.59      1
## 9   -0.48 -7.72      1
## 10 -18.42 -5.64      1
## # ... with 116 more rows

Coordinate systems: Linear coordinate systems

coord_fixed()

coord_cartesian() lets the aspect ratio adapt to the size of the figure. That is misleading in this case, because both axes use the same units. Luckily, coord_fixed() lets the user fix the ratio, i.e. the number of units on the y axis equivalent to one unit on the x axis:

p + coord_fixed(ratio = 1) # ratio = 1 is default value

Coordinate systems: Linear coordinate systems

coord_fixed()

Coordinate systems: Linear coordinate systems

coord_flip()

Another function that modifies the linear coordinate system is coord_flip(), which simply swaps the x and y axes:

p + coord_flip()  # a plot has a single coordinate system: adding coord_flip()
                  # after coord_fixed() would replace coord_fixed(), not combine with it

Coordinate systems: Linear coordinate systems

coord_flip()

Coordinate systems: Linear coordinate systems

The linear coord_*() functions (coord_cartesian(), coord_fixed(), coord_flip()) accept the arguments xlim and ylim to zoom into part of the figure. Let's zoom into the first quadrant:

p + coord_fixed(xlim = c(0,30), ylim = c(0,30))

Coordinate systems vs. scales

Scales provide similar functionality through their limits argument. The two may look identical, but they differ fundamentally. Let's see an example using simulated data:

tibble(
  x = seq(from=-10, to=10, by=0.1)
) %>% 
  mutate(
    y = x^2 # ...and we have a parabola
  ) %>% 
  ggplot(
    aes(x=x,y=y)
  ) + geom_point() -> p

Coordinate systems vs. scales

We can plot it with the OLS fitted line and then limit the figure to the first quadrant using coord_cartesian() and scales, respectively:

p <- p + geom_smooth(method = "lm", se=FALSE)

p1 <- p + coord_cartesian(xlim = c(0,10), ylim = c(0,100))

p2 <- p + coord_cartesian() +
  scale_x_continuous(limits = c(0,10)) +
  scale_y_continuous(limits = c(0,100))

Coordinate systems vs. scales

Coordinate systems vs. scales

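The difference is clear. Limits set by the coordinate system truly zoom into the figure, and all observations are still taken into account (here, in the OLS fit). Limits set by scales remove the observations outside the limits completely, so the fit is computed from the remaining points only.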
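The same comparison can be checked numerically by looking at the fitted line that geom_smooth() computes (layer 2 of the plot). A sketch reusing the parabola with a plain data.frame: with coord limits the fit uses the whole symmetric parabola and is flat, while with scale limits only the right half is fitted, giving a rising line. Fitted values that fall outside the y limits are set to NA by ggplot2's default out-of-bounds handling, hence the is.finite() filter.

```r
library(ggplot2)

# The parabola from above
df <- data.frame(x = seq(from = -10, to = 10, by = 0.1))
df$y <- df$x^2

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Zoom with the coordinate system: the fit still uses all points
p_coord <- p + coord_cartesian(xlim = c(0, 10), ylim = c(0, 100))
# Limits on scales: points outside are dropped before the fit is computed
p_scale <- p + scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 100))

fit_coord <- layer_data(p_coord, 2)   # data of the geom_smooth layer
fit_scale <- layer_data(p_scale, 2)

range(fit_coord$y)                 # flat line over the full parabola
range(fit_scale$y, na.rm = TRUE)   # rising line fitted to x >= 0 only
```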

Coordinate system: Non-linear coordinate systems

ggplot2 also supports non-linear coordinate systems. Two of them, polar coordinates (coord_polar()) and map projections (coord_map()), are used quite rarely.

Themes

Data-unrelated elements are controlled via themes, which allow the user to completely change the appearance of a figure.

Themes

Complete schemes in ggplot2

Themes

Complete schemes in ggplot2

Data-related elements are the same in all figures! Themes control only data-unrelated features such as fonts, grids, background colors, font sizes, and so forth.

Themes

The theming system in ggplot2 consists of the following components:

  • elements specify the non-data components that can be controlled, e.g. the plot.title element controls the appearance of the plot title.
  • each element is associated with an element function, which describes the visual properties of the element. There are four basic ones: element_text(), element_line(), element_rect(), and element_blank().

element_text(family = NULL, face = NULL, colour = NULL, size = NULL,
  hjust = NULL, vjust = NULL, angle = NULL, lineheight = NULL,
  color = NULL, margin = NULL, debug = NULL)

element_line(colour = NULL, size = NULL, linetype = NULL,
  lineend = NULL, color = NULL)

element_rect(fill = NULL, colour = NULL, size = NULL, linetype = NULL,
  color = NULL)

Themes

  • the theme() function, which lets the user set the theme elements, e.g. by the call:

p + theme(plot.title = element_text(size = 20))

As with colors, it is difficult to come up with a nice theme. Therefore ggplot2 offers a number of ready-to-use complete themes (see examples in the handout).

Themes: Example

We will demonstrate the use of themes by mimicking the appearance of a figure from an OECD report:

Themes: Example

Let's create simulated data:

expand.grid(
  country = c("MEX","USA","CAN"),
  year = 2005:2014
) %>% rowwise %>%  
  mutate(
    value = rnorm(1, mean = 0.25, sd = 2)
  ) %>% 
  group_by(country) %>% 
  mutate(
    value = cumsum(value)
  ) -> oecd

save(oecd, file="data/oecd_sim.Rdata")

Themes: Example

And the very first figure:

oecd %>% 
  ggplot(
    aes(x=year,y=value)
  ) +
  geom_line(
    aes(color = country)
  ) +
  labs(
      title = "QuasiOECD Figure",
      subtitle = "This is subtitle",
      caption = "Source: Simulated data"
      ) -> p

New in ggplot2 2.2.0: you can add a title, subtitle, and caption using the function labs().

Themes: Example

Themes: Example

First, we adjust the data-related elements to match the OECD figure:

p + scale_color_manual(
  # Set name of the scale
  name = "Country",
  # Set colors as a named vector (see help)
  values = c(
    "CAN" = "black",
    "MEX" = "blue",
    "USA" = "grey50"
  ),
  # Set order in the legend -- see help for scale_discrete()
  breaks = c("CAN","MEX","USA"),
  # Set labels in the legend -- see help for scale_discrete()
  labels = c(
    "CAN" = "Canada",
    "MEX" = "Mexico",
    "USA" = "United States"
  )
) +
  scale_x_continuous(
    breaks = unique(oecd$year)
  ) -> p

Themes: Example

Themes: Example

Now we can adjust the data-unrelated elements. The easiest way is to modify a complete theme, but we will do it from scratch.

First, we set the plot elements: background, margin, and title:

p +
  theme(
    # Background properties
    plot.background = element_rect(fill="pink",            # Fill color
                                   linetype = 3,           # Border linetype 
                                   size = 5,               # Border size
                                   color = "yellow"),      # Border color
    plot.title = element_text(family = "times",            # Font family
                              face = "bold",               # Font face
                              color = "red",               # Font color
                              angle = 180                  # Text angle
                              ),                              
    # Margin is set by a special function margin() -- no element_*()
    plot.margin = margin(t = 20, r = 0, b = 5, l = 5)
  ) -> pde

Themes: Example

Themes: Example

No, not even close. Let's try it again:

p + theme(
  # We will use element_blank() to remove 
  # plot.title. element_blank() draws nothing
  # and assigns no space.
  plot.title = element_blank(),
  plot.subtitle = element_blank(),
  plot.caption = element_blank()
) -> p

Themes: Example

Themes: Example

We will proceed with axis elements:

p + theme(
  # axis.line = element_line() # controls lines parallel to axis
  axis.title = element_blank(), # There are no axis titles in OECD Figure
  # axis.title.x = element_text()
  axis.ticks = element_blank(), 
       # There are no actual ticks in OECD Figure,
       # normally set by element_line()
  # Length of ticks is again set by a special function unit()
  # axis.ticks.length = unit(10, units="pt")
  axis.text = element_text(
    color = "black", 
    size = 11
  )
) -> p

Themes: Example

Themes: Example

Now it is time to modify the legend elements:

p + theme(
  legend.background = element_rect(
    fill = "grey90",                     # light grey background
    size = NA                            # no border
  ),
  legend.key = element_rect(
    fill = NA,                            # use no extra fill for keys
    color = NA                            # and no border
  ),
  legend.key.width = unit(30, units = "pt"), # make it a bit longer
  legend.title = element_blank(),         # There is no name in OECD Figure
  legend.position = "top",
  legend.direction = "horizontal",
  legend.text = element_text(
    size = 11
  )
) -> p

Themes: Example

Themes: Example

And finally panel elements:

p + theme(
  panel.background = element_rect(
    fill = "#e1fcfd"                    # Super-light blue as a RGB code
  ),
  panel.border = element_rect(
    color = "black",
    fill = NA
  ),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  aspect.ratio = 1
) -> p

Themes: Example

…and it is almost as ugly as the original.

Export figures

Export figures with ggsave()

ggsave() saves the last plot displayed (unless another plot is passed via its plot argument). It supports export to multiple vector (pdf, svg, eps/ps, and wmf) and bitmap (png, jpeg, tiff, bmp) formats.

Vector graphics are highly recommended. Vector formats store information on all elements in the figure (their positions and other properties), which allows the user to scale them without loss of quality (no blurred edges etc.). On the other hand, this may result in a considerable file size, especially for scatter plots with many (often overlapping) observations. In this situation you can consider using bitmaps.
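A minimal sketch of exporting the same plot to a bitmap and a vector format. The file names are hypothetical and written to a temporary directory; the device is guessed from the file extension, width and height are in inches by default, and dpi matters only for bitmaps.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# Hypothetical output paths in a temporary directory
png_file <- file.path(tempdir(), "diamonds.png")
pdf_file <- file.path(tempdir(), "diamonds.pdf")

ggsave(png_file, plot = p, width = 6, height = 4, dpi = 150)  # bitmap
ggsave(pdf_file, plot = p, width = 6, height = 4)             # vector

file.size(c(png_file, pdf_file))   # compare the sizes on your machine
```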

Example

The following figure takes less than 200 KB as PNG (a bitmap), over 11 MB as PDF, and 26 MB as SVG.

Export figures with ggsave()

The choice of format also depends on the intended use: PNG and SVG are designed for websites, PDF and EPS/PS for (e-)printed documents. Make sure that your text-processing tool of choice can handle vector graphics!

Best practices

What (not) to do

ggplot2 is a powerful tool which allows you to do pretty much anything you want (incl. pie charts) – but you should try to use it effectively.

Some advice from Tufte (2001) on Principles of graphical excellence:

  • Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
  • Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
  • Graphical excellence is nearly always multivariate.
  • Graphical excellence requires telling the truth about the data.

Think of graphical excellence when plotting.

Homework 2

Homework

Use the table VGAMdata::oly12 and compare the weight and height of athletes at the London 2012 Summer Olympic Games with the BMI limits set by the WHO.

Your results should look like one of the following figures. (Feel free to choose your favorite colors; all other features are mandatory.)

  • Use BMI formula and limits defined by WHO: http://apps.who.int/bmi/index.jsp?introPage=intro_3.html
  • You can add BMI limits as an additional table (1st solution) or as a set of functions (2nd solution).
  • You may find it helpful to use tidyr, dplyr, and, of course, ggplot2
  • You are supposed to submit code – not a figure

Homework: 1st solution

Homework: 2nd solution