Statistics in Computer Science
Seminar Exercises
Stanislav Katina, Mojmir Vinkler
November 26, 2016
1 Location characteristics
Exercise 1 (Code Vectorization). Implement two versions of mean function in R - thee ﬁrst
one is mean_slow = function(x) and use for loop to sum numbers. The second version is
mean_fast = function(x) and use built-in function sum to sum numbers. Generate very long
random vector of uniformly distributed random numbers and compare performance of both
functions and also of built-in mean function.
Hints. Use runif(...) for generating random numbers and system.time({...}) for proﬁling.
Realizations will be denoted as x1, x2, . . . , xn, sorted realizations as x(1) ≤ x(2) ≤
. . . ≤ x(n). Then we deﬁne following estimations of location characteristics (sample location
characteristics):
• sample minimum Xmin, with realization xmin = x(1);
• sample maximum Xmax, with realization xmax = x(n);
• sample (arithmetic) mean X, with realization x = 1
n
n
i=1 xi = 1
n
nj
j=1 xjfj, nj
n, where fj are frequencies (counts) of xj and n = j fj;
• sample mode Xmod, with realization xmod is the most common value (in case of
discrete variable it is value x in which probability function has its maximum; in case
of continuous variable it is value x in which density function has its maximum);
• sample median X(robust estimation of location), with realization
x =
x(n+1
2 ) if n is odd,
1
2
x(n
2 ) + x(n
2
+1) if n is even;
distribution is symmetric, if x = x = xmod, distribution is positively skewed (right),
if x > x > xmod and distribution is negatively skewed (left), if x < x < xmod;
1
• sample percentile Xp (read as 100p-percentile), with realization xp deﬁned as
xp =
x(k+1) for k = np,
1
2
x(k) + x(k+1) for k = np,
where k = np , which is ﬂoor of np;
• sample quartiles there are three
– sample ﬁrst (lower) quartile Q1, with realization x0.25 is a value that splits oﬀ
the lowest 25% of data from the highest 75%,
Pr [xmin, x0.25] = Pr [X ≤ x0.25] =
1
4
, Pr [x0.25, xmax] = Pr [X ≥ x0.25] =
3
4
;
– sample second quartile (median) Q2, with realization x0.5 = x is a value that
splits oﬀ the lowest 50% of data from the highest 50%,,
Pr [xmin, x0.5] = Pr [X ≤ x0.5] =
1
2
, Pr [x0.5, xmax] = Pr [X ≥ x0.5] =
1
2
;
– sample third (upper) quartile Q3, with realization x0.75 is a value that splits
oﬀ the lowest 75% of data from the highest 25%,,
Pr [xmin, x0.75] = Pr [X ≤ x0.75] =
3
4
, Pr [x0.75, xmax] = Pr [X ≥ x0.75] =
1
4
;
• sample deciles Xk, with realizations xk splits data to ten buckets, i.e. k/10 of data
are lower than a decile and (10 − k)/10 are higher, where k ∈ {0, 1, . . . , 10};
• sample ﬁve-number summary (Xmin, Q1, Q2, Q3, Xmax)T
, with realizations (xmin,
x0.25, x0.50, x0.75, xmax)T
.
Robust location characteristics (resistant to outliers) are
• sample γ-truncated arithmetic average Xg, with realization xg that is calculated
as
xg =
1
n − 2g
x(g+1) + x(g+2) + . . . + x(n−g) ,
where g = γn , γ = 0.1, 0.2. More than γ100 % observations must be replaced for
the γ-truncated average to become large or small relative to the original [1
breakdown
point xg is therefore γ],
1
Breakdown point represents number of observations we need to signiﬁcantly change value of location
characteristics. For γ-truncated and γ-winsorized arithmetic average it is γn observations, for median n/2
observations and for simple arithmetic average changing just one observation is enough (that’s the reason
we say that arithmetic average is very sensitive to outliers).
2
• sample γ-winsorized arithmetic average Xw, with realization xw is deﬁned as
xw =
1
n
(g + 1)x(g+1) + x(g+2) + . . . + (g + 1)x(n−g) .
More than γ100 % must be replaced for the γ-winsorized average to become large or
small relative to the original [breakdown point xw is therefore γ].
Exercise 2 (height of 10-year old girls). Let’s have n = 12 heights (in cm) of randomly
sampled 10-year old girls sorted by height (order denoted as ri for x(i); in case the values
are equal, ri is calculated as average of their order numbers).
Table 1: Sorted realizations xi and their order ri for heights of 10-year old girls
i 1 2 3 4 5 6 7 8 9 10 11 12
x(i) 131 132 135 141 141 141 141 142 143 146 146 151
ri 1 2 3 5.5 5.5 5.5 5.5 8 9 10.5 10.5 12
Then x
·
= 140.83, x = 1
2
x(6) + x(7) = 141, x0.25 = 1
2
x(3) + x(4) = 138, where k =
12 × 0.25 = 3, Q3 = x0.75 = 1
2
x(9) + x(10) = 144.5, where k = 12 × 0.75 = 9.
Write functions for calculation of all location characteristics. Verify your functions on
characteristics above. Don’t use built-in functions for location characteristics such as mean,
quantile, etc. Use γ = 0.1 for truncated and winsorized arithmetic averages.
2 Spread (variability) characteristics
Then we deﬁne following estimations of spread (variability) characteristics (sample spread
characteristics):
• sample variance S2
, with realization
s2
= s2
n−1 = s2
x =
1
n − 1
n
i=1
(xi − x)2
;
under linear transformation sample variance changes like this2
s2
y = s2
a+bx = b2
s2
x,
i.e.
s2
y = s2
a+bx =
1
n − 1
n
i=1
a + bxi − a + bx
2
=
1
n − 1
n
i=1
(a + bxi − (a + bx))2
=
1
n − 1
n
i=1
(b (xi − x))2
= b2
s2
x;
2
Equation tells us that variance of shifted and rescaled variable y is equal to square of scale multiplied
by variance of original variable x.
3
• sample standard deviation S, with realization
s = sn−1 = sx = s2
x;
under linear transformation standard deviation changes like this
sy = sa+bx = |b| sx,
• coeﬃcient of variation Vk, with realization vk represents normalized form of standard
deviation (inversion of signal-to-noise ratio; fraction of variability to mean)
vk =
sx
x
;
it is usually denoted in percentage points, i.e. 100 × (sx/x) % and can be used only
for realizations with positive values;
• sample variance of arithmetic average S2
X
, with realization
s2
x =
s2
x
n
;
• sample standard deviation of arithmetic average (sample standard error)
SX, with realization
sx =
sx
√
n
;
• sample skewness B1, with realization
b1 =
n−1 n
i=1 (xi − x)3
n−1 n
i=1 (xi − x)2 3/2
=
√
n n
i=1 (xi − x)3
n
i=1 (xi − x)2 3/2
,
distribution is symmetric, if b1 = 0, positive skewness (density on the left side is steeper
than the right side), if b1 > 0 a negative skewness (density on the right side is steeper
than the left side), if b1 < 0;
• sample kurtosis B2, with realization
b2 =
n−1 n
i=1 (xi − x)4
n−1 n
i=1 (xi − x)2 2 − 3 =
n n
i=1 (xi − x)4
n
i=1 (xi − x)2 2 − 3,
distribution is normal (mesokurtic), if b2 = 0, pointy (leptokurtic), if b2 > 0 and ﬂat
(platykurtic), if b2 < 0;
• sample sum of squares SS = n
i=1 Xi − X
2
, with realization
SSobs =
n
i=1
(xi − x)2
,
4
• sample sum of absolute deviation SAD = n
i=1 |Xi − X0.5|, with realization
SADobs =
n
i=1
|xi − x0.5| ;
• sample arithmetic average deviation MAD = 1
n
n
i=1 |Xi −X0.5|, with realization
MADobs = SADobs/n;
• sample range D = Xmax − Xmin, with realization
dobs = xmax − xmin;
• sample interquartile range DQ = Q3 − Q1, with realization
dQ = x0.75 − x0.25;
distribution is (between quartiles) symmetric, if x0.75 − x0.50 = x0.50 − x0.25, positively
skewed, if x0.75 −x0.50 > x0.50 −x0.25 and negatively skewed, if x0.75 −x0.50 < x0.50 −x0.25;
• sample decile range DD = X0.9 − X0.1, with realization
dD = x0.9 − x0.1;
• sample percentile range DP = X0.99 − X0.01, with realization
dP = x0.99 − x0.01.
Robust spread characteristics (variability) are
• sample γ-truncated variance S2
tg, with realization s2
tg calculated as
s2
tg =
1
n − 2g − 1
n−g
i=g+1
x(i) − xtg
2
;
more than γ100 % must be replaced so that γ-truncated variance changes to large or
small relative to the original s2
[breakdown point s2
tg is γ]; it applies that s2
tg < s2
because truncating removes outliers;
• sample γ-winsorized variance S2
wg, with realizations s2
wg calculated as
s2
wg =
1
n − 1
(g + 1)(x(g+1) − xwg)2
+ (x(g+2) − xwg)2
+ . . . + (g + 1)(x(n−g) − xwg)2
;
more than γ100 % must be replaced so that gamma-winsorized variance changes to
large or small relative to the original s2
[breakdown point s2
wg is γ]; it applies that
s2
wg < s2
because winsorization pulls outliers closer to the mean;
5
• sample quartile coeﬃcient of variation Vk,Q = (Q3 −Q1)/Q2, with realization vk,Q
calculated as
vk,Q =
x0.75 − x0.25
x
.
Other robust spread characteristics characterized by modiﬁed boundaries are
• sample robust minimum and maximum (“inner boundaries”) X∗
min = BL = Q1 −
1.5DQ and X∗
max = BU = Q1 + 1.5DQ, with realizations deﬁned as
x∗
min = bL = x0.25 − 1.5 (x0.75 − x0.25) ,
x∗
max = bU = x0.75 + 1.5 (x0.75 − x0.25) ,
values outside of boundaries are considered to be suspicious, potential outliers;
• sample robust minimum and maximum (“outer boundaries”) deﬁned as B∗
U =
Q1 − 3(Q3 − Q1), B∗
U = Q3 + 3(Q3 − Q1), with realizations b∗
L = x0.25 − 3dQ, b∗
U =
x0.75 + 3dQ;
– if there are any xi < b∗
L ∨ xi > b∗
U , we call them distant values3
,
– if xi ∈ b∗
L, bL) ∨ (bU , b∗
U , these are outer values,
– if xi ∈ bL, bU , these are inner values or values close to median;
– normal distribution has these properties BU − BL = Q3 + 1.5DQ − Q1 + 1.5DQ =
4DQ
·
= 4.2; probability of xi /∈ BL, BU is then 0.04;
• sample robust skewness B1Q and B1O and their variance under asymptotic normality
B1·, where · = Q or O, with realizations deﬁned as
– quartile skewness
b1Q =
(x0.75 − x0.50) − (x0.50 − x0.25)
x0.75 − x0.25
, V aras(b1Q) = 1.84,
– octile skewness
b1O =
(x0.875 − x0.50) − (x0.50 − x0.125)
x0.875 − x0.125
, V aras(b1O) = 1.15.
Exercise 3 (height of 10-year old girls). Calculate all spread characteristics for the sample
with heights of 10-year old girls.
3
Symbol ∨ means “or” and symbol ∧ means “and”.
6
3 Basics of Probability
Exercise 4 (Simple random sample). In a simple random sample of size n from population
of ﬁnite size N, each element has an equal probability of being chosen. If we avoid choosing
any member of the population more than once, we call it simple random sample without
replacement4
.(Dalgaard 2008). If we put a member back to population after choosing it,
we talk about simple random sample with replacement5
. Let’s have a set M with
N = 10 elements and we want to choose n = 3 elements (a) without replacement and (b)
with replacement. How many combinations there are? How do these combinations look like
if M = {1, 2, . . . , 10}? Do the same for N = 100, n = 30 and set M = {1, 2, . . . , 100}.
Solution without code:
(a) Number of combinations is N
n
. If N = 10 and n = 3, then N
n
= N!
(N−n)!n!
= 10
3
= 120.
(b) Number of combinations with replacement is N+n−1
n
. If N = 10 and n = 3, then
N+n−1
n
= (N+n−1)!
(N−1)!n!
= 10+3−1
3
= 220. If N = 100 a n = 30, then N+n−1
n
= 100+30−1
30
=
2.009491 × 1029
.
Hints. choose(n,k), combn(n,k)6
, sample(x=..., size=..., replace=...)
Exercise 5 (Simple random sample). A group of people are labeled by their identiﬁcation
numbers (ID) from 1 to 30. Choose (a) randomly 5 people out of 30 without replacement,
(b) randomly 5 people out of 30 with replacement and ﬁnally (c) randomly 5 people out of
30 without replacement, where people with ID between 28 and 30 have 4× higher probability
of being chosen than people with ID between 1 and 27.
Repeat the sampling 100 times, concatenate all samples and use table to get frequencies
and conﬁrm that IDs 28 to 30 are indeed sampled more frequently.
Hints. rep(1, k), sample(..., prob=p)
Exercise 6 (Normal distribution). Let X be a random variable (it could represent for example
adult height) and let’s assume it is normally distributed with parameters µ (expectation or
mean) and σ2
(standard deviation) which is written as X ∼ N(µ, σ2
), µ = 140.83, σ2
= 33.79.
Normal distribution represents a probability distribution model for this random variable.
Calculate probability Pr (a < X < b) = Pr (X < b) − Pr (X < a) = FX (b) − FX (a), where
a = µ − kσ, b = µ + kσ, k = 1, 2, 3.7
Plot probability density function and ﬁll area between
points a and b and label axes x and y as shown in ﬁgure 1.
Partial solution:
a = µ − σ = 135.0171, b = µ + σ = 146.6429,
Pr (|X − µ| > σ) = 0.3173, Pr (|X − µ| < σ) = 1 − 0.3173 = 0.6827,
a = µ − 2σ = 129.2042, b = µ + 2σ = 152.4558,
Pr (|X − µ| > 2σ) = 0.0455, Pr (|X − µ| < 2σ) = 1 − 0.0455 = 0.9545,
4
n-combination without replacement from N members of set M.
5
n-combination with replacement from N members of set M.
6
requires library utils
7
Note that Pr (a < X < b) = Pr (a ≤ X ≤ b) because probability of a point (here a and b) is zero for
continuous random variables, i.e Pr(a) = Pr(b) = 0. This does not apply to discrete random variables.
7
a = µ − 3σ = 123.3913, b = µ + 3σ = 158.2687,
Pr (|X − µ| > 3σ) = 0.0027, Pr (|X − µ| < 3σ) = 1 − 0.0027 = 0.9973.
1 mu <- 0
2 sig <- 1
3 bin <- seq(mu-3*sig,mu+3*sig,by=sig)
4 pnorm(bin[7]) - pnorm(bin[1]) # 0.9973002
5 pnorm(bin[6]) - pnorm(bin[2]) # 0.9544997
6 pnorm(bin[5]) - pnorm(bin[3]) # 0.6826895
Probabilities 68.27 − 95.45 − 99.73 are called (empirical) rule or 3-sigma rule.
Hints. Similar example for exponential distribution.
1 # probability density curve
2 x = seq(0, 4, by=0.01)
3 y = dexp(x)
4
5 # tell R we will plot 3 graphs in one row
6 par(mfrow=c(1,3))
7
8 # plot three graphs
9 for(p in c(0.2, 0.1, 0.05)){
10 # plot probability density as a line
11 plot(x, y, type=’l’, xlab=paste0(’Area under the curve = ’, p), ylab=’density’)
12
13 # fill area under the curve
14 p_quantile = qexp(1-p)
15 xx = seq(4, p_quantile, by=-0.01)
16 pol_x = c(p_quantile, 4, xx)
17 pol_y = c(0, 0, dexp(xx))
18 polygon(pol_x, pol_y, col = "grey")
19 }
Exercise 7 (Normal distribution). Let X ∼ N(µ, σ2
), where µ = 150, σ2
= 6.25. Calculate
a = µ − x1−α
2
σ and b = µ + x1−α
2
σ so that Pr (a ≤ X ≤ b) = 1 − α is equal to 0.90, 0.95
a 0.99. Value x1−α is a quantile of standardized normal distribution, i.e. Pr(Z = X−µ
σ
<
x1−α) = 1 − α, Z ∼ N(0, 1). Plot density function, ﬁll area between points a and b and label
axes x a y as shown in the following ﬁgure.
Hints. qnorm(alpha)
This gives us rule 90 − 95 − 99 (so called adjusted 3-sigma rule). We used property
Pr uα/2 < Z < u1−α/2 = Φ u1−α/2 − Φ uα/2 = 1 − α, where Φ is cumulative distribution
function of normal distribution and in general α ∈ (0, 1/2); in the exercise we used α = 0.1,
0.05 a 0.01. See ﬁgure 2.
Exercise 8 (Interactive Normal Distribution). Create an interactive Shiny application showing
probability density function of normal distribution N(µ, σ2
), where µ = 0 and σ can be
interactively set by user. Fill area between points a and b, where a = µ−kσ, b = µ+kσ, k =
1, 2, 3 and make k interactive too. Finally, add graph title showing probability Pr (a ≤ X ≤ b).
8
Figure 1: 3-sigma rule; density curve with colored area under this curve between corresponding
quantiles on x-axis; volume of the area is equal to the probability of realization of random
variable between these quantiles
Figure 2: Adjusted 3-sigma rule; density curve with colored area under this curve between
corresponding quantiles on x-axis; volume of the area is equal to the probability of realization
of random variable between these quantiles
Hints. Installing shiny
To install shiny on your computer simply run install.packages(’shiny’) in R. If you’re installing
it on a faculty computer, you have to specify library too and also load jsonlite
package
1 # create a directory called "rlib" somewhere in your profile
2 libpath = ’path to your rlib directory’
3
4 # install shiny package (need only once)
5 install.packages("shiny", lib=libpath)
6
7 # load shiny package (need to run each time you start R)
8 library("shiny", lib=libpath)
9 library("jsonlite", lib=libpath)
9
Hints. Sample app
Create a folder normalPlot and copy following ﬁles ui.R and server.R into it (ﬁnd these
ﬁles in study materials). Run it from R console with command runApp(’normalPlot’, display
.mode="showcase") from parent directory of normalPlot (use function setwd to set your working
directory if needed). Insert your code from previous exercises into function renderPlot in
server.R to ﬁnish the exercise.
Listing 1: ui.R
1 # if missing, install with command ‘install.packages(’shiny’)‘
2 library(shiny)
3
4 # Define UI
5 shinyUI(fluidPage(
6
7 # Application title
8 titlePanel("Normal Distribution"),
9
10 # Sidebar with a slider input for the number of bins
11 sliderInput("sig",
12 "sigma:",
13 min = 0.1,
14 max = 3,
15 value = 1),
16 sliderInput("k",
17 "k:",
18 min = 1,
19 max = 3,
20 value = 1,
21 step = 1),
22
23 plotOutput(’normalPlot’)
24 ))
Listing 2: server.R
1 library(shiny)
2
3 # Define server logic
4 shinyServer(function(input, output) {
5
6 output$normalPlot <- renderPlot({
7 # insert your code from previous exercises here and modify it so that
8 # it uses values input$sig and input$k from ui.R
9 plot(c(1,2,3), c(1,2,3), main=paste(’Sigma:’, input$sig))
10 })
11 })
Exercise 9 (Interactive Normal Distribution 2). Build similar app for Exercise 7, but let
user choose if he wants to display normal distribution or exponential distribution and let him
choose α too, where Pr (a ≤ X ≤ b) = 1 − α for normal distribution and Pr (X ≤ b) = 1 − α
for exponential distribution. Furthermore, add sliders for µ and σ if user selects normal
distribution or slider λ if user selects exponential distribution.
10
Figure 3: Screenshot from interactive Shiny application with normal distribution.
Your plot should have ﬁxed x-axis limits (xlim). Also set reasonable min/max values for
sliders.
Hints. Use control widgets conditionalPanel and selectInput in ui.R.
Exercise 10 (Binomial Distribution). Let’s assume that number of people preferring treatment
A over treatment B follows a binomial distribution with parameters p (probability of
success) and N (number of independent trials) denoted Bin (N, p), where N = 20, p = 0.5,
i.e. people prefer both treatments equally. (a) What is the probability that 16 and more patients
will prefer treatment A over treatment B? (b) What is the probability that 16 and more
or 4 or less patients will prefer treatment A over treatment B?
Solution without code:
(a) Pr(X ≥ 16) = 1 − Pr(X < 16) = 1 − Pr(X ≤ 15) = 1 − i:xi≤15 Pr (X = xi) =
1 − i:xi≤15
N
xi
pxi
(1 − p)N−xi
= 1 − i:xi≤15
20
xi
0.5xi
(1 − 0.5)20−xi
= 0.0059.
(b) Pr(X ≤ 4, X ≥ 16) = 1 − i:xi≤15 Pr (X = xi) + i:xi≤4 Pr (X = xi) = 0.012. This
probability is twice the previous one since Bin(N, 0.5) is symmetric around 0.5.
Hints. pbinom(x, size=..., prob=...) gives you probability Pr(X ≤ x). How do you get
probability Pr(X ≥ x) using this function?
Exercise 11 (Binomial Distribution). Let’s assume that Pr(swirl) = 0.533 = p1 is the
probability of having dermatological pattern swirl on right thumb of male population and
Pr(other) = 0.467 = p2 is the probability of other patterns on right thumb of the same
population. Random variable X represents number of swirls and Y number of other patterns,
where X ∼ Bin(N, p1) a Y ∼ Bin(N, p2). Calculate
11
1. Pr(X ≤ 120) if N = 300
2. Pr(Y ≤ 120) if N = 300
Compare the results with your intuition (e.g. what do you think is the probability of getting
less than 120 heads if you ﬂip a coin 300 times?)
Exercise 12 (Moments of Bernoulli distribution). Random variable X has Bernoulli distribution
(X ∼ Ber(p)) if Pr(X = 1) = 1 − Pr(X = 0) = 1 − q = p. Derive its expected value
EX = k kPr(X = k) and variance VarX = E[(X − EX)2
] = E[X2
] − (EX)2
Exercise 13 (Moments of Binomial distribution). Random variable X has Binomial distribution
(X ∼ Bin(N, p)) if
Pr(X = k) =
N
k
pk
(1 − p)N−k
=
N!
k!(N − k)!
pk
(1 − p)N−k
for k ∈ {0, . . . , N}. Derive its expected value EX = k kPr(X = k) and variance VarX =
E[(X − EX)2
] = E[X2
] − (EX)2
Hints. The hard way involves Binomial theorem, for the easy way you need to realize
relationship between Binomial and Bernoulli random variable and use results from previous
exercise.
Exercise 14 (Normal approximation of binomial distribution). 8
Let Pr(man) = 0.515 be a proportion of men in population and Pr(women) = 0.485
proportion of women. Let X represent number of men and Y number of women. Under the
assumption of model Bin(N, p) calculate
1. Pr(X ≤ 3) if N = 5
2. Pr(X ≤ 5) if N = 10
3. Pr(X ≤ 25) if N = 50.
Compare these probabilities with those approximated by normal distribution N(Np, Npq).
Plot density function of normal distribution and superimpose it with probability distribution
of binomial distribution. Plot cumulative distribution function of normal distribution and
superimpose it with cumulative distribution function for binomial distribution. See ﬁgure 4.
Hints. par(mfrow=c(2, 3)), plot(..., type=’h’)
plot function creates a new graph, to add points or lines to an existing graph, use commands
points or lines. These functions take the same arguments as plot.
8
Approximation means ”similar but not exactly equal”, i.e. we approximate some distribution with a
diﬀerent one (that has certain advantages over the approximated one) or we approximate data with some
distribution (that describes data with help of easily interpretable parameters)
12
Figure 4: Approximating binomial distribution with normal distribution for p = 0.515, N =
5, 10 and 50; scatter plot superimposed with density (ﬁrst row) and cumulative distribution
function (second row).
Exercise 15 (Levelplots / Contingency tables).
Analyze data/reputation.data for potential security risks. Plot heatmap of Risk vs Reliability
for all IPs (see ﬁgure 5).
Then plot the same risk-reliability heatmap, but grouped by SimpleType. SimpleType has the
same values as Type, but all values with multiple types (multiple types are separated by ;) are
replaced by ”Multiples” value (see ﬁgure 6).
Try removing dominating type ”Scanning Host” to get more insights into other types.
Hints. Simple heatmap
1. Load data into variable av and check their correctness using head(av) and str(av).
2. Create a frequency table for heatmap
1 rr.df = data.frame(table(av$Risk, av$Reliability))
2 colnames(rr.df) <- c("Risk", "Reliability", "Freq")
13
3. Plot level plot using following command (experiment with other options)
1 library(lattice)
2 levelplot(Freq~Risk*Reliability,
3 data=rr.df, main="Risk ~ Reliability", ylab="Reliability", xlab = "Risk",
4 shrink = c(0.5, 1),
5 col.regions = colorRampPalette(c("#F5F5F5", "#01665E"))(20))
Hints. Heatmap grouped by type
1. Create a new column SimpleType that is the same as Type and convert it from factor
variable to character with as.character. Then replace all values containing ; with value
”Multiples” (use grep to select these rows).
2. Create frequency table like before and change the ﬁrst argument of levelplot to
levelplot(Freq ~Reliability*Risk|Simpletype, ...).
Figure 5: Heatmap for reputation data.
Exercise 16 (Bivariate normal distribution). Random vector (X, Y )T
has bivariate normal
distribution
N2 (µ, Σ) , where µ = (µ1, µ2)T
and Σ =
σ2
1 ρσ1σ2
ρσ1σ2 σ2
2
,
14
Figure 6: Heatmap for reputation data grouped by Type with all multiple types merged into
one value ”Multiples”.
with density
f (x, y) =
1
2π σ2
1σ2
2 (1 − ρ2)
exp −
1
2 (1 − ρ2)
(x−µ1)2
σ2
1
− 2ρ(x−µ1)(y−µ2)
σ1σ2
+ (y−µ2)2
σ2
2
,
where (x, y)T
∈ R2
, µj ∈ R1
, σ2
j > 0, j = 1, 2, ρ ∈ −1, 1 are parameters, then θ =
(µ1, µ2, σ2
1, σ2
2, ρ). Expression in the exponent can be also written as
−
1
2
x − µ1
y − µ2
T
σ2
1 ρσ1σ2
ρσ1σ2 σ2
2
−1
x − µ1
y − µ2
,
marginal distributions9
are X ∼ N (µ1, σ2
1) and Y ∼ N (µ2, σ2
2), ρ is a correlation coeﬃ-
cient10
.
9
Marginal distribution is a distribution of marginal random variable. Marginal distribution of multivariate
normal distribution is again normal, which is a very useful property.
10
From this example it is clear that to suﬃciently describe bivariate normal distribution we need 5 pa-
15
Exercise 17 (Bivariate normal distribution). (1) Create a function bnorm(x,y,mu1,mu2,sigma1
,sigma2,corr) returning density of a bivariate normal distribution. Use function properties
to test that it works correctly.
(2) Plot density of bivariate normal distribution N2 (µ, Σ) using function image() and superimpose
it with contour graph of the same distribution using function contour().
(3) Plot density of bivariate normal distribution N2 (µ, Σ) using function persp(). Cut density
into 12 intervals, where values in these intervals correspond to colors terrain.colors(12).
Use following parameters
(a) µ1 = 0, µ2 = 0, σ1 = 1, σ2 = 1, ρ = 0;
(b) µ1 = 0, µ2 = 0, σ1 = 1, σ2 = 1, ρ = 0.5;
(c) µ1 = 0, µ2 = 0, σ1 = 1, σ2 = 1.2, ρ = 0.5.
See ﬁgure 5 for correct solution.
Hints.
1. create a function bnorm(x,y,mu1,mu2,sigma1,sigma2,corr) returning density of bivariate
normal distribution
2. to test your implementation, use properties such as symmetry, the fact that function
value goes to zero as you go further away from mean, location of the maximum, etc.
3. create vectors x and y with values from -3 to 3 and length n and their cartesian product
(xi, yj), i = 1, . . . , n, j = 1, . . . , n (represented as either n2
× 2 matrix or two vectors
of length n2
)
4. apply your function bnorm on the cartesian and reshape it to a matrix Z with dimensions
n × n
5. use x, y and Z in image function to plot density
You don’t have to use Greek letters in axis labels (use mu 1 instead of µ1), but if you
want to, look up expression function. For coloring of persp look at the last example in persp
documentation and modify it to your needs.
Exercise 18 (Kernel density estimation). Simulate N samples from given distribution and
plot them as + symbols on x-axis together with their kernel density estimation. Add theoretical
normal distribution (resp. exponential distribution) with parameters estimated from data.
Do it for (a) N = 50 (b) N = 1000 and following distributions
1. X ∼ N(0, 1)
2. X ∼ pN(−3, 1) + (1 − p)N(3, 1) is a mixture of normals, where p = 0.3
3. X ∼ Exp(λ), where λ = 3. Plot theoretical exponential distribution with estimated
parameter ˆλ = 1
¯X
= N
N
i=1 Xi
.
rameters, i.e. mean and variance for marginal distributions of random variables X and Y and correlation
coeﬃcient ρ = ρ(X, Y ) describing the strength of linear relationship of X and Y .
16
Figure 7: Density of bivariate normal distribution with diﬀerent parameters (ﬁrst row –
contour graph, second row – perpective 3D graph displayed as surface); the larger absolute
value of ρ, the more are contours diﬀerent from circles (they are changing into ellipses); as
the diﬀerence between σ1 and σ2 is getting larger, we say that diﬀerence in variability of X1
and X2 is increasing
See ﬁgure 8 for correct solution.
Hints. To get the same results as in the ﬁgure, call set.seed(5) at the beginning of your
script.
Useful functions: density(..., from=, to=, n=), plot(..., lty=2), points(..., pch=3)
Exercise 19 (Bivariate normal distribution). Simulating pseudorandom numbers from N2 (µ, Σ)
can be done with in several ways:
1) library library(MASS) and function mvrnorm();
2) library library(mvtnorm) and function rmvnorm();
3) function rnorm() and this algorithm – let X1 ∼ N(0, 1) and X2 ∼ N(0, 1); then (Y1, Y2) ∼
N2 (µ, Σ), where µ = (µ1, µ2)T
is vector of means and σ2
1, σ2
2 and ρ are parameters of covariance
matrix Σ, where the strength of linear relationship Y1 and Y2 is given by magnitude
and sign of ρ; Y1 = σ1X1 + µ1 a Y2 = σ2(ρX1 + 1 − ρ2X2) + µ2.
Simulate pseudorandom numbers Y1 and Y2 from N2 (µ, Σ) using the ﬁrst method. Calculate
kernel density estimation of (Y1, Y2)T
using function kde2d(). Plot it using function
image() and superimpose it with contour graph of bivariate normal density of N2 (µ, Σ)
17
Figure 8: Kernel density estimation of several distributions (ﬁrst row n = 50; second row
n = 1000)
using function contour().Use following parameter in simulations
(a) µ1 = 0, µ2 = 0, σ1 = 1, σ2 = 1, ρ = 0; (1) n = 50 a (2) n = 1000;
(b) µ1 = 0, µ2 = 0, σ1 = 1, σ2 = 1, ρ = 0.5; (1) n = 50 a (2) n = 1000;
(c) µ1 = 0, µ2 = 0, σ1 = 1, σ2 = 1.2, ρ = 0.5; (1) n = 50 a (2) n = 1000.
See ﬁgure 11 for correct solution.
Hints. kde2d(..., n=100, lims=c(-3, 3, -3, 3))
Exercise 20 (Mixture of two normal distributions). Simulate pseudorandom numbers (1)
from mixture of normal distributions pN2 (µ1, Σ1) + (1 − p)N2 (µ2, Σ2) where
θ1 = (µ11, µ12, σ2
11, σ2
12, ρ1, µ21, µ22, σ2
21, σ2
22, ρ2)T
and (2) from bivaraite normal distribution
N2 (µ, Σ), where parameters represent combined vector of means a combined covariance matrix.
i.e. θ2 = (µ1, µ2, σ2
1, σ2
2, ρ)T
. For (1) calculate kernel density estimation of (X, Y )T
using function kde2d().
(a) Plot theoretical density (2) using function image() and superimpose it with contour graph
of theoretical density (2) using function contour().
(b) Plot theoretical density (1) using function image() and superimpose it with contour graph
18
Figure 9: Density of bivariate normal distribution (ﬁrst row n = 50; second row n = 1000)
of theoretical density (1) using function contour().
(c) Plot kernel density estimation of (1) using function image() and superimpose it with
contour graph of theoretical density (1) using function contour().
Cut density into 12 intervals, where values in these intervals correspond to colors terrain.colors(12).
For simulation use these parameters θ1 = (−1.2, −1.2, 1, 1, 0, 1, 1, 1, 1, 0)T
, n = 100 and
p = 0.5.
Use parameters from simulation for case (1), i.e. θ1 = θ1 = (µ11, µ12, σ2
11, σ2
12, ρ1, µ21, µ22, σ2
21, σ2
22, ρ2)T
=
(−1.2, −1.2, 1, 1, 0, 1, 1, 1, 1, 0)T
. For case (2) estimate parameters from simulated data θ2 =
(µ1, µ2, σ2
1, σ2
2, ρ)T
.
See ﬁgure 8 for correct solution.
Hints. For parameter estimations in case (2) use functions mean, std and cor.
Exercise 21 (Poisson distribution). We have a data (Greenwood and Yule 1920) with number
of injuries of factory workers in the following table
n 0 1 2 3 4 ≥ 5
mn 447 132 42 21 3 2
where n is number of injuries and mn number of workers with n injuries.
19
Figure 10: Combined density of bivariate normal distribution (left), mixture density of
two bivariate normal distributions (middle) and kernel density estimation superimposed by
mixture density of two bivariate normal distributions (right)
Calculate expected number of worker injuries under the assumption that random variable
X representing injuries has Poisson distribution with parameter λ = n nmn
n mn
, i.e. X ∼
Poiss(λ). Create a table with mn and expected mn and display it (ﬁgure 11).
Hints. Create a dataframe with two columns and proper row names, then print it. data.
frame(mn=..., expected_mn=..., row.names=...).
Figure 11: Observed and expected frequencies of Poisson distribution.
Exercise 22 (Binomial distribution). In a study from 1889 based on medical records in
Saxony professor Geissler (1889) recorded distribution of boys in families. The study included
M = 6115 families with N = 12 children with following number of boys (n stands for number
of boys and mn number of families with n boys)
n 0 1 2 3 4 5 6 7 8 9 10 11 12
mn 3 24 104 286 670 1033 1343 1112 829 478 181 45 7
Calculate expected mn under the assumption that number of boys X in families follow
binomial distribution π =
N
n=0 nmn
NM
and N = 12 (i.e. X ∼ Bin(N, π)).
Compare expected and observed frequencies - do you see any diﬀerence? Display mn and
expected mn in a table and visualize both observed and expected frequencies in one graph (see
ﬁgure 10 or use your own imagination). Calculate probability that family will have ≥ 4 and
≤ 6 boys from theoretical distribution (i.e. Pr(4 ≤ X ≤ 6)).
20
Figure 12: Observed and expected frequencies of Binomial distribution
Hints. Add legend with legend(’topleft’, c(’observed’, ’expected’), col=c(’black’, ’red’
), lty=1, bty=’n’). For stairs graph use argument type=’S’ or type=’s’
Exercise 23 (Approximating Binomial distribution by Poisson). If each of 50 million people
in Italy drives a car next week (independently), then probability of dying in an accident
will be 0.000002, where number of fatalities has binomial distribution Bin(50mil, 0.000002).
Approximate this Binomial distribution with Poisson that has the same mean. Plot both
probabilities for values {50, 51, . . . , 150}.
Exercise 24 (Contingency table). 356 people have been polled on their smoking status
(Smoke) and their socioeconomic status (SES). For each person it was determined whether or
not they are current smokers, former smokers, or have never smoked. Also, for each person
their socioeconomic status was determined (low, middle, or high). The data ﬁle smoker.csv
(available from study materials) contains only two columns - Smoke and SES. Load this data
into and create contingency table with table() function.
These observations (after converting to probability) form joint distribution Pr(X =
x, Y = y), where X is a discrete random variable for smoker and Y for SES. Calculate
marginal distributions Pr(X = x) = y Pr(X = x, Y = y), x ∈ {current, former, never}
and Pr(Y = y), y ∈ {high, low, middle}. Also calculate and compare Pr(X = x|Y = high)
with Pr(X = x|Y = low) and Pr(Y = y|X = smoker).
Create a table with expected frequencies using assumption that X and Y are independent
(ﬁgure ). Visualize both tables (observed and expected) with mosaicplot function (ﬁgure ).
21
Hints. To calculate marginal probabilities use either special table functions margin.table
and prop.table or simply rowSums and colSums.
Figure 13: Observed and expected frequencies for contingency table
Figure 14: Mosaic plot of contingency table
Exercise 25 (Number of contingency tables). How many contingency tables of size 4 × 2
there are for N = 20?
Exercise 26 (Multinomial distribution). We have following random variables (1) socioeconomic
status (high - H, low - Lo), (2) political aﬃliation (democrat - D, republican - R) and
(3) political philosophy (libral - Li, convservatism - C). Let’s denote their interactions like
this: X1 (H-D-Li), X2 (H-D-C), X3 (H-R-Li), X4 (H-R-C), X5 (Lo-D-Li), X6 (Lo-D-C),
X7 (Lo-R-Li) a X8 (Lo-R-C). We have random sample of size N = 50. Probabilities pj are
in the following table
D-Li D-C R-Li R-C all
H 0.12 0.12 0.04 0.12 0.4
Lo 0.18 0.18 0.06 0.18 0.6
all 0.30 0.30 0.10 0.30 1.0
Calculate Var[X1], Var[X3], Cov [X1, X3], Cor [X1, X3] and expected frequencies Npj, j =
1, 2, . . . , 8.
22
Hints. X = (X1, X2, . . . , X8) ∼ Mult(N, p), where N = 50, p = (p1, p2, . . . , p8)T
, we know
that Xj ∼ Bin(N, pj), pj are given by table above j = 1, 2, . . . , 8.
Exercise 27 (Product-multinomial distribution). Let’s have the same probabilities as in
previous exercise, but now with two separate random samples - ﬁrst with size N1 = 30
from group H and the second with size N2 = 20 from group Lo. Denote interactions of
random variables X11 = X1|1 (H-D-Li), X12 = X2|1 (H-D-C), X13 = X3|1 (H-R-Li), X14 =
X4|1 (H-R-C), X21 = X1|2 (Lo-D-Li), X22 = X2|2 (Lo-D-C), X23 = X3|2 (Lo-R-Li) and
X24 = X4|2 (Lo-R-C), where X1 = (X11, X12, X13, X14)T
and X2 = (X21, X22, X23, X24)T
.
Then X = (X1, X2) has product-multinomial distribution with K = 2, N1 = 30, J1 = 4,
N2 = 20, J2 = 4. Notation Xj|k, where j = 1, 2, 3, 4 and k = 1, 2 highlights the fact, that
distribution is conditioned by socioeconomic status (high – H, low – Lo), i.e. distribution
in table columns is conditioned by its row. Realizations Xj|k are denoted as nj|k = nkj,
similarly probabilities Xj|k = Xkj as pj|k = pkj. Calculate conditional probabilities pj|k,
expected frequencies Nkpkj, Var[X13], Cov [X21, X23] and Cor [X11, X23].
Exercise 28 (Binomial distribution, simulation study). Generate M = 1000 pseudo-random
numbers from distribution Bin(5, 0.5). Create a table with observed and theoretical probabilities
(for n = 0, 1, . . . , 5).
r 0 1 2 3 4 5
observed probabilities 0.031 0.158 0.302 0.324 0.161 0.024
theoretical probabilities 0.031 0.156 0.312 0.312 0.156 0.031
Superimpose histogram of simulated numbers with theoretical density function. See ﬁgure 15.
Hints. For histogram plot use breaks and probability arguments.
Exercise 29 (Normal distribution, simulation study). If random variable X ∼ N(150, 6.25),
what distribution has its arithmetic average Xn? Verify your result using simulations for
n = 30. Make 5000 simulations (each simulation involves generating 30 realizations of X)
and for each simulation calculate arithmetic average xm, m = 1, 2, . . . , M, where M =
5000. Superimpose their histogram (normalized) with theoretical density function for Xn you
derived. Calculate Pr(Xn > 151) from simulated data and compare this result with theoretical
(expected) probability. See ﬁgure 16.
Exercise 30 (Normal distribution, simulation study II). Let X ∼ N(µ1, σ2
1) a Y ∼ N(µ2, σ2
2).
Then Xn1 − Y n2 ∼ N(µ1 − µ2,
σ2
1
n1
+
σ2
2
n2
). Use µ1 = 100, σ1 = 10, µ2 = 50, σ2 = 9 and (a)
n1 = 4, n2 = 5, (b) n1 = 100, n2 = 81. Make M = 1000 simulations and for each calculate
diﬀerence xm − ym, m = 1, 2, . . . , M. Superimpose histogram (normalized) of the diﬀerences
with theoretical density function of diﬀerence Xn1 −Y n2 . For both cases (a) and (b) calculate
Pr(Xn1 − Y n2 ) < 52 from simulated data and compare this result with theoretical probability.
23
Figure 15: Histogram superimposed with theoretical distribution of Bin(5, 0.5)
4 Likelihood
Exercise 31 (Maximum-likelihood derivation). Let X ∼ Poiss(λ) and x1, . . . , xn are its
i.i.d. realizations. Analytically derive its log-likelihood function l(λ|x). Recall that
l(λ|x) = ln L(λ|x) = ln
n
i=1
f(xi) =
n
i=1
ln f(xi)
.
Exercise 32 (Maximum-likelihood, Poisson distribution). Let X ∼ Poiss(λ). Simulate
realizations x1, . . . , xn. Use n = 100 and λ = 4. Plot log-likelihood function of Poisson
distribution
l(λ|x) =
n
i=1
xi ln λ − nλ.
for λ ∈ [2, 6]. Find maximum likelihood estimate of λ and mark it in graph with dashed line.
See ﬁgure 17.
Hints. Start with writing a function loglikehood = function(x, lambda) that calculates loglikelihood
for single λ. Then in a for loop or using sapply function calculate log-likelihood
for λ ∈ [2, 6].
To ﬁnd maximum of a function, either use formula for MLE from the lecture or ﬁnd
it empirically from vector of log-likelihoods using function which.max(lambdas). To plot a
vertical line use abline(v=..., lty=’dashed’).
24
Figure 16: Histogram superimposed with theoretical distribution of N(150, 6.25/n)
Exercise 33 (Maximum-likelihood, Poisson distribution). Use data from exercise 21 for
log-likelihood function from the previous exercise and plot it. Note that the log-likelihood
function will have to be slightly modiﬁed to work with counts of realizations from exercise
21. See ﬁgure 17.
Exercise 34 (Maximum-likelihood, Binomial distribution). Let X ∼ Bin(N, p) and x its
realization11
. Formulate its likelihood function L(p|x) and log-likelihood function l(p|x). Use
log-likelihood function to derive maximum-likelihood estimate ˆp = argmax
p
l(p|x) of parameter
p and Fisher information I(ˆp) = − ∂2
∂p2 l(p|x)|p=ˆp.
Exercise 35 (Quadratic approximation of log-likelihood function). Let X ∼ Bin(N, p) and
x its realization. Plot scaled log-likelihood function of binomial distribution with p on x-axis
and scaled log-likelihood l∗
(p|x) = l(p|x) − max(l(p|x)) on y-axis. Compare l∗
(p|x) with its
quadratic approximation calculated using Taylor approximation ln(L(p|x)
L(p|x)
) ≈ −1
2
I(p)(p − p)2
.
Use (a) x = 8, N = 10, (b) x = 80, N = 100 and (c) n = 800, N = 1000 with reasonable
ranges for both axes. See ﬁgure 18.
Hints. Use lines(..., lty="dashed") for the dashed line.
11
We work with just one realization, so L(p|x) =
1
i=1 f(x|p). You would get the same results if you
worked with N realizations with Bernoulli distribution Ber(p).
25
Figure 17: Left: Log-likelihood function for simulated data from Poiss(λ = 4). Right:
Log-likelihood function for number of injuries of factory workers.
5 Hypothesis Testing
Exercise 36 (Hypothesis testing). Let X ∼ Bin(N, p) be a random variable with N = 1000
and p = 0.5. Make 100000 simulations x1, . . . , x100000 and for each simulation calculate
maximum likelihood estimate of p, i.e. ˆp = x
N
. Plot histogram of ˆp and add vertical line at
value 0.534. Calculate percentage of ˆp that are higher than 0.534.
Exercise 37 (Hypothesis testing). Out of 1000 newborn children, 534 of them were boys
and 466 girls. Test null hypothesis that probabilities of having a boy or a girl are equal.
Calculate p-value, conﬁdence interval, test statistic and critical region. Use Wald statistic
and Likelihood statistic.
Hints. Use uniroot to solve inequality in conﬁdence interval for Likelihood statistic.
26
Figure 18: Compare scaled log-likelihood function (full line) with its quadratic approximation
(dashed line).
Figure 19: Histogram of simulations of ˆp with line at 0.534.
27