Homework 3

Due by 11/20/2016 (Sunday!).

Complete the exercises and replicate the outputs. Note that only PDF and HTML formats will be accepted. All R code used to generate the figures must be included in the document.

There are plenty of options for creating PDF documents with inline R code, such as knitr, Sweave, sense.io, or Jupyter; alternatively, save the plots as .png images and add them to a document manually together with the R code.

Your output can be stylistically different from the outputs below (e.g. vectors or matrices may show up with row/column numbers). This doesn't matter as long as you have the correct numbers, labels, and graphs.

1. Statistical Distance

Complete exercise 12 (normal approximation of the binomial distribution) and calculate the statistical distance between the binomial distribution and its normal approximation.

For probability density functions (first row), use the Kullback-Leibler divergence (KL divergence), which is defined as $$ D_{\mathrm{KL}}(P\|Q) = \sum_{i=0}^N P(i) \, \log_2 \frac{P(i)}{Q(i)}. $$ Here $P(i) = \Pr(X = i)$, where $X \sim \mathrm{Bin}(N, p)$, i.e. the probability mass function of the binomial distribution, and $Q(i)$ is the probability density function of its normal approximation evaluated at $i$.
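The KL divergence above can be sketched in a few lines of R. The values of `N` and `p` below are placeholders; substitute the parameters from exercise 12.

```r
# Sketch: KL divergence (in bits) between Bin(N, p) and its normal
# approximation N(Np, Np(1-p)). N and p are placeholder values here.
N <- 20
p <- 0.5
i <- 0:N
P <- dbinom(i, size = N, prob = p)                       # binomial pmf at i
Q <- dnorm(i, mean = N * p, sd = sqrt(N * p * (1 - p)))  # normal density at i
D_KL <- sum(P * log2(P / Q))                             # sum over i = 0..N
round(D_KL, 5)
```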

For cumulative distribution functions (second row), use the Kolmogorov-Smirnov statistic, which is defined as $$ D_{\mathrm{KS}} = \max_{i \in \{0,\ldots, N\}} |\Pr(X \leq i) - \Pr(\tilde{X} \leq i)|. $$ Here $\Pr(X \leq i)$ is the cumulative distribution function of the binomial distribution and $\Pr(\tilde{X} \leq i)$ is the cumulative distribution function of its normal approximation.
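The KS statistic can be sketched similarly; again, `N` and `p` below are placeholders for the parameters from exercise 12.

```r
# Sketch: Kolmogorov-Smirnov statistic between the binomial CDF and
# the CDF of its normal approximation, evaluated at i = 0..N.
N <- 20
p <- 0.5
i <- 0:N
F_bin  <- pbinom(i, size = N, prob = p)                      # binomial CDF
F_norm <- pnorm(i, mean = N * p, sd = sqrt(N * p * (1 - p))) # normal CDF
D_KS <- max(abs(F_bin - F_norm))                             # largest gap
round(D_KS, 5)
```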

Add these distances to the plot titles, rounded to 5 decimal places.

2. Multivariate normal distribution

a) probability density function

Create a function dnorm(x, mu, Sigma) that calculates the probability density of the multivariate normal distribution (i.e. the normal distribution in an arbitrary number of dimensions). A $k$-dimensional random vector $X \sim \mathcal{N}_k(\mathbf{\mu}, \Sigma)$ has density $$ f_{X}(\mathbf{x}) = \dfrac{1}{\sqrt{(2\pi)^k|\mathbf{\Sigma}|}}\, \exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{\mu})\right), $$ where $\mu = (\mu_1, \ldots, \mu_k) \in \mathbb{R}^k$ is the mean vector and $\mathbf{\Sigma} = [\operatorname{Cov}[X_i, X_j]] \in \mathbb{R}^{k \times k}$ is the covariance matrix. $|\mathbf{\Sigma}|$ stands for the determinant and $\mathbf{\Sigma}^{-1}$ for the inverse of the matrix $\mathbf{\Sigma}$.

Testing your function on parameters $$ \mathbf{x} = (0,0,0)^T \\ \mu = (0,1,2)^T \\ \mathbf{\Sigma} = \begin{pmatrix} 1 & 0.8 & 0\\ 0.8 & 2 & 0 \\ 0 & 0 & 0.5 \end{pmatrix} $$ should return $0.0009764066\ldots$.
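A minimal sketch of such a function, checked against the test parameters above (note that this definition masks base R's dnorm(), which is acceptable here since the assignment prescribes the name):

```r
# Multivariate normal density: direct translation of the formula above.
dnorm <- function(x, mu, Sigma) {
  k <- length(mu)
  d <- x - mu
  as.numeric(exp(-0.5 * t(d) %*% solve(Sigma) %*% d) /
               sqrt((2 * pi)^k * det(Sigma)))
}

# Test parameters from the assignment:
x  <- c(0, 0, 0)
mu <- c(0, 1, 2)
Sigma <- matrix(c(1,   0.8, 0,
                  0.8, 2,   0,
                  0,   0,   0.5), nrow = 3, byrow = TRUE)
dnorm(x, mu, Sigma)  # should be ~0.0009764066
```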

b) log of probability density function

Analytically derive (and simplify) the logarithm of $f_{X}(\mathbf{x})$. Don't write down just the solution; include at least one intermediate step (use LaTeX or something similar for the math). Then add a parameter use_log to your function from part a) and return $\log f_{X}(\mathbf{x})$ if use_log==TRUE.

Note: Solutions that implement it as log(dnorm(x, mu, Sigma)) will obviously not be accepted.

Note to Czech/Slovak students: "derive" here does not mean to take a derivative.
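As a check for your derivation (the intermediate steps are still yours to write out), the simplified result should be equivalent to $$ \log f_{X}(\mathbf{x}) = -\frac{1}{2}\left( k \log(2\pi) + \log|\mathbf{\Sigma}| + (\mathbf{x}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{\mu}) \right). $$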

Testing with the same parameters as in part a) and use_log==TRUE should return:

[1] -6.931631