Homework 2 - no graphics

Due by 30/10/2016.

Complete the exercises and replicate the outputs. Note that only PDF and HTML formats will be accepted. All R code you used to generate figures should be included in the document.

There are plenty of options for creating PDF documents with inline R code, such as knitr, Sweave, sense.io, Jupyter or Anaconda Cloud. Alternatively, simply save the plots as .png images and add them manually to your document together with the R code.

Your output can be stylistically different from the outputs below (e.g. vectors or matrices will show up with row/column numbers). This doesn't matter as long as you have the correct numbers, labels and graphs.

Note: if you decide to use Jupyter Notebook, run the command options(jupyter.plot_mimetypes = 'image/png') in the first cell to get around a bug with SVG images.

1. Location and spread characteristics

Load the fivemin.csv dataset, which contains network usage data recorded every 5 minutes over 8 hours. Select the data for type=Server and report all location and spread characteristics for $sessions$. (It's up to you whether you use print commands or put all characteristics in a data frame or list and print that.)

Sessions
=================================
Minimum: 1293 
Maximum: 2678 
Mean: 1904.573 
... TODO ...
Packets
=================================
Minimum: 14857 
Maximum: 4015541 
Mean: 204540.1 
... TODO ...
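
As a starting point, here is a minimal sketch of how such characteristics could be computed in R. It runs on a small made-up data frame rather than on fivemin.csv. The column names `type` and `sessions` are assumptions taken from the exercise text, and the selection of characteristics is an assumption as well, since the sheet does not say which ones count as "all".

```r
# Synthetic stand-in for fivemin.csv. Column names are assumed from the
# exercise text and the values are invented for illustration only.
usage <- data.frame(
  type     = c("Server", "Server", "Server", "Server", "Server", "Client"),
  sessions = c(10, 12, 14, 16, 18, 99)
)

# With the real file this would start from something like
# usage <- read.csv("fivemin.csv")

# Keep only the observations recorded for type == "Server"
x <- usage$sessions[usage$type == "Server"]

# Location characteristics
location <- c(
  Minimum = min(x),
  Maximum = max(x),
  Mean    = mean(x),
  Median  = median(x)
)

# Spread characteristics
spread <- c(
  Range    = max(x) - min(x),
  IQR      = IQR(x),   # difference between the 75% and 25% quantiles
  Variance = var(x),   # sample variance, divides by n - 1
  SD       = sd(x)     # square root of the sample variance
)

print(location)
print(spread)
print(quantile(x))     # quartiles, using R's default quantile definition
```

Collecting the values in named vectors keeps the labels attached to the numbers, so the printed result is easy to compare against the expected output. The same lines can be reused for any other numeric column.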

2. Parameter estimation for normal distribution

Let $X$ be a random variable representing the number of sessions for type=Server from the previous exercise. Answer the following questions:

  • a) What is the probability P(X > 2000) (i.e. probability of high network load)?
  • b) What is the probability P(1500 < X < 2000) (i.e. probability of usual network load)?
  • c) What is the probability P(X > 2700) (i.e. probability of extreme network load)?

Answer these questions using two different approaches:

  • i) Do not make any assumptions about the distribution of X and calculate empirical probabilities (relative frequencies), i.e.

    P(a < X < b) = \frac{\text{number of times X is between } a \text{ and } b}{\text{number of all occurrences}}

  • ii) Assume that $X$ is continuous and follows a normal distribution N(\mu, \sigma^2). Its parameters can be estimated by

    \hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \quad \hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \hat{\mu})^2}{n - 1}.

    Calculate the given probabilities from the distribution N(\hat{\mu}, \hat{\sigma}^2).
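
The two approaches can be sketched in R as follows. The sample `x` is made up for illustration and does not come from fivemin.csv, and the variable names are hypothetical. Approach i) relies on `mean()` of a logical vector giving the relative frequency of `TRUE` values. Approach ii) relies on `pnorm()`, the normal cumulative distribution function, which takes the standard deviation rather than the variance.

```r
# Invented sample standing in for the Server session counts
x <- c(1400, 1600, 1800, 1900, 2000, 2100, 2200, 2400)

# i) Empirical probabilities: share of observations satisfying the condition
p_emp_gt_2000   <- mean(x > 2000)
p_emp_1500_2000 <- mean(x > 1500 & x < 2000)
p_emp_gt_2700   <- mean(x > 2700)

# ii) Normal model with parameters estimated from the sample
mu_hat    <- mean(x)   # estimate of mu
sigma_hat <- sd(x)     # square root of s^2 (the n - 1 sample variance)

# P(X > a) is the upper tail; P(a < X < b) is a difference of two CDF values
p_norm_gt_2000   <- pnorm(2000, mean = mu_hat, sd = sigma_hat, lower.tail = FALSE)
p_norm_1500_2000 <- pnorm(2000, mean = mu_hat, sd = sigma_hat) -
                    pnorm(1500, mean = mu_hat, sd = sigma_hat)
p_norm_gt_2700   <- pnorm(2700, mean = mu_hat, sd = sigma_hat, lower.tail = FALSE)

# Side-by-side comparison of both approaches
results <- data.frame(
  event     = c("X > 2000", "1500 < X < 2000", "X > 2700"),
  empirical = c(p_emp_gt_2000, p_emp_1500_2000, p_emp_gt_2700),
  normal    = c(p_norm_gt_2000, p_norm_1500_2000, p_norm_gt_2700)
)
print(results)
```

Note that the empirical probability of an event beyond the largest observed value is exactly zero, whereas the normal model still assigns it a small positive probability. For a continuous distribution, strict and non-strict inequalities give the same probability, which is not true of relative frequencies when an observation equals the boundary value.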