Due by 30/10/2016.
Complete exercises and replicate outputs. Note that only PDF and HTML formats will be accepted. All R code you used to generate figures should be included in the document.
There are plenty of options for creating PDF documents with inline R code, such as knitr, Sweave, sense.io, Jupyter or Anaconda Cloud. Alternatively, simply save the plots as .png images and add them manually to your document together with the R code.
Your output can be stylistically different from the outputs below (e.g. vectors or matrices will show up with row/column numbers). This doesn't matter as long as you have the correct numbers, labels and graphs.
Note: If you decide to use Jupyter Notebook, run the command options(jupyter.plot_mimetypes = 'image/png') in the first cell to get around a bug with SVG images.
Load the fivemin.csv dataset, which contains network usage data recorded every 5 minutes over 8 hours. Select the data for type=Server and report all location and spread characteristics for $sessions$. (It's up to you whether you use print commands or put all characteristics in a data frame or list and print that.)
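As a starting point, the summary could be collected in a list, as in the minimal sketch below. The helper name `describe` is hypothetical, the choice of statistics is one plausible reading of "location and spread characteristics", and the code assumes that fivemin.csv sits in the working directory with columns named `type` and `sessions`.

```r
# Hypothetical helper: location and spread measures of a numeric vector.
describe <- function(x) {
  list(
    mean      = mean(x),
    median    = median(x),
    quartiles = quantile(x, c(0.25, 0.75)),
    min       = min(x),
    max       = max(x),
    range     = max(x) - min(x),
    iqr       = IQR(x),
    variance  = var(x),  # sample variance, divides by n - 1
    sd        = sd(x)    # square root of the sample variance
  )
}

# Assumption: the file is in the working directory and has
# columns `type` and `sessions`; adjust the names if it differs.
if (file.exists("fivemin.csv")) {
  fivemin <- read.csv("fivemin.csv")
  server  <- fivemin[fivemin$type == "Server", ]
  print(describe(server$sessions))
}
```

Returning a list keeps every characteristic under a readable name, so the whole summary can be printed in one call or individual entries picked out with `$`.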
Let $X$ be a random variable representing the number of sessions for type=Server from the previous exercise. Answer the following questions.
Answer these questions using two different approaches:
i) Do not make any assumptions about the distribution of $X$ and calculate empirical probabilities (relative frequencies), i.e.
$$P(a < X < b) = \frac{\text{nr of times } X \text{ is between } a \text{ and } b}{\text{nr of all occurrences}}$$
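The relative frequency in approach i) can be sketched in R as follows. The function name `empirical_prob` is hypothetical, and the vector `x` holds made-up values for illustration only, not data from fivemin.csv.

```r
# Hypothetical helper: share of observations strictly between a and b.
empirical_prob <- function(x, a, b) {
  # The comparison gives a logical vector; its mean is the relative frequency.
  mean(x > a & x < b)
}

# Made-up values, for illustration only (not taken from fivemin.csv).
x <- c(3, 5, 5, 7, 8, 10, 12, 12, 15, 20)
empirical_prob(x, 5, 12)  # 3 of the 10 values (7, 8, 10) qualify: 0.3
```

Because session counts are discrete, strict and non-strict inequalities can give different empirical probabilities, so the comparison operators should match the inequality in each question.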
ii) Assume that $X$ is continuous and follows a normal distribution $N(\mu, \sigma^2)$. Its parameters can be estimated by
$$\hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \quad \hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \hat{\mu})^2}{n - 1}.$$
Calculate the given probabilities from the distribution $N(\hat{\mu}, \hat{\sigma}^2)$.
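Approach ii) can be sketched in R with `pnorm`, the normal cumulative distribution function. The function name `normal_prob` is hypothetical, and the vector `x` again holds made-up values rather than data from fivemin.csv.

```r
# Hypothetical helper: P(a < X < b) under a normal distribution fitted to x.
normal_prob <- function(x, a, b) {
  mu_hat    <- mean(x)  # estimate of mu
  sigma_hat <- sd(x)    # square root of s^2, which divides by n - 1
  # pnorm expects the standard deviation, not the variance.
  pnorm(b, mean = mu_hat, sd = sigma_hat) -
    pnorm(a, mean = mu_hat, sd = sigma_hat)
}

# Made-up values, for illustration only (not taken from fivemin.csv).
x <- c(3, 5, 5, 7, 8, 10, 12, 12, 15, 20)
normal_prob(x, 5, 12)
```

Under the continuity assumption, $P(a < X < b)$ and $P(a \le X \le b)$ coincide, so the result does not depend on whether the inequalities are strict. One-sided probabilities follow by passing `-Inf` or `Inf` as a bound.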