against the null hypothesis
$$ H_0 : temperature_{2019} = temperature_{average} $$
with the right-sided alternative
$$ H_1 : temperature_{2019} > temperature_{average}. $$
We are going to transform this into the form
$$ H_0 : temperature_{2019} - temperature_{average} = 0, $$
which leaves us with only ONE vector (the differences) to work with.
T_2019 <- c(-1.7, 1.7, 5.6, 9.4, 10.7, 20.7, 18.8, 18.9, 13.3, 9.5, 5.6, 1.9)      # monthly temperatures in 2019
T_average <- c(-2.8, -1.1, 2.5, 7.3, 12.3, 15.5, 16.9, 16.4, 12.8, 8.0, 2.7, -1.0)  # long-term monthly averages
x <- T_2019 - T_average                                                              # vector of paired differences
Test normality in any way you prefer. Remember the null hypothesis is that our data is indeed normal.
if(shapiro.test(x)$p.value > 0.05){print("pass")} else{print("fail")}
[1] "pass"
Now we perform the usual type of testing, using the test statistic
$$ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \sim \text{Student's distribution}(df = n - 1), $$
where $\bar{x}$ and $s$ are the mean and standard deviation of the differences and $\mu_0 = 0$ is the hypothesized mean. Construct (a sketch follows after this list):
the critical region using quantiles,
the confidence interval using the formula from previous seminars,
the $p$-value using the formula from previous seminars.
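A minimal sketch of these manual computations, assuming the vector x of differences from above and a 5% significance level (the helper names t_stat, CR, CI and p_value are ours, not part of the task):
alpha <- 0.05
n <- length(x)
t_stat <- (mean(x) - 0) / (sd(x) / sqrt(n))                       # test statistic under H_0: mean difference = 0
CR <- qt(1 - alpha, df = n - 1)                                   # right-sided critical value: reject H_0 if t_stat > CR
CI <- c(mean(x) - qt(1 - alpha, n - 1) * sd(x) / sqrt(n), Inf)    # one-sided confidence interval for the mean difference
p_value <- 1 - pt(t_stat, df = n - 1)                             # right-tail p-value
c(t_stat, CR, p_value)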
And then check your results using the magic function t.test(). The arguments are:
paired = T, indicating that our data correspond to the same months;
alternative = "greater", indicating that we are testing the right-sided alternative.
t.test(T_2019, T_average, paired = T, alternative = "greater")
        Paired t-test

data:  T_2019 and T_average
t = 4.3394, df = 11, p-value = 0.000588
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 1.216245      Inf
sample estimates:
mean difference 
          2.075 
See the structure of the result
str(t.test(T_2019, T_average, paired = T, alternative = "greater"))
List of 10
 $ statistic  : Named num 4.34
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 11
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 0.000588
 $ conf.int   : num [1:2] 1.22 Inf
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num 2.07
  ..- attr(*, "names")= chr "mean difference"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "mean difference"
 $ stderr     : num 0.478
 $ alternative: chr "greater"
 $ method     : chr "Paired t-test"
 $ data.name  : chr "T_2019 and T_average"
 - attr(*, "class")= chr "htest"
Pull the value of the test statistic and compare it with the appropriate quantiles
t.test(T_2019, T_average, paired = T, alternative = "greater")$statistic
Pull the confidence interval
t.test(T_2019, T_average, paired = T, alternative = "greater")$conf.int
What number should be in the confidence interval for us to not reject the null hypothesis? Look at the formulation of the null hypothesis
$$ H_0 : temperature_{2019} - temperature_{average} = 0 $$
Pull the $p$-value
t.test(T_2019, T_average, paired = T, alternative = "greater")$p.value
Strictly speaking, the data in this task aren't truly paired measurements. Paired samples typically arise in scenarios such as measuring the same subjects before and after a treatment, or measuring the same objects with two different instruments.
Test the null hypothesis that the variances of the two samples are the same:
$$ H_0 : \sigma^2_1 = \sigma^2_2 $$
You can use the in-built function var.test() or do it manually.
mach.1 <- c(29, 27, 29, 35, 29, 32, 28, 34, 32, 33)
mach.2 <- c(31, 28, 30, 28, 37, 29, 27, 27, 39, 33,
31, 32, 31, 29, 32, 28, 27, 28, 24, 34)
alpha <- 0.05
n1 <- length(mach.1)
sigma1 <- sd(mach.1)
n2 <- length(mach.2)
sigma2 <- sd(mach.2)
F <- sigma1^2/sigma2^2                 # F statistic: ratio of the sample variances
CR_1 <- qf(alpha/2, n1-1, n2-1)        # lower critical value
CR_2 <- qf(1 - alpha/2, n1-1, n2-1)    # upper critical value
print(c(CR_1, F, CR_2))
[1] 0.2714929 0.5807166 2.8800520
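Since F lies between the two critical values, H_0 is not rejected. For comparison, the in-built check mentioned above is simply:
var.test(mach.1, mach.2)   # reports the same F ratio together with a p-value and a confidence interval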
Now use the t-test which assumes equal variances. Compute the pooled (common) standard deviation using this formula:
$$ S = \sqrt{ \frac{ (n_1 - 1) \sigma_1^2 + (n_2 - 1) \sigma_2^2}{n_1 + n_2 - 2} } $$
S = sqrt(((n1-1)*sigma1^2 + (n2-1)*sigma2^2)/(n1+n2-2))
The test statistic is going to be
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{S \sqrt{\frac{n_1 + n_2}{n_1 n_2}}} \sim \text{Student's distribution}(df = n_1 + n_2 - 2). $$
Now you can compute the critical region, the confidence interval and the $p$-value as before (a sketch follows below).
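A minimal sketch of the manual computation, reusing n1, n2, alpha and S from above (the names t_stat, df_pooled, CR, p_value and CI are ours):
t_stat <- (mean(mach.1) - mean(mach.2)) / (S * sqrt((n1 + n2) / (n1 * n2)))   # two-sample t statistic
df_pooled <- n1 + n2 - 2
CR <- qt(1 - alpha/2, df_pooled)                                              # two-sided critical value
p_value <- 2 * (1 - pt(abs(t_stat), df_pooled))                               # two-sided p-value
CI <- (mean(mach.1) - mean(mach.2)) +
  c(-1, 1) * qt(1 - alpha/2, df_pooled) * S * sqrt((n1 + n2) / (n1 * n2))     # CI for the difference in means
c(t_stat, CR, p_value)
CI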
Use the in-built function to do the same thing.
t.test(mach.1, mach.2, var.equal = T)
        Two Sample t-test

data:  mach.1 and mach.2
t = 0.4245, df = 28, p-value = 0.6744
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.103981  3.203981
sample estimates:
mean of x mean of y 
    30.80     30.25 
Remember that a Poisson process models the number of events. The number of events in one sector has a Poisson distribution with parameter $\lambda$. The total number of events over all sectors also follows a Poisson distribution, with parameter $\lambda$ multiplied by the number of sectors.
The total count $S$ plays the role of the test statistic; under the null hypothesis its distribution is Poisson with parameter $\lambda$ multiplied by the number of sectors.
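In symbols, this is just the additivity property of independent Poisson counts:
$$ X_1, \dots, X_n \overset{iid}{\sim} \mathrm{Pois}(\lambda) \;\Longrightarrow\; S = \sum_{i=1}^{n} X_i \sim \mathrm{Pois}(n\lambda). $$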
S <- sum(n_bombs*n_sectors) # total number of events
# hint for computing the p-value: left-tail probability P(X <= S),
# where n is the total number of sectors and lambda the hypothesized rate per sector
ppois(S, lambda*n)
# hint for visualization: quantiles of the null distribution Pois(lambda*n),
# e.g. to mark critical values in a plot (alpha as chosen above)
qpois(c(alpha/2, 1 - alpha/2), lambda*n)
Using the central limit theorem,
$$ t = \frac{\bar{x} - \lambda_0}{\sqrt{\lambda_0 / n}} \sim N(0,1). $$
The confidence interval is going to be
$$ \bar{x} \pm q_{1 - \alpha/2} \cdot \sqrt{\frac{\bar{x}}{n}}. $$
Here $\bar{x}$ denotes the average number of events per sector and $n$ the number of sectors!
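A minimal sketch of this normal approximation in R, assuming S, n (total number of sectors), lambda (the hypothesized rate per sector) and alpha are defined as above; the names x_bar, z and CI are ours:
x_bar <- S / n                                                    # average number of events per sector
z <- (x_bar - lambda) / sqrt(lambda / n)                          # CLT test statistic, approximately N(0, 1) under H_0
CI <- x_bar + c(-1, 1) * qnorm(1 - alpha/2) * sqrt(x_bar / n)     # approximate confidence interval for lambda
z                                                                 # compare with the appropriate normal quantiles, e.g. qnorm(1 - alpha/2)
CI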