t-stud

---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
Other tests (continued)

• When the number N of data values is small (N < 10), a procedure called "Student's t"
is used to evaluate confidence intervals more reliably, instead of the Gaussian interval
based on the standard deviation of the mean.
Suppose 15, 17, 18, 19, 25 are the weights in grams of 5 mice. I want to find, with 95%
confidence, an interval for the value of the "true" mean. We have already seen that,
using the Gaussian, we could proceed as follows:

mice = c(15,17,18,19,25); m = mean(mice); S = SdM(mice)   # SdM: standard deviation of the mean, sd(mice)/sqrt(5)
c(m-2*S, m, m+2*S)
# 15.42954  18.80000  22.17046

Using the Student t-test (for which we use the t.test command) we have:

t.test(mice, conf.level = 0.95)
# One Sample t-test
# t = 11.1557, df = 4
# 95 percent confidence interval:
# 14.12105 23.47895
# mean of x 
# 18.8 

A slightly wider interval is obtained. The phrase "One Sample t-test" refers to the
fact that Student's test is used here to study a single random variable. We will soon
see that it can also be used to study the relationship between two random variables.
Without going into details, we observe only that "df" has the value N-1 (here 4) and
that the values 14.1 and 23.5 are the extremes of the interval (centred on 18.8) over
which the following function "dtS" has integral 0.95. dtS (similar to the Gaussian) is
the density of t, shifted to the mean m = 18.8 and rescaled by the standard deviation
of the mean S:

dtS = function(x) dt((x-m)/S, df=4)/S   # m and S as computed above
BF=3.5; HF=2.5; graphF(dtS, 13,25, "brown")
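The interval printed by t.test can be reproduced with base R alone, as a minimal sketch: it is m ± q·S, where q = qt(0.975, N-1) is the t value leaving 2.5% in each tail and S = sd/√N is the standard deviation of the mean (what SdM is assumed to compute here). Because q is larger than 2, the Student interval is wider than the "two standard deviations" Gaussian one.

```r
mice <- c(15, 17, 18, 19, 25)
N <- length(mice)
m <- mean(mice)
S <- sd(mice) / sqrt(N)              # standard deviation of the mean
q <- qt(0.975, df = N - 1)           # two-sided 95%: 2.5% in each tail
ci_t <- c(m - q * S, m + q * S)      # Student interval
ci_gauss <- c(m - 2 * S, m + 2 * S)  # Gaussian "two sigma" interval
print(q)                             # larger than 2, hence the wider interval
print(ci_t)
print(ci_gauss)
# the same interval as reported by t.test
print(as.numeric(t.test(mice, conf.level = 0.95)$conf.int))
```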

dt(x, df): density; pt(q, df): distribution function; qt(p, df): quantile function;
rt(n, df): generates random deviates (analogous to dnorm, pnorm, qnorm and rnorm)
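A small simulation can make the "heavier tails" of t concrete. The sketch below (sample size and seed chosen arbitrarily) draws values with rt and checks that about 95% fall within ±qt(0.975, 4), while noticeably fewer than 95% fall within ±2, unlike the Gaussian case.

```r
set.seed(1)                             # reproducible draws
x <- rt(100000, df = 4)                 # random deviates from t with 4 df
q <- qt(0.975, df = 4)                  # two-sided 95% critical value
inside_q <- mean(abs(x) < q)            # should be close to 0.95
inside_2 <- mean(abs(x) < 2)            # well below 0.95: heavier tails
exact_2 <- pt(2, df = 4) - pt(-2, df = 4)   # exact P(|t| < 2)
z <- rnorm(100000)
inside_2_normal <- mean(abs(z) < 2)     # about 0.954 for the Gaussian
print(c(inside_q = inside_q, inside_2 = inside_2,
        exact_2 = exact_2, normal_2 = inside_2_normal))
```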

• Now consider the means of two different data samples: we want to evaluate whether
their difference is significant. Let's take an example (from a manual by Cavalli-Sforza).
I want to establish whether an anticancer preparation reduces the respiration of liver
cells. I measure (in a suitable unit of measurement) the liver respiration of 4 treated
animals and of 5 others used as controls.

tr = c(339, 405, 302, 362)
co = c(401, 340, 461, 442, 361)
t.test(co,tr, var.equal=TRUE)
# Two Sample t-test
# t = 1.5193, df = 7
# mean of x mean of y 
# 401 352
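With var.equal=TRUE, t.test uses the pooled variance of the two samples. As a sketch, the same t value and degrees of freedom can be computed by hand: the difference of the means divided by its standard error, with df = (n1-1) + (n2-1).

```r
tr <- c(339, 405, 302, 362)
co <- c(401, 340, 461, 442, 361)
n1 <- length(co); n2 <- length(tr)
dof <- n1 + n2 - 2                         # (5-1) + (4-1) = 7
# pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(co) + (n2 - 1) * var(tr)) / dof
se <- sqrt(sp2 * (1 / n1 + 1 / n2))        # standard error of the difference
t_stat <- (mean(co) - mean(tr)) / se
print(c(t = t_stat, df = dof))
print(unname(t.test(co, tr, var.equal = TRUE)$statistic))
```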

I got t = 1.52. The degrees of freedom (df) are 7, that is (4-1)+(5-1). As with the χ²
test, significance is checked using the density dtS of the t distribution.

g = 7; dtS = function(x) dt(x,df=g)
graphF(dtS, -5,5, "brown")

I determine x such that the integral between -x and x is 0.95 (but I could choose 0.99,
0.9, ..., depending on how strict the test must be).

idtS = function(x) integral(dtS,-x,x); solution(idtS,0.95, 0,100)
# 2.364624

The integral between -2.365 and 2.365 is 0.95, so 2.365 is the 97.5th percentile of t
(2.5% of the probability lies in each tail). Since 1.52 is less than 2.365, the
difference is not significant: it would be risky to claim that the preparation
decreases the respiration of liver cells.
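In base R the critical value can be read directly from the quantile function qt instead of solving for the integral, and pt gives the two-sided p-value. The sketch below shows that the two decisions agree (|t| below the critical value, p above 0.05) and that the p-value matches the one t.test reports.

```r
tr <- c(339, 405, 302, 362)
co <- c(401, 340, 461, 442, 361)
res <- t.test(co, tr, var.equal = TRUE)
t_obs <- unname(res$statistic)              # about 1.52
g <- unname(res$parameter)                  # degrees of freedom: 7
crit <- qt(0.975, df = g)                   # P(-crit < t < crit) = 0.95
p_two_sided <- 2 * pt(-abs(t_obs), df = g)  # P(|t| >= |t_obs|)
print(c(t = t_obs, crit = crit, p = p_two_sided))
print(abs(t_obs) > crit)                    # FALSE: not significant at 5%
print(res$p.value)                          # t.test reports the same p-value
```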


• Another comparison of the means of two data series: a group of people is given a
presumed antipyretic, and their temperatures are measured at the time of administration
(A) and three hours later (B). We want to see whether the temperature reduction is
significantly different from zero. Since both measurements refer to the same people,
the data are paired.

A = c(38.3, 39.1, 40.2, 37.6, 38.9, 38.7)
B = c(37.2, 38.4, 38.6, 36.7, 38.2, 38.2)
t.test(A,B, paired=TRUE)
# Paired t-test
# t = 5.7279, df = 5

I got the Student t value 5.73, with 5 degrees of freedom (6 pairs minus 1). I
determine x such that the integral between -x and x is 95%.

g = 5; dtS = function(x) dt(x,df=g)
idtS = function(x) integral(dtS,-x,x); solution(idtS,0.95, 0,100)
# 2.570582

I also determine x such that the integral between -x and x is 99%.

solution(idtS,0.99, 0,100)
# 4.032143

5.73 is greater than 2.57 and even than 4.03, so the difference is significant even at
the 1% level. I can consider the antipyretic to be effective.
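A paired t-test is a one-sample t-test on the differences A-B, tested against zero. The sketch below computes the statistic by hand, takes the 95% and 99% critical values from qt, and checks that t.test with paired=TRUE and t.test on the differences give the same value.

```r
A <- c(38.3, 39.1, 40.2, 37.6, 38.9, 38.7)
B <- c(37.2, 38.4, 38.6, 36.7, 38.2, 38.2)
d <- A - B                                  # temperature drop for each person
n <- length(d)
t_paired <- mean(d) / (sd(d) / sqrt(n))     # one-sample t on d, df = n - 1
crit <- qt(c(0.975, 0.995), df = n - 1)     # 95% and 99% two-sided critical values
print(t_paired)
print(crit)
print(t_paired > crit)                      # significant at both levels?
# same statistic from the paired test and from a one-sample test on d
print(unname(t.test(A, B, paired = TRUE)$statistic))
print(unname(t.test(d)$statistic))
```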


• One last example. It can be shown that if X1, ..., XN are N independent, normally
distributed measurements, Mc is the sample mean, Mt the population (true) mean and σ
the sample standard deviation, then t = (Mc-Mt)/σ·√N follows Student's t distribution
with N-1 degrees of freedom. Suppose we hypothesize that a certain quantity G has the
value 2.4 (in a certain unit of measurement). 10 measurements are taken, giving mean
Mc = 2.7 and σ = 0.3. Let us assume that measurement errors are normally distributed
(with unknown parameters). Should the hypothesis on G be accepted or rejected at a
significance level of 0.05?

# 1-0.05 = 0.95
g = 10-1; dtS = function(x) dt(x,df=g)
idtS = function(x) integral(dtS,-x,x)
solution(idtS,0.95, 0,100)
# 2.262158  (the 97.5th percentile of t: the integral between -x and x is 0.95)

The Student variable in our case has the value:

(2.7-2.4)/0.3*sqrt(10)
# 3.162278

I have to reject the hypothesis (3.162 > 2.262). But if I had chosen a significance
level of 0.01 I would have accepted it:

solution(idtS,0.99, 0,100)
# 3.249836 (> 3.162278)
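Here only summary values (Mc, σ, N) are available, so t.test, which needs the raw data, cannot be called. A minimal sketch with a hypothetical helper t_from_summary (not part of R) computes t, compares it with the critical values from qt at both significance levels, and gives the two-sided p-value: rejecting at 0.05 but accepting at 0.01 corresponds to a p-value between 0.01 and 0.05.

```r
# Hypothetical helper (not an R function): one-sample t from summary statistics
t_from_summary <- function(mean_obs, mu0, s, n) (mean_obs - mu0) / s * sqrt(n)

t_obs <- t_from_summary(mean_obs = 2.7, mu0 = 2.4, s = 0.3, n = 10)
g <- 10 - 1                                # degrees of freedom
alpha <- c(0.05, 0.01)                     # significance levels
crit <- qt(1 - alpha / 2, df = g)          # two-sided critical values
decision <- ifelse(abs(t_obs) > crit, "reject", "accept")
print(data.frame(alpha, crit, decision))
p <- 2 * pt(-abs(t_obs), df = g)           # two-sided p-value
print(p)
```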


For further details see the R help (type "help(t.test)" and "help(dt)") or Wikipedia
(English version). Another test often used in the continuous case is the
Kolmogorov-Smirnov test (type "help(ks.test)" in R).
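As an illustration only, the two-sample form of ks.test can be applied to the liver data used above. Unlike the t-test it compares the whole empirical distributions rather than just the means; its statistic D is the largest vertical gap between the two empirical distribution functions.

```r
tr <- c(339, 405, 302, 362)
co <- c(401, 340, 461, 442, 361)
res <- ks.test(co, tr)         # two-sample Kolmogorov-Smirnov test
print(unname(res$statistic))   # D: largest gap between the two empirical CDFs
print(res$p.value)
```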