Other tests.

• When the number N of data is small (N < 10), confidence intervals are better evaluated with a procedure called "Student's t" than with the standard deviation of the mean alone. Suppose 15, 17, 18, 19, 25 are the weights in grams of 5 mice. I want to find, with 95% confidence, an interval for the value of the "true" mean. We have already seen that using the Gaussian we could proceed as follows:

  mice = c(15,17,18,19,25); m = mean(mice); S = SdM(mice)
  c(m-2*S, m, m+2*S)
  # 15.42954 18.80000 22.17046

Using Student's t-test (for which we use the t.test command) we have:

  t.test(mice, conf.level = 0.95)
  # One Sample t-test
  # t = 11.1557, df = 4
  # 95 percent confidence interval:
  # 14.12105 23.47895
  # mean of x
  # 18.8

A slightly larger interval is obtained. The phrase "One Sample t-test" refers to the fact that Student's test is used here to study a single random variable. We will soon see that it can be used to study the relationship between two random variables.

Without going into details, we observe only that "df" (degrees of freedom) has the value N-1 and that the values 14.1 and 23.5 obtained are the endpoints of the interval (centered at 18.8) over which the following function "dtS" (similar to the Gaussian), which represents the distribution of t shifted to the mean and rescaled by S, has integral 0.95:

  # dt is the standard t density: shift by the mean and rescale by S (the standard deviation of the mean)
  dtS = function(x) dt((x-18.8)/S, df=4)/S
  BF=3.5; HF=2.5; graphF(dtS, 13,25, "brown")

dt(x, df) is the density and rt(n, df) generates random deviates (analogous to dnorm and rnorm).

• Consider the means of two different data samples: we want to evaluate whether their difference is significant. Let's take an example (from a manual by Cavalli-Sforza). I want to establish whether an anticancer preparation reduces the respiration of liver cells. I measure (using a suitable unit of measurement) the respiration of the liver of 4 treated animals and of 5 others, used as a control.
  tr = c(339, 405, 302, 362)
  co = c(401, 340, 461, 442, 361)
  t.test(co,tr, var.equal=TRUE)
  # Two Sample t-test
  # t = 1.5193, df = 7
  # mean of x mean of y
  # 401 352

I got the t statistic 1.52. The degrees of freedom (df) are 7 (3+4, that is (4-1)+(5-1)). The significance is checked, as in the χ2 test, by resorting to the density dtS of the t distribution.

  g = 7; dtS = function(x) dt(x,df=g)
  graphF(dtS, -5,5, "brown")

I determine x such that the integral between -x and x is 0.95 (but I could choose 0.99 or 0.9 or ..., depending on the needs).

  idtS = function(x) integral(dtS,-x,x); solution(idtS,0.95, 0,100)
  # 2.364624

The two-sided 95% critical value of t (its 97.5th percentile) is 2.365. 1.52 is less than 2.365, so the difference is not significant: it is risky to say that the preparation decreases the respiration of liver cells.

• Let us also compare the means of two paired data series: a group of people is given a supposed antipyretic and their temperatures are measured at the time of administration (A) and after three hours (B). We want to see if the temperature reduction is significantly different from zero.

  A = c(38.3, 39.1, 40.2, 37.6, 38.9, 38.7)
  B = c(37.2, 38.4, 38.6, 36.7, 38.2, 38.2)
  t.test(A,B, paired=TRUE)
  # Paired t-test
  # t = 5.7279, df = 5

I got the Student t statistic 5.73. The degrees of freedom (df) are 5 (6 pairs minus 1). I determine x such that the integral between -x and x is 95%.

  g = 5; dtS = function(x) dt(x,df=g)
  idtS = function(x) integral(dtS,-x,x); solution(idtS,0.95, 0,100)
  # 2.570582

I also determine x such that the integral between -x and x is 99%.

  solution(idtS,0.99, 0,100)
  # 4.032143

5.73 is greater than 2.57 and even than 4.03. It is a significant difference. I can consider the antipyretic preparation to be effective.

• One last example. It can be shown that if X1, ..., XN are N independent, normally distributed measurements, Mc is the sample mean, Mt the population ("true") mean and σ is the sample standard deviation, then t = (Mc-Mt)/σ·√N follows a Student's t distribution (with N-1 degrees of freedom).
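As a check, the statistic just defined can be computed by hand for the mouse weights of the first example and compared with what t.test reports. This is only a minimal sketch in base R: the names N, s, mu0, t_manual and t_builtin are introduced here for illustration, and mu0 = 0 is the null value that t.test uses by default.

```r
# Mouse weights from the first example
mice <- c(15, 17, 18, 19, 25)

N   <- length(mice)   # number of measurements
Mc  <- mean(mice)     # sample mean
s   <- sd(mice)       # sample standard deviation (denominator N-1)
mu0 <- 0              # hypothesised population mean (t.test's default)

# t = (Mc - Mt)/sigma * sqrt(N), with Mt = mu0 and sigma = s
t_manual <- (Mc - mu0) / s * sqrt(N)

# The same statistic as computed by t.test
t_builtin <- unname(t.test(mice, mu = mu0)$statistic)

print(c(t_manual, t_builtin))   # both about 11.1557, as in the first example
```

Choosing a different mu0 tests the mean against another hypothesised value.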
Assume, as a hypothesis, that a certain quantity G has, with respect to a certain unit of measurement, the value 2.4. 10 measurements are taken, obtaining mean Mc=2.7 and σ=0.3. Let us assume that measurement errors have a normal distribution (of unknown parameters). Should the hypothesis on G be accepted or rejected if I assume a significance level of 0.05?

  # 1-0.05 = 0.95
  g = 10-1; dtS = function(x) dt(x,df=g)
  idtS = function(x) integral(dtS,-x,x)
  solution(idtS,0.95, 0,100)
  # 2.262158 (the two-sided 95% critical value of t)

The Student variable in our case has the value:

  (2.7-2.4)/0.3*sqrt(10)
  # 3.162278

3.16 is greater than 2.26, so I have to reject the hypothesis. But if I had assumed a significance level of 0.01 I would have accepted it:

  solution(idtS,0.99, 0,100)
  # 3.249836 (> 3.162278)

For further details see the help of R (type "help(t.test)" and "help(dt)") or Wikipedia (English version). Another test often used in the continuous case is the Kolmogorov-Smirnov test (type "help(ks.test)" in R).
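The critical values found above by integrating the density can also be read directly from qt, the quantile function of the t distribution in base R, while pt gives the two-sided p-value. What follows is a minimal sketch for the last example; the names t_obs, crit95, crit99 and p_value are illustrative.

```r
g <- 10 - 1                              # degrees of freedom, N-1
t_obs <- (2.7 - 2.4) / 0.3 * sqrt(10)    # observed statistic, about 3.162278

# Two-sided critical values: leave alpha/2 in each tail
crit95 <- qt(1 - 0.05/2, df = g)         # about 2.262
crit99 <- qt(1 - 0.01/2, df = g)         # about 3.250

# Two-sided p-value: probability of |t| at least as large as t_obs
p_value <- 2 * pt(-abs(t_obs), df = g)

print(c(crit95, crit99, p_value))
print(t_obs > crit95)   # TRUE: rejected at the 0.05 level
print(t_obs > crit99)   # FALSE: not rejected at the 0.01 level
```

The p-value falls between 0.01 and 0.05, which is the same conclusion reached by comparing 3.16 with the two critical values.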