---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------# Let's see how to do some otherstatisticalanalysis of data.# Here is an alternative way to those seen ing_B.htmandg_C.htmto buildpies.# The extensions of the hill, mountain and plain areas of "Piemonte" expressed in tens # of square kilometers.First way:pie(c(770,1099,671)) # I add thenames: piemonte = c(770,1099,671) names(piemonte) = c("hil","mou","pla") pie(piemonte) # I choose thecolors: pie(piemonte, col=c("yellow","brown","green"))# If I want to keep the previous diagram I can do the next in a new window, which I can # open with the commanddev.new()# An alternative way to buildbarplot.First way:barplot(piemonte) # I reduce thespacebetween the columns and place agrid: barplot(piemonte, space=0)abline(h=c(200,400,600,800,1000),lty=3) # I add thecolors: barplot(piemonte, space=0, col=c("yellow","brown","green")) abline(h=c(200,400,600,800,1000),lty=3)# Here is the diagram with thepercentages:colors = c("yellow","brown","green") barplot(piemonte/sum(piemonte)*100,space=0,col=colors) abline(h=c(10,20,30,40),lty=3) ---------- ---------- ---------- ----------# The heights of some 13-year-old pupils (analyzed with commands already seen):pup= c(146,158,152,140,157,147,161,147,149,154,155,153,155,156,150,153,152,145)sort(pup)#[1] 140 145 146 147 147 149 150 152 152 153 153 154 155 155 156 157 158 161histogram(pup)#[the distance of the sides of the grid is 2.5 %]#Frequencies and percentage freq.:#1, 5, 6, 5, 1#5.56,27.78,33.33,27.78,5.56#For other statistics use the command morestat()morestat()#Min. 1st Qu. Median Mean 3rd Qu. Max.#140.0 147.5 152.5 151.7 155.0 161.0#The brown dots are 5^ and 95^ percentiles#The red dot is the meanHistogram(pup, 140,161, 3)#[the distance of the sides of the grid is 1.5 %]#Frequencies and percentage freq.:#1, 1, 3, 2, 5, 4, 2#5.56,5.56,16.67,11.11,27.78,22.22,11.11# What we obtain with the commandhist.hist(pup, right=FALSE) # "right=FALSE" tells the program that the intervals are[…,…).# If I addprobability=TRUEI obtain the figure on the right, in which the vertical # scale has changed: the probability densities are plotted (the relative frequencies # divided by the amplitude of the intervals) so that thehistogramhas atotal area#equal to 1[ we added also the option col="yellow" and the command BOX(); for the # third graph I type again hist(…, add=TRUE) with "add=TRUE" ]. Or:hist(pup, right=FALSE,seq(140,161, 3),cex.axis=0.8)# WithsummaryI obtain similar results to those obtained withmorestat()summary(pup)# With#Min. 1st Qu. Median Mean 3rd Qu. Max.#140.0 147.5 152.5 151.7 155.0 161.0statisticI obtain the boxplot, withstatisticsboth statistic(pup) statistics(pup)#Min. 1st Qu. Median Mean 3rd Qu. Max.#140.0 147.5 152.5 151.7 155.0 161.0# If I want to compare this data with:pup2 = c(146,158,152,140,157,147,161,147,149,154,155,153, 155,156,150,153,152,145,158,140,147,149,155,156,153,145, 141,156,158,151,161,148,157,159,160,162,149,163,152,149, 157,155) # I can use:BF=4;HF=1.2boxAB=c(140,163);statistics(pup)#Min. 1st Qu. Median Mean 3rd Qu. Max.#140.0 147.5 152.5 151.7 155.0 161.0boxAB=c(140,163);statistics(pup2)#Min. 1st Qu. Median Mean 3rd Qu. Max.#140.0 149.0 153.0 152.6 157.0 163.0# Tocompare the histogramsi can use:h1 = max(hist(pup,seq(140,164,3),plot=FALSE)$density); h1 # [1] 0.09259259 h2 = max(hist(pup2,seq(140,164,3),plot=FALSE)$density); h2 # [1] 0.07142857 h = max( c(h1,h2) ); h # [1] 0.09259259 # the maximum of those that would be the heights of the two histograms BF=4; HF=3 noBox=1; boX(140,165,0,h) S = seq(140,164,3) hist(pup, S,angle=45,density=15,probability=TRUE,add=TRUE,border=1,col="blue") hist(pup2,S,angle=135,density=15,probability=TRUE,add=TRUE,border=1,col="red")---------- ---------- ---------- ----------# I can introduce data values from the keyboard using the commandscan(file="",n=...)# where in place of...you have to put the data to read. Two examples:numbers = scan(file="", n=3); sum(numbers)#1: 3#2: 7#3: 5#Read 3 items#[1] 15# In the second example I introduce N and then N data (of which I calculate the mean):npup = scan(file="",n=1); altpup = scan(file="",n=npup); mean(altpup)---------- ---------- ---------- ----------#1: 7#Read 1 item#1: 154#2: 156#3: 163#4: 170#5: 161#6: 156#7: 159#Read 7 items#[1] 159.8571# Thedatacan be copied from the network and pasted into "R" or can bedownloadedfrom#the network. An example ("t_sec.txt" file in the R folder). I read a few lines (eg 4) # of a file (I read a few because it can be very long, eg contain a few thousand data):readLines("http://macosa.dima.unige.it/R/t_sec.txt",n=4)#[1] " measures (truncated to centisecond) of a period of 1 s timed by hand"#[2] "111"#[3] "103"#[4] "109"# I see that the data is integer and that there is 1 line to jump. I do it with the #skipcommand. The data is separated by an "end-of-line". I read them with the command # "scan" (if they were separated by, for example, ";" I would add)sep=";"data = scan("http://macosa.dima.unige.it/R/t_sec.txt", skip=1)#Read 47 items# Alternatively, I can examine the file from a browser, copy it and then use # scan("clipboard", skip=1)# I can also use (from File menu)Open(andScript/ApriScriptNewScript/NuovoScript): # seehere.# Thestrcommand displays the structure of the objects. I use it to examine "data"str(data)#num [1:47] 111 103 109 97 99 110 99 103 109 106 ...# They are 47. They aretruncatedto the integers. To make correct evaluations I need # to haveroundedvalues: I add 1/2.DATA = data+1/2 summary(DATA)#Min. 1st Qu. Median Mean 3rd Qu. Max.#68.50 96.50 98.50 99.86 108.00 129.50# Esaminandoli conhistogramo conhist:---------- ---------- ---------- ----------# If I have data classified in different amplitude ranges I can use thehistoclas# command: see. ---------- ---------- ---------- ----------# We cangenerate random numbers. For examplerunif(4, 1,10) # [I can obtain 9.139260 3.747637 3.847650 2.825768] # generates 4 random numbers "uniformly" distributed on the interval (1,10). # Whereas:RUNIF(20, 1,6) # [I can obtain 2 5 4 2 3 5 6 5 6 2 5 1 5 1 6 2 1 3 1 6 ] # generates 20 random integers "uniformly" distributed on {1,2,3,4,5,6}. # andRUNIF2(20, 1,6, c(2,1,1,1,1,0.5)) # generates 20 random integers distributed on {1,2,3,4,5,6} where 1 [2] has the chance # to get out double [half] compared to the chances of 2,3,4,5 (a die with face "1" # heavier than the face "6") # [I can obtain 5 3 1 4 2 5 5 2 4 6 1 1 2 1 2 2 1 2 2 1 ]# If I want generate random elements from ageneric setI can useRunIf: see here# Let's see the throwing, ten thousand times, of twobalanceddice:n = 10000; dice = RUNIF(n, 1,6) + RUNIF(n, 1,6)# The values have been put into "dice". Here are some:dice[1]; dice[1:10]#[1] 7#[1] 7 9 9 7 8 6 4 10 7 7hist(dice, seq(1.5,12.5,1),probability=TRUE,col="yellow") abline(h=c(0.05,0.1,0.15), lty=3)# The possible outcomes are 6*6 = 36; we assumed they are equally probable: everyone # has probability 1/36. For example, 3 can exit in 2 cases: 1 and 2, 2 and 1, so with # the probability: 2*1/36 = 0.0555…; and indeed with 10000 launches I got a percentage # close to 5.55…% The histogram on the right was obtained with theHistogramcommand:noClass=1; Histogram(dice, 1.5,12.5, 1)#Frequencies and percentage freq.:#277, 554, 857, 1084, 1442, 1663, 1337, 1126, 862, 532, 266#2.77,5.54,8.57,10.84,14.42,16.63,13.37,11.26,8.62,5.32,2.66# The percentage freq. of 3 is 5.54% (ie 0.0554).

# But if I roll the die (considered above) with the # face "1" heavier than the face "6": |

# A use of RUNIF to simulate the # Brownian motion of a particle # (the positions that the particle # occupies tend to be equidistributed). # See here. |