# Use of boxPlot. Outliers (abnormal values). Comparing different data
#
# The lengths of many broad beans (i.e. bean seeds) and the lengths of many basil seeds,
# collected by two classes of 12-year-olds. The bean seeds (measured from photographs of
# the seeds placed on graph paper) are expressed in cm, the basil seeds (measured with a
# camcorder connected to a stereo microscope) are expressed in mm.
beans = c(
 1.35,1.65,1.80,1.40,1.65,1.80,1.40,1.65,1.85,1.40,1.65,1.85,1.50,1.65,1.90,
 1.50,1.65,1.90,1.50,1.65,1.90,1.50,1.70,1.90,1.50,1.70,1.90,1.50,1.70,2.25,
 1.55,1.70,1.55,1.70,1.55,1.70,1.60,1.70,1.60,1.75,1.60,1.75,1.60,1.80,1.60,
 1.80,1.60,1.80,1.60,1.80,1.00,1.55,1.70,1.75,1.30,1.55,1.70,1.75,1.40,1.60,
 1.70,1.75,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,
 1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,
 1.45,1.60,1.70,1.80,1.50,1.60,1.70,1.80,1.50,1.60,1.70,1.85,1.50,1.60,1.70,
 1.85,1.50,1.60,1.75,1.90,1.50,1.60,1.75,1.90,1.50,1.65,1.75,1.90,1.55,1.65,
 1.75,1.95,1.55,1.65,1.75,2.00,1.55,1.65,1.75,2.30,1.35,1.65,1.80,1.40,1.65,
 1.80,1.40,1.65,1.85,1.40,1.65,1.85,1.50,1.65,1.90,1.50,1.65,1.90,1.50,1.65,
 1.90,1.50,1.70,1.90,1.50,1.70,1.90,1.50,1.70,2.25,1.55,1.70,1.55,1.70,1.55,
 1.70,1.60,1.70,1.60,1.75,1.60,1.75,1.60,1.80,1.60,1.80,1.60,1.80,1.60,1.80,
 1.00,1.55,1.70,1.75,1.30,1.55,1.70,1.75,1.40,1.60,1.70,1.75,1.40,1.60,1.70,
 1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,
 1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.45,1.60,1.70,1.80,1.50,
 1.60,1.70,1.80,1.50,1.60,1.70,1.85,1.50,1.60,1.70,1.85,1.50,1.60,1.75,1.90,
 1.50,1.60,1.75,1.90,1.50,1.65,1.75,1.90,1.55,1.65,1.75,1.95,1.55,1.65,1.75,
 2.00,1.55,1.65,1.75,2.30 )
basil <- c(
 1.996646,2.427837,2.002445,2.032486,2.440977,2.179811,1.827547,2.122749,2.273763,
 2.237457,2.234695,2.416860,1.855254,2.141668,2.274085,2.148191,2.188731,2.279401,
 1.861674,2.148191,2.277117,1.907743,2.151697,2.149251,1.874470,2.149251,2.279401,
 1.885252,2.309115,2.479710,1.883268,2.151697,2.302933,1.979976,2.353246,2.231072,
 1.885252,2.176491,2.309115,1.861674,2.274085,2.336312,1.891458,2.178452,2.335834,
 2.072091,2.302933,2.196575,1.907743,2.179811,2.336312,2.141668,2.273763,2.194292,
 1.943342,2.181266,2.339914,2.348716,2.574592,1.967000,2.188731,2.348716,2.208185,
 2.277117,1.975734,2.194292,2.353246,1.943342,2.238444,1.979976,2.196575,2.395220,
 2.098704,2.482356,1.996646,2.204940,2.406590,2.204940,2.458355,2.002445,2.205823,
 2.416860,1.883268,2.667822,2.015793,2.208185,2.427837,2.015793,2.457101,2.016699,
 2.224770,2.440977,1.855254,2.395220,2.032486,2.226911,2.457101,2.052005,2.176491,
 2.033379,2.231072,2.458355,2.104753,2.178452,2.045551,2.232692,2.459751,2.335834,
 2.339914,2.052005,2.234695,2.479710,2.122749,2.033379,2.069424,2.237457,2.482356,
 1.967000,1.975734,2.072091,2.238444,2.574592,2.267303,2.205823,2.098704,2.267303,
 2.667822,2.232692,2.226911,2.104753,1.891458,2.406590,2.045551,1.827547,2.069424,
 2.459751,1.874470,2.181266,2.224770,2.016699,2.602342,1.980298,2.414356,2.156164,
 1.944474,2.176403,2.381037,2.665530,2.282354,1.971069,2.178466,2.389039,2.403857,
 2.176403,1.980298,2.222064,2.400560,2.441692,2.256341,2.005266,2.233202,2.403857,
 2.400560,2.301457,2.047079,2.256341,2.414356,2.279626,2.222064,2.063478,2.257275,
 2.441692,2.389039,2.293673,2.073464,2.264273,2.441809,2.663396,2.063478,2.075890,
 2.279626,2.501192,2.575395,2.264273,2.080611,2.282354,2.524316,1.944474,2.112226,
 2.097085,2.288252,2.575395,2.047079,2.178466,2.112226,2.293673,2.602342,2.073464,
 2.299742,2.141965,2.299742,2.654032,2.142679,2.305893,2.142679,2.301457,2.663396,
 2.075890,2.144709,2.144709,2.305893,2.665530,2.080611,2.288252,2.156164,2.345941,
 2.739415,2.163349,2.257275,2.161602,2.364831,2.762354,2.141965,2.161602,2.163349,
 2.368598,2.005266,2.097085,1.971069,2.381037,2.345941,2.233202,2.739415,2.524316,
 2.762354,2.364831,2.501192,2.654032,2.368598,2.441809 )
# How many are they?
length(beans); length(basil)
# 260 240
# I can get the box plot without first drawing a histogram (see), with the command boxPlot:
boxPlot(beans)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   1.000   1.550   1.650   1.659   1.750   2.300
# The brown dots are the 5th and 95th percentiles
# The red dot is the mean
# If I write  noMean=1; boxPlot(beans)  the mean is not shown.
boxPlot(basil)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   1.828   2.079   2.226   2.230   2.356   2.762
# The brown dots are the 5th and 95th percentiles
# The red dot is the mean
# The window is chosen automatically
# An observation that is distant from the other observations is called an outlier. It
# may indicate an experimental error. With the command out(data, p1,p2) I can exclude the
# data lower than the p1-th percentile or greater than the p2-th percentile counted from
# the top (out(data,5,5) keeps the data between the 5th and the 95th percentile).
beans2 = out(beans,5,5); basil2 = out(basil,5,5); length(beans2); length(basil2)
# 246 224
boxPlot(beans2)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   1.400   1.550   1.650   1.655   1.750   1.900
# The brown dots are the 5th and 95th percentiles
# The red dot is the mean
boxPlot(basil2)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   1.891   2.103   2.226   2.225   2.341   2.602
# The brown dots are the 5th and 95th percentiles
# The red dot is the mean
# The command boxPlot uses a standard window. You can change it by setting Bbox
# and Hbox (the standard window corresponds to Bbox<<- 4; Hbox<<- 1.4)
Bbox<<- 2.5; Hbox<<- 1.2; boxPlot(beans2); boxPlot(basil2)
#
# You can use the affine command (see) to compare data on different scales, and boxAB (see)
# to choose the same box.
# Example: I want to compare the scores of two tests, one marked between 0 and 10,
# the other between 0 and 110, by transforming both to the range between 0 and 100.
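# The rescaling used in this example is a linear map of one interval onto another. A
# minimal base-R sketch of it follows. The name affine_sketch is hypothetical, and the
# sketch assumes that affine(x, a,b, c,d) sends the interval [a,b] linearly onto [c,d];
# it is not the code of the affine command itself.

```r
# Map x linearly from [from_min, from_max] onto [to_min, to_max].
# Assumption: this is what the affine command does; the name is illustrative.
affine_sketch <- function(x, from_min, from_max, to_min, to_max) {
  to_min + (x - from_min) * (to_max - to_min) / (from_max - from_min)
}

# The scores of the first test (the same values as v1 in the example)
scores10 <- c(2,3,5,6,7,8,9,3,7,8,3,5,6,9,10,5,7,8,5,7,10,8,6,4,6)
scaled <- affine_sketch(scores10, 0, 10, 0, 100)
summary(scaled)   # minimum 20, mean 62.8, maximum 100
```

# Under this assumption the summary agrees with the values printed for V1 in the example.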
#
v1=c(2,3,5,6,7,8,9,3,7,8,3,5,6,9,10,5,7,8,5,7,10,8,6,4,6)
v2=c(21,32,55,61,70,86,104,31,75,88,37,51,67,96,110,54,73,89,53,77,98,81,62,47,66)
boxPlot(v1); boxPlot(v2)
V1 = affine(v1, 0,10, 0,100)
V2 = affine(v2, 0,110, 0,100)
boxAB=c(0,100); boxPlot(V1); boxAB=c(0,100); boxPlot(V2)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#    20.0    50.0    60.0    62.8    80.0   100.0
# The brown dots are the 5th and 95th percentiles
# The red dot is the mean
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   19.09   48.18   60.91   61.24   78.18  100.00
# The brown dots are the 5th and 95th percentiles
# The red dot is the mean
#
# If I want (losing all information on the distribution of the scores), I can represent
# (instead of the entire box plot) only the position of the median (figure below, left)
# or of the mean (figure on the right), here suitably transformed into the range
# between 1 and 6.
v1=c(2,3,5,6,7,8,9,3,7,8,3,5,6,9,10,5,7,8,5,7,10,8,6,4,6)
v2=c(21,32,55,61,70,86,104,31,75,88,37,51,67,96,110,54,73,89,53,77,98,81,62,47,66)
V1 = affine(v1, 0,12, 1,6)
V2 = affine(v2, 10,110, 1,6)
boxAB=c(1,6); medianPlot(V1); abovex("V1")
boxAB=c(1,6); medianPlot(V2); abovex("V2")
boxAB=c(1,6); meanPlot(V1); abovex("V1")
boxAB=c(1,6); meanPlot(V2); abovex("V2")
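# The out command used earlier on this page to discard extreme values is not part of
# base R. A minimal sketch of the same idea with quantile() follows. The name out_sketch
# is hypothetical, and the sketch assumes that the two arguments are the percentages cut
# from the bottom and from the top and that values equal to a cut point are kept; the
# actual out command may differ in these details.

```r
# Keep only the values between the p_low-th percentile and the
# (100 - p_high)-th percentile. Assumption: illustrative stand-in for out().
out_sketch <- function(x, p_low, p_high) {
  low_cut  <- quantile(x, p_low / 100, names = FALSE)
  high_cut <- quantile(x, 1 - p_high / 100, names = FALSE)
  x[x >= low_cut & x <= high_cut]
}

# Hypothetical data: the integers 1..98 plus two extreme values
sample_data <- c(-300, 1:98, 500)
trimmed <- out_sketch(sample_data, 5, 5)
length(trimmed)   # 90 values remain
range(trimmed)    # from 5 to 94
```

# Both extreme values fall outside the two cut points and are dropped together with the
# few values nearest to each end, so the box plot of the trimmed data is no longer
# stretched by them.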