---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
Use of  boxPlot.  Outliers (abnormal values).  Comparing different data

# The lengths of many broad beans (i.e. bean seeds) and the lengths of many basil seeds,
# collected by two classes of 12-year-olds. The bean seeds (measured on photographs of
# the seeds placed on graph paper) are expressed in cm, the basil seeds (measured with a
# camcorder connected to a stereo microscope) are expressed in mm.

beans = c(
1.35,1.65,1.80,1.40,1.65,1.80,1.40,1.65,1.85,1.40,1.65,1.85,1.50,1.65,1.90,
1.50,1.65,1.90,1.50,1.65,1.90,1.50,1.70,1.90,1.50,1.70,1.90,1.50,1.70,2.25,
1.55,1.70,1.55,1.70,1.55,1.70,1.60,1.70,1.60,1.75,1.60,1.75,1.60,1.80,1.60,
1.80,1.60,1.80,1.60,1.80,1.00,1.55,1.70,1.75,1.30,1.55,1.70,1.75,1.40,1.60,
1.70,1.75,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,
1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,
1.45,1.60,1.70,1.80,1.50,1.60,1.70,1.80,1.50,1.60,1.70,1.85,1.50,1.60,1.70,
1.85,1.50,1.60,1.75,1.90,1.50,1.60,1.75,1.90,1.50,1.65,1.75,1.90,1.55,1.65,
1.75,1.95,1.55,1.65,1.75,2.00,1.55,1.65,1.75,2.30,1.35,1.65,1.80,1.40,1.65,
1.80,1.40,1.65,1.85,1.40,1.65,1.85,1.50,1.65,1.90,1.50,1.65,1.90,1.50,1.65,
1.90,1.50,1.70,1.90,1.50,1.70,1.90,1.50,1.70,2.25,1.55,1.70,1.55,1.70,1.55,
1.70,1.60,1.70,1.60,1.75,1.60,1.75,1.60,1.80,1.60,1.80,1.60,1.80,1.60,1.80,
1.00,1.55,1.70,1.75,1.30,1.55,1.70,1.75,1.40,1.60,1.70,1.75,1.40,1.60,1.70,
1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,
1.70,1.80,1.40,1.60,1.70,1.80,1.40,1.60,1.70,1.80,1.45,1.60,1.70,1.80,1.50,
1.60,1.70,1.80,1.50,1.60,1.70,1.85,1.50,1.60,1.70,1.85,1.50,1.60,1.75,1.90,
1.50,1.60,1.75,1.90,1.50,1.65,1.75,1.90,1.55,1.65,1.75,1.95,1.55,1.65,1.75,
2.00,1.55,1.65,1.75,2.30
)
basil <- c(
1.996646,2.427837,2.002445,2.032486,2.440977,2.179811,1.827547,2.122749,2.273763,
2.237457,2.234695,2.416860,1.855254,2.141668,2.274085,2.148191,2.188731,2.279401,
1.861674,2.148191,2.277117,1.907743,2.151697,2.149251,1.874470,2.149251,2.279401,
1.885252,2.309115,2.479710,1.883268,2.151697,2.302933,1.979976,2.353246,2.231072,
1.885252,2.176491,2.309115,1.861674,2.274085,2.336312,1.891458,2.178452,2.335834,
2.072091,2.302933,2.196575,1.907743,2.179811,2.336312,2.141668,2.273763,2.194292,
1.943342,2.181266,2.339914,2.348716,2.574592,1.967000,2.188731,2.348716,2.208185,
2.277117,1.975734,2.194292,2.353246,1.943342,2.238444,1.979976,2.196575,2.395220,
2.098704,2.482356,1.996646,2.204940,2.406590,2.204940,2.458355,2.002445,2.205823,
2.416860,1.883268,2.667822,2.015793,2.208185,2.427837,2.015793,2.457101,2.016699,
2.224770,2.440977,1.855254,2.395220,2.032486,2.226911,2.457101,2.052005,2.176491,
2.033379,2.231072,2.458355,2.104753,2.178452,2.045551,2.232692,2.459751,2.335834,
2.339914,2.052005,2.234695,2.479710,2.122749,2.033379,2.069424,2.237457,2.482356,
1.967000,1.975734,2.072091,2.238444,2.574592,2.267303,2.205823,2.098704,2.267303,
2.667822,2.232692,2.226911,2.104753,1.891458,2.406590,2.045551,1.827547,2.069424,
2.459751,1.874470,2.181266,2.224770,2.016699,2.602342,1.980298,2.414356,2.156164,
1.944474,2.176403,2.381037,2.665530,2.282354,1.971069,2.178466,2.389039,2.403857,
2.176403,1.980298,2.222064,2.400560,2.441692,2.256341,2.005266,2.233202,2.403857,
2.400560,2.301457,2.047079,2.256341,2.414356,2.279626,2.222064,2.063478,2.257275,
2.441692,2.389039,2.293673,2.073464,2.264273,2.441809,2.663396,2.063478,2.075890,
2.279626,2.501192,2.575395,2.264273,2.080611,2.282354,2.524316,1.944474,2.112226,
2.097085,2.288252,2.575395,2.047079,2.178466,2.112226,2.293673,2.602342,2.073464,
2.299742,2.141965,2.299742,2.654032,2.142679,2.305893,2.142679,2.301457,2.663396,
2.075890,2.144709,2.144709,2.305893,2.665530,2.080611,2.288252,2.156164,2.345941,
2.739415,2.163349,2.257275,2.161602,2.364831,2.762354,2.141965,2.161602,2.163349,
2.368598,2.005266,2.097085,1.971069,2.381037,2.345941,2.233202,2.739415,2.524316,
2.762354,2.364831,2.501192,2.654032,2.368598,2.441809
)
# How many values are there?
length(beans); length(basil)
#    260          240
# I can draw the box plot directly, without using histogram (see), with the command boxPlot:
boxPlot(beans)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  1.000   1.550   1.650   1.659   1.750   2.300 
#    The brown dots are 5^ and 95^ percentiles 
#           The red dot is the mean 
 
# If I write  noMean=1; boxPlot(beans)  the mean is not shown.
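#
# For readers without the boxPlot library, the same kind of summary can be sketched in
# base R. This is only an illustration under assumptions: demo_x is made-up data, and
# base boxplot() draws its whiskers by its own rule, so the picture need not match the
# one produced by boxPlot.
demo_x <- c(1, 2, 3, 4, 100)               # hypothetical data, not the seed measurements
demo_summary <- summary(demo_x)            # Min, 1st Qu., Median, Mean, 3rd Qu., Max
demo_tails <- quantile(demo_x, c(0.05, 0.95))   # 5th and 95th percentiles
print(demo_summary); print(demo_tails)
boxplot(demo_x, horizontal = TRUE)         # base-R box plot, drawn horizontally
points(mean(demo_x), 1, col = "red", pch = 19)  # mark the mean with a red dot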
 
boxPlot(basil)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  1.828   2.079   2.226   2.230   2.356   2.762
#    The brown dots are 5^ and 95^ percentiles 
#           The red dot is the mean 
# The window is chosen automatically
                              
# An observation that lies far from the other observations is called an outlier. It
# may indicate an experimental error. With the command out(data, p1,p2) I can exclude the
# data below the p1-th percentile or above the (100-p2)-th one: out(data,5,5) keeps the
# values between the 5th and the 95th percentile.
beans2 = out(beans,5,5); basil2 = out(basil,5,5); length(beans2); length(basil2)
#                                                   246             224
boxPlot(beans2)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  1.400   1.550   1.650   1.655   1.750   1.900 
#    The brown dots are 5^ and 95^ percentiles 
#           The red dot is the mean 
boxPlot(basil2)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  1.891   2.103   2.226   2.225   2.341   2.602 
#    The brown dots are 5^ and 95^ percentiles 
#           The red dot is the mean 
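#
# What a percentile cut of this kind does can be sketched in base R. This is only an
# illustration under assumptions: trim_percentiles is a hypothetical helper, not the
# library's out(), and quantile() with its default interpolation may place the cut
# points, or treat values equal to them, differently from out().
trim_percentiles <- function(x, p1, p2) {
  lo <- unname(quantile(x, p1 / 100))      # cut point for the lower tail
  hi <- unname(quantile(x, 1 - p2 / 100))  # cut point for the upper tail
  x[x >= lo & x <= hi]                     # keep the values between the two cut points
}
demo_data <- c(1:20, 100)                  # made-up data with one distant value
demo_kept <- trim_percentiles(demo_data, 5, 5)
length(demo_data); length(demo_kept); range(demo_kept)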
         
# The command boxPlot uses a standard window. You can change it by assigning the
# variables Bbox and Hbox (the standard window corresponds to  Bbox<<- 4; Hbox<<- 1.4)
Bbox<<- 2.5; Hbox<<- 1.2; boxPlot(beans2); boxPlot(basil2)
#
# You can use the affine command (see) to compare data on different scales, and boxAB
# (see) to draw the boxes over the same interval.
# Example: I want to compare the scores of two tests, one marked between 0 and 10, the
# other between 0 and 110, by transforming both to the range between 0 and 100.
#
v1=c(2,3,5,6,7,8,9,3,7,8,3,5,6,9,10,5,7,8,5,7,10,8,6,4,6)
v2=c(21,32,55,61,70,86,104,31,75,88,37,51,67,96,110,54,73,89,53,77,98,81,62,47,66)
boxPlot(v1); boxPlot(v2)
V1 = affine(v1, 0,10,  0,100)
V2 = affine(v2, 0,110, 0,100)
boxAB=c(0,100); boxPlot(V1); boxAB=c(0,100); boxPlot(V2)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   20.0    50.0    60.0    62.8    80.0   100.0 
#    The brown dots are 5^ and 95^ percentiles 
#           The red dot is the mean 
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  19.09   48.18   60.91   61.24   78.18  100.00 
#    The brown dots are 5^ and 95^ percentiles 
#           The red dot is the mean 
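#
# The rescaling used here is a linear (affine) change of scale. A minimal base-R sketch,
# assuming that affine(x, a,b, c,d) sends the interval [a,b] onto [c,d] (an assumption
# consistent with the summaries printed above); rescale_linear is a hypothetical
# stand-in, not the library's own code.
rescale_linear <- function(x, from_lo, from_hi, to_lo, to_hi) {
  # from_lo goes to to_lo, from_hi goes to to_hi, linearly in between
  to_lo + (x - from_lo) * (to_hi - to_lo) / (from_hi - from_lo)
}
demo_marks <- c(2, 6, 10)                       # made-up marks on a 0-10 scale
rescale_linear(demo_marks, 0, 10, 0, 100)       # the same marks on a 0-100 scale
rescale_linear(110, 0, 110, 0, 100)             # top mark of a 0-110 test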
         
#
# If I want, I can represent (instead of the entire boxplot, and so losing all information
# on the distribution of the scores) only the position of the median (figure below, left)
# or of the mean (figure on the right), here rescaled to the range between 1 and 6.
 
    
 
v1=c(2,3,5,6,7,8,9,3,7,8,3,5,6,9,10,5,7,8,5,7,10,8,6,4,6)
v2=c(21,32,55,61,70,86,104,31,75,88,37,51,67,96,110,54,73,89,53,77,98,81,62,47,66)
V1 = affine(v1, 0,10,   1,6)
V2 = affine(v2, 0,110,  1,6)
boxAB=c(1,6); medianPlot(V1); abovex("V1")
boxAB=c(1,6); medianPlot(V2); abovex("V2")
boxAB=c(1,6); meanPlot(V1);  abovex("V1")
boxAB=c(1,6); meanPlot(V2); abovex("V2")
