```---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
Examples of distributions

# References for the most common probability distributions are available here:
help("distribution")
#
# I toss up a coin 10 times. What is the law of distribution of the random variable:
# N = "number of heads" ?
# Experimental study:
n = 10000; da = NULL; for (i in 1:n) da[i]=sum(RUNIF(10, 0,1))
hist(da, seq(-0.5,10.5, 1),probability=TRUE,col="grey",main="",cex.axis=0.9)
# If I want the histogram on the right I insert in hist(...)  ylim=c(0,0.3)

statistics(da)                                      # I get the boxplot and:
#  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.000   4.000   5.000   4.983   6.000  10.000 # The theoretical study (for choose see):
N=10; da = NULL; for (i in 0:N) da = c(da,rep(i,choose(10,i)))
hist(da, seq(-0.5,10.5, 1),probability=TRUE,col="grey",main="",cex.axis=0.9)
summary(da)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#      0       4       5       5       6      10
# As index of the data dispersion I can take 3^quartile-1^quartile, ie:
Percentile(da,75)-Percentile(da,25)
 2
# or the standard deviation of "da" (see here)
Sd(da)
# 1.581139
# Using Var (= √Sd) I see that Sd(da) √2.5
Var(da)
# 2.5                   It is demonstrable that Var is √N/2
# I recall that these are weighted values. I can proceed also in this way (see here):
j=0:10; Wmean(j,choose(10,j)); WVar(j,choose(10,j))
#            5                    2.5

# This is an example of binomial distribution  Pr(N=k) = C(n,k)*p^k*(1–p)^(n-k)
# when p = 1/2. In general:
binom = function(m) choose(N,m)*p^m*(1-p)^(N-m)

# An example with p ≠ 1/2.
# An automatic oven produces on average 1 biscuit burnt every 8; biscuits are then
# automatically mixed and packaged in packs of 6. What is the distribution law of the
# random variable  N = "n. of burnt cookies in a pack" ?
# It is a discrete probability distribution.
N = 6; p = 1/8; x = 0:N; BC = binom(0:N)
Plane(0,N, 0,max(BC) ); POINT(x,BC, "blue")
# For a better view we add these commands and obtain the following graph, on the left:
polyl(x,BC, "blue"); segm(x,0, x,BC, "blue") Wmean(x,BC); WVar(x,BC)
# 0.75      0.65625
# I can prove that I obtain the same valued with:
N*p; N*p*(1-p)
# 0.75  0.65625

# To understand this it is useful to interpret BC as X1+…+X6 where Xi is 1 with proba-
# bility 1/8, is 0 otherwise.  With this simulation we obtain the graph on the right.
n=10000; da=NULL; N=6
for(i in 1:n) {X=0; for(j in 1:N) if(runif(1, 0,1)<1/8) X=X+1; da[i]=X}
hist(da, seq(-0.5,N+1/2, 1),probability=TRUE,col="grey",main="",cex.axis=0.9)
BOX(); abovex("burnt cookies in a pack")

# It's clear why the mean is N*p: if every Xi is 1 with probability p X1+… has mean N*p

# NOTE: y = dbinom(x, N, p) is equivalent to  x=0:N; ; binom = ...; y = binom(0:N)
# See here for an example of the multinomial distribution.

# When N is large (as in the following example) I can approximate the binomial
# distribution with the gaussian distribution, that we recall dawn.
# The following graph, on the left, is for N=20: p=0.2; N=20
x=0:N; y=binom(0:N); Plane(0,N, 0,max(y))
POINT(x,y, "blue"); polyl(x,y, "blue"); segm(x,0, x,y, "blue")
m=N*p; v=N*p*(1-p); s=sqrt(v)
# 4      3.2        1.788854

# The distribution rule called normal or gaussian has the following density function f;
# it depends only on 2 parameters, the mean m and σ, called the standard deviation of f. #      (σ is the Greek letter "sigma", which corresponds to our "s")

# In R the normal density of mean m and standard deviation s is called
# dnorm(x, mean=m, sd=s). How to obtain the graph above on the right:

z = function(x) dnorm(x, mean=m, sd=s)
x=0:N; y=binom(0:N); Plane(0,N, 0,max(y))
POINT(x,y, "blue")
graph2(z, -20,20, "brown")

# Among the few examples of gaussian phenomenon (in addition to those of the distribu-
# tion of seed lengths already considered - see here), there is the height of the people
# of a particular sex, age, place of origin. Let's use the data already considered here
# (medical examination for a call-up - Italy, 1987):

str(da)
#  'data.frame':   4170 obs. of  4 variables:
#   \$ chest : int  91 86 89 88 92 95 91 83 87 88 ...
#   \$ weight: int  62 64 64 68 70 77 65 62 55 60 ...
#   \$ region: int  15 15 15 15 15 15 15 15 15 15 ...
#   \$ height: int  73 70 74 79 66 69 66 79 75 65 ...
# We add 1/2 to truncated data, so we do not have to add 1/2 to the average values
da\$chest = da\$chest+1/2; da\$weight = da\$weight+1/2; da\$height = da\$height+1/2
H = da\$height; min(H); max(H)
#              47.5    96.5
m=mean(H); s=Sd(H)
z = function(x) dnorm(x, mean=m, sd=s)
BF=3;HF=2.1
graph2F(z, 47,97, "brown")
graph2(z,47,97, "brown"); abovex("data")
POINT(m+s,z(m+s),"blue"); POINT(m-s,z(m-s),"blue")
# The data fit in the graph: see below on the left.
# I can simulate the phenomenon with rnorm(n=N, mean=m, sd=s) that generates N random
# data with normal distribution; obviously also the simulation fit (graph on the right):
#
H1 = rnorm(n=4000, mean=m, sd=s)
graph2F(z, 47,97, "brown")
graph2(z,47,97, "brown"); abovex("random")
POINT(m+s,z(m+s),"blue"); POINT(m-s,z(m-s),"blue") # We pointed out (in blue) in the graphs also m ± σ. One can prove that these points
# are flex points. In general, whatever the gaussian, we have:
integral(z,   m-s,m+s)        #  0.6826895
integral(z, m-s*2,m+s*2)      #  0.9544997
integral(z, m-s*3,m+s*3)      #  0.9973002
# and obviously:
integral(z,  -Inf,Inf)        #  1

# For which s the integral is 0.99?
m=0; s=1; F=function(x) integral(z, -x,x); solution(F,0.99, 0,100)        # 2.575829

# The variable "height" is virtually continuous: unlike the number of coins or biscuits
# which only assumed integer values, I can think that a length is a real number.
# Increasing the number of intervals the upper edge of the histogram tends to a curve;
# instead of the sum of the areas of the columns of a histogram between x=a and x=b, I
# have the integral of a function between x=a and x=b.
# In the discrete case I represent the frequency density (ie the unitary percentage
# frequency), in the continuous one I represent the density function.

---------- ---------- ---------- ----------

# We note that x1k, x2k, x3k, … (if k ≠ 1) have a different kind of distribution of x1,
# x2, x3, …   An example:  x with gaussian distribution and x^3.

x <- rnorm(n=10000,mean=5,sd=1); x3=x^3
Plane(min(x),max(x),0,max(hist(x,plot=FALSE)\$density))
statistics(x)
# median = 5.005     mean = 5.002
Plane(min(x3),max(x3),0,max(hist(x3,plot=FALSE)\$density))
statistics(x3)
# median = 125.400   mean = 140.200
x x^3

# x -> x^3 is a growing function; so the median of x^3 is the cube of the median of x;
# this does not happen for the mean!

---------- ---------- ---------- ----------

# Generalizing what we just saw (here), we have:
# if I sum up N random variables that have the same distribution, I obtain a random
# variable that (as N -> ∞) tends toward a gaussian distribution. An example:
tot = 2000; x = NULL               #                N = 1
for (i in 1:tot) x[i] = sqrt(runif(1))+3*runif(1)^2
dev.new(); hist(x, probability=TRUE, col="grey",main="",cex.axis=0.9)
BOX(); abovex("n=1")
#                                                   N = 20
for (i in 1:tot) {x[i]=0; for (j in 1:20) x[i]=x[i]+sqrt(runif(1))+3*runif(1)^2}
dev.new(); hist(x, probability=TRUE, col="grey",main="",cex.axis=0.9)
BOX(); abovex("n=20") # That's the reason why the gaussian distribution is important:  there are a few
# gaussian phenomena, but if I calculate the mean of N values with the same
# distribution (ie if I calculate their sum divided for N), this mean has a gaussian
# distribution.  This fact goes under the name of central limit theorem:

# If U1,…,Un are random indipendent variables with the same law of distribution, and if
# m and σ0 are its mean and standard deviation, when n grows the variable (U1+ … +Un)/n
# tends to have a normal distribution with mean m and standard deviation σ0/√n.

# Let's consider as an example of non-Gaussian distribution the weights of the
# population of twenty years considered above.
da\$chest = da\$chest+1/2; da\$weight <- da\$weight+1/2; da\$height <- da\$height+1/2
W = da\$weight; min(W); max(W)
#              48.5    126.5
BF=3.5; HF=2.5
max(hist(W, probability=TRUE, col="grey",main="",cex.axis=0.9)\$density)
#   0.03856115
Plane(48,127,0,0.04)
abovex("weight - call-up - Italy, 1987"); POINT(mean(W),0,"blue") length(W); Sd(W)/sqrt(length(W))
#  4170     0.1630721
E=3*Sd(W)/sqrt(length(W)); mean(W)-E; mean(W)+E
#           70.48752   71.46595
# The mean weight of the twenty years italian men was 71 ± 0.5 kg

# But the mean and the standard deviation that we considered are not the mean and the
# standard deviation of the population of all the twenty years italian men (in 1987),
# they are those of the sample. It can be proved that, while the statistical mean is a
# good approximation of the theoretical one, I have to consider (insted of the
# theoretic standard deviation, Sd) the experimental standard deviation sd, where n is
# replaced by n-1:
#                 Sd  is  √( ( (X1-M)^2 + … + (Xn-M)^2 ) / n )
#                 sd  is  √( ( (X1-M)^2 + … + (Xn-M)^2 ) / (n-1) )  [ > Sd]
# So, (U1+ … +Un)/n tends to have a normal distribution which have the experimental
# mean as a mean and sd/√n as a standard deviation. I can calculare this value with
# SdM (standard deviation of the mean).  See here
# When n is large the difference between the two values is negligible:

Sd(W)/sqrt(4170); sd(W)/sqrt(4170); SdM(W)
#  0.1630721       0.1630916        0.1630916
# It is due to the replacement of √n with √(n-1):
n=4170; sqrt(n/(n-1)); n=20; sqrt(n/(n-1)); n=10; sqrt(n/(n-1))
#       1.00012              1.025978             1.054093

# Therefore, more correct calculations in our case would be:
E=3*SdM(W); mean(W)-E; mean(W)+E
#           70.48746   71.46601

# To be precise we should say: there is the probability of confidence of 99.73% that
# [70.48746,71.46601] contains the exact value. If we use E=2*SdM(W) we should say
# the probability of confidence is 95.45%. The interval is called a confidence interval.

---------- ---------- ---------- ----------

# In the discrete case the probability that U is between a and b is Pr(U=a)+…+Pr(U=b);
# in the continuous case the probability that the density function U is between a and b
# (where a and b may be infinite) is ∫[a,b] U(x) dx.  In the discrete case if U has values
# in {v1,v2,…} the mean is v1·Pr(U=v1) + v2·Pr(U=v2) + …;  in the continuous case if U
# varies in I it is ∫I x·U(x) dx;  so, if m is the mean, the variance is ∫I (x-m)²·U(x) dx

# If I sum up only two random variables uniformly distributed on [0,1] I have (just
# like we saw here for the launch of two balanced dice):
HF=2; BF=4
PLANE(0,2, 0,1)
n = 1e6; U1 = runif(n,0,1); U2 = runif(n,0,1)
hist(U1+U2, probability=TRUE, col="grey90", add=TRUE, nclass=40)   # or:  n=40
# the density function (from x=0 to x=2) is:
f = function(x) -abs(x-1)+1
graph1(f, 0,2, "blue"); abovex("runif+runif")
# and (as it should be) I have:
integral(f, 0,2)
# 1
# and the mean:
g = function(x) x*f(x); integral(g, 0,2)
# 1
# The graph of the probability that U1+U2 is less than x (see here):
Gintegra(f,0,2,"black") # This function (the function with black graph) is the cumulative distribution function
# of U1+U2.

# In the case of the gaussian distribution I have:

z = function(x) dnorm(x, mean=0, sd=1)
HF=2; BF=3
graphF(z, -4,4, "blue")
# On the right the graph of the cumulative distribution function:
Plane(-4,4, 0,1)
Gintegra(z,-10,10, "brown")
# I should have drawn the graph of ∫(-∞,x] z, but
# in Gintegra(f,A,B,col) I must use A and B finite. However ∫(-10,x] z is practically the
# same; indeed:
integral(z, -Inf,-10)
# 7.619853e-24 ---------- ---------- ---------- ----------

# The examples of simulation considered here employ a procedure that is based on
# central limit theorem and the methods for studying the binomial distribution
# considered at the beginning of this document, but merely compute the mean and the
# standard deviation  [ in this case N=1 and Var=p*(1-p) ]

# Let's remember the first example:
# the probability that throwing a pair of balanced dice their difference is 1
# (we - really - know it is 5/18 = 0.2777…)

Event <- function() abs( RUNIF(1, 1,6)-RUNIF(1, 1,6) ) == 1

PR(1e4)
# 27.9 +/- 1.34559
PR(1e5)
# 27.737 +/- 0.4247283
PR(1e6)
# 27.8421 +/- 0.1344666

# As n -> ∞ the standard deviation σ/√n -> 0 as 1/√n: the "error" of PR(N) is divided
# by 10 if N multiplies by 100.

# It is basically for this reason that (while analyzing statistical data) if the number
# of data is multiplied by 100 I can round the average with a further figure.
# If I have at least 10 data which are rounded to the thents I can round the mean to
# the cents.  If I have at least 1000 data which are rounded to the thents I can round
# the mean to the thousandths.  If I have at least 10 data which are rounded to the
# units I can round the mean to the tens (see).  Verification:

# Some data rounded to the units:
data=runif(11, 50,100);     c( mean(data),  mean(round(data)) )
# 77.32430  77.27273
data=runif(101, 50,100);    c( mean(data),  mean(round(data)) )
# 75.86425  75.83168
data=runif(1001, 50,100);   c( mean(data),  mean(round(data)) )
# 74.99541  74.99800
data=runif(10001, 50,100);  c( mean(data),  mean(round(data)) )
# 75.18941 75.18628
data=runif(100001, 50,100); c( mean(data),  mean(round(data)) )
# 74.94769  74.94847

# Data must be rounded!!!  See here.  See here too.

Other distributions

Other examples of use

```