x̅ = (1/N) · ∑ xᵢ    (sum over i = 0 to N-1)
There are many kinds of 'mean'.
'Average' usually refers to the arithmetic mean.
https://www.cuemath.com/data/difference-between-average-and-mean/
A measure of 'spread' of data.
aka σ, s, SD
σ² = (1/N) · ∑ (xᵢ - μ)²    (sum over i = 0 to N-1)

σ = √[ (1/N) · ∑ (xᵢ - μ)² ]
σ² is variance.
To get a value in the same units as the xᵢ values, we take the square root of σ², which gives the standard deviation σ.
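A minimal sketch in Python (standard library only) to check these formulas; the data is the example list used in the mode section below:

```python
import math
import statistics

xs = [10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]

n = len(xs)
mean = sum(xs) / n                             # arithmetic mean x̅
var = sum((x - mean) ** 2 for x in xs) / n     # population variance σ² (divide by N)
sd = math.sqrt(var)                            # standard deviation σ, same unit as the xᵢ

# Cross-check against the population versions in the standard library
assert math.isclose(mean, statistics.fmean(xs))
assert math.isclose(var, statistics.pvariance(xs))
assert math.isclose(sd, statistics.pstdev(xs))
```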
Most frequently occurring value.
Eg:
In
10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24
24 is the mode.
DOUBT: What if there are multiple values which occur most frequently?
https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch11/mode/5214873-eng.htm
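A quick way to see this in Python; `statistics.multimode` returns every value tied for most frequent, which is one answer to the doubt above:

```python
import statistics

xs = [10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]

print(statistics.mode(xs))        # 24, the single most common value
print(statistics.multimode(xs))   # [24]; lists all ties when several values are equally frequent
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2], two modes
```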
The middle value when the values are arranged from smallest to largest.
From Britannica:
(mean, mode and median are) the three principal ways of designating the average value of a list of numbers.
DOUBT: How can we get average value from median or mode?
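A small worked example of the median, reusing the same list:

```python
import statistics

xs = [10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]

print(sorted(xs))             # [10, 19, 20, 23, 23, 24, 24, 24, 28, 39, 42]
print(statistics.median(xs))  # 24, the middle (6th) of the 11 sorted values
# With an even number of values, the median is the mean of the two middle values.
```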
From Spiegelhalter's popsci book:
any process of fitting lines or curves to data
Difference (or error) of a point from the line: residual
Response variable: the variable being predicted/explained (the dependent variable)
Explanatory variable: the variable used to do the predicting (the independent variable)
The gradient/slope of the regression curve/line: regression coefficient
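A hedged sketch of these terms using a least-squares straight-line fit; numpy's `polyfit` is assumed to be available, and the data points are made up for illustration:

```python
import numpy as np

# Explanatory variable (x) and response variable (y); toy data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, deg=1)   # slope = regression coefficient
fitted = slope * x + intercept
residuals = y - fitted                       # difference (error) of each point from the line

print(slope, intercept)
print(residuals)
```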
Type I error | False positive |
Type II error | False negative |
In a classification problem.
Error matrix aka confusion matrix.
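A minimal sketch of the four cells of a confusion matrix for a binary classification problem (the labels and predictions are made up):

```python
from collections import Counter

# 1 = positive class, 0 = negative class; toy predictions for illustration.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]   # true positive
tn = counts[(0, 0)]   # true negative
fp = counts[(0, 1)]   # false positive -> Type I error
fn = counts[(1, 0)]   # false negative -> Type II error

print(f"TP={tp} FP={fp}")
print(f"FN={fn} TN={tn}")
```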
A process where the next state depends only on the current state.
'Future is independent of the past' in some sense. ˡ
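A tiny simulation sketch of the Markov property: the next state is drawn using only the current state's transition probabilities (the two-state 'weather' chain and its numbers are assumptions for illustration):

```python
import random

# P(next state | current state); toy transition probabilities.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

state = "sunny"
for _ in range(10):
    nxt = random.choices(list(transitions[state]),
                         weights=list(transitions[state].values()))[0]
    print(state, "->", nxt)
    state = nxt   # only the current state matters; earlier history is never consulted
```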
Models the number of times the same event happens over a fixed time interval.
Probability of k events (probability mass function):

P(k) = λᵏ · e⁻λ / k!

where λ is the average number of events in the interval.
The Poisson mass function is not continuous; it's defined only for integer values of k.
See: https://brilliant.org/wiki/poisson-distribution/
DOUBT: Is the mean always past the midpoint in the graph??
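A quick numerical sketch of the mass function, assuming λ = 3 events per interval (an illustrative value):

```python
import math

lam = 3.0  # average number of events per interval (illustrative value)

def poisson_pmf(k, lam):
    # P(k) = λᵏ · e⁻λ / k!, defined only for integer k >= 0
    return (lam ** k) * math.exp(-lam) / math.factorial(k)

for k in range(10):
    print(k, round(poisson_pmf(k, lam), 4))

print(sum(poisson_pmf(k, lam) for k in range(100)))  # ≈ 1.0, sanity check
```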
Discrete
Doesn't have anything to do with category theory
The value of the random variable is one of a set of predefined categories ʳ
Each category has an associated probability
A generalization of Bernoulli distribution ?? ʳ
Examples: a single roll of a (possibly biased) die, with 6 categories each having its own probability.
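A minimal sampling sketch; the three categories and their probabilities are made up for illustration:

```python
import random
from collections import Counter

categories = ["red", "green", "blue"]
probs = [0.5, 0.3, 0.2]    # one probability per category, summing to 1

samples = random.choices(categories, weights=probs, k=10_000)
print(Counter(samples))    # empirical counts roughly 5000 / 3000 / 2000
# With only two categories this reduces to a Bernoulli distribution.
```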
Derivationʳ:
P(A ∩ B) is the probability of A times the probability of B given that A has already happened:
P(A ∩ B) = P(A) * P(B/A)
It could also be defined as the probability of B times the probability of A given that B has already happened:
P(A ∩ B) = P(B) * P(A/B)
Equating the two,
P(A) * P(B/A) = P(B) * P(A/B)
=> P(A/B) = P(A) * P(B/A) / P(B)
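A numerical sanity check of the result, with made-up probabilities for two events A and B:

```python
# Toy numbers for illustration only.
p_a_and_b = 0.12   # P(A ∩ B)
p_a = 0.30         # P(A)
p_b = 0.40         # P(B)

p_b_given_a = p_a_and_b / p_a   # P(B/A)
p_a_given_b = p_a_and_b / p_b   # P(A/B)

# Bayes' rule: P(A/B) = P(A) * P(B/A) / P(B)
assert abs(p_a_given_b - p_a * p_b_given_a / p_b) < 1e-12
print(p_a_given_b)   # 0.3
```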
https://cermics.enpc.fr/~bl/Halmstad/monte-carlo/lecture-1.pdf
Law of large numbers ??
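A small Monte Carlo sketch of the law of large numbers: the running mean of repeated fair-die rolls drifts towards the true mean 3.5 as the number of samples grows (the die example is just for illustration):

```python
import random

n_rolls = 100_000   # true mean of a fair six-sided die is 3.5

total = 0
for i in range(1, n_rolls + 1):
    total += random.randint(1, 6)
    if i in (10, 100, 1_000, 10_000, 100_000):
        print(i, total / i)   # running mean gets closer to 3.5 as i grows
```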
A generalization of the factorial to non-integer arguments.
A function on positive real numbers (Γ: ℝ⁺ → ℝ⁺); it also extends to negative non-integer arguments.
Often used as normalizing constants for probability distributions like Chi-square and gamma.
Can be seen as a smooth curve on which all n! values lie for n ∈ ℕ (ie, interpolation)
Notation Γ is from the French mathematician Legendre
Γ(z) = (z-1) * Γ(z-1)
Another definitionʳ:
Γ(z) = ∫ xᶻ⁻¹ · e⁻ˣ dx    (integral from 0 to ∞)
—
For x ∈ ℕ, Γ(x) can be expressed in terms of factorial,
∀x ∈ ℕ,
Γ(x) = (x-1)!
—
Handy valuesʷ:
x | Γ(x) | Comment |
---|---|---|
1/2 | √π | |
1 | 1 | Γ(1) = 0! |
3/2 | √π/2 | |
-3/2 | 4√π/3 | |
2 | 1 | Γ(2) = 1! |
3 | 2 | Γ(3) = 2! |
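A quick check of the recurrence, the factorial relation, and the handy values with the standard library's `math.gamma`:

```python
import math

# Γ(x) = (x-1)! for natural numbers x
for n in range(1, 7):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Handy values from the table above
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))
assert math.isclose(math.gamma(1.5), math.sqrt(math.pi) / 2)
assert math.isclose(math.gamma(-1.5), 4 * math.sqrt(math.pi) / 3)

# Recurrence Γ(z) = (z-1) · Γ(z-1)
z = 4.3
assert math.isclose(math.gamma(z), (z - 1) * math.gamma(z - 1))
```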
Given a collection of points (from any probability distribution; it need not be normal), if we repeatedly select k points with replacement (i.e., the k points are 'put back' after each trial), the means of the trials will be approximately normally distributed.
See:
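A simulation sketch of this: draw k points with replacement from a clearly non-normal (exponential) population many times and look at the trial means (the population, k, and trial count are illustrative assumptions):

```python
import random
import statistics

population = [random.expovariate(1.0) for _ in range(10_000)]   # skewed, non-normal data
k = 50          # points drawn with replacement per trial
trials = 2_000

trial_means = [statistics.fmean(random.choices(population, k=k))  # choices() = with replacement
               for _ in range(trials)]

# The trial means cluster around the population mean and look approximately normal.
print(statistics.fmean(population), statistics.fmean(trial_means), statistics.stdev(trial_means))
```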
Useful for data where growth/decline is exponential.