#+STARTED: 09-Oct-2022
Basic terms
- Symmetry (probability): a case where multiple viewpoints are essentially the same
Average (Arithmetic mean)
\[ \bar{x} = \frac{1}{N} \sum_{i=0}^{N-1} x_i \]
There are many kinds of means.
'Average' usually refers to the arithmetic mean.
https://www.cuemath.com/data/difference-between-average-and-mean/
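A minimal Python sketch of the formula above, with the sum running over indices 0 to N-1. The function name `arithmetic_mean` is hypothetical; the result is cross-checked against the standard library's `statistics.mean`.

```python
import statistics


def arithmetic_mean(xs):
    """x̄ = (1/N) * sum of x_i for i = 0 .. N-1."""
    if not xs:
        raise ValueError("mean of empty data")
    return sum(xs) / len(xs)


data = [10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]
print(arithmetic_mean(data))   # 276 / 11, about 25.09
print(statistics.mean(data))   # standard-library equivalent
```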
Standard deviation
A measure of the 'spread' of the data.
aka σ, s, SD
\[ \sigma^2 = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu)^2 \]
\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu)^2} \]
σ² is the variance; μ is the arithmetic mean of the xᵢ.
To get a value in the same units as the xᵢ, we take the square root of σ²; this is the standard deviation σ.
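A small sketch of the 1/N formulas above (the population variance and standard deviation). The helper names are hypothetical. Python's `statistics.pvariance`/`pstdev` use the same 1/N form, while `statistics.stdev` divides by N-1 instead (the sample standard deviation, usually written s).

```python
import math
import statistics


def population_variance(xs):
    """σ² = (1/N) * Σ (x_i - μ)², where μ is the arithmetic mean of xs."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def population_std(xs):
    """σ = sqrt(σ²); same units as the data."""
    return math.sqrt(population_variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]      # mean μ = 5
print(population_variance(data))      # 4.0
print(population_std(data))           # 2.0
print(statistics.pstdev(data))        # 2.0 (standard library, 1/N form)
```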
Mode
Most frequently occurring value.
E.g., in
10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24
24 is the mode (it occurs three times).
DOUBT: What if there are multiple values which occur most frequently?
https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch11/mode/5214873-eng.htm
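One way to handle the multiple-modes question in code is to return every value tied for the highest count. The `modes` helper is hypothetical; `statistics.multimode` (Python 3.8+) does the same thing in the standard library.

```python
import statistics
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


data = [10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]
print(modes(data))                  # [24]
print(statistics.multimode(data))   # [24]
print(modes([1, 1, 2, 2, 3]))       # [1, 2]: two values tie, so the data is bimodal
```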
Median
The middle value when the values are arranged from smallest to largest.
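A sketch of the median. The definition above covers an odd number of values; for an even count, the usual convention (also used by `statistics.median`) is to take the mean of the two middle values. The function name is hypothetical.

```python
import statistics


def median(xs):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]))  # 24
print(median([1, 2, 3, 4]))                                   # 2.5
print(statistics.median([1, 2, 3, 4]))                        # 2.5
```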
From Britannica:
(mean, mode and median are) the three principal ways of designating the average value of a list of numbers.
DOUBT: How can we get average value from median or mode?
Regression
From Spiegelhalter's popsci book:
any process of fitting lines or curves to data
Difference (or error) between a data point and the fitted line: residual
Response variable:
- the variable whose values we wish to predict
- dependent on the explanatory variable
- usually plotted on y-axis
Explanatory variable:
- independent variable
- used to predict/explain value of response variable
- usually plotted on x-axis
The gradient/slope of the regression curve/line: regression coefficient
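A minimal sketch of fitting a straight line, assuming ordinary least squares (one common fitting method; the notes do not name one). The slope it returns is the regression coefficient, and the residuals are the differences between each observed response and the fitted line. The data and function name are hypothetical.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (intercept a, slope b).

    The slope b is the regression coefficient.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: xs is the explanatory variable, ys the response variable
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)        # roughly 2.2 and 0.6
print(residuals)   # roughly [-0.8, 0.6, 1.0, -0.6, -0.2]
```

With a least-squares fit that includes an intercept, the residuals always sum to (numerically almost) zero.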
Statistical model
- Model built using available data.
- Could be used to predict further data points.
Errors
| Type I error  | False positive |
| Type II error | False negative |
Algorithm performance
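In a classification problem, results are summarized in an error matrix, aka confusion matrix.
(TP, FN, FP, TN: counts of true positives, false negatives, false positives, true negatives.)
- Percentage of actual positives correctly classified, TP / (TP + FN): sensitivity
- Percentage of actual negatives correctly classified, TN / (TN + FP): specificity
- Percentage of all cases correctly classified, (TP + TN) / total: accuracy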
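A sketch that builds the confusion-matrix counts for a binary classifier and derives the three measures from them. Labels are assumed to be 1 (positive) and 0 (negative); the labels and helper name are hypothetical. False positives correspond to Type I errors and false negatives to Type II errors.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts plus sensitivity, specificity and accuracy (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors
    return {
        "confusion": {"TP": tp, "FN": fn, "FP": fp, "TN": tn},
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels: 4 actual positives followed by 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(classification_metrics(y_true, y_pred))
# sensitivity 3/4 = 0.75, specificity 4/6 ≈ 0.667, accuracy 7/10 = 0.7
```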
Markov process
A process where the next state depends only on the current state.
Loosely, 'the future is independent of the past' once the present state is known. ˡ
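A small simulation of a discrete-time Markov chain. The two-state weather chain and its transition probabilities are made up for illustration. The key point is that each step samples the next state from the current state's row only, without looking at earlier states.

```python
import random

# Hypothetical chain: TRANSITIONS[current][next] = P(next | current)
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def simulate(transitions, start, steps, rng=None):
    """Return a path of length steps + 1 that begins at `start`.

    Each next state is drawn using only the current state's row of
    `transitions`; earlier states are never consulted (the Markov property).
    """
    if rng is None:
        rng = random.Random()
    path = [start]
    for _ in range(steps):
        row = transitions[path[-1]]
        nxt = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(nxt)
    return path


print(simulate(TRANSITIONS, "sunny", 10, random.Random(0)))
```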
Monte Carlo methods
https://cermics.enpc.fr/~bl/Halmstad/monte-carlo/lecture-1.pdf
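The linked lecture is not summarized here; as a standard textbook illustration of the Monte Carlo idea (not necessarily the lecture's example), this sketch estimates π from random points. The fraction of uniform points in the unit square that land inside the quarter circle approximates its area, π/4. The function name is hypothetical.

```python
import random


def estimate_pi(n_samples, rng=None):
    """Monte Carlo estimate of π from points drawn uniformly in the unit square."""
    if rng is None:
        rng = random.Random()
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:   # point lies inside the quarter circle
            inside += 1
    return 4 * inside / n_samples


for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n, random.Random(42)))  # estimates tighten around π as n grows
```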
Law of large numbers ??
- There is strength in numbers. :-D
- https://en.wikipedia.org/wiki/Law_of_large_numbers
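A quick illustration of the law of large numbers, assuming a fair six-sided die (expected value 3.5). The running sample mean wanders for small n and settles close to 3.5 as the number of rolls grows. The helper name is hypothetical.

```python
import random


def running_means(n_rolls, rng=None):
    """Sample mean of fair die rolls after 1, 2, ..., n_rolls rolls."""
    if rng is None:
        rng = random.Random()
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means


means = running_means(100_000, random.Random(1))
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, means[n - 1])  # approaches 3.5 as n grows
```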
Gamma function
Generalization of the factorial to non-integer arguments
A function on the positive real numbers (Γ: ℝ⁺ → ℝ⁺).
- Can be extended to negative non-integer reals and to complex numbers; it has poles (is undefined) at 0, -1, -2, and so on.
Often appears in the normalizing constants of probability distributions such as the chi-square and gamma distributions.
Can be seen as a smooth curve that interpolates the factorials: Γ(n+1) = n! for every n ∈ ℕ (including n = 0).
The notation Γ is due to the French mathematician Legendre.
Γ(z) = (z-1) * Γ(z-1)
Integral definitionʳ (for z > 0, or Re(z) > 0 for complex z):
\[ \Gamma(z) = \int_0^\infty x^{z-1} e^{-x} \, dx \]
—
For positive integers, Γ(x) can be expressed in terms of the factorial:
∀x ∈ ℕ, x ≥ 1,
Γ(x) = (x-1)!
—
Handy valuesʷ:
| x    | Γ(x)  | Comment   |
|------+-------+-----------|
| -3/2 | 4√π/3 |           |
| 1/2  | √π    |           |
| 1    | 1     | Γ(1) = 0! |
| 3/2  | √π/2  |           |
| 2    | 1     | Γ(2) = 1! |
| 3    | 2     | Γ(3) = 2! |
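A sketch that checks the handy values, the factorial relation and the recurrence against Python's `math.gamma`, and approximates the integral definition numerically. `gamma_integral` is a hypothetical helper: it uses the midpoint rule, truncates the infinite upper limit at 50 (where the integrand is negligible for moderate z), and assumes z ≥ 1 so the integrand stays finite at x = 0.

```python
import math


def gamma_integral(z, upper=50.0, n=100_000):
    """Midpoint-rule approximation of Γ(z) = ∫₀^∞ x^(z-1) e^(-x) dx (assumes z >= 1)."""
    h = upper / n
    return h * sum(((i + 0.5) * h) ** (z - 1) * math.exp(-(i + 0.5) * h)
                   for i in range(n))


# Handy values from the table: x -> expected Γ(x)
table = {
    -1.5: 4 * math.sqrt(math.pi) / 3,
    0.5: math.sqrt(math.pi),
    1: 1,
    1.5: math.sqrt(math.pi) / 2,
    2: 1,
    3: 2,
}
for x, expected in table.items():
    print(x, math.gamma(x), expected)

# Γ(x) = (x-1)! for positive integers, and Γ(z) = (z-1) Γ(z-1)
print([math.gamma(x) for x in range(1, 6)])  # factorials 0! .. 4!
z = 4.7
print(math.gamma(z), (z - 1) * math.gamma(z - 1))

print(gamma_integral(3.0))  # close to Γ(3) = 2
```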
More
- Range: Difference between largest and smallest values in the sample.
- ANOVA: Analysis of Variance
- ANCOVA: Analysis of covariance
- MANCOVA: Multivariate analysis of covariance
Geometric mean
Useful for data where growth/decline is exponential.
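For positive values x₀, …, x_{N-1}, the geometric mean is the N-th root of their product, (x₀ · x₁ · … · x_{N-1})^(1/N). The sketch below computes it via logarithms, which avoids overflow in the product. The growth factors are a hypothetical example: +50% followed by -50% has an arithmetic mean factor of 1.0 (suggesting no change), while the geometric mean, squared, recovers the actual overall factor of 0.75. `statistics.geometric_mean` (Python 3.8+) is the standard-library equivalent.

```python
import math
import statistics


def geometric_mean(xs):
    """(x_0 * x_1 * ... * x_{N-1}) ** (1/N), computed via logs; requires every x > 0."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs non-empty, positive data")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


factors = [1.5, 0.5]                 # hypothetical: +50% one year, -50% the next
print(statistics.mean(factors))      # 1.0, misleadingly suggests no change
print(geometric_mean(factors))       # about 0.866 per year
print(geometric_mean(factors) ** 2)  # about 0.75, the actual overall factor
```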