2019: PP: Sam Staton
Preliminaries
Bayesian statistics
From here:
an approach to data analysis based on Bayes' theorem, where knowledge about parameters in a statistical model is updated with the information in observed data.
- From Thomas Bayes (1701-1761), an English statistician.
- Based on the 'Bayesian interpretation of probability' ʷ
- i.e., probability indicates the 'degree of belief in an event' ʷ
- Compute and update probabilities after encountering new data.
Bayes' law:
P(A|B) is the probability of A occurring given that B happens (conditional probability).
          P(B|A) * P(A)
P(A|B) = ───────────────
              P(B)
Posterior probability ∝ Likelihood * Prior
- Prior: Prior information.
- Background knowledge.
- A distribution
- Likelihood: the probability of the observed data, given the parameters
- Posterior probability
- Updated probability distribution, obtained by combining the prior with the likelihood (via Bayes' law)
Posterior can be used to make predictions.
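How? Via the posterior predictive distribution (standard Bayesian machinery, not spelled out in these notes): average the model's prediction for new data d' over the posterior,
P(d'|d) = ∫ P(d'|x) * P(x|d) dx
so every plausible parameter value x contributes, weighted by how much we now believe in it.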
—
              P(d|x) * P(x)
P(x|d) = ─────────────────────────
          a normalizing constant

(P(x|d) is the probability of x given the data d.)
What could the normalizing constant be?
It's a big sum over all possible parameter values, i.e., an integral: by the law of total probability, the constant is P(d), the total probability of the data, which makes the posterior integrate to 1.
⌠
│ P(d|y) * P(y) dy
⌡
Probabilistic programming (PP)
PP construct          | Bayesian notion
----------------------|------------------------
Sampling              | Prior
Observation / scoring | Likelihood
Normalize / simulate  | Posterior probability
An example PP program:
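The notes leave the example blank. A minimal sketch of what one could look like, written in Python against hypothetical sample/observe/infer primitives (illustrative names, not any particular language's API), using the bus question from Lecture 1 below:

def model():
    # Sampling ~ prior: assume 2 weekend days out of 7.
    weekend = sample(bernoulli(2 / 7))
    rate = 2 if weekend else 10              # average buses per hour
    # Observation / scoring ~ likelihood: we saw 4 buses in an hour.
    observe(poisson(rate), 4)
    return weekend

posterior = infer(model)                     # normalize ~ posterior: P(weekend | 4 buses)

Note how the three lines of the model line up with the three rows of the table above.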
Misc
Poisson distribution
- A discrete probability distribution
- Probability of an event happening n number of times within a given interval of time or space
- 'Interval of space' means the interval need not be temporal: e.g., number of flaws per metre of cable, or typos per page
- Has only one parameter: mean number of events ʳ
- Can be obtained as an approximation of a binomial distribution when the number of trials n is large, success probability p is small and np is finite ʳ
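For reference, the standard formula (a fact about the distribution, not from the notes): with mean λ, the probability of exactly k events is λ^k * e^(-λ) / k!. As Python:

import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

print(poisson_pmf(4, 2))   # ≈ 0.090: chance of seeing 4 buses when 2 are expected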
Lecture 1
Bus weekend example
On average:
- Weekend => 2 buses in an hour
- Weekday => 10 buses in an hour
Saw 4 buses.
Is it a weekend?
Let's consider the probability of it being a weekend.
x: buses showing up.
- Same event happening multiple times within a time period.
- Matches description of a Poisson distribution.
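Putting Bayes' law and the Poisson likelihood together (direct calculation; the 2/7 prior for weekends is my assumption, the lecture may have used another; poisson_pmf as defined under Misc above):

prior_weekend = 2 / 7                        # assumed: 2 weekend days out of 7
lik_weekend = poisson_pmf(4, 2)              # ≈ 0.0902
lik_weekday = poisson_pmf(4, 10)             # ≈ 0.0189

# The normalizing constant: total probability of seeing 4 buses.
evidence = prior_weekend * lik_weekend + (1 - prior_weekend) * lik_weekday

posterior_weekend = prior_weekend * lik_weekend / evidence
print(posterior_weekend)                     # ≈ 0.66: probably a weekend

Even though weekdays are more common a priori, seeing only 4 buses is far more probable at rate 2 than at rate 10, so the posterior tips toward weekend.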
Example languages:
- Hakaru: Indiana
- PSI: ETH Zürich
—
Methods of running a probabilistic program
- 'Direct calculation'
- 'Simulation method'
—
:Side-note:
- Definitional interpreter
- By John Reynolds
- Describing a programming language by defining an interpreter for it
- Focus is on meaning, not efficiency
- Dana Scott and Christopher Strachey proposed denotational semantics
- TODO: What's the difference from definitional interpreter?
- Denotational semantics: Describes what a program does by associating it with a mathematical object
- Definitional interpreter: describes what a program does by giving an interpreter that computes it; the definition is itself a program, rather than a direct mapping to a mathematical object
Simulation
Let's see two ways:
- Monte Carlo with rejection
- Weighted Monte Carlo
- aka importance sampling Monte Carlo
Monte Carlo with rejection
- Run the program a large number of times
- Make a random choice at each sampling point on each run
- Keep only the runs whose output matches the observation; the surviving runs approximate the posterior (see the sketch below)
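A sketch of both simulation methods on the bus question (the 2/7 weekend prior is my assumption; poisson_pmf is the helper defined under Misc above):

import numpy as np

rng = np.random.default_rng(0)

def rejection(n=100_000):
    """Monte Carlo with rejection: keep only runs that reproduce the data."""
    kept = []
    for _ in range(n):
        weekend = rng.random() < 2 / 7          # random choice from the prior
        rate = 2 if weekend else 10
        if rng.poisson(rate) == 4:              # reject runs that don't match the observation
            kept.append(weekend)
    return sum(kept) / len(kept)

def weighted(n=100_000):
    """Weighted (importance sampling) Monte Carlo: score runs instead of rejecting them."""
    num = den = 0.0
    for _ in range(n):
        weekend = rng.random() < 2 / 7
        rate = 2 if weekend else 10
        w = poisson_pmf(4, rate)                # likelihood of the observation
        num += w * weekend
        den += w
    return num / den

print(rejection(), weighted())                  # both ≈ 0.66

Rejection wastes most runs (only about 4% survive here, since the evidence is ≈ 0.039); weighting keeps every run, which is why scoring is the more practical primitive.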
Note: likelihood and probability are not the same. The subtle difference shows up more clearly in the continuous case, where the likelihood of an observation is a probability density, not a probability.
TODO: How does one decide which probability distribution would be best suited to a scenario?
Uncertainty
Incorporating uncertainty about facts into the model.
- Gamma distribution
- Rate/mean set to the value we think it is (see the sketch after this list)
- Changing observation
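One reading of these bullets (my sketch, not spelled out in the notes): rather than asserting the weekend rate is exactly 2, draw it from a Gamma distribution whose mean is the value we think it is, then weight as before (reusing rng and poisson_pmf from above):

def weighted_uncertain(n=100_000):
    """Weighted Monte Carlo with an uncertain weekend rate."""
    num = den = 0.0
    for _ in range(n):
        weekend = rng.random() < 2 / 7
        # Gamma(shape=2, scale=1) has mean 2: centred on the rate we believe in,
        # but spread out to admit we are not sure of it.
        rate = rng.gamma(2.0, 1.0) if weekend else 10.0
        w = poisson_pmf(4, rate)
        num += w * weekend
        den += w
    return num / den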
Regression
Tries to answer this question: 'What function could have generated these data points?'
- Linear regression => this function is a straight line.
- 'Traditional statistics' => find the single line that fits best (least squares method).
- Bayesian statistics => put a prior on the parameters and infer a posterior distribution over lines (see the sketch after this list).
- Cubic functions have 4 parameters because y = a*x³ + b*x² + c*x + d has four coefficients: a, b, c, d.
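A minimal sketch of the Bayesian version by weighted Monte Carlo; the data points, the Normal priors on slope and intercept, and the noise level 0.5 are all illustrative choices, not from the lecture:

import math
import numpy as np

rng = np.random.default_rng(0)
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.8)]       # made-up points

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

samples = []
for _ in range(50_000):
    a = rng.normal(0.0, 2.0)                 # prior over slopes
    b = rng.normal(0.0, 2.0)                 # prior over intercepts
    w = 1.0
    for x, y in data:
        w *= normal_pdf(y, a * x + b, 0.5)   # likelihood of each observed point
    samples.append((w, a, b))

total = sum(w for w, _, _ in samples)
mean_a = sum(w * a for w, a, _ in samples) / total
mean_b = sum(w * b for w, _, b in samples) / total
print(mean_a, mean_b)   # posterior mean line; close to the least-squares fit here

The posterior is a whole distribution over lines, so instead of one best line we also get a measure of how uncertain the fit is.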
Metropolis-Hastings algorithm
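The notes stop at the name; for orientation, a minimal sketch of the standard algorithm, here targeting a posterior over a single Poisson rate (the Gamma(2, 1) prior and the '4 buses' observation are illustrative; reuses math, rng, and poisson_pmf from above):

def unnorm_posterior(rate):
    """Posterior density up to the normalizing constant: prior * likelihood."""
    if rate <= 0:
        return 0.0
    prior = rate * math.exp(-rate)           # Gamma(2, 1) density, up to a constant
    return prior * poisson_pmf(4, rate)

rate = 1.0
trace = []
for _ in range(10_000):
    proposal = rate + rng.normal(0.0, 0.5)   # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio); the normalizing
    # constant cancels, which is the whole point of the algorithm.
    if rng.random() < min(1.0, unnorm_posterior(proposal) / unnorm_posterior(rate)):
        rate = proposal
    trace.append(rate)
# trace (after discarding some burn-in) is approximately distributed as the posterior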
Lecture 2
- An attraction of probabilistic programming: Model is separate from the inference algorithm