2019: PP: Sam Staton
Preliminaries
Bayesian statistics
From here:
an approach to data analysis based on Bayes' theorem, where knowledge about parameters in a statistical model is updated with the information in observed data.
- From Thomas Bayes (1701-1761), an English statistician.
- Based on the 'Bayesian interpretation of probability' ʷ
- i.e., probability indicates the 'degree of belief in an event' ʷ
- Compute and update probabilities after encountering new data.
Bayes' law:
P(A|B) is the probability of A occurring given that B happens (conditional probability).
          P(B|A) * P(A)
P(A|B) = ───────────────
              P(B)
Posterior probability ∝ Likelihood * Prior
- Prior: Prior information.
- Background knowledge.
- A distribution
- Likelihood: the probability of the observed data, given the parameters
- Posterior probability
- Updated probability distribution, obtained by combining the prior with the likelihood (via Bayes' law)
Posterior can be used to make predictions.
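How? Via the posterior predictive distribution (standard Bayesian machinery, not spelled out in these notes): average the model's prediction for new data d' over the posterior,
P(d'|d) = ∫ P(d'|x) * P(x|d) dx
so every plausible parameter value x contributes, weighted by how much we now believe in it.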
—
              P(d|x) * P(x)
P(x|d) = ─────────────────────────
          a normalizing constant

(P(x|d) is the probability of x given the data d.)
What could the normalizing constant be?
It's a big sum over all possible parameter values, i.e., an integral: by the law of total probability, the constant is P(d), the total probability of the data, which makes the posterior integrate to 1.
⌠
│ P(d|y) * P(y) dy
⌡
Probabilistic programming (PP)
PP construct          | Bayesian notion
----------------------|------------------------
Sampling              | Prior
Observation / scoring | Likelihood
Normalize / simulate  | Posterior probability
An example PP program:
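The notes leave the example blank. A minimal sketch of what one could look like, written in Python against hypothetical sample/observe/infer primitives (illustrative names, not any particular language's API), using the bus question from Lecture 1 below:

def model():
    # Sampling ~ prior: assume 2 weekend days out of 7.
    weekend = sample(bernoulli(2 / 7))
    rate = 2 if weekend else 10              # average buses per hour
    # Observation / scoring ~ likelihood: we saw 4 buses in an hour.
    observe(poisson(rate), 4)
    return weekend

posterior = infer(model)                     # normalize ~ posterior: P(weekend | 4 buses)

Note how the three lines of the model line up with the three rows of the table above.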
Misc
Poisson distribution
- A discrete probability distribution
- Probability of an event happening n number of times within a given interval of time or space
- 'Interval of space' means the interval need not be temporal: e.g., number of flaws per metre of cable, or typos per page
- Has only one parameter: mean number of events ʳ
- Can be obtained as an approximation of a binomial distribution when the number of trials n is large, success probability p is small and np is finite ʳ
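For reference, the standard formula (a fact about the distribution, not from the notes): with mean λ, the probability of exactly k events is λ^k * e^(-λ) / k!. As Python:

import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

print(poisson_pmf(4, 2))   # ≈ 0.090: chance of seeing 4 buses when 2 are expected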
Lecture 1
Bus weekend example
On average:
- Weekend => 2 buses in an hour
- Weekday => 10 buses in an hour
Saw 4 buses.
Is it a weekend?
Let's consider the probability of it being a weekend.
x: buses showing up.
- Same event happening multiple times within a time period.
- Matches description of a Poisson distribution.
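Putting Bayes' law and the Poisson likelihood together (direct calculation; the 2/7 prior for weekends is my assumption, the lecture may have used another; poisson_pmf as defined under Misc above):

prior_weekend = 2 / 7                        # assumed: 2 weekend days out of 7
lik_weekend = poisson_pmf(4, 2)              # ≈ 0.0902
lik_weekday = poisson_pmf(4, 10)             # ≈ 0.0189

# The normalizing constant: total probability of seeing 4 buses.
evidence = prior_weekend * lik_weekend + (1 - prior_weekend) * lik_weekday

posterior_weekend = prior_weekend * lik_weekend / evidence
print(posterior_weekend)                     # ≈ 0.66: probably a weekend

Even though weekdays are more common a priori, seeing only 4 buses is far more probable at rate 2 than at rate 10, so the posterior tips toward weekend.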
Example languages:
- Hakaru: Indiana
- PSI: ETH Zürich
—
Methods of running a probabilistic program
- 'Direct calculation'
- 'Simulation method'
—
:Side-note:
- Definitional interpreter
- By John Reynolds
- Describing a programming language by defining an interpreter for it
- Focus is on meaning, not efficiency
- Dana Scott and Christopher Strachey proposed denotational semantics
- TODO: What's the difference from definitional interpreter?
- Denotational semantics: Describes what a program does by associating it with a mathematical object
- Definitional interpreter: describes what a program does by giving an interpreter that computes it; the definition is itself a program, rather than a direct mapping to a mathematical object
Simulation
Let's see two ways:
- Monte Carlo with rejection
- Weighted Monte Carlo
- aka importance sampling Monte Carlo
Monte Carlo with rejection
- Run the program a large number of times
- Make a random choice at each sampling point on each run
- Keep only the runs whose output matches the observation; the surviving runs approximate the posterior (see the sketch below)
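A sketch of both simulation methods on the bus question (the 2/7 weekend prior is my assumption; poisson_pmf is the helper defined under Misc above):

import numpy as np

rng = np.random.default_rng(0)

def rejection(n=100_000):
    """Monte Carlo with rejection: keep only runs that reproduce the data."""
    kept = []
    for _ in range(n):
        weekend = rng.random() < 2 / 7          # random choice from the prior
        rate = 2 if weekend else 10
        if rng.poisson(rate) == 4:              # reject runs that don't match the observation
            kept.append(weekend)
    return sum(kept) / len(kept)

def weighted(n=100_000):
    """Weighted (importance sampling) Monte Carlo: score runs instead of rejecting them."""
    num = den = 0.0
    for _ in range(n):
        weekend = rng.random() < 2 / 7
        rate = 2 if weekend else 10
        w = poisson_pmf(4, rate)                # likelihood of the observation
        num += w * weekend
        den += w
    return num / den

print(rejection(), weighted())                  # both ≈ 0.66

Rejection wastes most runs (only about 4% survive here, since the evidence is ≈ 0.039); weighting keeps every run, which is why scoring is the more practical primitive.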
Note: likelihood and probability are not the same. The subtle difference shows up more clearly in the continuous case, where the likelihood of an observation is a probability density, not a probability.
TODO: How does one decide which probability distribution would be best suited to a scenario?
Uncertainty
Incorporating uncertainty about facts into the model.
- Gamma distribution
- Rate/mean set to the value we think it is (see the sketch after this list)
- Changing observation
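One reading of these bullets (my sketch, not spelled out in the notes): rather than asserting the weekend rate is exactly 2, draw it from a Gamma distribution whose mean is the value we think it is, then weight as before (reusing rng and poisson_pmf from above):

def weighted_uncertain(n=100_000):
    """Weighted Monte Carlo with an uncertain weekend rate."""
    num = den = 0.0
    for _ in range(n):
        weekend = rng.random() < 2 / 7
        # Gamma(shape=2, scale=1) has mean 2: centred on the rate we believe in,
        # but spread out to admit we are not sure of it.
        rate = rng.gamma(2.0, 1.0) if weekend else 10.0
        w = poisson_pmf(4, rate)
        num += w * weekend
        den += w
    return num / den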
Regression
Tries to answer this question: 'What function could have generated these data points?'
- Linear regression => this function is a straight line.
- 'Traditional statistics' => find the single line that fits best (least squares method).
- Bayesian statistics => put a prior on the parameters and infer a posterior distribution over lines (see the sketch after this list).
- Cubic functions have 4 parameters because y = a*x³ + b*x² + c*x + d has four coefficients: a, b, c, d.
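A minimal sketch of the Bayesian version by weighted Monte Carlo; the data points, the Normal priors on slope and intercept, and the noise level 0.5 are all illustrative choices, not from the lecture:

import math
import numpy as np

rng = np.random.default_rng(0)
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.8)]       # made-up points

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

samples = []
for _ in range(50_000):
    a = rng.normal(0.0, 2.0)                 # prior over slopes
    b = rng.normal(0.0, 2.0)                 # prior over intercepts
    w = 1.0
    for x, y in data:
        w *= normal_pdf(y, a * x + b, 0.5)   # likelihood of each observed point
    samples.append((w, a, b))

total = sum(w for w, _, _ in samples)
mean_a = sum(w * a for w, a, _ in samples) / total
mean_b = sum(w * b for w, _, b in samples) / total
print(mean_a, mean_b)   # posterior mean line; close to the least-squares fit here

The posterior is a whole distribution over lines, so instead of one best line we also get a measure of how uncertain the fit is.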
Metropolis-Hastings algorithm
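The notes stop at the name; for orientation, a minimal sketch of the standard algorithm, here targeting a posterior over a single Poisson rate (the Gamma(2, 1) prior and the '4 buses' observation are illustrative; reuses math, rng, and poisson_pmf from above):

def unnorm_posterior(rate):
    """Posterior density up to the normalizing constant: prior * likelihood."""
    if rate <= 0:
        return 0.0
    prior = rate * math.exp(-rate)           # Gamma(2, 1) density, up to a constant
    return prior * poisson_pmf(4, rate)

rate = 1.0
trace = []
for _ in range(10_000):
    proposal = rate + rng.normal(0.0, 0.5)   # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio); the normalizing
    # constant cancels, which is the whole point of the algorithm.
    if rng.random() < min(1.0, unnorm_posterior(proposal) / unnorm_posterior(rate)):
        rate = proposal
    trace.append(rate)
# trace (after discarding some burn-in) is approximately distributed as the posterior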
Lecture 2
- An attraction of probabilistic programming: Model is separate from the inference algorithm