2019: PP: Sam Staton

Pre?

Bayesian statistics

From here:

an approach to data analysis based on Bayes' theorem, where knowledge about parameters in a statistical model is updated with the information in observed data.

  1. Bayes law:

P(A|B) is the probability of A occurring given that B happens (conditional probability).

              P(B|A) * P(A)
    P(A|B) = ───────────────
                  P(B)
    

    Posterior probability  ∝  Likelihood * Prior
    
    • Prior: Prior information.
      • Background knowledge.
      • A distribution
    • Likelihood: Observed data
      • A function
    • Posterior probability
      • Updated distribution over the parameters, obtained by combining the prior with the likelihood

    Posterior can be used to make predictions.

                   P(d|x) * P(x)
    P(x|d) = ───────────────────────
              a normalizing constant

    (P(x|d) is the probability of x given the data d.)

    What could the normalizing constant be?

    It's a big sum, i.e., an integral: the constant is P(d) itself, obtained by summing (integrating) P(d|y) * P(y) over every possible parameter value y — the law of total probability.

    ⌠
    │ P(d|y) * P(y) dy
    ⌡
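With a handful of discrete hypotheses the integral collapses to a plain sum. A minimal sketch in Python (the coin-bias grid, the uniform 1/3 prior, and the two-heads data are my own illustration, not from the lecture):

```python
# Discrete Bayes update: infer a coin's bias from observed flips.

def posterior(prior, likelihood):
    """Multiply prior by likelihood pointwise, then normalize."""
    unnorm = {x: p * likelihood(x) for x, p in prior.items()}
    z = sum(unnorm.values())          # the normalizing constant P(d)
    return {x: p / z for x, p in unnorm.items()}

# Prior: three candidate biases, equally likely.
prior = {0.3: 1/3, 0.5: 1/3, 0.7: 1/3}

# Data: 2 heads out of 2 flips, so the likelihood of bias x is x**2.
post = posterior(prior, lambda x: x**2)
print(post)  # mass shifts toward the higher bias
```

Note that the uniform 1/3 cancels out of the ratio, so only the relative prior weights matter.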
    

Probabilistic programming (PP)

    PP                      Bayesian
    ──────────────────────  ─────────────────────
    Sampling                Prior
    Observation / scoring   Likelihood
    Normalize / simulate    Posterior probability

An example PP program:
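A minimal sketch of what such a program can look like, written in plain Python with sample and observe modelled as ordinary functions, and inference by likelihood weighting (the biased-coin model and all names here are my own illustration, not from the lecture):

```python
import random

# Minimal sample/observe interpreter via likelihood weighting:
# run the program many times, multiplying a weight in at each observe.

def run_weighted(program, n=10000):
    """Run the program n times; return (value, weight) pairs."""
    results = []
    for _ in range(n):
        weight = [1.0]
        def observe(likelihood):      # scoring: multiply in the likelihood
            weight[0] *= likelihood
        value = program(observe)
        results.append((value, weight[0]))
    return results

def model(observe):
    x = random.random()               # sample: uniform prior on [0, 1]
    observe(x * x)                    # saw two heads; likelihood is x**2
    return x

samples = run_weighted(model)
total = sum(w for _, w in samples)
mean = sum(x * w for x, w in samples) / total
print(mean)  # posterior mean of the bias; analytically 3/4 for Beta(3, 1)
```

Normalization here is the division by the total weight, mirroring the table above: sampling gives the prior, observe supplies the likelihood, and normalizing yields the posterior.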

Misc

  1. Poisson distribution

    • A discrete probability distribution
    • Probability of an event happening n times within a given interval of time or space
      • An 'interval of space' is a fixed region rather than a time window, e.g., defects per metre of wire.
    • Has only one parameter: mean number of events ʳ
    • Can be obtained as an approximation of a binomial distribution when the number of trials n is large, success probability p is small and np is finite ʳ
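A minimal sketch of the pmf and of the binomial approximation described above (the mean of 3 events and the trial count of 10000 are arbitrary illustrative choices):

```python
import math
from math import comb

# Poisson pmf: probability of n events when the mean count is lam.
def poisson_pmf(n, lam):
    return math.exp(-lam) * lam**n / math.factorial(n)

p = poisson_pmf(4, 3.0)   # mean of 3 events: chance of seeing exactly 4
print(round(p, 4))

# Binomial with many trials, small success probability, trials * prob
# equal to the Poisson mean — should land close to the Poisson value.
trials, prob = 10000, 3.0 / 10000
binom = comb(trials, 4) * prob**4 * (1 - prob)**(trials - 4)
print(round(binom, 4))
```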

Lecture 1

Bus weekend example

Buses arrive at some average rate, and that rate differs between weekdays and weekends.

We saw 4 buses. Is it a weekend?

Let's consider the probability of it being a weekend:

    x: whether it is a weekend (the latent hypothesis).
    d: buses showing up (the observed data).

Example languages:

Methods of running a probabilistic program

:Side-note:

Simulation

Let's see two ways:

Monte Carlo with rejection
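A sketch of the bus example run this way: propose from the prior, then accept each proposal with probability given by the likelihood of the data. The rates (10 buses on weekdays, 3 on weekends) and the 2/7 weekend prior are my own assumptions for illustration; the lecture may use different numbers.

```python
import math
import random

def poisson_pmf(n, lam):
    return math.exp(-lam) * lam**n / math.factorial(n)

def sample_posterior(observed=4, trials=100000):
    accepted = []
    for _ in range(trials):
        weekend = random.random() < 2 / 7          # prior: 2 weekend days in 7
        rate = 3.0 if weekend else 10.0            # assumed bus rates
        lik = poisson_pmf(observed, rate)          # likelihood of the data
        if random.random() < lik:                  # accept w.p. the likelihood
            accepted.append(weekend)
    return sum(accepted) / len(accepted)           # P(weekend | saw 4 buses)

print(sample_posterior())
```

Rejection works here because a Poisson pmf is at most 1, so the likelihood can be used directly as an acceptance probability; most proposals are thrown away, which is the method's cost.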

Note: likelihood and probability are not the same; the likelihood treats the data as fixed and varies the parameters. The subtle difference shows up more clearly in the continuous case, where the likelihood is a density value and can exceed 1.

TODO: How does one decide which probability distribution would be best suited to a scenario?

Uncertainty

Incorporating uncertainty about facts into the model.

Regression

Tries to answer this question: 'What function could have generated these data points?'
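One way to make that question concrete is to put a prior over candidate functions and score each by how well it explains the points. A grid sketch for lines through the origin (the data points, the Gaussian noise scale of 0.5, and the slope grid are all assumptions of mine):

```python
import math

def normal_pdf(v, mu, sigma):
    return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]      # roughly y = 2x
slopes = [i / 10 for i in range(0, 41)]          # grid prior on slope a in [0, 4]

# Posterior over slopes: uniform prior times Gaussian likelihood of each point.
unnorm = []
for a in slopes:
    lik = 1.0
    for x, y in data:
        lik *= normal_pdf(y, a * x, 0.5)         # model: y = a*x + noise
    unnorm.append(lik)
z = sum(unnorm)                                   # normalizing constant
post = [w / z for w in unnorm]
best = slopes[post.index(max(post))]
print(best)  # slope with the highest posterior mass
```

With a uniform prior the posterior mode coincides with the least-squares fit; the Bayesian payoff is the full distribution over slopes, not just the single best one.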

Metropolis-Hastings algorithm
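A sketch of the algorithm on a simple target (the standard-normal target and the uniform random-walk proposal are illustrative choices; with a symmetric proposal the Hastings correction cancels, leaving plain Metropolis):

```python
import math
import random

def target(x):                       # an unnormalized density is enough
    return math.exp(-x * x / 2)

def metropolis_hastings(steps=50000, step_size=1.0):
    x = 0.0
    chain = []
    for _ in range(steps):
        proposal = x + random.uniform(-step_size, step_size)
        # Accept with probability min(1, target(proposal) / target(x)).
        if random.random() < min(1.0, target(proposal) / target(x)):
            x = proposal             # accept; otherwise keep the old x
        chain.append(x)
    return chain

chain = metropolis_hastings()
mean = sum(chain) / len(chain)
print(mean)  # near 0 for the standard-normal target
```

Unlike rejection, nothing is thrown away: rejected proposals simply repeat the current state, and only a ratio of target densities is needed, so the normalizing constant never has to be computed.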

Lecture 2