6805: 1094 Activities 15
Write your name and answers on this sheet and hand it in at the
end.
Work with others at your table on these activities. Argue about
the answers but work efficiently!
Checking the sum and product rules, and their consequences
Goal: Check using a very simple example
that the Bayesian rules (on slide 1) are consistent with standard probabilities
based on frequencies.
Table 1

         Blue    Brown   Total
Tall        1       17      18
Short      37       20      57
Total      38       37      75

Table 2

         Blue    Brown   Total
Tall     ____    ____    ____
Short    ____    ____    ____
Total    ____    ____    ____

 Table 1 shows the number of blue- or brown-eyed and tall or short
individuals in a population of 75.
Fill in the blanks in Table 2 with probabilities (in decimals, not fractions)
based on the usual
"frequentist" interpretation of probability (which would say that the
probability of randomly drawing an ace from a deck of cards is 4/52 = 1/13).
Circle the row and/or column that illustrates the sum rule on slide 1.
 What is pr(short,blue)? Is this a joint or conditional probability?
What is pr(blue)? From the product rule,
what is pr(short|blue)? Can you read this result directly from the table?
 Apply Bayes' theorem to find pr(blue|short) from your answers to
the last part.
 What rule does the second row (the one starting with "short")
illustrate? Write it out in "pr()" notation.
 Are the probabilities of being tall and having brown eyes
mutually independent? Why or why not?
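
Once you have filled in Table 2 by hand, a short script can be used to check your entries. The following is a minimal sketch in Python (the handout itself uses Mathematica; the variable names here are illustrative and not part of the course materials). It builds the joint probabilities from the Table 1 counts, gets the marginals from the sum rule, and then applies the product rule and Bayes' theorem.

```python
# Counts from Table 1, indexed as counts[height][eye_color].
counts = {
    "tall": {"blue": 1, "brown": 17},
    "short": {"blue": 37, "brown": 20},
}
total = sum(sum(row.values()) for row in counts.values())  # population size

# Joint probabilities pr(height, eye) as relative frequencies.
joint = {h: {e: n / total for e, n in row.items()} for h, row in counts.items()}

# Sum rule: marginals are row sums and column sums of the joint table.
pr_height = {h: sum(row.values()) for h, row in joint.items()}
pr_eye = {e: sum(joint[h][e] for h in joint) for e in ("blue", "brown")}

# Product rule: pr(short|blue) = pr(short,blue) / pr(blue).
pr_short_given_blue = joint["short"]["blue"] / pr_eye["blue"]

# Bayes' theorem: pr(blue|short) = pr(short|blue) pr(blue) / pr(short).
pr_blue_given_short = pr_short_given_blue * pr_eye["blue"] / pr_height["short"]

# Independence would require pr(tall,brown) = pr(tall) pr(brown).
independence_gap = joint["tall"]["brown"] - pr_height["tall"] * pr_eye["brown"]

print(f"pr(short,blue) = {joint['short']['blue']:.4f}")
print(f"pr(blue)       = {pr_eye['blue']:.4f}")
print(f"pr(short|blue) = {pr_short_given_blue:.4f}")
print(f"pr(blue|short) = {pr_blue_given_short:.4f}")
print(f"pr(tall,brown) - pr(tall) pr(brown) = {independence_gap:.4f}")
```

A nonzero value on the last line indicates that height and eye color are not independent in this population.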
Standard medical example: applying the Bayesian rules of probability
Goal: Use the Bayesian rules to solve a familiar problem.
Suppose there is an unknown disease (UD) and there is a test for it.
 The false positive rate is 2.3%. ("False positive" means the test says you have UD,
but you don't.)
 The false negative rate is 1.4%. ("False negative" means you have UD, but the
test says you don't.)
Assume that 1 in 10,000 people have the disease.
You are given the test and get a positive result.
Your ultimate goal is to find the probability that you actually have the
disease. We'll do it using the Bayesian rules.
We'll use the notation:
 H = "you have UD"
 \overline{H} = "you do not have UD"
 D = "you test positive for UD"
 \overline{D} = "you test negative for UD"
 Before doing a calculation (or thinking too hard :), does your
intuition tell you the probability you have the disease is high or low?
 In the "pr()" notation, what is your ultimate goal?
 Express the false positive rate in "pr()" notation.
 Express the false negative rate in "pr()" notation.
By applying the sum rule, what do you also know?
(If you get stuck answering the question, do the next part first.)
 Should pr(D|H) + pr(\overline{D}|H) = 1?
Should pr(D|H) + pr(D|\overline{H}) = 1?
(Hint: does the sum rule apply on the left or right of the |?)
 Apply Bayes' theorem to your result for your ultimate goal (don't put
in numbers yet).
Why is this a useful thing to do here?
 Let's find the other results we need. What is pr(H)?
What is pr(\overline{H})?
 Finally, we need pr(D). Apply marginalization first, and then
the product rule twice to get an expression for pr(D) in terms of quantities
we know.
 Now plug in numbers into Bayes' theorem and calculate the result.
What do you get?
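
After working through the steps by hand, you can check your arithmetic with a few lines of code. This is a minimal sketch in Python (variable names are illustrative), using only the numbers stated in the problem: marginalization plus the product rule gives pr(D), and Bayes' theorem then gives pr(H|D).

```python
# Given in the problem statement.
false_positive = 0.023      # pr(D | not H): test positive without the disease
false_negative = 0.014      # pr(not D | H): test negative with the disease
prior_h = 1.0 / 10_000      # pr(H): 1 in 10,000 people have the disease

# Sum rule applied to the left of the bar: pr(D|H) + pr(not D|H) = 1.
pr_d_given_h = 1.0 - false_negative
prior_not_h = 1.0 - prior_h

# Marginalization, then the product rule twice:
# pr(D) = pr(D|H) pr(H) + pr(D|not H) pr(not H).
pr_d = pr_d_given_h * prior_h + false_positive * prior_not_h

# Bayes' theorem: pr(H|D) = pr(D|H) pr(H) / pr(D).
posterior_h = pr_d_given_h * prior_h / pr_d

print(f"pr(D)   = {pr_d:.6f}")
print(f"pr(H|D) = {posterior_h:.6f}")
```

Compare the printed posterior with the intuition you wrote down before calculating.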
Radioactive lighthouse problem
Goal: Explore a classic problem from Gull (by way of Sivia's book).
In the figure, a radioactive source that emits gamma rays randomly in time
but uniformly in angle is placed at (x_{0}, y_{0}).
The gamma rays are detected on the x-axis and these positions are saved,
x_{k}, k = 1,...,N.
Given these positions, the problem is to estimate the location of the source.
We'll assume we know that y_{0} = 1 (in whatever length units we are
using), so our goal is to estimate x_{0}.
The angle θ is between the γ ray and the y-axis in the figure.
 Claim: in the pr() notation, our goal is to find the posterior PDF
pr(x_{0} | {x_{k}}, y_{0}). How would you translate
this posterior to words?
 By Bayes' theorem, how is this posterior related to
pr({x_{k}} | x_{0}, y_{0}), pr(x_{0} | y_{0}),
and pr({x_{k}} | y_{0})?
 Claim: because the denominator PDF is independent of x_{0},
it is just a normalization factor for
pr(x_{0} | {x_{k}}, y_{0}), so we don't
need to calculate it explicitly.
Do you understand this? What good is an unnormalized posterior
pr(x_{0} | {x_{k}}, y_{0})?
 If we take for the prior PDF
pr(x_{0} | y_{0}) that pr(x_{0} | y_{0}) = pr(x_{0})
= 1/(x_{0,max} − x_{0,min}) for
x_{0,min} < x_{0} < x_{0,max} and zero elsewhere,
what are we assuming? Why is this more plausible than letting x_{0}
be anything? Why do we assume a constant PDF? Is this PDF normalized?
 If we assume that the x_{k}s are mutually independent, then how is
pr({x_{k}} | x_{0}, y_{0}) simplified? Is this a
justifiable assumption?
 Show that
pr(x_{k} | x_{0}, y_{0})
= (y_{0}/π)(y_{0}^{2} +
(x_{k} − x_{0})^{2})^{−1}
given that the angular distribution of θ_{k} is uniform from
−π/2 to +π/2, so
pr(θ_{k} | x_{0}, y_{0}) = 1/π ,
and also that
pr(θ_{k} | x_{0}, y_{0}) dθ_{k}
= pr(x_{k} | x_{0}, y_{0}) dx_{k} .
 Ok, now we're ready to see what the estimates for x_{0} look like.
Open the Mathematica notebook Bayesian_games_part1.nb.
 For this notebook we assume y0 = 1 is known. We are trying to estimate
x0, whose true value is 1 (we don't use that in the notebook).
Run this section.
 Look up "CauchyDistribution" in the Mathematica Help to verify it is
the same function derived above.
Run the "Generate a set of random x points" section several times to
see the fluctuations in the distribution.
What can you say about the tails of this distribution compared to your
experience with Gaussian distributions?
 Run the section on the "Posterior for a single x measurement" several times
to see how much it can change.
Would taking the maximum or the mean of these distributions give
us a useful estimate of x0?
 In this section the posterior for x0 is calculated and plotted for
different numbers of data points. The prior is taken to be a uniform PDF from
−4 to 4 (we really don't believe it is bigger than that but otherwise we
don't know what it is). For each Nmax, besides plotting the posterior for x0,
we calculate the maximum and mean of the posterior,
⟨x_{0}⟩,
and the mean of the set of Nmax points,
\overline{x}_{0}.
Run this section several times and record the results in the table:
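Nmax   1: ⟨x_{0}⟩   1: \overline{x}_{0}   2: ⟨x_{0}⟩   2: \overline{x}_{0}   3: ⟨x_{0}⟩   3: \overline{x}_{0}
1      ____         ____                  ____         ____                  ____         ____
2      ____         ____                  ____         ____                  ____         ____
4      ____         ____                  ____         ____                  ____         ____
16     ____         ____                  ____         ____                  ____         ____
64     ____         ____                  ____         ____                  ____         ____
256    ____         ____                  ____         ____                  ____         ____
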
What are your observations about the posterior for x0 as a function
of Nmax and which mean is the better estimate?
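
If you want to reproduce the notebook experiment outside Mathematica, the following is a minimal sketch in Python using only the standard library. It is not the course notebook: the function names, the grid size, and the random seed are assumptions made for illustration. It draws detector positions by sampling θ uniformly, evaluates the posterior for x0 on a grid with the uniform prior from −4 to 4 and the Cauchy likelihood derived above, and compares the posterior mean with the plain mean of the data.

```python
import math
import random


def sample_positions(n, x0=1.0, y0=1.0, rng=None):
    """Draw n detector positions: theta uniform on (-pi/2, pi/2), x = x0 + y0 tan(theta)."""
    rng = rng or random.Random()
    return [x0 + y0 * math.tan(rng.uniform(-math.pi / 2, math.pi / 2))
            for _ in range(n)]


def posterior_on_grid(data, y0=1.0, x_min=-4.0, x_max=4.0, n_grid=801):
    """Normalized posterior for x0 on a grid, with a uniform prior on [x_min, x_max]."""
    step = (x_max - x_min) / (n_grid - 1)
    grid = [x_min + i * step for i in range(n_grid)]
    # Independent data: the log-likelihood is a sum of log Cauchy terms.
    log_post = [
        sum(math.log(y0 / math.pi) - math.log(y0 ** 2 + (xk - x0) ** 2)
            for xk in data)
        for x0 in grid
    ]
    peak = max(log_post)  # subtract the peak before exponentiating to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights) * step  # simple Riemann-sum normalization
    return grid, [w / norm for w in weights]


def posterior_mean(grid, pdf):
    """Mean of the gridded posterior, using the same Riemann sum."""
    step = grid[1] - grid[0]
    return sum(x * p for x, p in zip(grid, pdf)) * step


rng = random.Random(1094)  # arbitrary seed so that a run can be repeated
for n_max in (1, 2, 4, 16, 64, 256):
    data = sample_positions(n_max, rng=rng)
    grid, pdf = posterior_on_grid(data)
    x0_at_max = grid[pdf.index(max(pdf))]
    sample_mean = sum(data) / len(data)
    print(f"Nmax={n_max:4d}  posterior max={x0_at_max:6.2f}  "
          f"posterior mean={posterior_mean(grid, pdf):6.2f}  "
          f"data mean={sample_mean:10.2f}")
```

Changing the seed plays the same role as rerunning the notebook section several times.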
[Extra] Maximum entropy and prior PDFs
Goal: Derive the prior probability distribution function (PDF) on slide 12.
 On slide 12 of Bayesian_statistics_basics.pdf, an expression for
the entropy corresponding to the probability distribution function
pr(x) is given. The idea of the maximum entropy approach to priors is
that one should assume only what is known, and no more. This is achieved
by maximizing the entropy subject to appropriate constraints. Note that
this determines the function pr(x), as opposed to a single value
of a function.
This maximization is carried out using Lagrange multipliers.
Are you familiar with using Lagrange multipliers?
How about taking functional derivatives?
 Here's a quick Lagrange multiplier problem to remind you. Find the extrema of
F(x,y) = x^{2}y − ln(x) subject to 8x + 3y = a.
To do this, we take the partial derivatives of F(x,y) − λ(8x + 3y − a)
with respect to x, y, and λ. This gives three equations in three
unknowns, which we solve to find the (x,y) points of the extrema.
 Find the three equations.
 Use the Mathematica Solve command to find solutions when a=0.
The only real solution (which is a relative minimum) is x = −1/2, y = 4/3.
What did you get?
 Assuming m(a) is a constant, find the functional derivative of
Q with respect to pr(a | M,R) and set it to zero.
 Find the (ordinary) partial derivatives with respect to λ_{0}
and λ_{1}, and set them each to zero.
 Eliminate the λ_{i} dependence to verify the result given
on slide 12 for pr(a | M,R).
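
For the warm-up Lagrange multiplier problem above, the quoted solution can be checked numerically without Mathematica. The sketch below, in Python, evaluates the three partial derivatives of F(x,y) − λ(8x + 3y − a) at x = −1/2, y = 4/3 with a = 0. The function name is illustrative, and the value of λ is taken from the y-equation rather than from the handout.

```python
def lagrangian_gradient(x, y, lam, a):
    """Partial derivatives of x^2 y - ln(x) - lam (8x + 3y - a) w.r.t. x, y, lam."""
    d_dx = 2.0 * x * y - 1.0 / x - 8.0 * lam
    d_dy = x ** 2 - 3.0 * lam
    d_dlam = -(8.0 * x + 3.0 * y - a)
    return d_dx, d_dy, d_dlam


# Candidate solution quoted in the text for a = 0.
x, y = -0.5, 4.0 / 3.0
lam = x ** 2 / 3.0  # from the y-equation, x^2 - 3 lam = 0

residuals = lagrangian_gradient(x, y, lam, a=0.0)
print("residuals of the three equations:", residuals)
```

All three residuals should vanish up to floating-point rounding if the point satisfies the three equations.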
furnstahl.1@osu.edu