-
Notifications
You must be signed in to change notification settings - Fork 0
/
maximum_likelihood_for_election.Rmd
164 lines (133 loc) · 5.55 KB
/
maximum_likelihood_for_election.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
---
jupyter:
jupytext:
notebook_metadata_filter: all,-language_info
split_at_heading: true
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.16.0
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---
# Maximum likelihood for election problem
```{python}
import numpy as np
import scipy.stats as sps
rnd = np.random.default_rng()
```
Here are the observed counts and percentages for the three candidates (Bush, Perot, Clinton):
```{python}
sample_votes = np.array([270, 225, 210])
n_voters = np.sum(sample_votes)
sample_proportions = sample_votes / n_voters
# Show the proportions in the sample.
np.round(sample_proportions, 3)
```
We are first interested in all benchmark universes where Bush had less than or equal to the vote proportion of one or other candidate.
We collect those — using integers for the percentages:
```{python}
# Our benchmark universes are all those where Bush has
# <= percent of another candidate.
benchmark_universes = []
for bush_no in range(0, 51):
remaining = 100 - bush_no
for perot_no in range(0, remaining + 1):
clinton_no = remaining - perot_no
if bush_no <= perot_no or bush_no <= clinton_no:
benchmark_universes.append([bush_no, perot_no, clinton_no])
benchmark_universes = np.array(benchmark_universes)
# Convert to proportions.
benchmark_universes = benchmark_universes / 100
benchmark_universes
```
We can then go through each universe, and ask how likely the observed counts would arise in that universe. We're interest to find the benchmark universe, among all possible universes, that has is most likely to generate the observed results (counts, percentages).
Do this, we could use simulation - taking samples from each universe, repeatedly, to see how often we see the observed counts. For example, to test the benchmark universe of most immediate interest to us:
```{python}
one_bench_universe = np.array([0.35, 0.35, 0.30])
```
Here we do repeated sampling from the first benchmark universe to see how often
that generates the vote totals we saw in the real world.
```{python}
# This cell takes about a minute on my laptop.
n_same_votes = 0
# Lots of samples, because the probability will be very small.
n_trials = 1_000_000
for i in range(n_trials):
# Draw a sample.
samp = rnd.choice(['Bush', 'Perot', 'Clinton'],
p=one_bench_universe,
size=n_voters)
# Calculate vote counts for each candidate.
n_bush = np.sum(samp == 'Bush')
n_perot = np.sum(samp == 'Perot')
n_clinton = np.sum(samp == 'Clinton')
# Collect them to compare to actual vote counts.
n_all = np.array([n_bush, n_perot, n_clinton])
# Are they the same as the actual vote counts?
if np.all(n_all == sample_votes): # Same as real sample:
n_same_votes = n_same_votes + 1
p_same = n_same_votes / n_trials
p_same
```
This is a little time-consuming. We can use a mathematical short-cut to get this same result, using Scipy's `multinomial` distribution, like this:
```{python}
# A distribution with the given number of voters and probabilities.
mdist = sps.multinomial(n_voters, one_bench_universe)
# What is the probability of seeing exactly the observed votes?
mdist.pmf([270, 225, 210])
```
Notice these two calculations give very similar values, but the multinomial
calculation is a) precise - it doesn't depend on randomness and b) much
quicker.
Notice too that the number we get is very small, because there are so many
different vote numbers that could result, so the chances of getting exactly
these numbers is small.
One way of making sure that the computer doesn't run into trouble with these small numbers is asking for the `logpmf`, which is just the logarithm of the same number you saw above:
```{python}
lpmf = mdist.logpmf([270, 225, 210])
lpmf
```
We can convert the logarithm back to the probability by using `np.exp`:
```{python}
# Convert the log back into the probability, giving the same
# answer as above.
np.exp(lpmf)
```
Using the logarithm instead of the original probability helps by keeping the
numbers from getting so small that the computer will struggle to make accurate
calculations, while keeping the numbers comparable. If the original
probability is greater than another probability for a different benchmark
universe, the logarithm is also greater.
We can then get the `logpmf` calculation for all our proposed benchmark
universes. The benchmark universe with the highest `logpmf` is also the
benchmark universe with the highest probability of giving the exact observed
vote counts from the sample.
```{python}
n_benchmark_universes = len(benchmark_universes)
bench_universe_logp_obs = np.zeros(n_benchmark_universes)
for i in np.arange(n_benchmark_universes):
# Multinomial distribution.
mdist = sps.multinomial(n_voters, benchmark_universes[i])
# How likely are the observed numbers, if benchmark holds.
bench_universe_logp_obs[i] = mdist.logpmf([270, 225, 210])
```
The largest log probability:
```{python}
np.max(bench_universe_logp_obs)
```
```{python}
# The corresponding index position of the maximum.
index = np.argmax(bench_universe_logp_obs)
# So this will give us the maximum again.
bench_universe_logp_obs[index]
```
```{python}
# The benchmark universe (triplet of proportions) most likely to generate
# the observed vote totals:
benchmark_universes[index]
```
So the [0.35, 0.35, 0.3] benchmark universe is the one most compatible with the observed votes, while still having Bush less than or equal to another candidate.