Over the last few months, I’ve often heard that polls can’t be trusted. In particular, I have heard that they can’t be trusted because each one usually involves study of only about a thousand individuals. I have even heard that argument from a retired quantitative linguist.1 So I’ve put together this essay in order to explain how polls work, why a random sample of a thousand should usually be considered sufficient, and why the results should be treated as informative even though they do not enable us to predict precise numbers of votes (which is a particular problem when the results are going to be close — because then, precise numbers can make all the difference).

The distinction between polling and prediction is an important one. Polling is a research method, which is why I take an interest in it: I do not teach political science, but I do teach research methods. By general agreement, polling is not just a method, but probably the best available method for estimating public opinion.

Estimates of public opinion provide a basis on which to predict the outcome of a vote, but they do not in themselves predict the vote. Such estimates are inherently imprecise, and I shall have much to say about that below. Moreover, an estimate of public opinion may be accurate at a particular point in time yet still lead to an inaccurate prediction of a future vote, for two reasons: public opinion changes all the time, and so may change between the poll and the vote; and public opinion does not necessarily translate into votes in a straightforward way, because not everyone with an opinion will get around to casting one.

All this means that the only way to know the vote is to have the election. But for all sorts of reasons, organisations want to know how elections are likely to turn out, and so they turn to polls — and to increasingly elaborate attempts to analyse polls — for the sake of prediction.

It’s really important to understand that clients commission polls in order to acquire information about public opinion. Opinion polling is one of several kinds of social research that are conducted outside the academy, in the private sector. Others include market research, which is often carried out by the same companies. Reputable polling and market research companies compete with one another to provide the most accurate estimates and predictions. They don’t just tell their clients what their clients want to hear, because if they did that, then their businesses would fail. Success for a pollster depends upon reputation, and a pollster’s reputation suffers when the pollster in question is seen to have provided misleading information.

Polling works as follows. You want to know what the members of a particular population think, but you can’t ask all of them. Instead, you pick a sample of the population, and you ask members of the sample. The sample needs to be small enough to be cost-effective, but it also needs to be representative of the population as a whole. Regardless of how enormous that population is, you can usually get a somewhat representative sample by picking about a thousand of its members at random.

I wrote this essay to explain why that is the case. There are currently two versions of it: one on my blog, with text and linked images but without code, and the other a self-contained R Notebook, with text, images, and code. The explanation of basic principles is followed by a survey of some recent events widely regarded as polling failures, both to show how these things work out in the real world and to explain why recent experience doesn’t entitle anyone to wish away a series of bad polls. I finish up by taking a look at a series of really, really bad polls to show what bad looks like. (Bad, that is, for the losing party — from the point of view of its main rival, those same polls look absolutely great.)

1 A simple polling simulation

To understand how (and why) random sample polling works, we’re going to look at a simulation. We will start with two hypothetical voting populations, A and B. Each consists of a million voters whose preferences are divided between the Sensible Party and the Silly Party. In Population A, 51% of voters support the Sensible Party and 49% support the Silly Party. In Population B, 70% of voters support the Sensible Party and 30% support the Silly Party. (In case you missed it, this is a Monty Python reference. Sorry; I’m British.)2

# Make the kable function available
library('knitr')
# Make the sequence of random numbers reproducible
set.seed(1)
# Create Population A and Population B
population.a <- factor(c(rep('Sensible',510000),
                         rep('Silly',490000)),
                        levels=c('Sensible','Silly'))
population.b <- factor(c(rep('Sensible',700000),
                        rep('Silly',300000)),
                       levels=c('Sensible','Silly'))
# Check the two populations directly (without polling)
table(population.a)
population.a
Sensible    Silly 
  510000   490000 
table(population.b)
population.b
Sensible    Silly 
  700000   300000 

How can we find out what the members of our two populations think about the two parties? In reality, we’d have to contact them and persuade them to tell us, but this is a simulation, so we only need to write a function to pick random individuals from a population and record their voting preferences. To test it, let’s sample the whole of each population.

# Function to carry out random sample polling
poll.count <- function(pop,samp.size) {
  s <- table(sample(pop,samp.size)) / samp.size * 100
  return (as.vector(s))
}
# Function to poll two populations and tabulate results
poll.both <- function(pop1,pop2,samp.size) {
  p1 <- poll.count(pop1,samp.size)
  p2 <- poll.count(pop2,samp.size)
  m <- as.matrix(c(p1,p2))
  dim(m) <- c(2,2)
  return(t(m))
}
# Function to display polls as a nice-looking table
poll.tab <- function(p,pop.names) {
  colnames(p) <- c('Sensible','Silly')
  row.names(p) <- pop.names
  k <- kable(as.matrix(p),
             'markdown',
             format.args = list(width = 2),
             align = 'r')
  return(k)
}
# 'Random' sample of a million from each population
poll.tab(poll.both(population.a,population.b,
                   1000000),
         c('A','B'))
    Sensible   Silly
A         51      49
B         70      30

The perfect result (obviously!), but this is daft: there’s no point sampling people at random if we’re going to sample all of them anyway. So how many people are we going to sample?

1.1 Single polls with varying sample sizes

Let’s start with a sample of one from each population:

# Random sample of 1 individual per population
poll.tab(poll.both(population.a,population.b,
                   1),
         c('A','B'))
    Sensible   Silly
A          0     100
B        100       0

Clearly a sample of one is no good: the finding is always going to be a landslide for the preferred choice of whichever individual we happen to ask. In this case, polls of random samples of one happen to have called one election correctly and one incorrectly, but that’s pure chance. So let’s try a sample of ten:

# Random sample of 10 individuals per population
poll.tab(poll.both(population.a,population.b,
                   10),
         c('A','B'))
    Sensible   Silly
A         50      50
B         90      10

That’s a bit better. Population A looks like a dead heat (which is almost right), while the Sensible Party’s lead has been recognised but greatly exaggerated in Population B. As we’ll see in a moment, a random sample of ten could easily be more wrong than this, but it will usually be less wrong than a sample of one. Now let’s try polling a sample of a hundred:

# Random sample of 100 individuals per population
poll.tab(poll.both(population.a,population.b,
                   100),
         c('A','B'))
    Sensible   Silly
A         45      55
B         73      27

Population B’s preferences have been estimated pretty accurately this time, but we’ve somehow got the Silly Party in the lead in Population A — and by a whopping 10 percentage points! That’s not because a sample of a hundred is worse than a sample of ten, but because randomness in the selection procedure has a major effect on the outcome whether you take a sample of ten or a sample of a hundred. Samples of a hundred are more reliable than samples of ten because — as we’ll see very shortly — they will be wrong less often. But we would be very unwise to set any store by a single study with a sample of a hundred, and the result above shows why. By picking the wrong hundred individuals to sample, we can get a very unrepresentative result. And when you’re picking individuals at random and you’re only picking a hundred of them, there’s a good chance of getting an unrepresentative hundred on any particular occasion.
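In fact, because this is a simulation and we know that Sensible support in Population A is really 51%, we can put a rough number on how often a sample of a hundred will be at least this misleading. The sketch below uses the binomial distribution, which treats the sample as if it were drawn with replacement; with a million voters, that simplification is negligible.

# Rough probability that a random sample of 100 from Population A contains
# 45 or fewer Sensible supporters, i.e. a result at least as unrepresentative
# as the one above (binomial approximation to sampling without replacement)
pbinom(45, 100, 0.51)

The answer comes out at well over one chance in ten, which is another way of saying that a single poll of a hundred people tells us very little about a race this close.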

So now let’s go all the way up to a thousand, which is about as far as most polling companies usually go:

# Random sample of 1000 per population
poll.tab(poll.both(population.a,population.b,
                   1000),
         c('A','B'))
    Sensible   Silly
A       52.1    47.9
B       70.0    30.0

Not bad at all! Population B’s polling is bang on, while Population A’s is only barely off.
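
This is roughly what the textbook arithmetic leads us to expect. For an estimated proportion near 50% and a simple random sample of a thousand, the standard 95% margin of error works out at about three percentage points either way, so an estimate that lands 1.1 points from the truth is entirely unremarkable. Here is that back-of-the-envelope calculation (it is not part of the simulation itself):

# Approximate 95% margin of error, in percentage points, for an estimated
# proportion near 50% with a simple random sample of 1000
1.96 * sqrt(0.5 * 0.5 / 1000) * 100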

What if we sample ten thousand?

# Random sample of 10000 per population
poll.tab(poll.both(population.a,population.b,
                   10000),
         c('A','B'))
    Sensible   Silly
A      50.52   49.48
B      69.61   30.39

This doesn’t really look like an improvement on the previous one: Population A’s preferences have been estimated slightly more accurately, while Population B’s have been estimated slightly less accurately. Polling companies very rarely take samples as large as this because telephoning ten thousand people involves a massive increase in cost but brings only a slight increase in reliability.
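
That pattern of diminishing returns is exactly what the same back-of-the-envelope formula predicts: the margin of error shrinks only with the square root of the sample size, so multiplying the sample by ten cuts the error by a factor of only about three. Note also that the formula contains no term for the size of the population (provided the sample is a small fraction of it, which it almost always is), which is why a random sample of about a thousand works whether the population numbers one million or a hundred million. A rough sketch:

# Approximate 95% margins of error, in percentage points, for an estimated
# proportion near 50% at various sample sizes; note the square-root scaling
samp.sizes <- c(10, 100, 1000, 10000)
round(1.96 * sqrt(0.5 * 0.5 / samp.sizes) * 100, 1)

Going from a thousand interviews to ten thousand brings the margin of error down from about plus or minus three points to about plus or minus one point, which is rarely worth a tenfold increase in the phone bill.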