Can opinion polls ever be accurate? Probably not
Enrico Scalas is Professor of Statistics and Probability at the University of Sussex and Dr Nicos Georgiou is a Lecturer in Mathematics, Probability and Statistics at the University of Sussex.
When they head into the polling booth on June 23, voters in the referendum on the UK’s membership of the European Union know there will be only two possible outcomes: in or out. The result is determined by a simple majority vote. No democratic option can be simpler.
With the overall “polls of polls” fluctuating as the referendum approaches, can these polls be trusted – particularly given the glaring failure to predict the outcome of the 2015 UK general election?
The reasons behind the discrepancy between polls and results lie in the polling procedure itself. And there is an obvious difficulty when public opinion is split almost in half.
Consider a population of 60m electors in which 30,050,000 hold opinion A and the rest hold opinion B. Now assume you accept some reasonable limitations on your sample size and choose to poll 1,500 people at random to get a feel for public opinion. In the most generic of situations, an argument based on the theory of probability shows that, on average, at least two in five polls (44% to be precise) will show B leading with at least 50.1% of the vote share. In other words, two in five polls will give the wrong answer.
Now consider the same situation with a random sample of 6,000. Again, at least three out of ten polls will show the wrong camp in the lead. There is a definite improvement as we increase the sample size – this is because larger independent samples lead to better approximations. This is the effect of what is known to mathematicians as the “strong law of large numbers” and the “central limit theorem”.
Bigger crowds needed. Andy Sedg/flickr.com (CC BY-NC-ND)
As an extreme case, consider a forbiddingly large sample of 600,000. Then the probability of a poll not giving the correct outcome is about 0.2%, so only roughly one in 500 polls is expected to be wrong. With a sample size this large, an error of this kind is what is called a large deviation – incredibly rare.
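The figures quoted for the three sample sizes can be roughly reproduced with a few lines of Python. This is a minimal sketch, not necessarily the calculation the authors used: it assumes a simple random sample, applies the normal approximation suggested by the central limit theorem, and ignores the finite size of the electorate. The function names `normal_cdf` and `prob_poll_shows_b_ahead` are made up for this illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_poll_shows_b_ahead(sample_size,
                            share_a=30_050_000 / 60_000_000,
                            b_threshold=0.501):
    """Approximate chance that a poll shows B on at least b_threshold.

    Assumption: each respondent is drawn independently, so the sampled
    share of B is approximately normal with the binomial standard error.
    """
    share_b = 1.0 - share_a
    std_error = math.sqrt(share_a * share_b / sample_size)
    z = (b_threshold - share_b) / std_error
    return 1.0 - normal_cdf(z)

for n in (1_500, 6_000, 600_000):
    p = prob_poll_shows_b_ahead(n)
    print(f"sample of {n:>7,}: chance B appears on 50.1% or more = {p:.2%}")
```

Under these assumptions the three sample sizes give roughly 44%, 39% and 0.2% respectively, which shows how slowly the error shrinks when opinion is split this evenly.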
Sample size considerations are enough to guarantee that the most accurate polls are the exit polls – but, at that point, the poll result is only useful to ignite discussion on TV shows, as the final tally is only hours away.
Reality is even more complicated. Public opinion is not independent of geographical location. Judging by past results, isolated or small places tend to be conservative, while large metropolitan cities tend to be more liberal. Cities are also densely populated. A selection of 100 people from a small area and 100 from a large city will then give a skewed perception of public opinion, because the city's far larger electorate is under-represented.
A possible fix
So ideas to fix this process start to take shape. For example, the sample needs to be representative of the population – one way of achieving this is known as “stratified sampling”.
There are always social, economic, cultural and national variations in the way people think on any political issue. So, in order to have an accurate poll, a pollster would need to stratify the population according to factors such as wealth or where people live, and sample each group properly and separately – rather than choosing people at random from the population as a whole. While some polling companies do publish details of the way they carry out their polls, their methods are not always clear.
Stratified sampling is more complicated (and also more political) than simple random sampling. Pollsters would need to decide on the guiding factors of public opinion on an issue, such as the UK’s membership of the EU, and stratify the population according to those factors. These may differ every year, as other events take place shaping public opinion.
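To make the idea concrete, here is a minimal sketch comparing a naive poll, which pools equal numbers of respondents from each area, with a stratified one, which samples each group in proportion to its size and weights the results accordingly. Everything in it is hypothetical: the two strata, their population shares and their levels of support are invented for illustration, and real pollsters use many more factors.

```python
import random

# Hypothetical strata: share of the electorate and true support for option A.
# These numbers are invented for illustration only.
STRATA = {
    "large cities": {"population_share": 0.80, "support_a": 0.55},
    "small towns": {"population_share": 0.20, "support_a": 0.30},
}

def poll_stratum(support_a, sample_size, rng):
    """Fraction of sampled respondents in one stratum who back option A."""
    votes_for_a = sum(rng.random() < support_a for _ in range(sample_size))
    return votes_for_a / sample_size

def naive_estimate(strata, per_stratum, rng):
    """Pool equal-sized samples from every stratum, ignoring population shares."""
    shares = [poll_stratum(s["support_a"], per_stratum, rng)
              for s in strata.values()]
    return sum(shares) / len(shares)

def stratified_estimate(strata, total_sample, rng):
    """Sample each stratum in proportion to its size, then weight by its share."""
    estimate = 0.0
    for s in strata.values():
        n = max(1, round(total_sample * s["population_share"]))
        estimate += s["population_share"] * poll_stratum(s["support_a"], n, rng)
    return estimate

rng = random.Random(2016)  # fixed seed so repeated runs give the same numbers
true_support = sum(s["population_share"] * s["support_a"]
                   for s in STRATA.values())
print(f"true support for A:  {true_support:.3f}")
print(f"naive estimate:      {naive_estimate(STRATA, 750, rng):.3f}")
print(f"stratified estimate: {stratified_estimate(STRATA, 1500, rng):.3f}")
```

With these invented numbers the true support for A is exactly 50%. The naive poll drifts towards 42.5% because small towns are over-represented, while the stratified estimate stays close to the true value, although it is still subject to the sampling error described earlier.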
In the end, the poll results depend on how people reply to a survey. Respondents are under no obligation to reply honestly, or even at all. And there is also the possibility that a sampled individual won’t actually vote on the day. So we can probably never have an accurate polling procedure when public opinion is split so close to 50-50.
This article was originally published on The Conversation.