ActiVote’s Polling Accuracy
Surprisingly Accurate Polls
We are often asked how accurate ActiVote’s polling is, as it is somewhat unconventional in a number of ways:
• We include rankings from the past 90 days in a single poll, instead of polling a large sample of people in just 2-3 days.
• Participants in our polls are self-selected in that people chose to download our app and rank elections, instead of being randomly selected by us.
• Our polls often have smaller sample sizes than typical polls.
On this page we provide both a summary and a more elaborate answer, showing that ActiVote’s polling is surprisingly accurate.
Summary of ActiVote’s Polling Accuracy
A simple rule-of-thumb for our polls is that for any state-wide general election the average expected deviation is approximately equal to the “Margin-of-Error” for a probabilistic poll of the same sample size:
According to the American Association for Public Opinion Research (AAPOR), in 2020 the following average error was found for reputable pollsters:
In order to compare our polling capabilities with those of the AAPOR report, we retroactively applied our polling algorithm to the data we collected prior to the 2020 election. It shows the following performance:
A high-level conclusion, based on these 8 ActiVote polls, is that ActiVote’s polling is approximately on par with the average of other pollsters as found by AAPOR.
A deeper dive into ActiVote’s polling performance
ActiVote has over 10,000 elections in our app and our users will rank their preferred candidates for any of them. Some of those races can be for the mayor of a small town, or the school board in a small school district, where we end up with very few rankings in our sample, due to the limited number of users in that district. In other cases, like the national presidential election, we had a sample of over 6000 votes.
In order to improve our polling algorithms, we do a post-mortem analysis for every single poll for which we have 10 or more rankings. Even though a poll of just 10 votes is only slightly better than just “noise”, by including very small polls in our data we can see the trendlines of what the expected error is for each type of poll. In the table below we show the results of 164 state-wide general election polls in 2020 and 2021:
The main takeaway from the table is that the average error shrinks as the sample grows, tracking the Margin-of-Error (MoE) for that sample size: for all polls of 50 or more voters, the average error is approximately equal to the MoE. Of the 17 polls that had 200+ votes (with an average sample size of 322), our average error of 5.5% is about the same as the state-wide errors of other pollsters (even though they often use sample sizes of approximately 1000).
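To make the rule-of-thumb concrete, the short sketch below computes the textbook 95% MoE for a simple random sample. It assumes the usual worst-case proportion of 50% and a z-value of 1.96; the function name `margin_of_error` is ours for illustration and is not part of ActiVote’s polling algorithm.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Textbook 95% margin of error, in percentage points, for a simple
    random sample of size n (assumes worst-case proportion p = 0.5)."""
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

# Sample sizes mentioned in the text: the 10-ranking minimum, the 50-voter
# mark, the 200 publication threshold, the 322 average, and a typical 1000.
for n in (10, 50, 200, 322, 1000):
    print(f"n = {n:4d}  MoE = {margin_of_error(n):.1f}%")
```

Under these assumptions, a sample of 322 gives an MoE of about 5.5%, which matches the average error reported above for the polls with 200+ votes, while a sample of 1000 gives about 3.1%.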
Based on this analysis, we have decided for the 2022 midterms to publish polls with approximately 200 users or more, as their average error is expected to be comparable to that of other pollsters.
But WHY does it work?
We are aware of the various reasons that ActiVote’s polling might not have worked:
• Our participants are self-selected
• We take our sample over a long period of time (90 days)
• Our samples are relatively small.
And still it does work. So, let’s address each of these concerns separately to find out why we believe our polling works after all.
Self-Selected Participants
The gold standard of past polling was the probabilistic sample: a set of phone numbers was selected randomly in such a way that the whole target population was equally represented, and everyone on the list was then interviewed to produce a poll. Unfortunately, with number recognition (caller ID), the decline of landlines, and some groups of people being less likely to participate in polls, probabilistic samples no longer work the way they used to. A typical response rate is now only about 5% (instead of the 70% of several decades ago), meaning that 95% of the probabilistic sample does not participate. As a result, probabilistic samples are just as flawed as other samples and need to be corrected for the imbalance in respondents.
Thus, almost all pollsters nowadays use other methods to reach their sample and then correct for oversampling of some groups and undersampling of others. Our method of relying on self-selected users who downloaded the app happens to work well, likely because we have users from every state (we have as many users per capita from Wyoming, a very small red state, as from California, a very large blue state) and across party affiliation, gender, age, ethnicity, income, education, rural or urban residence, and political leaning.
Thus, while we need to adjust our polls for over- and underrepresentation, this is no different from what other pollsters need to do. In practice, for the vast majority of polls, our self-selected group turns out to be a good representation of the overall electorate.
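This page does not spell out how the adjustment for over- and underrepresentation is done. As an illustration only, the sketch below shows one common technique, post-stratification weighting, on made-up data. The groups, the population shares, and the function name `poststratified_shares` are all hypothetical and should not be read as ActiVote’s actual algorithm.

```python
from collections import Counter

def poststratified_shares(responses, population_shares):
    """Weight each respondent by (population share / sample share) of their
    group and return the weighted share of each choice.

    responses: list of (group, choice) tuples
    population_shares: dict mapping group -> share of the electorate
    """
    n = len(responses)
    group_counts = Counter(group for group, _ in responses)
    weighted = Counter()
    for group, choice in responses:
        sample_share = group_counts[group] / n
        weighted[choice] += population_shares[group] / sample_share
    total = sum(weighted.values())
    return {choice: w / total for choice, w in weighted.items()}

# Hypothetical sample: urban respondents are oversampled (60% of the sample,
# but assumed to be 50% of the electorate).
sample = (
    [("urban", "A")] * 4 + [("urban", "B")] * 2
    + [("rural", "A")] * 1 + [("rural", "B")] * 3
)
shares = poststratified_shares(sample, {"urban": 0.5, "rural": 0.5})
print({choice: round(share, 3) for choice, share in sorted(shares.items())})
```

In this toy sample the raw result is a 50/50 tie, but after weighting, candidate A drops to roughly 46% because the oversampled urban group leaned toward A.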
Long Sampling Period
Instead of taking a poll in a 2-3 day period, we collect all rankings created in the 90 days before we publish our poll. The question is to what extent this gives us “outdated” opinions. Research shows that most people do not change their mind about an election. The biggest change over time is that more and more people actually make up their mind. And when people do change their mind, they are just as likely to change it back again. This last phenomenon can be seen after the party conventions before a presidential election: typically a candidate gets a “bump” in the polls from the convention, but that bump dissipates in the weeks after. It can be argued that a poll taken to capture that bump actually leads to a worse prediction of the election outcome than a sample that is spread over a longer period of time.
In our app, people who have not yet made up their mind will simply not create a ranking for a race. Nobody is prompting them on the phone or by text or online to make a quick decision on who they would vote for. Nor do they get any reward for participating. Therefore, we may have far fewer rash and unreliable decisions from anyone who felt compelled to express an opinion when prompted by a poll they were not ready for.
Thus, a longer time period can be as much an asset as a liability, and our historical performance suggests that collecting three months of rankings produces excellent results.
Small Sample Sizes
Clearly, our results show that larger sample sizes are better than smaller sample sizes. Our expected error diminishes in line with the MoE related to a poll’s sample size. Thus, the larger the sample the better. The question, however, is what to do with a poll where we happen to have a smaller sample size? Should we withhold that poll until we have 1000 votes, or should we publish it?
Our choice is to publish polls once they get close to, or pass, the 200-vote threshold, as this allows us to publish a large number of polls, including for races where virtually no other pollster would attempt to create a poll. For instance, in early September 2022 we published a general election poll for the Texas Commissioner of Agriculture with just under 200 votes, because the polling algorithm showed that the sample was well-balanced. There may not be any poll from other pollsters for this race in the whole midterm cycle, which suggests that publishing such polls is a good public service.
In the 2022 Primary season ActiVote published about 40 primary polls. Some pollsters do not produce any primary polls as these races are notoriously hard to poll. We did so to determine what the relationship is between the average polling error and the MoE of those polls. The conclusion was that the average error is close to 1.5 times the MoE, thus 1.5 times worse than what we expect for our general election polls.
For the future, this means that especially primary polling will benefit from (much) larger sample sizes than we had during the 2022 cycle.
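As a rough illustration of how much larger, the sketch below combines the 1.5x factor with the textbook MoE formula (95% confidence, worst-case proportion of 50%). Because the MoE shrinks with the square root of the sample size, offsetting a factor of 1.5 takes about 1.5² = 2.25 times as many votes. The function names are ours, and the calculation assumes the 1.5x relationship keeps holding at larger sample sizes, which has not been verified here.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Textbook 95% margin of error in percentage points (assumes p = 0.5)."""
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

def expected_error(n, factor=1.0):
    """Expected average error: factor 1.0 for general election polls,
    1.5 for primary polls (per the 2022 primary season finding)."""
    return factor * margin_of_error(n)

def equivalent_primary_sample(n_general, factor=1.5):
    """Primary sample size whose expected error matches a general election
    poll of n_general votes, assuming the factor is constant in n."""
    return math.ceil(n_general * factor ** 2)

n_needed = equivalent_primary_sample(200)
print(f"General poll, n = 200: {expected_error(200):.1f}%")
print(f"Primary poll, n = 200: {expected_error(200, factor=1.5):.1f}%")
print(f"Primary poll, n = {n_needed}: {expected_error(n_needed, factor=1.5):.1f}%")
```

Under these assumptions, a primary poll would need about 450 votes to reach the expected error of a 200-vote general election poll.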
For any questions and comments on this analysis of ActiVote’s polling capabilities, please contact firstname.lastname@example.org.