One of the most important questions in user testing is how many users you should test with. Because you can’t test with all your users, a natural question follows: what sample size do you need to get valid results?
TLDR: the number of testers you need depends on various factors. We recommend you test with at least twenty users to uncover most issues in your design.
The short answer to this question is unsatisfying: it depends. While practitioners in the industry have readily adopted the 5-user rule as standard practice, the actual number of participants you need depends on several variables, such as the type of usability testing you plan to do, your project’s criticality, and more.
In this article, we explain why five users aren’t enough by examining the existing literature and, in the process, provide a more rounded answer to the question.
The 5-user rule was popularized by Nielsen (2000) in his article “Why You Only Need to Test with 5 Users.” Based on the studies he and Landauer conducted, Nielsen argued that the number of new usability problems each additional participant uncovers drops off after the fifth user. This rule builds on the law of diminishing returns in usability testing described by Virzi (1992).
According to the Nielsen-Landauer curve, five users are enough to find 85% of usability problems in the design. However, for the Nielsen-Landauer formula to hold, the problem discovery rate must be 31% (this is the L or p value of 0.31 in the formula). In other words, a single test user is expected to uncover, on average, about a third of the usability problems. According to Nielsen, the 31% in the formula has been “averaged across a large number of projects they studied.”
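To make the arithmetic behind the curve concrete, here is a minimal sketch of the proportion of problems found, 1 - (1 - L)^n, where L is the per-user discovery rate and n the number of users. The function name and the default of 0.31 are our own choices for illustration.

```python
def proportion_found(n_users: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems found by n_users testers.

    Implements 1 - (1 - L)^n, where L (discovery_rate) is the share of
    problems a single user is expected to uncover.
    """
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must be between 0 and 1")
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return 1.0 - (1.0 - discovery_rate) ** n_users


if __name__ == "__main__":
    # Print the expected coverage for 1 to 10 users at L = 0.31.
    for n in range(1, 11):
        print(f"{n:2d} users: {proportion_found(n):.1%}")
```

With L = 0.31, five users come out at roughly 84%, which is the figure usually rounded to 85%.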
While the 5-user rule holds when the discovery rate is 31%, subsequent studies have shown that this number is hard to predict. The 31% average registered by Nielsen as standard isn’t valid for all testing scenarios. Unless you calculate this percentage beforehand for each one of your projects, there’s no way to predict how often problems can be found.
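The sensitivity to the discovery rate can be sketched by inverting the same formula: for a given rate, how many users does it take to reach a target share of problems? The rates tried below are illustrative values we picked, not figures from any study, and `users_needed` is a hypothetical helper name.

```python
def users_needed(target: float, discovery_rate: float) -> int:
    """Smallest n with 1 - (1 - discovery_rate)^n >= target."""
    if not 0.0 < discovery_rate < 1.0:
        raise ValueError("discovery_rate must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 0
    # Add users one at a time until the expected coverage reaches the target.
    while 1.0 - (1.0 - discovery_rate) ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative discovery rates only; real projects must estimate their own.
    for rate in (0.10, 0.20, 0.31, 0.50):
        print(f"L = {rate:.2f}: {users_needed(0.85, rate)} users for 85% coverage")
```

Under these assumptions, a project where each user finds only 10% of the problems would need 19 users to reach the same 85% coverage, which is why a single fixed number is risky.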
It's also worth pointing out that the 5-user rule only applies to qualitative usability testing, in which participants are observed completing tasks with the product. The formula is also only valid when the product has a relatively homogeneous target user group. In the final section of the same article, titled “When to test more users,” a part overlooked by many adopters of the 5-user rule, Nielsen writes:
“You need to test additional users when a website has several highly distinct groups of users. The formula only holds for comparable users who will be using the site in fairly similar ways.”
Nielsen was writing at a time when getting buy-in for usability testing was difficult, recruiting participants was expensive, and so conducting tests took a big chunk out of your time and budget. Nowadays, new technology has facilitated the development of more affordable and efficient tools to test your designs with users.
As mentioned above, subsequent tests by Laura Faulkner revealed that the 31% problem discovery rate is harder to predict than first assumed. The study tested a group of 60 users and sampled random sets of five users or more to determine the percentage of usability problems found by each group.
The study showed that, as predicted by the Nielsen formula, sets of five users found 85% of usability problems on average across 100 trials. However, that percentage varied widely from one set of users to the next. Faulkner writes:
“The percentage of problem areas found by any one set of 5 users ranged from 55% to nearly 100%. Thus, there was large variation between trials of small samples.”
Importantly, these tests demonstrated that increasing the number of participants from 5 to 10 raised both the number of usability problems found and the confidence in the data:
“Groups of 10 found 95% of the problems. Groups of 5 found as few as 55% of the problems, whereas no group of 20 found fewer than 95%.”
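The resampling idea behind these figures can be sketched as a small simulation. Everything here is synthetic: the data is randomly generated rather than taken from Faulkner's study, each user is assumed to find each problem independently with probability 0.31, and all names and parameters are hypothetical. The output will not reproduce the study's numbers; it only shows how coverage spreads out for small groups.

```python
import random


def simulate_coverage(group_size, n_users=60, n_problems=40,
                      discovery_rate=0.31, trials=100, seed=0):
    """Return (min, mean, max) share of known problems found per random group."""
    rng = random.Random(seed)
    # Synthetic data: the set of problems each of the n_users runs into.
    found = [
        {j for j in range(n_problems) if rng.random() < discovery_rate}
        for _ in range(n_users)
    ]
    # The "known" problems are those found by at least one user in the pool.
    all_found = set().union(*found)
    coverages = []
    for _ in range(trials):
        group = rng.sample(found, group_size)  # random group, no repeats
        seen = set().union(*group)
        coverages.append(len(seen) / len(all_found))
    return min(coverages), sum(coverages) / len(coverages), max(coverages)


if __name__ == "__main__":
    for size in (5, 10, 20):
        lo, mean, hi = simulate_coverage(size)
        print(f"groups of {size:2d}: min {lo:.0%}, mean {mean:.0%}, max {hi:.0%}")
```

The point of the sketch is the gap between the minimum and the mean for small groups: an average near 85% says little about what any single group of five will actually find.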
Teams that adopt the 5-user rule as standard may miss harder-to-find problems. Anyone conducting usability testing and blindly following the Nielsen formula may think that they uncovered 85% of all usability problems, whereas this percentage might actually be lower.
Five users can be a good starting point—and can sometimes be enough to provide valuable insights—but it shouldn't be thought of as a 'one size fits all' number that applies to every user test, every single time.
The popularity of the 5-user rule stems from its simplicity. Everyone likes a concrete and straightforward answer that settles, once and for all, the question of how many participants are required.
Unfortunately, when it comes to calculating how many test participants you need, there are numerous variables that might influence your results and their validity. When determining the number of participants, you need to take into account:
You can use the Sample Size Calculator from SurveyMonkey to work out the number of participants you need. This tool can give you the answer if you set the margin of error and confidence level you want.
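For a sense of what a calculator of this kind does with the margin of error and confidence level, here is a minimal sketch of the standard sample size formula for a proportion (Cochran's formula with an optional finite population correction). This is an assumption about the general approach, not a description of SurveyMonkey's implementation, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence_level=0.95,
                         population=None, proportion=0.5):
    """Sample size for estimating a proportion within +/- margin_of_error.

    proportion=0.5 is the most conservative guess (largest sample).
    If population is given, a finite population correction is applied.
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(required_sample_size(0.05, 0.95))                   # large population
    print(required_sample_size(0.05, 0.95, population=1000))  # 1,000 users
```

Note that these numbers apply to quantitative measurements such as survey results, which is a different question from how many participants a qualitative usability test needs.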
Similarly, when doing A/B testing, use the A/B Testing Calculator to learn if the results you got are statistically significant.
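As a rough illustration of what "statistically significant" means here, the sketch below runs a two-proportion z-test on conversion counts from two variants. This is one common approach and an assumption on our part; a given A/B testing calculator may use a different test. The counts in the usage example are made up for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or 100%")
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 200/1000 conversions for A, 250/1000 for B.
    z, p = two_proportion_z_test(200, 1000, 250, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```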
In this article, we set out to explain the risks of adopting the 5-user rule as standard practice in usability testing by looking at the studies by Nielsen (2000) and Faulkner (2003).
More studies have been published and more arguments made over the years, and this is just one look at a small part of the debate. Check out the articles mentioned for a fuller reading on the subject.