AB Test Sample Calculator
Estimate how many visitors you need in each variant before launching an A/B test. This calculator helps marketers, product managers, analysts, and growth teams size experiments with better statistical discipline.
Enter your baseline conversion rate, minimum detectable effect, significance level, statistical power, and traffic split to calculate the recommended sample size per variant and total test requirement.
How to Use an AB Test Sample Calculator for Smarter Experiment Planning
An A/B test sample calculator helps you estimate the number of users, sessions, or visitors required before an experiment can reliably detect a meaningful difference between two variants. In practice, most teams run A/B tests to compare a control page against a modified treatment page. The challenge is that conversion rates fluctuate naturally from day to day, so a small observed difference can easily be random noise. A good calculator translates your assumptions into a target sample size so that your test has a realistic chance of identifying a true effect.
The core inputs are usually straightforward. First, you need a baseline conversion rate, which is the typical performance of your current page or flow. Next, you define your minimum detectable effect, often called MDE. This is the smallest uplift or decline that would matter enough for the business to act on. Then you set a significance level and statistical power. Significance controls the risk of false positives, while power controls the risk of missing a real improvement. Once those assumptions are set, a sample calculator estimates how much traffic each variant needs.
Why sample size matters in A/B testing
Running a test without enough sample size is one of the most common reasons teams make poor product and marketing decisions. If a test is underpowered, you can get dramatic-looking swings that disappear as more data arrives. That often leads teams to stop too early, ship changes that do not really help, or reject ideas that might actually work. A properly planned sample size does not guarantee perfect decisions, but it substantially improves the quality of evidence.
- It reduces the chance of reacting to random fluctuations.
- It helps align stakeholders before the test launches.
- It makes expected test duration easier to forecast.
- It encourages teams to prioritize changes with meaningful business impact.
- It creates more consistent reporting standards across experiments.
For example, if your current conversion rate is 5% and you want to detect a 10% relative uplift, the treatment conversion rate you are targeting is 5.5%. That is only a 0.5 percentage point absolute change. Small absolute changes require a lot of traffic, especially when confidence and power are high. This is exactly why experiment planning matters so much in growth programs.
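Converting a relative MDE into an absolute target is simple arithmetic. Here is a short Python sketch using the numbers from this example:

```python
# Convert a relative minimum detectable effect (MDE) into absolute terms.
baseline_rate = 0.05   # 5% control conversion rate
relative_mde = 0.10    # 10% relative uplift

target_rate = baseline_rate * (1 + relative_mde)              # expected treatment rate
absolute_change_pp = (target_rate - baseline_rate) * 100      # difference in percentage points

print(f"Target treatment rate: {target_rate:.2%}")
print(f"Absolute change: {absolute_change_pp:.1f} percentage points")
```

This prints a 5.50% target rate and a 0.5 percentage point absolute change.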
Understanding the main calculator inputs
To get value from any A/B test sample calculator, you need to understand what each setting means and how it changes the output.
- Baseline conversion rate: This is your expected control rate. Lower baseline rates generally require larger samples to detect the same relative change.
- Minimum detectable effect: A smaller MDE increases the required sample dramatically because it asks the test to distinguish between very similar conversion rates.
- Significance level: Most teams use 0.05, which corresponds to 95% confidence. Tightening this to 0.01 raises the sample requirement.
- Power: Common defaults are 80% or 90%. Increasing power makes your test more likely to catch real effects, but also demands more traffic.
- Traffic split: Uneven traffic allocation is sometimes useful, but it is statistically less efficient than a balanced split.
- One-sided vs two-sided testing: Two-sided tests are more conservative because they account for movement in either direction. At the same significance level, they need more traffic than a one-sided test.
Most online teams use a two-sided test, 95% confidence, and 80% power as a practical starting point. Those values are not magic. They reflect a tradeoff between rigor and speed. If a mistake would be costly, you might choose stricter settings. If experimentation speed is critical and the downside risk is limited, some teams accept slightly looser assumptions.
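These settings correspond to critical values of the standard normal distribution. The minimal sketch below uses the standard library's `statistics.NormalDist` (Python 3.8+). The helper names `z_alpha` and `z_beta` are illustrative and do not belong to any calculator's API.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_alpha(alpha: float, two_sided: bool = True) -> float:
    """Critical value that controls the false-positive rate."""
    tail = alpha / 2 if two_sided else alpha  # two-sided tests split alpha across both tails
    return std_normal.inv_cdf(1 - tail)


def z_beta(power: float) -> float:
    """Critical value that controls the false-negative rate (beta = 1 - power)."""
    return std_normal.inv_cdf(power)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  two-sided z={z_alpha(alpha):.3f}  "
          f"one-sided z={z_alpha(alpha, two_sided=False):.3f}")
for power in (0.80, 0.90):
    print(f"power={power:.0%}  z={z_beta(power):.3f}")
```

At alpha = 0.05, a one-sided test uses a smaller critical value than a two-sided test (about 1.645 instead of 1.960). This is why it needs less traffic.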
Comparison table: how MDE changes sample size pressure
The table below shows how required sample size changes when the baseline conversion rate is 5%, using a typical 95% confidence level and 80% power. These values are representative planning estimates for a balanced test and illustrate how quickly sample needs grow as the desired detectable effect becomes smaller.
| Baseline Rate | Relative MDE | Target Treatment Rate | Approx. Visitors Per Variant | Approx. Total Visitors |
|---|---|---|---|---|
| 5.0% | 20% | 6.0% | 8,100 | 16,200 |
| 5.0% | 15% | 5.75% | 14,400 | 28,800 |
| 5.0% | 10% | 5.5% | 31,400 | 62,800 |
| 5.0% | 5% | 5.25% | 125,400 | 250,800 |
Planning values shown above are rounded examples for educational use. Exact results vary by formula, tails, and allocation ratio.
How the math works at a practical level
For a standard A/B conversion test, the calculator models the experiment as a comparison between two proportions. One proportion is the control conversion rate and the other is the expected treatment conversion rate. The statistical formula combines two ingredients: the threshold needed to avoid false positives and the threshold needed to avoid false negatives. These are represented by critical values associated with the chosen significance level and statistical power. The formula then scales those thresholds by the variance in the conversion data and divides by the square of the expected difference between variants.
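For reference, the block below shows one common textbook formulation for a balanced, two-sided comparison of two proportions. Here $p_1$ is the control rate, $p_2 = p_1(1 + \text{MDE})$ is the expected treatment rate, and $n$ is the number of visitors per variant. Tools differ in details such as pooled versus unpooled variance and continuity corrections, so treat this as one representative version rather than the formula any particular calculator uses.

```latex
n = \frac{\left( z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_2 - p_1)^2},
\qquad \bar{p} = \frac{p_1 + p_2}{2}
```

For a one-sided test, $z_{1-\alpha/2}$ is replaced by $z_{1-\alpha}$.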
There are several technical formulations used in experimentation platforms and statistical textbooks, but the business intuition is simple:
- Smaller differences are harder to detect.
- Noisier data requires more observations.
- More confidence and more power both demand larger samples.
- Balanced traffic allocation is generally the most efficient.
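The following is a minimal Python sketch of a standard two-proportion sample size formula. It assumes a balanced split, uses pooled variance under the null hypothesis and unpooled variance under the alternative, and applies no continuity correction. The function name and defaults are illustrative choices, not the calculator's actual implementation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80,
                            two_sided: bool = True) -> int:
    """Visitors needed in each arm of a balanced two-proportion test."""
    if relative_mde == 0:
        raise ValueError("relative_mde must be non-zero")
    p1 = baseline
    p2 = baseline * (1 + relative_mde)          # expected treatment rate
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_a = std_normal.inv_cdf(1 - tail)          # guards against false positives
    z_b = std_normal.inv_cdf(power)             # guards against false negatives
    p_bar = (p1 + p2) / 2                       # pooled rate under the null
    spread = (z_a * sqrt(2 * p_bar * (1 - p_bar))
              + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil(spread ** 2 / (p2 - p1) ** 2)   # round up to whole visitors


for mde in (0.20, 0.15, 0.10, 0.05):
    n = sample_size_per_variant(0.05, mde)
    print(f"MDE {mde:.0%}: {n:,} per variant, {2 * n:,} total")
```

With a 5% baseline and the default settings, this prints roughly 8,200, 14,200, 31,200, and 122,100 visitors per variant. These figures are in the same range as the rounded planning values in the MDE table. The small gaps are the kind of formula-to-formula variation that the table note describes.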
If your site only gets a few thousand visitors per week, trying to detect a 3% relative lift on a low-conversion page may be unrealistic. In that scenario, it can be smarter to test larger design or messaging changes, optimize a higher-traffic funnel step, or combine several micro-conversions into a stronger composite outcome.
Comparison table: the impact of confidence and power
Below is another planning example using a 5% baseline conversion rate and a 10% relative MDE in a balanced A/B test. It demonstrates how stricter statistical settings increase traffic requirements.
| Confidence Level | Power | Approx. Visitors Per Variant | Total Visitors Needed | Interpretation |
|---|---|---|---|---|
| 90% | 80% | 24,700 | 49,400 | Faster, but with higher false-positive risk than 95% confidence. |
| 95% | 80% | 31,400 | 62,800 | Common default used by many teams. |
| 95% | 90% | 42,000 | 84,000 | Better chance of detecting a real effect, but slower to run. |
| 99% | 90% | 60,900 | 121,800 | Very conservative and usually reserved for higher-risk decisions. |
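As a quick cross-check of the table, you can compare settings by how much they scale the traffic requirement. This sketch assumes a two-sided test and treats the variance term as identical across settings. Under that assumption, sample size scales with the square of the sum of the two critical values. The function name is hypothetical.

```python
from statistics import NormalDist


def relative_sample_multiplier(confidence: float, power: float,
                               ref_confidence: float = 0.95,
                               ref_power: float = 0.80) -> float:
    """Approximate traffic ratio of a (confidence, power) setting versus a reference.

    Assumes a two-sided test and an unchanged variance term, so the sample
    size scales with (z_{1-alpha/2} + z_{power})^2.
    """
    z = NormalDist().inv_cdf

    def z_sum_squared(conf: float, pw: float) -> float:
        alpha = 1 - conf
        return (z(1 - alpha / 2) + z(pw)) ** 2

    return z_sum_squared(confidence, power) / z_sum_squared(ref_confidence, ref_power)


for conf, pw in [(0.90, 0.80), (0.95, 0.80), (0.95, 0.90), (0.99, 0.90)]:
    ratio = relative_sample_multiplier(conf, pw)
    print(f"{conf:.0%} confidence, {pw:.0%} power: {ratio:.2f}x the 95%/80% sample")
```

The ratios come out at about 0.79, 1.00, 1.34, and 1.90. These are in line with the ratios between the table's rows. Because the shortcut skips the pooled-variance detail, it works as a sanity check rather than a replacement for a full calculation.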
Common mistakes when using an AB test sample calculator
One major error is setting the MDE based on hope instead of business reality. If your team would only act on a change that improves revenue by a meaningful amount, your MDE should reflect that threshold. Another mistake is using an outdated baseline conversion rate. If seasonality, traffic mix, or campaign activity has changed recently, your baseline may no longer be accurate. Teams also frequently underestimate how much uneven traffic allocation hurts efficiency. Unless there is a strong product reason to favor the control or treatment, a 50/50 split is usually best.
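To see how much an uneven split costs, assume both arms have roughly equal variance. Then the variance of the observed difference scales with 1/(w(1 - w)) for a treatment share w, and that quantity is smallest at w = 0.5. The sketch below, with a hypothetical function name, turns this into a traffic multiplier relative to a 50/50 split.

```python
def total_sample_inflation(treatment_share: float) -> float:
    """Total traffic needed at this split relative to a 50/50 split.

    Assumes both arms have roughly equal variance, so the variance of the
    difference scales with 1 / (w * (1 - w)), which is minimized at w = 0.5.
    """
    if not 0 < treatment_share < 1:
        raise ValueError("treatment_share must be between 0 and 1")
    w = treatment_share
    return 0.25 / (w * (1 - w))


for share in (0.5, 0.4, 0.3, 0.2, 0.1):
    print(f"{share:.0%} treatment / {1 - share:.0%} control: "
          f"{total_sample_inflation(share):.2f}x total traffic")
```

A 70/30 split needs about 1.19 times the total traffic of a 50/50 split. A 90/10 split needs about 2.78 times as much.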
Stopping early is another classic problem. Even if your calculator says you need 30,000 visitors per variant, people often peek at results after a few thousand users and rush to judgment. This behavior inflates the false-positive rate and undermines the very planning work the sample size estimate was supposed to support. Use a fixed decision rule and commit to a stopping plan before launching.
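A small A/A simulation can make the cost of peeking concrete. This is a sketch under illustrative assumptions: both arms share a true 10% conversion rate, significance comes from a pooled two-proportion z-test, and there are ten equally spaced looks. Because the arms are identical, every "significant" result is a false positive. All function names and parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(conv_a: int, conv_b: int, n_a: int, n_b: int) -> float:
    """Pooled two-proportion z-test using the normal approximation."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed, so the test cannot reject
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(n_tests: int = 300, looks: int = 10, batch: int = 300,
                     rate: float = 0.10, alpha: float = 0.05, seed: int = 7):
    """Return (false-positive rate analyzing only at the planned end,
    false-positive rate when stopping at the first significant look)."""
    rng = random.Random(seed)
    fixed_hits = 0
    peek_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            # Add one batch of visitors to each arm, both with the same true rate.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = two_sided_p_value(conv_a, conv_b, n, n)
            if p < alpha:
                significant_at_any_look = True  # a peeker would stop and ship here
        if p < alpha:  # p from the final, pre-planned analysis
            fixed_hits += 1
        if significant_at_any_look:
            peek_hits += 1
    return fixed_hits / n_tests, peek_hits / n_tests


fixed_rate, peeking_rate = simulate_peeking()
print(f"Analyze once at the end: {fixed_rate:.1%} false positives")
print(f"Stop at first significant peek: {peeking_rate:.1%} false positives")
```

The fixed-horizon rate should land near the nominal 5%. The peek-and-stop rule flags noticeably more A/A tests as winners. Exact percentages depend on the seed, batch size, and number of looks.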
How to choose a realistic MDE
The right minimum detectable effect is not purely a statistical choice. It is a business choice. Ask yourself what size of improvement would justify the cost of implementation, design time, engineering work, or opportunity cost. For a high-traffic landing page, even a 3% relative uplift may create substantial revenue. For a low-traffic niche flow, you may need to target larger wins to make experimentation practical.
A simple framework is to evaluate MDE using three lenses:
- Financial significance: Would this uplift materially change revenue, leads, retention, or margin?
- Operational feasibility: Can your traffic volume reach the required sample in a reasonable timeframe?
- Strategic relevance: Is the proposed improvement large enough to influence roadmap decisions?
If the sample size becomes too large, do not force the test anyway. Reframe the experiment. Try a stronger treatment, test a higher-volume audience, or focus on an earlier funnel step where larger effects are more plausible.
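To check operational feasibility, you can reverse the question: given the traffic you can realistically collect, what lift could the test detect? The sketch below uses a closed-form approximation that gives both arms the baseline variance. For upward lifts on baselines below 50%, this makes the estimate slightly optimistic. The function name and traffic figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_relative_mde(baseline: float, visitors_per_variant: int,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Rough smallest relative lift a balanced, two-sided test can detect.

    Simplification: both arms are given the baseline variance p * (1 - p).
    For upward lifts on baselines below 50% the true variance is a little
    larger, so the real MDE is slightly bigger than this estimate.
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    absolute_mde = z_total * sqrt(2 * baseline * (1 - baseline) / visitors_per_variant)
    return absolute_mde / baseline


# Hypothetical low-traffic page: 3,000 eligible visitors per week for 4 weeks,
# split 50/50, with a 5% baseline conversion rate.
per_variant = 3_000 * 4 // 2
print(f"Detectable relative lift: about {approx_relative_mde(0.05, per_variant):.0%}")
```

Under these assumptions, the page could only reliably detect a relative lift of roughly 22%. A result like this argues for a bolder treatment or a higher-traffic funnel step.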
How long will my A/B test take?
After calculating sample size, most teams immediately want to translate that into runtime. The formula is straightforward: divide the total sample size needed by your estimated daily eligible traffic. If you need 60,000 visitors total and expect 3,000 eligible visitors per day, the test should take about 20 days. In practice, you should also account for weekday and weekend variation, campaign spikes, and implementation ramp time. It is usually better to let the test span full business cycles, such as complete weeks, than to stop at an arbitrary midpoint.
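The runtime estimate can be sketched in a few lines of Python. One assumption is labeled here and in the code: "full business cycles" is interpreted as whole weeks, and the result is rounded up to the next multiple of seven days. The function name is hypothetical.

```python
from math import ceil


def estimated_runtime_days(total_sample: int, daily_eligible_visitors: int,
                           round_to_full_weeks: bool = True) -> int:
    """Days needed to collect the total sample at a steady daily traffic level."""
    if daily_eligible_visitors <= 0:
        raise ValueError("daily_eligible_visitors must be positive")
    days = ceil(total_sample / daily_eligible_visitors)
    if round_to_full_weeks:
        # Assumption: a business cycle is one week, so cover whole weeks.
        days = ceil(days / 7) * 7
    return days


print(estimated_runtime_days(60_000, 3_000, round_to_full_weeks=False))  # 20
print(estimated_runtime_days(60_000, 3_000))                             # 21
```

Rounding up to 21 days means the test covers three complete weekday and weekend cycles instead of ending partway through a week.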
In many organizations, the calculator becomes a planning tool for the experimentation backlog. Low-effort, high-traffic tests with reasonable MDEs tend to be the fastest opportunities. High-effort tests on low-traffic pages may belong later in the roadmap unless the potential upside is exceptional.
Useful external references for statistical rigor
If you want to validate your experimentation methodology with authoritative educational sources, review the material from these organizations:
- National Institute of Standards and Technology (NIST) for statistical methods and measurement guidance.
- U.S. Census Bureau for accessible statistical concepts and variance-related resources.
- Penn State Online Statistics Education for university-level explanations of hypothesis testing and sample size planning.
Final takeaway
An A/B test sample calculator is not just a convenience tool. It is a safeguard against weak experiment design. By setting a baseline rate, choosing a practical minimum detectable effect, and selecting confidence and power levels that fit your risk tolerance, you can estimate the traffic needed before the test begins. That makes your analysis cleaner, your stakeholder communication stronger, and your product decisions more defensible. Make a sample size calculation a planning checkpoint before every major A/B test, and your experimentation program will become more disciplined, efficient, and trustworthy over time.