🔬 A/B Test Statistical Significance Calculator

Enter your test results to check if your variant's performance difference is statistically significant.


How to Read Your A/B Test Results

This calculator uses a two-tailed Z-test for proportions — the industry standard for comparing conversion rates in marketing split tests. Unlike a one-tailed test, the two-tailed approach detects both positive and negative effects, protecting you from implementing variants that accidentally hurt performance.
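
If you want to reproduce the math yourself, here is a minimal sketch of a pooled two-proportion z-test in plain Python. The function name and signature are illustrative, not the calculator's internal code:

```python
import math

def two_tailed_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_a = conv_a / n_a                        # control CVR
    p_b = conv_b / n_b                        # variant CVR
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled CVR under "no difference"
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-tailed p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```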

Understanding the Key Metrics

  • Conversion Rate (CVR): The percentage of visitors who completed the desired action. CVR = Conversions ÷ Visitors × 100.
  • Relative Lift: How much better (or worse) the variant performs compared to the control, expressed as a percentage. A +12% relative lift on a 2% baseline CVR means the variant CVR is 2.24%.
  • Z-Score: Measures how many standard deviations the observed difference is from zero (no difference). A z-score beyond the critical threshold indicates the difference is unlikely to be random chance.
  • P-Value: The probability of observing a difference at least this large if the control and variant truly performed the same. Smaller p-values mean stronger evidence of a real difference.
  • Statistical Significance: Reached when p-value ≤ (1 - confidence level). At 95% confidence you need p ≤ 0.05; at 99%, p ≤ 0.01. A worked example follows this list.
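
To make these definitions concrete, here is a worked example with made-up numbers, reusing the `two_tailed_z_test` sketch from above:

```python
# Illustrative numbers, not real test data
conv_a, n_a = 200, 10_000   # control: CVR = 2.00%
conv_b, n_b = 250, 10_000   # variant: CVR = 2.50%

cvr_a = conv_a / n_a
cvr_b = conv_b / n_b
relative_lift = (cvr_b - cvr_a) / cvr_a * 100   # +25.0% relative lift

z, p = two_tailed_z_test(conv_a, n_a, conv_b, n_b)
print(f"lift: {relative_lift:+.1f}%  z: {z:.2f}  p: {p:.4f}")
# z ≈ 2.38, p ≈ 0.017: significant at 95% confidence (p ≤ 0.05),
# but not at 99% (which requires p ≤ 0.01)
```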

Common A/B Testing Mistakes to Avoid

  • Peeking bias: Checking results daily and stopping the test as soon as significance is reached inflates the false positive rate. Decide your required sample size before starting and commit to it.
  • Multiple testing: Running many variants simultaneously (A vs B vs C vs D) without adjusting the confidence threshold inflates the probability of finding a false winner. Use a Bonferroni correction (sketched after this list) or limit yourself to 2 variants at a time.
  • Low traffic tests: Testing on very low traffic sites produces unreliable results. Detecting a 3% vs 4% CVR difference requires roughly 5,300 visitors per variant at 95% confidence and 80% power.
  • Ignoring practical significance: A variant that is statistically significant but only 0.1% better in absolute CVR may not be worth implementing if it requires months of development. Always consider the business impact alongside statistical significance.
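
As referenced in the multiple-testing item above, a minimal sketch of the Bonferroni correction, which simply divides the overall significance threshold by the number of comparisons:

```python
def bonferroni_alpha(alpha, num_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / num_comparisons

# Testing A vs B, A vs C, and A vs D is 3 comparisons. At an overall 95%
# confidence level, each comparison must reach p <= 0.05 / 3 ≈ 0.0167
# before you can declare a winner.
print(bonferroni_alpha(0.05, 3))  # 0.01666...
```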

Frequently Asked Questions

What does 'statistically significant' mean?
Statistical significance means the observed difference in conversion rates is unlikely to have occurred by random chance. At 95% confidence, a difference this large would appear at most 5% of the time if the variants truly performed the same. That threshold is the industry standard before declaring a variant the winner.
How many visitors do I need for a valid A/B test?
A minimum of 1,000 visitors per variant is a common rule of thumb, but the actual requirement depends on your baseline CVR and the minimum detectable effect (MDE) you care about: low baseline CVRs and small expected lifts require much larger samples. One standard approximation is sketched below.
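
A sketch of that approximation, assuming the standard two-proportion power calculation (the function and parameter names are illustrative):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_cvr, relative_mde,
                            confidence=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-tailed test."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_mde)   # CVR if the expected lift is real
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Baseline CVR of 2%, hoping to detect a +10% relative lift:
print(sample_size_per_variant(0.02, 0.10))  # ≈ 80,000 visitors per variant
```

Note how quickly the requirement grows when the baseline CVR is low and the expected lift is small; this is why the 1,000-visitor rule of thumb is only a floor.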
What's the difference between 90%, 95%, and 99% confidence?
90% = 10% false positive rate. 95% = 5% (industry standard). 99% = 1% (high-stakes changes). Higher confidence requires more traffic to reach the same conclusion.
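
You can verify the corresponding two-tailed critical z-scores with the Python standard library (a quick check, not part of the calculator):

```python
from statistics import NormalDist

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    print(f"{confidence:.0%}: |z| must exceed {z_crit:.3f}")
# 90%: 1.645   95%: 1.960   99%: 2.576
```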
What if my test is NOT statistically significant?
Do NOT declare a winner. Continue running until you reach your required confidence level or planned sample size. Stopping early based on raw CVR differences is "peeking bias" and inflates false positives.
Is this a one-tailed or two-tailed test?
Two-tailed — the safer default. It detects whether the variant is better OR worse, protecting you from implementing variants that accidentally hurt performance.