What does 'statistically significant' mean in an A/B test?

Statistical significance means the observed difference in conversion rates between your control and variant would be unlikely if there were truly no difference between them. At 95% confidence (the industry standard), a result counts as significant when a difference at least as large as the one observed would show up less than 5% of the time through random variation alone. Most marketing teams require 95% confidence before implementing a variant as the winner.

How many visitors do I need before my A/B test is valid?

As a rule of thumb, plan for at least 1,000 visitors per variant (2,000 total), but treat that as a floor rather than a target. The actual sample size depends on your baseline conversion rate and the minimum detectable effect (the smallest improvement you want to be able to detect). Detecting a 5% relative lift on a 2% baseline CVR requires far more traffic than detecting a 20% relative lift on a 10% baseline CVR.
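To make the dependence on baseline and effect size concrete, here is a minimal Python sketch of a standard sample size approximation for comparing two proportions. The function name `sample_size_per_variant`, the 80% power default, and the two-tailed setup are assumptions for illustration, not something this calculator is confirmed to use.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_cvr, relative_lift, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant (two-tailed test, two proportions).

    baseline_cvr and relative_lift are fractions, e.g. 0.02 and 0.05.
    The 80% power default is an assumption, not taken from the calculator.
    """
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# The two scenarios mentioned above.
small_effect = sample_size_per_variant(0.02, 0.05)  # 5% lift on a 2% baseline
large_effect = sample_size_per_variant(0.10, 0.20)  # 20% lift on a 10% baseline
print(small_effect, large_effect)
```

Under these assumptions, the 2% baseline case needs on the order of hundreds of thousands of visitors per variant, while the 10% baseline case needs only a few thousand.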

What is the difference between 90%, 95%, and 99% confidence?

90% confidence accepts a 10% false positive rate: if there is truly no difference, 1 test in 10 will still look significant. That is acceptable for low-stakes tests. 95% confidence (5% false positive rate) is the industry standard for most marketing optimizations. 99% confidence (1% false positive rate) is used when the stakes are very high, such as changes to a payment flow or a major homepage redesign. Higher confidence requires more traffic to detect the same effect.
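Each confidence level corresponds to a critical z-value that the test statistic must exceed in a two-tailed test. The short Python sketch below derives those thresholds from the standard normal distribution; the helper name `critical_z` is illustrative only.

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-tailed critical z-value for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: |z| must exceed {critical_z(level):.3f}")
```

The thresholds come out at roughly 1.645, 1.960, and 2.576. The higher the bar, the larger the observed difference (or the sample) has to be before a result clears it.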

What should I do if my test is NOT statistically significant?

Do NOT declare a winner based on raw conversion rate differences alone. Continue running the test until you reach your planned sample size. Stopping early because the variant 'looks better', or the moment the result first crosses the significance threshold, is a common mistake called peeking bias. If your test runs to the planned sample size and still isn't significant, declare it inconclusive and test a more radical variant.
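The cost of peeking can be seen in a small simulation. The Python sketch below runs A/A tests (both arms share the same true conversion rate, so every "winner" is a false positive) and compares two stopping rules: declaring a winner at the first significant interim look versus checking only once at the planned sample size. All parameters (300 simulated tests, 2,000 visitors per arm, 10 looks, a 5% true CVR) and the function names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_tailed_p(conv_a, n_a, conv_b, n_b):
    """Two-tailed p-value from a pooled z-test for two proportions."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no variation observed yet, so nothing to test
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(n_tests=300, visitors_per_arm=2000, looks=10,
                     true_cvr=0.05, alpha=0.05, seed=1):
    """Return (false positive rate with peeking, false positive rate without)."""
    rng = random.Random(seed)
    batch = visitors_per_arm // looks
    peeking_wins = 0
    fixed_wins = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        ever_significant = False
        significant = False
        for look in range(1, looks + 1):
            # Both arms draw from the same true conversion rate (an A/A test).
            conv_a += sum(rng.random() < true_cvr for _ in range(batch))
            conv_b += sum(rng.random() < true_cvr for _ in range(batch))
            n = look * batch
            significant = two_tailed_p(conv_a, n, conv_b, n) <= alpha
            ever_significant = ever_significant or significant
        peeking_wins += ever_significant  # stopped at any significant look
        fixed_wins += significant         # judged only at the final look
    return peeking_wins / n_tests, fixed_wins / n_tests


peeking_rate, fixed_rate = simulate_peeking()
print(f"peeking: {peeking_rate:.1%}  fixed sample size: {fixed_rate:.1%}")
```

The fixed-sample rule should land near the nominal 5%, while the peeking rule typically comes out well above it, because every extra look is another chance for noise to cross the threshold.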

What is a one-tailed vs two-tailed test?

A two-tailed test (used here) checks whether the variant is EITHER better OR worse than the control. This is the safer default for marketing A/B tests. A one-tailed test only checks one direction (variant is better). Using a one-tailed test can make results appear significant faster, but it misses cases where your variant is actually hurting performance — a costly mistake in conversion optimization.
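The difference between the two approaches is easiest to see by converting the same z-score into both p-values. This is a minimal Python sketch; the function name `p_values_from_z` and the example z-scores are made up for illustration.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (one_tailed, two_tailed) p-values for a z-score.

    The one-tailed value tests only the direction "variant is better".
    """
    one_tailed = 1 - NormalDist().cdf(z)
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return one_tailed, two_tailed


# A modest positive result: significant one-tailed, not two-tailed, at 0.05.
print(p_values_from_z(1.80))

# A variant that is hurting conversions: only the two-tailed test flags it.
print(p_values_from_z(-2.50))
```

For z = 1.80 the one-tailed p-value is about 0.036 and the two-tailed p-value about 0.072, which is why one-tailed tests seem to reach significance sooner. For z = -2.50 the one-tailed p-value is close to 1, so the harm goes unreported, while the two-tailed p-value is about 0.012.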

A/B Test Calculator – Statistical Significance

How to Read Your A/B Test Results

This calculator uses a two-tailed Z-test for proportions — the industry standard for comparing conversion rates in marketing split tests. Unlike a one-tailed test, the two-tailed approach detects both positive and negative effects, protecting you from implementing variants that accidentally hurt performance.

Understanding the Key Metrics

Conversion Rate (CVR): The percentage of visitors who completed the desired action. CVR = Conversions ÷ Visitors × 100.
Relative Lift: How much better (or worse) the variant performs compared to the control, expressed as a percentage. A +12% relative lift on a 2% baseline CVR means the variant CVR is 2.24%.
Z-Score: Measures how many standard errors the observed difference is from zero (no difference). A z-score whose absolute value exceeds the critical threshold (1.96 at 95% confidence) indicates that the difference is unlikely to be random noise.
P-Value: The probability of seeing a difference at least as large as the one observed if the control and variant truly converted at the same rate. At 95% confidence, you need p-value ≤ 0.05. At 99%, you need p-value ≤ 0.01.
Statistical Significance: Reached when p-value ≤ (1 - confidence level). At 95% confidence: p-value must be ≤ 0.05.
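The metrics above can be computed together in a few lines. The following Python sketch implements a two-tailed z-test for two proportions using a pooled standard error; the pooled form, the function name `ab_test`, and the sample numbers are assumptions for illustration, since the calculator's exact internals are not documented here.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(visitors_a, conversions_a, visitors_b, conversions_b, confidence=0.95):
    """Two-tailed z-test for proportions (pooled standard error, assumed)."""
    cvr_a = conversions_a / visitors_a
    cvr_b = conversions_b / visitors_b
    relative_lift = (cvr_b - cvr_a) / cvr_a * 100  # percent, relative to control

    # Pooled conversion rate under the hypothesis of no difference.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

    z = (cvr_b - cvr_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "cvr_a": cvr_a * 100,
        "cvr_b": cvr_b * 100,
        "relative_lift": relative_lift,
        "z": z,
        "p_value": p_value,
        "significant": p_value <= 1 - confidence,
    }


# Control: 200 conversions from 10,000 visitors (2.00% CVR).
# Variant: 224 conversions from 10,000 visitors (2.24% CVR, a +12% relative lift).
result = ab_test(10_000, 200, 10_000, 224)
for name, value in result.items():
    print(name, value)
```

With these inputs the +12% lift is not significant at 95% confidence. The z-score stays well below 1.96, which shows that a lift that looks healthy can still be within the range of random variation at this sample size.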

Common A/B Testing Mistakes to Avoid

Peeking bias: Checking results daily and stopping the test as soon as significance is reached inflates the false positive rate. Decide your required sample size before starting and commit to it.
Multiple testing: Running many variants simultaneously (A vs B vs C vs D) without adjusting the confidence threshold inflates the probability of finding a false winner. Use a Bonferroni correction (divide the significance threshold by the number of comparisons) or limit each test to 2 variants at a time.
Low traffic tests: Testing on very low traffic sites produces unreliable results. Detecting a 3% vs 4% CVR difference at 95% confidence with 80% power requires roughly 5,300 visitors per variant.
Ignoring practical significance: A variant that is statistically significant but only 0.1% better in absolute CVR may not be worth implementing if it requires months of development. Always consider the business impact alongside statistical significance.
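The multiple testing problem and the Bonferroni fix can be illustrated with a short calculation. In this Python sketch the helper names are hypothetical, and the family-wise error formula assumes the comparisons are independent, which is a simplification when every variant is compared against the same control.

```python
def family_wise_error(alpha_per_test, n_comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha_per_test) ** n_comparisons


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the overall rate at or below alpha."""
    return alpha / n_comparisons


# A vs B vs C vs D means three comparisons against the control.
comparisons = 3
uncorrected = family_wise_error(0.05, comparisons)
corrected_alpha = bonferroni_alpha(0.05, comparisons)
corrected = family_wise_error(corrected_alpha, comparisons)

print(f"uncorrected false winner risk: {uncorrected:.1%}")
print(f"per-comparison threshold after Bonferroni: {corrected_alpha:.4f}")
print(f"corrected false winner risk: {corrected:.1%}")
```

With three comparisons at p ≤ 0.05 each, the chance of at least one false winner rises to about 14%. Requiring p ≤ 0.0167 for each comparison brings the overall risk back under 5%.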

