What is A/B Testing and What Statistical Considerations Should Be Top of Mind?

Mar 14, 2021

Ever wondered whether changing the text, color, or orientation of a CTA button on a particular page of your website would improve the experience of your users? Did it ever occur to you that such a small change could also increase your conversions for good?

It must have crossed your mind.

It’s one thing to think it. It’s another to actually put that thought through a scientific process to determine if it is worth implementing.

This is what A/B testing is about.

Simply put, you’re showing two variants (backed by a sound hypothesis) of the same landing page to different segments of your audience to see which one drives them to take the desired action better and faster.
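To make the "different segments of your audience" part concrete, here is a minimal sketch of one common way to split visitors between two variants. It is an illustration, not something taken from the courses: the function name assign_variant, the experiment label, and the user IDs are all hypothetical, and a real testing tool handles this for you.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "cta-button-test") -> str:
    """Deterministically assign a user to variant 'A' or 'B' (hypothetical helper)."""
    # Hash the experiment name together with the user ID so that the same
    # user always lands in the same variant for this experiment.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the parity of the hash value for a roughly 50/50 split.
    return "A" if int(digest, 16) % 2 == 0 else "B"


if __name__ == "__main__":
    for uid in ["user-1", "user-2", "user-3"]:
        print(uid, assign_variant(uid))
```

Hashing instead of picking at random on every visit keeps the experience consistent: a returning visitor sees the same variant each time, which keeps the two groups cleanly separated.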

A/B tests, when done right, help you make better and more trustworthy decisions about your website that will improve conversions. Conversions can be clicks, leads, transactions, downloads, etc.

This week, I spent some time completing Ton Wesseling’s A/B testing mastery course and Ben Labay’s Statistics fundamentals course as part of my Growth marketing Minidegree with CXL.

Both courses went into so much detail. I’ve summarized some below.

A little background

Ton mentioned that before 2016, the tools used in A/B testing were largely enterprise software applications that were pretty expensive and used cookies. Marketers made a lot of mistakes with trial and error.

2016 heralded the maturity of A/B testing in the industry. The quality of A/B testing tools improved considerably, and personalization, segmentation, and AI came to the marketplace to supplement them.

More marketers adopted an agile approach to A/B testing and dumped the one-big-launch approach, which was failure-prone.

What is an A/B test used for?

Summarily, A/B tests are used in 3 buckets:

(Re)Deployments: As a marketer, if you’re deploying something on your website (a feature, an update, etc.), you want to know that what you’re deploying has no negative impact on your KPIs.
Research: Here, you’re simply doing conversion signal mapping, i.e., you are running experiments by leaving out specific elements on a page, e.g., removing a block of text. The purpose isn’t to look for a winner but rather to find out if leaving out an element has any impact at all.
Optimization: This is also called lean deployment. Here, you’re comparing two “challengers” and looking for wins.

Do you have enough data to conduct A/B tests?

I used to think that I could run A/B tests with any sample size. I was wrong. Ton explained this concept using the ROAR model.

The model comprises 4 optimization phases:

Risk: fewer than 1,000 conversions per month
Optimization: around 10,000 conversions per month
Automation
Rethink

A/B Test isn’t for everyone

If you’re getting below 1,000 conversions per month, you’re in the risk phase and may want to de-emphasize A/B tests. It’s difficult to find a winner because you’ll be working with a small amount of data, which makes it hard to reach statistical significance.

Essentially, the challenger needs to beat the control by 15%, which is quite difficult.

Even if you end up running A/B tests and see a winner, chances are that it isn’t a real winner. With too little data, an apparent win can shrink or reverse once you have collected enough data for a significant result. Instead, you can do research, come up with hypotheses and experiments, implement them, and tweak as required.

At 10,000 conversions per month, you’re at the optimization stage and can start 4 A/B tests per week and run over 200 tests in a year. At 10,000, the challenger has to beat the control by at least 5%.
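To see why a smaller uplift demands more data, here is a rough sketch of a standard sample-size calculation for comparing two conversion rates. It is my own illustration rather than the calculator used in the course: the function name, the 3% baseline conversion rate, and the choice of a two-sided test are assumptions. Note that it counts visitors per variant, not conversions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_uplift,
                            confidence=0.90, power=0.80):
    """Approximate visitors needed per variant to detect a relative uplift.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    p1 = baseline_rate                          # control conversion rate
    p2 = baseline_rate * (1 + relative_uplift)  # rate we hope the challenger reaches
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against false positives
    z_beta = NormalDist().inv_cdf(power)           # guards against false negatives
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical 3% baseline conversion rate.
    for uplift in (0.15, 0.05):
        n = sample_size_per_variant(0.03, uplift)
        print(f"{uplift:.0%} uplift: about {n} visitors per variant")
```

Under these assumptions, detecting a 5% uplift takes several times more visitors than detecting a 15% uplift, which is the intuition behind only chasing small wins once you have a lot of traffic.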

Key

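Statistical Power: Test with high power (>80%) so that you limit false negatives, i.e., missing an effect that is really there.

Statistical Significance: Test at a high confidence level (90%, i.e., a significance threshold of 0.10) so that you limit false positives, i.e., declaring a winner that isn’t real.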
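As an illustration of what checking significance looks like in practice, below is a minimal sketch of a two-proportion z-test, one common way to get a p-value for an A/B test. The function name and the visitor and conversion counts are made up for the example, and your testing tool may use a different method.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the conversion rate if both variants really were the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a difference at least this large if there were no real effect.
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical results: control (A) versus challenger (B).
    p = two_proportion_p_value(conv_a=300, n_a=10000, conv_b=400, n_b=10000)
    print(f"p-value: {p:.5f}")
    print("significant at the 90% level" if p < 0.10 else "not significant")
```

A p-value below 0.10 corresponds to the 90% level mentioned above. The smaller the p-value, the less likely it is that the difference is just noise.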

What KPIs should you be on top of?

Ton lists 5 relevant KPIs:

Clicks: This is the least important KPI.
Behavior: This is an important KPI, particularly for brands at the risk stage of the optimization model.
Transactions for B2C | Leads for B2B: Marketers should be optimizing largely for this KPI.
Revenue per user
Potential lifetime value

The research methodology for A/B tests

Before doing any A/B tests, you need to do some soul searching. You need to understand your basic user behavior and customer journey and extract inputs for establishing your hypothesis.

Ton introduces us to the 6V conversion canvas for conducting research:

Value: What company values are important and relevant? What are the short and long-term goals? What focus delivers the most business impact?
Versus: Who are the competitors? Which competitors share an audience with you? Experience your competitors' products and listen in to their activity online.
View: What insights can be found from web analytics and web behavior data?
Voice: What insights can be taken from voice-of-customer data such as surveys, feedback, and service contact?
Verified: What scientific research, insights, and models are available?
Validated: What insights have been validated in previous experiments or analyses?

After research, what next?

It’s time to get into the meat. You’ll get into designing and developing your test, configuring tools required, determining the duration, and monitoring. Subsequently, you’ll need to compile the outcome into a digestible format and present it to management. The insights will help form the basis of decision-making.

Statistics fundamentals for testing

If you don’t know basic statistics, you can’t tell if your test results are statistically significant or have the right power. Hence, you can’t evaluate your A/B tests properly.

Ben Labay’s statistics fundamentals for testing course is quite comprehensive and examines concepts digital marketers should be familiar with.

As in economics, concepts such as sampling (population, parameters, and statistics), mean, variance, and confidence intervals are defined in much the same way. For A/B tests, however, I believe the most critical part is understanding statistical significance (p-value), statistical power, and sample size.
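To make confidence intervals a little less abstract, here is a small sketch that computes one for a conversion rate using the normal approximation. The function name and the numbers are hypothetical, and this is only one of several ways to build such an interval.

```python
from math import sqrt
from statistics import NormalDist


def conversion_rate_interval(conversions, visitors, confidence=0.95):
    """Normal-approximation confidence interval for a conversion rate."""
    rate = conversions / visitors  # the sample statistic
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The standard error shrinks as the sample grows.
    margin = z * sqrt(rate * (1 - rate) / visitors)
    return max(0.0, rate - margin), min(1.0, rate + margin)


if __name__ == "__main__":
    # Same 5% observed rate, two hypothetical sample sizes.
    for conversions, visitors in ((5, 100), (500, 10000)):
        low, high = conversion_rate_interval(conversions, visitors)
        print(f"{visitors} visitors: {low:.2%} to {high:.2%}")
```

Both samples show the same 5% conversion rate, but the interval around the larger sample is much narrower. That is the sampling error at work: the more data you have, the closer your sample statistic is likely to be to the true population value.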

How do you calculate the minimum sample size you’ll require for a test, the sampling error, and the regression? This course answers all of these and even highlights 4 traps to avoid. It was a hell of a ride.

You may need to see the videos multiple times to internalize the concepts though.

Key take-aways

Two things. First, A/B tests aren’t for everybody. Second, A/B testing isn’t child’s play: there’s a process. If you’re a small business, you shouldn’t be thinking about A/B tests yet, as you may not have enough data to get the maximum value from the process. Another reason is cost. You’ll be spending a huge sum to access third-party tools that can improve the quality of your final result.

What you can do, however, is come up with your hypotheses, create experiments that can help prove or disprove them, then implement them one at a time over a specified period and measure the results.

For businesses that possess significant data, the focus should be on ensuring that they actually incorporate the insights from the outcome so the whole process isn’t a total waste of time, effort, and resources.