Two-Active-Treatment (Bivariate A/B) Controlled Experiment
A Two-Active-Treatment (Bivariate A/B) Controlled Experiment is a active treatment-controlled experiment with two active treatments.
- Context:
- It can (typically) be created by an A/B Test Creation Task.
- It can (typically) include a Bivariate Controlled Experiment Evaluation Task.
- It can be supported by a Two-Active-Treatment Controlled Experiment Testing System.
- It can range from being an A/B Test to being an A/A Test.
- It can range from being a Randomized Two Active-Treatment Controlled Experiment to being a Non-Randomized Two Active-Treatment Controlled Experiment.
- …
- Example(s):
- Counter-Example(s):
- See: Independent Two-Sample t-Test, Contingency Table, Randomized Controlled Experiment, Statistical Hypothesis Testing, Web Design Optimization, User Experience Design, Evidence Based Practice.
References
2020
- (Optimizely, 2020) ⇒ https://www.optimizely.com/optimization-glossary/multivariate-test-vs-ab-test/
- QUOTE: ... What is the difference between A/B testing and multivariate testing? Let's take a look at the methodology, common uses, advantages, and limitations of these testing methods.
- A/B testing, which you may also have heard referred to as split testing, is a method of website optimization in which the conversion rates of two versions of a page — version A and version B — are compared to one another using live traffic. Site visitors are bucketed into one version or the other. ...
...
- Multivariate testing uses the same core mechanism as A/B testing, but compares a higher number of variables, and reveals more information about how these variables interact with one another. As in an A/B test, traffic to a page is split between different versions of the design. The purpose of a multivariate test, then, is to measure the effectiveness each design combination has on the ultimate goal. ...
...
- A/B testing, which you may also have heard referred to as split testing, is a method of website optimization in which the conversion rates of two versions of a page — version A and version B — are compared to one another using live traffic. Site visitors are bucketed into one version or the other. ...
- QUOTE: ... What is the difference between A/B testing and multivariate testing? Let's take a look at the methodology, common uses, advantages, and limitations of these testing methods.
2018
- (MLG TensorFlow, 2018) ⇒ (2008). "A/B testing". In: Machine Learning Glossary (TensorFlow) Retrieved 2018-04-22.
- QUOTE: A statistical way of comparing two (or more) techniques, typically an incumbent against a new rival. A/B testing aims to determine not only which technique performs better but also to understand whether the difference is statistically significant. A/B testing usually considers only two techniques using one measurement, but it can be applied to any finite number of techniques and measures.
2017
- (Kohavi & Longbotham, 2017) ⇒ Kohavi R., Longbotham R. (2017) Online Controlled Experiments and A/B Testing. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA
- ABSTRACT: The Internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provide an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control treatment tests, and online field experiments. Unlike most data mining techniques for finding correlational patterns, controlled experiments allow establishing a causal relationship with high probability. Experimenters can utilize the scientific method to form a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users.
The theory of a controlled experiment dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, and the topic of offline experiments is well developed in Statistics (Box et al., Statistics for experimenters: design, innovation, and discovery. Wiley, Hoboken, 2005). Online-controlled experiments started to be used in the late 1990s with the growth of the Internet. Today, many large sites, including Amazon, Bing, Facebook, Google, LinkedIn, and Yahoo!, run thousands to tens of thousands of experiments each year testing user interface (UI) changes, enhancements to algorithms (search, ads, personalization, recommendation, etc.), changes to apps, content management system, etc. Online-controlled experiments are now considered an indispensable tool, and their use is growing for startups and smaller websites. Controlled experiments are especially useful in combination with Agile software development (Martin, Clean code: a handbook of Agile software craftsmanship. Prentice Hall, Upper Saddle River, 2008; Rubin, Essential scrum: a practical guide to the most popular Agile process. Addison-Wesley Professional, Upper Saddle River, 2012), Steve Blank’s Customer Development process (Blank, The four steps to the epiphany: successful strategies for products that win. Cafepress.com., 2005), and MVPs (minimum viable products) popularized by Eric Ries’s Lean Startup (Ries, The lean startup: how today’s entrepreneurs use continuous innovation to create radically successful businesses. Crown Business, New York, 2011).
- ABSTRACT: The Internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provide an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control treatment tests, and online field experiments. Unlike most data mining techniques for finding correlational patterns, controlled experiments allow establishing a causal relationship with high probability. Experimenters can utilize the scientific method to form a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users.
2016a
- (Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/A/B_testing Retrieved:2016-9-14.
- In marketing and business intelligence, A/B testing is a term for a randomized experiment with two variants, A and B, which are the control and variation in the controlled experiment. A/B testing is a form of statistical hypothesis testing with two variants leading to the technical term, two-sample hypothesis testing, used in the field of statistics. Other terms used for this method include bucket tests and split-run testing. These terms can have a wider applicability to more than two variants, but the term A/B testing is also frequently used in the context of testing more than two variants. In online settings, such as web design (especially user experience design), the goal of A/B testing is to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement). Formally the current web page is associated with the null hypothesis. A/B testing is a way to compare two versions of a single variable typically by testing a subject's response to variable A against variable B, and determining which of the two variables is more effective. As the name implies, two versions (A and B) are compared, which are identical except for one variation that might affect a user's behavior. Version A might be the currently used version (control), while version B is modified in some respect (treatment). For instance, on an e-commerce website the purchase funnel is typically a good candidate for A/B testing, as even marginal improvements in drop-off rates can represent a significant gain in sales. Significant improvements can sometimes be seen through testing elements like copy text, layouts, images and colors, but not always. The vastly larger group of statistics broadly referred to as multivariate testing or multinomial testing is similar to A/B testing, but may test more than two different versions at the same time and/or has more controls, etc. Simple A/B tests are not valid for observational, quasi-experimental or other non-experimental situations, as is common with survey data, offline data, and other, more complex phenomena. A/B testing has been marketed by some as a change in philosophy and business strategy in certain niches, though the approach is identical to a between-subjects design, which is commonly used in a variety of research traditions. A/B testing as a philosophy of web development brings the field into line with a broader movement toward evidence-based practice. The benefits of A/B testing are considered to be that it can be performed continuously on almost anything, especially since most marketing automation software now, typically, comes with the ability to run A/B tests on an on-going basis. This allows for updating websites and other tools, using current resources, to keep up with changing trends.
2016b
- (Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/A/B_testing#Common_test_statistics Retrieved:2016-9-14.
- "Two-sample hypothesis tests" are appropriate for comparing the two samples where the samples are divided by the two control cases in the experiment. Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation. Student's t-tests are appropriate for comparing means under relaxed conditions when less is assumed. Welch's t test assumes the least and is therefore the most commonly used test in a two-sample hypothesis test where the mean of a metric is to be optimized. While the mean of the variable to be optimized is the most common choice of estimator, others are regularly used.
For a comparison of two binomial distributions such as a click-through rate one would use Fisher's exact test.
- "Two-sample hypothesis tests" are appropriate for comparing the two samples where the samples are divided by the two control cases in the experiment. Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation. Student's t-tests are appropriate for comparing means under relaxed conditions when less is assumed. Welch's t test assumes the least and is therefore the most commonly used test in a two-sample hypothesis test where the mean of a metric is to be optimized. While the mean of the variable to be optimized is the most common choice of estimator, others are regularly used.
Assumed Distribution | Example Case | Standard Test | Python Implementation |
---|---|---|---|
Gaussian | Average Revenue Per Paying User | Welch's t test | scipy.stats.ttest_ind |
Binomial | Click Through Rate | Fisher's exact test | scipy.stats.fisher_exact |
Poisson | Average Transactions Per Paying User | E-test | None |
Multinomial | Number of each product Purchased | Chi-squared test | scipy.stats.chisquare |
Unknown | -- | Mann–Whitney U test | scipy.stats.mannwhitneyu |
2006
- (Maurin, 2006) ⇒ Michel Maurin. (2006). “An Original Comfort/Discomfort Quantification in a Bivariate Controlled Experiment: Application to the Discomfort Evaluation of Seated Arm Reach.” In: Proceedings of the SA-DHM Congress.