Optimizing sample sizes in A/B testing


If you’re a data scientist, you’ve surely encountered the question, “How big should this A/B test be?” The standard answer is to do a power analysis, typically aiming for 80% power at α=5%. But if you think about it, this advice is pretty weird. Why is 80% power the best choice for your business? And doesn’t a 5% significance cutoff seem pretty arbitrary?
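For readers who haven't run one recently, the "standard answer" above can be sketched with the usual normal-approximation sample-size formula for a two-proportion test. This is an illustrative sketch, not code from the linked post; the function name and the 10% → 11% conversion-rate example are made up for demonstration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_variant, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-proportion z-test,
    using the standard normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = abs(p_variant - p_baseline)
    n = (z_alpha + z_power) ** 2 * variance / effect ** 2
    return math.ceil(n)


# Hypothetical example: detecting a lift from a 10% to an 11% conversion rate
# at the conventional 80% power and 5% significance requires on the order of
# 15,000 users per arm.
n = sample_size_per_group(0.10, 0.11)
```

Note how sensitive the answer is to the two conventional inputs: raising power from 80% to 90% inflates n by roughly a third, which is exactly why treating these defaults as fixed can be costly.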

I like this post a lot—80% power and 5% significance were originally selected primarily for academic work, and as statisticians move into business domains, they generally don't think much about these defaults. But power and significance are statements about priorities. How much does time matter? How much does certainty matter? It turns out that the answers to these questions are quite different for organizations operating in different contexts.

Really fantastic, thoughtful post.
