The Subtle Sources of Sampling Bias Hiding in Your Data

sloanreview.mit.edu

Plummeting data acquisition costs have been a big part of the surge in business analytics. We have much richer samples of data to use for insight. But more data doesn’t inherently remove sampling bias; in fact, it may make it worse.

This is an excellent article: very readable, and it makes several points that any modern data practitioner should keep in mind. Here's another great quote:

Historically, variation within a sample has been used to infer sampling error. With increased data volumes, statistical significance is trivial to find but distracts from the larger point of sampling error; the sample may be internally consistent but not reflect the desired population. Data volume then gives false comfort.

Sound familiar?
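To make the quoted point concrete, here is a minimal simulation sketch. It is not from the article: the population, the cutoff of 45, and the function name `simulate` are all hypothetical choices made for illustration. It draws samples of different sizes from only the "reachable" part of a simulated population and compares the sample mean and its standard error with the true population mean.

```python
import random
import statistics


def simulate(sample_size, seed=0):
    """Return (true_mean, sample_mean, std_err) for a biased sample.

    Everything here is a hypothetical setup for illustration only.
    """
    rng = random.Random(seed)

    # Assumed population: 200,000 values drawn from Normal(50, 10).
    population = [rng.gauss(50, 10) for _ in range(200_000)]

    # Assumed collection bias: only values above 45 can ever be observed
    # (for example, only a certain kind of customer shows up in the data).
    reachable = [x for x in population if x > 45]

    # Sample without replacement from the reachable subset only.
    sample = rng.sample(reachable, sample_size)

    sample_mean = statistics.fmean(sample)
    # Standard error of the mean, computed from within-sample variation.
    std_err = statistics.stdev(sample) / sample_size ** 0.5
    true_mean = statistics.fmean(population)
    return true_mean, sample_mean, std_err


if __name__ == "__main__":
    for n in (100, 5_000, 50_000):
        true_mean, sample_mean, std_err = simulate(n)
        print(
            f"n={n:>6}  sample mean={sample_mean:.2f}  "
            f"std err={std_err:.3f}  true mean={true_mean:.2f}"
        )
```

Under these assumptions, the standard error shrinks as the sample grows, while the gap between the sample mean and the population mean does not. The within-sample variation says nothing about the values that could never be collected, which is the "false comfort" the quote describes.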
