The Subtle Sources of Sampling Bias Hiding in Your Data

Plummeting data acquisition costs have been a big part of the surge in business analytics. We have much richer samples of data to use for insight. But more data doesn’t inherently remove sampling bias; in fact, it may make it worse.

This is an excellent article: very readable, and it makes several points that any modern data practitioner should keep in mind. Here's another great quote:

Historically, variation within a sample has been used to infer sampling error. With increased data volumes, statistical significance is trivial to find but distracts from the larger point of sampling error; the sample may be internally consistent but not reflect the desired population. Data volume then gives false comfort.

Sound familiar?
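
If it doesn't, a quick simulation makes the point concrete. This is only a sketch under made-up assumptions: a population split 70/30 between two groups with different averages, and a cheap "big data" channel in which the second group is three times as likely to show up. The group shares, means, sample sizes, and the helper names `draw_sample` and `mean_with_ci` are all invented for illustration; none of them come from the article.

```python
import math
import random
import statistics

# Hypothetical population: 70% group A (mean 50), 30% group B (mean 70).
# Every number here is an assumption chosen for illustration.
SHARE_B = 0.30
MEAN_A, MEAN_B, SD = 50.0, 70.0, 10.0
TRUE_MEAN = (1 - SHARE_B) * MEAN_A + SHARE_B * MEAN_B  # 56 for these numbers


def draw_sample(n, share_b, rng):
    """Draw n values; each comes from group B with probability share_b."""
    return [
        rng.gauss(MEAN_B if rng.random() < share_b else MEAN_A, SD)
        for _ in range(n)
    ]


def mean_with_ci(values):
    """Return the sample mean and the half-width of a normal-approximation 95% CI."""
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * std_err


rng = random.Random(0)  # fixed seed so the run is repeatable

# Small sample drawn in the population's true proportions.
small_mean, small_hw = mean_with_ci(draw_sample(500, SHARE_B, rng))

# Large sample from a channel where group B is three times as likely to appear.
biased_share_b = 3 * SHARE_B / (3 * SHARE_B + (1 - SHARE_B))
big_mean, big_hw = mean_with_ci(draw_sample(200_000, biased_share_b, rng))

print(f"true mean:            {TRUE_MEAN:.2f}")
print(f"small random sample:  {small_mean:.2f} +/- {small_hw:.2f}")
print(f"large biased sample:  {big_mean:.2f} +/- {big_hw:.2f}")
```

The large sample produces a far tighter interval than the small one, yet that interval sits well away from the true mean. The within-sample variation only measures how consistent the sample is with itself, which is exactly the false comfort the quote describes.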

