Plummeting data acquisition costs have been a big part of the surge in business analytics. We have much richer samples of data to use for insight. But more data doesn’t inherently remove sampling bias; in fact, it may make it worse.
This is an excellent article: very readable, and it makes several points that any modern data practitioner should keep in mind. Here's another great quote:
Historically, variation within a sample has been used to infer sampling error. With increased data volumes, statistical significance is trivial to find but distracts from the larger point of sampling error; the sample may be internally consistent but not reflect the desired population. Data volume then gives false comfort.
Sound familiar?
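To make the quoted point concrete, here is a minimal simulation sketch. It is not from the article: the population, the biased collection rule, and the names `run_demo` and `recorded` are all assumptions chosen for illustration. It draws a synthetic population, records units with a probability that rises with their value, and then computes a mean and a 95% confidence interval from within-sample variation at several sample sizes.

```python
import math
import random
import statistics


def run_demo(population_size=200_000, sample_sizes=(100, 10_000, 50_000), seed=0):
    """Compare biased-sample estimates against a known population mean."""
    rng = random.Random(seed)

    # Hypothetical population: normally distributed values, mean 50, sd 10.
    population = [rng.gauss(50, 10) for _ in range(population_size)]
    true_mean = statistics.fmean(population)

    # Assumed bias mechanism: the chance a unit gets recorded rises
    # linearly with its value, so high values are over-represented.
    def recorded(x):
        p = min(1.0, max(0.0, (x - 20) / 60))
        return rng.random() < p

    collected = [x for x in population if recorded(x)]

    rows = []
    for n in sample_sizes:
        sample = collected[:n]
        mean = statistics.fmean(sample)
        # Standard error estimated from within-sample variation only.
        se = statistics.stdev(sample) / math.sqrt(len(sample))
        rows.append({
            "n": len(sample),
            "true_mean": true_mean,
            "sample_mean": mean,
            "half_width": 1.96 * se,  # half-width of a 95% interval
        })
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(
            f"n={row['n']:>6}  sample mean={row['sample_mean']:.2f} "
            f"+/- {row['half_width']:.2f}  (population mean={row['true_mean']:.2f})"
        )
```

Under these assumptions the interval keeps shrinking as the sample grows, while the gap between the sample mean and the population mean stays about the same. The larger sample looks more certain without being any more representative, which is the false comfort the quote describes.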