The Datasaurus Dozen

blog.revolutionanalytics.com

There's a reason why data scientists spend so much time exploring data using graphics. Relying only on data summaries like means, variances, and correlations can be dangerous, because wildly different data sets can give similar results. This principle has been demonstrated in statistics classes for decades with Anscombe's Quartet: four scatterplots that, despite being qualitatively different, all have nearly the same means and variances and nearly the same correlation between x and y. (You can easily check this in R by loading the data with data(anscombe).) But what you might not realize is that it's possible to generate bivariate data with...
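
The Anscombe check mentioned above might look something like the following minimal sketch. It assumes only base R and the built-in anscombe data frame (columns x1 to x4 and y1 to y4); the names summaries and op are illustrative choices, not anything from the original post.

```r
# Load the built-in Anscombe data: columns x1..x4 and y1..y4
data(anscombe)

# Compute the usual summaries for each of the four (x, y) pairs.
# "summaries" is an illustrative name, not part of the dataset.
summaries <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x  = var(x),  var_y  = var(y),
    cor_xy = cor(x, y))
}))
rownames(summaries) <- paste("set", 1:4)
print(round(summaries, 2))

# Plot the four sets side by side to see how different they really are
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)
```

The rows of the summary table should come out nearly identical, while the four panels look nothing alike, which is the point of the quartet.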
