What to Watch for when Moving from R to SparkR

veekaybee.github.io

Data Science Roundup reader Vicki Boykis has a great new post on the idiosyncrasies of SparkR's DataFrame. For a data scientist using R, SparkR is a powerful way to extend your existing skillset into the world of parallelized computing, but it's important to understand what's going on under the hood. Vicki's article does a great job of showing exactly that.

Also, I'm embarrassed by just how much I enjoyed this joke from the article:

Some people, when confronted with a problem, think “I know, I’ll use multithreading”. Nothhw tpe yawrve o oblems.

It's a good point: use Spark only when you have to.
