😎 Summer of Data Science (#SoDS18) is upon us, and (thanks to Renée, aka @BecomingDataSci) there is already some great guidance out there. I’m a big fan of “mini” projects and, unlike Highlander, I’m pretty sure “there can only be one” isn’t the name of the SoDS game.
So, I wanted to share some ideas I have for potential micro-projects…
🙌 do something by hand
I know, I know — this borders on heresy. As my Archer visualization partner-in-crime, Elijah Meeks, put it:1
Whenever you tell someone you painstakingly annotated something by hand they grimace and get uncomfortable like you told them you enjoy thrash metal.
But, according to expert educators (Albert Y. Kim and Chester Ismay, to be specific), there’s still a lot of value in taking ye olde approach to things.2
Intro stats & data science #chalktalk of grammar of graphics + homage to @katyperry today, #ggplot2 tomorrow #rstats pic.twitter.com/1CQksTGqeM
— Albert Y. Kim (@rudeboybert) September 11, 2017
📦 Compare/recommend packages
First off, I want to acknowledge that there have been some great algorithmic/technical approaches to this, and there are projects under way (🐬 give flipper a look when you have a chance).3 In fact, I recommend you make use of these approaches should you give it a go (detailed nicely in “packagemetrics - Helping you choose a package since runconf17”), but give it that certain human je ne sais quoi. 💅
This doesn’t have to be exhaustive! I really enjoyed two posts by Adam Medcalf, “My favourite R package for: summarising data” and “R packages for summarising data – part 2”.
⚡ tour of options. "My favourite R 📦 for: summarising data" by @adam_medcalf https://t.co/pQhKWCuAFx ht @rweekly_org #rstats pic.twitter.com/E0IFBNGqnp
— Mara Averick (@dataandme) January 8, 2018
👍 series on useful 📦s!
"R packages for summarising data – pt 2" by @adam_medcalf https://t.co/cv2N0UuMmt #rstats #ggalley pic.twitter.com/nWg7dvp1fQ
— Mara Averick (@dataandme) March 7, 2018
This can also be great info to add to a package README or vignette. For example, Jenny Bryan (readxl’s maintainer, and all-around awesome human) discusses similar packages in the readxl README. This is a win-win, since she’ll point users who file issues to a different Excel-related package when it’s appropriate, as is often the case when it comes to tidyxl’s specialty of handling awkward, non-tabular Excel files.4
There’s no need to leave this up to the maintainers, though. If you go through a few packages while trying to accomplish a task, you are in a great position to describe what it was about them that led to your choice!
Recommending packages can also be of great help to others. Check out Sharon Machlis’ posts for some inspiration in that department.
🎊 Just a friendly reminder that @sharon000 has *super* helpful #rstats guides… https://t.co/NffvvKV0Pm #r4ds pic.twitter.com/vGWy2AMLHI
— Mara Averick (@dataandme) June 1, 2018
👼 Bring a dataset to life
🎴 How many times have you used the iris dataset?
It is always good to know what we are actually talking about.#Statistics #RStats #iris pic.twitter.com/OKLa5hZkQC
— Antoine Bichat (@_abichat) May 24, 2018
🚗 What about mtcars?
🏎 ⚡️ “mtcars” (for curious #rstats nerds)https://t.co/euDY7XjzK0
— Mara Averick (@dataandme) May 30, 2018
I can’t speak for Antoine Bichat’s experience with iris, but hunting down and sharing pics of the frequently-plotted ’74 vehicles was a pretty eye-opening experience. Among other things, thanks to the keen eye of Nathanael Aff, we found out that the Mazda RX4 and RX4 Wagons have rotary engines. Even if you allow for the cylinder-to-rotor conversion (which is a bit of a stretch), it’s like comparing apples to oranges (or Doritos to a water pump).
Update: Thanks to Ben Bolker, I can rest knowing that the source paper from which mtcars is taken acknowledges this unsettling error.
from Henderson and Velleman 1981, Table 1, footnote: pic.twitter.com/WNkPCPDlGW
— Ben Bolker (@bolkerb) August 25, 2018
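If you want to see how those two Mazdas are actually recorded, here is a minimal sketch in R. It only uses the mtcars data frame that ships with base R; the choice of columns to display is mine, not something from the source paper.

```r
# mtcars lives in the datasets package, which is attached by default in R.
data(mtcars)

# Car models are stored as row names; pull out the two Mazda entries.
# The columns shown here are an arbitrary selection for illustration.
mazdas <- mtcars[c("Mazda RX4", "Mazda RX4 Wag"), c("mpg", "cyl", "disp", "hp")]
print(mazdas)

# For context: how many cars are recorded with each cylinder count?
print(table(mtcars$cyl))
```

Both rows carry a cyl value of 6, which is the number that gets plotted alongside the piston-engined cars in so many examples.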
👨🎤 And more…
Let the spirit move you! Share your ideas with others (including me, naturally), and make it an #SoDS18 to remember.
1. 👀 you should read the whole piece, Visualizing Archer: Data visualization to further your enjoyment of narrative, because it’s great…and I’m totally not biased at all.
2. Check out the slides from Albert Kim’s talk from Data Day Texas 2018, “Something old, something new, something borrowed, something blue: Ways to teach data science (and learn it too!)”.
3. packagemetrics and its related issues from the rOpenSci 2017 unconf will give you a better sense of this problem than I could ever hope to!
4. These include: openxlsx, writexl, the C-library libxlsxwriter, and tidyxl.
5. In fact, there was a whole session about this at useR! 2017, which you can learn more about from Julia Silge’s posts, “How do you discover R packages?”, and “Seeking guidance in choosing and evaluating R packages”.