Hey friends, I’ve been off Twitter for 34 days now. I feel great. More focussed, greater attention span, happier.

These are syndicated posts coming from my micro.blog account.

I am tracking some #rstats Twitter stuff via RSS. But I’m not checking DMs. I’m open to chat via email or Slack spaces we share.

Withholding my CRAN submission #rstats

I spent the last few nights polishing up a new submission for CRAN. I had planned to submit today. However, I learned that someone I greatly respect, whom I know to be almost certainly the most responsive and generous package maintainer in the #rstats community, has become the latest victim of CRAN irrationality and toxicity. I am sure he didn’t deserve to have his weekend ruined because one seemingly rogue administrator can elect to punish people without any accountability.

And what about the bystanders who are going to attend work tomorrow and find their builds are no longer reproducible, because a keystone package was archived? Do they deserve that punishment too?

I am withholding my submission for now. I am not sure what to do. I don’t want to enable this behaviour, but I also want to make a tool I am enjoying as accessible as possible. A lot of thoughts are swirling about this. There’s more to write with a cooler head.

Recently I mapped out our #rstats-centric public service data science stack for runapp. Here are some words on the collaboration blob: www.milesmcbain.com/posts/the…

the_stack.png

Talking about this to my R-nerd friends in the R Users Network: Australian Public Policy (Runapp) this week. runapp-aus.github.io

A bunch of software logos with a large lens flare exploding from R

#rstats work project dependency usage over the last 2 and a bit years

Unsung dev heroes like styler, mapview, lintr, languageserver, mapedit et al. miss out.

A bar plot of R package use frequency by year

This moment had me reflecting on the etymology of the word “parking”.

This $3k rig is standing in for a second car for us. Day care and shopping runs are no sweat. And the kids love it!

#Brisbane PSA: the best day to hit your local supermarket is the day after lockdown, once all the preppers have stopped thronging in the aisles.

  • yours truly, common sense

Approximating #rstats RStudio’s F2 shortcut in VSCode

I made an approximate equivalent to RStudio’s default F2 shortcut for VSCode. In RStudio this key opens a function definition in a new editor tab.

The JSON from my settings.json:

{
    "key": "b",
    "name": "browse function source in new window",
    "type": "command",
    "command": "r.runCommandWithSelectionOrWord",
    "args": "rstudioapi::documentNew(paste0(as.character(styler::style_text(deparse($$))), collapse = '\\n'))"
}

I use a shortcut sequence , c b with the VSCode whichkey extension, so your setup will probably look a bit different for "key".

A major drawback of this approach is that since it’s not a saved file, the language mode is not automatically detected, so I have to set the language mode to R to see syntax highlighting etc.

You could also make it show up slightly faster by skipping the styler call, but I find the styled code a vast improvement over the raw deparse() output.
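For reference, the unstyled variant would look something like the sketch below. It assumes the same whichkey binding shape as the entry above and only drops the styler::style_text() call. The "key" and "name" values are hypothetical placeholders, and the speed-up is untested here. VSCode's settings.json accepts comments, so the note inside the block is fine to keep.

```json
{
    // hypothetical key and name; pick any free key in your whichkey menu
    "key": "u",
    "name": "browse unstyled function source in new window",
    "type": "command",
    "command": "r.runCommandWithSelectionOrWord",
    "args": "rstudioapi::documentNew(paste0(deparse($$), collapse = '\\n'))"
}
```

The new tab then shows whatever deparse() produces, so expect rougher formatting than the styled version.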

Magpies have started sneaking in the back door to steal the kids’ scraps from under the table and TBH I’m not even mad.

Let’s stop doubly-screwing data science learners

I frequently see tweets that highlight the fact that people learning coding are not taught in depth about fundamental tools or processes like using a linter, or debugging. For example, this blog post from Greg Wilson.

It’s possible that in Data Science land we are doubly-screwing over learners by not only not teaching them fundamental coding knowledge, but also not teaching analogous things in our own domain.

I know of one particularly progressive course in Business Analytics at Monash University that teaches R Markdown for writing analytical documents, and even touches on Shiny for interactive apps. Students are rightfully being taught how to put together a polished-looking piece of data-driven communication as core coursework.

To me this is a really insightful move, because out here in the trenches I have seen and felt first hand the pain of people who think they are on to a winning idea but can’t make it connect, because they can’t communicate it in a convincing way. My sense is that Monash’s approach is the exception rather than the rule, and the rule is doing students a disservice.

The big one, the one that I think DS educators are totally sleeping on, is building project pipelines. By that I mean the craft of building out scalable software machines that ingest data from various sources and transmute it into various outputs, probably involving aforementioned presentation layer technology for the final leg.

Tools in this space are becoming mature and ubiquitous. It seems that every big data-driven tech company has had to build one, and a few have open sourced them. Examples: Airbnb and Airflow, Spotify and Luigi, Netflix and Metaflow. In the R world we have been very fortunate to have the rOpenSci peer-reviewed option in {drake}, and soon we’ll have another peer-reviewed option in {targets}.

I have written at some length about {drake}, and how its benefits can be felt all the way down to small projects. Recently a colleague of mine who is studying told his lecturer and tutors about our {drake} workflow, and was invited to teach his class about it. At least some of his peers, data science students, are now using it for their assignments and raving about it.

This confirms to me that pipeline tools, and the principles that underpin them, are ready to be incorporated into the canon of core Data Science knowledge. I really hope I hear of more institutions following Monash’s lead, and teaching students modern tools, arising from the data science domain, that can set them up for success in industry.