Let’s stop doubly-screwing data science learners

I frequently see tweets that highlight the fact that people learning coding are not taught in depth about fundamental tools or processes like using a linter, or debugging. For example, this blog post from Greg Wilson.

It’s possible that in Data Science land we are doubly-screwing over learners by not only not teaching them fundamental coding knowledge, but also not teaching analogous things in our own domain.

I know of one particularly progressive course in Business Analytics at Monash University that teaches RMarkdown for writing analytical documents, and even touches on Shiny for interactive apps. Students are rightfully being taught how to put together a polished-looking piece of data-driven communication as core coursework.

To me this is a really insightful move, because out here in the trenches I have seen and felt first hand the pain of people who think they are on to a winning idea but can’t make it connect, because they can’t communicate it convincingly. My sense is that Monash’s approach is the exception rather than the rule, and that is doing students a disservice.

The big one, the one that I think DS educators are totally sleeping on, is building project pipelines. By that I mean the craft of building out scalable software machines that ingest data from various sources and transmute it into various outputs, probably involving aforementioned presentation layer technology for the final leg.

Tools in this space are becoming mature and ubiquitous. It seems that every big data-driven tech company has had to build one, and a few have open sourced them. Examples: Airbnb and Airflow, Spotify and Luigi, Netflix and Metaflow. In the R world we have been very fortunate to have the rOpenSci peer-reviewed option in {drake}, and soon we’ll have another peer-reviewed option in {targets}.

I have written at some length about {drake}, and how its benefits can be felt all the way down to small projects. Recently a colleague of mine who is studying told his lecturer and tutors about our {drake} workflow, and was invited to teach his class about it. At least some of his peers, data science students, are now using it for their assignments and raving about it.
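For readers who haven’t seen one, here is a minimal sketch of what a small {drake} pipeline can look like. The dataset (R’s built-in mtcars) and all of the target names are placeholders chosen for illustration; they are not taken from any project or assignment mentioned in this post.

```r
library(drake)

# A plan is a table of targets and the commands that build them.
# {drake} works out the dependency order from the code itself.
# All target names below are hypothetical.
plan <- drake_plan(
  cars_raw = mtcars,                               # ingest
  cars_clean = cars_raw[cars_raw$wt < 5, ],        # tidy
  model = lm(mpg ~ wt, data = cars_clean),         # analyse
  summary_table = coef(summary(model))             # output
)

# Build every target that is out of date, in dependency order.
make(plan)

# Read a finished target back from the cache.
readd(summary_table)
```

If you change the command for `cars_clean` and call `make(plan)` again, only that target and the ones downstream of it should be rebuilt, which is a large part of the appeal even on small projects.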

This confirms to me that pipeline tools, and the principles that underpin them are ready to be incorporated into the canon of core Data Science knowledge. I really hope I hear of more institutions following Monash’s lead, and teaching students modern tools, arising from the data science domain, that can set them up for success in industry.


Keyboards vs. developer skill and the virtuous loop of productive developers

A bit of nonsense in the Twitterverse this week about developer seniority and usage of the mouse.

I see this as a recurrence of the long-running thread that rears up now and again about how ‘real’ developers use keyboard-driven editors like Emacs or Vim.

Some thoughts:

There could be a loose correlation between seniority and keyboard driven editors due to:

  • Age. These are old tools, and the people who started out when they were cutting edge are now old and, yes, senior developers.
  • Injuries. Ergonomically, a mouse and standard size keyboard just don’t work long term for a segment of the population. Ergonomic keyboards, and keyboard mappings in keyboard-driven editors are a common solution to this. But you have to be at a mouse and keyboard for a fair amount of time for this to become a pain issue - skill accumulated over that time again probably leads to a loose correlation with developer seniority.

So I think some people might be observing a signal that is real (if weak) but, surprise surprise, getting themselves snagged on the correlation-causation conundrum.

I have my own theories about better markers for productive programmers. I think after you gain enough programming skill you reach an inflection point where that skill can be brought to bear not just on the problems you have, but on your processes for solving them. You can write code to make yourself more efficient at writing code. You craft your own tools to fit your own niche problems.

Examples of highly productive people doing this are everywhere. In the R world think about how {knitr}, {devtools}, {usethis}, {reprex} and their like came to be. They’re programming/CLI tools intended to supplement the capabilities of a GUI in a composite interface to the niche problems of building documents, packages, projects, and examples.

An interesting thing often happens where these things start out as command line things, and become so important to a workflow that they graduate to a keybinding or a GUI button. And so here I think we encounter another loose correlation between a preference for keyboard-driven tools and seniority:

If you’re in the business of crafting the interface to your workflow, keybindings or buttons allow you to reduce the friction of that interface and make it ‘feel’ nicer to use. I guess it’s like the digital equivalent of a wall-mounted pegboard for tools. Having all these for-purpose tools right at your fingertips, where you can reach for them without thinking, helps you focus on what’s on the bench.

You could array your tools with buttons or menus to be moused-on, but keybindings give you a bit more ‘space’ to work with before things get unwieldy - you run out of pixels fast! So there’s a practicality aspect that could be a driver for keybindings and editors that make keybindings easy to execute.

But it’s not the creation of buttons or keybindings that is important. What exactly counts as a ‘low friction’ interface will vary by person, and is relative to the friction of the task being interfaced with. In fact if you have powerful commands, a sharp memory, and are a fast typist, maybe a CLI already feels friction free.

The important thing - the productivity multiplier - is using your skills to shape your tools and the environment that you work in, which in turn makes your skills more effective. It’s an extremely virtuous loop, and I think possibly what people are really aspiring to, rather than say mastery of the keyboard or a keyboard-driven editor like Vim or Emacs.

Commands, buttons, bindings, foot pedals, voice commands, gesture controls… these are all just implementation options for interfaces created by that virtuous loop.

How I got Utterances working on my rmarkdown distill blog

What is Utterances?

I first saw these on Nick Tierney’s site.

The premise: What if your blog comment threads were just GitHub issue threads on your blog source repo? What if they were synchronised between GitHub and the footer of your blog posts? Neat idea hey? You keep control of your data (relatively speaking) and there’s one less site to data mine and track your readers - yes this is a thing in Disqus. Booouuurrrnnns.

How did I get it working?

I didn’t. I fought and fought for some hours with the grid CSS in the Distill template and no matter what I tried my comments iframe always had 0 height. Then I had a big whinge to my colleague Anthony North, who defied the grid using an HTML include with a JavaScript payload that injects the Utterances iframe into the end of the article. Very hacky and very cool.

Here’s the HTML file which shouldn’t be too difficult to adapt to your own distill site if you have one.

You need to refer to it in your _site.yml like:

    css: mmstyle.css
    includes:
      in_header: utterances.html

Recover is the apex R debugging method.

I just debugged a ‘non-numeric argument error’ being thrown by this beastie in under 5 minutes with #rstats’ options(error = recover).

A strength of recover over other methods for stuff like this is that all 4(!) loop indices will be set to the values they were on failure. Chef’s kiss

IMHO, contrary to Jenny Bryan’s ranking in her incredible “Object of type ‘closure’ is not subsettable” talk, this makes recover the apex R debugging method.

Error turned out to be from line 12 due to sticky sf geometry btw. 🥳

edit: I accidentally wrote recover = TRUE when I first posted this, a typo I often make due to wishful thinking perhaps.
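For anyone who hasn’t tried it, here is a minimal sketch of the workflow. The function and data are hypothetical stand-ins with two nested loops, not the code I was actually debugging.

```r
# Hypothetical example: sum every cell of a nested list.
sum_cells <- function(grid) {
  total <- 0
  for (i in seq_along(grid)) {
    for (j in seq_along(grid[[i]])) {
      # Fails with "non-numeric argument to binary operator"
      # when a cell is not numeric.
      total <- total + grid[[i]][[j]]
    }
  }
  total
}

# The last cell is a character string, so the loop fails at i = 2, j = 2.
grid <- list(list(1, 2), list(3, "4"))

# On an uncaught error, offer a menu of the active call frames
# instead of just printing the message.
options(error = recover)

if (interactive()) {
  sum_cells(grid)
  # At the menu, pick the frame for sum_cells(). You land in a browser
  # inside that frame, where i, j and total still hold the values they
  # had at the moment of failure, so grid[[i]][[j]] shows the culprit.
}
```

To switch it off again, use `options(error = NULL)`.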

Spacemacs for the rest of us

Despite the joy it brings me, I’ve always balked at recommending Spacemacs + ESS as a dev environment for #rstats due to the brutal learning curve. However yesterday, thanks to Jack of Some’s YouTube channel, I discovered there is a reasonably faithful port of Spacemacs to VSCode. It’s called VSpaceCode and it’s completely compatible with the R extension!

I gave it a blast today at work and it screams on Windows compared to the Emacs version. The responsiveness just can’t be un-felt, and will definitely be addictive. There’s even a built-in port of magit that was again super slick compared to the slowness of the Emacs native version.

Interestingly, I have noticed the situation is reversed on my personal laptop running Linux, where there seem to be a few performance glitches in VSCode. There’s still no one editor to rule them all it seems, but it feels like a similar enough experience that I could be productive using the faster one on each platform. We’ll see as I rack up more time with VSpaceCode.