Another riff on the placeholder idea with |>

Another riff on the placeholder idea with |>:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
. <- function(.dat, template){
    template_code <- deparse(substitute(template)) 
    arg <- deparse(substitute(.dat))
    interpolated_code <- gsub("(?<=[(, ])?[.](?=[), \\[])", arg, template_code, perl = TRUE)
    eval(parse( text = interpolated_code))
}

"a" |>
 .(c(., "b")) |>
 .(setNames(., .))
#>   a   b 
#> "a" "b"

mtcars |> 
    transform(kmL = mpg / 2.35) |>
    .(lm(kmL ~ hp, data = .))
#> 
#> Call:
#> lm(formula = kmL ~ hp, data = transform(mtcars, kmL = mpg/2.35))
#> 
#> Coefficients:
#> (Intercept)           hp  
#>    12.80803     -0.02903

"col_name" |> 
  .(mutate(mtcars, . = "cool")) |>
  .(bind_cols(., .)) |>
  .(.[1, ])
#> New names:
#> * mpg -> mpg...1
#> * cyl -> cyl...2
#> * disp -> disp...3
#> * hp -> hp...4
#> * drat -> drat...5
#> * ...
#>           mpg...1 cyl...2 disp...3 hp...4 drat...5 wt...6 qsec...7 vs...8
#> Mazda RX4      21       6      160    110      3.9   2.62    16.46      0
#>           am...9 gear...10 carb...11 col_name...12 mpg...13 cyl...14 disp...15
#> Mazda RX4      1         4         4          cool       21        6       160
#>           hp...16 drat...17 wt...18 qsec...19 vs...20 am...21 gear...22
#> Mazda RX4     110       3.9    2.62     16.46       0       1         4
#>           carb...23 col_name...24
#> Mazda RX4         4          cool

Created on 2021-06-24 by the reprex package (v2.0.0)

I call . the ‘neutering’ function.

How you’d fix the #rstats dog’s balls pattern

The dog’s balls pattern is a thing. I didn’t name it.

This is the pattern:

mtcars |>
    transform(kmL = mpg / 2.35) |>
    ( \(df)
      lm(kmL ~ hp, data = df)
    )()

Copy pasta from this tweet.

Noisy syntax involving parentheses, including a weird empty pair hanging out in the breeze at the end. The easiest thing for beginners, or anyone, to forget or accidentally unbalance.

So rather than reinvent the wheel, let’s take a quick look at how other programming languages with pipes have solved this issue.

Well there’s the Hack pipe and it uses a $$ placeholder to allow the user to set the position without making a lambda:

$x = vec[2,1,3]
  |> Vec\map($$, $a ==> $a * $a)
  |> Vec\sort($$);

But Hack? That’s a bit obscure.

What about Julia? Something more data sciencey and close to home. Well, Julia has a @pipe macro (from the Pipe.jl package) to, you guessed it, let the user deploy a placeholder to the arg position to be piped to:

@pipe a |> addX(_,6) + divY(4,_) |> println # 10.0

This macro theme is repeated in other languages. Check out Clojure, it has so many pipes: -> pipe to first, ->> pipe to last, and of course, as-> pipe to placeholder.

Okay so I am just cherry-picking examples. But the placeholder or placeholder/macro combination is a solution with precedent to the problem of how to pipe into an argument other than the first.

So let’s think now about R. We don’t have macros. Game over? No. R’s famed syntax malleability via lazy evaluation and syntax tree operations is how we get that kind of stuff done.

To fix Dog’s balls we’d be looking at some kind of function that manipulates the syntax tree. That is to say, it can turn:

a |> b(x, _) into a |> b(x, a)

Clearly, it needs to know about the symbols a and b(x, _) so it has to be an infix operator. Something like:

a %|>% b(x, _)

Where the %|>% function’s job is to rewrite the syntax tree by replacing any _ in the tree on its right-hand side, with the thing on its left-hand side. Easy done? Well, there is a recursion issue. It needs to rewrite:

a %|>% b(x, _) %|>% c(y, _) into c(y, b(x, a)) but details details.

I do think we can probably shave down some characters…. maybe drop the |? Still keeps the forward idea going.

And how do we feel about _… a bit Perl-ish… maybe ? hmmm no that doesn’t inspire confidence… . ahhhh brief but firm - I like it. Putting it all together we have our new pipe:

a %>% b(x, .)

Now, I already know what you’re going to say, “This is not a pipe”.
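For the curious, here is a rough sketch of the tree-rewriting idea in base R. It is an illustration under assumptions, not how {magrittr} actually implements %>%: the operator name %|>% is the hypothetical one from above, and it uses . as the placeholder because R won't parse a bare _ in that position. The recursion issue mostly takes care of itself in this version, since the inner %|>% call is substituted in unevaluated and gets rewritten in turn when it is finally evaluated.

```r
# Hypothetical placeholder pipe: swap every `.` symbol on the right-hand
# side for the (unevaluated) left-hand side, then evaluate the result.
`%|>%` <- function(lhs, rhs) {
  lhs_expr <- substitute(lhs)   # e.g. a
  rhs_expr <- substitute(rhs)   # e.g. b(x, .)
  # substitute() does not evaluate its first argument, so build the call
  # with do.call() to hand it the captured expression itself.
  rewritten <- do.call(substitute, list(rhs_expr, list(. = lhs_expr)))
  # Evaluate b(x, a) where the pipe was called from.
  eval(rewritten, parent.frame())
}

"a" %|>% c(., "b")                      # same as c("a", "b")
mtcars %|>% lm(mpg ~ hp, data = .)      # same as lm(mpg ~ hp, data = mtcars)
```

A bare function name on the right-hand side (a %|>% b) is not handled, and every . gets replaced, including ones inside formulas, so treat it as a toy.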

VSCode is the platform for #rstats keyboard shortcut lovers

With VSCode you can configure a keybinding to run arbitrary #rstats code, including {rstudioapi} calls in just a matter of seconds. That code can refer to things like the current selection, cursor location, or the current file.

For example here’s me making myself a knit button, where the placeholder $$ refers to the current file:

{
    "description": "knit to html",
    "key": "ctrl+i",
    "command": "r.runCommandWithEditorPath",
    "when": "editorTextFocus",
    "args": "rmarkdown::render(\"$$\", output_format = rmarkdown::html_document(), output_dir = \".\", clean = TRUE)"
}

And here’s a shortcut that opens a window to interactively edit the spatial object the user has the cursor on or has selected. In this case $$ refers to that object:

{
    "key": "e",
    "name": "mapedit object",
    "type": "command",
    "command": "r.runCommandWithSelectionOrWord",
    "args": "mapedit::editMap(mapview::mapview($$))"
}

Snippets are also easy. There’s about 3 different ways to achieve inserting text, all in the same simple json config style:

{
    "key": "ctrl+shift+m",
    "command": "type",
    "when": "editorLangId == r || editorLangId == rmd && editorTextFocus",
    "args": { "text": " %>% " }
}

Although RStudio addins are supported in VSCode, many things popular addins do can be done with a few lines of config. It’s a keyboard shortcut lover’s dream - I’d argue even more so than ESS. RStudio users should campaign for this!

What if the standard format to browse #rstats help was Rmd?

Here’s a little thing I was noodling with today. A drop-in replacement for help() that pulls up the help file as an RMarkdown document in your editor pane, not some weird special web browser window off to one side.

It’s reminiscent of the way help works in ESS/Emacs:

https://github.com/MilesMcBain/rmdocs

rmdocs

The advantages are:

  • You don’t take your hands off the keyboard to browse help
  • Search a help file using your standard editor shortcuts
  • Run examples in the console using standard mechanism (e.g. ctrl + enter)
  • Text and example code uses your editor fonts, themes, and plugins
  • Remix and edit examples in-situ (!)
  • Copy and paste using your keyboard only
  • You get to parse markdown with your eyes

On the downside:

  • At the moment you lose the links between help files. They’re not browsable (as in ESS).
  • You have to parse markdown with your eyes

It would be possible to bring it on par with ESS, but it would take a bit of work on the VSCode side, and then the VSCode-R extension would have 2 modes to view help in. Is this a good thing? I am not sure. I think this is probably good enough to fill the aching void in my setup.

With just a little more work it could be used as a keyboard shortcut in RStudio as well.

Debugging cantrip from an #rstats wizard

For the benefit of my future self and other lovers of #rstats debugging:

Kevin Ushey just shared an incredible little trick with me that I am still reeling from in this issue thread.

You can use it to get a stack trace for code that is getting stuck in infinite loops or just generally taking a really long time. You can use that stack trace to see where in the code the execution flow is getting bogged down.

I was there hacking in timing code and print statements (aka banging rocks together) when Kevin dropped this construct:

withCallingHandlers({
  ..YOUR SLOW CODE HERE..
}, interrupt = function(e) browser())

Here’s an example of it working:


[ins] r$> my_bad <- function() {
            while(TRUE) {
              lapply(letters, I)
            }
          }

          withCallingHandlers({
            my_bad()
            }, interrupt = function(e) browser())
Called from: (function(e) browser())(list())

[ins] Browse[1]> traceback()
7: unique.default(c("AsIs", oldClass(x)))
6: unique(c("AsIs", oldClass(x)))
5: structure(x, class = unique(c("AsIs", oldClass(x))))
4: FUN(X[[i]], ...)
3: lapply(letters, I) at #3
2: my_bad() at #2
1: withCallingHandlers({
       my_bad()
   }, interrupt = function(e) browser())
   

So when I interrupted the code running in the console with CTRL+C, I was kicked into browse mode, and from there I could call traceback()!

I am still trying to figure out how to wield this new power. It seems that depending on where you interrupt it, you may or may not have traceback available. But if the stack trace is available are the environment frames?!

Noodling around with the idea I came up with this, which seemed to work consistently:

withCallingHandlers({
            my_bad()
            }, interrupt = function(e) traceback())

Sweeet!
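On the question of whether the environment frames are reachable too: here is a small sketch that suggests they are, at least from inside a calling handler. The big assumption is that I'm standing in a message condition for the interrupt so the example can run unattended (you can't press CTRL+C in a script). The names snapshot_stack, inner, outer and progress are made up for the illustration.

```r
# Run `code` and, when it signals a message, grab the call stack and the
# matching environment frames from inside the calling handler.
snapshot_stack <- function(code) {
  snap <- NULL
  withCallingHandlers(
    code,
    message = function(cnd) {
      # Calling handlers run on top of the signalling code's stack,
      # so sys.calls() / sys.frames() still see everything below.
      snap <<- list(calls = sys.calls(), frames = sys.frames())
      invokeRestart("muffleMessage")
    }
  )
  snap
}

inner <- function() {
  progress <- 42            # a local we would like to inspect later
  message("still going")    # stand-in for the interrupt
  invisible(NULL)
}
outer <- function() inner()

snap <- snapshot_stack(outer())

# Find the frame belonging to the inner() call and look inside it.
is_inner <- vapply(
  as.list(snap$calls),
  function(cl) identical(cl[[1]], quote(inner)),
  logical(1)
)
inner_frame <- snap$frames[[which(is_inner)]]
get("progress", envir = inner_frame)   # the local variable from inner()
```

If interrupts behave like other conditions here, swapping message = for interrupt = (and dropping the muffleMessage restart) should give the same kind of snapshot, but I haven't verified that for every place you might interrupt.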

There’s also a more powerful version that Kevin shared down the thread that allows resuming. That trapped me in a bit of a loop of my own, but that’s what you get when you play with MAGIC.

Update

Luke Tierney (Gandalf level wizard), chimed in with some info that this trick can be pulled off with:

options(interrupt = browser)

Wow!

But then that led me to try:

options(interrupt = recover)

Which is epic!

In case you don’t know about recover you REALLY should have a go with it. It’s pretty special. So special I made a video about it: https://youtu.be/M5n_2jmdJ_8 .

Dog’s Balls

A mature debate was had about whether #rstats’ new |> requires the use of “dog’s balls”, ()(), for lambdas with \(). Sadly it does. But it’s still kind of cool, and if you want to feel extra thankful for our benevolent overlords you could take a walk through the smouldering ashes of the JS native pipe train wreck: github.com/tc39/prop…

How to test against almost any R version with VSCode and Docker

Last week I hit a spot of bother trying to test against R-devel using Rhub. The issue is now fixed but it was blocking all builds against R-devel for a few days.

While that was being resolved I decided to try using VSCode’s docker integration to test against the Rocker R-devel container locally. This turned out to be quite easy! So here’s how you can test locally against any R version that has a tagged Rocker Docker container version!

Prerequisites

To pull this off you’ll need:

Step 1: ‘Reopen in container’

Click the little stylised >< icon in the bottom left corner. It’s bright purple in my screenshots. It will open the remote development menu. Choose Remote Containers: Reopen in container:

Step 2: ‘Add Development Container Configuration Files’

From the next menu you will be offered some default containers for Linux distributions. If you choose Show all definitions…, you will be offered R (community) - choose it!

Step 3: Wait for container to download

This starts the process of reopening your project in the container. You will have to wait for the container to download. This took a few minutes for me.

Step 4: Set the container tag version

Your project should have opened in the rocker/r-ver:latest container. If you open an R terminal you should be able to confirm that R is the latest release version. This is pretty sweet, but what we want is to be running against rocker/r-ver:devel.

To configure this we have to alter some files VSCode has placed in your project directory. You will have a new folder called .devcontainer under the project root:

.
├── .Rbuildignore
├── .devcontainer
│   ├── Dockerfile
│   ├── devcontainer.json
│   └── library-scripts
│       └── common-debian.sh

We need to make a small change to Dockerfile and devcontainer.json.

In Dockerfile, change the line right at the start that has:

ARG VARIANT="latest"

to

ARG VARIANT

The hardcoding of “latest” stops us being able to set it in the devcontainer.json.

Now in devcontainer.json, change this bit of JSON that has:

{
	"name": "R (Community)",
	"build": {
		"dockerfile": "Dockerfile",
		// Update VARIANT to pick a specific R version: latest, ... ,4.0.1 , 4.0.0
		"args": { "VARIANT": "latest" }
	}

to

{
	"name": "R (Community)",
	"build": {
		"dockerfile": "Dockerfile",
		// Update VARIANT to pick a specific R version: latest, ... ,4.0.1 , 4.0.0
		"args": { "VARIANT": "devel" }
	}

Make sure both of those are saved.

Step 5: Rebuild container

Using the stylised >< icon in the bottom left corner access the remote development menu and choose: Remote Containers: Rebuild container

The container will now rebuild via much the same process as step 3.

Step 6: Confirm you’re in R-Devel

Now when the project opens you can open an R terminal and run version to confirm you’re running against devel:
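If you'd rather check in code than eyeball the printout, something like this sketch should do. is_r_devel is a hypothetical helper name, and it assumes development builds report a status containing "development" (R-devel reports "Under development (unstable)"), while releases report an empty status.

```r
# Hypothetical helper: TRUE when the running R reports a development status.
is_r_devel <- function(version = R.version) {
  grepl("development", version$status, ignore.case = TRUE)
}

is_r_devel()        # TRUE inside the devel container, FALSE on a release
R.version.string    # the full human-readable version line
```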

Done!

And that’s it. Now you can run devtools::test() and check() against R-devel.

We could also go back to previous releases with this method by setting other tags in the devcontainer.json. See the available tags on the r-ver container here - they go back to around 3.2!

Using the remote development menu (><) we can flip back to our local R environment by choosing Remote Containers: Reopen Locally.

After the container versions have been downloaded the first time, flipping back and forth between local and container environments via >< takes just a couple of seconds!

WFH in the 20’s: “I CAN’T HEAR YOU MY HEADPHONES ARE BOOTING”

#rstats tip: Use R-universe to harden yourself against CRAN irrationality. Here’s me installing {targets}, which is currently archived for nonsensical reasons.

Hey friends I’ve been off Twitter now for 34 days. I feel great. More focussed, greater attention span, happier.

These are syndicated posts coming from my micro.blog account.

I am tracking some #rstats Twitter stuff via RSS. But I’m not checking DMs. I’m open to chat via email or Slack spaces we share.

Withholding my CRAN submission #rstats

I spent the last few nights polishing up a new submission for CRAN. I had planned to submit today. However I learned someone I greatly respect, whom I know to be almost certainly the most responsive and generous package maintainer in the #rstats community, has become the latest victim of CRAN irrationality and toxicity. I am sure he didn’t deserve to have his weekend ruined because one seemingly rogue administrator can elect to punish people without any accountability.

And what about the bystanders who are going to attend work tomorrow and find their builds are no longer reproducible, because a keystone package was archived? Do they deserve that punishment too?

I am withholding my submission for now. I am not sure what to do. I don’t want to enable this behaviour, but I also want to make a tool I am enjoying as accessible as possible. A lot of thoughts are swirling about this. There’s more to write with a cooler head.

Recently I mapped out our #rstats centric public service data science stack for runapp. Here’s some words on the collaboration blob: www.milesmcbain.com/posts/the…

the_stack.png

Talking about this to my R-nerd friends in the R Users Network: Australian Public Policy (Runapp) this week. runapp-aus.github.io

A bunch of software logos with a large lens flare exploding from R

#rstats work project dependency usage over the last 2 and a bit years

Unsung dev heroes miss out, like: styler, mapview, lintr, languageserver, mapedit et al.

A bar plot of R package use frequency by year

This moment had me reflecting on the etymology of the word “parking”.

This $3k rig is standing in for a second car for us. Day care and shopping runs are no sweat. And the kids love it!

#Brisbane PSA: the best day to hit your local supermarket is the day after lockdown, once all the preppers have stopped thronging in the aisles.

  • yours truly, common sense

Approximating #rstats RStudio’s F2 shortcut in VSCode

I made an approximate equivalent to RStudio’s default F2 shortcut for VSCode. In RStudio this key opens a function definition in a new editor tab.

The JSON from my settings.json:

{
    "key": "b",
    "name": "browse function source in new window",
    "type": "command",
    "command": "r.runCommandWithSelectionOrWord",
    "args": "rstudioapi::documentNew(paste0(as.character(styler::style_text(deparse($$))), collapse = '\\n'))"
}

I use a shortcut sequence , c b with the VSCode whichkey extension so your setup will probably look a bit different for "key".

A major drawback of this approach is that since it’s not a saved file, the language mode is not automatically detected, so I have to set the language mode to R to see syntax highlighting etc.

You could also make it show up slightly faster by avoiding styling the code, but I find this is a vast improvement over the default styling.
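To see what the "args" expression is doing, here is a stripped-back sketch in plain R, with the {styler} and {rstudioapi} calls left out so it runs anywhere. function_source is a hypothetical name for the illustration: it deparses a function back into lines of code and glues them into the single string that would be handed to the new document.

```r
# Hypothetical helper: turn a function object back into one string of source.
function_source <- function(fn) {
  # deparse() gives one element per line of code; join them with newlines.
  paste0(deparse(fn), collapse = "\n")
}

# Print the source of a function, as the shortcut would show it unstyled.
cat(function_source(stats::setNames), "\n")
```

In the keybinding, $$ stands where fn is here, and the styled text goes to rstudioapi::documentNew() instead of cat().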

Magpies have started sneaking in the back door to steal the kids’ scraps from under the table and TBH I’m not even mad.

Let’s stop doubly-screwing data science learners

I frequently see tweets that highlight the fact that people learning coding are not taught in depth about fundamental tools or processes like using a linter, or debugging. For example, this blog post from Greg Wilson.

It’s possible that in Data Science land we are doubly-screwing over learners by not only not teaching them fundamental coding knowledge, but also not teaching analogous things in our own domain.

I know of one particularly progressive course in Business Analytics at Monash University that teaches RMarkdown for writing analytical documents, and even touches on Shiny for interactive apps. Students are rightfully being taught how to put together a polished looking piece of data driven communication as core coursework.

To me this is a really insightful move, because out here in the trenches I have seen and felt the pain first hand of people who think they are on to a winning idea, but can’t make it connect, due to inability to communicate it in a convincing way. My sense is that Monash’s approach is the exception rather than the rule, and that is doing students a disservice.

The big one, the one that I think DS educators are totally sleeping on, is building project pipelines. By that I mean the craft of building out scalable software machines that ingest data from various sources and transmute it into various outputs, probably involving aforementioned presentation layer technology for the final leg.

Tools in this space are becoming mature and ubiquitous. It seems that every big data driven tech company has had to build one, and a few have open sourced them. Examples: Airbnb and Airflow, Spotify and Luigi, Netflix and Metaflow. In the R world we have been very fortunate to have the rOpenSci peer-reviewed option in {drake}, and soon we’ll have another peer-reviewed option in {targets}.

I have written at some length about {drake}, and how its benefits can be felt all the way down to small projects. Recently a colleague of mine who is studying told his lecturer and tutors about our {drake} workflow, and was invited to teach his class about it. At least some of his peers, data science students, are now using it for their assignments and raving about it.

This confirms to me that pipeline tools, and the principles that underpin them are ready to be incorporated into the canon of core Data Science knowledge. I really hope I hear of more institutions following Monash’s lead, and teaching students modern tools, arising from the data science domain, that can set them up for success in industry.

Visitors

I’ve been writing a fair bit of Typescript and #rstats in VSCode over the last month and I’m struck by how much confidence the TS type linting gives me to slash at the code base. Editor highlights all the things I’ve broken quite well. Most of the bugs have been in the R code…

Keyboards vs. developer skill and the virtuous loop of productive developers

A bit of nonsense in the Twitterverse this week about developer seniority and usage of the mouse.

I see this as a recurrence of the long running thread that rears up now and again about how ‘real’ developers use keyboard-driven editors like Emacs or Vim.

Some thoughts:

There could be a loose correlation between seniority and keyboard driven editors due to:

  • Age. These are old tools, and the people who started out when they were cutting edge are now old, and yes senior developers.
  • Injuries. Ergonomically, a mouse and standard size keyboard just don’t work long term for a segment of the population. Ergonomic keyboards, and keyboard mappings in keyboard-driven editors are a common solution to this. But you have to be at a mouse and keyboard for a fair amount of time for this to become a pain issue - skill accumulated over that time again probably leads to a loose correlation with developer seniority.

So I think some people might be observing a signal that is real (if weak), but surprise surprise getting themselves snagged in the correlation-causation-conundrum.

I have my own theories about better markers for productive programmers. I think after you gain enough programming skill you reach an inflection point where that skill can be brought to bear not just on the problems you have, but on your processes for solving them. You can write code to make yourself more efficient at writing code. You craft your own tools to fit your own niche problems.

There are examples of people who are known to be highly productive doing this everywhere. In the R world think about how {knitr}, {devtools}, {usethis}, {reprex} and their like came to be. They’re programming/CLI tools intended to supplement the capabilities of a GUI in a composite interface to the niche problems of building documents, packages, projects, and examples.

An interesting thing often happens where these things start out as command line things, and become so important to a workflow that they graduate to a keybinding or a GUI button. And so here I think we encounter another loose correlation between preference for keyboard-driven and seniority:

If you’re in the business of crafting the interface to your workflow, keybindings or buttons allow you to reduce the friction of that interface and make it ‘feel’ nicer to use. I guess it’s like the digital equivalent of a wall-mounted pegboard for tools. Having all these for-purpose tools right at your fingertips, ready to reach for without thinking, helps you focus on what’s on the bench.

You could array your tools with buttons or menus to be moused-on, but keybindings give you a bit more ‘space’ to work with before things get unwieldy - you run out of pixels fast! So there’s a practicality aspect that could be a driver for keybindings and editors that make keybindings easy to execute.

But it’s not creation of buttons or keybindings that is important. What exactly is a ‘low friction’ interface will vary by person, and is relative to the friction of the task being interfaced with. In fact if you have powerful commands, a sharp memory, and are a fast typist, maybe a CLI already feels friction free.

The important thing - the productivity multiplier - is using your skills to shape your tools and the environment that you work in, which in turn makes your skills more effective. It’s an extremely virtuous loop, and I think possibly what people are really aspiring to, rather than say mastery of the keyboard or a keyboard-driven editor like Vim or Emacs.

Commands, buttons, bindings, foot pedals, voice commands, gesture controls… these are all just implementation options for interfaces created by that virtuous loop.

Howdy everyone it’s me Acrobat Reader. I’m a Reader for Pee Dee Effs. Definitely gonna write-lock those suckers tho (haha), so don’t forget to close me down or I’m gonna have to derail the shit out of your rendering pipelines. Seriously, I will DESTROY them. Have a great day!

Can one get a PhD in the fiddly little offset maths involved in inserting text into documents programmatically?

Feeling like a bit of an outlier while wistfully throwing name in hat for Github codespaces. Also a Vim checkbox but no Emacs!

I have a feeling this is going to shake things up quite a bit when it drops.