There’s been some great discussion around a post by Koreen Olbrish. One part of that discussion has been questions from designers around the idea of “How can Tin Can help?” Kevin Thorn and Julie Dirksen both posted a variety of scenarios, and I’m hoping we see a lot more.
I’m going to join the discussion and put on a second hat that doesn’t come up too often. My main hat nowadays is CTO of Saltbox – developer, ops, chicken sacrifices, and any other technical stuff, and that’s the hat I’m usually wearing when I participate in the Tin Can API standards discussion. This second hat is something that used to be a big part of what I did: social science researcher doing data analysis.
Because that’s what we’re mostly talking about when we ask how Tin Can can help in various situations: data analysis. Not always, but Tin Can itself doesn’t do much until the data it collects becomes a target for analysis. That means that a lot of the time what I’m talking about will be useful independently of Tin Can.
So, without further ado, the first in a series of “Tin Can Can” posts, the Vehicle Fleet scenario from Kevin Thorn.
I have a fleet of 4,000 vehicles. Brakes are replaced once a year. However, over 300 are being replaced 2-3 times a year. I need to know if those vehicles are in fact faulty, or do I have a driving performance problem with drivers riding their foot on the brake pedal? Can TinCan help me with that?
I’m going to make a simplifying assumption that vehicles have one primary driver in a time period. Most of what I talk about could still be done without that assumption, but the verbiage would be a lot more complicated.
Before we even talk about Tin Can, here’s what I’d do first. I’d make a stacked histogram of miles driven (presumably the company tracks this for the fleet) for vehicles having and not having brake failures in a given year. I’d also graph home locations (also a reasonable assumption for tracking) on a map, coloring vehicles having brake failures to stand out.
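As a minimal sketch of the counting behind that stacked histogram, assuming the fleet records can be reduced to (annual miles, brake replacements) pairs: the data, the 10,000-mile bin width, and the names below are all made up for illustration, and a real plot would come from whatever charting tool is at hand.

```python
from collections import Counter

# Hypothetical records: (annual miles driven, brake replacements that year).
vehicles = [
    (8_000, 1), (12_500, 1), (14_000, 1), (21_000, 2),
    (23_500, 1), (27_000, 3), (31_000, 2), (9_500, 2),
]

BIN_WIDTH = 10_000  # miles per histogram bin (an arbitrary choice)


def stacked_counts(records, bin_width=BIN_WIDTH):
    """Count vehicles per mileage bin, split by repeat-replacement status."""
    normal, repeat = Counter(), Counter()
    for miles, replacements in records:
        bin_start = (miles // bin_width) * bin_width
        (repeat if replacements > 1 else normal)[bin_start] += 1
    return normal, repeat


normal, repeat = stacked_counts(vehicles)
for bin_start in sorted(set(normal) | set(repeat)):
    label = f"{bin_start:>6}-{bin_start + BIN_WIDTH - 1:<6}"
    # '.' marks a vehicle with one replacement, '#' one with several.
    print(label, "." * normal[bin_start] + "#" * repeat[bin_start])
```

If the '#' marks pile up in the high-mileage bins, miles driven is probably doing most of the explaining.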
That might be enough. If most brake failures seem reasonably explained by miles driven and/or geography, then relatively few people would be helped by training or performance support. If I do see that relationship, I could check my simple model by looking for drivers whose repeat brake replacements disappear or appear after a change in how much they drive.
Assuming I don’t see that, or I have some reason for seeking out more detail, I would next look into patterns for particular drivers. I would probably make another histogram, showing how many drivers had more than one brake replacement in a year, grouped by the number of years in which that happened. If few drivers have multiple such years, that suggests the problem has little to do with the driver or, if it does, that drivers quickly adjust. To check for the latter, I might make a dot plot of the times a vehicle needed multiple brake replacements against the years its driver had been driving for the company at that time.
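The per-driver histogram can be sketched the same way. This assumes one row per driver per year, relying on the one-primary-driver simplification from earlier; the rows and names are hypothetical.

```python
from collections import Counter

# Hypothetical log: one (driver_id, year, brake_replacements) row per driver-year.
driver_years = [
    ("d1", 2010, 1), ("d1", 2011, 1), ("d1", 2012, 1),
    ("d2", 2010, 3), ("d2", 2011, 1), ("d2", 2012, 1),
    ("d3", 2010, 2), ("d3", 2011, 2), ("d3", 2012, 3),
    ("d4", 2011, 2), ("d4", 2012, 1),
]


def repeat_year_histogram(rows):
    """Map 'number of repeat-replacement years' -> number of drivers."""
    repeat_years = Counter()
    for driver, _year, replacements in rows:
        # Adding 0 still registers the driver, so zero-repeat drivers are counted.
        repeat_years[driver] += 1 if replacements > 1 else 0
    return Counter(repeat_years.values())


histogram = repeat_year_histogram(driver_years)
for n_years in sorted(histogram):
    print(f"{n_years} repeat year(s): {'#' * histogram[n_years]}")
```

A histogram that falls off sharply after one repeat year points toward drivers adjusting quickly; a heavy tail points toward something persistent about particular drivers.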
If I see the latter, or brake problems don’t correlate with miles driven/location but do correlate with driver*, I’m going to start becoming confident that there’s an opportunity for training or performance support.
Now I’m going to talk about Tin Can.
My sketch above leads to two scenarios. In one, we have reason to believe that drivers start out more likely to need brake replacements, but acquire skills that avoid the problem over time. In the other, we have not explained why drivers need brake replacements, but it seems to be caused by something related to particular drivers – quite likely something about how those drivers are driving, since miles driven and geographic location, two likely culprits, have been looked into. I’ll call the first the “performance ramp” scenario and the second the “repeated problem” scenario.
The performance ramp scenario is probably the Tin Can-friendliest, because the evidence suggests that specific behaviors addressing the problem exist and can be acquired and, more so, that existing experiences lead to that acquisition. Conveniently, what we have on hand is an Experience API.
I see several possible approaches. In one approach, we would turn the detailed experience of any given drive into stylized happenings recorded via Tin Can. That could be through an automated system (send a statement when braking hard, send a statement when RPM exceeds a certain amount at rest, et cetera) or through ride-along observers with checklists (I predict there will be a number of Tin Can-enabled checklist/survey offerings). Then I would look at summaries of that stylized data to see different patterns. For instance, I might apply Principal Components Analysis and color the points by whether or not the driver had repeated brake problems.
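To make "send a statement when braking hard" concrete, here is a rough sketch of what one such statement might look like, built as plain JSON. It follows the actor/verb/object shape of Tin Can statements as I understand it; the verb, vehicle, and extension IRIs are invented for illustration, and the exact field names should be checked against the version of the spec in use.

```python
import json
from datetime import datetime, timezone

# Hypothetical IRIs; a real deployment would pick verbs and activity ids
# from a vocabulary agreed on for the fleet.
HARD_BRAKE_VERB = "http://example.com/fleet/verbs/braked-hard"
VEHICLE_BASE = "http://example.com/fleet/vehicles/"
DECEL_EXTENSION = "http://example.com/fleet/extensions/deceleration-g"


def hard_brake_statement(driver_email, vehicle_id, decel_g, when):
    """Build an actor/verb/object statement for one hard-braking event."""
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{driver_email}"},
        "verb": {"id": HARD_BRAKE_VERB, "display": {"en-US": "braked hard"}},
        "object": {"objectType": "Activity", "id": VEHICLE_BASE + vehicle_id},
        "result": {"extensions": {DECEL_EXTENSION: decel_g}},
        "timestamp": when.isoformat(),
    }


statement = hard_brake_statement(
    "driver42@example.com", "truck-0017", 0.62,
    datetime(2012, 6, 1, 14, 30, tzinfo=timezone.utc),
)
print(json.dumps(statement, indent=2))
```

Counting statements like this per driver, per verb, gives the kind of stylized summary table that an analysis such as PCA would then work from.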
If that approach is not feasible, I would consider a training course for new drivers specifically tailored to reduce brake abuse, then closely watch the brake replacements of (randomly assigned) course graduates vs others in their cohort. One thing to be careful of would be enrolling enough drivers to get useful results. If the course reduced the cost of brake replacements by more than the cost of running the course, I’d expand it to all new drivers. I’d similarly record stylized accounts of student behavior via Tin Can to see how they interact with test scores (also recorded via Tin Can).
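One way to compare randomly assigned graduates with the rest of their cohort is a permutation test, which is a technique I am swapping in here for illustration rather than the only reasonable choice. The replacement counts below are invented, and the function and variable names are hypothetical.

```python
import random
from statistics import mean

# Hypothetical first-year brake replacements per new driver.
trained = [1, 1, 1, 2, 1, 1, 1, 2, 1, 1]
control = [2, 1, 3, 2, 1, 2, 2, 1, 3, 2]


def permutation_p_value(treated, untreated, n_shuffles=5_000, seed=0):
    """One-sided p-value for 'treated has fewer replacements than untreated'."""
    rng = random.Random(seed)
    observed = mean(untreated) - mean(treated)
    pooled = treated + untreated  # new list, so the inputs are not mutated
    hits = 0
    for _ in range(n_shuffles):
        # Reassign the group labels at random and recompute the difference.
        rng.shuffle(pooled)
        fake_treated = pooled[:len(treated)]
        fake_untreated = pooled[len(treated):]
        if mean(fake_untreated) - mean(fake_treated) >= observed:
            hits += 1
    return observed, hits / n_shuffles


observed_diff, p_value = permutation_p_value(trained, control)
print(f"mean reduction: {observed_diff:.2f} replacements per driver")
print(f"permutation p-value: {p_value:.4f}")
```

With only a handful of drivers per group the p-value will bounce around a lot, which is the enrollment concern in practice. The mean reduction, multiplied by the cost of a brake job, is the figure to set against the cost of running the course.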
The repeated problem scenario is trickier. Approaches like those described above may help, or may not. For instance, if the problem is sufficiently heterogeneous, fixing it may be more expensive than the problem itself. If I implemented collection of stylized data, I would consider using matching to find possible causes. To match a problem driver’s record, find the most similar record among non-problem drivers. By matching a few problem drivers and examining, one pair at a time, how each differs from its match, patterns may become visible that suggest causes.
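A bare-bones sketch of that matching step follows, assuming each driver's stylized data has already been summarized into a few rates per 1,000 miles. The feature names, the numbers, and the choice of a scaled Euclidean distance are all assumptions for illustration.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical per-driver summaries of stylized events, per 1,000 miles.
FEATURES = ("hard_brakes", "high_rpm_at_rest", "downhill_brake_minutes")

problem_drivers = {
    "p1": (9.0, 1.0, 4.0),
    "p2": (3.0, 6.0, 2.0),
}
other_drivers = {
    "o1": (2.0, 1.0, 4.0),
    "o2": (3.0, 1.0, 2.0),
    "o3": (8.0, 5.0, 9.0),
}


def closest_match(record, candidates, scales):
    """Return the id of the candidate nearest to record in scaled distance."""
    def distance(other):
        return sqrt(sum(((a - b) / s) ** 2
                        for a, b, s in zip(record, other, scales)))
    return min(candidates, key=lambda cid: distance(candidates[cid]))


all_records = list(problem_drivers.values()) + list(other_drivers.values())
# Scale each feature by its spread so no single unit dominates the distance.
scales = [pstdev(column) for column in zip(*all_records)]

for pid, record in problem_drivers.items():
    match_id = closest_match(record, other_drivers, scales)
    diffs = {name: a - b
             for name, a, b in zip(FEATURES, record, other_drivers[match_id])}
    print(pid, "matched to", match_id, diffs)
```

The interesting part is the printed differences: a feature that stays large across several problem drivers' matches is a candidate cause worth a closer look.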
And that’s how I’d investigate the problem described, and where I would bring in Tin Can to do so. I’m going to reuse techniques from this post again and again: stylized performance data, matching, histograms, checklists, and so forth. What makes Tin Can so important is that many applications of these techniques will be able to build with reusable components that may be composed in other situations to solve other problems.
*I’ve only talked about fairly simple graphs for getting at this information, but in reality there are much more powerful toolkits for doing so. They are also, however, more complicated and, for many problems people care about, often unnecessary. Even experienced practitioners will often see things in simpler graphics that they miss in more complicated models. If you’re interested in learning more, I’m talking about regression. In a situation like this, if I felt the need, I’d probably start with a simple regression using locale and miles driven, then if that didn’t work do a multilevel regression including those and driver. If I had a large number of variables, I’d consider a regression tree.