Leaders are always looking for ways to make their data science teams better; however, they rarely go to the extent of defining exactly what "better" means.

Seemingly every minute, there's a new piece of advice on how to improve your team, coming from an endless supply of team-building gurus (including yours truly). For instance, I frequently write about the benefits of pair programming — I never miss an opportunity to debunk the myth that it's only half as efficient. But how do you know the actual difference in efficiency if you don't measure it?

There are two critical success factors that should always be measured and monitored on your data science team: effectiveness and efficiency. In this column, I’ll offer advice on how to measure efficiency.

Focus on the features developed

Efficiency is the rate at which your data science team can produce solutions. Efficiency works hand in hand with effectiveness, which speaks to the quality of the solution from the end users' perspective. The ideal state is for your data science team to produce highly effective solutions with a high degree of efficiency. Higher efficiency translates to quicker problem solving and more flexibility as you navigate your strategic challenges. However, you can't manage what you can't measure, and you can't measure the undefined.

The trickiest part of measuring efficiency is figuring out your unit of measurement. I suggest you use a feature-based approach and develop an explicit definition of a feature. This concept was inspired by my work with Feature-Driven Development (FDD), an agile software development methodology that breaks a solution into many small features. I like feature-based measurement systems because they inherently reflect the end users’ interests. A feature is something an end user will use, so it’s a good basis for gauging how quickly something should be produced. Compare this with lines of code, which means nothing to an end user.

A feature, then, is something an end user can actually benefit from, though we must be more specific about its scope and scale. A collection of features is just another, larger feature, and when you put all the features together, you have your entire solution. So how large or small should a feature be? I suggest you go as small as possible. I call this fine-grained discrimination. I challenge you to decompose your scope into as many tiny feature particles as possible. Not only will this increase the resolution of your measurement, but it will also help standardize the time it takes to develop a feature. Development of a new analytic engine could take weeks or years, while coding the input layer of a neural network should take between one and four hours.

The science of measuring features

It's important for you and your team to develop a sense for how big a feature should be. The ideal timeframe to build any feature is about two hours — a single coding session — so any data scientist should be able to build about four features a day. If you achieve feature decomposition to this level, you can gauge progress on a daily basis, which is a best practice that I strongly recommend. And if you prioritize feature development properly, you'll get at least one or two of the most important features out of every data scientist, every single day. That's not bad in my book.

Once you develop your definition of a feature, the rest is easy — just track how many features your data science team is putting out in a day, a week, a month, and a quarter. And even though the development of a feature is a binary activity (i.e., either it's developed or it's not), because the discrimination is fine-grained, you can treat this measurement like a continuous variable. This will give you better options for measurement analysis and control, thereby increasing the quality of your measurement.
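As a sketch of what that tracking can look like — assuming a hypothetical log of completed features, each tagged with who built it and when — a few lines of standard-library Python are enough to roll counts up by day and by week:

```python
from collections import Counter
from datetime import date

# Hypothetical completion log: one (data scientist, completion date)
# entry per finished feature. In practice this might come from your
# ticket tracker or version control history.
completed = [
    ("ana", date(2024, 3, 4)),
    ("ana", date(2024, 3, 4)),
    ("ben", date(2024, 3, 5)),
    ("ben", date(2024, 3, 12)),
]

# Features per ISO (year, week) -- the weekly throughput number.
per_week = Counter(d.isocalendar()[:2] for _, d in completed)

# Features per calendar day -- for gauging daily progress.
per_day = Counter(d for _, d in completed)

print(per_week)                    # weekly feature counts
print(per_day[date(2024, 3, 4)])   # -> 2
```

The same counters extend naturally to monthly and quarterly rollups by changing the grouping key.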

For example, instead of just experimenting with a concept like pair programming, run a controlled study. You would start by baselining your team's performance against the number of features developed per week. A team of six data scientists might average 120 features per week, with a standard deviation of 10 features. That means, in any given week, there's roughly a 95% chance you'll get between 100 and 140 features developed. Now, introduce pair programming and see what happens. After six months, you'll have 26 weeks of data. Many people would assume that your efficiency will be halved to 60 features per week, but based on my experience, I can assure you that won't happen. My guess is that your efficiency will drop only slightly, to around 105 features per week, while your spread tightens to about 5 features and your effectiveness (measured separately) skyrockets. That's just me, though — it's best to find out for yourself. Now you know how.
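The baselining arithmetic above is simple enough to automate. This is a minimal sketch, using made-up weekly counts and only Python's standard library; the approximate 95% interval comes from a normal assumption (mean ± 1.96 standard deviations), which is where the "between 100 and 140" figure comes from:

```python
import statistics

# Hypothetical baseline: features completed per week by a six-person
# team over ten weeks of observation.
baseline = [118, 125, 112, 130, 121, 114, 127, 119, 116, 123]

mean = statistics.mean(baseline)
sd = statistics.stdev(baseline)

# Approximate 95% interval under a normal assumption.
low, high = mean - 1.96 * sd, mean + 1.96 * sd
print(f"baseline: {mean:.1f} +/- {sd:.1f} features/week "
      f"(95% interval roughly {low:.0f}-{high:.0f})")

# After introducing the change (e.g., pair programming), compare each
# new week against the baseline interval and flag unusual weeks.
trial_week = 104
outside = not (low <= trial_week <= high)
print(f"week of {trial_week} features outside baseline? {outside}")
```

With 26 weeks of post-change data, the same two summary statistics — mean and standard deviation — let you compare the before and after distributions directly rather than guessing.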


Once you’ve figured out how to produce effective solutions, it’s time to drive efficiency. Overcoming the challenge of measuring efficiency is a worthwhile pursuit, and I’ve just given you my best pointers for pulling this off.

Work hard with your data science team to develop a set of features that are microscopic and consistent in development time. Then put a simple data collection and control plan in place, and you’re on the road to fast and furious. Slow and steady may win the race, but fast and steady wins the game.