How to get your big data team to implement functional code every two weeks

For most data science teams, the idea of delivering functional code within two weeks seems impossible. Discover why and how a two-week deadline for code is feasible.

Typically, when I challenge data science teams to build functional code in two-week iterations, they think I'm crazy. When most teams embark on a project to create a new data solution, the best they can imagine accomplishing in the first two weeks is a high-level understanding of the requirement. So, the notion of having functional code that end users can use after two weeks is unthinkable. How can you possibly combine advanced mathematics and/or artificial intelligence, a high-volume persistence system, data visualization, and a user interface into a usable system in just two weeks?

Well, as I like to say, "it's only impossible until it's not."

To be fair, there are qualifications to this assertion that are worth noting.

First, you must be prepared to launch into a project with two-week iterations. If you're not already set up for success, you'll need to invest some time into an Iteration Zero.

Second, although you'll deliver functional code that's useful to end users, the scope of the deliverable (especially the first deliverable) is very limited. Although the users certainly have the option of stopping after just two weeks of development, that's not the intent.

Finally, you must have an ecosystem of rules and practices, like those found in Extreme Programming, that insulates the two-week iteration from the risks inherent in this style of development.

All that said, even with the right environment in place, I still find that data professionals have a hard time conceiving of a two-week iteration. So, I developed a structured method to help data science teams get started with this initially difficult practice--it's simply called the 4-3-2 method.

Getting started

The 4-3-2 method consists of four days of testing, followed by three days of development, followed by two days of refactoring. Those nine days, plus a one-day iteration-planning meeting with the end user(s) at the start, fill the ten working days of a two-week iteration. I understand that this method, on the surface, makes the idea of a successful two-week iteration look even more impractical than when we started. However, stick with me and give it a try.
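
To make the calendar concrete, here is a minimal sketch that lays the planning day and the 4-3-2 phases onto ten working days. The function name `build_schedule`, the `PHASES` table, and the start date are illustrative assumptions, not part of the method itself, and the sketch ignores holidays.

```python
from datetime import date, timedelta

# Phase lengths in working days: one planning day plus 4-3-2 gives ten days.
PHASES = [("planning", 1), ("testing", 4), ("development", 3), ("refactoring", 2)]


def build_schedule(start: date) -> list[tuple[date, str]]:
    """Assign each working day of the iteration to a phase, skipping weekends."""
    schedule = []
    day = start
    for phase, length in PHASES:
        assigned = 0
        while assigned < length:
            if day.weekday() < 5:  # Monday is 0, Friday is 4
                schedule.append((day, phase))
                assigned += 1
            day += timedelta(days=1)
    return schedule


if __name__ == "__main__":
    # Hypothetical Monday start date, chosen only for illustration.
    for day, phase in build_schedule(date(2024, 1, 8)):
        print(day.strftime("%a %Y-%m-%d"), phase)
```

With a Monday start, the rest of the first week goes to testing, development runs Monday through Wednesday of the second week, and refactoring takes the final Thursday and Friday.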

At the iteration-planning meeting, although it's good to have the perspective of multiple end users, I find it's best if you have one gold user who can speak for all of them; this keeps you from losing valuable meeting time while end users argue with each other about the requirement.

There are a few important outcomes from this meeting, the most important being the scope for the iteration. The scope must be very small: a simple report, data transformation, or mathematical function that will make the users' lives a little easier. The users will want a lot of reports and functions, but pick the one that will benefit them most right now (or in two weeks) and that the data science team knows it can build in two weeks. Then pick one to three more functions and rank them in priority order; your team will definitely hit the top priority, and will probably hit one or more of the others. A full eight-hour day will give you enough time to understand the big picture, understand the small picture (i.e., the scope for this iteration), and set proper expectations. Get a good night's sleep, because tomorrow will kick off an intense nine days of development.

A look inside the two-week iteration

Okay, it's time to get started. If you hold the iteration-planning meeting on a Monday (recommended), then the balance of the week will be spent writing tests. That's right, writing tests--no production code.

The 4-3-2 method follows a test-driven development (TDD) approach, albeit one that differs from how TDD is normally practiced. Typically, TDD alternates quickly between building tests and building production code. You build a small test that will fail (because there's no production code); you write production code so the test will pass; and you repeat this process until you have a full suite of tests and a good base of production code.
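
For readers who haven't seen the conventional cycle, here is a minimal sketch of one pass through it. The function `percent_change` and its expected values are hypothetical, invented only to illustrate the order of work: the tests are written first and fail until the production code below them exists.

```python
# Step 1 (red): the tests come first and fail while percent_change is missing.
def test_increase():
    assert abs(percent_change(200.0, 250.0) - 25.0) < 1e-9


def test_zero_baseline_is_rejected():
    try:
        percent_change(0.0, 10.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero baseline")


# Step 2 (green): just enough production code to make both tests pass.
def percent_change(old: float, new: float) -> float:
    if old == 0:
        raise ValueError("baseline must be non-zero")
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Step 3: rerun the tests, then repeat the cycle with the next small test.
    for test in (test_increase, test_zero_baseline_is_rejected):
        test()
    print("all tests pass")
```

In practice a test runner such as pytest or unittest would collect and run these functions; the plain loop here just keeps the sketch dependency-free.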

In the 4-3-2 method, you spend four solid days writing tests: user tests, system tests, and some unit tests. Although it may seem odd to spend 40% of your iteration time writing tests, it will force you and the end users to become crystal clear on the requirement. And it will put you in a great position to start developing production code.
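
As a rough illustration of what the end of the four test days might look like, the sketch below defines one unit test, one system-style test, and one user test against a stub that has no implementation yet. Everything here is an assumption made for the example: the `weekly_sales_report` function, the region and amount data, and the tiny `run_suite` helper. The three levels are also simplified; real system and user tests would exercise the full pipeline.

```python
def weekly_sales_report(rows):
    """Stub only: no production code is written during the four test days."""
    raise NotImplementedError("written during the three development days")


def unit_test_totals_one_region():
    assert weekly_sales_report([("east", 10.0), ("east", 5.0)]) == {"east": 15.0}


def system_test_handles_empty_extract():
    # An empty upstream extract should yield an empty report, not a crash.
    assert weekly_sales_report([]) == {}


def user_test_gold_user_example():
    # Made-up figures standing in for an example agreed with the gold user.
    rows = [("east", 10.0), ("west", 7.5), ("east", 2.5)]
    assert weekly_sales_report(rows) == {"east": 12.5, "west": 7.5}


TESTS = [
    unit_test_totals_one_region,
    system_test_handles_empty_extract,
    user_test_gold_user_example,
]


def run_suite(tests):
    """Run each test and record whether it passed."""
    results = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = True
        except (AssertionError, NotImplementedError):
            results[test.__name__] = False
    return results


if __name__ == "__main__":
    for name, passed in run_suite(TESTS).items():
        print(name, "PASS" if passed else "FAIL")
```

Every test fails at this stage, which is expected: the suite records the agreed requirement, and the failures become the to-do list for development.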

With only one week to go, it's time to start writing production code. You'll be amazed at how much easier it is to write production code when you have a robust set of tests developed. A large part of the time that it takes to build a data solution is spent in ambiguity. With the tests written first, that uncertainty is gone; it was all worked out while writing the tests. The only job for the data professionals now, over the next three days, is to make the tests pass as quickly as possible. Don't worry yet about how the code looks; we'll deal with that next.

The last two days of the week are spent refactoring. At this point, you've already met the end users' requirement, and you have functional code that they can use. These last two days are for the data professionals--this is when you make the code look good. This is where you clean up your variables, remove duplication, and apply all those other good code design practices that we're all aware of. These two days should be easy on the team, quite unlike the typical crunch time before a deliverable is due.
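
The sketch below shows the kind of change these two days are for. Both functions are hypothetical: `report_quick` stands in for code written quickly to make the tests pass, and `report_refactored` is the cleaned-up version. The point is that the same checks pass before and after, so the cleanup carries little risk.

```python
from collections import defaultdict


def report_quick(rows):
    # Development-phase version: passes the tests, but terse and repetitive.
    d = {}
    for r in rows:
        if r[0] in d:
            d[r[0]] = d[r[0]] + r[1]
        else:
            d[r[0]] = r[1]
    return d


def report_refactored(rows):
    """Refactored version: same behavior, clearer names, no duplicated lookups."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def check(report):
    """The tests from earlier in the iteration, reduced to two checks."""
    assert report([]) == {}
    rows = [("east", 10.0), ("west", 7.5), ("east", 2.5)]
    assert report(rows) == {"east": 12.5, "west": 7.5}


if __name__ == "__main__":
    check(report_quick)
    check(report_refactored)
    print("both versions pass the same checks")
```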

Conclusion

For most data science teams, the two-week iteration seems like a unicorn. I can assure you that it's very real and doable, but like anything else, you must have a good environment, a disciplined team, and a good method of implementation. I've shared one such approach: the 4-3-2 method.

After a one-day iteration-planning meeting, spend four days writing tests, three days writing code, and then two days cleaning up the code.

The end users shouldn't expect to get much in two weeks of development, but they can get something that will make their lives a bit easier. And more importantly, you'll get valuable feedback from the end users on the viability of the data solution--even if it's only a small part.

If you're stuck trying to get functional code to end users in two weeks, try the 4-3-2 method. You have nothing to lose and everything to gain.
