
Big data is a big reason government IT is embracing Amazon Web Services

Blue Canopy has found a way to make big data projects work for the public sector...and the private sector should listen in.


In the private sector, big data initiatives routinely take much longer to roll out than the business initially expects. Even so, enterprises persevere because the anticipated payoff is significant enough.

As bad as the private sector is, however, the public sector puts it to shame, thanks to its notoriously slow decision making and nightmare procurement processes. One company, Blue Canopy, may have an answer for both public and private big data deployments: a thoroughly modern stack built on public cloud infrastructure, data visualization, and analytics tools.

SEE: CIOs keep trying to defy cloud gravity (TechRepublic)

Founded in 2001 as a woman-owned small business, Blue Canopy provides information technology and cybersecurity solutions to US government and commercial enterprises, with 450 technologists spread across two dozen large-scale engagements at any given time. Its current work includes a $39 million contract to build a fraud-fighting cloud for the SEC, a role in the Defense Intelligence Agency's $6 billion effort to build a common cloud-based IT platform for military and intelligence agencies, and a social media analytics solution that helps the intelligence community monitor national security and emerging threats.

I recently spoke with Jim McGinn, chief technologist at Blue Canopy, to see how big data projects can succeed even as they play by sometimes cumbersome government rules.

Making government IT work

TechRepublic: It's hard enough to rally the forces required to launch a big data initiative in the private sector. How do you help your clients succeed in government?

McGinn: Historically, the waits have been long. An organization might spend a year, or more, just getting a project into position—developing a data model to understand the data, designing databases, loading and indexing data, and all the rest. After all that time, users still haven't seen anything that helps them do their jobs or improve their organization.

And remember that, for the most part, government agencies are used to building IT solutions in on-premises data centers. Getting and configuring space, servers, storage, and an environment for a large initiative distracts organizations from what they really want to do—focus on their organizational missions.

We like to act as disruptive innovators, so we can help organizations react more quickly to changes—both in their missions, and in technology. The imperative is to deliver new capability quickly, and we help them do that.

Getting started in the land of bureaucracy

TechRepublic: What is the most important first step in a government big data project?

McGinn: The first thing you do is show how easy it is to get access to cloud-based compute and storage environments. Those, together with an array of analytic and visualization tools, take much of the waiting and frustration out of big data initiatives.

Historically, our clients wanted to do data visualization, but they didn't understand their data. We didn't want to spend a year or more building data models—we wanted business users to start leveraging data as soon as possible and to develop their understanding that way. That's agile.

To do this, we've created a cloud-based ecosystem with flexible, scalable storage and compute capability, plus a range of data management and analytic tools. The appeal of Zoomdata on AWS is how quickly it, along with the other parts of the ecosystem, lets us do that.
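To make the ecosystem idea concrete, here is a minimal sketch of how little setup cloud storage and compute require on AWS, using the boto3 Python SDK. The bucket name, AMI ID, and instance size are hypothetical placeholders; this illustrates the general pattern, not Blue Canopy's actual tooling.

# Minimal sketch: stand up object storage and a compute node on AWS with boto3.
# All names (bucket, AMI, instance type) are hypothetical placeholders.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# An S3 bucket is ready to receive data as soon as this call returns; there is no capacity to size.
s3.create_bucket(Bucket="example-analytics-landing-zone")

# A single analytics host, launched in minutes rather than through a data-center build-out.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="m5.xlarge",
    MinCount=1,
    MaxCount=1,
)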

TechRepublic: How quickly can you get started?

McGinn: In 30-60 days, we can have data loaded in our cloud-based ecosystem, with quality visualizations on top of it. Users can immediately get value from the data, and we can engage with them to further define their analytic requirements. First, Zoomdata lets us tap a wide range of data sources.

Second, the data doesn't have to be moved or reformatted, and data can be combined across sources and types. We're not stuck in massive ETL jobs just to start a project.
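One way to picture "no massive ETL just to start" is AWS's query-in-place tooling. The article doesn't say which query layer Blue Canopy uses, so treat the following as a hedged sketch: it assumes Amazon Athena with a hypothetical external table already defined over raw files sitting in S3.

# Illustrative only: query raw files in S3 without moving or reformatting them first.
# Amazon Athena is an assumption here; database, table, and bucket names are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

athena.start_query_execution(
    QueryString="""
        SELECT agency, COUNT(*) AS filings
        FROM raw_filings  -- external table mapped onto files in S3
        GROUP BY agency
        ORDER BY filings DESC
    """,
    QueryExecutionContext={"Database": "landing_zone"},
    ResultConfiguration={"OutputLocation": "s3://example-analytics-results/"},
)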

Building the Amazon way

TechRepublic: Why does Blue Canopy prefer cloud-based solutions, and especially on AWS?

McGinn: Some Blue Canopy engagements still employ on-premises solutions, because clients need them or want to leverage existing investments. In some cases, they've already spent millions of dollars, and months of time, getting licenses and pricing worked out, more time getting software installed in data centers, and then still more getting data loaded and connected.

With the cloud, I can do that in a matter of minutes. So we're doing analytics for our clients, right away, instead of creating infrastructure that, by itself, has no value.

SEE: Labor costs can make up 50% of public cloud migration, is it worth it? (TechRepublic)

There's also a user-focused angle. Google, Facebook, and so many other consumer-oriented companies have changed user expectations, permanently. To gain user adoption nowadays, you must be agile, you must focus on delivering user capability, and you must focus on the user experience—not on IT infrastructure.

And being nimble means not being locked into a specific technology. That's why we take an ecosystem approach, enabling organizations to bring the right tools to bear when and where appropriate. For us, Amazon has the most mature and capable cloud environment. It lets us get things done quickly for our clients—and that's what we want to do.

There was a time when, on a new solution, we might choose to build a data warehouse on an Oracle database with the MicroStrategy analytics tool. But today, maybe we won't even need a relational database. If we can bring data in and store it in Amazon Simple Storage Service (Amazon S3) in an hour, and users can start asking analytic questions immediately...why go old school? Do we need 500 GB of storage today? We get it instantly, and load the data right away. Next week, another petabyte to ingest? No delay. I just change my AWS configuration and I have the storage, all provisioned.

This kind of elastic scalability—for both storage and performance—is an unparalleled combination.
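For a sense of what "just change my AWS configuration" can look like, here is a short hedged sketch. It shows two common patterns: object storage in S3 grows with every upload and needs no pre-allocated capacity, and an attached EBS block volume can be resized online with a single API call. The file, bucket, and volume names are placeholders, not details from Blue Canopy's environment.

# Sketch of elastic storage on AWS; all identifiers are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

# S3: no provisioning step at all. Each upload simply grows what is stored.
s3.upload_file(
    "new_batch.parquet",
    "example-analytics-landing-zone",
    "ingest/batch-042/new_batch.parquet",
)

# Block storage: grow an attached EBS volume in place, here to 1 TB, without downtime.
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=1000)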


About Matt Asay

Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.

