Are you a big data laggard? Here's how to catch up

If your company has fallen behind in big data management, automation can help it catch up.

Not all companies have the resources to develop their own big data management systems, hire the staff to manage them, and glean all the information they can from their data. When large amounts of data keep coming in, that shortfall can lead to massive data management challenges.

"There are companies like Netflix and Twitter that immediately understood the value of big data and that had the resources needed to develop big data staffs and applications at the onset of the big data movement," said Monte Zweben, CEO of Splice Machine, a company that provides a SQL-database platform that's specifically tailored for accelerated big data modeling and deployment. "Unfortunately, most of the Fortune 2000 companies couldn't compete with these efforts, so they fell behind. They ended up with 'data silos,' like having one data science group whose work everyone in the company was competing for."

The companies that are behind in the big data race already know who they are. So, is there a way to catch up?

Zweben and others believe that a convergence of database technology and machine learning will give companies that are behind in their big data initiatives a chance to catch up, because companies will no longer need the raw talent and expertise that have been required to develop and deploy big data applications quickly.

How to catch up on your big data projects

Most companies actively working with big data use a complex, step-by-step process. It begins with taking data in from outside sources through a message broker. The ingested data then passes through an extract, transform, and load (ETL) tool, which formats the data so an operational database can accept it and then loads it into that database. Finally, artificial intelligence and machine learning are applied through an array of analytical tools so business insights can be derived from the data.
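To make those stages concrete, here is a minimal Python sketch of the same flow. Everything in it is an assumption made for illustration: an in-memory queue stands in for the message broker, SQLite stands in for the operational database, a simple average stands in for the AI and machine learning step, and the function names and the `customer` and `amount` fields are hypothetical. It is not how any particular vendor builds its stack.

```python
import json
import queue
import sqlite3
import statistics


def ingest(broker, raw_messages):
    """Stage 1: publish raw JSON messages to the broker stand-in."""
    for msg in raw_messages:
        broker.put(msg)


def extract_transform(broker):
    """Stage 2 (extract, transform): drain the broker and normalize records."""
    rows = []
    while not broker.empty():
        record = json.loads(broker.get())
        # Format each field so the database will accept the row.
        customer = record["customer"].strip().lower()
        amount = float(record["amount"])
        rows.append((customer, amount))
    return rows


def load(conn, rows):
    """Stage 2 (load): write formatted rows into the operational database."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()


def analyze(conn):
    """Stage 3: placeholder for analytics; returns the mean order amount."""
    amounts = [row[0] for row in conn.execute("SELECT amount FROM orders")]
    return statistics.mean(amounts)


def run_pipeline(raw_messages):
    """Run all three stages end to end on a list of raw JSON strings."""
    broker = queue.Queue()              # hypothetical stand-in for a message broker
    conn = sqlite3.connect(":memory:")  # hypothetical stand-in for the operational DB
    ingest(broker, raw_messages)
    load(conn, extract_transform(broker))
    return analyze(conn)


if __name__ == "__main__":
    sample = [
        '{"customer": " Acme ", "amount": "10.5"}',
        '{"customer": "Globex", "amount": "20"}',
    ]
    print(run_pipeline(sample))
```

Even in this toy version, each stage has its own interface and its own failure modes, which hints at why running a separate production tool for every stage demands so much expertise.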

"At each stage, you need to learn how the tools work under the hood, become an expert on each, manually instrument each, choose dozens of infrastructure components and configurations, and go through painfully slow iterations to develop, debug, and productionize the complete stack," said Zweben.

According to Zweben, the key to expediting big data expertise and applications is twofold: simplify this process with automation and technology convergence that reduce the amount of hand coding IT needs to do, and provide a process in which IT, data scientists, and business users can all collaborate.

This is done by automating many of the processes that IT does by hand today. Simplification also relies on being able to provision many different big data and analytics sandboxes for user groups throughout the organization with a single command that automatically allocates storage and processing for each sandbox, and that can just as easily take these sandboxes down when work is complete.

A single command can also take a big data model that has been developed and tested and place it into production when it is ready.
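As a rough illustration of what "a single command" might look like from the user's side, the Python sketch below models provisioning, teardown, and promotion as one call each. The `SandboxManager` class, its method names, and the default storage and CPU figures are all hypothetical; a real platform would allocate actual storage and compute rather than update dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """A hypothetical per-team sandbox with its allocated resources."""
    team: str
    storage_gb: int
    cpu_cores: int
    models: dict = field(default_factory=dict)


class SandboxManager:
    """Illustrative manager: each operation is a single call."""

    def __init__(self, default_storage_gb=100, default_cpu_cores=4):
        self.default_storage_gb = default_storage_gb
        self.default_cpu_cores = default_cpu_cores
        self.sandboxes = {}   # team name -> Sandbox
        self.production = {}  # model name -> model artifact

    def provision(self, team):
        """Allocate storage and processing for a new team sandbox."""
        if team in self.sandboxes:
            raise ValueError(f"sandbox for {team!r} already exists")
        sandbox = Sandbox(team, self.default_storage_gb, self.default_cpu_cores)
        self.sandboxes[team] = sandbox
        return sandbox

    def teardown(self, team):
        """Release a sandbox once the team's work is complete."""
        return self.sandboxes.pop(team)

    def promote(self, team, model_name):
        """Place a developed and tested model into production."""
        model = self.sandboxes[team].models[model_name]
        self.production[model_name] = model
        return model


if __name__ == "__main__":
    manager = SandboxManager()
    sandbox = manager.provision("marketing")
    sandbox.models["churn"] = "churn-model-v1"  # placeholder model artifact
    manager.promote("marketing", "churn")
    manager.teardown("marketing")
    print(sorted(manager.production))
```

The design point is that the user names a team or a model and the system decides the rest, which is the kind of hand work the article describes being automated away.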

As a result, companies that have fallen behind with their big data projects can use a more streamlined and simplified method of deploying big data applications, one that can help them catch up to the rest of the market.

Why catching up on big data projects is important

"In the process, you can break down the functional and data silos and facilitate a more broad-based big data approach to big data in the entire company," Zweben said. "This happens when users, IT, and data scientists can concurrently and actively collaborate with each other."

This makes sense: the less big data architecture IT has to build from scratch every time someone in the organization wants to develop a new data model, the less work there is and the faster the time to market for new business insights derived from big data. That's good news for companies trying to catch up.
