The job of a data analyst is much like that of a gourmet chef. They both spend hours preparing the individual components of their respective creations, only to cook them in a matter of minutes. The thankless part of the job is the prep work. A data analysis will be judged only on its findings, not on the work that went into it.
"The cleaning of the data is extraordinarily unglamorous," said Mike Wieners, a marketing analyst at Alight Analytics.
Paxata, a Redwood City big data startup, is cutting the fat in this process. According to research done by Ventana Research (albeit sponsored by Paxata), analysts spend 40 to 60 percent of their time preparing data for analytics. That means that data analysts are spending about half of their time just getting ready to analyze data, and Paxata is hoping to disrupt and streamline that workflow.
The company offers a cloud-based service that prepares raw data in minutes, based on the answer set the analyst requires. Analysts input data sets from different sources and the company's proprietary IntelliFusion engine uses algorithms to find relationships within the data so that analysts can quickly cross-reference and make connections between data sets locked in separate silos. The engine is based on the machine learning branch of artificial intelligence, which proposes that machines can learn to distinguish and categorize data based on past experience.
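To give a rough sense of what "finding relationships within the data" can mean, here is a minimal sketch that scores pairs of columns by how many values they share. It is not a description of IntelliFusion, whose internals are proprietary; the function names, the sample columns, and the 0.5 threshold are all assumptions made up for illustration.

```python
def jaccard(a, b):
    """Share of distinct values two columns have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_links(left, right, threshold=0.5):
    """Return (left_column, right_column, score) for column pairs that overlap enough."""
    suggestions = []
    for left_name, left_values in left.items():
        for right_name, right_values in right.items():
            score = jaccard(left_values, right_values)
            if score >= threshold:
                suggestions.append((left_name, right_name, score))
    # Strongest candidate links first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical data sets from two separate silos, stored column by column.
sales = {"cust_id": ["C1", "C2", "C3"], "state": ["CA", "NY", "TX"]}
support = {"customer": ["C2", "C3", "C4"], "priority": ["high", "low", "low"]}

links = suggest_links(sales, support)
print(links)
```

In this toy case the only pair that clears the threshold is `cust_id` and `customer`, even though the two columns carry different labels.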
The company came out of stealth mode in October 2013, after working to quietly build up a customer base and establish an ecosystem with existing services.
"We founded Paxata with the goal of being the first self-service data preparation platform built for the business analyst. We agreed that we didn't just want to create a solution and go find people to buy it. Instead, we wanted market validation to be the foundation of what we would build," said CEO Prakash Nanduri.
Accel and Walden Riverwood have invested in Paxata, and the company already has some big-name customers, including Box, UBS, Dannon, and Pabst, using its service. So, there is definitely value in making data preparation less painful.
Connecting the dots
Innovations in cloud and big data systems mean that companies now have the ability to store and quickly access tons of data. The problem is that, although we now have the means to analyze non-linear sets of data, differing labels and descriptions make it difficult to prepare that data and find commonalities.
When you combine research data, public data, data from third-party suppliers, and corporate data from systems like Salesforce.com or Siebel, you end up dealing with mismatched data sets that require extra work to collate. Even in the year 2014, some data analysts are still working by hand to combine relevant data in a spreadsheet to prepare it for analysis.
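The kind of manual collation described above can be sketched in a few lines. This is only an illustration of the problem, not Paxata's approach: the field names, the sample rows, and the `normalize` and `join_on_company` helpers are hypothetical.

```python
import re

# Legal suffixes to ignore when comparing company names (an assumed, partial list).
SUFFIXES = {"inc", "corp", "llc", "ltd"}


def normalize(name: str) -> str:
    """Lowercase a name, strip punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)


def join_on_company(crm, supplier):
    """Attach supplier spend to CRM rows whose normalized names match."""
    by_key = {normalize(row["company"]): row for row in supplier}
    matched, unmatched = [], []
    for row in crm:
        hit = by_key.get(normalize(row["Account Name"]))
        if hit is not None:
            matched.append({**row, "annual_spend": hit["annual_spend"]})
        else:
            unmatched.append(row)  # left for a human to review
    return matched, unmatched


# Hypothetical CRM export and third-party supplier file with different labels.
crm_rows = [
    {"Account Name": "Acme, Inc.", "Region": "West"},
    {"Account Name": "Globex Corp", "Region": "East"},
]
supplier_rows = [
    {"company": "ACME INC", "annual_spend": 120000},
    {"company": "Initech LLC", "annual_spend": 45000},
]

matched, unmatched = join_on_company(crm_rows, supplier_rows)
print(matched)
print(unmatched)
```

Even this small example needs a hand-written rule for every naming quirk, which is why the work eats up so much of an analyst's time as the number of sources grows.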
Paxata's system lets you dump all of your data into it so it can connect the dots between data from different sources. If decisions need to be made regarding the data, Paxata highlights them so the analyst can get the answer set needed for the business intelligence (BI) tool of their choice. Nanduri explains that, instead of taking an all-in-one approach, Paxata lets analysts choose which tool to use.
"Unlike other vendors that have taken an all-in-one approach, Paxata believes in being part of an ecosystem of next generation companies such as Tableau, Qlik, and Cloudera. This gives our customers more flexibility and allows us to focus on what we do best and leave the rest to our partners," Nanduri said.
The Paxata infrastructure is built on OpenStack technology and runs in Rackspace's SSAE 16 certified data centers. The user experience was built in HTML5 to power a responsive design that scales across all devices. According to co-founder Nenshad Bardoliwalla, Paxata also offers time-stamping and versioning of all data modifications at a tenant, user and cell level; and it can be deployed in VMware vCloud environments.
Conversations around cloud-based companies, especially those that deal with company data, always come back to security. According to Cari Jaquet, Paxata's VP of Marketing, the company addresses security with LiveHistory, a feature that "lets everyone see—step-by-step—what was done to data sets, and either play it back for future use or rollback if they need to make adjustments along the way based on emergent governance. And since you can apply the same steps or modify them quickly if something about the data or requirement changes, you are not manually handcrafting in a constant 'start from scratch' mode."
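The idea of recording each step, playing it back, and rolling it back can be sketched as a simple step log. This is a hedged illustration of the general pattern, not how LiveHistory is implemented; the `StepHistory` class, its methods, and the sample rows are assumptions invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StepHistory:
    """Hypothetical log of data-preparation steps: (timestamp, description, function)."""
    steps: list = field(default_factory=list)

    def record(self, description, fn):
        """Remember a step and when it was added."""
        self.steps.append((datetime.now(timezone.utc), description, fn))

    def replay(self, rows):
        """Apply every recorded step, in order, to a copy of the raw rows."""
        result = [dict(r) for r in rows]
        for _, _, fn in self.steps:
            result = fn(result)
        return result

    def rollback(self, count=1):
        """Forget the most recent `count` steps."""
        keep = max(len(self.steps) - count, 0)
        del self.steps[keep:]


# Hypothetical raw data with a stray-whitespace problem and a missing email.
raw = [
    {"name": " Ada ", "email": "ada@example.com"},
    {"name": "Bob", "email": ""},
]

history = StepHistory()
history.record("trim names", lambda rows: [{**r, "name": r["name"].strip()} for r in rows])
history.record("drop rows without email", lambda rows: [r for r in rows if r["email"]])

cleaned = history.replay(raw)    # both steps applied
history.rollback()               # undo the most recent step
restored = history.replay(raw)   # only the first step applied

for when, description, _ in history.steps:
    print(when.isoformat(), description)
```

Because the raw rows are never modified, the same recorded steps can be replayed against a refreshed data set instead of being redone by hand.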
The goal is to provide a place where IT and business can collaborate on data in real-time without having to adhere to a predetermined workflow. They have options for single analysts and businesses alike.
Making it work
The company made a strategic move when it came out of stealth mode at the Strata Conference and Hadoop World. Based on the existing customers of the data analytics software it can attach to, the Paxata team estimates that there are roughly 70,000 accounts where it can complement the visualization tools already being used by business analysts.
Paxata has a scalable pricing model that is composed of three different options: Pax Personal, Pax Share, and Pax Enterprise. For the lone wolf data analyst, Pax Personal offers unlimited projects and 1 GB of cloud storage for $3,500 a year.
The $10,000 a year Pax Share license gives business users increased collaboration with simultaneous editing, auditing, and traceability. The Pax Share license also gets you 5 GB of data, increased administration features, API access, and access to the Paxata library of datasets. The enterprise option gives you everything previously listed plus support SLAs and storage tailored for the client company's needs.
"Businesses, especially, have far fewer resources than they need for analytics," Wieners said. He added that a product like Paxata could help companies take back that power of quickly analyzing their data, allowing analysts to make more impactful insights, answer questions more quickly, and "illustrate their value much more quickly."
In discussing what makes a client company a good fit for Paxata, Jaquet mentioned three requirements: the company must be data-driven, the organization must see a partnership between IT and the business users, and the business must want to get the most out of its agile BI investments.
As big data continues to grow in the enterprise space, the need for data preparation and analytics tools will grow with it. Paxata is hoping to help companies cut time and costs by turning weeks and months of data preparation into a few minutes of algorithmic magic.
Conner Forrest has nothing to disclose. He doesn't hold investments in the technology companies he covers.
Conner Forrest is a Senior Editor for TechRepublic. He covers enterprise technology and is interested in the convergence of tech and culture.