
Why graph databases are so effective in analytics projects

Faster batch processing and spotting internal and external fraud are just two benefits enterprises are seeing from using graph databases for analytics.


Strictly speaking, a graph database is not a big data technology; it is a NoSQL database that, in a growing number of cases, is beginning to supplant traditional relational databases.

Nevertheless, graph databases are worth discussing in the context of big data and analytics for two reasons: they improve the ability to analyze complex data relationships, and they give organizations greater ability to move reporting into real time or near real time. Both trends also characterize the big data movement today. In a very real sense, the general shift of corporate reporting from a transactional view of data to one centered on relationships is likely an outcome of the corporate focus on big data and analytics.

"What makes a graph database so effective is its ability to be a highly intuitive data model and also to reflect how the world really operates by being able to find the relational connections between objects and data," said Ryan Boyd, head of developer relations North America for Neo4j, a graph database solutions provider.


Boyd said that companies are adopting graph databases for three reasons: the databases describe the world effectively and intuitively through the way they handle data; they can deliver much higher performance than traditional relational databases; and they are agile, so new and existing data models can be optimized with less work. Why is this?

"In a relational database, every JOIN statement requires the application to look at another index to another dataset," said Boyd. "We have enterprise clients that tell us that some of their SQL queries might require over 20 of these JOINs—and this can make data queries really slow. With a graph database, you find a logical starting point and you branch out from there and identify the relationships. For instance, you might write a query that asks, 'Find all of the friends of the friends of John.' Instead of having to JOIN many different indexes, the graph database uses pointer arithmetic that is in-memory or in cache and performs the operation." The result is less compute-intensive and faster processing.
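The contrast Boyd describes can be illustrated with a small sketch. It is a toy example in plain Python, not Neo4j's query language or its internal storage engine; the sample names, the `friendship` table, and the function names are all hypothetical. The first function starts at one person and follows stored relationships outward two hops. The second asks the same question of a relational table, where each extra hop means another JOIN.

```python
import sqlite3

# Hypothetical sample data: each pair is a mutual friendship.
FRIENDSHIPS = [
    ("John", "Ana"),
    ("John", "Raj"),
    ("Ana", "Mei"),
    ("Raj", "Mei"),
    ("Raj", "Tom"),
    ("Tom", "Zoe"),
]


def build_adjacency(pairs):
    """Map each person to the set of people directly connected to them."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def friends_of_friends(graph, start):
    """Start at one node and follow relationships outward two hops."""
    direct = graph.get(start, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    # Exclude the starting person and anyone who is already a direct friend.
    return result - direct - {start}


def friends_of_friends_sql(pairs, start):
    """Ask the same question relationally, using a self-JOIN on one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendship (person TEXT, friend TEXT)")
    # Store each friendship in both directions so lookups work either way.
    rows = [(a, b) for a, b in pairs] + [(b, a) for a, b in pairs]
    conn.executemany("INSERT INTO friendship VALUES (?, ?)", rows)
    query = """
        SELECT DISTINCT f2.friend
        FROM friendship AS f1
        JOIN friendship AS f2 ON f2.person = f1.friend
        WHERE f1.person = ?
          AND f2.friend <> ?
          AND f2.friend NOT IN (
              SELECT friend FROM friendship WHERE person = ?
          )
    """
    found = {row[0] for row in conn.execute(query, (start, start, start))}
    conn.close()
    return found


graph = build_adjacency(FRIENDSHIPS)
print(sorted(friends_of_friends(graph, "John")))  # ['Mei', 'Tom']
```

Both functions return the same people for this data. The difference is in how the work grows: reaching friends three or four hops away takes one more loop over neighbor sets in the first version, but one more JOIN per hop in the second.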

Boyd said that Neo4j has more than 200 enterprise clients using its graph database to explore more complex data relationships and associations and to bring more of their analytics into real-time processing. Even for organizations that rely primarily on batch analytics reporting (and most do), plugging into a graph database can dramatically shrink the batch-processing window. This is why the move to graph databases is an important bellwether for the future of analytics.

How are organizations putting graph databases to use?

"Financial services companies are using graph databases to assist them in discovering instances of both internal and external fraud," said Boyd. "In retail, companies are using the technology to help them with purchase recommendations for customers. In logistics, graph databases are being used to plan package routings — and in networking and IT, it is being used in root cause analysis."

Why graph databases should matter to data analysts

Even so, some might argue that the graph database is simply a NoSQL database alternative to traditional relational databases. Purists can also argue that graph databases focus more on transactional data, so they are technically not big data tools.

However, by advancing the case for real-time analytics and by making it practical to delve into highly complex data relationships, graph databases are raising overall corporate awareness of the importance of data analytics. They also underline the value of identifying relationships and meaning in data from many different sources, which is what big data is all about.


About Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President o...
