How graph databases help analyze complex relationships

Sometimes, traditional databases and analytics aren't right for your data. Here's a different option that may work.

Last year, Gartner Research listed graph analytics as one of the top ten data and analytics trends. Gartner said graph databases "will increasingly be used to navigate existing and newly discovered relationships more efficiently than relational processing over the next two to three years."

Graph analytics is an emerging form of data analysis that works particularly well with complex relationships, according to Oracle.com. It involves moving data points and the relationships between them into a graph format, which allows queries to be coded more efficiently, and it can output results in an easy-to-digest visual format.

The rise of graph databases corresponds to changes in how organizations view data.

Increasingly, data and the relationships it establishes are seen as highly fluid and mobile, and not necessarily hierarchical or locked into the columns and rows that earlier data schemas and technologies used to represent them. Graph databases are well positioned to handle fluid data: to see it, analyze it and change with it.

Common graph database use cases include:

  • GPS systems and maps that are used to find the shortest route from one point to another
  • Social networks that represent connections between users 
  • Fraud detection based on locations and usage patterns

Graph databases can also be used to identify which relationships in the data are most relevant.

For instance, there are several ways to travel from point A to point B on a Google map, but one of them is always flagged as the best route.

The same approach of choosing the best of several options is also widely used in medical diagnosis and treatment choices, in risk management strategy selection, and in predicting the most likely outcome of next year's World Series and the teams that will likely be playing.

At the most fundamental level, creators of graph databases have to do four things:

  1. Identify the business use cases that the database will be applied to
  2. Select the data that will populate the database
  3. Create relationship links between the various data elements that will be used in analytics processing
  4. Assign a weight to each relationship link to show its relative importance in the analytics that are to be performed.
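
As a rough illustration of steps 3 and 4, the Python sketch below stores relationship links and their weights in a small in-memory structure. The `TinyGraph` class, the customer and merchant names, and the weights are all hypothetical; a real graph database would supply its own storage engine and query language.

```python
from collections import defaultdict


class TinyGraph:
    """A minimal in-memory stand-in for a graph store (illustrative only)."""

    def __init__(self):
        # Maps each source element to {target element: weight}.
        self.links = defaultdict(dict)

    def add_link(self, source, target, weight):
        """Step 3 and 4: create a relationship link and give it a weight."""
        self.links[source][target] = weight

    def strongest_link(self, source):
        """Return the target with the highest weight, or None if no links."""
        neighbors = self.links.get(source)
        if not neighbors:
            return None
        return max(neighbors, key=neighbors.get)


# Hypothetical data: how strongly a customer is associated with each merchant.
graph = TinyGraph()
graph.add_link("customer_1", "merchant_a", 0.9)
graph.add_link("customer_1", "merchant_b", 0.4)
graph.add_link("customer_1", "merchant_c", 0.1)

print(graph.strongest_link("customer_1"))
```

Here the weight simply ranks the links that leave one data element, which is the role the weights play in the four steps above.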

To illustrate how all of this works in practice, we can go back to the example of mapping the best driving route between points A and B:

  1. Define a use case that enables travelers to obtain optimum travel routes without having to research the routes themselves.
  2. Populate a graph database with relevant data that can define routes, road construction and other hazards along the way, mode of transportation, starting points, destination points, etc. 
  3. Create relationship links between all of the various data elements to illustrate their relationships to each other (e.g., if you are driving from Dallas to Houston, links will be built for the start and destination points in both cities, for the various routes between the two cities, and for the mode of transportation).
  4. Assign a value to each link (e.g., if there are four different ways to drive from Dallas to Houston, assign the highest numerical weight to the route that gets the driver from Dallas to Houston fastest; this route will be flagged as the preferred one).
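
The route example can be sketched in a few lines of Python. This is only one possible approach, not necessarily how any mapping product works: it uses Dijkstra's algorithm, stores estimated drive time on each link, and picks the route with the lowest total. That is the usual shortest-path convention and the inverse of giving the fastest route the highest weight, but it flags the same preferred route. The intermediate stops and all drive times are made up for illustration.

```python
import heapq

# Hypothetical road links: origin -> {destination: estimated minutes}.
# "Stop X/Y/Z" and every drive time here are invented for illustration.
ROADS = {
    "Dallas": {"Stop X": 60, "Stop Y": 90, "Houston": 250},
    "Stop X": {"Stop Z": 100, "Houston": 200},
    "Stop Y": {"Houston": 170},
    "Stop Z": {"Houston": 70},
    "Houston": {},
}


def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path), or (None, [])."""
    queue = [(0, start, [start])]  # (minutes so far, current node, path)
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(
                    queue, (minutes + cost, neighbor, path + [neighbor])
                )
    return None, []  # goal cannot be reached from start


minutes, path = fastest_route(ROADS, "Dallas", "Houston")
print(minutes, " -> ".join(path))
```

With these invented numbers there are four ways to reach Houston, and the search flags the one through Stop X and Stop Z as the preferred route.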

For IT, there is one more important thing to do: Define where graph databases fit within your overall database architecture.

A majority of mission-critical legacy systems use hierarchical databases that represent relationships between data items. These databases are high performing and highly reliable. Hierarchical databases have been in existence for more than 50 years, and they aren't going away soon. Your database architecture should include them.

Relational databases are also in broad use, and unlikely to go away soon. They use SQL, in which many IT staff are fluent. Relational database structures can be changed quickly, and they can handle many relationships between data elements. Relational databases are easy to work with, but they lack the speed of hierarchical databases.

Graph databases can link together any number of data points in any given order. They are highly flexible and can be easily revised. They can map the interrelationships between data at all levels of a data hierarchy, and can do this more effectively than relational or hierarchical databases. However, graph databases are best suited to queries and analytics. Unlike hierarchical databases, they are not process engines that can handle thousands of transactions in less than a second, nor are they as well suited to line-of-business processing on local servers as their relational counterparts.

The IT database architecture should have room for all three of these database types, and IT should have a clear direction on when and where to deploy each one.

Image: FORGEM, Getty Images/iStockphoto