How to install graph databases to get the most benefits

Graph databases are becoming more popular in enterprises. Here are three best practices for IT when installing graph databases.



Graph databases create links between data of all types and sizes. A traditional relational database can also create links, but those links are between columns of data, not necessarily among a mix of data, images, and document objects. Organizations have been using graph databases to uncover market and risk trends and to identify service and maintenance needs.


Graph databases are not new, but enterprises have been relatively slow to adopt them. What's speeding adoption is the arrival of unstructured big data and the need for companies to analyze data in highly associative data models that weren't needed as much years ago. This is why companies like Daimler are using graph databases in HR to identify who's working on interdisciplinary swarm projects and who in the organization has expertise that could help another area of the company.

For graph databases to be effective, IT, as the primary implementer, needs best practices for installing them so the business gets results. Here are three best practices.

1. Know when to use a graph database and when not to

Graph databases create linkages between all types and sizes of data, whether the data is a single field or a big blob such as an image. This enables users to search the links and associations across all of this data to find patterns. Conversely, if your goal is primarily to process sequential transactional data, a more traditional relational or even hierarchical database is a better fit.

2. Create a database model

Creating a graph database model is a bit like defining the static elements in a business workflow. Before you set a workflow in motion, you hold a collaborative whiteboard design session in which people with thorough knowledge of the area you want to model and report on identify all of the elements in that area. These elements have to be linked to each other so you have the basic fabric defined for your graph database model.

For instance, if you are modeling customers, you might want to know where and when they purchased your products, what the products were, where products were picked up and/or delivered, etc. Each of these elements is identified as a bubble on your data linkage map. The ultimate goal of your graph report will be to analyze all of the data linkages and associations and to come up with a profile of the buyer.
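The customer example can be sketched in a few lines of Python. This is only an illustration of the whiteboard map, not the API of any graph database product: the node names, the relationship labels (MADE, OF_PRODUCT, ON_DATE, PICKED_UP_AT, DELIVERED_TO), and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: each edge is (source node, relationship, target node).
# Every "bubble" on the whiteboard becomes a node; every line becomes an edge.
EDGES = [
    ("customer:alice", "MADE", "purchase:1001"),
    ("purchase:1001", "OF_PRODUCT", "product:laptop"),
    ("purchase:1001", "ON_DATE", "date:2020-03-14"),
    ("purchase:1001", "PICKED_UP_AT", "store:downtown"),
    ("customer:alice", "MADE", "purchase:1002"),
    ("purchase:1002", "OF_PRODUCT", "product:mouse"),
    ("purchase:1002", "DELIVERED_TO", "address:home"),
    ("customer:bob", "MADE", "purchase:1003"),
    ("purchase:1003", "OF_PRODUCT", "product:laptop"),
]


def build_graph(edge_list):
    """Index edges by source node so links can be followed from any node."""
    graph = defaultdict(list)
    for source, relationship, target in edge_list:
        graph[source].append((relationship, target))
    return graph


def buyer_profile(graph, customer):
    """Follow customer -> purchase -> detail links and group details by relationship."""
    profile = defaultdict(set)
    for relationship, purchase in graph.get(customer, []):
        if relationship != "MADE":
            continue
        for detail_relationship, detail in graph.get(purchase, []):
            profile[detail_relationship].add(detail)
    # Sort for stable, readable output.
    return {rel: sorted(nodes) for rel, nodes in profile.items()}


graph = build_graph(EDGES)
for relationship, nodes in buyer_profile(graph, "customer:alice").items():
    print(relationship, nodes)
```

Treating each purchase as its own node keeps the "where" and "when" attached to a specific transaction, so two customers buying the same product do not end up sharing a pickup location.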

During modeling, it's equally important to narrow the domain of your graph database so it doesn't become too large, without making it so small that it leaves out important data relationships. Your end objective is a sensible database scale that still captures those relationships. The best way to achieve this balance is to have IT and end users work together.

3. Size your graph database

One way to approach graph database sizing is to identify all of the files from which you want to import data into your graph database. You can then add up the space requirements of these files to reach the total space needed for your graph database. 
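The add-up-the-files approach could look something like the following Python sketch. The function names, the file paths, and the optional headroom percentage are assumptions made for illustration. The raw file total is a starting estimate only, since the space a particular graph database actually uses may differ.

```python
from pathlib import Path


def sum_import_sizes(paths):
    """Return (total bytes of the files found, list of paths that were missing)."""
    total_bytes = 0
    missing = []
    for path in paths:
        candidate = Path(path)
        if candidate.is_file():
            total_bytes += candidate.stat().st_size
        else:
            # Report missing files instead of silently undercounting.
            missing.append(str(path))
    return total_bytes, missing


def estimate_capacity(total_bytes, headroom_percent=0):
    """Add optional growth headroom, rounding up to a whole byte.

    headroom_percent is a hypothetical knob: 0 for a narrowly scoped graph
    model, or for example 20 to mirror traditional storage sizing.
    """
    extra = (total_bytes * headroom_percent + 99) // 100
    return total_bytes + extra


if __name__ == "__main__":
    # Hypothetical import files; replace with the real sources for your model.
    import_files = ["customers.csv", "purchases.csv", "products.csv"]
    total, missing = sum_import_sizes(import_files)
    print("Raw import size in bytes:", total)
    print("Estimated capacity in bytes:", estimate_capacity(total))
    if missing:
        print("Files not found:", ", ".join(missing))
```

Listing the missing files separately makes it obvious when the estimate is incomplete.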

In traditional database and storage sizing, it's not uncommon for storage professionals to allocate an additional 20% of storage for data growth. Graph databases, however, are not intended to be gargantuan, monolithic databases that keep expanding. The best way to deploy them is to narrowly define the business models they operate on, which makes unforeseen capacity growth less likely.
