
6 tips for creating effective big data models

Big data is less predictable than traditional data, and therefore requires special consideration when building models. Here are some things to keep in mind.

Written By

Mary Shacklett

Oct 14, 2022



Data modeling is a complex science that involves organizing corporate data so it fits the needs of business processes. It requires designing logical relationships so that data elements can relate to one another and support the business. These logical designs are then translated into physical models that specify the storage devices, databases and files that house the data.


Historically, businesses have used relational database technology, queried with SQL, to develop data models, because tables linked by keys are well suited to tying datasets and data types together to support the informational needs of business processes.

Unfortunately, big data, which now makes up a large percentage of the data under management, often does not run on relational databases. Much of it runs on non-relational databases, commonly called NoSQL databases. This has led to the belief that big data doesn't need a model. The problem is that you do need data modeling for big data if you want to use it to its full potential. Here are six tips for modeling big data in an accessible and effective way:

Jump to:

1. Don’t try to impose traditional modeling techniques on big data
2. Design a system, not a schema
3. Look for big data modeling tools
4. Focus on data that is core to your business
5. Deliver quality data
6. Look for key inroads into the data


1. Don’t try to impose traditional modeling techniques on big data

Traditional, fixed-record data is stable and predictable in its growth, which makes it relatively easy to model. In contrast, big data's exponential growth is unpredictable, as are its myriad forms and sources. When organizations plan to model big data, the effort should center on building open and elastic data interfaces, because you never know when a new data source or form of data will emerge. This is not a priority in the traditional fixed-record data world.
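As a minimal sketch of what an "open" interface can mean in practice, assuming records arrive as Python dictionaries: fields the model already understands are separated from everything else, and unknown fields are preserved rather than rejected, so a new attribute or source does not break ingestion. `KNOWN_FIELDS` and `normalize` are hypothetical names used only for illustration.

```python
from typing import Any

# Hypothetical set of fields the current data model already understands.
KNOWN_FIELDS = {"customer_id", "timestamp", "amount"}


def normalize(record: dict[str, Any]) -> dict[str, Any]:
    """Split an incoming record into modeled fields and pass-through extras.

    Unknown fields are kept instead of dropped or rejected, so they can be
    examined and folded into the model later.
    """
    modeled = {k: v for k, v in record.items() if k in KNOWN_FIELDS}
    extras = {k: v for k, v in record.items() if k not in KNOWN_FIELDS}
    return {"modeled": modeled, "extras": extras}


if __name__ == "__main__":
    event = {"customer_id": 42, "amount": 19.99, "device": "mobile"}
    print(normalize(event))
```

The design choice here is to defer judgment on unfamiliar data: the interface stays elastic because it never has to be changed just to accept a new field.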

2. Design a system, not a schema

In the traditional data realm, a relational database schema can cover most of the relationships and links between data that the business requires for its information support. This is not the case with big data, which might not reside in a database at all, or might reside in a NoSQL database that does not require a predefined schema.

Because of this, big data models should be built around systems, not databases. The system components a big data model should cover are business information requirements, corporate governance and security, the physical storage used for the data, integration and open interfaces for all types of data, and the ability to handle a variety of data types.
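To make "system, not schema" more concrete, here is a hedged sketch of a system-level catalog that records the components listed above for each data source instead of a table schema. `DataSource`, `DataSystem` and all field names are hypothetical, as is the storage path in the usage example. Governance is enforced at registration time, while the data format is left open.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One entry in a hypothetical system-level data catalog."""
    name: str
    business_requirement: str  # which business need this data serves
    owner: str                 # governance: who is accountable for the data
    sensitivity: str           # security classification, e.g. "internal"
    storage: str               # physical location of the data
    data_format: str           # e.g. "json", "parquet", "csv", "image"


@dataclass
class DataSystem:
    sources: dict[str, DataSource] = field(default_factory=dict)

    def register(self, source: DataSource) -> None:
        # Governance metadata is mandatory; the data format is not restricted.
        if not source.owner or not source.sensitivity:
            raise ValueError(f"{source.name}: owner and sensitivity are required")
        self.sources[source.name] = source

    def formats(self) -> set[str]:
        # Report which data types the system currently handles.
        return {s.data_format for s in self.sources.values()}


if __name__ == "__main__":
    system = DataSystem()
    system.register(DataSource(
        "clickstream", "web analytics", "marketing-data", "internal",
        "s3://example-bucket/clicks/",  # hypothetical path
        "json",
    ))
    print(system.formats())
```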

3. Look for big data modeling tools

There are a variety of commercial data modeling tools that support Hadoop, as well as big data reporting software like Tableau. When considering big data tools and methodologies, IT decision-makers should include the ability to build data models for big data as one of their requirements.


4. Focus on data that is core to your business

Mountains of big data pour into enterprises every day, and much of this data is extraneous. It makes no sense to create models that include all that data. The better approach is to identify the big data that is essential to your enterprise and to model only that data.
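As a rough illustration, assuming each incoming record is a dictionary tagged with its source, the sketch below keeps only records from sources the business has judged essential and trims them to the fields worth modeling. `CORE_SOURCES`, `CORE_FIELDS` and `core_records` are hypothetical; in practice these lists would come out of the business analysis described above.

```python
from typing import Any, Iterable, Iterator

# Hypothetical: sources and fields the business has judged essential.
CORE_SOURCES = {"orders", "shipments"}
CORE_FIELDS = {"order_id", "customer_id", "region", "amount"}


def core_records(records: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield records from core sources only, trimmed to core fields."""
    for rec in records:
        if rec.get("source") in CORE_SOURCES:
            yield {k: v for k, v in rec.items() if k in CORE_FIELDS}


if __name__ == "__main__":
    incoming = [
        {"source": "orders", "order_id": 1, "amount": 25.0, "user_agent": "Mozilla/5.0"},
        {"source": "debug_logs", "message": "cache miss"},
    ]
    print(list(core_records(incoming)))
```

Using a generator means the filter can sit in front of a large stream without holding everything in memory.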

5. Deliver quality data

Organizations can build better data models and relationships for big data if they concentrate on developing sound definitions for their data and thorough metadata that describes where the data came from and what its purpose is. The more you know about each piece of data, the better you can place it in the data models that support your business.
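One way to make definitions and metadata enforceable is to capture them in a small structured record and flag gaps before a dataset is modeled. The sketch below is an assumption-laden illustration: `DatasetMetadata`, its fields and `missing_fields` are hypothetical and cover only the items mentioned above (definition, origin and purpose).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical minimal metadata record for one dataset."""
    name: str
    definition: str        # what each record represents
    source: str            # where the data came from
    purpose: str           # why the business collects it
    collected_at: datetime

    def missing_fields(self) -> list[str]:
        # Blank descriptive fields are reported as gaps to fix before modeling.
        return [f for f in ("definition", "source", "purpose")
                if not getattr(self, f).strip()]


if __name__ == "__main__":
    meta = DatasetMetadata(
        name="sensor_readings",
        definition="one temperature reading per device per minute",
        source="plant-floor IoT gateway",
        purpose="",
        collected_at=datetime(2022, 10, 14, tzinfo=timezone.utc),
    )
    print(meta.missing_fields())
```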


6. Look for key inroads into the data

One of the most commonly used keys into big data today is geographic location. Depending on your business and industry, there are other common keys that users want as well. The more of these common entry points into your data you can identify, the better you will be able to design data models that support your company's key information access paths.
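A simple way to support a geographic access path is to derive a location key when data is loaded, so that records can be grouped or partitioned by area. The sketch below uses a coarse latitude/longitude grid purely as an illustration; `geo_cell` and `index_by_cell` are hypothetical helpers, and a production system would more likely rely on a geohash or on the spatial indexing built into its data platform.

```python
import math
from collections import defaultdict
from typing import Any, Iterable


def geo_cell(lat: float, lon: float, cell_deg: float = 1.0) -> tuple[int, int]:
    """Map a coordinate to a grid cell of cell_deg x cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def index_by_cell(records: Iterable[dict[str, Any]],
                  cell_deg: float = 1.0) -> dict[tuple[int, int], list[dict[str, Any]]]:
    """Group records by grid cell so lookups by area touch only one bucket."""
    index: dict[tuple[int, int], list[dict[str, Any]]] = defaultdict(list)
    for rec in records:
        index[geo_cell(rec["lat"], rec["lon"], cell_deg)].append(rec)
    return index


if __name__ == "__main__":
    stores = [
        {"id": "a", "lat": 40.71, "lon": -74.01},
        {"id": "b", "lat": 40.20, "lon": -74.90},
        {"id": "c", "lat": 34.05, "lon": -118.24},
    ]
    for cell, recs in index_by_cell(stores).items():
        print(cell, [r["id"] for r in recs])
```

The cell size is the main design lever: smaller cells give more precise lookups but more buckets to manage.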


Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.