Data modeling is a complex discipline that organizes corporate data so it fits the needs of business processes. It requires designing logical relationships so that data elements can relate to one another and support the business. These logical designs are then translated into physical models consisting of the storage devices, databases and files that house the data.
Historically, businesses have built their data models on relational databases queried with SQL, because the relational approach is well suited to flexibly linking keys and data types across datasets in order to support the informational needs of business processes.
Unfortunately, big data, which now comprises a large percentage of data under management, often does not run on relational databases. Much of it runs on non-relational (NoSQL) databases. This leads to the belief that you don't need a model for big data.
The problem is, you do need data modeling for big data.
Here are six tips for modeling big data:
1. Don't try to impose traditional modeling techniques on big data
Traditional, fixed record data is stable and predictable in its growth. This makes it relatively easy to model. In contrast, big data's exponential growth is unpredictable, as are its myriad forms and sources. When organizations contemplate modeling big data, the modeling effort should center on constructing open and elastic data interfaces, because you never know when a new data source or form of data could emerge. This is not a priority in the traditional fixed record data world.
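One way to picture an open and elastic data interface is a small ingestion layer in which each source registers its own parser and unknown sources are kept rather than rejected. The sketch below is only an illustration under stated assumptions: the `Record` envelope, `register_adapter`, `ingest`, and the source names `sensor_csv` and `new_feed` are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Record:
    """Common envelope: a stable source label plus an open-ended payload."""
    source: str
    payload: Dict[str, Any] = field(default_factory=dict)


# Hypothetical registry mapping a source name to a function that parses raw input.
_adapters: Dict[str, Callable[[Any], Dict[str, Any]]] = {}


def register_adapter(source: str, parse: Callable[[Any], Dict[str, Any]]) -> None:
    """Add a parser for a new source without touching existing ones."""
    _adapters[source] = parse


def ingest(source: str, raw: Any) -> Record:
    """Normalize raw input into a Record; unknown sources are stored unparsed."""
    parse = _adapters.get(source)
    if parse is None:
        return Record(source=source, payload={"raw": raw})
    return Record(source=source, payload=parse(raw))


# Example: a made-up CSV sensor feed with two columns.
register_adapter(
    "sensor_csv",
    lambda line: dict(zip(("device_id", "reading"), line.split(","))),
)

known = ingest("sensor_csv", "dev-17,21.5")
unknown = ingest("new_feed", b"\x00\x01")
```

The point of the design is that a new data source only needs a new adapter; nothing already modeled has to change, and data from a source nobody anticipated is still captured for later modeling.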
2. Design a system, not a schema
In the traditional data realm, a relational database schema can cover most of the relationships and links between data that the business requires for its information support. This is not the case with big data, which might not reside in a database at all, or which might use a NoSQL database that does not require a fixed schema.
Because of this, big data models should be built on systems, not databases. The system components that big data models should contain are business information requirements, corporate governance and security, the physical storage used for the data, integration and open interfaces for all types of data, and the ability to handle a variety of different data types.
3. Look for big data modeling tools
There are commercial data modeling tools that support Hadoop, as well as big data reporting software like Tableau. When considering big data tools and methodologies, IT decision makers should include the ability to build data models for big data as one of their requirements.
4. Focus on data that is core to your business
Mountains of big data pour into enterprises every day, and much of this data is extraneous. It makes no sense to create models that include all data. The better approach is to identify the big data that is essential to your enterprise, and to model that data.
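As a rough illustration of modeling only the essential data, the sketch below filters an incoming stream down to an agreed list of business-critical event types before any modeling happens. The event types and the `select_core` helper are hypothetical; in practice the list would come from the business, not from IT alone.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical set of event types the business has identified as core.
CORE_EVENT_TYPES = {"order_placed", "payment_failed"}


def select_core(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only events whose type is on the core list; drop the rest."""
    return [e for e in events if e.get("type") in CORE_EVENT_TYPES]


incoming = [
    {"type": "order_placed", "order_id": 1},
    {"type": "page_scroll", "depth": 0.4},
    {"type": "payment_failed", "order_id": 2},
    {"note": "record with no type field"},
]

core = select_core(incoming)
```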
5. Deliver quality data
Superior data models and relationships can be achieved for big data if organizations concentrate on developing sound definitions for the data and thorough metadata that describes where the data came from, what its purpose is, and so on. The more you know about each piece of data, the more accurately you can place it into the data models that support your business.
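A minimal sketch of what such metadata might look like in code: each dataset carries a definition, an origin and a purpose, and a small check reports which of those are still blank. The `DatasetMetadata` fields and the example values are assumptions chosen for illustration, not a standard.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical minimum description required before a dataset is modeled."""
    name: str
    definition: str  # what the data means
    origin: str      # where the data came from
    purpose: str     # why the business keeps it


def missing_fields(meta: DatasetMetadata) -> List[str]:
    """Return the names of metadata fields that are empty or whitespace."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]


complete = DatasetMetadata(
    name="store_visits",
    definition="One row per customer visit to a physical store",
    origin="Door sensor feed",
    purpose="Staffing forecasts",
)
incomplete = DatasetMetadata(
    name="clickstream",
    definition="Raw web click events",
    origin="Web servers",
    purpose="",
)

complete_gaps = missing_fields(complete)
incomplete_gaps = missing_fields(incomplete)
```

A check like this could run whenever a new dataset is registered, so that data without a stated origin or purpose is flagged before it enters a model.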
6. Look for key inroads into the data
One of the most commonly used vectors into big data today is geographical location. Depending on your business and your industry, there are also other common keys into big data that users want. The more you can identify these common entry points into your data, the better you will be able to design data models that support key information access paths for your company.
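To make the idea of a common entry point concrete, the sketch below builds a simple index keyed on geographical location by bucketing records into one-degree latitude/longitude cells. The `grid_key` function, the cell size and the sample events are illustrative assumptions; a production system would more likely rely on the geospatial indexing its data store already provides.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def grid_key(lat: float, lon: float, cell_deg: float = 1.0) -> Tuple[int, int]:
    """Map a coordinate to the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Made-up sample events with coordinates.
events = [
    {"id": "a", "lat": 47.61, "lon": -122.33},
    {"id": "b", "lat": 47.25, "lon": -122.44},
    {"id": "c", "lat": 40.71, "lon": -74.01},
]

# Index: grid cell -> ids of the events that fall inside it.
location_index: Dict[Tuple[int, int], List[str]] = defaultdict(list)
for event in events:
    location_index[grid_key(event["lat"], event["lon"])].append(event["id"])

# Look up everything in the same cell as a point of interest.
nearby = location_index[grid_key(47.5, -122.2)]
```

The same pattern applies to other common keys, such as customer or time period: identify the access path users ask for most, then make it a first-class key in the model.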
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.