
Master data management—lessons learned over the course of a year

Just over a year ago, I wrote an article for Builder.com on my thoughts about how best to manage master data in a corporate environment, based on my experiences with several clients who have been grappling with this issue. In this article, I will examine the progress made towards that goal, the lessons learned, the detours taken, and the ways in which the goal itself has been amended.

The key facet of the master data management system (MDMS) was to have a single system as a point of reference for all commonly used items of data within a company. Each "consumer" system could then refer to this system for its key data ensuring common and correct data across the entire suite of applications within the organization.


Uptake of MDMS by other applications

Within the companies that I covered in the previous article, progress towards the goal is coming slowly but surely. This is mainly because companies have three types of applications:

  • New—Created in-house, developed externally, or bought off the shelf.
  • Upgraded—Where a window exists to do some work on an existing system.
  • Legacy—Where the system does not have an upgrade window or a replacement in planning.

In the first case, links to the MDMS can be introduced as part of the development or installation and configuration phase of the project, assuming that the systems analysis has identified them. The main reasons for not doing the linking, or for doing only part of it, usually come down to budget, available resources, or time. However, if these links are thought about—and budgeted for—there should be no reason why new applications cannot take full advantage of an MDMS.

In the second case, links can be planned and budgeted as part of the routine upgrade of the system. However, these kinds of systems require much more analysis to assess the impact of the link; for example, what is the impact on the system if an item of data stored in the MDMS is updated? Also, in these kinds of projects the money is usually earmarked for new functionality rather than infrastructure changes, so some lower-priority links may not be implemented in a given upgrade.

Lastly, legacy systems, as well as being among the most prevalent systems within larger companies, are the most complex to deal with in terms of linking to the MDMS. Many companies today have a significant implementation of so-called "legacy" languages such as COBOL. According to a statement on DevX:

"70 percent of the world's data is processed by COBOL and nine out of 10 ATM transactions are done using COBOL. Thirty billion online COBOL transactions are processed daily; 492 of the Fortune 500 use COBOL, including the entire Fortune 100, and current COBOL investment tops $3 trillion."

Despite this situation, very few developers have the necessary COBOL skills, and many of those who do are approaching retirement. Since COBOL is not viewed as favorably as Java or .NET by companies and universities, the pool of skills is shrinking significantly, which is likely to cause headaches and huge salaries similar to those of the Y2K era if companies do not plan sensibly for it—see this ComputerWorld article for more information.

In some companies the adage, "If it ain't broke, don't fix it," applies and is used as another common reason for not doing any work on these systems—as long as they work, the limited money, resources, and time are better spent elsewhere. This makes these kinds of systems very difficult to integrate with a new MDMS, but given the trends, I would suggest that at least thinking about them while the skills are still available relatively cheaply should be a priority.

Developing in parallel

One of the main headaches that we have come across is that the development of an MDMS usually runs parallel to that of other development work within an organization. This can cause problems when planning activities either for the MDMS system or other systems. These problems could include situations in which:

  • Required data is not planned to be available in the MDMS until after the application that needs it "goes live."
  • Data identified as common and required by a system is not planned for inclusion in the MDMS at this time.
  • Data newly identified as common is used within existing applications that do not take it from the MDMS.
  • The development team or the MDMS team faces other constraints, such as budget, time, or resources.

Deploying an MDMS within an organization causes a significant rise in the juggling required by management to try to ensure the best fit of application development/upgrade paths with the MDMS's own development path. While this does cause a lot of headaches and other issues, on the whole it is a price worth paying.

One of the ways that some of the project teams I've been involved with have worked around these potential issues is to store their data locally in a database at first and then, as appropriate, either to load data from the MDMS into this database or to change the connection so that the data is loaded directly from the MDMS. This allows the transition of data management from local application to MDMS to happen at its own pace, item by item, rather than being forced by or tied to other activities, because changing a database connector or adding another database to a data movement script—such as an EAI or ETL job, or even a SQL Server DTS package—is a relatively simple task.
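As a minimal sketch of that item-by-item cutover, consider a per-item configuration flag that records whether the item is still owned by the local application database or has moved to the MDMS. The data items, table names, and connection details below are hypothetical (and the real connections would not be SQLite files); the point is only that switching an item over becomes a configuration change rather than a development task.

```python
import sqlite3

# Hypothetical per-item source configuration: flip an item to "mdms" once it
# becomes available there; no application code changes are needed.
DATA_SOURCES = {
    "country_codes": "local",   # still maintained in the application database
    "product_types": "mdms",    # already migrated to the MDMS
}

# Placeholder connection details for the two sources.
CONNECTIONS = {
    "local": "app_local.db",
    "mdms": "mdms_replica.db",  # the MDMS itself, or a replica fed by ETL
}

def load_reference_data(item: str) -> list:
    """Load a reference data item from whichever source currently owns it."""
    source = DATA_SOURCES.get(item, "local")
    with sqlite3.connect(CONNECTIONS[source]) as conn:
        # Assumes the same table name and columns exist in both sources.
        return conn.execute(f"SELECT code, description FROM {item}").fetchall()
```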

A case study

One of my clients became very aware of the impact of a change to the MDMS and the rate at which it flows through the other systems during a recent update to its internal classification hierarchy, where only a handful of records were changed. After the change had been released, IT began to get support tickets stating that the data in System A did not match System B—because the change had only reached one system at that point.

Then they discovered that another system, System C, had crashed because its database included a relationship between the classification hierarchy tables and the products the company sold; the change had broken that link, causing errors and some data loss in System C. Luckily, the problem was caught before the corrupt data could feed any other systems or be used by end users for anything important.
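One way to catch this class of problem is a pre-publication check that asks whether any downstream system still references the records about to change. The sketch below is hypothetical—the table and column names are invented and this is not the client's actual system—but it illustrates the idea.

```python
import sqlite3

def products_referencing(conn: sqlite3.Connection, classification_id: int) -> list:
    """Return product codes in a consumer database that still point at the
    classification record that is about to change (tables are hypothetical)."""
    rows = conn.execute(
        "SELECT product_code FROM products WHERE classification_id = ?",
        (classification_id,),
    ).fetchall()
    return [code for (code,) in rows]

def safe_to_change(conn: sqlite3.Connection, classification_id: int) -> bool:
    # Only publish the hierarchy change if nothing downstream would be orphaned.
    return not products_referencing(conn, classification_id)
```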

It is events like this which make you wonder whether the risk of linking systems together via an MDMS is really worth it. The IT team had to invest a serious amount of time and resources bringing System C back up, not to mention the end users' loss of confidence in IT as a result of the entire episode.

Anyone for Pooh Sticks?

Pooh Sticks is a game played by Winnie the Pooh & Friends (created by A. A. Milne) in which each player drops a stick into a stream and sees whose stick is the first to reach a designated finishing point.

When linking systems together to share data in any way, it is very important to look at how that data will flow from that system into any of its child systems and when that flow occurs; is it a nightly load, manually initiated upload, or a weekly data dump, for example? Once this is understood, a proper analysis of what each system will do with that data needs to be undertaken to understand what happens when the changed data in the MDMS is published through your infrastructure.

Because of the way data flows through the infrastructure, some systems may not show exactly the same data at the same point in time. As long as this is known and accepted by the consumers of the system, it may not be the large problem it first appears to be. The impact of an item of data changing may vary depending on the item of data, the system, and what it is used for.
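In practice, this analysis boils down to a flow map: for each consumer system, how MDMS data reaches it, on what schedule, and how much staleness its users have agreed to accept. A minimal sketch, with invented system names and figures:

```python
from dataclasses import dataclass

@dataclass
class DataFlow:
    system: str                 # consuming system
    items: list                 # MDMS data items it takes
    mechanism: str              # e.g. "nightly ETL", "manual upload", "weekly dump"
    max_staleness_hours: float  # latency agreed with that system's users

# Illustrative entries only; the real map comes out of the analysis exercise.
FLOW_MAP = [
    DataFlow("System A", ["classification_hierarchy"], "nightly ETL", 24),
    DataFlow("System B", ["classification_hierarchy"], "weekly dump", 168),
    DataFlow("System C", ["product_types"], "live query", 0),
]

def systems_affected_by(item: str) -> list:
    """When an item changes in the MDMS, list who receives it, how, and how soon."""
    return [flow for flow in FLOW_MAP if item in flow.items]
```

Once such a map exists, the question "what happens when this item changes?" has a concrete, queryable answer.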

I am reminded of something one of my university lecturers said about real-time systems: "The right data at the wrong time is still wrong."

This mapping exercise was originally perceived to be a simple process, as we had previously linked several of our key systems together. It has, however, turned into a major project as we uncovered systems that were not controlled by the central IT team, applications that had never been documented, and documentation that was out of date. In addition, discovering how the data is used within each system is not simply a question of asking the relevant project manager; developers had to be involved in order to trace the use of data in the code, and key users had to be interviewed so that we would understand the impact of each change. This has turned into the biggest component of the MDMS rollout.

The triad MDMS management team

In my original article, I suggested that a three-pronged management team for the MDMS was the best approach: one to manage the MDMS data, another to manage the use of this data, and the final person(s) to be the key technical contact for the MDMS and the use of its data (see Figure A).

Figure A: Triad (the three-pronged MDMS management team)

This approach has been adopted in the company I work for, although none of the "triad" work full-time on the MDMS. However, the combination of these three distinct strands within the project team has ensured that the MDMS is making steady progress towards being the system at the heart of the company's infrastructure model.

In the early stages of deployment of an MDMS, another role is required—that of an MDMS evangelist. This person must sell the MDMS concept and benefits both to the IT team, and to the key stakeholders in the business who will have to fund not only the MDMS, but also the related systems changes that will be required to make them compatible with the MDMS. It is extremely important to convince the various parties to embrace the MDMS concept and inform them about what MDMS can do to make their jobs easier, more efficient, and more productive.

A second case study

On a recent project to provide a change control system for the European arm of a very large corporation, we were able to use their MDMS to provide a significant amount of the data required by the new system. This data was already stored, managed, and controlled, so we needed to do little more than consume it. This saved a large amount of time, resources, and money on the project, which was instead used to deliver major improvements and new functionality in the new system.

For those data items that were not yet held within the MDMS, we stored them in our own small database and, as they became available within the MDMS, investigated the best refresh method for each. As a result, almost all the data required by this system is now sourced from the MDMS, some as a live feed and the rest via data loads taking place at varying times.

Lessons learned

We are now realizing the sheer scope and implications of the Pandora's box that we have opened with this project. Our key lessons thus far:

Planning

  • Ensure that you understand how and when the data will flow from the MDMS to each system and what impact a change to it will have.
  • Look at your IT team's plans for the year and try to best match up the inclusion of data within the MDMS with development of similar systems.
  • If you have a major data change, plan it independently.

Development

  • Develop the MDMS only as much as you need to; remember it is a data source and will not be directly used by a significant number of end users.
  • Try to provide a common data access infrastructure between the MDMS and the other systems; this reduces complexity and increases ease of maintenance and support (a rough sketch follows this list).
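What "common data access infrastructure" can mean in practice is a single thin client that every consuming system goes through, so that connection details, caching, auditing, and failover live in one place. The module below is a hypothetical sketch, not our actual infrastructure; the connection string and dataset names are placeholders.

```python
import sqlite3

MDMS_DATABASE = "mdms.db"  # placeholder for the real MDMS connection details

class MdmsClient:
    """Single shared entry point for reading reference data from the MDMS."""

    def __init__(self, database: str = MDMS_DATABASE):
        self._database = database
        self._cache = {}  # simple in-process cache, keyed by dataset name

    def get_items(self, dataset: str) -> list:
        # Cached read; one obvious place to add logging, auditing, or failover.
        if dataset not in self._cache:
            with sqlite3.connect(self._database) as conn:
                self._cache[dataset] = conn.execute(
                    f"SELECT code, description FROM {dataset}").fetchall()
        return self._cache[dataset]
```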

Communication

  • Keep talking with the IT Team to ensure that the data they need from the MDMS is available when they need it, where possible.
  • Keep talking to the end user community to ensure that they understand the benefits that the MDMS brings for the extra time and money they are investing.

Looking to the future

Over the next few months, we are looking to identify and include as much common data in the MDMS as possible and to encourage systems that currently use this data to take it from the MDMS. We are also looking at the relationships between items of data that are held in each system (for example, if you have a product of type A, then your subproduct types can only be 1, 2, or 3), as well as the validation rules used in each system for each data item, to ensure a consistent approach. Some of these relationships and rules may end up in the MDMS itself, extending it beyond a simple data repository.
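As a minimal sketch of how such a rule might be held centrally, using the hypothetical type codes from the example above:

```python
# Allowed subproduct types per product type, held once in the MDMS so that
# every consuming system validates the combination in the same way.
ALLOWED_SUBPRODUCTS = {
    "A": {"1", "2", "3"},
    "B": {"4", "5"},  # invented second entry, for illustration only
}

def is_valid_combination(product_type: str, subproduct_type: str) -> bool:
    """Check a product/subproduct pairing against the centrally held rule."""
    return subproduct_type in ALLOWED_SUBPRODUCTS.get(product_type, set())

# Example: is_valid_combination("A", "2") is True; is_valid_combination("A", "4") is False.
```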

We are also considering what other types of data we could store within our MDMS, for example translations of common phrases (such as hazard phrases), usage instructions, and digital asset management for items such as product logos.

The analysis of data flow to, and usage within, the other systems is the team's main project at the moment; the effort involved is consuming the majority of the team's resources, as well as involving many other people in the analysis of all the systems within the company.

Linking to the MDMS is now a required feature of any new development work, including upgrades, and a good reason is required not to do so. In some cases, IT is pushing back to ensure that sufficient funding and time are provided for each project and that the linking is done correctly. A separate team has also been created to develop data-sharing tools within the company, such as ETL and EAI, to ensure that intersystem connections are created in as common and manageable a way as possible.

We have also begun looking at software in the new MDMS arena—such as SAP's MDM—which could be used instead of our homegrown MDMS solution.
