Just over a year ago, I wrote an article for Builder.com on
my thoughts about how best to manage Master Data in a
corporate environment based on my experiences with several clients who have
been grappling with this issue. In this article, I will examine the progress
made towards that goal, the lessons learned, the detours taken, and the ways
in which the goal itself has been amended.

The key aim of the master data management system (MDMS)
was to provide a single system as the point of reference for all commonly used
items of data within a company. Each “consumer” system could then refer to
this system for its key data, ensuring common and correct data across the entire
suite of applications within the organization.



Uptake of MDMS by other applications

Within the companies that I covered in the previous article,
progress towards the goal is coming slowly but surely. This is mainly because
companies have three types of applications:

  • New—Created in house/externally or
    off the shelf.
  • Upgraded—Where a window exists to do
    some work on an existing system.
  • Legacy—Where the system does not
    have an upgrade window or a replacement in planning.

In the first case, links to the MDMS can be introduced as part of the
development or installation and configuration phase of the project, assuming
that the systems analysis has identified them. The main reasons for not
implementing the links, or for implementing only some of them, usually come
down to budget, available resources, or time. However, if these links are
thought about, and budgeted for, up front, there is no reason why new
applications should not take full advantage of an MDMS.

In the second case, links can be planned and budgeted as
part of the routine upgrade of the system. However, these kinds of systems
require much more analysis of the impact of each link; for example, what
happens to the system if an item of data stored in the MDMS is updated? Also,
in these kinds of projects the money is usually earmarked for new
functionality rather than infrastructure changes, so some lower priority links
may not be implemented in a given upgrade.

Lastly, legacy systems, as well as being among the most prevalent systems
within larger companies, are the most complex to link to the MDMS. Many
companies today have a significant amount of code written in so-called
“legacy” languages such as COBOL.
According to a statement on DevX:

“70 percent of
the world’s data is processed by COBOL and nine out of 10 ATM transactions are
done using COBOL. Thirty billion online COBOL transactions are processed daily;
492 of the Fortune 500 use COBOL, including the entire Fortune 100, and current
COBOL investment tops $3 trillion.”

Despite this, very few developers have the necessary COBOL skills, and many of
those who do are retiring. Because COBOL is not viewed as favorably as Java or
.NET by companies and universities, the pool of skilled developers is shrinking
significantly. This is likely to cause headaches, and huge salaries reminiscent
of the Y2K era, if companies do not plan sensibly for it; see this ComputerWorld
article for more information.

In some companies the adage “If it ain’t broke, don’t fix it” applies and is
used as another common reason for not doing any work on these systems: as long
as they work, the limited money, resources, and time are better spent
elsewhere. This makes such systems very difficult to integrate with a new
MDMS, but given the trends above, I would suggest that at least planning for
them while the necessary skills are still available relatively cheaply should
be a priority.

Developing in parallel

One of the main headaches that we have come across is that
the development of an MDMS usually runs parallel to that of other development
work within an organization. This can cause problems when planning activities
either for the MDMS system or other systems. These problems could include
situations in which:

  • Required data is not planned to be available in the MDMS until after the
    application that needs it “goes live.”
  • Data identified as common and required by a system is not planned for
    inclusion in the MDMS at this time.
  • Data newly identified as common is used within existing applications that
    do not take it from the MDMS.
  • Other constraints apply to either the development team or the MDMS team,
    such as budget, time, or resources.

Deploying an MDMS within an organization causes a
significant rise in the juggling required by management to try to ensure the
best fit of application development/upgrade paths with the MDMS’s own
development path. While this does cause a lot of headaches and other issues, on
the whole it is a price worth paying.

One of the ways that some of the project teams I’ve been
involved with have worked around these potential issues is to store their data
locally in a database at first and then, as appropriate, either load data from
the MDMS into this database or change the connection so that the data is
loaded directly from the MDMS. This allows the transition of data management
from the local application to the MDMS to happen at its own pace, item by
item, rather than being forced by or tied to other activities. Changing a
database connector, or adding another database to a data movement script (such
as an EAI or ETL job, or even a SQL Server DTS package), is a relatively
simple task.
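
As an illustration of this switchable-source idea, here is a minimal Python
sketch, using SQLite purely for convenience. The item names, database files,
and full-refresh strategy are assumptions for illustration, not the actual
EAI/ETL/DTS jobs described above.

```python
# A sketch of the "switchable source" approach: each common data item is
# registered against the database it currently comes from, and is repointed
# to the MDMS when that item becomes available there. All file, table, and
# item names are hypothetical.
import sqlite3

DATA_SOURCES = {
    "country_codes": "local_app.db",    # not yet migrated to the MDMS
    "product_classes": "mdms.db",       # now served directly by the MDMS
}

def refresh_item(item: str, target_db: str = "local_app.db") -> int:
    """Copy the current rows for one data item from its registered source
    into the application's working database. Returns the number of rows."""
    source_db = DATA_SOURCES[item]
    if source_db == target_db:
        return 0  # the item still lives locally; nothing to move

    with sqlite3.connect(source_db) as src, sqlite3.connect(target_db) as dst:
        rows = src.execute(f"SELECT * FROM {item}").fetchall()
        dst.execute(f"DELETE FROM {item}")          # simple full refresh
        if rows:
            placeholders = ",".join("?" for _ in rows[0])
            dst.executemany(f"INSERT INTO {item} VALUES ({placeholders})", rows)
    return len(rows)
```

Repointing an item from the local database to the MDMS then becomes a one-line
change to the registry rather than a change to the consuming application.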

A case study

One of my clients became very aware of the impact of a change to the MDMS, and
of the rate at which such a change flows through to the other systems, during
a recent update to its internal classification hierarchy in which only a
handful of records were changed. After the change had been released, IT began
to get support tickets stating that the data in System A did not match System
B, because the change had only reached one of the two systems at that point.

Then they discovered that another system, System C, had
crashed because its database included a relationship between the classification
hierarchy tables and the products the company sold; the change had broken that
link, causing errors and some data loss in System C. Luckily, the problem was
caught before System C could feed the corrupt data to any other systems and
before end users had used the corrupt data for anything important.
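
In hindsight, a simple pre-publication check against the downstream schema
would have caught the broken link before the change reached System C. Here is
a rough Python sketch of such a check; the database file, table, column, and
code names are hypothetical, not the client’s actual schema.

```python
# A sketch of a pre-publication sanity check: before a classification
# hierarchy change is pushed downstream, confirm that no product would be
# left referring to a classification code the change removes.
# All table, column, and code names are invented for illustration.
import sqlite3

def orphaned_products(db_path: str, removed_codes: set) -> list:
    """Return (product_id, classification_code) pairs the change would orphan."""
    if not removed_codes:
        return []
    placeholders = ",".join("?" for _ in removed_codes)
    query = (
        "SELECT product_id, classification_code FROM products "
        f"WHERE classification_code IN ({placeholders})"
    )
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query, tuple(removed_codes)).fetchall()

orphans = orphaned_products("system_c.db", {"CLS-104", "CLS-221"})
if orphans:
    raise RuntimeError(f"Change would orphan {len(orphans)} products; "
                       "fix the references before publishing.")
```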

It is events like this that make you wonder whether the risk of
linking systems together via an MDMS is really worth it. The IT team had to
invest a serious amount of time and resources in bringing System C back up, not
to mention the loss of confidence from the end users that resulted from the
entire episode.

Anyone for Pooh Sticks?

Pooh Sticks
is a game played by Winnie the Pooh & Friends (created by A. A. Milne)
in which each player drops a stick into a stream and watches to see whose
stick is the first to reach a designated finishing point.

When linking systems together to share data in any way, it
is very important to look at how that data will flow from the source system
into each of its child systems and when that flow occurs: is it a nightly
load, a manually initiated upload, or a weekly data dump, for example? Once
this is understood, a proper analysis of what each system will do with that
data needs to be undertaken to understand what happens when changed data in
the MDMS is published through your infrastructure.
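
One way to capture the outcome of that analysis is a simple, machine-readable
catalogue of every consumer, its load mechanism, and its schedule, so that the
lag between systems is explicit. The Python sketch below uses invented system
names, mechanisms, and schedules purely as an example of the idea.

```python
# A sketch of a data-flow catalogue: for each consumer of MDMS data, record
# how the data arrives, when it arrives, and which items it takes, so that
# the propagation delay of any change is visible. All entries are invented.
from dataclasses import dataclass

@dataclass
class DataFlow:
    consumer: str    # the system receiving MDMS data
    mechanism: str   # e.g. "nightly ETL", "manual upload", "weekly dump"
    schedule: str    # when the flow runs
    items: tuple     # which MDMS data items it consumes

FLOWS = [
    DataFlow("System A", "nightly ETL", "02:00 daily", ("product_classes",)),
    DataFlow("System B", "manual upload", "on demand", ("product_classes", "country_codes")),
    DataFlow("System C", "weekly dump", "Sunday 23:00", ("product_classes",)),
]

def consumers_of(item: str) -> list:
    """List every system a change to the given MDMS item will eventually reach."""
    return [flow.consumer for flow in FLOWS if item in flow.items]

print(consumers_of("product_classes"))  # ['System A', 'System B', 'System C']
```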

Because of these differences in data flow, some systems may not show exactly
the same data at the same point in time. As long as this is known and accepted
by the consumers of those systems, it may not be the large problem it first
appears to be. The impact of an item of data changing will vary depending on
the item, the system, and what the data is used for.

I am reminded of something one of my university lecturers
said about real-time systems: “The right data at the wrong time is still
wrong.”

This mapping exercise was originally perceived to be a
simple process, as we had previously linked several of our key systems
together. It has, however, turned into a major project as we uncovered systems
that were not controlled by the central IT team, applications that had never
been documented, and documentation that was out of date. In addition,
discovering how the data is used within each system is not simply a question of
asking the relevant project manager; developers had to be involved in order to
trace the use of data in the code, and key users had to be interviewed so that
we could understand the impact of each change. This has turned into the biggest
component project of the MDMS rollout.

The triad MDMS management team

In my original article, I suggested that a three-pronged
management team for the MDMS was the best approach: one person (or group) to
manage the MDMS data, another to manage the use of this data, and the final
one to be the key technical contact for the MDMS and the use of its data. (See
Figure A.)

Figure A: The triad MDMS management team

This approach has been adopted in the company I work for,
although none of the “triad” work full-time on the MDMS. However, the
combination of these three distinct strands within the project team has ensured
that the MDMS is making steady progress towards being the system at the heart
of the company’s infrastructure model.

In the early stages of deployment of an MDMS, another role
is required: that of an MDMS evangelist. This person must sell the MDMS concept
and its benefits both to the IT team and to the key stakeholders in the
business, who will have to fund not only the MDMS but also the related systems
changes required to make their applications compatible with it. It is extremely
important to convince the various parties to embrace the MDMS concept and to
show them what the MDMS can do to make their jobs easier, more efficient,
and more productive.

A second case study

On a recent project to provide a change control system for
the European arm of a very large corporation, we were able to use their MDMS to
provide a significant amount of the data required by the new system. This data
was already stored, managed, and controlled, so we did not need to do much
more than consume it. This saved a large amount of time, resources, and money
on the project, which we used instead to deliver major improvements and new
functionality in the new system.

Those data items that were not yet held within the MDMS we stored in our own
little database and, as they became available within the MDMS, we investigated
the best possible refresh method for each one. As a result, almost all the
data required by this system is now being sourced from the MDMS, some of it as
a live feed and the rest via data loads taking place at varying times.

Lessons learned

We are now realizing the sheer scope and implications of the
Pandora’s box that we have opened in this project. Our key lessons thus far:

Planning

  • Ensure
    that you understand how and when the data will flow from the MDMS to each
    system and what impact a change to it will have.
  • Look
    at your IT team’s plans for the year and try to best match up the
    inclusion of data within the MDMS with development of similar systems.
  • If
    you have a major data change, plan it independently.

Development

  • Develop
    the MDMS only as much as you need to; remember it is a data source and
    will not be directly used by a significant number of end users.
  • Try
    to provide a common data access infrastructure between the MDMS and the
    other systems; this reduces complexity and makes maintenance and support
    easier (a minimal sketch of such a layer follows this list).
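
To show what a common access layer might look like, here is a minimal Python
sketch, again using SQLite only for convenience; the class name, method, and
table names are assumptions rather than any particular product’s API.

```python
# A sketch of a shared data-access layer: every consuming application goes
# through the same small interface instead of writing its own MDMS queries,
# so there is only one access path to maintain and support. Names are invented.
import sqlite3

class MdmsClient:
    """Thin, shared wrapper around the MDMS store used by all consumers."""

    def __init__(self, db_path: str = "mdms.db"):
        self.db_path = db_path

    def fetch_item(self, item: str) -> list:
        """Return every row of one common data item (e.g. 'country_codes')."""
        with sqlite3.connect(self.db_path) as conn:
            return conn.execute(f"SELECT * FROM {item}").fetchall()

# Each application reuses the same client rather than its own ad hoc queries.
client = MdmsClient()
countries = client.fetch_item("country_codes")
```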

Communication

  • Keep
    talking with the IT Team to ensure that the data they need from the MDMS
    is available when they need it, where possible.
  • Keep
    talking to the end user community to ensure that they understand the
    benefits that the MDMS brings for the extra time and money they are
    investing.

Looking to the future

Over the next few months, we are looking to identify and
include as much common data in the MDMS as possible and to encourage systems
that currently use this data to take it from the MDMS. We are also looking at
the relationships between items of data that are held in each system (for
example, if you have a product of type A, then your subproduct types can only
be 1, 2, or 3), as well as at the validation rules used in each system for
each data item, to ensure a consistent approach. Some of these relationships
and rules may end up in the MDMS itself, extending it beyond being a simple
data repository.
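
To make the kind of rule we have in mind concrete, here is a small Python
sketch of holding such a relationship rule as data; apart from the “type A”
example above, the product and subproduct type codes are invented.

```python
# A sketch of an inter-item relationship rule held as data in the MDMS:
# a product of a given type may only carry certain subproduct types.
# The type codes (other than the "A" example from the text) are invented.
ALLOWED_SUBTYPES = {
    "A": {"1", "2", "3"},
    "B": {"4", "5"},
}

def validate_subtype(product_type: str, subproduct_type: str) -> bool:
    """True if the subproduct type is permitted for the given product type."""
    return subproduct_type in ALLOWED_SUBTYPES.get(product_type, set())

assert validate_subtype("A", "2")        # allowed combination
assert not validate_subtype("A", "4")    # not allowed for type A
```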

We are also considering what other types of data we could
store within our MDMS, for example: translations of common phrases such
as Hazard Phrases, usage instructions, and digital asset management for things
such as product logos.

The analysis of data flow to, and data usage by, the other systems is the
team’s main project at the moment; the effort involved is consuming the
majority of the team’s resources, as well as drawing many other people into
the analysis of all the systems within the company.

Linking to the MDMS is now a required feature of any new
development work, including upgrades, and a good reason is required not to do
so. In some cases, IT is pushing back to ensure that sufficient funding and
time are provided for each project so that the linking is done correctly.
A separate team has also been created to develop data-sharing tools within the
company, such as ETL and EAI, to ensure that intersystem connections are
created in as common and manageable a way as possible.

We have also begun looking at software in the new MDMS arena, such as SAP’s
MDM, which could be used instead of our homegrown MDMS solution.