Master Data Management is a Program
Eighteen months ago, master data management was a new concept in our enterprise, although there was no shortage of critics of our key systems' data quality. IT executives had long been socializing the benefits of integrated systems, and our integration "plumbing" (the infrastructure that moves data from place to place) is robust and well built out. But as any plumber will tell you, even the best pipes don't prevent clogs; for that, you need to keep your drains clean. Master data management is the set of processes and technologies applied to continuously cultivate a body of valuable corporate information. Quite simply, it is how you keep your drains clean.
R.R. Donnelley is a company, like many, that has been built up by a long series of acquisitions, mergers, and other arcane SEC transactions. Many of our key applications are completely custom-built, and each serves only a subset of the enterprise. This context, coupled with the process disintegration inherent in large-scale non-organic growth, is a sure-fire formula for endemic data quality problems.
The first major decision to be made was how to plan our approach. We decided early on that neither data quality (the problem) nor master data management (a solution) could be effectively addressed in our organization by a single project. So instead we set up a continuous improvement program and began to socialize its long-term goals among key executives. This initially took the form of interviews: we talked to senior directors and VPs from areas and functions that depend on high-quality data to make decisions spanning divisional boundaries. We asked three pivotal questions:
- Which master data "entities" (e.g., customer, vendor/supplier, part/product) are of most interest?
- What information about those entities would provide the most value if reliably available?
- What cannot currently be done for lack of that information? Conversely, what new value could be achieved if the information were easily available?
The point of these interviews was to confirm the value proposition and business case for an MDM program, and to identify the key stakeholders who would sponsor the initiative. This exercise produced two key findings. First, customer data would be the top priority. Second, due to the large number of stakeholders, an executive steering committee would be required to focus and prioritize the program.
After this beachhead was established, we began to plan out the program tracks. Three key areas emerged: politics, technology and data.
Politics
Political strategizing is a requirement for any emerging discipline. For our MDM program, it began with the initial interviews. During this phase we spoke with about 50 individuals, and because they were difficult to schedule, this took nearly four months. We were not idle in the meantime; we used the time to refine our messaging and to start the up-front work for the other two tracks.
After the steering committee was established, the political angle took a different turn. We were responsible for providing regular status updates to maintain the hard-earned executive interest in the program. This interest guaranteed, and continues to guarantee, our funding. We have tried to meet the needs of as many committee members as possible, but in a program of this breadth, pleasing everyone is not possible.
One of the largest costs to the business in doing MDM is not the up-front investment in technology, software, or even consulting services. It is the ongoing burden of active, careful data stewardship that really hits hardest. Every hour spent maintaining customer data or hierarchy/relationships is time not spent on credit checks, presales, marketing or other valuable activity. We have found that this time loss is felt more keenly than the monetary cost. As a result, our solution is designed and marketed as being as automated as possible, requiring human intervention only when absolutely necessary.
Technology
This track began with two key activities: market research and internal cataloging. We called in leading software vendors and solution integrators to describe their offerings, and then measured these against the requirements we were hearing in our executive interviews. Unfortunately, we were displeased with the available options: every one was very expensive, inadequate for our needs, or both.
Internally, we examined every system that we knew or suspected dealt with customer information. In doing so, we created CRUD (create, read, update, delete) and RACI (responsible, accountable, consulted, informed) matrices recording the functional and technical roles that our systems and functions play with respect to customer data. We also worked with business SMEs and information analysts to understand (and in some cases document for the first time) customer-data-related processes. This gave us a much more complete understanding of our environment, and allowed us to make informed scope and architectural decisions.
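A CRUD matrix of this kind can be held as a simple lookup table. The sketch below is illustrative only: the system names and their operations are hypothetical placeholders, not an actual inventory. It shows how such a matrix can answer scoping questions, such as which systems create customer data (candidate sources for a hub) and which only read it (candidate subscribers).

```python
from __future__ import annotations

# Hypothetical CRUD matrix: for each system, which of Create/Read/Update/Delete
# it performs on customer data. Names are placeholders, not a real inventory.
CRUD_MATRIX: dict[str, set[str]] = {
    "order_entry": {"C", "R", "U"},
    "billing": {"R", "U"},
    "crm": {"C", "R", "U", "D"},
    "reporting_warehouse": {"R"},
}


def systems_with(operation: str) -> list[str]:
    """Systems that perform the given CRUD operation, sorted by name."""
    return sorted(name for name, ops in CRUD_MATRIX.items() if operation in ops)


def read_only_systems() -> list[str]:
    """Systems that only read customer data (natural subscribers)."""
    return sorted(name for name, ops in CRUD_MATRIX.items() if ops == {"R"})


if __name__ == "__main__":
    print("Create customer data:", systems_with("C"))
    print("Read-only consumers:", read_only_systems())
```

A RACI matrix can be stored the same way, mapping each process to the roles responsible, accountable, consulted and informed.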
While there are several competing architectural styles for implementing master data, we chose a "hybrid hub" style. This architecture brings together just the key elements of the customer data, builds a cross-reference that identifies duplicates, applies quality improvement processes at those intersection points to create Master Customers, and then publishes that data back out to subscribers.
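To make the hybrid-hub idea more concrete, here is a minimal sketch of the hub's core data structures, assuming a design in which the hub keeps only key attributes per master customer plus a cross-reference from each (source system, local ID) pair to a master ID. The class and method names (`CustomerHub`, `link`, `publish`) are hypothetical and do not describe any particular product; publishing is modeled as a plain callback rather than real messaging middleware.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class CustomerHub:
    """Hypothetical hybrid-hub sketch: key attributes plus a cross-reference."""

    def __init__(self) -> None:
        self.masters: dict[int, dict[str, str]] = {}      # master id -> key attributes
        self.xref: dict[tuple[str, str], int] = {}         # (system, local id) -> master id
        self.subscribers: list[Callable[[int, dict[str, str]], None]] = []
        self._next_id = 1

    def create_master(self, key_attrs: dict[str, str]) -> int:
        """Register a new Master Customer and return its id."""
        master_id = self._next_id
        self._next_id += 1
        self.masters[master_id] = dict(key_attrs)
        return master_id

    def link(self, system: str, local_id: str, master_id: int) -> None:
        """Record that a source record represents the given Master Customer."""
        self.xref[(system, local_id)] = master_id

    def duplicates(self) -> dict[int, list[tuple[str, str]]]:
        """Masters referenced by more than one source record."""
        groups: dict[int, list[tuple[str, str]]] = defaultdict(list)
        for source_key, master_id in self.xref.items():
            groups[master_id].append(source_key)
        return {m: sorted(keys) for m, keys in groups.items() if len(keys) > 1}

    def subscribe(self, callback: Callable[[int, dict[str, str]], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, master_id: int) -> None:
        """Push the current master record out to every subscriber."""
        for callback in self.subscribers:
            callback(master_id, self.masters[master_id])


if __name__ == "__main__":
    hub = CustomerHub()
    acme = hub.create_master({"name": "ACME INC.", "city": "CHICAGO"})
    hub.link("billing", "B-17", acme)
    hub.link("crm", "C-903", acme)
    hub.subscribe(lambda mid, rec: print("published", mid, rec))
    print(hub.duplicates())
    hub.publish(acme)
```

In this sketch the source systems remain the owners of every attribute the hub does not copy.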
Our data sourcing pipeline applies a three-step process to deal with the incoming information: standardization/cleansing, identification, and matching/merging.
- First, the data elements from the various sources are standardized and cleansed so that they are consistent: e.g., every instance of "Incorporated" in customer names is changed to "INC.", and city names are stored in all caps. This removes system-specific inconsistencies and provides a better baseline for the following steps. We also apply USPS delivery-point validation at this stage to improve the quality of customer addresses.
- Second, we attempt to identify the customer. We do this by examining any source-system-specific identifiers that arrive with the data event, and by calling an external customer data provider in real time to identify the record where possible. Those ID numbers (and any additional data found in the external database) are then fed into the next step.
- Third, the identified and cleansed customer record must be de-duplicated against the thousands of other records in the master system. We do this by comparing all the elements in the record that may be shared with others: the name, the legal address, perhaps the DUNS number. We sometimes find an exact match, but duplicates are usually more subtle; different abbreviations and misspellings make this comparison quite complex.
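The three steps can be sketched in a few functions. This is an illustration under stated assumptions, not a production pipeline: the abbreviation table is a tiny hypothetical sample, the external customer data provider is replaced by an in-memory directory keyed on standardized name and city, USPS delivery-point validation is omitted, and matching uses Python's standard-library `difflib.SequenceMatcher` with hypothetical weights (0.6 name, 0.4 address) and a hypothetical 0.85 threshold. Commercial matching engines use far more sophisticated comparisons.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical sample of name abbreviations (keys have punctuation stripped).
NAME_ABBREVIATIONS = {
    "INCORPORATED": "INC.", "INC": "INC.",
    "CORPORATION": "CORP.", "CORP": "CORP.",
    "COMPANY": "CO.", "CO": "CO.",
}


def standardize(record: dict[str, str]) -> dict[str, str]:
    """Step 1: upper-case every field, collapse whitespace, abbreviate name words."""
    out = {k: " ".join(v.upper().split()) for k, v in record.items()}
    # Strip leading/trailing periods and commas from each word, then abbreviate.
    words = [w.strip(".,") for w in out["name"].split()]
    out["name"] = " ".join(NAME_ABBREVIATIONS.get(w, w) for w in words)
    return out


def identify(record: dict[str, str], directory: dict[tuple[str, str], str]) -> dict[str, str]:
    """Step 2: attach an external id if the (hypothetical) directory knows the customer."""
    ext_id = directory.get((record["name"], record["city"]))
    return {**record, "duns": ext_id} if ext_id else dict(record)


def match_score(a: dict[str, str], b: dict[str, str]) -> float:
    """Step 3 helper: 1.0 on a shared external id, else weighted string similarity."""
    if a.get("duns") and a.get("duns") == b.get("duns"):
        return 1.0
    name = SequenceMatcher(None, a["name"], b["name"]).ratio()
    addr = SequenceMatcher(None, a["address"], b["address"]).ratio()
    return 0.6 * name + 0.4 * addr


def best_match(record: dict[str, str], masters: list[dict[str, str]],
               threshold: float = 0.85) -> int | None:
    """Step 3: index of the most similar master if it clears the threshold, else None."""
    scored = [(match_score(record, m), i) for i, m in enumerate(masters)]
    if not scored:
        return None
    score, index = max(scored)
    return index if score >= threshold else None
```

Records whose best score falls below the threshold would become new Master Customers. One could also add a middle band of scores routed to a data steward, keeping human intervention to the cases that genuinely need it.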
Data
It sounds banal to say that data is of key importance to the MDM program since it literally is in the name, but we cannot stress enough how complex and crucial this aspect of the program is. We broke this track down into two components: data quality and information architecture.
Data quality starts with the question, "How useful is this data to the enterprise?" The answer for our customer data is interesting: generally the data is quite useful to individual systems or even business units, but when viewed from outside or across the silos, its value drops off sharply because the same customer is hard to identify consistently. In our organization, as in any other, we found countless data quality problems, but two stood out: intra-system duplicates (the same customer recorded more than once within one system) and inter-system duplicates (the same customer recorded in several systems). Our technical architecture targeted these directly as the key quality problems our master data hub could address.
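Once records have been clustered as probable duplicates (for example, through a hub cross-reference), telling the two problem types apart is a matter of counting systems. The sketch below assumes each cluster is a list of hypothetical (system, local ID) pairs; a single cluster can exhibit both problems at once.

```python
from __future__ import annotations

from collections import Counter


def classify_duplicates(cluster: list[tuple[str, str]]) -> dict[str, bool]:
    """Label a cluster of (system, local_id) records believed to be one customer.

    intra: some single system holds the customer more than once.
    inter: the customer appears in more than one system.
    """
    per_system = Counter(system for system, _ in cluster)
    return {
        "intra": any(count > 1 for count in per_system.values()),
        "inter": len(per_system) > 1,
    }


if __name__ == "__main__":
    print(classify_duplicates([("billing", "B1"), ("billing", "B2"), ("crm", "C9")]))
```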
Information architecture is the way in which data is organized. For example, a customer has a legal name, a contact address, and perhaps an optional phone number. In our business, many customers have DUNS numbers, since we sell primarily to organizations rather than individuals. But what if a customer's sales team prefers a name that differs from the legal name; should the architecture account for this? Questions like this one, along with a thousand similar ones that each interested party asks of the information, help to define the information model appropriate for your organization.
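One possible encoding of those modeling decisions is sketched below. The field names, and the choice to keep team-specific names in a separate mapping instead of overwriting the legal name, are assumptions for illustration rather than an actual production model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Address:
    street: str
    city: str
    state: str
    postal_code: str


@dataclass
class Customer:
    """Hypothetical master customer model; field names are illustrative."""
    legal_name: str
    address: Address
    phone: str | None = None          # optional attribute
    duns_number: str | None = None    # common for organizational customers
    # Names preferred by particular audiences (e.g. "sales"), kept alongside
    # the legal name rather than replacing it.
    alternate_names: dict[str, str] = field(default_factory=dict)

    def display_name(self, audience: str) -> str:
        """Audience-specific name if one exists, else the legal name."""
        return self.alternate_names.get(audience, self.legal_name)
```

Keeping the legal name authoritative while allowing per-audience aliases answers the sales-team question without losing the name other parties depend on.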
One way that these two aspects come together is in data profiling. Profiling is the process of looking very carefully at data in existing databases and asking frank questions about it: In how many instances are "required" attributes of a customer in fact missing? How often and recently have data records been updated with current information? What processes created and continue to create the customer data? How is it used at both the line-of-business and executive reporting levels?
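The first two profiling questions lend themselves to simple counts. This sketch assumes records arrive as dictionaries with a hypothetical `last_updated` date field, treats blank or missing values as absent, and counts records with no update date as stale; the one-year staleness cutoff is an arbitrary placeholder. The remaining questions (which processes create the data, and how it is used) call for interviews and process analysis rather than code.

```python
from __future__ import annotations

from datetime import date


def profile(records: list[dict], required: list[str], as_of: date,
            stale_days: int = 365) -> dict:
    """Count missing required attributes and stale records in a data set."""
    missing = {
        # None, empty and whitespace-only values all count as missing.
        attr: sum(1 for r in records if not str(r.get(attr) or "").strip())
        for attr in required
    }
    stale = 0
    for r in records:
        updated = r.get("last_updated")
        if updated is None or (as_of - updated).days > stale_days:
            stale += 1
    return {"rows": len(records), "missing": missing, "stale": stale}
```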
Sometimes even simple-sounding questions are deceptively difficult to answer, for example, "How many customers are represented in this database?" The answer depends entirely on how you define a customer, which brings us back to the information architecture. Profiling one's source system data is possibly the single most important activity for any MDM project; it identifies and addresses some of the riskiest problems early and efficiently.
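The point about definitions can be shown with a toy example: the same four rows yield three different customer counts depending on which fields define a "customer". The rows and DUNS values below are invented placeholders.

```python
from __future__ import annotations

# Invented sample rows; the DUNS values are placeholders, not real numbers.
ROWS = [
    {"account": "A1", "name": "ACME INC.", "duns": "000000001"},
    {"account": "A2", "name": "ACME INC.", "duns": "000000001"},
    {"account": "A3", "name": "ACME PRINT GROUP", "duns": "000000001"},
    {"account": "A4", "name": "ZENITH CO.", "duns": "000000002"},
]


def count_customers(rows: list[dict[str, str]], key_fields: tuple[str, ...]) -> int:
    """Count distinct customers, defining a customer as a distinct key_fields value."""
    return len({tuple(row.get(f, "") for f in key_fields) for row in rows})


if __name__ == "__main__":
    for definition in [("account",), ("name",), ("duns",)]:
        print(definition[0], count_customers(ROWS, definition))
    # account 4, name 3, duns 2
```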
Conclusion
This is not a completed project at R.R. Donnelley; rather, we have made a strategic decision that we are going to continuously cultivate our customer data and treat it as the valuable asset that it is. The value we have already delivered in terms of increased visibility into our data is very real, and as we begin to move this new Master Data out into the enterprise - through reporting solutions and eventually back into the operational arena - that value will increase rapidly.
Our story is certainly not the only way to do MDM, and there are things we wish we had done better or more completely. But because our results are very pleasing to me personally and measurably valuable to our enterprise, I am confident that this is one good way to go about it.
Over the longer term we hope to move this competency of customer data cultivation into other arenas that could benefit from it, such as vendor, press/bind equipment and perhaps internal resources such as employee skill sets, paper and other materials. We sincerely hope that our customer MDM strategy becomes the prototype for a wave of change in data management processes that we can leverage to create real competitive advantage.
(Editor's note: Scott Lee has started an MDM workgroup for interested parties at http://tech.groups.yahoo.com/group/master-data-management.)
Scott Lee is the senior architect for the Information Architecture group at Chicago-based printing company R.R. Donnelley. He can be reached at scott.lee@rrd.com.