The problem of counting students is similar to the problem organizations face when counting customers. The first question to ask is "Who's asking?" and the second is "What is the definition of a customer?" An important consideration is that the answers to both questions are subjective and temporal: subjective because the context constrains the answers, and temporal because the answers change over time. Within sales, a prospect is considered a customer. Within finance, a customer may be defined as only those who did business within the past year.
The standardized definitions are subject to change (sometimes without notice). For example, are individuals who registered but haven't paid considered students? For promotional purposes, the marketing department might consider them "students".

Updating historic data to include new classifications is a challenge. The attributes needed to classify the data may not have been captured, so going back and attempting to classify the data according to new codes may not be possible. Many databases contained codes for gender as male/female; however, in today's world a further classification may be needed. How do you go back and determine gender for historic records so you can code them according to the new classifications? It is important to ask how important this historic data is to the decision-making process, and whether it is possible to "derive" these classifications from the data previously captured.

Mining data to determine correlations requires extensive planning and execution. Having the data is one consideration; having the "right" data is the other. Correlations between data such as "living in a particular building" and retention may be discovered, but correlation is not causation. How many data points will be collected for each student, and what is the relevance of this data? It is costly to collect and manage data. Will this data provide insights that impact the outcomes in a measurable way?

What has been the impact of the data quality efforts and what are the expectations for the future impacts? Is the investment in a bureaucracy paying off? Has progress been affected by the bureaucracy? Has performance of the organization improved as a result of these efforts? Is this effort sustainable?
Hi,
Thanks for sharing these valuable initiatives.
Which tools did you use during the data cleansing process?

rgds
First, I want to apologize for my bluntness. I don't normally write in response to articles, but you caught me at a rare moment. I can appreciate the issue addressed by the author (data cleansing), and I can't say I have an in-depth knowledge of the problem, but a quick visit to Westminster's website shows a liberal arts college of 1,100 to 1,200 students. Westminster is ranked #166 among liberal arts colleges in the U.S. That means there are 165 liberal arts colleges ranked above it, many of similar size, and most likely some of them have had to address the issues you are addressing now.
Are you sure the genesis of your problem isn't more system-related, or should I say systems-related? Maybe getting senior staff out on a field trip to see how other schools of similar size handle their data is in order. You're going over issues that have been addressed before by many other institutions of higher learning, with more resources to resolve them than W.C. has.
Your senior staff needs to ask: "What are small liberal arts colleges like Amherst, Middlebury and Williams, ranking in the top 10 and charging 40K to 50K per year, doing that Westminster is not? What kind of data systems are they using? How are they structuring their systems inter-departmentally? How are they extracting the data?" Go do a panty raid on some of these top 10 schools and bring back ideas that will work for your college. Best of luck.
It's getting there. The more balkanised your systems, and the wider their range of age and quality, the harder it is.
Contributor
Other schools
Scott Lowe 26th Apr 2011
StaticFish,

One of the great things about higher ed is the inter-school collaboration that takes place. These kinds of issues are discussed far and wide, but even among schools on the exact same system, the "drift" that takes place is incredible, so there is definitely not a one-size-fits-all opportunity. In speaking with people at other institutions in our class (not necessarily the elites with resources that far outstrip our own), I've found we're far from unique, but we did have one massive disadvantage: poor coordination between departments when it comes to data changes. That's the primary issue we're trying to solve now, and this group will also be charged with a good chunk of the cleanup. Regardless of what's happening at other colleges, we will have to do the hard work of the cleanup.

In June, I'll be at a conference with a presentation on this very topic, and I intend to ask others in person exactly the same kinds of questions you raised.

Honestly, I think our primary challenge is cultural. Just today, we had one division that, to solve a problem internal to their group, was creating duplicate records in our ERP rather than handling the situation in a way that makes more sense. It was only sheer luck that we even found out this was taking place so that we could stop it. The end result of duplicate records is never good, particularly for those that use the data later in the lifecycle.

Believe it or not, it's better than it used to be, but we have a very long way to go.

I appreciate the comment and can say with certainty that we are looking at others' best practices.

Scott
All too often, a few compromises are allowed to creep in, sacrificing the goal for some short-term benefit. The main rule is that no one gets to corrupt good data; the hard bit is identifying what's good.
I've been on the technical end of a few of these efforts, and nearly every one was scuppered by a lack of leadership when priorities conflicted.
Contributor
Tony,

I agree with you on all counts. I've started taking a hard line when it comes to data integrity, and it's a constant challenge. It seems like I'm notified (sometimes too late) of something taking place that, although it might make an individual process take five seconds less, creates massive problems down the road. As I stated elsewhere, there is a huge cultural component to this, and that's the hardest kind of change to effect.

At every opportunity, whenever one of these situations arises, I don't start with the staff member violating data integrity rules. I start with the division VP, explaining why the activity cannot continue, and I try to open the conversation with some kind of relatable analogy. That actually worked today in a situation I had, and the VP immediately got what I was saying. Although most of the people on the executive team "get" data quality, educating the rest is key to the integrity goal. Fortunately, I have the full support of the president, so that always helps.

Rarely do I simply say "no" to a request; it might be "not right now" or "we'll put it on the list", but twice in the past year I've simply said "no" to multiple requests that came in to IT. These requests were intended to develop systems by which offices could work around data integrity issues, e.g. "Just add a field for us to do 'x' and we don't need to track that data anymore... don't worry... we'll remember it." One of the requestors even had the division VP come and make the request. Again, I said no and told the VP that the only person who could change my mind would be the President, and even then only over my objection.

Eventually, the VP understood where I was coming from: the needs of one particular office need to be weighed against everyone else's. Keeping data clean needs to be a goal for every office, and that's going to be hard.

Sorry, I ramble.

Scott
Not rambling at all
Tony Hopkinson Updated - 27th Apr 2011
I specialise in client-server databases, so I'm the guy who gets forced to screw things up and then the one who gets the blame for it being screwed up.

Ran into a minor one today: a legacy database with no referential integrity, and an implied foreign key to another table whose surrogate identity column is seeded at one.
So why do some of the furkers have zero in them then?

Didn't see it in my test data; it must have been a symptom of an old bug. It only cost us a few hours, but they add up; in fact, experience suggests it's exponential.
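For anyone wanting to check their own legacy tables for this kind of problem, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`customers`, `orders`, `customer_id`) are hypothetical, and it assumes the parent key is an identity column starting at one, so a zero can never be a real reference. A LEFT JOIN anti-join lists every child row whose implied foreign key matches no parent, which catches the zeros as well as any other dangling values.

```python
import sqlite3


def find_orphans(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (order_id, customer_id) pairs whose implied foreign key has no match."""
    # LEFT JOIN keeps every order; orders with no matching customer get c.id = NULL.
    sql = """
        SELECT o.id, o.customer_id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.id = o.customer_id
        WHERE c.id IS NULL
        ORDER BY o.id
    """
    return conn.execute(sql).fetchall()


def demo() -> list[tuple[int, int]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical legacy schema: no FOREIGN KEY clause, so nothing rejects bad values.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    # AUTOINCREMENT ids start at 1, so these customers get ids 1 and 2.
    conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ann",), ("Bob",)])
    # Orders 2 and 4 carry the impossible zero; order 5 points at a missing customer.
    conn.executemany(
        "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
        [(1, 1), (2, 0), (3, 2), (4, 0), (5, 7)],
    )
    orphans = find_orphans(conn)
    conn.close()
    return orphans


if __name__ == "__main__":
    print(demo())  # [(2, 0), (4, 0), (5, 7)]
```

Running a query like this against real data before writing new code on top of an implied key is cheap, and it surfaces old-bug residue that test data tends to miss.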
Data Dictator
TAPhilo 27th Apr 2011
Sometimes you really do need a dictator to ensure that everyone marches to the same data definitions and values. This is where a top-down, military style DOES work.
It also requires a very large wall to show all the mappings between systems, and to ensure that when a new field is created in ONE system it is made available to all the OTHER systems too, so they can use it and it is locked into existence.
I assume you have not heard of the latest data fad, Data Governance? Data governance is targeted at solving all the data errors of the past by establishing an elaborate bureaucracy to take over data. There are councils, committees and stewards. There are policies and meetings and meetings and even more meetings.

X-spurts will profess that all organizations have to be mature and embrace data governance. Without data governance your business will face the apocalypse. Data governance will solve business and IT relationship and trust issues. It will clean up bad data and it will ensure excellent customer service.
Of course you could bypass this democratic process and dictate improvements. Or you can permit the current state of data anarchy to prevail.

I suggest a hybrid. Select data management best practices, such as data quality, that will benefit the business and dictate their adoption. This is the "low-hanging fruit" and "just do it" approach. Then identify preventive practices, such as improved database design and data usage practices, and embed them into the culture, behavior and processes (e.g. SDLC, BPM) of the organization through enlightenment (not training and education), by showing by doing. This will take years.
Even if you started right, staying right is a constant battle between short and long term, technical and business priorities.
If you didn't start right or didn't stay right, well, debts always get called in the real world.
Thing is, this isn't arcane technomage bibble-babble. Anyone vaguely competent can understand the fundamental issues; this is 'business' not evaluating the long-term effects of a short-term bodge correctly, or even at all.

You can explain it, you can demonstrate it, you can prove it, but they needed something yesterday at zero cost. It is what it is.

Low-hanging fruit is an approach to be wary of: if you don't use the gains to build ladders and resource pickers with a head for heights, it's a cop-out.
Your data governor gets promoted to CIO, and the next guy looks for a better opportunity (a new data island, hurrah) to get promoted themselves.