
I recently attended the IRM Enterprise Data and BI conference, held from 4th to 6th November in London. The conference was a great opportunity not only to present but also to gather knowledge from many data practitioners.

One of my favourite sessions was Defining Data Quality Dimensions presented by Nicola Askham and Denise Cook, which gave the audience an opportunity to understand and review their recent white paper covering the six primary dimensions for data quality assessment.

It was interesting that, whilst this was the last slot of the three-day conference, the debate it sparked was unprecedented! I will let you read the white paper at your leisure, but I was keen to put my thoughts forward on the six dimensions, using the definitions provided by DAMA.

1. Completeness, Uniqueness and Validity of data

We start with the three dimensions I think are relatively easy to understand and measure.


Completeness

DAMA definition: The proportion of stored data against the potential of “100% complete”.

My Interpretation: We know when a field has a value and when it does not. Completeness easily tells us how much we know about a customer, how identifiable a location is, or how well a product is defined.

Impact if not met: Not having a telephone or mobile number means you cannot call the customer. Not defining product attributes means your customer does not understand enough about what they are trying to buy.
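As a rough sketch of how completeness could be measured in practice, the example below uses pandas on a hypothetical set of customer records; the field names and values are illustrative, not taken from the white paper. The score per field is simply the proportion of populated values against the “100% complete” potential.

```python
import pandas as pd

# Hypothetical customer records; None marks a missing value
customers = pd.DataFrame({
    "customer_id":   [1, 2, 3, 4],
    "email":         ["a@example.com", None, "c@example.com", "d@example.com"],
    "mobile_number": [None, None, "07700 900123", "07700 900456"],
})

# Completeness per field: proportion of stored values against "100% complete"
completeness = customers.notna().mean()

print(completeness)
# customer_id      1.00
# email            0.75
# mobile_number    0.50
```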

Uniqueness

DAMA definition: Nothing will be recorded more than once based on how that thing is identified.

My Interpretation: Uniqueness tells you what makes a data entity one of its kind; when it is not maintained, we get duplicates. People, products and suppliers are all entities that you expect to be unique.

Impact if not met: Data that is not unique can waste time and money. Duplicate data delivers multiple letters to the same customer, creating a negative impact. It hides the true view of inventory held on a product, wreaking havoc on your purchasing strategy.
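A duplicate check can be sketched in the same way. The example below, again with made-up data, scores uniqueness as the share of distinct customer identifiers and surfaces the records that would, for instance, trigger two letters to the same customer.

```python
import pandas as pd

# Hypothetical customer records; rows 0 and 2 describe the same person
customers = pd.DataFrame({
    "customer_id": [101, 102, 101],
    "name":        ["Jane Smith", "Raj Patel", "Jane Smith"],
})

# Uniqueness: distinct identifiers as a share of all records
uniqueness = customers["customer_id"].nunique() / len(customers)

# The duplicate records that would each receive a letter
duplicates = customers[customers.duplicated(subset="customer_id", keep=False)]

print(f"Uniqueness score: {uniqueness:.2f}")  # 0.67
print(duplicates)
```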