Improving data collection

Following on from my recent Experian Data Quality webinar appearance, I wanted once again to respond to one of the many questions posted by attendees during the session.

This week I’m sharing some tips for one attendee who wanted to know how to improve their data collection approach across multiple departments of the organisation.

Tip 1: Create a master data strategy for your reference data

One of the big problems facing organisations trying to improve accuracy is ensuring that common reference information is consistent across different departments and functional areas within the business.

For example, one utilities organisation had three different departments (Power Management, Communications, Planning and Design) that all stored the same type of information on equipment location. However, each system created its own standard for floor levels, suite locations and rack placements. This led to total confusion whenever field engineers needed to reconcile the systems.

The solution was to create a single, trusted source of reference data to identify equipment locations consistently. This was a painful exercise to go through, but the returns were huge. Historically, every time an engineer searched for a piece of equipment and failed to find spare capacity, they would order new equipment unnecessarily, adding further costs to the bottom line.

By standardising a master reference list across all applications, you can prevent search failures from creating stranded assets and operational waste.
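To make that concrete, here is a minimal sketch of what a master location reference might look like; the department names echo the example above, but the codes, formats and the to_master_id helper are purely illustrative assumptions.

```python
# A minimal sketch of a master location reference, assuming hypothetical
# departmental codes and formats; none of these values come from the article.

MASTER_LOCATIONS = {
    # master identifier: canonical description
    "SITE01-FL02-SUITEA-RACK07": "Site 01, Floor 2, Suite A, Rack 7",
}

# Each department maps its local convention onto the shared master identifier.
DEPARTMENT_ALIASES = {
    "power_management": {"S1/2F/A/R7": "SITE01-FL02-SUITEA-RACK07"},
    "communications":   {"01-02-A-07": "SITE01-FL02-SUITEA-RACK07"},
    "planning_design":  {"Site1 Flr2 SuiteA Rk7": "SITE01-FL02-SUITEA-RACK07"},
}

def to_master_id(department: str, local_code: str) -> str:
    """Resolve a department-specific location code to the master identifier."""
    try:
        return DEPARTMENT_ALIASES[department][local_code]
    except KeyError:
        raise ValueError(f"No master mapping for {local_code!r} in {department}")

print(to_master_id("communications", "01-02-A-07"))
# -> SITE01-FL02-SUITEA-RACK07
```

Once every application resolves locations through the same master list, the field engineers' searches all land on the same record, whichever department captured it.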

Tip 2: Create a central library of data quality rules for validation at source

More and more organisations are creating data quality rules to validate whether data collected upstream is of sufficient quality to support downstream processes. The problem is that these rules are not being re-used in the actual data collection and validation phase to ensure high-quality information at source.

More modern data quality tools enable ‘point-of-source’ data validation using sophisticated data quality rules that are executed in real time via web services and API ‘hooks’.

This means you can create a single master set of rules that is shared operationally across all departments.
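As a rough illustration, the sketch below shows one way a shared rule library could look; the rule names, the deliberately simplified postcode pattern and the validate function are assumptions for the example, not a particular vendor's API. In a real deployment the same rules would typically be exposed to each department's applications via a web service.

```python
# A minimal sketch of a central rule library shared by all collection points.
import re

RULES = {
    # Simplified UK postcode pattern, for illustration only.
    "postcode_format": lambda v: bool(re.fullmatch(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}", v.upper())),
    "not_blank":       lambda v: bool(v and v.strip()),
    "rack_id_format":  lambda v: bool(re.fullmatch(r"RACK\d{2}", v)),
}

def validate(field_rules: dict, record: dict) -> list:
    """Apply the central rules to a record at the point of collection.

    Returns a list of human-readable failures; an empty list means the
    record passes and can be committed to the source system.
    """
    failures = []
    for field, rule_names in field_rules.items():
        value = record.get(field, "")
        for name in rule_names:
            if not RULES[name](value):
                failures.append(f"{field}: failed rule '{name}'")
    return failures

# Any department's application can call the same rules before saving data.
print(validate({"postcode": ["not_blank", "postcode_format"]},
               {"postcode": "SW1A 1AA"}))   # -> []
```

The point is that the rule definitions live in one place; each department calls them rather than re-implementing its own version.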

Tip 3: Analyse your data profiling results regularly to determine opportunities for defect prevention

A lot of companies deploy data profiling and data quality analysis but rely on periodic data cleansing to resolve the issues they’ve discovered.

The smarter, more profitable approach to long-term data quality is to analyse why repeatable defects are occurring and re-engineer your data collection processes to prevent defects at source. You can use the techniques mentioned in Tip 2, or it may be something as simple as tweaking your application logic to add extra validations.

Another tip is to analyse your data using the Pareto (80/20) principle, so that you can identify the handful of defect types that are causing the bulk of the issues.
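A quick sketch of that analysis, assuming a hypothetical list of defect-type labels taken from profiling output:

```python
# A minimal Pareto (80/20) pass over profiling results; the defect labels
# and counts are made up for illustration.
from collections import Counter

defects = ["missing_postcode", "duplicate_customer", "missing_postcode",
           "invalid_phone", "missing_postcode", "duplicate_customer",
           "bad_location_code", "missing_postcode", "missing_postcode",
           "duplicate_customer"]

counts = Counter(defects)
total = sum(counts.values())

cumulative = 0
print("Defect types covering ~80% of issues:")
for defect_type, count in counts.most_common():
    cumulative += count
    print(f"  {defect_type}: {count} ({count / total:.0%})")
    if cumulative / total >= 0.8:
        break
```

The few defect types that surface before the loop stops are your best candidates for prevention work.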

Tip 4: Capture user, timestamp and information chain identifiers to enable defect tracking

Whilst this tip doesn’t prevent defects, it definitely makes it easier to trace the root cause and apply some of the prevention remedies listed in the other tips.

In many organisations it’s impossible to ascertain where data originated and how it flows across the business. By adding a small amount of information to each table, such as a user identifier, the date and time of data creation, a sequence number and an information chain locator, you can track the precise origin of your collected data.
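As a rough sketch, these audit fields could be attached at the point of collection along the following lines; the field names and the write_record helper are illustrative assumptions rather than a prescribed schema.

```python
# A minimal sketch of capturing provenance alongside collected data.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from itertools import count

_sequence = count(1)  # simple in-process counter; real systems would use a database sequence

@dataclass
class AuditedRecord:
    payload: dict         # the business data being collected
    created_by: str       # user identifier
    chain_location: str   # where in the information chain this was captured
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    sequence: int = field(default_factory=lambda: next(_sequence))

def write_record(payload: dict, user: str, chain_location: str) -> dict:
    """Attach the audit fields at the point of collection before persisting."""
    return asdict(AuditedRecord(payload, user, chain_location))

print(write_record({"rack_id": "RACK07"},
                   user="jsmith",
                   chain_location="field-app/equipment-entry"))
```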

I’ve used this technique to perform simple improvements such as identifying the team members who require additional training, and the information chain locations that are ‘repeat offenders’ for data defects and therefore ripe for preventative measures.

Tip 5: Improve application search capabilities

Poor search capability during data collection is one of the biggest causes of errors. When users can’t find a required value, they simply create a duplicate entry or add their own reference data.

There is no excuse these days not to employ more flexible, ‘fuzzy style’ search algorithms for common data types such as products, names and addresses. By leveraging data matching technology within your application, you can radically improve data collection performance.
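As a simple illustration of the idea, the sketch below uses Python’s standard library to return close matches instead of failing outright; dedicated data matching tools go far beyond this, but the principle is the same. The product names are invented for the example.

```python
# A minimal sketch of 'fuzzy' lookup against reference data.
from difflib import get_close_matches

reference_products = ["Copper Cable 2.5mm", "Copper Cable 4.0mm",
                      "Fibre Patch Lead 5m", "Fibre Patch Lead 10m"]

def fuzzy_lookup(query: str, choices: list, cutoff: float = 0.6) -> list:
    """Return the closest reference entries rather than failing on an exact miss."""
    return get_close_matches(query, choices, n=3, cutoff=cutoff)

print(fuzzy_lookup("coper cable 2.5", reference_products))
# Likely suggests "Copper Cable 2.5mm" instead of letting the user
# create a duplicate entry for a product that already exists.
```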