What would you expect to see included in a data quality issues log?

This is a great question because, although some modern data quality tools will allow you to track data quality issues, a lot of data quality defects are first reported by everyday business users. We therefore need a way of tracking issues from both a business and a technical perspective.

Here is some of the information I have tracked in the past:

Initial Reporting

  • Who raised the fault? This is the name of the business user who originally observed the issue.
  • Nature of issue reported? This is a simple description of the problem from a business perspective; you can get into more technical details later in the form.
  • Primary application reported? Which application was the source of the issue reported?
  • Secondary applications impacted? Which other systems (either upstream or downstream) are also impacted by this particular defect?
  • Support contact? This is the name of the person who has been assigned to investigate the fault further.
  • Date and time of issue? When was the fault reported?
  • Frequency and history? Is the problem intermittent or a one-off? Has it ever been witnessed before?
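
As a rough sketch, the initial reporting fields above could be captured in a simple record like the one below. The class name, field names and sample values are purely illustrative assumptions, not taken from any particular data quality tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class InitialReport:
    """One issue log entry at the point the fault is first reported (hypothetical layout)."""
    raised_by: str                 # business user who originally observed the issue
    description: str               # simple description from a business perspective
    primary_application: str       # application where the issue was reported
    reported_at: datetime          # when the fault was reported
    secondary_applications: List[str] = field(default_factory=list)  # upstream/downstream systems
    support_contact: Optional[str] = None  # person assigned to investigate, if any yet
    intermittent: bool = False     # False means a one-off
    seen_before: bool = False      # has the problem been witnessed previously?


# Made-up sample entry, for illustration only.
report = InitialReport(
    raised_by="J. Smith",
    description="Customer invoices show the wrong delivery address",
    primary_application="CRM",
    reported_at=datetime(2024, 3, 1, 9, 0),
    secondary_applications=["Billing", "Data warehouse"],
    intermittent=True,
)
```

Leaving the support contact optional reflects the fact that a fault is often logged before anyone has been assigned to it.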

Assessment

  • Information chain location(s)? I like to keep a note of which information chains are impacted by the fault; this also helps determine whether there are further upstream or downstream issues to analyse.
  • Data quality assessment script? When tracking faults I like to create a data quality assessment routine that evaluates the issue to see if there are further occurrences and to monitor the issue moving forward. Some tools allow you to create custom scripts that can easily be found from your main data quality log. It also helps if you can create a snapshot of what the data looked like at the time of the fault, and some data quality tools will allow this.
  • Severity? Following your assessment of the issue you will have a much clearer indication of how big an impact the fault is having on the organisation. You may choose to add another form field to indicate whether the fault is localised (for example to a single record) or more wide-ranging (across possibly thousands of records).
  • Analyst contact? Some organisations will have a dedicated data quality analyst team that investigates the faults, so add their details here. It’s always useful to see which faults your team has been assigned in order to review performance and help plan training more effectively.
  • Data Owner/Steward? Who is responsible for the data? What level of communication is required with them? Both of these things need to be added to your issue logging process.
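
To make the assessment script idea a little more concrete, here is a minimal sketch of a routine that counts further occurrences of a fault, labels it as localised or wide-ranging, and keeps a snapshot of the affected records. The function name, the rule passed in and the sample records are all assumptions made for illustration; a real routine would normally run against your own data source or inside your data quality tool.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def assess(records: Iterable[Record],
           is_defective: Callable[[Record], bool]) -> Dict[str, object]:
    """Apply a defect rule to every record and summarise what was found."""
    failing: List[Record] = [r for r in records if is_defective(r)]

    # Illustrative scope labels: one bad record is "localised",
    # more than one is treated as "wide-ranging".
    if not failing:
        scope = "none"
    elif len(failing) == 1:
        scope = "localised"
    else:
        scope = "wide-ranging"

    return {
        "occurrences": len(failing),
        "scope": scope,
        # Copy the failing records so the log keeps the data as it
        # looked at the time of the fault.
        "snapshot": [dict(r) for r in failing],
    }


# Made-up sample data and rule: customers with a missing postcode.
customers = [
    {"id": 1, "postcode": "AB1 2CD"},
    {"id": 2, "postcode": ""},
    {"id": 3, "postcode": None},
]


def missing_postcode(record: Record) -> bool:
    return not record.get("postcode")


result = assess(customers, missing_postcode)
```

Running the same rule on a schedule is one way to monitor the issue moving forward, since the occurrence count can be compared between runs.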

Resolution

In this section you want to include information that relates to how the problem was solved and what ongoing solutions are in place to prevent the issue recurring.

  • Resolution date/time? Useful for tracking the lead time to completion, for example to check that your team is meeting its service level agreements.
  • Resolution method? A detailed explanation of how the problem was resolved.
  • Data quality routines? As in the assessment phase, provide a list of any defect resolution or monitoring scripts that are to remain in place.
  • Status? Is the fault open? Unassigned? Pending assessment? Create some status values that are meaningful to your workflow.
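
As a small illustration of the status and lead time fields, the sketch below defines a handful of status values and checks a resolved fault against a service level agreement. The status names, the three-day SLA and the dates are assumptions chosen for the example; use whatever values are meaningful to your own workflow.

```python
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    """Example status values (hypothetical; adapt to your workflow)."""
    UNASSIGNED = "unassigned"
    PENDING_ASSESSMENT = "pending assessment"
    OPEN = "open"
    RESOLVED = "resolved"


def lead_time(reported_at: datetime, resolved_at: datetime) -> timedelta:
    """Time taken from the fault being reported to it being resolved."""
    return resolved_at - reported_at


def within_sla(reported_at: datetime, resolved_at: datetime,
               sla: timedelta) -> bool:
    """True if the fault was resolved inside the agreed time."""
    return lead_time(reported_at, resolved_at) <= sla


# Made-up dates and an assumed three-day SLA.
reported = datetime(2024, 3, 1, 9, 0)
resolved = datetime(2024, 3, 3, 17, 0)
status = Status.RESOLVED

elapsed = lead_time(reported, resolved)
met = within_sla(reported, resolved, sla=timedelta(days=3))
```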

I hope you find these issue log suggestions useful. If you can think of any additional information items that are missing then please post them in the comments section below.