Overcoming the 3 obstacles of Open Data

Following the recent increase in attention on Open Data quality, Experian has been working with the ODI to investigate the quality of several UK Government open datasets.

For many organisations in the Public Sector, “Open Data” is not a new phenomenon. It has been part of many people’s daily work for some time now, and it is growing in importance as commercial organisations see value in this data for their own purposes.

However, with budgets under ever-increasing strain, and given the time and money it takes to produce Open Data, it has become more of a luxury than a mission-critical part of the public task. This is amplified by the fact that the data has to be of high enough quality to deliver a strong return on investment (ROI) for the bodies releasing it and for the economy at large.

For example, if you want to help people find a local GP or dental surgery by packaging NHS data in a new and innovative way, the addresses, phone numbers and websites need to be accurate.

During our recent study with the Open Data Institute (ODI), we identified 3 key obstacles that made tasks such as this harder than they should be [1]:

1. Data Schemas & Standards do not always exist: even Local Authority spending data in the UK, for which a schema and standards already exist [2], is published with over 80,000 different column headers [3], making simple comparisons difficult and re-use very challenging.

2. Where Data Standards do exist, they are difficult to meet: without significant manual effort to transform and standardise data exported from a range of business applications, these standards are unlikely to be met.

3. Balancing the effort required with the reward: with realistic budgets of time and money, the output is often not of high enough quality to be actionable, and therefore useful, to the target audience.

With these 3 obstacles in mind, we worked with the ODI to see how we could help. We used our data quality management platform, Experian Pandora, to transform the data exported by the public bodies’ various business applications into the schemas required for open release. The tool also standardised and improved the quality of the data, so that information such as phone numbers, postcodes and currency values always appears in the most appropriate form.
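The two steps described above, mapping inconsistent column headers onto a target schema and standardising values such as postcodes, phone numbers and currency amounts, can be illustrated with a short sketch. Experian Pandora is a commercial tool whose internals are not described here, so this is a hypothetical Python example of the general technique, not Pandora's method. The `HEADER_SYNONYMS` table, the function names and the simplified UK formatting rules are all assumptions chosen for illustration.

```python
import re
from decimal import Decimal

# HYPOTHETICAL target schema: canonical column name -> header variants seen in exports.
# These names are illustrative only and are not taken from any official schema.
HEADER_SYNONYMS = {
    "supplier_name": {"supplier", "supplier name", "vendor", "payee"},
    "amount": {"amount (£)", "net amount", "value"},
    "postcode": {"post code", "supplier postcode"},
    "phone": {"telephone", "tel no", "phone number"},
}


def canonical_header(header):
    """Map a raw column header to a canonical schema name, or None if unknown."""
    key = " ".join(header.strip().lower().split())  # trim and collapse whitespace
    for canonical, variants in HEADER_SYNONYMS.items():
        if key == canonical or key in variants:
            return canonical
    return None


def normalise_postcode(raw):
    """Uppercase a UK postcode and insert one space before the last 3 characters."""
    compact = re.sub(r"\s+", "", raw).upper()
    if not 5 <= len(compact) <= 7:
        return None  # not plausibly a UK postcode; flag for manual review
    return compact[:-3] + " " + compact[-3:]


def normalise_phone(raw):
    """Reduce a UK phone number to national-format digits (leading 0), or None."""
    digits = re.sub(r"\D", "", raw.replace("(0)", ""))  # drop "(0)" then non-digits
    if digits.startswith("44"):
        digits = "0" + digits[2:]  # +44 international prefix -> national 0
    if len(digits) in (10, 11) and digits.startswith("0"):
        return digits
    return None


def parse_amount(raw):
    """Turn a currency string such as '£1,234.50' into an exact Decimal."""
    return Decimal(raw.replace("£", "").replace(",", "").strip())


def standardise_row(row):
    """Rename columns to the schema and standardise the known value types."""
    out = {}
    for header, value in row.items():
        name = canonical_header(header)
        if name is None:
            continue  # unmapped column: needs a human decision, so it is dropped here
        if name == "postcode":
            value = normalise_postcode(value)
        elif name == "phone":
            value = normalise_phone(value)
        elif name == "amount":
            value = parse_amount(value)
        else:
            value = value.strip()
        out[name] = value
    return out


if __name__ == "__main__":
    print(standardise_row({
        "Supplier Name ": " Acme Ltd ",
        "Amount (£)": "£1,234.50",
        "Post Code": "sw1a1aa",
        "Tel No": "+44 (0)20 7946 0000",
        "Notes": "ignored",
    }))
```

In this sketch, the row above becomes `supplier_name="Acme Ltd"`, `amount=Decimal("1234.50")`, `postcode="SW1A 1AA"` and `phone="02079460000"`, while the unmapped `Notes` column is dropped. A production pipeline would also validate values against reference data (for example, a postcode file) rather than relying on format rules alone.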

The time and money saved by this process, together with the reduced effort needed to meet transparency requirements, meant that high-quality Open Data could be produced for a much more realistic level of effort and budget, increasing the likelihood of further useful Open Data studies or of cost savings for other activities.

“With the recent attention on open data quality, we decided to undertake a small investigation looking at the quality of 3 datasets with Experian, using their Experian Pandora data quality tool. The study helped us to discover several important quality issues with Open Data.”

Leigh Dodds, Associate at The Open Data Institute

You can read more about Experian’s work with the ODI below:

Four things you should know about open data quality

Exploring open data quality