Oct 2018 | Data Quality | Data Validation

Data Standards – devices, languages, formats

In my last blog we looked at the rise of IoT and some of the challenges that organisations are facing around the privacy and security of data. But what about the data itself? Whilst there might be more data out there than ever before, for it to be useful there’s some very important groundwork to be done. Let me explain.

Any complex process or chain of events (be it supply chains, manufacturing, price comparison and so on) requires data from locations, services or devices to be brought together. Often, this data is in different formats and, even if standards exist, there can be problems with the collection, interpretation or manipulation that lead to an unexpected entry or error.

As an example, UK public authority open spending data ‘should’ be published in a particular standard (order of columns, contents, currency etc) but there are an estimated 81,000 column names alone. This makes even basic analytics of trends time-consuming – a data scientist will need to fix these issues first before they can get to work finding potential efficiencies across our public bodies (ironic, I know). Whilst some bodies are now using tools like Experian Aperture Data Studio to auto-standardise their data and prevent these problems at source, you can easily imagine how similar issues in the IoT could negate many of the opportunities offered by this new paradigm.

If my car can’t talk to my house because the house wants distances in km but my car sends them as ‘time to destination’, then how will the oven know to switch on when I’m 5 minutes from home? We’re certainly seeing improvements in this area thanks to platforms such as Apple’s HomeKit, Amazon’s Alexa and Google Home but, as with any nascent ‘standard’, differences in technical and data standards are already making it difficult for consumers to buy things that connect seamlessly to one or multiple platforms (remember Blu-Ray v HD DVD?).

Whether true interoperability emerges remains to be seen but history shows how important this will be in order for the benefits of choice, competition and quality to be felt. What will be critical in the meantime though will be the ability to work with multiple standards – in my world, this is data standards. Organisations looking to take advantage of the IoT will need business-friendly tools that can help them transform data from different sources so they can use it at pace.
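
As a rough illustration of what working with multiple standards can mean in practice, here is a minimal Python sketch that accepts a distance from devices using different field names and units and returns it in km. The field names and the alias table are hypothetical, invented for this example rather than taken from any real platform. Note that a message carrying only a time to destination cannot be converted without extra information such as speed, so the sketch returns nothing rather than guessing.

```python
# Hypothetical field names that different devices might use for distance.
FIELD_ALIASES = {
    "distance_km": "distance_km",
    "dist_km": "distance_km",
    "distance_miles": "distance_miles",
    "dist_mi": "distance_miles",
}

KM_PER_MILE = 1.609344  # international mile, by definition


def normalise_distance(message):
    """Return the distance in km from a device message.

    Returns None when the message has no recognised distance field,
    so the caller can decide how to handle it.
    """
    for raw_name, value in message.items():
        canonical = FIELD_ALIASES.get(raw_name.strip().lower())
        if canonical == "distance_km":
            return float(value)
        if canonical == "distance_miles":
            return float(value) * KM_PER_MILE
    return None
```

For example, `normalise_distance({"Dist_Mi": "10"})` gives roughly 16.09 km, while `normalise_distance({"eta_minutes": 5})` returns `None`.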

Data Quality – timeliness, integrity, linkability

A while ago, I tried to construct my own IoT sensor for the office – the temperature is a constant bugbear of mine so I wanted to measure it and see if I could find any trends that we could act upon (for example, if I linked the sensor data to phone logs, could I link temperature to productivity?).

Unfortunately, my shaky memory of resistors meant that I burnt out the temperature sensor immediately. All I could discern was a temperature of 999°C (even though our office was often too hot, I suspected this to be incorrect).

What I did learn though was that this simple sensor would have given me questionable data anyway – reading up on the particular model I had chosen not only identified how easy it was to burn out (thanks) but that it often went wrong if you requested readings on too regular a basis. There were also differences in readings from this model to others.

Clearly, if an organisation is considering using IoT sensors then, just like any other data collection method, they should first understand what data is being collected and at what level of integrity. Is one device as accurate as another? Does the timeliness of data affect its usefulness? How can you tell if the data collected is ultimately trustworthy?
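
Some of these questions can be checked automatically. The sketch below, in Python, flags readings that fall outside a plausible range (such as a 999°C office) and readings requested too soon after the previous one. The limits and the minimum interval are assumptions made up for illustration; real values would come from the sensor's datasheet and the environment being measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:
    taken_at: datetime
    celsius: float


# Assumed limits for an indoor office sensor (illustrative only).
MIN_PLAUSIBLE_C = -10.0
MAX_PLAUSIBLE_C = 50.0
# Assumed minimum gap between requests (illustrative only).
MIN_INTERVAL = timedelta(seconds=2)


def check_readings(readings):
    """Return (reading, problems) pairs in time order.

    problems is a list of short labels; an empty list means the
    reading passed both checks.
    """
    results = []
    previous = None
    for reading in sorted(readings, key=lambda r: r.taken_at):
        problems = []
        if not MIN_PLAUSIBLE_C <= reading.celsius <= MAX_PLAUSIBLE_C:
            problems.append("out_of_range")
        if previous is not None and reading.taken_at - previous.taken_at < MIN_INTERVAL:
            problems.append("polled_too_fast")
        results.append((reading, problems))
        previous = reading
    return results
```

Checks like these do not prove a reading is accurate, but they do stop obviously broken values from flowing into later analysis unnoticed.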

It’s also important to consider how to link one dataset to another. If, like me, you want to look for how an environmental factor influences productivity (human or otherwise) then you’ll need data elements that can be compared easily – such as time periods that are either identical or at least comparable (for example, a sensor working once per hour and a call log system in minutes can be linked but seconds to days would be more difficult).
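
To make the linking step concrete, here is a minimal Python sketch that joins an hourly temperature reading to a call log recorded to the minute by rounding every timestamp down to the start of its hour. The data shapes (pairs of timestamp and value) and the function names are assumptions for this example, not the format of any particular sensor or phone system.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts):
    """Round a timestamp down to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def link_calls_to_temperature(sensor_readings, call_log):
    """Link hourly temperatures to call activity in the same hour.

    sensor_readings: list of (datetime, celsius), roughly one per hour.
    call_log: list of (datetime, duration_seconds), one per call.
    Returns a dict keyed by hour, only for hours that have both a
    temperature and at least one call.
    """
    temperature_by_hour = {hour_bucket(ts): c for ts, c in sensor_readings}

    durations_by_hour = defaultdict(list)
    for ts, duration in call_log:
        durations_by_hour[hour_bucket(ts)].append(duration)

    linked = {}
    for hour, celsius in temperature_by_hour.items():
        durations = durations_by_hour.get(hour)
        if durations:
            linked[hour] = {
                "celsius": celsius,
                "calls": len(durations),
                "total_seconds": sum(durations),
            }
    return linked
```

The design choice here is to aggregate the finer-grained data (calls by the minute) up to the coarser one (readings by the hour); going the other way would mean inventing temperature values that were never measured.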

What the IoT may also lack for the short term is external validation of data. In a more traditional data domain such as party data (otherwise known as contact data), there are address validation tools and datasets of common names to check spellings and email validation services. In the IoT, there is not yet a shared resource to help validate whether data from a sensor or device matches an agreed standard. Standards and data quality go hand in hand so it will be important for open and/or shared resources to be made available to check whether (for example) a temperature sensor is correctly calibrated.

Towards a connected future

Whilst challenges are there now and new ones will emerge in the years to come, the promise of smart cities, smart energy grids, connected cars, smart homes and a plethora of new services to help us live our lives is immense. Clearly, though, this is a jigsaw puzzle where we’re missing a couple of the corners.

From a data management point of view, the IoT presents challenges not only in the areas of volumes, variety, velocity and veracity but also in the areas of trust, transparency and ethics.

If a consumer can’t trust that their smart meter / thermostat combo is working for their benefit (and their benefit alone) then they may be reluctant to share their data with providers or even to use the very devices that are meant to make their lives easier (and in this example, save energy).

So what can we, as organisations, start doing now to build trust in IoT data both for our colleagues and the consumers who will be interacting with devices and ultimately seeing the output of them in the decisions that impact their lives? Here are a few ideas to get you started:

  • Ensure your existing data is trustworthy. Basic validation and maintenance of customer, staff and supplier data need to meet your business requirements. If this isn’t up to scratch, then moving into new forms of data will be much more complex and risky. Tools like Experian Aperture can help organisations maximise the quality of their data.
  • Agree on your business data strategy. With existing and new forms of data powering your growth, you need to define how you manage the people, processes and technology that support your data, analytics and decisions as well as the basic data standards, rules and formats.
  • Set up a data governance team. Whether it be a data council, data stewards group, chief data officer or something else, it’s important to have a focal point for your efforts made up of the experts in your business – both technical and non-technical. Use this group to explore the data quality requirements for an IoT strategy.
  • Test and learn. If you’re not yet sure whether the IoT will be of benefit to you, start with some simple hypotheses that make sense to your colleagues or customers, test them out and then report back to the business. Specialists in IoT like The Things Network or the Raspberry Pi Foundation offer some great ideas and support.
  • Work with your industry. If you’re thinking about IoT then it’s pretty certain that your competitors and partners are. Work with industry bodies (or set one up!) to explore the impact of IoT on your market and start exploring the data formats and standards that will be critical to your success.

From my very basic and unsuccessful attempt to create an IoT device, I have at least learnt that the technology needs focus as well as the data. For anyone thinking about stepping into the IoT arena it’s this point that they should take away – the technology is exciting but the impact can only be realised when the data is managed effectively. Oh yes, and if you build a fridge that stops me eating cheese I will be very, very unhappy.