What is Data Discovery?
Data Discovery describes a range of techniques for collecting and consolidating information, then analysing it to find relationships between entities (or data items) and any outliers among them. The process can be applied to data from a single database or across multiple, disparate databases.
Why is Data Discovery important?
With more and more companies maintaining Big Data from an ever-increasing number of data sources, the ability to actually drive decisions from this mass of data has become more important than ever. Until the raw data has been analysed using techniques such as Data Discovery, it has very little value for a business.
Data Discovery can help by:
- Identifying outlying data – allowing you to spot potential mistakes.
- Analysing trends – giving you a good indication of why they are happening.
- Spotting dependencies between data – helping you to create validation rules.
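As an illustration of the first point, here is a minimal sketch of one common way to flag outlying values: Tukey's interquartile-range fences. The function name `find_outliers`, the multiplier `k` and the sample order totals are assumptions made for this example; other techniques may suit your data better.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical order totals; one entry looks like a data-entry mistake.
order_totals = [102, 98, 110, 95, 105, 9900, 101, 99]
print(find_outliers(order_totals))  # [9900]
```

A flagged value is only a candidate mistake; it still needs to be checked against the source system before it is corrected.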
What is an example of Data Discovery?
An example of Data Discovery would be discovering which systems are connected by certain keys or identifiers. This is important because understanding connectivity between systems helps you build accurate data models and a true account of which business services depend on which data sources.
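One simple way to look for such connections is to compare the distinct values held in columns from different systems and report the pairs that overlap heavily. The sketch below assumes each system can be loaded as a mapping of column names to values. The function `candidate_links`, the `threshold` parameter and the `crm` and `billing` sample data are all hypothetical.

```python
from itertools import combinations


def candidate_links(systems, threshold=0.8):
    """Return cross-system column pairs whose distinct values mostly overlap."""
    columns = [
        (system, column, set(values))
        for system, table in systems.items()
        for column, values in table.items()
    ]
    links = []
    for (sys_a, col_a, vals_a), (sys_b, col_b, vals_b) in combinations(columns, 2):
        # Only compare columns that live in different systems.
        if sys_a == sys_b or not vals_a or not vals_b:
            continue
        # Share of the smaller column's distinct values found in the other column.
        overlap = len(vals_a & vals_b) / min(len(vals_a), len(vals_b))
        if overlap >= threshold:
            links.append((f"{sys_a}.{col_a}", f"{sys_b}.{col_b}", overlap))
    return links


# Hypothetical extracts from two systems.
systems = {
    "crm": {
        "customer_id": ["C1", "C2", "C3", "C4"],
        "region": ["north", "south", "north", "east"],
    },
    "billing": {
        "cust_ref": ["C1", "C2", "C2", "C4"],
        "invoice_no": ["I-1", "I-2", "I-3", "I-4"],
    },
}
print(candidate_links(systems))  # [('crm.customer_id', 'billing.cust_ref', 1.0)]
```

High overlap only suggests a shared key. Columns with few distinct values, such as flags or status codes, can overlap by coincidence, so candidates should be reviewed before they are added to a data model.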
Data Discovery can also refer to the discovery of dependencies between data elements, both within the same table and across disparate tables. We say that attributes are dependent when the value of one attribute may influence the values of one or more other attributes. Dependency analysis is a valuable technique for uncovering hidden data quality rules that require ongoing management and control.
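The strictest form of this is a functional dependency, where each value of one attribute always appears with the same value of another. The sketch below tests for that case only, which is narrower than the general notion of influence described above. The function `holds_dependency` and the sample rows are assumptions made for illustration.

```python
def holds_dependency(rows, determinant, dependent):
    """Return True if every determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for this key and returns it after that.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical customer records.
rows = [
    {"postcode": "AB1", "city": "Aberdeen", "customer": "Ann"},
    {"postcode": "AB1", "city": "Aberdeen", "customer": "Bob"},
    {"postcode": "EH1", "city": "Edinburgh", "customer": "Cal"},
]
print(holds_dependency(rows, "postcode", "city"))   # True
print(holds_dependency(rows, "city", "customer"))   # False
```

A dependency that holds across the data, such as postcode determining city here, is a candidate validation rule. Real data often contains a few violations, so a practical check would usually count exceptions rather than stop at the first one.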
Data Discovery and Dependency Analysis require complex analytical processing and, ideally, a correlated architecture for performance reasons.