What is Data Preparation?
Data Preparation describes a range of processing activities that transform a source of data into a format, quality and structure suitable for further analysis or processing. It is often called Data Pre-Processing because it organises the data for a follow-on processing stage.
Examples of Data Preparation activities include:
- Dealing with unstandardised, unstructured or inconsistent data (for example, data scraped from PDFs or entered manually).
- Combining data from different sources with different formats.
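Both activities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the two sources, their field names and the country aliases are all hypothetical, and only the standard library is used. Each source is mapped onto one shared schema (cleaned name, ISO 8601 date, two-letter country code) before the records are combined.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical source A: CSV export with DD/MM/YYYY dates and untidy text
CSV_SOURCE = """customer,signup_date,country
  alice smith ,03/02/2021,uk
BOB JONES,15/11/2020,United Kingdom
"""

# Hypothetical source B: JSON from another system, with different field names and ISO dates
JSON_SOURCE = '[{"name": "Carol White", "joined": "2022-07-01", "country_code": "GB"}]'

# Hypothetical mapping of the spelling variants seen in the sources onto one standard code
COUNTRY_ALIASES = {"uk": "GB", "united kingdom": "GB", "gb": "GB"}


def standardise_name(raw: str) -> str:
    # Collapse repeated whitespace and apply consistent title case
    return " ".join(raw.split()).title()


def standardise_country(raw: str) -> str:
    # Fall back to the upper-cased value if the variant is not in the alias table
    key = raw.strip().lower()
    return COUNTRY_ALIASES.get(key, raw.strip().upper())


def from_csv(text: str) -> list[dict]:
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "name": standardise_name(rec["customer"]),
            # Parse DD/MM/YYYY and re-emit as ISO 8601 (YYYY-MM-DD)
            "signup_date": datetime.strptime(rec["signup_date"].strip(), "%d/%m/%Y").date().isoformat(),
            "country": standardise_country(rec["country"]),
        })
    return rows


def from_json(text: str) -> list[dict]:
    # Rename source B's fields to the shared schema while standardising values
    return [
        {
            "name": standardise_name(rec["name"]),
            "signup_date": datetime.strptime(rec["joined"], "%Y-%m-%d").date().isoformat(),
            "country": standardise_country(rec["country_code"]),
        }
        for rec in json.loads(text)
    ]


# Once both sources share one structure, combining them is a simple concatenation
combined = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)
for row in combined:
    print(row)
```

The key design choice is to standardise each source separately into a common schema first; combining only happens after that, so downstream tools see a single consistent format.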
When is Data Preparation required?
The most common situation requiring Data Preparation is when a business intelligence platform needs source data to be adapted and transformed to meet the needs of data mining and data visualisation tools. It is also required whenever two data sets are combined (for example, when two separate data storage systems within a company are brought together), because the data must be prepared into a single, usable format and structure.
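A minimal sketch of bringing two systems together, assuming (hypothetically) a CRM system and a billing system that share only a customer email address, which each system stores slightly differently. The records and field names are invented for illustration. The join key is normalised first, then the records are merged with a full outer join so that customers present in only one system are kept rather than silently dropped.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical records from two separate systems being brought together
crm_records = [
    {"id": "C001", "email": "Alice@Example.com", "region": "North"},
    {"id": "C002", "email": "bob@example.com", "region": "South"},
]
billing_records = [
    {"email": "alice@example.com ", "total_spend": "1250.50"},
    {"email": "dave@example.com", "total_spend": "80"},
]


@dataclass
class Customer:
    # The single, unified structure both systems are mapped onto
    email: str
    crm_id: Optional[str] = None
    region: Optional[str] = None
    total_spend: Optional[float] = None


def normalise_email(raw: str) -> str:
    # The join key must be formatted identically in both systems
    return raw.strip().lower()


def merge_systems(crm: list[dict], billing: list[dict]) -> dict[str, Customer]:
    merged: dict[str, Customer] = {}
    for rec in crm:
        key = normalise_email(rec["email"])
        merged[key] = Customer(email=key, crm_id=rec["id"], region=rec["region"])
    for rec in billing:
        key = normalise_email(rec["email"])
        # Full outer join: create a record for billing-only customers instead of dropping them
        cust = merged.setdefault(key, Customer(email=key))
        cust.total_spend = float(rec["total_spend"])  # convert text to a numeric type
    return merged


customers = merge_systems(crm_records, billing_records)
for c in customers.values():
    print(c)
```

Whether to keep unmatched records (outer join) or discard them (inner join) is a judgement call that depends on the follow-on analysis; missing fields are left as `None` here so the gaps remain visible.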
Why is Data Preparation important?
Combining two inconsistent, poor-quality data sets will likely produce inconsistent, poor-quality insights from any subsequent data mining or visualisation. The higher the quality of the data and the smoother the transformation process, the more likely you are to turn the data into usable information.