What is Data Munging?
Munging is the general term for irrevocably changing or damaging data beyond its original state. The term is thought to have originated as a backronym for “Mash Until No Good”.
However, when referring specifically to Data Munging (or Data Wrangling), it means preparing your data for a dedicated purpose - taking the data from its raw state, then transforming and mapping it into another format, normally for use beyond its original intent.
Why is Data Munging useful?
Often ‘raw’ data can be hard, even impossible, to analyse and gain useful insights from. This is where somebody will transform the data entries, fields, rows and columns into a more useful format. Activities to achieve this might include:
The final data can then be sent to the relevant data analyst or stored, ready to be analysed at a later date.
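As a rough illustration of this kind of transformation, the Python sketch below cleans a couple of raw records. Everything in it is an assumption made for the example: the field names (`name`, `joined`, `spend`), the sample rows, and the two date formats are hypothetical, and real data would usually need more careful handling.

```python
from datetime import datetime

# Hypothetical raw records: inconsistent casing, stray whitespace,
# mixed date formats, and a missing value.
RAW_ROWS = [
    {"name": "  alice SMITH ", "joined": "2021-03-04", "spend": "120.50"},
    {"name": "Bob Jones", "joined": "04/03/2021", "spend": ""},
]

# Assumed date formats for this sketch (ISO and day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each assumed format in turn and return an ISO date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # keep unparseable dates explicit rather than guessing


def munge_row(row):
    """Return a cleaned copy of one raw record."""
    spend = row["spend"].strip()
    return {
        # Collapse repeated whitespace and normalise capitalisation.
        "name": " ".join(row["name"].split()).title(),
        "joined": parse_date(row["joined"].strip()),
        # Convert numeric text to a float; empty strings become None.
        "spend": float(spend) if spend else None,
    }


clean_rows = [munge_row(row) for row in RAW_ROWS]
```

After this step, `clean_rows` holds records with one consistent name style, one date format, and numeric spend values, which is the sort of output that could be passed on or stored for later analysis.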
What is an example of Data Munging?
A specific example of data munging arises in Machine Learning, where data is restructured into a form that a learning algorithm can use.
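One way to picture this restructuring is the sketch below, which turns records into a numeric feature matrix and a list of labels. The records, the field names (`age`, `plan`, `churned`), and the choice of one-hot encoding are all assumptions made for illustration, not a prescribed method.

```python
# Hypothetical records mixing numeric and categorical fields.
records = [
    {"age": 34, "plan": "free", "churned": False},
    {"age": 51, "plan": "pro", "churned": True},
    {"age": 27, "plan": "free", "churned": False},
]

# Fix the category order so every row gets the same column layout.
plans = sorted({r["plan"] for r in records})


def to_features(record):
    """Turn one record into a flat list of numbers."""
    # One-hot encode the categorical "plan" field.
    one_hot = [1.0 if record["plan"] == p else 0.0 for p in plans]
    return [float(record["age"])] + one_hot


X = [to_features(r) for r in records]     # feature matrix, one row per record
y = [int(r["churned"]) for r in records]  # target labels as 0 or 1
```

Many learning algorithms expect purely numeric input of a fixed width, which is why the text field is expanded into one column per category here.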
A common example of deliberately damaging data involves email addresses. Typically, to prevent spam, a user will break the valid format of an email address by writing it in a way that humans understand but automated address harvesters do not, such as:
JohnDOTdoeATJohnDoeDOTcom or John(dot)doe(at)JohnDoe(dot)com
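The parenthesised style of address munging can be sketched in a few lines of Python. The function names `munge_email` and `unmunge_email` are hypothetical, and the sketch assumes the exact tokens `(dot)` and `(at)` are used.

```python
def munge_email(address):
    """Obfuscate an address using the (dot)/(at) convention."""
    return address.replace(".", "(dot)").replace("@", "(at)")


def unmunge_email(text):
    """Reverse munge_email; only handles the exact tokens used above."""
    return text.replace("(dot)", ".").replace("(at)", "@")


munged = munge_email("John.doe@JohnDoe.com")
restored = unmunge_email(munged)
```

Here `munged` is `John(dot)doe(at)JohnDoe(dot)com` and `restored` is the original address again. The reversal being this short suggests that such munging mainly deters simple harvesters that only look for the standard address format.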