Data from outside sources often arrives in messy formats that can’t be used right away. Data wrangling fixes this: it cleans, organizes, and prepares the data so that it can be analyzed properly.
Along the way, the data is usually stored twice:
- First, the raw data is stored exactly as it was received.
- Then, after cleaning and organizing, the prepared data is stored again, ready for use.
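The two storage steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed method: the sample data, the file names `raw.csv` and `prepared.json`, and the helper functions `wrangle` and `store_raw_and_prepared` are all assumptions made up for the example.

```python
import csv
import io
import json
import tempfile
from pathlib import Path

# Hypothetical messy input, standing in for data received from an outside source.
RAW_CSV = "name, age ,city\n Alice ,30,Paris\nBob,,london\n\nCarol,41, Berlin \n"


def wrangle(raw_text):
    """Clean and organize raw CSV text into a list of tidy records."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [column.strip() for column in next(reader)]
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = [value.strip() for value in row]
        record = dict(zip(header, values))
        # Convert age to an integer; use None when the value is missing.
        record["age"] = int(record["age"]) if record["age"] else None
        # Normalize the capitalization of city names.
        record["city"] = record["city"].title()
        records.append(record)
    return records


def store_raw_and_prepared(raw_text, directory):
    """Store the raw data as received, then store the prepared version."""
    directory = Path(directory)
    raw_path = directory / "raw.csv"
    prepared_path = directory / "prepared.json"

    # Step 1: keep a byte-for-byte copy of the data as it was received.
    raw_path.write_bytes(raw_text.encode("utf-8"))

    # Step 2: clean and organize, then store the prepared data for later use.
    prepared = wrangle(raw_text)
    prepared_path.write_text(json.dumps(prepared, indent=2), encoding="utf-8")
    return raw_path, prepared_path, prepared


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        _, _, prepared = store_raw_and_prepared(RAW_CSV, tmp)
        print(prepared)
```

Keeping the untouched raw copy means the cleaning step can be rerun or corrected later without going back to the original source.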
Typically, storage is required whenever one of the following occurs:
- New data is acquired from an outside source, or data is slated for use in a big data system.
- Data is transformed to make it easier to analyze.
- Data is processed through ETL (Extract, Transform, Load), or an analysis produces results.
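As a rough illustration of the last case, the sketch below runs a tiny ETL pass and stores both the loaded rows and the result of an analysis. It uses Python's built-in `sqlite3` module with an in-memory database. The sample records, the table names `sales` and `sales_by_product`, and the function names are hypothetical, chosen only for this example.

```python
import sqlite3

# Hypothetical source records; a real pipeline would read them from a file or API.
SOURCE_ROWS = [
    {"product": "widget", "price": "2.50", "qty": "4"},
    {"product": "gadget", "price": "10.00", "qty": "1"},
    {"product": "widget", "price": "2.50", "qty": "2"},
]


def extract():
    """Extract: return the source records (a stand-in for reading a real source)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: parse text fields into numbers and compute a total per row."""
    return [
        (row["product"], float(row["price"]) * int(row["qty"]))
        for row in rows
    ]


def load(connection, rows):
    """Load: store the transformed rows in a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (product TEXT, total REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()


def analyze_and_store(connection):
    """Run a simple analysis and store its result in its own table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_product (product TEXT, total REAL)"
    )
    connection.execute("DELETE FROM sales_by_product")
    connection.execute(
        "INSERT INTO sales_by_product "
        "SELECT product, SUM(total) FROM sales GROUP BY product"
    )
    connection.commit()
    return connection.execute(
        "SELECT product, total FROM sales_by_product ORDER BY product"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    print(analyze_and_store(conn))
    conn.close()
```

Here storage happens twice: once when the ETL output is loaded, and again when the analysis result is written to its own table.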