Exercise 1:
- Parallel Data Processing: Explain what it means and give one simple real-life example.
- Distributed Data Processing: How is it different from parallel processing?
- Hadoop: What is Hadoop used for? Mention one of its main components.
- Workload Processing: Why is it important to manage workloads in data processing?
- Cluster: What is a cluster, and why is it used in big data?
- Stream vs. Batch Processing: Give one example of each.

Exercise 2:
Imagine your university library wants to analyze all the books borrowed over the last 10 years in order to improve its services:
- Decide whether you would use stream processing or batch processing, and explain why.
- Suggest whether a cluster would be useful for this project, and justify your answer.