Section outline

    • Exercise 1:

      1. Parallel Data Processing:
        Explain what it means and give one simple real-life example.

      2. Distributed Data Processing:
        How is it different from parallel processing?

      3. Hadoop:
        What is Hadoop used for? Mention one of its main components.

      4. Workload Processing:
        Why is it important to manage workloads in data processing?

      5. Cluster:
        What is a cluster and why is it used in big data?

      6. Stream vs. Batch Processing:
        Give one example for each.

    • Exercise 2:

      Imagine that your university library wants to analyze the records of all books borrowed over the last 10 years to improve its services:

      • Decide whether you would use stream processing or batch processing, and explain why.

      • Decide whether a cluster would be useful for this project, and justify your answer.