Section: Sharding | Big Data and Cloud Computing | UDBKM

- Definition
  
  Sharding is the process of horizontally partitioning a large dataset into a collection of smaller, more manageable datasets called shards.
  
  The shards are distributed across multiple nodes, where a node is a server or a machine.
  
  Each shard:
  
  - is stored on a separate node, and each node is responsible only for the data stored on it;
  - shares the same schema as the other shards, and together all shards represent the complete dataset.
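
A minimal Python sketch of the definition, assuming hash-based placement on a shard key. The course text does not say how records are assigned to shards; hashing `user_id` with `zlib.crc32` is an assumption chosen for illustration, and the records, field names and three-node layout are hypothetical. Each record lands in exactly one shard, every shard holds records with the same fields, and the union of the shards is the original dataset.

```python
import zlib

# Hypothetical records; all share the same fields, i.e. the same schema.
records = [
    {"user_id": "u1", "name": "Ada", "country": "FR"},
    {"user_id": "u2", "name": "Lin", "country": "CN"},
    {"user_id": "u3", "name": "Sam", "country": "US"},
    {"user_id": "u4", "name": "Eva", "country": "DE"},
    {"user_id": "u5", "name": "Raj", "country": "IN"},
]

NUM_SHARDS = 3  # assumption: three nodes, one shard per node


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index (assumed placement rule: crc32 modulo).

    zlib.crc32 is used instead of the built-in hash(), whose value for
    strings changes between Python processes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(rows, key_field: str, num_shards: int = NUM_SHARDS):
    """Horizontally partition rows: each whole row goes to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards


shards = partition(records, "user_id")
for node, shard in enumerate(shards):
    print(f"node {node}: {[row['user_id'] for row in shard]}")

# Taken together, the shards still represent the complete dataset.
assert sorted(r["user_id"] for s in shards for r in s) == sorted(r["user_id"] for r in records)
```
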
- Extra
  
  Sharding allows the distribution of processing loads across multiple nodes to achieve horizontal scalability.
  
  Horizontal scaling is a method for increasing a system’s capacity by adding similar or higher-capacity resources alongside existing resources.
  
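  Since each node is responsible for only part of the whole dataset, it stores fewer records and handles a smaller share of the requests, so read and write times can improve considerably compared with a single node holding everything.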
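
To illustrate horizontal scaling, the sketch below spreads a hypothetical set of 10,000 keys over 1, 2, 4 and 8 nodes using an assumed placement rule (`zlib.crc32` of the key modulo the node count) and reports how many keys the busiest node holds. It only shows how the data volume per node shrinks as nodes are added; it does not measure read/write latency.

```python
import zlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    # Assumed placement rule: stable crc32 hash modulo the node count.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


keys = [f"user-{i}" for i in range(10_000)]  # hypothetical keys

largest = {}  # node count -> number of keys on the busiest node
for num_nodes in (1, 2, 4, 8):
    per_node = Counter(node_for(k, num_nodes) for k in keys)
    largest[num_nodes] = max(per_node.values())
    print(f"{num_nodes} node(s): busiest node holds {largest[num_nodes]} of {len(keys)} keys")
```

One caveat of simple modulo placement: changing the node count moves most keys to a different node, which is why schemes such as consistent hashing are often used when nodes are added over time.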
  
  How sharding works in practice:
  
  Each shard can independently service reads and writes for the specific subset of data that it is responsible for.
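
The independence of shards can be sketched as a small in-memory store in which each dict stands in for one node. The class name `ShardedStore` and its `put`/`get` methods are hypothetical, not a real library API, and placement again assumes hashing the key with `zlib.crc32`. A write or a read for a given key touches only the shard that owns it.

```python
import zlib


class ShardedStore:
    """Hypothetical in-memory sketch: each dict stands in for one node's shard."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def _owner(self, key: str) -> dict:
        # Assumed placement rule: crc32 of the key modulo the shard count.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key: str, value) -> None:
        # A write is applied only on the shard that owns the key.
        self._owner(key)[key] = value

    def get(self, key: str):
        # A read is answered by the owning shard alone; other shards are not consulted.
        return self._owner(key).get(key)


store = ShardedStore(num_shards=3)
store.put("u1", {"name": "Ada"})
store.put("u2", {"name": "Lin"})
print(store.get("u1"))  # {'name': 'Ada'}
```
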
  
  Depending on the query, data may need to be fetched from more than one shard (for example, from both shards in a two-shard setup).
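
When a query filters on the shard key, a router can send it to a single shard; when it filters on another field, it has to ask every shard and merge the partial results, a pattern often called scatter-gather. The sketch below contrasts the two cases; the data, the function names and the `zlib.crc32` placement on `user_id` are assumptions for illustration.

```python
import zlib

NUM_SHARDS = 3  # assumption: three shards


def shard_for(key: str) -> int:
    # Assumed placement rule: crc32 of the shard key modulo the shard count.
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


# Hypothetical data, sharded by user_id.
shards = [[] for _ in range(NUM_SHARDS)]
for row in [
    {"user_id": "u1", "country": "FR"},
    {"user_id": "u2", "country": "CN"},
    {"user_id": "u3", "country": "FR"},
    {"user_id": "u4", "country": "DE"},
]:
    shards[shard_for(row["user_id"])].append(row)


def find_by_user_id(user_id: str):
    # Query on the shard key: only the owning shard has to be read.
    return [r for r in shards[shard_for(user_id)] if r["user_id"] == user_id]


def find_by_country(country: str):
    # Query on a non-key field: matches could be on any shard, so every
    # shard is asked (scatter) and the partial results are merged (gather).
    results = []
    for shard in shards:
        results.extend(r for r in shard if r["country"] == country)
    return results


print(sorted(r["user_id"] for r in find_by_country("FR")))  # ['u1', 'u3']
```
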
  
  A benefit of sharding is that it provides partial tolerance toward failures.
  
  In case of a node failure, only the data stored on that node is affected; the shards on the other nodes remain available.
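
Partial fault tolerance can be simulated by marking one node as down. This hypothetical sketch has no replication, so keys on the failed node become unreadable while keys on the other nodes are still served. The node count, the key names, the `get` and `is_readable` helpers and the `zlib.crc32` placement are all assumptions for illustration.

```python
import zlib

NUM_NODES = 3  # assumption: three nodes, one shard each, no replicas


def node_for(key: str) -> int:
    # Assumed placement rule: crc32 of the key modulo the node count.
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


nodes = [{} for _ in range(NUM_NODES)]
keys = [f"user-{i}" for i in range(30)]  # hypothetical keys
for k in keys:
    nodes[node_for(k)][k] = {"id": k}

down = {0}  # simulate the failure of node 0


def get(key: str):
    node = node_for(key)
    if node in down:
        raise ConnectionError(f"node {node} is unavailable")
    return nodes[node][key]


def is_readable(key: str) -> bool:
    # True if the key's owning node can still serve it.
    try:
        get(key)
        return True
    except ConnectionError:
        return False


readable = [k for k in keys if is_readable(k)]
print(f"{len(readable)} of {len(keys)} keys readable while node 0 is down")
```
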