Course topics
Module 1. Why Distributed Databases and Distributed Systems? Remote Procedure Call
- Network sockets
- RPC
- Synchronous vs asynchronous calls
- Messaging
- gRPC
- Application architecture
- Business transactions vs system transactions. Distributed transactions
- ACID – properties of database transactions
- Transaction isolation levels
- Pessimistic vs optimistic locking. Lost update problem
- Two-phase commit (2PC) protocol
- Three-phase commit (3PC) protocol
- RDBMS problems. ORM (object-relational mapping)
- SQL vs NoSQL
- NoSQL properties (schemaless, aggregate orientation, transactions, …)
- Types of NoSQL databases: key-value (KV), document, column-family, graph
- Consistency problem
- Sharding
- Replication
- Consistency models: eventual consistency, monotonic reads, read-your-writes, strong consistency. Consistency guarantees
- MongoDB and Cassandra parameters for consistency guarantees
- CAP theorem. BASE.
- CAP theorem with SQL and NoSQL DBs
- Polyglot Persistence
- Data locality. MapReduce phases
- Standard algorithms: Word count, Inverted index, Top N
- Requirements for Map, Reduce, and Combine functions
- MapReduce alternatives
- RDBMS vs NoSQL vs MapReduce
- Consensus problem. Split-brain problem. Byzantine Generals problem.
- Distributed systems: Communication, Failure Modes, Leader, Consensus, Quorums, Time, Order
- Vector clocks and Lamport clocks
- Replicated state machine
- Raft protocol
- Paxos protocol
Preliminary practical tasks
- Map/Reduce implementation*
- 2PC protocol
- MongoDB basics
- Neo4J basics
- Cassandra data model basics
- MongoDB replication
- Cassandra replication
- MongoDB Map/Reduce
- Raft protocol*