Distributed Databases and Distributed Systems

Course topics

Module 1

Why Distributed Databases and Distributed Systems?

Remote Procedure Call

  • Network socket
  • RPC
  • Synchronous vs asynchronous calls
  • Messaging
  • gRPC

RDBMS

  • App architecture
  • Business transactions vs system transactions. Distributed transactions
  • ACID – properties of database transactions
  • Transaction isolation levels
  • Pessimistic vs optimistic locking. Lost update problem

Distributed transactions

  • 2PC protocol
  • 3PC protocol

Module 2. 

NoSQL

  • RDBMS problems. ORM (Object-relational mapping)
  • SQL vs NoSQL
  • NoSQL properties (schemaless, aggregate orientation, transactions, …)
  • Types of NoSQL databases: key-value (KV), document, column-family, graph

Distribution Models

  • Consistency problem
  • Sharding
  • Replication
  • Consistency models: eventual consistency, monotonic reads, read-your-writes, strong consistency. Consistency guarantees
  • MongoDB and Cassandra parameters for tuning consistency guarantees
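The arithmetic behind quorum-based consistency settings can be checked in a few lines: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below assumes a single datacenter and strict quorums; it ignores failures, hinted handoff, and read repair. The labels follow Cassandra consistency levels (ONE, QUORUM, ALL) given as read level/write level; MongoDB's read and write concerns address a related trade-off but are not modeled here.

```python
def quorum(n):
    # Majority of n replicas, e.g. 2 of 3 or 3 of 5.
    return n // 2 + 1


def reads_see_latest_write(n, r, w):
    # Every read set of size r must share at least one replica with every write set of size w.
    return r + w > n


N = 3
settings = [
    ("ONE/ONE", 1, 1),                       # fast, but may read stale data
    ("QUORUM/QUORUM", quorum(N), quorum(N)),
    ("ONE/ALL", 1, N),                       # cheap reads, writes need every replica
]
for name, r, w in settings:
    print(name, r, w, reads_see_latest_write(N, r, w))
```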

CAP theorem

  • CAP theorem. BASE.
  • CAP theorem applied to SQL and NoSQL databases
  • Polyglot Persistence

Module 3

MapReduce

  • Data locality. MapReduce phases
  • Standard algorithms: Word count, Inverted index, Top N
  • Requirements for Map, Reduce, and Combine functions
  • MapReduce alternatives
  • RDBMS vs NoSQL vs MapReduce
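The classic word count job can be simulated in one process to show the map, shuffle, and reduce phases. This is a minimal sketch, not a distributed implementation: the in-memory grouping stands in for the framework's shuffle, and the function names `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_fn(document):
    # Map: emit (word, 1) for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    # Reduce: sum the partial counts for one key. Addition is associative and
    # commutative, so the same function could also run as a combiner on map output.
    return word, sum(counts)


def run_mapreduce(documents):
    # Shuffle: group intermediate pairs by key (a real framework does this across the network).
    groups = defaultdict(list)
    for word, count in chain.from_iterable(map_fn(d) for d in documents):
        groups[word].append(count)
    return dict(reduce_fn(w, c) for w, c in groups.items())


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_mapreduce(docs))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

The combiner requirement is visible here: because summing partial sums gives the same result as summing all ones, mappers can pre-aggregate locally and shrink the data sent through the shuffle.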

Module 4

Distributed systems

  • Consensus problem. Split-brain problem. Byzantine Generals problem.
  • Distributed systems: Communication, Failure Modes, Leader, Consensus, Quorums, Time, Order
  • Lamport clocks and vector clocks
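The update rules for logical clocks fit in a few lines. Below is a minimal sketch of a Lamport clock plus the vector clock receive rule and the happened-before comparison; the class and helper names are illustrative, and message transport is simulated by passing timestamps directly.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or message send: increment, then use the value as the timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Message receive: move past the sender's timestamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time


def vc_receive(local, received, node):
    # Vector clock receive rule: element-wise max, then increment this node's own entry.
    merged = [max(x, y) for x, y in zip(local, received)]
    merged[node] += 1
    return merged


def happened_before(u, v):
    # u -> v iff every entry of u is <= the matching entry of v and the vectors differ.
    return all(x <= y for x, y in zip(u, v)) and u != v


a, b = LamportClock(), LamportClock()
a.tick()             # a = 1 (local event)
t = a.tick()         # a = 2, message sent with timestamp 2
b.tick()             # b = 1
print(b.receive(t))  # max(1, 2) + 1 = 3
```

Lamport timestamps give an order consistent with causality but cannot tell concurrent events apart; vector clocks can, because two events are concurrent exactly when neither `happened_before` the other.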

Consensus protocols

  • Replicated state machine
  • Raft protocol
  • Paxos protocol

Preliminary practical tasks

  • Map/Reduce implementation*
  • 2PC protocol
  • MongoDB basics
  • Neo4J basics
  • Cassandra data model basics
  • MongoDB replication
  • Cassandra replication
  • MongoDB Map/Reduce
  • Raft protocol*

Prerequisites

None