Architecture Design

Course overview

Engineers are often experts in one tool or another and know how to use it effectively to solve particular tasks. It is much rarer for a data engineer to know how all those tools fit into the big picture. In this course, you’ll learn exactly that.

We’ll teach you how to look at systems both from a high-level architecture perspective and in detail. We’ll show how to break down and analyze system requirements and design effective solutions. We’ll cover typical components of distributed systems, how they work and connect to each other, and, most importantly, how to use them as building blocks to compose a whole system. We’ll also review common architecture patterns and when to use them.

Course topics

 Introductory part

  • What Software Architecture is and why it is important
  • Contexts of Software Architecture

Topic 1: Quality attributes

  • Quality attributes
  • Data solution quality attributes: scalability, performance (latency, throughput), security & compliance, resilience, portability, supportability, cost
  • Other Quality attributes
  • Architectural Tactics and Patterns
  • Quality attribute modeling and analysis

Topic 2: Typical data processing architectures

  • Data Warehouse and Data Lake
  • Batch ETL pipeline
  • Streaming data processing
  • Lambda architecture and Kappa architecture
  • Data privacy in modern architectures

Topic 3: Architecture in the life cycle

  • Architecture in Agile projects
  • Architecture and Requirements
  • Tradeoff Analysis method
  • Designing & Documenting an Architecture

Topic 4: Typical data architecture tradeoffs

  • RDBMS vs NoSQL
  • ETL tool vs ETL code
  • Hadoop & Spark vs MPP database
  • Open-source vs proprietary product, specific cloud vs multi-cloud
  • PaaS/SaaS & serverless vs hosted solutions

Prerequisites

  • Distributed Databases
  • Big Data: Hadoop & Spark
  • Distributed Systems
  • Some basic knowledge of streaming and message queues will be helpful