Brief summary of the course
Best practices for performance engineering with C/C++ and Python. Topics include measuring performance, profiling in C/C++ and Python, extending Python with C/C++, and application development with GPGPU (NVIDIA CUDA).
Course topics
Single-threaded programming (12 hours):
# | Topic | Hours |
1. | Performance engineering basics. When we need it. | 2 |
2. | Measuring performance, profiling in C/C++ and Python. | 2 |
3. | Libraries in C/C++. Hacker's Delight (bit hacks). From C to assembly. | 2 |
4. | Extending Python with C/C++ | 2 |
5. | Data exchange between Python (pandas, NumPy) and C/C++ libraries. | 2 |
6. | Cache-efficient algorithms. Memory issues. | 2 |
Multicore programming (16 hours):
# | Topic | Hours |
1. | Parallel applications in Python. The Python global interpreter lock (GIL). | 2 |
2. | Multithreading and multiprocessing in Python and C/C++ | 4 |
3. | Synchronization and locks | 2 |
4. | NVIDIA CUDA programming for C and Python | 6 |
5. | OpenCL | 2 |
Distributed programming (4 hours):
# | Topic | Hours |
1. | Scaling applications with multiple workers | 2 |
2. | Distributed networking algorithms | 2 |
Homework assignments:
- Homework #1 – Performance engineering for a single-threaded application.
- Homework #2 – Development of a multithreaded application.
- Homework #3 – Development of a GPGPU-based application.
- Homework #4 – Development of a network-scaled application.
Prerequisites
- Basic understanding of computer hardware.
- Basic understanding of networking.
- Basic programming skills in C/C++ and Python.
- Basic OS knowledge.
- Linux/Unix knowledge and an understanding of Bash scripting.