1. |
Parallel Programming Prerequisites |
|
2 hours |
|
Introduction to the nomenclature and architecture of parallel programming. Programming multiple threads vs. programming multiple processes. Memory architecture models. Topics include:
- The case for Parallel Programming
- The case against Parallel Programming
- Flynn's Taxonomy
- Parallel Architectures:
  - Multi-Thread
  - Multi-Process
  - Multi-Processor
  - Multi-Node
- Evaluating algorithms:
  - Simplicity
  - Efficiency
  - Portability
  - Scalability
|
2a. |
Parallelism with OpenMP |
|
1 hour
|
Presenting the OpenMP framework for multithreading and parallel programming. The framework, implemented as compiler #pragmas, is widely supported by both GCC and Visual Studio. (A minimal sketch follows the exercise list.)
Exercises include:
- OpenMP 3.0 multithreading sample
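
A minimal sketch of the pragma style covered here (our own illustration, not the course exercise; build with e.g. gcc -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double sum = 0.0;
        /* Split the iterations across threads; merge the partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;
        printf("harmonic sum = %f (max threads: %d)\n",
               sum, omp_get_max_threads());
        return 0;
    }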
|
2b. |
Parallelism with MPI |
|
Presenting the Message Passing Interface (MPI), in its latest specification, for multi-core and multi-node programming. (A send/receive sketch follows the exercise list.)
Exercises include:
- Sample client/server using MPI
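
A hedged taste of the message-passing style (an illustrative sketch, not the course's client/server exercise; build with mpicc, run with mpirun):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, token = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
            /* "Server": rank 0 sends a token to every other rank. */
            for (int i = 1; i < size; i++)
                MPI_Send(&token, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        } else {
            /* "Client": every other rank receives the token. */
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank %d of %d received %d\n", rank, size, token);
        }
        MPI_Finalize();
        return 0;
    }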
|
2c. |
Introduction to GPU programming |
|
3 hours |
|
GPU programming takes advantage of the graphics processor found in many servers and on dedicated daughter boards. This enables offloading tasks from the CPU to the GPU, which is otherwise mostly idle, and harnessing its vast computational power, especially for 2D/3D rendering and floating-point operations. (A kernel-launch sketch follows the exercise list.)
- GPU architectures
- NVIDIA's CUDA framework
- OpenCL
- Programming basic kernels
- Data transfer from CPU to GPU
Exercises include:
- Sample image processing using OpenCL or CUDA
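
A hedged sketch of the kernel-plus-transfer pattern in CUDA C (names and sizes are our own; the course exercise may use OpenCL instead):

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* Kernel: each GPU thread brightens one pixel. */
    __global__ void brighten(unsigned char *pix, int n, int delta) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            int v = pix[i] + delta;
            pix[i] = v > 255 ? 255 : v;
        }
    }

    int main(void) {
        const int n = 1 << 20;
        unsigned char *h = (unsigned char *)malloc(n), *d;
        for (int i = 0; i < n; i++) h[i] = i & 0xff;
        cudaMalloc(&d, n);                            /* allocate on GPU */
        cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);  /* CPU -> GPU      */
        brighten<<<(n + 255) / 256, 256>>>(d, n, 40); /* launch kernel   */
        cudaMemcpy(h, d, n, cudaMemcpyDeviceToHost);  /* GPU -> CPU      */
        printf("first pixel after brighten: %d\n", h[0]);
        cudaFree(d); free(h);
        return 0;
    }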
|
3. |
Parallel Design |
|
2 hours |
|
Design concepts involved in parallel programming. What speedup can we expect from a parallelized algorithm? What is the associated overhead? How do we take a serial algorithm and break it down into parallelizable components? (A worked Amdahl's Law example follows the topic list.)
- Algorithmic order of growth and parallel ramifications
- Amdahl's Law
- Decomposing tasks
- Decomposing data access
- Grouping and ordering tasks
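
To make the speedup question concrete, Amdahl's Law: if a fraction P of a program's running time can be parallelized, then on N processors

    Speedup(N) = 1 / ((1 - P) + P / N)

With P = 0.9 and N = 8, the ceiling is 1 / (0.1 + 0.9/8), or roughly 4.7x, well short of the naive 8x.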
|
4. |
Algorithmic Patterns |
|
2 hours |
|
Mapping tasks to actual threads using algorithmic patterns and methodologies. (A divide & conquer sketch follows the topic list.)
- Organizing by Tasks
  - Task Parallelism - Optimal for CPUs
  - Divide & Conquer
- Organizing by Data
  - Data Parallelism - Optimal for GPUs
  - Geometric Decomposition and MapReduce
  - Recursive Data Access
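
One way these patterns map to code, as a hedged OpenMP-tasks sketch of divide & conquer (our own example, not a course exercise):

    #include <stdio.h>
    #include <omp.h>

    /* Divide & conquer: each half of the range becomes a task. */
    long sum_range(const long *a, int lo, int hi) {
        if (hi - lo < 1024) {            /* small ranges run serially */
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = (lo + hi) / 2;
        long left, right;
        #pragma omp task shared(left)    /* left half: child task */
        left = sum_range(a, lo, mid);
        right = sum_range(a, mid, hi);   /* right half: this thread */
        #pragma omp taskwait             /* join the child task */
        return left + right;
    }

    int main(void) {
        enum { N = 1 << 20 };
        static long a[N];
        for (int i = 0; i < N; i++) a[i] = 1;
        long total = 0;
        #pragma omp parallel
        #pragma omp single               /* one thread seeds the task tree */
        total = sum_range(a, 0, N);
        printf("total = %ld\n", total);
        return 0;
    }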
|
5. |
Supporting Structures |
|
2 hours |
|
- Program Structures
  - Master/Worker (sketched over a shared queue after this list)
  - SPMD
  - Loop Parallelism
  - Fork/Join
- Data Structures
  - Shared Data
  - Shared Queue
  - Distributed Array
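
A hedged pthreads sketch of the Master/Worker structure over a shared queue (all names are our own):

    #include <stdio.h>
    #include <pthread.h>

    enum { NITEMS = 16, NWORKERS = 4, DONE = -1 };

    static int queue[NITEMS + NWORKERS];   /* master enqueues, workers drain */
    static int head = 0, tail = 0;
    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    static void enqueue(int v) {
        pthread_mutex_lock(&mtx);
        queue[tail++] = v;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&mtx);
    }

    static int dequeue(void) {
        pthread_mutex_lock(&mtx);
        while (head == tail)               /* sleep until work arrives */
            pthread_cond_wait(&nonempty, &mtx);
        int v = queue[head++];
        pthread_mutex_unlock(&mtx);
        return v;
    }

    static void *worker(void *arg) {
        long id = (long)arg;
        for (int item; (item = dequeue()) != DONE; )
            printf("worker %ld processed item %d\n", id, item);
        return NULL;
    }

    int main(void) {
        pthread_t t[NWORKERS];
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NITEMS; i++) enqueue(i);       /* master produces */
        for (int i = 0; i < NWORKERS; i++) enqueue(DONE);  /* poison pills    */
        for (int i = 0; i < NWORKERS; i++) pthread_join(t[i], NULL);
        return 0;
    }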
|
6. |
Implementation Details |
|
3 hours |
|
Putting all the pieces together, we cover the last remaining aspects and challenges of deploying a parallel program. (An IPC sketch follows the topic list.)
- Thread Management
  - Thread/Worker pools
- Synchronization
- IPC
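
For the IPC topic, a minimal POSIX pipe sketch (our own illustration; details vary by platform):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];
        char buf[64] = {0};
        if (pipe(fd) != 0) return 1;
        if (fork() == 0) {              /* child process: the "worker" */
            close(fd[0]);
            const char *msg = "result from worker";
            write(fd[1], msg, strlen(msg) + 1);
            close(fd[1]);
            return 0;
        }
        close(fd[1]);                   /* parent process: the "master" */
        read(fd[0], buf, sizeof buf - 1);
        close(fd[0]);
        wait(NULL);
        printf("parent received: %s\n", buf);
        return 0;
    }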
|
7. |
Parallel Frameworks and Hadoop |
|
2 hours |
|
This module introduces Apache Hadoop, a parallel programming framework that can be driven from Java, Python, or just about any language to launch parallel processing tasks on massive datasets. (A streaming-mapper sketch follows the topic list.)
- What is Hadoop?
- HDFS, the Hadoop Distributed File System
- Controlling Jobs
- Sample Hadoop programs
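
Because Hadoop Streaming runs any stdin/stdout executable as a map or reduce task, even a C program can serve as a mapper. A hedged word-count mapper sketch (our own illustration, not a course sample):

    #include <stdio.h>
    #include <ctype.h>

    /* Reads text on stdin, emits one "word<TAB>1" line per token.
       Hadoop sorts these pairs and hands them to a reducer that
       sums the 1s for each word. */
    int main(void) {
        int c, inword = 0;
        while ((c = getchar()) != EOF) {
            if (isalnum(c)) {
                putchar(tolower(c));
                inword = 1;
            } else if (inword) {
                printf("\t1\n");
                inword = 0;
            }
        }
        if (inword) printf("\t1\n");
        return 0;
    }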
|