Parallel Programming Paradigms
Duration: 2 days

Synopsis
Parallel programming is a rapidly evolving field, allowing efficient utilization of multi-processor and clustered architectures. This is an advanced engineering course dealing with the efficient design and construction of parallel programs. We consider examples from massively parallel architectures, such as the Search for Extraterrestrial Intelligence (SETI), MapReduce, and even spam-delivering botnets.
Examples in this course are provided in pseudocode, C++ and Java.

The course can serve as an overview, or as a detailed explanation of several leading technologies - GPU programming, OpenMP, MPI and others. These modules are optional, depending on the programming paradigm you or your company chooses to implement. The discussion of these technologies in this syllabus is kept at an introductory level, but can be expanded as desired. The demonstrations of theoretical principles in code can be presented in any (or all) of these frameworks.

Target Audience
Application developers working on multi-threaded or otherwise distributed code
Prerequisites
Objectives
  • Understand the differences between serial and parallel code
  • Effectively analyze and isolate parallelizable sections of code
  • Become familiar with common parallel programming libraries: OpenMP and MPI
  • Understand and employ threading design patterns
  • Efficiently use synchronization mechanisms and messaging to reduce overhead of parallelized code
  • Effectively code multi-threaded and parallel programs
Exercises
This course allocates plenty of time for hands-on practice, included in the time allotted for each module. The hands-on exercises include:
  • Taking a serial algorithm (the Fourier transform), optimizing it, and parallelizing it (FFT)
  • Optimizing and distributing sorting and searching algorithms
  • Optimizing and distributing graph and/or matrix algorithms
Modules
1. Parallel Programming Prerequisites
2 hours
Introduction to the nomenclature and architecture of parallel programming. Programming multiple threads vs. programming multiple processes (a short sketch contrasting the two follows the topic list). Memory architecture models. Topics include:
  • The case for Parallel Programming
  • The case against Parallel Programming
  • Flynn's Taxonomy
  • Parallel Architectures:
    • Multi-Thread
    • Multi-Process
    • Multi-Processor
    • Multi-Node
  • Evaluating algorithms:
    • Simplicity
    • Efficiency
    • Portability
    • Scalability
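
To make the threads vs. processes distinction concrete, here is a minimal C++ sketch (an illustration only, assuming a POSIX system for fork()): a thread shares the parent's address space, so its increment of a global variable is visible afterwards, while a forked child works on its own copy of memory.

    #include <cstdio>
    #include <thread>
    #include <sys/wait.h>
    #include <unistd.h>

    int counter = 0;  // global, shared by all threads in this process

    int main() {
        // A thread shares the parent's memory: its increment is visible.
        std::thread t([] { counter++; });
        t.join();
        std::printf("after thread: counter = %d\n", counter);  // prints 1

        // A forked process gets its own (copy-on-write) memory: the
        // child's increment does not affect the parent's counter.
        pid_t pid = fork();
        if (pid == 0) {    // child
            counter++;     // modifies the child's copy only
            _exit(0);
        }
        waitpid(pid, nullptr, 0);
        std::printf("after fork:   counter = %d\n", counter);  // still 1
        return 0;
    }

Compiled with g++ -std=c++11 -pthread, the output shows the thread's update is shared while the child process's is not.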
2a. Parallelism with OpenMP
1 hour
Presenting the OpenMP framework for multi-threading and parallel programming. The framework, implemented as compiler #pragma directives, is widely supported - both in GCC and Visual Studio (a short sketch follows the exercise list).

Exercises include:
  • OpenMP 3.0 multithreading sample
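
As a taste of the framework (not the course sample itself), a minimal sketch of an OpenMP parallel loop: the pragma splits the iterations across threads, and the reduction clause safely combines per-thread partial sums.

    #include <cstdio>
    #include <omp.h>

    int main() {
        const int N = 1000000;
        double sum = 0.0;

        // Distribute iterations across threads; reduction(+:sum) gives
        // each thread a private partial sum and adds them at the end,
        // avoiding a data race on the shared variable.
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += 1.0 / (i + 1);

        std::printf("threads: %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }

Built with g++ -fopenmp (or /openmp in Visual Studio); without the flag the pragma is ignored and the code runs serially - a deliberate design point of OpenMP.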
2b. MPI
2 hours
Presenting the Message Passing Interface (MPI), in its latest specification, for multi-core and multi-node programming (a short messaging sketch follows the exercise list)

Exercises include:
  • Sample client/server using MPI
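
For orientation, a minimal sketch of MPI point-to-point messaging, as a toy client/server exchange between two ranks (the roles are illustrative; run with, e.g., mpirun -np 2):

    #include <cstdio>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            // "Client": send a request value to rank 0.
            int request = 42;
            MPI_Send(&request, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            // "Server": block until the request arrives, then handle it.
            int request;
            MPI_Recv(&request, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            std::printf("server (rank 0) received %d\n", request);
        }

        MPI_Finalize();
        return 0;
    }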
2c. Introduction to GPU programming
3 hours
GPU programming takes advantage of the graphics processor found in many servers and on dedicated daughter boards. It enables offloading tasks from the CPU to the otherwise mostly idle GPU, harnessing the GPU's vast computational power, especially for 2D/3D rendering and floating-point operations (a minimal kernel sketch follows the exercise list).
  • GPU architectures
  • NVIDIA's CUDA framework
  • OpenCL
  • Programming basic kernels
  • Data transfer from CPU to GPU


Exercises include:
  • Sample image processing using OpenCL or CUDA
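
A minimal sketch of a GPU kernel and the CPU-to-GPU data transfer around it, written here in CUDA C++ (vector addition; assumes a CUDA-capable device and the nvcc compiler):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Kernel: each GPU thread computes one element of c = a + b.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *ha = new float[n], *hb = new float[n], *hc = new float[n];
        for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

        // Allocate device memory and copy the inputs CPU -> GPU.
        float *da, *db, *dc;
        cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

        // Copy the result back GPU -> CPU (implicitly synchronizes).
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        std::printf("c[0] = %f\n", hc[0]);  // expect 3.0

        cudaFree(da); cudaFree(db); cudaFree(dc);
        delete[] ha; delete[] hb; delete[] hc;
        return 0;
    }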
3. Parallel Design
2 hours
Design concepts involved in parallel programming. What speedup can we expect from a parallelized algorithm? What is the associated overhead? How do we take a serial algorithm and break it down into parallelizable components?
  • Algorithmic order of growth and its parallel ramifications
  • Amdahl's Law (a worked example follows this list)
  • Decomposing tasks
  • Decomposing data access
  • Grouping and ordering tasks
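
To make the speedup question concrete: Amdahl's Law bounds the speedup of a program in which a fraction P of the running time can be parallelized across N processors:

    Speedup(N) = 1 / ((1 - P) + P/N)

For example, if 90% of an algorithm parallelizes (P = 0.9), then 8 processors give 1 / (0.1 + 0.9/8) ≈ 4.7x, and no number of processors can push the speedup past 1 / (1 - P) = 10x. The serial fraction, not the processor count, quickly becomes the limit.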
4. Algorithmic Patterns
2 hours
Mapping tasks to actual threads by using algorithmic patterns and methodologies:
  • Organizing by Tasks:
    • Task Parallelism - Optimal for CPUs
    • Divide & Conquer (see the sketch after this list)
  • Organizing by Data:
    • Data Parallelism - Optimal for GPUs
    • Geometric Decomposition and MapReduce
    • Recursive Data Access
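
As one illustration of the task-based patterns, a divide-and-conquer array sum expressed with OpenMP tasks (a minimal sketch; the threshold and names are illustrative):

    #include <cstdio>
    #include <omp.h>

    // Divide & Conquer: split the range, sum the halves in parallel.
    long sum(const int* a, int lo, int hi) {
        if (hi - lo < 10000) {             // base case: sum serially
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = lo + (hi - lo) / 2;
        long left, right;
        #pragma omp task shared(left)      // spawn a task for one half
        left = sum(a, lo, mid);
        right = sum(a, mid, hi);           // this task takes the other
        #pragma omp taskwait               // join before combining
        return left + right;
    }

    int main() {
        const int n = 1 << 20;
        int* a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 1;

        long total = 0;
        #pragma omp parallel
        #pragma omp single                 // one thread starts the recursion
        total = sum(a, 0, n);

        std::printf("total = %ld\n", total);  // expect 1048576
        delete[] a;
        return 0;
    }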
5. Supporting Structures
2 hours

  • Program Structures:
    • Master/Worker (see the sketch after this list)
    • SPMD
    • Loop Parallelism
    • Fork/Join
  • Data Structures:
    • Shared Data
    • Shared Queue
    • Distributed Array
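
A minimal sketch of the Master/Worker pattern built on a shared queue, using C++11 threads (the mutex and condition variable make the queue safe to share; names are illustrative):

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    std::queue<int> tasks;        // the shared work queue
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void worker(int id) {
        while (true) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [] { return !tasks.empty() || done; });
            if (tasks.empty()) return;        // master signaled shutdown
            int job = tasks.front(); tasks.pop();
            lock.unlock();                    // hold the lock only briefly
            std::printf("worker %d processing job %d\n", id, job);
        }
    }

    int main() {
        std::vector<std::thread> pool;
        for (int i = 0; i < 4; i++) pool.emplace_back(worker, i);

        // Master: enqueue jobs, then signal that no more are coming.
        for (int job = 0; job < 20; job++) {
            { std::lock_guard<std::mutex> lock(m); tasks.push(job); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lock(m); done = true; }
        cv.notify_all();

        for (auto& t : pool) t.join();
        return 0;
    }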
6. Implementation Details
3 hours
Putting all the pieces together, we cover the last remaining aspects and challenges of deploying a parallel program.
  • Thread Management
  • Thread/Worker pools
  • Synchronization (a sketch comparing a mutex with an atomic follows this list)
  • IPC
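
One recurring implementation choice is the synchronization mechanism itself. Here is a minimal sketch contrasting a mutex-protected counter with a lock-free std::atomic one - both yield the correct total, but the atomic increment avoids the lock/unlock overhead for such a small critical section:

    #include <atomic>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    long locked_count = 0;
    std::mutex m;
    std::atomic<long> atomic_count{0};

    int main() {
        const int kThreads = 4, kIters = 1000000;
        std::vector<std::thread> threads;

        for (int t = 0; t < kThreads; t++)
            threads.emplace_back([&] {
                for (int i = 0; i < kIters; i++) {
                    // Heavyweight: a full lock/unlock per increment.
                    { std::lock_guard<std::mutex> lock(m); locked_count++; }
                    // Lightweight: one atomic hardware instruction.
                    atomic_count.fetch_add(1, std::memory_order_relaxed);
                }
            });

        for (auto& t : threads) t.join();
        std::printf("locked: %ld, atomic: %ld\n",
                    locked_count, atomic_count.load());  // both 4000000
        return 0;
    }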
7. Parallel Frameworks and Hadoop
2 hours
This module introduces the Apache Hadoop project, which provides a parallel programming framework usable from Java, Python, or just about any language for launching parallel processing jobs on massive datasets.
  • What is Hadoop?
  • HDFS, the Hadoop Distributed File System
  • Controlling jobs
  • Sample Hadoop programs (see the sketch after this list)
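
As a taste of the "just about any language" claim, a word-count mapper usable with Hadoop Streaming (a sketch under the Streaming model: Hadoop pipes input splits to the program's stdin and collects tab-separated key/value pairs from stdout; a matching reducer would sum the 1s per word):

    #include <iostream>
    #include <string>

    // Hadoop Streaming mapper: read words from stdin, emit
    // "word<TAB>1" per word. Hadoop shuffles and sorts these pairs
    // by key before handing them to the reducer.
    int main() {
        std::string word;
        while (std::cin >> word)
            std::cout << word << '\t' << 1 << '\n';
        return 0;
    }

Such a binary is launched through the hadoop-streaming JAR (roughly: hadoop jar hadoop-streaming.jar -input ... -output ... -mapper ... -reducer ...); the same pattern works for Python scripts or any other executable.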