GPU Programming with OpenCL and CUDA
Duration: 2 days

Synopsis: Today's Graphics Processing Units (GPUs) do far more than display pixels at ever-increasing resolutions. They are capable of on-chip rendering and texture mapping, and are just as powerful as (and sometimes more powerful than) CPUs.
This course explains two of the leading GPU programming architectures: the first, CUDA, is specific to NVIDIA devices. The second, OpenCL, is a layer of abstraction standardized by the Khronos Group and supported by NVIDIA as well as AMD/ATI Radeon devices. By demonstrating how to build kernels (execution units), this course provides a reusable infrastructure and templates for offloading complicated tasks to the GPU.
Examples in this course are provided in OpenCL and CUDA.

Target Audience: Application developers, working on multi-threaded or other distributed code, who would like to make the transition to GPU programming
Objectives:
  • Understand how to access GPU functionality with Khronos's OpenCL or NVIDIA's CUDA
  • Understand how to construct kernels for GPUs
  • Explain the differences between CPUs and GPUs
Exercises: This course allocates plenty of time for hands-on practice, which is included in the time allotted for each module. The hands-on exercises include:
  • Taking a serial algorithm (Fourier Transform), optimizing it, and parallelizing it (FFT)
  • Optimizing and distributing image rendering algorithms
Exam: An exam is available for this course!
Suggested Reading: The following books are suggested as additional references for this course:
1. GPU Architectures explained
2 hours
An introduction to Graphics Processing Units.
  • Motivation
  • GPU basic concepts
  • Programming Standards:
    • CUDA - NVIDIA's GPU programming language
    • OpenCL - the Khronos standard (supported by Apple, NVIDIA, and AMD)
  • The Programming Model
2. Hello, GPU
2 hours
This module presents a simple GPU program - or kernel - and covers the steps required to create, build, and execute it.
  • The Build Environment - Setting up CUDA or OpenCL
  • A simple kernel
  • OpenCL/CUDA C modifications
  • Executing a kernel on one or more GPU cores
  • Basic debugging techniques

Exercises include:
  • Creating a simple combined CPU/GPU program (driver + kernel) and executing it
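As a taste of what this module covers, here is a minimal driver-plus-kernel sketch in CUDA (the kernel and launch parameters are illustrative, not part of the course materials):

```cuda
#include <cstdio>

// Kernel: runs on the GPU. The __global__ qualifier is one of the
// CUDA C modifications covered above - it marks a function that the
// host (CPU) can launch on the device (GPU).
__global__ void hello(void)
{
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main(void)
{
    // Launch the kernel on 4 GPU threads in a single block.
    hello<<<1, 4>>>();

    // Wait for the GPU to finish before the process exits.
    cudaDeviceSynchronize();
    return 0;
}
```

Built with `nvcc`, the driver launches the kernel and each GPU thread prints its own index; the equivalent OpenCL version passes the kernel source as a string through the OpenCL runtime instead of using the `<<<...>>>` launch syntax.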
3. Runtime considerations
2 hours
Using CUDA, more than one core may be available on the GPU for processing. Likewise, with OpenCL, multiple cores on GPUs and CPUs may be harnessed for computation. A necessary step is to detect the available devices and establish a context at runtime, so as to make the best use of the available resources and divide the workload between them.
  • The OpenCL Platform layer
  • Runtime detection of devices
  • Determining runtime capabilities
  • Binding to specific devices using contexts

Exercises include:
  • Creating a simple CPU program to query available GPUs and print out statistics
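A sketch of the device-query exercise, using the CUDA runtime API (the selection of printed statistics is illustrative; OpenCL exposes the analogous clGetPlatformIDs / clGetDeviceIDs / clGetDeviceInfo calls in its platform layer):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Enumerate the CUDA-capable devices and print a few of their
// runtime capabilities - the first step before dividing work
// between them.
int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("%d CUDA device(s) found\n", count);

    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s\n", i, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  Global memory:      %zu MB\n",
               prop.totalGlobalMem >> 20);
    }
    return 0;
}
```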
4. Memory and advanced aspects
2 hours
Going deeper, this module explains the types of memory available to an OpenCL or CUDA program, and how they vastly affect system performance.
  • Host/Device Communication
  • Memory Types
    • Register memory
    • Shared memory
    • Global memory
    • Texture memory
  • Synchronization primitives

Exercises include:
  • Applying an image filter using different memory modes, and benchmarking
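The memory types above can be illustrated in a single CUDA kernel sketch (the kernel itself is hypothetical, not a course exercise solution): each thread block stages a tile of slow global memory into fast on-chip shared memory before operating on it.

```cuda
// Per-thread local variables such as `i` live in registers;
// __shared__ arrays are visible to all threads in a block;
// the in/out pointers refer to global (device) memory.
__global__ void scale(const float *in, float *out, float factor, int n)
{
    __shared__ float tile[256];                      // shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // register

    if (i < n) {
        tile[threadIdx.x] = in[i];                   // global -> shared
        __syncthreads();                             // synchronization primitive
        out[i] = tile[threadIdx.x] * factor;         // shared -> global
    }
}
```

The staging buys nothing for this trivial element-wise operation; it pays off when neighboring threads reuse each other's data, as in the image-filter exercise, which is exactly what the benchmarking in this module is meant to demonstrate.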