Principles of CUDA Parallel Processing for C/C++ Programmers
Overview
This workshop introduces C/C++ programmers to the elements needed to access the features and functionality of the NVIDIA CUDA runtime to enable parallel processing of large volumes of data.
Who Should Attend
This lecture and lab-based workshop is intended for software developers who have been tasked with building new libraries, or enhancing existing libraries, to offload computationally intensive operations from the main system CPU to the GPU when one or more CUDA-compatible GPUs are present.
Workshop Highlights
The role of compute support in software development.
CUDA unit testing strategies.
Memory management of applications, CPUs, and GPUs.
GPU threads, streams, and events.
Parallel execution strategies.
Memory coalescing and performance.
Multi-GPU data management.
GPU occupancy.
Optimizing performance using Tensor Cores.
Performance Objectives
At workshop completion, attendees will be able to...
Describe how to identify a compute candidate within existing software.
Set up a development workstation and create a Hello CUDA program (a minimal sketch appears after this list).
Describe three strategies for maximizing computational, memory, and instruction throughput.
Write code to access the CUDA device interface (see the device-enumeration sketch after this list).
Write code to set up unit testing for CUDA-based software.
Describe the organization of application, CPU, and GPU memory.
Describe the difference between CPU threads and GPU threads.
Write code using streams and events to optimize compute throughput (see the streams-and-events sketch after this list).
Write code to identify warp divergence (see the divergent-kernel sketch after this list).
Set up a development workstation with profiling tools and profile CUDA execution.
Write code to detect multi-GPU systems and collect KPIs for multi-GPU usage (the device-enumeration sketch after this list also applies here).
Describe the rationale and strategies for memory coalescing.
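Illustrative Sketches
The sketches below preview the kind of code written in the labs. File names, kernel names, and launch parameters are illustrative assumptions, not workshop materials.

A minimal Hello CUDA program: a trivial kernel launched across a small grid, using device-side printf to show which threads ran.

// hello_cuda.cu -- illustrative sketch; build with: nvcc hello_cuda.cu -o hello_cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void helloKernel() {
    // Each GPU thread reports its block and thread index.
    printf("Hello CUDA from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    helloKernel<<<2, 4>>>();   // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();   // wait for the kernel (and its printf output) to finish
    return 0;
}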
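Accessing the CUDA device interface and detecting multi-GPU systems: a sketch that enumerates devices with the runtime API and reports basic per-GPU properties, a plausible starting point for the multi-GPU KPI exercise.

// query_devices.cu -- illustrative sketch: enumerate CUDA-capable GPUs
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess || deviceCount == 0) {
        printf("No CUDA-compatible GPU found; fall back to the CPU path.\n");
        return 0;
    }
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Report name, SM count, and global memory (approximate GiB) per device.
        printf("GPU %d: %s, %d SMs, %.1f GiB\n",
               i, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / 1.073741824e9);
    }
    return 0;
}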
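Streams and events: a sketch that issues an asynchronous copy-compute-copy sequence on one stream and times it with events. Pinned host memory is used so the copies can actually run asynchronously; the kernel and sizes are assumptions for illustration.

// streams_events.cu -- illustrative sketch: async work on a stream, timed with events
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float* host;  cudaMallocHost(&host, n * sizeof(float));  // pinned memory enables async copies
    float* dev;   cudaMalloc(&dev, n * sizeof(float));
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    cudaStream_t stream;   cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    cudaMemcpyAsync(dev, host, n * sizeof(float), cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n);   // kernel queued on the same stream
    cudaMemcpyAsync(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);   // block until everything before 'stop' has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Round trip took %.3f ms; host[0] = %.1f\n", ms, host[0]);

    cudaEventDestroy(start);  cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(dev);  cudaFreeHost(host);
    return 0;
}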
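Warp divergence: a kernel that deliberately branches on thread index so even and odd lanes of each warp take different paths, which the hardware serializes. In practice, divergence like this is identified with the profiling tools covered in the workshop; the kernel here only demonstrates the pattern.

// divergence.cu -- illustrative sketch: a branch that splits every warp in two
#include <cuda_runtime.h>

__global__ void divergentScale(float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i % 2 == 0)
        data[i] *= 2.0f;   // even lanes run this path...
    else
        data[i] += 1.0f;   // ...while odd lanes wait, then run this one
}

int main() {
    const int n = 64;
    float* d;  cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    divergentScale<<<1, n>>>(d);   // one block of two warps, each with a divergent branch
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}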