Course Outline


Understanding the Fundamentals of Heterogeneous Computing Methodology

Why Parallel Computing? Understanding the Need for Parallelism

Multi-Core Processors - Architecture and Design

Introduction to Threads and Basic Concepts of Parallel Programming

Understanding the Fundamentals of GPU Software Optimization Processes

OpenMP - A Standard for Directive-Based Parallel Programming

Hands-On / Demonstration of Various Programs on Multicore Machines

Introduction to GPU Computing

GPUs for Parallel Computing

GPU Programming Model

Hands-On / Demonstration of Various Programs on GPU

SDK, Toolkit and Installation of Environment for GPU

Working with Various Libraries

Demonstration of GPU and Tools with Sample Programs and OpenACC

Understanding the CUDA Programming Model

Learning the CUDA Architecture

Exploring and Setting Up the CUDA Development Environment

Working with the CUDA Runtime API

Understanding the CUDA Memory Model

Exploring Additional CUDA API Features

Global Memory Optimization: Accessing Global Memory Efficiently in CUDA

Optimizing Data Transfers in CUDA Using CUDA Streams

Using Shared Memory in CUDA

Understanding and Using Atomic Operations and Instructions in CUDA

Case Study: Basic Digital Image Processing with CUDA

Working with Multi-GPU Programming

Advanced Hardware Profiling and Sampling on NVIDIA / CUDA

Using CUDA Dynamic Parallelism API for Dynamic Kernel Launch

Summary and Conclusion


Requirements

  • C Programming
  • Linux GCC

Duration

21 Hours
