Course Outline

Introduction

  • What is OpenACC?
  • OpenACC vs OpenCL vs CUDA vs SYCL
  • Overview of OpenACC features and architecture
  • Setting up the development environment

Getting Started

  • Creating an OpenACC project in Visual Studio Code
  • Exploring project structure and files
  • Compiling and running the program
  • Displaying output with printf and fprintf

OpenACC Directives and Clauses

  • Understanding OpenACC directives and clauses
  • Using parallel directives for creating parallel regions
  • Using kernels directives for compiler-managed parallelism
  • Using loop directives for parallelizing loops
  • Managing data movement with data directives
  • Synchronizing data with update directives
  • Improving data reuse with cache directives
  • Creating device functions with routine directives
  • Synchronizing asynchronous operations with wait directives

OpenACC API

  • Understanding the role of OpenACC API
  • Querying device information and capabilities
  • Setting device number and type
  • Handling errors and exceptions
  • Creating and synchronizing asynchronous queues

OpenACC Libraries and Interoperability

  • Understanding OpenACC libraries and interoperability
  • Using math, random, and complex libraries
  • Integrating with other models (CUDA, OpenMP, MPI)
  • Integrating with GPU libraries (cuBLAS, cuFFT)

OpenACC Tools

  • Understanding OpenACC tools in development
  • Profiling and debugging OpenACC programs
  • Performance analysis with the NVIDIA HPC SDK compilers (formerly PGI), NVIDIA Nsight Systems, and Linaro Forge (formerly Allinea Forge)

OpenACC Optimization

  • Factors affecting OpenACC program performance
  • Optimizing data locality and reducing transfers
  • Optimizing loop parallelism and fusion
  • Optimizing kernel parallelism and fusion
  • Optimizing vectorization and auto-tuning

Summary and Next Steps

Requirements

  • An understanding of C/C++ or Fortran and of parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn how to use OpenACC to program heterogeneous devices and exploit their parallelism
  • Developers who wish to write portable and scalable code that can run on different platforms and devices
  • Programmers who wish to explore the high-level aspects of heterogeneous programming and improve their coding productivity

Duration: 28 hours
