Course Code: CEID_NE5407
Type:
Period: Winter Semester
Division:
Credit Points: 5
1. Software development issues in HPC environments
   - Cost of software development in HPC environments
   - Scalability and portability of code
   - Data complexity and parallel algorithms
2. Batch Systems
   - Usage restrictions of resources in HPC environments
   - SLURM
3. Nvidia GPU architecture (see the device-query sketch after this section)
   - Streaming Processor (SP)
   - Streaming Multiprocessor (SM)
   - SM features per GPU generation
   - Other architectural features
   - The concept of Compute Capability
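As an illustration of the SM counts and the Compute Capability concept listed above, the following minimal sketch (not taken from the course material) queries each visible GPU through the CUDA runtime API; the printed fields are standard members of cudaDeviceProp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the architectural features of each visible GPU: compute capability,
// number of Streaming Multiprocessors (SMs) and a few per-SM limits.
int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability        : %d.%d\n", prop.major, prop.minor);
        printf("  Streaming Multiprocessors : %d\n", prop.multiProcessorCount);
        printf("  Max threads per SM        : %d\n", prop.maxThreadsPerMultiProcessor);
        printf("  Shared memory per SM      : %zu bytes\n", prop.sharedMemPerMultiprocessor);
        printf("  Warp size                 : %d\n", prop.warpSize);
    }
    return 0;
}
```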
4. The CUDA Programming Model (a minimal host/device example follows this section)
   - What is CUDA?
   - The concept of a host and a device
   - Grids and blocks of threads
   - Limitations on grid and block sizes
   - Computational kernels
   - Built-in CUDA variables
   - CUDA runtime flow
   - Workload distribution
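To tie the host/device, grid/block and built-in-variable concepts together, here is a minimal vector-addition sketch (the kernel name, problem size and block size are illustrative assumptions) that follows the usual runtime flow: allocate, copy in, launch, copy out.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Device code: one thread per element, indexed with the built-in variables.
__global__ void addVectors(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                          // guard for the last, partially filled block
        c[i] = a[i] + b[i];
}

// Host code: the usual CUDA runtime flow (allocate, copy in, launch, copy out).
int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *hA = (float *)malloc(bytes), *hB = (float *)malloc(bytes), *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Workload distribution: a 1-D grid of 1-D blocks, 256 threads per block.
    const int threadsPerBlock = 256;
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    addVectors<<<blocksPerGrid, threadsPerBlock>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```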
5. Memory access optimization (a tiled-kernel sketch follows this section)
   - Exploiting the CUDA memory hierarchy
   - The CUDA shared-memory programming strategy
   - Splitting data into smaller tiles
   - Data reuse
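The tiling and data-reuse strategy above is commonly illustrated with a tiled matrix multiplication. The kernel below is a sketch under simplifying assumptions (square, row-major matrices whose side is a multiple of the tile width, launched with TILE x TILE thread blocks); the host-side setup follows the same pattern as the previous example and is omitted.

```cuda
#define TILE 16

// Tiled matrix multiplication: each block cooperatively loads a TILE x TILE
// tile of A and of B into shared memory, so every element fetched from global
// memory is reused TILE times before the next tile is loaded.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Cooperative load of one tile of A and one tile of B.
        tileA[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        tileB[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();                  // barrier: tile fully loaded

        for (int k = 0; k < TILE; ++k)
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();                  // barrier: tile fully consumed
    }
    C[row * n + col] = acc;
}
```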
6. Performance optimization issues (a coalescing sketch follows this section)
   - Barriers
   - DRAM bursting and its exploitation in CUDA applications
   - The CUDA shared-memory programming strategy
   - Exploiting the CUDA memory hierarchy
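A minimal sketch of how DRAM bursting is exploited through coalesced accesses: in the first kernel, consecutive threads of a warp read consecutive addresses of a row-major n x m matrix, so each memory burst is fully used; in the second, consecutive threads read addresses a whole row apart and most of each burst is wasted. The kernel names and the matrix layout are illustrative assumptions.

```cuda
// One thread per column: adjacent threads read adjacent elements (coalesced).
__global__ void sumColumnsCoalesced(const float *M, float *out, int n, int m) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= m) return;
    float s = 0.0f;
    for (int row = 0; row < n; ++row)
        s += M[row * m + col];            // consecutive threads: consecutive addresses
    out[col] = s;
}

// One thread per row: adjacent threads read elements m floats apart (strided).
__global__ void sumRowsStrided(const float *M, float *out, int n, int m) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;
    float s = 0.0f;
    for (int col = 0; col < m; ++col)
        s += M[row * m + col];            // consecutive threads: addresses m apart
    out[row] = s;
}
```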
7. Program flow control (a divergence sketch follows this section)
   - Warp divergence
   - Avoiding warp divergence
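A small sketch of warp divergence and one way to avoid it: when threads of the same warp take different branches, the two paths are serialized. The first kernel diverges on even versus odd thread index; the second computes the same result without a branch by turning the condition into arithmetic. Kernel names and the per-element operations are illustrative assumptions.

```cuda
// Divergent version: even and odd lanes of each warp take different paths.
__global__ void divergent(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)
        x[i] = x[i] * 2.0f;
    else
        x[i] = x[i] + 1.0f;
}

// Branch-free version: the condition becomes a 0/1 factor, so all lanes of a
// warp execute the same instructions and no serialization occurs.
__global__ void lessDivergent(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float even = (float)((i & 1) == 0);   // 1.0 for even lanes, 0.0 for odd
    x[i] = even * (x[i] * 2.0f) + (1.0f - even) * (x[i] + 1.0f);
}
```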
8. Atomic instructions (a CAS-based sketch follows this section)
   - Atomic instructions in CUDA
   - The Compare-And-Swap (CAS) atomic instruction
   - Implementing other atomic operations using CAS
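Building another atomic operation on top of CAS is usually shown with atomic addition on double-precision values: the loop below retries the compare-and-swap until no other thread has changed the word in between. This mirrors the well-known pattern from the CUDA C Programming Guide; recent compute capabilities provide a native atomicAdd for double, so the sketch is purely illustrative, and the reduceSum kernel is an assumed usage example.

```cuda
// Atomic addition on double implemented with the 64-bit atomicCAS.
__device__ double atomicAddDouble(double *address, double val) {
    unsigned long long *addr_as_ull = (unsigned long long *)address;
    unsigned long long old = *addr_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(addr_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);             // retry if another thread got in first
    return __longlong_as_double(old);
}

// Example use: every thread atomically accumulates into a single global sum.
__global__ void reduceSum(const double *x, double *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAddDouble(sum, x[i]);
}
```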
9. CUDA Streams (a streams-and-events sketch follows this section)
   - Synchronous and asynchronous execution
   - Assigning computations to a stream
   - Scheduling within a stream
   - Asynchronous data transfer to and from the GPU
   - Events
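A sketch combining the stream topics above: each of two streams receives an asynchronous copy-in, a kernel launch and a copy-out, so the two halves of the work can overlap, while a pair of events measures the elapsed time. The buffer size and the scale kernel are illustrative assumptions; pinned host memory is used because cudaMemcpyAsync needs it to be truly asynchronous.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20, half = n / 2;
    float *h, *d;
    cudaMallocHost((void **)&h, n * sizeof(float));  // pinned host memory
    cudaMalloc((void **)&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);

    // Each stream gets copy-in -> kernel -> copy-out for its half of the data.
    // Within a stream these run in issue order; the two streams may overlap.
    for (int k = 0; k < 2; ++k) {
        int off = k * half;
        cudaMemcpyAsync(d + off, h + off, half * sizeof(float),
                        cudaMemcpyHostToDevice, s[k]);
        scale<<<(half + 255) / 256, 256, 0, s[k]>>>(d + off, half);
        cudaMemcpyAsync(h + off, d + off, half * sizeof(float),
                        cudaMemcpyDeviceToHost, s[k]);
    }

    // 'stop' is recorded in the legacy default stream, which waits for the two
    // blocking streams, so synchronising on it waits for all the queued work.
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("elapsed: %.3f ms, h[0] = %.1f\n", ms, h[0]);

    cudaStreamDestroy(s[0]); cudaStreamDestroy(s[1]);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```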
10. Architecture of the Xeon Phi coprocessor
    - Native and offload programming modes
11. Application of the OpenMP programming model on the Xeon Phi coprocessor