Skip to content

ECE408

ECE408 (Applied Parallel Programming) is a 4-credit-hour senior level course that satisfies the Advanced Computing elective requirement for Computer Engineering majors.

ECE408 is cross-listed as CS483.

Content Covered:

  • CUDA C Programming
  • NVIDIA Nsight Compute / Nsys profile tools
  • GPU vs CPU Architecture
  • State of the industry
  • Memory Bandwith Optimizations
  • Reductions and Convolution
  • Matrix Multiplication Techniques
  • Sparse Matrix Formats
  • Neural Networks
  • Atomics and Tiling

The primary framework used in this class is CUDA, a C-like language maintained by NVIDIA that supports programming on GPUs and GPU clusters.

This course has also included a few guest lectures in past semesters, from people who use parallel programming in their work or research. Guest lectures include those from NVIDIA, AMD, Intel and others.

Prerequisites

ECE 220 is listed as a prerequisite, as most of this course is taught in C and requires a good understanding of pointers and memory management. For students not confident in their C skills, it may be wise to wait until taking ECE 391 or CS 225 in order to improve their programming skills first.

When To Take It

The course is typically offered during both the Fall and Spring semesters. Most students take this technical elective during their Junior or Senior year.

The class does not require a very high workload, so don't worry much about taking this with other hard ECE classes. Taking this class before 411 may be a good introduction to the differences between GPUs and CPUs.

In general, ECE 408 is a valuable class to take as someone wanting to work with low-level software or computer architecture. Parallel Programming and the differences between GPUs and CPUs are important topics to understand for any Computer Engineering student. For those interested in Nvidia internships, knowing CUDA will definitely give you an edge, but the concepts taught in the class extend to all companies doing anything with parallel programming.

Course Structure

Lecture is traditionally twice a week for 75 minutes each.

The assignments consist of 6-7 MPs that are all written in CUDA and executed on a GPU cluster (WebGPU). Each MP has a set of questions for each problem, worth 10% of the MP. The quiz questions are usually on canvas and either multiple choice or short answer.

The exams have multiple choice, short answer, and code problems where you fill in the missing parameters. Do not underestimate the exams. The content of the exams covers algorithms used in MPs as well as unused options brought up in lecture. Another significant topic is the effect of certain parameters and algorithms on the performance of the program.

This course also includes a final project where one designs and optimizes the forward propagation stage of a neural network. The final project is primarily graded on the checkpoint reports and optimizations used. There is a final competition at the end, with opportunities for extra credit for those with the fastest kernels.

Grade breakdown (for Spring 2023):

  • Exam 1: 20%
  • Exam 2: 20%
  • Lab Assignments: 35% (5% each)
  • Correctness (number of data sets passed) 90%
  • Quiz Questions 10% (Lab with the lowest score is dropped.)

  • Final Project: 25%

Instructors

Applied Parallel Programming has been taught by Professor Wen-Mei Hwu in the past, who designed the class material. It is currently being taught by Professor Sanjay Patel and Professor Volodymyr Kindratenko in Spring 2023. There are usually a few TAs who grade the MPs and Exams as well as holding office hours.

Course Tips

For 4 credit hours, this class is not terribly work intensive during the semester. The lecture material prepares students very well for the MPs, with important portions of the project are typically explained and analyzed thoroughly in lecture before the due date. Having a strong understanding of the problem and the parallel solution helps incredibly in test and MP performance.

Life After:

If you really enjoyed this course, check out ECE 493 (CS 420) - Parallel Progrmg: Sci & Engrg or ECE ECE 508 - Manycore Parallel Algorithms.

If you are interested in working on the CUDA language itself or contributing to open parallelism platforms such as OpenMP, consider applying for internships at companies such as NVIDIA, AMD, and Intel.

If you are open to applying CUDA or parallel computing in industry, there are a wide variety of opportunities available.

In addition, expertise in parallel programming is highly valued in the academic community, particularly in research areas that deal with large amounts of data such as computational genomics and biomedical imaging. There are many opportunities for undergraduate and graduate students to get involved with research on campus if they are competent in parallel software design.

Infamous Topics

  • The Midterms: Do not underestimate the difficulty of the midterms.
  • The Textbook: As of Spring 2023, there were various errors in the textbook provided, be careful trusting the textbook too much.

Resources

  • [CUDA C++ Programming Guide] (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)