The course for engineers building the infrastructure that ML runs on. Covers distributed training communication (NCCL, MPI), custom CUDA kernels for attention and matmul, model parallelism strategies, and building a parameter server in C++. Used by engineers on ML infrastructure teams.
Duration: 20 hours
Students: 480+
Rating: ⭐ 5.0
“Practical, rigorous, and immediately applicable. Velmio courses are genuinely different.”
Mark T.
Head of Data, NHS Digital
Order Summary
C++ for ML Infrastructure