Machine learning applications are dominated by computation in user space. However, like any other application, they remain subject to operating system resources and services such as memory management and process scheduling. Python is the preferred programming language for AI research because of its flexibility and ease of use: it hides most of these low-level details behind a flexible interface suitable for most applications. Such abstraction, however, comes at a performance cost. Our work focuses on studying the performance of the popular PyTorch framework at the kernel level. We find that a number of optimizations can boost training performance by more than 2x. In this seminar, we will summarize the current status of this work and walk through the steps taken to optimize the performance of both the Linux kernel and IHK+McKernel on AI workloads. Additionally, we introduce the tools used for this purpose: LTTng and Perf.