Machine learning applications are dominated by computation in user space. However, like any other application, they remain subject to operating system resources and services such as memory management and process scheduling. Python is the preferred programming language for AI research because of its flexibility and ease of use: it hides most of these low-level details behind a flexible interface suitable for most applications. Such abstraction, however, comes at a performance cost. Our work focuses on studying the performance of the popular PyTorch framework at the kernel level. We find that a number of optimizations can boost training performance by more than 2x. In this seminar, we will summarize the current status of this work and walk through the steps taken to optimize the performance of both the Linux kernel and IHK+McKernel on AI workloads. Additionally, we introduce the tools used for this purpose: LTTng and Perf.