Coding Agents Conference, March 3rd at the Computer History Museum. Join us while tickets last.
https://luma.com/codingagents
Chris Fregly is currently focused on building and scaling high-performance AI systems, writing and teaching about AI infrastructure, helping organizations adopt generative AI and performance engineering principles on AWS, and fostering large developer communities around these topics.
Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs // MLOps Podcast #363 with Chris Fregly, Founder, AI Performance Engineer, and Investor
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
MLOps GPU Guide: https://go.mlops.community/gpuguide
// Abstract
In today’s era of massive generative models, it's important to understand the full scope of AI systems' performance engineering. This talk discusses the new O'Reilly book, AI Systems Performance Engineering, and the accompanying GitHub repo (https://github.com/cfregly/ai-performance-engineering).
This talk provides engineers, researchers, and developers with a set of actionable optimization strategies. You'll learn techniques to co-design and co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems for both training and inference.
// Bio
Chris Fregly is an AI performance engineer and startup founder with experience at AWS, Databricks, and Netflix. He is the author of three O'Reilly books: Data Science on AWS (2021), Generative AI on AWS (2023), and AI Systems Performance Engineering (2025). He also runs the global AI Performance Engineering meetup and speaks at many AI conferences, including NVIDIA GTC, ODSC, and Big Data London.
// Related Links
AI Systems Performance Engineering: Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch 1st Edition by Chris Fregly: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
Coding Agents Conference: https://luma.com/codingagents
~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community [https://go.mlops.community/slack]
Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)
Sign up for the next meetup: [https://go.mlops.community/register]
MLOps Swag/Merch: [https://shop.mlops.community/]
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Chris on LinkedIn: /cfregly
Timestamps:
[00:00] SageMaker HyperPod Resilience
[00:27] Book Creation and Software Engineering
[04:57] Software Engineers and Maintenance
[11:49] AI Systems Performance Engineering
[22:03] Cognitive Biases and Optimization / "Mechanical Sympathy"
[29:36] GPU Rack-Scale Architecture
[33:58] Data Center Reliability Issues
[43:52] AI Compute Platforms
[49:05] Hardware vs Ecosystem Choice
[1:00:05] Claude vs Codex vs Gemini
[1:14:53] Kernel Budget Allocation
[1:18:49] Steerable Reasoning Challenges
[1:24:18] Data Chain Value Awareness