How Can We Solve Observability's Data Capture and Spending Problem?
DevOps practitioners — whether developers, operators, SREs or business stakeholders — increasingly rely on telemetry to guide decisions, yet face growing complexity, siloed teams and rising observability costs. In a conversation at KubeCon + CloudNativeCon North America, IBM’s Jacob Yackenovich emphasized the importance of collecting high-granularity, full-capture data to avoid missing critical performance signals across hybrid application stacks that blend legacy and cloud-native components. He argued that observability must evolve to serve both technical and nontechnical users, enabling teams to focus on issues based on real business impact rather than subjective judgment.

AI’s rapid integration into applications introduces new observability challenges. Yackenovich described two patterns: add-on AI services, such as chatbots, whose failures don’t disrupt core workflows, and blocking-style AI components embedded in essential processes like fraud detection, where errors directly affect application function.

Rising cloud and ingestion costs further complicate telemetry strategies. Yackenovich cautioned against limiting visibility for budget reasons, advocating instead for predictable, fixed-price observability models that let organizations innovate without financial uncertainty.

Learn more from The New Stack about the latest in observability:
Introduction to Observability
Observability 2.0? Or Just Logs All Over Again?
Building an Observability Culture: Getting Everyone Onboard

Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
--------
22:21
How Kubernetes Became the New Linux
Major banks once built their own Linux kernels because no distributions existed, but today commercial distros — and Kubernetes — are universal. At KubeCon + CloudNativeCon North America, AWS’s Jesse Butler noted that Kubernetes has reached the same maturity Linux once did: organizations no longer build bespoke control planes but rely on shared standards. That shift influences how AWS contributes to open source, emphasizing community-wide solutions rather than AWS-specific products.

Butler highlighted two AWS EKS projects donated to Kubernetes SIGs: KRO and Karpenter. KRO addresses the proliferation of custom controllers that emerged once CRDs made everything representable as Kubernetes resources. By generating CRDs and microcontrollers from simple YAML schemas, KRO transforms “glue code” into an automated service within Kubernetes itself (a schema sketch follows these notes). Karpenter tackles the limits of traditional autoscaling by delivering just-in-time, cost-optimized node provisioning with a flexible, intuitive API. Both projects embody AWS’s evolving philosophy: building features that serve the entire Kubernetes ecosystem as it matures into a true enterprise standard.

Learn more from The New Stack about the latest in Kube Resource Orchestrator and Karpenter:
Migrating From Cluster Autoscaler to Karpenter v0.32
How Amazon EKS Auto Mode Simplifies Kubernetes Cluster Management (Part 1)
Kubernetes Gets a New Resource Orchestrator in the Form of Kro

Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
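Not discussed in the episode, but to make the KRO idea concrete: below is a minimal sketch of a ResourceGraphDefinition, assuming the kro.run/v1alpha1 API. The WebApp kind, field names and nginx image are illustrative placeholders, and the exact syntax may differ across KRO releases.

apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: web-app                  # hypothetical name, for illustration only
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApp                 # the new end-user kind KRO generates a CRD for
    spec:
      name: string               # simple typed fields exposed to application teams
      image: string | default="nginx"
  resources:
    - id: deployment             # KRO's generated microcontroller reconciles this template
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: ${schema.spec.name}
          template:
            metadata:
              labels:
                app: ${schema.spec.name}
            spec:
              containers:
                - name: web
                  image: ${schema.spec.image}

Applying a definition like this would, in principle, register a WebApp CRD and a small controller, so application teams create WebApp objects with a couple of fields while KRO expands them into the underlying Deployment; the glue code Butler describes becomes declarative configuration.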
--------
20:28
Keeping GPUs Ticking Like Clockwork
Clockwork began with a narrow goal — keeping clocks synchronized across servers — but soon realized that its precise latency measurements could reveal deeper data center networking issues. This insight led the company to build a hardware-agnostic monitoring and remediation platform capable of automatically routing around faults. Today, Clockwork’s technology is especially valuable for large GPU clusters used in training LLMs, where communication efficiency and reliability are critical.

CEO Suresh Vasudevan explains that AI workloads are among the most demanding distributed applications ever, and Clockwork provides building blocks that improve visibility, performance and fault tolerance. Its flagship feature, FleetIQ, can reroute traffic around failing switches, preventing costly interruptions that might otherwise force teams to restart training from hours-old checkpoints. Although the company originated from Stanford research focused on clock synchronization for financial institutions, the team eventually recognized that packet-timing data could underpin powerful network telemetry and dynamic traffic control. By integrating with NVIDIA NCCL, TCP and RDMA libraries, Clockwork can not only measure congestion but also actively manage GPU communication to enhance both uptime and training efficiency.

Learn more from The New Stack about the latest in Clockwork:
Clockwork’s FleetIQ Aims To Fix AI’s Costly Network Bottleneck
What Happens When 116 Makers Reimagine the Clock?

Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
--------
27:08
Jupyter Deploy: The New Middle Ground Between Laptops and Enterprise
At JupyterCon 2025, Jupyter Deploy was introduced as an open source command-line tool designed to make cloud-based Jupyter deployments quick and accessible for small teams, educators, and researchers who lack cloud engineering expertise. As described by AWS engineer Jonathan Guinegagne, these users often struggle in an “in-between” space — needing more computing power and collaboration features than a laptop offers, but without the resources for complex cloud setups. Jupyter Deploy simplifies this by orchestrating an entire encrypted stack — using Docker, Terraform, OAuth2, and Let’s Encrypt — with minimal setup, removing the need to manually manage 15–20 cloud components. While it offers an easy on-ramp, Guinegagne notes that long-term use still requires some cloud understanding.

Built by AWS’s AI Open Source team but deliberately vendor-neutral, it uses a template-based approach, enabling community-contributed deployment recipes for any cloud. Led by Brian Granger, the project aims to join the official Jupyter ecosystem, with future plans including Kubernetes integration for enterprise scalability.

Learn more from The New Stack about the latest in Jupyter AI development:
Introduction to Jupyter Notebooks for Developers
Display AI-Generated Images in a Jupyter Notebook

Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
--------
22:10
From Physics to the Future: Brian Granger on Project Jupyter in the Age of AI
In an interview at JupyterCon, Brian Granger — co-creator of Project Jupyter and senior principal technologist at AWS — reflected on Jupyter’s evolution and how AI is redefining open source sustainability. Originally inspired by physics’ modular principles, Granger and co-founder Fernando Pérez designed Jupyter with flexible, extensible components like the notebook format and kernel message protocol. This architecture has endured as the ecosystem expanded from data science into AI and machine learning.

Now, AI is accelerating development itself: Granger described rewriting Jupyter Server in Go, complete with tests, in just 30 minutes using an AI coding agent — a task once considered impossible. This shift challenges traditional notions of technical debt and could reshape how large open source projects evolve. Jupyter’s 2017 ACM Software System Award placed it among computing’s greats, but also underscored its global responsibility. Granger emphasized that sustaining Jupyter’s mission — empowering human reasoning, collaboration, and innovation — remains the team’s top priority in the AI era.

Learn more from The New Stack about the latest in Jupyter AI development:
Introduction to Jupyter Notebooks for Developers
Display AI-Generated Images in a Jupyter Notebook

Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
The New Stack Podcast is all about the developers, software engineers and operations people who build at-scale architectures that change the way we develop and deploy software.
For more content from The New Stack, subscribe on YouTube at: https://www.youtube.com/c/TheNewStack