
Latent Space: The AI Engineer Podcast

swyx + Alessio

Available episodes

5 of 131
  • ⚡️Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
    In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT-5 ships this summer). Will Brown’s talk at AIE NYC and open-source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment (a minimal, hedged sketch of turn-level credit assignment appears after the episode list), and he previewed his AIEWF talk on Agentic RL for those with the temerity to power through bad meetup audio.
    Chapters: 00:00 Introduction and Episode Overview 02:01 Discussion on Claude 4 and its Features 04:31 Reasoning and Tool Use in AI Models 07:01 Extended Thinking in Claude and Model Differences 09:31 Speculation on Claude's Extended Thinking 11:01 Challenges and Controversies in AI Model Training 13:31 Technical Highlights and Code Trustworthiness 16:01 Token Costs and Incentives in AI Models 18:31 Thinking Budgets and AI Effort 21:01 Safety and Ethics in AI Model Development 23:31 Anthropic's Approach to AI Safety 26:01 LLM Arena and Evaluation Challenges 28:31 Developing Taste and Direction in AI Research 31:01 Recent Research and Multi-Turn RL 33:31 Tools and Incentives in AI Model Development 36:01 Challenges in Evaluating AI Model Outputs 38:31 Model-Based Rewards and Future Directions 41:01 Wrap-up and Future Plans
    Duration: 39:57
  • ChatGPT Codex: The Missing Manual
    ChatGPT Codex is here: the first cloud-hosted Autonomous Software Engineer (A-SWE) from OpenAI. We sat down for a quick pod with two core devs on the ChatGPT Codex team, Josh Ma and Alexander Embiricos, to get the inside scoop on the origin story of Codex, from WHAM to its future roadmap. Follow them: https://github.com/joshma and https://x.com/embirico
    Chapters: 00:00 Introduction to the Latent Space Podcast - 00:59 The Launch of ChatGPT Codex - 03:08 Personal Journeys into AI Development - 05:50 The Evolution of Codex and AI Agents - 08:55 Understanding the Form Factor of Codex - 11:48 Building a Software Engineering Agent - 14:53 Best Practices for Using AI Agents - 17:55 The Importance of Code Structure for AI - 21:10 Navigating Human and AI Collaboration - 23:58 Future of AI in Software Development - 28:18 Planning and Decision-Making in AI Development - 31:37 User, Developer, and Model Dynamics - 35:28 Building for the Future: Long-Term Vision - 39:31 Best Practices for Using AI Tools - 42:32 Understanding the Compute Platform - 48:01 Iterative Deployment and Future Improvements
    Duration: 53:31
  • Claude Code: Anthropic's CLI Agent
    More info: https://docs.anthropic.com/en/docs/claude-code/overview
    The AI coding wars have now split across four battlegrounds:
    1. AI IDEs: two leading startups in Windsurf ($3B acq. by OpenAI) and Cursor ($9B valuation) and a sea of competition behind them (like Cline, GitHub Copilot, etc.).
    2. Vibe coding platforms: Bolt.new, Lovable, v0, etc., all experiencing fast growth and getting to tens of millions in revenue within months.
    3. Teammate agents: Devin, Cosine, etc. Simply give them a task, and they will get back to you with a full PR (with mixed results).
    4. CLI-based agents: after Aider’s initial success, we are now seeing many other alternatives, including two from the main labs: OpenAI Codex and Claude Code. The main draw is that 1) they are composable and 2) they are pay-as-you-go based on tokens used.
    Since we covered the first three categories, today’s guests are Boris and Cat, the lead engineer and PM for Claude Code. If you only take one thing away from this episode, it’s this piece from Boris: Claude Code is not a product as much as it’s a Unix utility. This fits very well with Anthropic’s product principle: “do the simple thing first.” Whether it’s the memory implementation (a markdown file that gets auto-loaded) or the approach to prompt summarization (just ask Claude to summarize), they always pick the smallest building blocks that are useful, understandable, and extensible. Even major features like planning (“/think”) and memory (#tags in markdown) fit the same idea of having text I/O as the core interface, very much in the spirit of the original UNIX design philosophy (a hedged sketch of the memory-file-plus-pipeline pattern appears after the episode list). Claude Code is also the most direct way to consume Sonnet for coding, rather than going through all the hidden prompting and optimization that the other products do. You will feel that right away: the average spend per user is $6/day on Claude Code, compared to $20/mo for Cursor, for example. Apparently, some engineers inside Anthropic have spent >$1,000 in one day! If you’re building AI developer tools, there’s also a lot of alpha here on how to design a CLI tool, interactive vs non-interactive modes, and how to balance feature creation. Enjoy!
    Timestamps: [00:00:00] Intro [00:01:59] Origins of Claude Code [00:04:32] Anthropic’s Product Philosophy [00:07:38] What should go into Claude Code? [00:09:26] Claude.md and Memory Simplification [00:10:07] Claude Code vs Aider [00:11:23] Parallel Workflows and Unix Utility Philosophy [00:12:51] Cost considerations and pricing model [00:14:51] Key Features Shipped Since Launch [00:16:28] Claude Code writes 80% of Claude Code [00:18:01] Custom Slash Commands and MCP Integration [00:21:08] Terminal UX and Technical Stack [00:27:11] Code Review and Semantic Linting [00:28:33] Non-Interactive Mode and Automation [00:36:09] Engineering Productivity Metrics [00:37:47] Balancing Feature Creation and Maintenance [00:41:59] Memory and the Future of Context [00:50:10] Sandboxing, Branching, and Agent Planning [01:01:43] Future roadmap [01:11:00] Why Anthropic Excels at Developer Tools
    Duration: 1:17:21
  • ⚡️The Rise and Fall of the Vector DB Category
    Note from your hosts: we were off this week for ICLR and RSA! This week we’re bringing you one of the top episodes from our lightning podcast series, the shorter-format, YouTube-only side podcast we do for breaking news and faster turnaround. Please support our work on YouTube! https://www.youtube.com/playlist?list=PLWEAb1SXhjlc5qgVK4NgehdCzMYCwZtiB
    The explosion of embedding-based applications created a new challenge: efficiently storing, indexing, and searching these high-dimensional vectors at scale. This gap gave rise to the vector database category, with companies like Pinecone leading the charge in 2022-2023 by defining specialized infrastructure for vector operations. The category saw explosive growth following ChatGPT's launch in late 2022, as developers rushed to build AI applications using Retrieval-Augmented Generation (RAG). This surge was partly driven by a widespread misconception that embedding-based similarity search was the only viable method for retrieving context for LLMs! The resulting "vector database gold rush" saw massive investment and attention directed toward vector search infrastructure, even though traditional information retrieval techniques remained equally valuable for many RAG applications (a small sketch contrasting dense and lexical retrieval appears after the episode list). https://x.com/jobergum/status/1872923872007217309
    Chapters: 00:00 Introduction to Trondheim and Background 03:03 The Rise and Fall of Vector Databases 06:08 Convergence of Search Technologies 09:04 Embeddings and Their Importance 12:03 Building Effective Search Systems 15:00 RAG Applications and Recommendations 17:55 The Role of Knowledge Graphs 20:49 Future of Embedding Models and Innovations
    Duration: 27:16
  • Why Every Agent needs Open Source Cloud Sandboxes
    Vasek Mlejnsky from E2B joins us today to talk about sandboxes for AI agents. In the last 2 years, E2B has grown from a handful of developers building on it to being used by ~50% of the Fortune 500 and generating millions of sandboxes each week for their customers. As the “death of chat completions” approaches, LLM workflows and agents are relying more and more on tool usage and multi-modality. The most common use cases for their sandboxes (a simplified sketch of isolating model-generated code appears after the episode list):
    - Running data analysis and charting (like Perplexity)
    - Executing arbitrary code generated by the model (like Manus does)
    - Running evals on code generation (see LMArena Web)
    - Doing reinforcement learning for code capabilities (like Hugging Face)
    Timestamps: 00:00:00 Introductions 00:00:37 Origin of DevBook -> E2B 00:02:35 Early Experiments with GPT-3.5 and Building AI Agents 00:05:19 Building an Agent Cloud 00:07:27 Challenges of Building with Early LLMs 00:10:35 E2B Use Cases 00:13:52 E2B Growth vs Models Capabilities 00:15:03 The LLM Operating System (LLMOS) Landscape 00:20:12 Breakdown of JavaScript vs Python Usage on E2B 00:21:50 AI VMs vs Traditional Cloud 00:26:28 Technical Specifications of E2B Sandboxes 00:29:43 Usage-based billing infrastructure 00:34:08 Pricing AI on Value Delivered vs Token Usage 00:36:24 Forking, Checkpoints, and Parallel Execution in Sandboxes 00:39:18 Future Plans for Toolkit and Higher-Level Agent Frameworks 00:42:35 Limitations of Chat-Based Interfaces and the Future of Agents 00:44:00 MCPs and Remote Agent Capabilities 00:49:22 LLMs.txt, scrapers, and bad AI bots 00:53:00 Manus and Computer Use on E2B 00:55:03 E2B for RL with Hugging Face 00:56:58 E2B for Agent Evaluation on LMArena 00:58:12 Long-Term Vision: E2B as Full Lifecycle Infrastructure for LLMs 01:00:45 Future Plans for Hosting and Deployment of LLM-Generated Apps 01:01:15 Why E2B Moved to San Francisco 01:05:49 Open Roles and Hiring Plans at E2B
    Duration: 1:06:38
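
For readers who want a concrete picture of the turn-level credit assignment idea from the Will Brown episode, here is a minimal, hedged sketch. It is not the paper's implementation: the per-turn rewards, the discounting, and the mean baseline below are illustrative assumptions. What it shows is the contrast with trajectory-level training, where every token in a multi-turn rollout would receive the same scalar reward.

```python
# Illustrative sketch of turn-level credit assignment for a multi-turn agent
# rollout. NOT the paper's implementation: the discounting, the mean baseline,
# and the reward shapes are assumptions made for this example only.
from dataclasses import dataclass


@dataclass
class Turn:
    tokens: list[int]   # token ids the policy emitted in this turn
    reward: float       # per-turn feedback (e.g. from a verifier or tool result)


def turn_level_advantages(rollout: list[Turn], gamma: float = 1.0) -> list[float]:
    """One advantage per turn: discounted reward-to-go minus a mean baseline.

    Trajectory-level credit assignment would instead give every turn the same
    scalar (the final episode reward), which is the setup this line of work
    argues is too coarse for long, multi-turn agent episodes.
    """
    returns: list[float] = []
    running = 0.0
    for turn in reversed(rollout):
        running = turn.reward + gamma * running
        returns.append(running)
    returns.reverse()

    baseline = sum(returns) / len(returns)  # a simple mean baseline over turns
    return [r - baseline for r in returns]


def token_advantages(rollout: list[Turn]) -> list[float]:
    """Broadcast each turn's advantage onto the tokens that turn produced."""
    advantages = turn_level_advantages(rollout)
    return [adv for turn, adv in zip(rollout, advantages) for _ in turn.tokens]


if __name__ == "__main__":
    rollout = [Turn(tokens=[1, 2, 3], reward=0.5),
               Turn(tokens=[4, 5], reward=1.0)]
    # turn 1 tokens share one advantage, turn 2 tokens share another
    print(token_advantages(rollout))
```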
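In the same spirit, here is a small sketch of the two Claude Code ideas the episode keeps coming back to: memory as a plain markdown file that gets auto-loaded, and a composable, non-interactive mode that behaves like any other Unix utility. The `claude -p` invocation matches how the docs describe non-interactive ("print") use, but treat the exact flag and the CLAUDE.md handling here as assumptions for illustration, not a specification.

```python
# A hedged sketch of the "do the simple thing first" pattern described in the
# Claude Code episode: memory is just a CLAUDE.md file prepended to the prompt,
# and the agent is composed like a Unix utility (text in, text out).
# The `claude -p` (print / non-interactive) invocation is an assumption here;
# check the Claude Code docs for the real interface.
import subprocess
from pathlib import Path


def build_prompt(task: str, memory_file: str = "CLAUDE.md") -> str:
    """Prepend project memory, if present, to the task. Markdown in, text out."""
    memory = Path(memory_file)
    if memory.exists():
        return f"{memory.read_text()}\n\n{task}"
    return task


def run_noninteractive(task: str) -> str:
    """Invoke the CLI once, non-interactively, so it composes in pipelines."""
    result = subprocess.run(
        ["claude", "-p", build_prompt(task)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The output can be piped into any other tool, exactly like a Unix utility.
    print(run_noninteractive("List the TODO comments in this repository."))
```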
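The vector DB episode's core claim, that embedding similarity is one retrieval signal among several rather than the only way to feed a RAG pipeline, is easy to see in code. The sketch below uses random vectors as stand-ins for real embeddings and a crude term-overlap score in place of a proper BM25 implementation; both choices are assumptions made purely for illustration.

```python
# Illustrative only: dense (embedding) retrieval and lexical retrieval are two
# signals that can be blended for RAG. The "embeddings" are random stand-ins
# and the term-overlap score is a toy substitute for a real BM25 ranker.
import numpy as np


def cosine_scores(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Dense retrieval: cosine similarity of the query against each document."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return d @ q


def term_overlap_scores(query: str, docs: list[str]) -> np.ndarray:
    """Lexical retrieval stand-in: fraction of query terms present in each doc."""
    terms = set(query.lower().split())
    return np.array(
        [len(terms & set(doc.lower().split())) / max(len(terms), 1) for doc in docs]
    )


def hybrid_rank(query: str, query_vec: np.ndarray,
                docs: list[str], doc_vecs: np.ndarray,
                alpha: float = 0.5) -> np.ndarray:
    """Blend both signals; neither alone is 'the' way to retrieve context."""
    scores = (alpha * cosine_scores(query_vec, doc_vecs)
              + (1 - alpha) * term_overlap_scores(query, docs))
    return np.argsort(-scores)  # best-ranked document indices first


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = ["vector databases index embeddings",
            "BM25 ranks documents by term statistics"]
    doc_vecs = rng.normal(size=(2, 8))   # pretend document embeddings
    query_vec = rng.normal(size=8)       # pretend query embedding
    print(hybrid_rank("how does BM25 rank documents", query_vec, docs, doc_vecs))
```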
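Finally, the E2B episode is ultimately about one rule: never execute model-generated code inside your own process. The stand-in below uses a local subprocess with a timeout purely to illustrate the shape of that contract (code in, captured output and exit status out). It is not the E2B SDK and provides nothing like the isolation of a real hosted sandbox.

```python
# A simplified stand-in for a sandboxed code executor. This is NOT the E2B SDK:
# a local subprocess with a timeout only illustrates the contract (untrusted
# code goes in, captured output comes back); it offers none of the isolation
# that a real hosted sandbox provides.
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run model-generated Python in a separate process and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode (no user site-packages)
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr, "exit": proc.returncode}
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"timed out after {timeout_s}s", "exit": -1}


if __name__ == "__main__":
    generated = "import math\nprint(sum(math.sqrt(i) for i in range(10)))"
    print(run_untrusted(generated))
```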


About Latent Space: The AI Engineer Podcast

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers, and interviews in Software 3.0. We cover the Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to your first introduction to the tech you'll be using in the next 3 months! We break news and run exclusive interviews with OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space
