Vasek Mlejnsky from E2B joins us today to talk about sandboxes for AI agents. In the last 2 years, E2B has grown from a handful of developers building on it to being used by ~50% of the Fortune 500, generating millions of sandboxes each week for their customers. As the “death of chat completions” approaches, LLM workflows and agents are relying more and more on tool use and multimodality.
The most common use cases for their sandboxes:
- Running data analysis and charting (like Perplexity)
- Executing arbitrary code generated by the model (like Manus does)
- Running evals on code generation (see LMArena Web)
- Doing reinforcement learning for code capabilities (like Hugging Face)
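The “execute arbitrary code generated by the model” use case above reduces to a run-code-and-capture-output loop. Here is a minimal local sketch of that pattern using a plain subprocess — this is not the E2B SDK, and a real E2B microVM adds the filesystem and network isolation that a bare subprocess lacks:

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted_code(code: str, timeout: float = 5.0) -> str:
    """Run model-generated Python in a separate process and capture stdout.

    Illustrative only: a subprocess shares the host's filesystem and network,
    which is exactly why products like E2B run this step inside a sandboxed VM.
    """
    # Write the generated code to a temp file so the child process can run it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,  # kill runaway generated code
        )
        return result.stdout
    finally:
        os.unlink(path)

print(run_untrusted_code("print(sum(range(10)))"))  # → 45
```

The same shape (send code in, get stdout/stderr and artifacts back, enforce a timeout) is what a hosted sandbox exposes over an API, with the execution moved into an isolated VM.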
Timestamps:
00:00:00 Introductions
00:00:37 Origin of DevBook -> E2B
00:02:35 Early Experiments with GPT-3.5 and Building AI Agents
00:05:19 Building an Agent Cloud
00:07:27 Challenges of Building with Early LLMs
00:10:35 E2B Use Cases
00:13:52 E2B Growth vs Models Capabilities
00:15:03 The LLM Operating System (LLMOS) Landscape
00:20:12 Breakdown of JavaScript vs Python Usage on E2B
00:21:50 AI VMs vs Traditional Cloud
00:26:28 Technical Specifications of E2B Sandboxes
00:29:43 Usage-based billing infrastructure
00:34:08 Pricing AI on Value Delivered vs Token Usage
00:36:24 Forking, Checkpoints, and Parallel Execution in Sandboxes
00:39:18 Future Plans for Toolkit and Higher-Level Agent Frameworks
00:42:35 Limitations of Chat-Based Interfaces and the Future of Agents
00:44:00 MCPs and Remote Agent Capabilities
00:49:22 LLMs.txt, scrapers, and bad AI bots
00:53:00 Manus and Computer Use on E2B
00:55:03 E2B for RL with Hugging Face
00:56:58 E2B for Agent Evaluation on LMArena
00:58:12 Long-Term Vision: E2B as Full Lifecycle Infrastructure for LLMs
01:00:45 Future Plans for Hosting and Deployment of LLM-Generated Apps
01:01:15 Why E2B Moved to San Francisco
01:05:49 Open Roles and Hiring Plans at E2B
--------
1:06:38
⚡️GPT 4.1: The New OpenAI Workhorse
We’ll keep this brief because we’re on a tight turnaround: GPT 4.1, previously known as the Quasar and Optimus models, is now live as the natural update for 4o/4o-mini (and the research preview of GPT 4.5). Though it is a general-purpose model family, the headline features are:
- Coding abilities (o1-level SWEBench and SWELancer, but only okay on Aider)
- Instruction following (with a very notable prompting guide)
- Long context up to 1M tokens (with new MRCR and Graphwalks benchmarks)
- Vision (simply o1-level)
- Cheaper pricing (cheaper than 4o, greatly improved prompt caching savings)
We caught up with returning guest Michelle Pokrass and Josh McGrath to get more detail on each!
Chapters
00:00:00 Introduction and Guest Welcome
00:00:57 GPT 4.1 Launch Overview
00:01:54 Developer Feedback and Model Names
00:02:53 Model Naming and Starry Themes
00:03:49 Confusion Over GPT 4.1 vs 4.5
00:04:47 Distillation and Model Improvements
00:05:45 Omnimodel Architecture and Future Plans
00:06:43 Core Capabilities of GPT 4.1
00:07:40 Training Techniques and Long Context
00:08:37 Challenges in Long Context Reasoning
00:09:34 Context Utilization in Models
00:10:31 Graph Walks and Model Evaluation
00:11:31 Real Life Applications of Graph Tasks
00:12:30 Multi-Hop Reasoning Benchmarks
00:13:30 Agentic Workflows and Backtracking
00:14:28 Graph Traversals for Agent Planning
00:15:24 Context Usage in API and Memory Systems
00:16:21 Model Performance in Long Context Tasks
00:17:17 Instruction Following and Real World Data
00:18:12 Challenges in Grading Instructions
00:19:09 Instruction Following Techniques
00:20:09 Prompting Techniques and Model Responses
00:21:05 Agentic Workflows and Model Persistence
00:22:01 Balancing Persistence and User Control
00:22:56 Evaluations on Model Edits and Persistence
00:23:55 XML vs JSON in Prompting
00:24:50 Instruction Placement in Context
00:25:49 Optimizing for Prompt Caching
00:26:49 Chain of Thought and Reasoning Models
00:27:46 Choosing the Right Model for Your Task
00:28:46 Coding Capabilities of GPT 4.1
00:29:41 Model Performance in Coding Tasks
00:30:39 Understanding Coding Model Differences
00:31:36 Using Smaller Models for Coding
00:32:33 Future of Coding in OpenAI
00:33:28 Internal Use and Success Stories
00:34:26 Vision and Multi-Modal Capabilities
00:35:25 Screen vs Embodied Vision
00:36:22 Vision Benchmarks and Model Improvements
00:37:19 Model Deprecation and GPU Usage
00:38:13 Fine-Tuning and Preference Steering
00:39:12 Upcoming Reasoning Models
00:40:10 Creative Writing and Model Humor
00:41:07 Feedback and Developer Community
00:42:03 Pricing and Blended Model Costs
00:44:02 Conclusion and Wrap-Up
--------
41:52
SF Compute: Commoditizing Compute
Evan Conrad, co-founder of SF Compute, joined us to talk about how they started as an AI lab that avoided bankruptcy by selling GPU clusters, why CoreWeave financials look like a real estate business, and how GPUs are turning into a commodities market.
Chapters:
00:00:05 - Introduction of guest Evan Conrad from SF Compute
00:00:12 - CoreWeave Business Model Discussion
00:05:37 - CoreWeave as a Real Estate Business
00:08:59 - Interest Rate Risk and GPU Market Strategy Framework
00:16:33 - Why Together and DigitalOcean will lose money on their clusters
00:20:37 - SF Compute's AI Lab Origins
00:25:49 - Utilization Rates and Benefits of SF Compute Market Model
00:30:00 - H100 GPU Glut, Supply Chain Issues, and Future Demand Forecast
00:34:00 - P2P GPU networks
00:36:50 - Customer stories
00:38:23 - VC-Provided GPU Clusters and Credit Risk Arbitrage
00:41:58 - Market Pricing Dynamics and Preemptible GPU Pricing Model
00:48:00 - Future Plans for Financialization?
00:52:59 - Cluster auditing and quality control
00:58:00 - Futures Contracts for GPUs
01:01:20 - Branding and Aesthetic Choices Behind SF Compute
01:06:30 - Lessons from Previous Startups
01:09:07 - Hiring at SF Compute
--------
1:12:01
The Creators of Model Context Protocol
Today’s guests, David Soria Parra and Justin Spahr-Summers, are the creators of Anthropic’s Model Context Protocol (MCP). When we first wrote Why MCP Won, we had no idea how quickly it was about to win.
In the past 4 weeks, OpenAI and now Google have announced MCP support, effectively confirming our prediction that MCP is the presumptive winner of the agent standard wars. MCP has now overtaken OpenAPI, the incumbent option and most direct alternative, in GitHub stars (3 months ahead of the conservative trendline):
For protocol and history nerds, we also asked David and Justin to tell the origin story of MCP, which we leave to the reader to enjoy (you can also skim the transcripts, or, the changelogs of a certain favored IDE). It’s incredible the impact that individual engineers solving their own problems can have on an entire industry.
Timestamps
00:00 Introduction and Guest Welcome
00:37 What is MCP?
02:00 The Origin Story of MCP
05:18 Development Challenges and Solutions
08:06 Technical Details and Inspirations
29:45 MCP vs OpenAPI
32:48 Building MCP Servers
40:39 Exploring Model Independence in LLMs
41:36 Building Richer Systems with MCP
43:13 Understanding Agents in MCP
45:45 Nesting and Tool Confusion in MCP
49:11 Client Control and Tool Invocation
52:08 Authorization and Trust in MCP Servers
01:01:34 Future Roadmap and Stateless Servers
01:10:07 Open Source Governance and Community Involvement
01:18:12 Wishlist and Closing Remarks
--------
1:19:56
Unsupervised Learning x Latent Space Crossover Special
Unsupervised Learning is a podcast that interviews the sharpest minds in AI about what’s real today, what will be real in the future, and what it means for businesses and the world — helping builders, researchers, and founders deconstruct and understand the biggest breakthroughs.
Top guests: Noam Shazeer, Bob McGrew, Noam Brown, Dylan Patel, Percy Liang, David Luan
https://www.latent.space/p/unsupervised-learning
Timestamps
00:00 Introduction and Excitement for Collaboration
00:27 Reflecting on Surprises in AI Over the Past Year
01:44 Open Source Models and Their Adoption
06:01 The Rise of GPT Wrappers
06:55 AI Builders and Low-Code Platforms
09:35 Overhyped and Underhyped AI Trends
22:17 Product Market Fit in AI
28:23 Google's Current Momentum
28:33 Customer Support and AI
29:54 AI's Impact on Cost and Growth
31:05 Voice AI and Scheduling
32:59 Emerging AI Applications
34:12 Education and AI
36:34 Defensibility in AI Applications
40:10 Infrastructure and AI
47:08 Challenges and Future of AI
52:15 Quick Fire Round and Closing Remarks
The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers pushing the cutting edge. We strive to give you both the definitive take on the Current Thing and the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space