Introduction
By 2025 the AI ecosystem has moved from the initial shock of rapid innovation into a phase Bessemer calls “First Light”: clearer clusters of companies and patterns are forming, even as significant unknowns remain. This expanded summary paraphrases "The State of AI 2025" (Bessemer’s Atlas) and deepens the original synthesis with additional specifics on benchmarks, infrastructure evolution, developer tooling, vertical and consumer opportunities, unresolved "dark matter," and the five predictions shaping 2025–2026.
Context
Since ChatGPT pushed AI into public consciousness, capital and product focus have surged: Bessemer reports having deployed over $1B in AI-native startups since 2023. The landscape today is defined by intense competition, fast adoption cycles, and new success metrics—what counted as great in the SaaS era no longer maps cleanly to AI-native businesses.
AI benchmarks: two archetypes in detail
AI Supernovas
Supernovas exhibit top-line growth at historic scale: in Bessemer’s sample, roughly $40M ARR in the first year of commercialization and ~$125M ARR in year two on average. Yet they often run thin gross margins (~25%), sacrificing margin for distribution, and retention can be fragile where switching costs are low. Remarkably, Supernovas average ~$1.13M ARR per FTE, signaling exceptional revenue efficiency that could translate into sustained scale if retention and unit economics improve.
AI Shooting Stars
Shooting Stars look more like accelerated SaaS winners: roughly $3M ARR in year one, year-over-year growth that quadruples, and healthier gross margins (~60%) with ARR/FTE near $164K in early years. Bessemer frames their five-year path as Q2T3 (quadruple, quadruple, triple, triple, triple), faster than classic SaaS's T2D3 but still grounded in durable retention and margins.
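To make the T2D3 and Q2T3 trajectories concrete, here is a small sketch projecting ARR from a hypothetical $3M year-one base (the multipliers come from the report; the starting ARR is illustrative, not a benchmark):

```python
def project_arr(start_arr: float, multipliers: list[float]) -> list[float]:
    """Project ARR year by year by applying successive growth multipliers."""
    arr = [start_arr]
    for m in multipliers:
        arr.append(arr[-1] * m)
    return arr

# Classic SaaS T2D3: triple, triple, double, double, double
t2d3 = project_arr(3.0, [3, 3, 2, 2, 2])

# Bessemer's Q2T3 for Shooting Stars: quadruple twice, then triple three times
q2t3 = project_arr(3.0, [4, 4, 3, 3, 3])

print(t2d3[-1])  # 216.0  -> $216M ARR after five growth years
print(q2t3[-1])  # 1296.0 -> $1,296M ARR after five growth years
```

The gap compounds quickly: from the same base, the Q2T3 path ends the period at six times the T2D3 outcome.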
Roadmaps: model layer and infrastructure
The model layer remains dominated by a few large labs—OpenAI, Anthropic, Google DeepMind (Gemini), and xAI—yet open-weight entrants like Kimi, Qwen, and Mixtral continue to push efficiency and domain-specialized performance. Research trends include adaptive-depth approaches (Mixture-of-Recursions), revived Mixture-of-Experts, and inference-time techniques such as test-time RL and adaptive reasoning that improve few-shot accuracy without linear compute scaling.
"The second half of AI—starting now—will shift focus from solving problems to defining them."
Shunyu Yao, OpenAI (quoted in Bessemer’s report)
Infrastructure's Second Act
The next chapter emphasizes connecting models to real-world experience: RL environments and task curation (Fleet, Matrices, Kaizen), continuous evaluation frameworks (Bigspin.ai, Kiln AI, Judgment Labs), and compound systems combining retrieval, memory, planning and optimized inference. The emphasis shifts from pure scale to grounded systems that define, measure, and act on problems in production.
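Continuous evaluation, at its simplest, means scoring model outputs against a curated golden set on every change. A toy sketch of that loop follows; the cases, scoring rule, and stand-in "model" are all illustrative and do not represent any of the products named above:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str

def exact_match_score(cases: list[EvalCase], model) -> float:
    """Fraction of cases where the model's answer exactly matches the expected one."""
    hits = sum(1 for c in cases if model(c.prompt).strip() == c.expected)
    return hits / len(cases)

# Stand-in "model" for demonstration; a real harness would call an LLM here.
def toy_model(prompt: str) -> str:
    return {"capital of France?": "Paris"}.get(prompt, "unknown")

cases = [
    EvalCase("capital of France?", "Paris"),
    EvalCase("capital of Spain?", "Madrid"),
]
print(exact_match_score(cases, toy_model))  # 0.5
```

Production frameworks layer on semantic scoring, LLM-as-judge, regression tracking, and lineage, but the core contract is the same: a versioned dataset plus a scoring function run on every model or prompt change.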
Developer platforms: MCP and memory
Model Context Protocol (MCP) is emerging as a universal spec for agent integration—persistent memory, multi-tool flows, and permissioning—analogous to USB-C for agentic AI. FastMCP (Prefect), Arcade and Keycard are early ecosystem pieces that make agentic integration tractable. For developers, MCP lowers integration friction and unlocks true agentic products that act on users' behalf across systems.
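The core idea MCP standardizes—exposing typed tools that an agent can discover and invoke by name—can be sketched in plain Python. This is a toy registry to illustrate the pattern, not the actual MCP SDK or FastMCP API:

```python
from typing import Callable

class ToolRegistry:
    """Minimal stand-in for an MCP-style server: tools register once,
    and any agent can list them and call them by name."""
    def __init__(self):
        self._tools: dict[str, Callable] = {}

    def tool(self, fn: Callable) -> Callable:
        """Decorator that registers a function under its own name."""
        self._tools[fn.__name__] = fn
        return fn

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs):
        return self._tools[name](**kwargs)

registry = ToolRegistry()

@registry.tool
def search_orders(customer_id: str) -> list[str]:
    # Hypothetical tool body; a real server would query a database or API.
    return [f"order-001 for {customer_id}"]

print(registry.list_tools())                              # ['search_orders']
print(registry.call("search_orders", customer_id="acme"))  # ['order-001 for acme']
```

Real MCP adds the pieces that make this safe across organizational boundaries: typed schemas for arguments, transport, permissioning, and discovery—which is exactly why a shared spec matters more than any single registry implementation.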
Memory as product primitive
Memory—short-term via huge context windows and long-term via vector DBs and memory OSes (e.g., MemOS)—is framed as the future moat: persistent, semantic, multi-session memory turns functionality into a personal, sticky asset. Trade-offs remain: cost and latency for long contexts, brittleness of naive persistent memory, and the need for dynamic selection, compression and task isolation. Startups like mem0, Zep and LangMem are actively exploring solutions.
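Long-term memory via vector stores boils down to storing (embedding, text) pairs and recalling the nearest ones at query time. A minimal sketch, with tiny hand-made vectors standing in for a real embedding model (the stored facts are invented for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Toy long-term memory: store embedded snippets, recall the closest ones."""
    def __init__(self):
        self.items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self.items.append((vector, text))

    def recall(self, query: list[float], k: int = 1) -> list[str]:
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = MemoryStore()
store.add([1.0, 0.0], "user prefers dark mode")
store.add([0.0, 1.0], "user's billing plan is annual")
print(store.recall([0.9, 0.1]))  # ['user prefers dark mode']
```

The trade-offs the report names live precisely in this layer: naive append-only stores accumulate stale or contradictory facts, which is why dynamic selection, compression, and task isolation become the real engineering work.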
Enterprise & Horizontal AI
AI is enabling challengers to threaten traditional Systems of Record by offering Systems of Action: platforms that ingest unstructured signals, generate code, automate mappings and act on data. Key unlocks include AI Trojan horse features that auto-capture data flow, dramatically faster implementation through codegen, and data translation that enables one-day migrations—making vendor lock-in less permanent.
Areas still hard to disrupt
Large-scale enterprise ERP and the long tail of domain-specific SoRs (identity platforms, dispatch systems, specialized CMS) remain hard to fully replace due to breadth and regulatory complexity. Bessemer suggests disruption here will be a multi-year, possibly decade-long effort.
Vertical AI
Vertical AI adoption is accelerating across healthcare, legal, education, real estate and home services where language and multimodal tasks were previously underserved. Pattern for winners: start with an embedded wedge solving a high-friction, high-value task (often voice-enabled), capture proprietary vertical data, and expand into broader workflow automation. Examples include Abridge (clinical notes), SmarterDx (coding), EvenUp (legal demand packages), and EliseAI (property management).
Consumer AI
General-purpose assistants remain central to consumer behavior, but modality shifts (voice) and richer memory enable deeper habitual use. Perplexity and agentic browsers like Comet point to new UX paradigms. Generative creative tools are proliferating across music, video, and multi-modal production, but specialized consumer apps must deliver persistent, differentiated value to supplant generalist assistants.
Five predictions (summary)
- Agentic browsers become a dominant interface layer for autonomous workflows
- Generative video crosses the chasm in 2026 with commercial-scale quality and tooling
- Private, business-grounded evals and data lineage become essential for enterprise adoption
- A new AI-native social giant could emerge around agents, memory and multimodal expression
- Incumbents accelerate M&A to buy capabilities, driving consolidation especially in vertical AI
Founder guidance and final takeaways
Design for memory and context as product primitives, instrument private evals and lineage from day one, pick an AI wedge with clear 10x ROI, and balance go-to-market speed with durable unit economics. Expect acquisition interest from incumbents and plan defensibility around data, integrations and unique workflows. Success in this era favors teams that combine rapid execution with clarity about which real-world problems they are solving.
FAQ
- How do I measure AI startup benchmarks for my company? Track ARR growth, gross margin, ARR/FTE and cohort retention, and run private evals for model performance and lineage.
- What distinguishes a Supernova from a Shooting Star? Supernova = explosive top-line growth with possible weak retention and low margins; Shooting Star = faster-than-SaaS growth with healthier margins and retention.
- Why is memory a competitive moat for AI startups? Persistent memory enables deeper personalization and higher switching costs, making products stickier.
- What technical priorities should an enterprise AI product have? Private evals, data lineage, RAG/vector DBs, MCP integration, and observability for model drift.
- How to prepare for incumbent M&A interest? Protect data assets, demonstrate customer ROI, and document integrations and APIs to preserve negotiating leverage.