Written by Paulina Laba

June 2, 2026

Lore and Braintrust are often compared because both live under the broad label of "AI observability." They are not substitutes. Lore captures the AI coding sessions your engineers run to build software. Braintrust evaluates and monitors the LLM features that run inside the software you ship. Most teams that look at both end up needing one, and a few need both for different jobs.

This guide tells the two apart in about five minutes, with an honest account of what each tool is good at and a simple decision rule at the end.

The one-sentence difference

Braintrust watches the AI inside your product. Lore captures the AI your team uses to build the product.

If your application calls an LLM API at runtime and you need to know whether those responses are good, fast, and cheap, that is Braintrust's job. If your engineers spend their day inside Claude Code, Codex, or Cursor and the reasoning in those sessions is vanishing into closed terminals, that is Lore's job. The unit Braintrust tracks is one API call or trace. The unit Lore tracks is one coding session.

At a glance

	Lore	Braintrust
Category	AI coding session capture and sharing	LLM evaluation and observability
What it tracks	Coding-agent sessions (Claude Code, Codex, Cursor, Cowork)	LLM API calls, traces, and evals inside your app
Primary user	The whole engineering team and its managers	The engineer or ML team who owns an LLM feature
Core unit	One session	One trace or eval run
Main question answered	"How was this actually built, and can the team reuse it?"	"Is the AI in our product good, fast, and cheap enough to ship?"
Where it sits	Next to your coding agent	Between your application and the LLM API
Output	A searchable, shareable URL for each session	Dashboards, eval scores, traces, and quality gates
Free tier	Yes ($0)	Yes (Starter)
Paid entry	Team $20/seat/mo (min 2 seats)	Pro $249/mo

What Braintrust is

Braintrust is an AI observability and evaluation platform for teams building LLM-powered products. Its own headline is "Ship quality AI at scale," and its three pillars are observability, evals, and automation. It is used by teams at companies including Vercel, Notion, Coursera, Dropbox, and Replit to test prompts before release and monitor model behavior in production.

In practice, Braintrust does three things well:

Tracing and monitoring. It captures a full trace for every request your app makes to a model: prompts, retrieved documents, tool calls, latency, token usage, cost, and errors. Its purpose-built store, Brainstore, is designed to query millions of traces quickly.
Evaluation. You define what "good" looks like and score outputs with code, an LLM judge, or humans. Braintrust ships more than 25 built-in scorers through its open-source autoevals library, covering dimensions like factuality and relevance, and lets you turn a production trace into an eval test case in one click.
Quality gates and automation. Native GitHub Actions run offline evals on pull requests and block merges when quality drops below a threshold. Online scoring catches regressions in production, and its Loop feature proposes improved prompts and scorers automatically.
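
To make the three capabilities above concrete, here is a minimal sketch in plain Python. It does not use the Braintrust SDK: the Trace fields, the exact_match scorer, and the gate function are hypothetical names chosen for illustration, and the 0.8 threshold is an assumption. It shows the general shape of the idea only: record what a request did, score it with code, and fail a check when the average score falls below a threshold.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Trace:
    """Hypothetical record of one LLM request; field names are illustrative."""
    prompt: str
    output: str
    expected: str
    latency_ms: float
    total_tokens: int
    cost_usd: float
    tool_calls: List[str] = field(default_factory=list)
    error: Optional[str] = None


def exact_match(trace: Trace) -> float:
    """Code-based scorer: 1.0 when output equals the expected answer
    (ignoring case and surrounding whitespace), otherwise 0.0."""
    same = trace.output.strip().lower() == trace.expected.strip().lower()
    return 1.0 if same else 0.0


def gate(traces: List[Trace], threshold: float) -> Tuple[bool, float]:
    """Return (passed, mean_score). A CI job would fail when passed is False."""
    if not traces:
        raise ValueError("need at least one trace to evaluate")
    score = mean(exact_match(t) for t in traces)
    return score >= threshold, score


# Four made-up traces; one of them has a wrong answer.
traces = [
    Trace("Capital of France?", "Paris", "Paris", 420.0, 31, 0.0004),
    Trace("Capital of Japan?", "Kyoto", "Tokyo", 390.0, 29, 0.0004),
    Trace("2 + 2?", " 4 ", "4", 210.0, 18, 0.0002),
    Trace("Capital of Peru?", "lima", "Lima", 450.0, 30, 0.0004),
]

passed, score = gate(traces, threshold=0.8)  # threshold is an assumed value
print(f"mean score {score:.2f}, gate passed: {passed}")
```

With three of four answers correct, the mean score is 0.75 and the gate fails at a 0.8 threshold. A real setup would swap the exact-match check for richer scorers, such as an LLM judge or human review.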

Braintrust is framework-agnostic, with SDKs for Python, TypeScript, Go, Ruby, and C#, and it carries enterprise compliance and access controls (SOC 2 Type II, GDPR, HIPAA-ready, SSO, RBAC). If you ship a product that calls an LLM at runtime, this is the category you are looking for.

What Lore is

Lore is the home for your team's AI coding sessions: it turns every Claude Code, Codex, Cursor, and Cowork session into a searchable, shareable URL the whole team can read. Think GitHub, but for the AI sessions behind your code rather than the code itself. The reasoning that used to happen in Slack threads and design reviews now happens inside agent sessions, and Lore makes those sessions legible to everyone.

The workflow is one command. Run /lore:share inside a Claude Code or Cowork session, or /share-codex inside a Codex session, and you get a URL. The full thread renders in any browser: prompts, tool calls, diffs, and the moment a hard problem finally clicked. From there it is searchable across your workspace, linkable from a PR, and open to block-level comments.

Lore exists because the AI era split engineering work into two surfaces. The first is the LLM calls a product makes. The second is the AI sessions an engineer runs to write the code that ships. Braintrust owns the first surface. Lore owns the second.

The distinction that actually matters

The cleanest way to separate these tools is to ask where the AI lives.

Is the AI inside your product, or inside your team? If your application calls an LLM API as part of what users experience, you have a product-runtime problem, and you want Braintrust. If your engineers use an AI agent to author the code that ships, you have a team-knowledge problem, and you want Lore.

What is the unit you want to capture? If you are tracking individual API calls and their cost, latency, and quality, that is Braintrust. If you are tracking multi-hour sessions and the reasoning inside them, that is Lore.

These rarely overlap. The two tools can run side by side without knowing the other exists, because there is nothing to integrate: one sits between your app and a model API, the other sits next to your coding agent.
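
The two questions above reduce to a small decision function. This is a sketch of the rule as this guide states it, not anything either vendor ships; the Tool enum, the recommend function, and its parameter names are all made up for illustration.

```python
from enum import Enum
from typing import Set


class Tool(Enum):
    LORE = "Lore"
    BRAINTRUST = "Braintrust"


def recommend(calls_llm_at_runtime: bool, team_uses_coding_agents: bool) -> Set[Tool]:
    """Map the two questions in this guide to a set of tools.

    calls_llm_at_runtime: the shipped product calls an LLM API for its users.
    team_uses_coding_agents: engineers write code inside an AI coding agent.
    """
    tools: Set[Tool] = set()
    if calls_llm_at_runtime:
        tools.add(Tool.BRAINTRUST)  # product-runtime problem
    if team_uses_coding_agents:
        tools.add(Tool.LORE)        # team-knowledge problem
    return tools


# The answers are independent, so all four combinations are valid.
for product, team in [(True, False), (False, True), (True, True), (False, False)]:
    names = sorted(t.value for t in recommend(product, team))
    print(f"LLM in product={product}, agents in team={team} -> {names or ['neither']}")
```

Because the two inputs are independent, the function never has to choose between the tools. It only checks which surface, or both, applies to you.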

Feature comparison

Capability	Lore	Braintrust
Capture coding-agent sessions	Yes, automatic via CLI	No
Share a session as a URL	Yes	No
Team-wide search over sessions	Yes	No
Workspace and public visibility model	Yes	No
Block-level comments and review	Yes	No
LLM API trace logging	No	Yes
Prompt and model evals	No	Yes (25+ built-in scorers)
Cost and latency dashboards	No	Yes
CI quality gates on pull requests	No	Yes
Online scoring of production traffic	No	Yes
SDKs for app instrumentation	No	Yes (Python, TS, Go, Ruby, C#)

The table is two columns of opposites on purpose: every row is a Yes on one side and a No on the other. These products do not compete on features; they cover different parts of the AI engineering stack.

Pricing comparison

Plan	Lore	Braintrust
Free	$0, share threads from Claude Code, Codex, and Cowork; shared links never expire	Starter: 1 GB processed data/mo, 10,000 scores, unlimited users
Paid	Team: $20/seat/mo (minimum 2 seats), workspace-wide sharing, Review (beta), SSO-ready	Pro: $249/mo, 5 GB data, 50,000 scores, 30-day retention
Enterprise	Not a separate tier; the Team plan is SSO-ready	Enterprise: custom, SSO/SAML, RBAC, HIPAA/BAA, on-prem

The pricing models differ in shape, which tells you something about the products. Lore charges per seat on its Team plan, because its value scales with how many engineers share and read sessions, and its free tier lets anyone start. Braintrust charges by data volume and number of scores with unlimited users, because its value scales with how much production traffic you evaluate, not how many people log in. Figures are current as of June 2026; check each vendor's pricing page for the latest.
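
A short worked example shows the difference in shape, using only the figures in the table above. Two things here are assumptions: that "minimum 2 seats" means a one-person Team workspace is billed for two seats, and that the Pro plan's 5 GB and 50,000 scores are monthly included limits. Overage rates are not covered in this guide, so the sketch returns None rather than guess once usage exceeds those limits. The function names are hypothetical.

```python
from typing import Optional

LORE_TEAM_PER_SEAT = 20         # USD per seat per month
LORE_TEAM_MIN_SEATS = 2         # assumed to be a billing floor
BRAINTRUST_PRO_BASE = 249       # USD per month
BRAINTRUST_PRO_DATA_GB = 5      # assumed monthly included data
BRAINTRUST_PRO_SCORES = 50_000  # assumed monthly included scores


def lore_team_monthly(seats: int) -> int:
    """Per-seat pricing: cost grows with the number of people."""
    if seats < 1:
        raise ValueError("seats must be at least 1")
    return max(seats, LORE_TEAM_MIN_SEATS) * LORE_TEAM_PER_SEAT


def braintrust_pro_monthly(data_gb: float, scores: int) -> Optional[int]:
    """Usage-based pricing: flat base price while usage stays within the
    included limits. Returns None beyond them, because overage rates
    are not given in this guide."""
    if data_gb <= BRAINTRUST_PRO_DATA_GB and scores <= BRAINTRUST_PRO_SCORES:
        return BRAINTRUST_PRO_BASE
    return None


# Lore's bill tracks headcount.
for seats in (1, 8, 25):
    print(f"Lore Team, {seats} seats: ${lore_team_monthly(seats)}/mo")

# Braintrust's bill tracks usage, regardless of headcount.
for data_gb, scores in ((3, 20_000), (12, 20_000)):
    cost = braintrust_pro_monthly(data_gb, scores)
    label = f"${cost}/mo" if cost is not None else "above included limits, see vendor pricing"
    print(f"Braintrust Pro, {data_gb} GB and {scores} scores: {label}")
```

Under these assumptions, Lore Team costs $40, $160, and $500 per month at 1, 8, and 25 seats. Braintrust Pro stays at $249 per month for any team size while usage is within the included limits.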

When to use Braintrust

Choose Braintrust if any of these are true:

You ship a product or feature that calls an LLM API at runtime.
You need to know whether a prompt or model change improved or regressed output quality before you deploy.
You want cost, latency, and error dashboards for your AI feature in production.
You want CI checks that block a merge when an eval score drops.

When to use Lore

Choose Lore if any of these are true:

Your team writes code with Claude Code, Codex, Cursor, or Cowork every day.
The reasoning behind your codebase keeps disappearing into individual agent sessions nobody else sees.
New hires take weeks to learn how your team actually works with AI tools.
You want to send a teammate the whole session behind a change, not a cropped screenshot or a one-line "here's what I prompted."

When you need both

A company whose engineers use coding agents to build an LLM-powered product needs both, for two different jobs. Braintrust evaluates and monitors the AI features inside the product. Lore captures the Claude Code and Codex sessions the engineers run to build those features. There is no overlap and no integration to set up; you point each tool at the surface it was built for.

If you have to pick one to start with and your team mostly lives inside coding agents, Lore tends to have more immediate leverage. The reasoning behind your codebase is more valuable to capture, and harder to recover later, than any single API call.

Frequently asked questions

Is Lore a Braintrust alternative?

Not directly. Braintrust evaluates and monitors the LLM calls inside the product you ship. Lore captures and shares the AI coding sessions your engineers run to build software. They solve different problems, so for most teams one does not replace the other.

Can Lore and Braintrust be used together?

Yes. They cover different layers of the AI engineering stack and never touch the same data. You can run Braintrust on your application's LLM API calls and Lore on your engineers' coding sessions at the same time, with nothing to integrate between them.

Does Lore do LLM evals or cost tracking?

No. Lore does not evaluate prompts, score model outputs, or track API cost and latency. If you need those, use an evaluation and observability platform like Braintrust. Lore captures coding-agent sessions and makes them searchable and shareable across your team.

Does Braintrust capture Claude Code or Cursor sessions?

No. Braintrust instruments the LLM API calls your application makes in production. It is not built to capture, share, or search the coding-agent sessions your engineers run while writing code. That is what Lore does.

Which is better for an engineering team that uses AI to write code?

Lore. It is built for teams whose day-to-day work happens inside Claude Code, Codex, Cursor, and Cowork, turning each session into a searchable URL the rest of the team can read, link, and comment on. Braintrust is built for teams shipping LLM features inside a product.

How much do Lore and Braintrust cost?

Lore is free to start, with shared links that never expire on any tier. Its Team plan is $20/seat per month (minimum 2 seats) and adds workspace-wide sharing. Braintrust offers a free Starter tier, Pro at $249/month, and custom Enterprise pricing. Pricing is current as of June 2026.

The short version

Both tools matter in an AI-first engineering org, but for different reasons. Braintrust answers "is the AI in our product good enough to ship?" Lore answers "how was this actually built, and can the rest of the team learn from it?" Match the tool to the surface: Braintrust for the AI inside your product, Lore for the AI your team uses to build it.

If your engineers have moved from typing assistance to thinking assistance over the past year, the gap that is about to bite you is not your application's prompt logs. It is your team's session reasoning. That is what Lore is built to make legible.