Surplus.ai Β· Pre-Session Briefing

Explain Like I'm Six

18 concepts you'll encounter tomorrow, explained with zero jargon. 5-min read.

πŸŽͺ Agent Orchestration Β· 6 concepts
πŸ”„
ToolLoopAgent
Vercel AI SDK 6

A robot chef who keeps cooking. Think about what to make β†’ grab a tool (knife, pan) β†’ look at what happened β†’ repeat until dinner is ready.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          ToolLoopAgent            β”‚
β”‚                                   β”‚
β”‚   πŸ€” Think ──→ πŸ”§ Use Tool        β”‚
β”‚     ↑             ↓               β”‚
β”‚     └─── πŸ‘€ Observe β”€β”€β”€β”˜          β”‚
β”‚                                   β”‚
β”‚   Loop until: "I'm done!" βœ…      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Tomorrow: This is how every agent in Surplus works. It thinks, uses tools (search, fetch), checks the result, and keeps going until it has a complete answer.
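The loop above can be sketched in a few lines of TypeScript. This is a minimal stand-in, not the real Vercel AI SDK API: `think`, `tools`, and the stub "model" are all hypothetical, just to show the think β†’ tool β†’ observe β†’ done shape.

```typescript
// Minimal think β†’ tool β†’ observe loop (all names hypothetical).
type Step = { tool: string; input: string } | { done: true; answer: string };

// Fake tools: real agents would have webSearch, fetchPage, etc.
const tools: Record<string, (input: string) => string> = {
  webSearch: (q) => `results for "${q}"`,
};

// Stub model: first call asks for a tool, second call declares done.
function think(history: string[]): Step {
  if (history.length === 0) return { tool: "webSearch", input: "Acme competitors" };
  return { done: true, answer: `Answer based on: ${history.join("; ")}` };
}

function runToolLoop(maxSteps = 5): string {
  const history: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = think(history);                      // πŸ€” think
    if ("done" in step) return step.answer;           // βœ… done, exit loop
    const observation = tools[step.tool](step.input); // πŸ”§ use tool
    history.push(observation);                        // πŸ‘€ observe
  }
  return "gave up";
}
```

The `maxSteps` cap matters in practice: without it, a confused model can loop forever.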
πŸ‘¨β€πŸ‘§β€πŸ‘¦
Sub-agents / Agents as Tools
agent composition

A boss who delegates. The parent agent says "hey, you're good at competitor research, handle this" and the child does the work and hands back results. Just like calling any other tool, except the tool is another brain.

🧠 Parent Agent
β”œβ”€β”€ πŸ”§ webSearch(...)        ← regular tool
β”œβ”€β”€ πŸ”§ fetchPage(...)        ← regular tool
└── 🧠 competitorAgent(...)  ← agent-as-tool!
    β”œβ”€β”€ πŸ”§ webSearch(...)
    β”œβ”€β”€ πŸ”§ fetchPage(...)
    └── returns: {competitors: [...]}
Tomorrow: Surplus has 25 agents. Some could call others, letting a "research orchestrator" delegate to specialist agents in parallel.
🎰
Promise.allSettled
JavaScript built-in

You send 5 kids to buy groceries at 5 different stores. Some come back with stuff, one gets lost. Instead of canceling everything because one kid failed, you just use what the other 4 brought home.

Promise.allSettled([
  searchCompetitors(),  // β†’ βœ… {status: "fulfilled", value: [...]}
  searchAlternatives(), // β†’ βœ… {status: "fulfilled", value: [...]}
  fetchPricing(),       // β†’ ❌ {status: "rejected", reason: "timeout"}
  enrichProduct(),      // β†’ βœ… {status: "fulfilled", value: {...}}
])
// 3 out of 4 succeeded. Use those. Don't crash.
Tomorrow: When running 25 agents in parallel, some will fail (rate limits, timeouts). This lets you get results from everything that worked.
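Here's a runnable version of the grocery-store idea. The four "tasks" are stand-ins (plain resolved/rejected promises, not real Surplus functions); the point is the filter step that keeps what worked.

```typescript
// Four parallel tasks, one fails β€” keep the other three, don't crash.
async function runAll() {
  const results = await Promise.allSettled([
    Promise.resolve(["RivalCo", "AltCorp"]), // stand-in: searchCompetitors
    Promise.resolve(["AlternativeX"]),       // stand-in: searchAlternatives
    Promise.reject(new Error("timeout")),    // stand-in: fetchPricing (fails)
    Promise.resolve({ name: "Acme" }),       // stand-in: enrichProduct
  ]);

  // Keep only the fulfilled results; count (or retry) the rejected ones.
  const ok = results
    .filter((r) => r.status === "fulfilled")
    .map((r) => (r as PromiseFulfilledResult<unknown>).value);
  const failed = results.filter((r) => r.status === "rejected").length;
  return { ok, failed };
}
```

Unlike `Promise.all`, which rejects as soon as any input rejects, `allSettled` always resolves with one status object per input, in order.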
πŸšͺ
p-limit
npm: p-limit

A bouncer at a nightclub. Only lets 5 people in at a time. When someone leaves, the next person in line gets in. Prevents the club (API) from getting overwhelmed.

const limit = pLimit(5); // max 5 at once

πŸšͺ Door (concurrency = 5)
[πŸƒπŸƒπŸƒπŸƒπŸƒ]     ← inside (running)
[🧍🧍🧍🧍...]  ← waiting in line

Agent 1 finishes β†’ next in line enters
Tomorrow: OpenRouter has rate limits. You can't fire 25 agents simultaneously, so p-limit lets you run, say, 5 at a time without hitting 429 errors.
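To show the mechanics of the bouncer, here's a tiny re-implementation of p-limit's core idea: a counter plus a queue. In practice you'd `npm install p-limit` rather than roll your own, but the calling shape is the same.

```typescript
// Minimal sketch of p-limit's core: run at most `concurrency` promises at once.
function pLimit(concurrency: number) {
  let active = 0;
  const queue: (() => void)[] = [];

  const next = () => {
    active--;
    queue.shift()?.(); // someone left: let the next one through the door
  };

  return <T>(fn: () => Promise<T>): Promise<T> =>
    new Promise<T>((resolve, reject) => {
      const run = () => {
        active++;
        fn().then(resolve, reject).finally(next);
      };
      if (active < concurrency) run(); // room inside: go straight in
      else queue.push(run);            // full: wait in line
    });
  }
```

Usage looks like `const limit = pLimit(5); await Promise.all(agents.map((a) => limit(() => a.run())))` β€” all 25 are scheduled, but only 5 run at a time.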
πŸ”’
Semaphore
concurrency primitive

Same as the bouncer, but fancier word. A counter that goes down when someone enters and up when they leave. When it hits 0, everyone waits. p-limit is basically a semaphore with a nice API.

Semaphore(3): counter = 3
acquire() β†’ counter = 2  (let someone in)
acquire() β†’ counter = 1
acquire() β†’ counter = 0
acquire() β†’ ⏳ WAIT (full!)
release() β†’ counter = 1  (someone left, next!)
Tomorrow: Same idea as p-limit. You might see this term in architecture discussions. It's the CS term for "limited slots."
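The counter-plus-queue above maps directly to code. This is a minimal async semaphore sketch (not a library API): `acquire` takes a slot or parks the caller, `release` hands the slot to the next waiter.

```typescript
// Minimal async semaphore: a counter plus a queue of parked callers.
class Semaphore {
  private waiting: (() => void)[] = [];
  constructor(private counter: number) {}

  async acquire(): Promise<void> {
    if (this.counter > 0) {
      this.counter--; // slot free: take it and go
      return;
    }
    // Full: park this caller until someone calls release()
    await new Promise<void>((resolve) => this.waiting.push(resolve));
  }

  release(): void {
    const next = this.waiting.shift();
    if (next) next();    // hand the slot straight to a waiter
    else this.counter++; // nobody waiting: just free the slot
  }
}
```

Wrap any limited resource as `await sem.acquire(); try { ... } finally { sem.release(); }` and you have exactly what p-limit gives you, minus the nice API.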
πŸ—ΊοΈ
Map-Reduce Pattern
parallel processing

You have 1000 surveys to count. Split them into 10 piles (map), give each pile to a friend, they each count theirs, then add all the totals together (reduce). Faster than doing it alone.

10 workspace sections
        ↓ MAP (split + parallel work)
[Products] [Segments] [Pricing] [Competitors] ...
 Agent 1    Agent 2    Agent 3    Agent 4
    ↓          ↓          ↓          ↓
 result 1   result 2   result 3   result 4
        ↓ REDUCE (combine)
πŸ“‹ Complete workspace analysis
Tomorrow: Each workspace section can be enriched by a different agent in parallel, then results combined into one view. This is the core pattern for Task 1.
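The map and reduce steps fit in two lines of TypeScript. The section names and the fake `enrichSection` "agent" are illustrative stand-ins:

```typescript
// Stand-in for a real agent run on one workspace section.
async function enrichSection(section: string): Promise<string> {
  return `${section}: enriched`;
}

async function enrichWorkspace(sections: string[]): Promise<string> {
  // MAP: fan out one agent per section, all in parallel
  const results = await Promise.all(sections.map(enrichSection));
  // REDUCE: combine the partial results into one view
  return results.join("\n");
}
```

In the real thing you'd swap `Promise.all` for `Promise.allSettled` plus p-limit, so one slow or failing section doesn't sink the whole workspace.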
πŸ”­ Observability Β· 4 concepts
✈️
OpenTelemetry (OTel)
open standard

A flight recorder for your app. Records everything that happens: what was called, how long it took, what failed. Everyone agrees on the format, so any tool can read the recordings.

Your App ──→ OTel SDK ──→ πŸ“¦ Export to:
                          β”œβ”€β”€ Jaeger (free, self-hosted)
                          β”œβ”€β”€ Honeycomb (cloud)
                          β”œβ”€β”€ Datadog (enterprise)
                          └── Console (just print it)
Tomorrow: When 25 agents are running in parallel, you need to see what happened. OTel is the industry standard way to instrument this.
πŸ“
Spans
OTel unit of work

One task on a to-do list, with a start time and end time. Tasks can have sub-tasks. "Run competitor agent" is a span. Inside it: "call OpenRouter" is a child span. "Parse response" is another child span.

β”Œβ”€ Span: runCompetitorAgent (1200ms) ─────────────┐
β”‚  β”Œβ”€ Span: callLLM (800ms) ───────────────┐      β”‚
β”‚  β”‚  β”Œβ”€ Span: httpRequest (750ms) ──┐     β”‚      β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚      β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚
β”‚  β”Œβ”€ Span: parseResponse (50ms) ─┐               β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Tomorrow: Each agent run, each LLM call, each tool use becomes a span. You can see exactly where time is spent and what's slow.
🧡
Trace
OTel: collection of spans

The complete story of one request, from start to finish. User clicks "Enrich Product" β†’ trace captures every agent, every LLM call, every tool use along the way. One trace = many spans strung together.

Trace: "Enrich Product X" (trace_id: abc123)
β”‚
β”œβ”€β”€ Span: productAgent (parent)
β”‚   β”œβ”€β”€ Span: webSearch "Product X specs"
β”‚   β”œβ”€β”€ Span: callLLM (extract features)
β”‚   └── Span: webFetch (manufacturer page)
└── Span: saveToWorkspace
    └── Span: writeYAML
Tomorrow: When debugging "why did this agent give a bad answer?", the trace shows you the entire chain of events.
πŸ“
JSONL Trace Logging
.jsonl files

A notebook where you write one fact per line, and each fact is a tiny structured document. Easy to read, easy to search, easy to stream. No complex database needed.

traces/2026-03-11.jsonl:
{"ts":"09:01","agent":"competitor","action":"start","input":"Acme Corp"}
{"ts":"09:02","agent":"competitor","action":"tool","tool":"webSearch"}
{"ts":"09:03","agent":"competitor","action":"complete","tokens":1847}
{"ts":"09:03","agent":"pricing","action":"start","input":"Acme Corp"}
...
Tomorrow: Surplus already uses this. Every agent run is logged as JSONL. Simple, greppable, no database required. Perfect for the current "no DB" stage.
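Writing and reading JSONL is almost trivial, which is the whole appeal. A minimal sketch (the `TraceEvent` shape is illustrative, not the exact Surplus schema):

```typescript
// JSONL: one JSON object per line. Append a line to write; split lines to read.
type TraceEvent = { ts: string; agent: string; action: string };

function toJSONL(events: TraceEvent[]): string {
  return events.map((e) => JSON.stringify(e)).join("\n");
}

function fromJSONL(text: string): TraceEvent[] {
  return text
    .split("\n")
    .filter(Boolean) // skip blank lines
    .map((line) => JSON.parse(line));
}
```

Because each line is independent, you can `grep` for one agent's events or tail the file live without parsing the whole thing.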
πŸ“Š Evaluation Β· 4 concepts
πŸ‘¨β€βš–οΈ
LLM-as-Judge
eval pattern

You hire a senior teacher to grade homework done by a junior teacher. GPT-4o grades what Haiku 4.5 wrote. "Is this competitor analysis complete? Score 1-5." Costs more per judgment, but way cheaper than a human.

Haiku 4.5 (cheap, fast)
β”‚
└── output: "Acme Corp has 3 competitors..."
      β”‚
      β–Ό
GPT-4o (expensive, smart)
└── "Score: 3/5 β€” missing pricing data"
Tomorrow: With 25 agents producing outputs, you need automated quality checks. LLM-as-judge can grade agent outputs at scale.
βœ…
Deterministic Judge
rule-based eval

Simple pass/fail rules. Did it return at least 3 competitors? Is the price a positive number? Does the JSON have all required fields? No AI needed, just code.

function judge(output) {
  if (output.competitors.length < 3) return "FAIL"
  if (output.competitors.some(c => !c.price)) return "FAIL"
  if (!output.summary) return "FAIL"
  return "PASS" // βœ…
}
Tomorrow: The first line of defense. Catch obviously broken outputs before even bothering with expensive LLM judging.
πŸ”
Regression Testing
eval pattern

You take a photo of your room today. Next week, you take another photo. Compare them. Did anything move that shouldn't have? Same idea: save a known-good output, then check new outputs against it.

v1 output: "Acme has 5 competitors: A, B, C, D, E"  ← saved
v2 output: "Acme has 2 competitors: A, B"           ← uh oh

Regression detected! 🚨
- Lost 3 competitors (C, D, E)
- Quality decreased after prompt change
Tomorrow: When you change a prompt or switch models, regression tests catch if things got worse. Essential for confident iteration.
πŸ†
Golden Output
reference data

The "correct answer" sheet that a teacher uses to grade tests. You manually verify one perfect output, save it, and compare all future outputs against it. If they drift too far, something's wrong.

golden/competitor-acme.json  ← verified by human βœ“
{
  "competitors": ["X","Y","Z","W","V"],
  "pricing": {"X": 99, "Y": 149, ...},
  "features": [...]
}

New output β†’ diff against golden β†’ flag changes
Tomorrow: Start with a few golden outputs for key agents. They become your safety net as you iterate on prompts and models.
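A golden-output check can be as simple as a set diff. This sketch flags competitors that were lost or gained relative to the verified reference (the `Output` shape is illustrative, not the real Surplus schema):

```typescript
// Compare a fresh agent output against a human-verified golden output.
type Output = { competitors: string[] };

function diffAgainstGolden(golden: Output, fresh: Output): string[] {
  const flags: string[] = [];
  const lost = golden.competitors.filter((c) => !fresh.competitors.includes(c));
  if (lost.length > 0) flags.push(`lost competitors: ${lost.join(", ")}`);
  const gained = fresh.competitors.filter((c) => !golden.competitors.includes(c));
  if (gained.length > 0) flags.push(`new competitors: ${gained.join(", ")}`);
  return flags; // empty array = still matches the golden output
}
```

Gained competitors aren't necessarily bad (the market changes), which is why this flags for review instead of auto-failing.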
πŸ—οΈ Architecture Patterns 4 concepts
πŸ”Œ
tRPC
typesafe API layer

Imagine your frontend and backend speak the same language perfectly. No translation needed. You change a function on the backend, and TypeScript instantly tells you everywhere the frontend needs to update. No REST, no GraphQL, no API docs.

// Backend defines:
const appRouter = router({
  getProduct: publicProcedure
    .input(z.string())
    .query(async ({ input }) => {...}),
})

// Frontend calls (fully typed, autocomplete works):
const product = trpc.getProduct.useQuery("abc")
//    ^ TypeScript knows the exact shape!

// Change the backend return type β†’ frontend gets red squiggly instantly
Tomorrow: Surplus uses tRPC. This is how the UI talks to agents. Type-safe end-to-end means fewer bugs when you change things.
🏭
defineAgent Pattern
factory function

A cookie cutter for agents. You define the shape once (name, model, tools, prompt), and it stamps out a fully configured agent. Like how defineComponent works in Vue or how you'd create a React component.

const competitorAgent = defineAgent({
  name: "competitor-discovery",
  model: "haiku-4.5",              // per-agent model choice
  tools: [webSearch, webFetch],
  systemPrompt: "You find competitors...",
  outputSchema: z.object({...}),   // Zod validated
})

// Type-safe! TypeScript knows input/output shapes.
// 25 agents, all defined the same way. Consistent.
Tomorrow: This is how all 25 Surplus agents are defined. Consistent pattern, type-safe configs, easy to add new agents.
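The factory itself is mostly TypeScript generics. This is a stripped-down sketch of the pattern, not the real Surplus `defineAgent` (which also wires up tools, prompts, and Zod validation); `echoAgent` is a toy example:

```typescript
// A cookie-cutter factory: pass a config, get back a typed agent.
type AgentConfig<I, O> = {
  name: string;
  model: string;
  run: (input: I) => Promise<O>;
};

function defineAgent<I, O>(config: AgentConfig<I, O>) {
  return {
    ...config,
    // Wrapping run here is where every agent would get uniform
    // logging, tracing, and error handling for free.
    run: async (input: I): Promise<O> => {
      return config.run(input);
    },
  };
}

// Toy agent: TypeScript infers I = { text: string }, O = { reply: string }.
const echoAgent = defineAgent({
  name: "echo",
  model: "haiku-4.5",
  run: async (input: { text: string }) => ({ reply: input.text.toUpperCase() }),
});
```

Because input/output types are inferred from the config, callers get autocomplete and compile errors without writing any extra type annotations.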
πŸ“–
Agent Registry
central lookup

A phone book for agents. Every agent registers itself by name. When you need the "competitor-discovery" agent, you look it up in the registry instead of importing it directly. Makes it easy to list all agents, swap implementations, or add new ones.

const registry = new Map<string, Agent>()
registry.set("competitor-discovery", competitorAgent)
registry.set("pricing-analysis", pricingAgent)
registry.set("product-enrichment", productAgent)
// ... 25 total

// Usage:
const agent = registry.get("competitor-discovery")
agent.run({ company: "Acme" })
Tomorrow: The registry is how Surplus organizes 25 agents. Lookup by name, iterate over all agents, or run a subset.
🌊
NDJSON Streaming
newline-delimited JSON

A live news ticker. Instead of waiting for the whole answer, you get one update at a time as they happen. Each line is a complete JSON object. The UI can show progress in real-time.

// Stream response (one line per event):
{"type":"start","agent":"competitor","ts":"09:01:00"}
{"type":"progress","message":"Searching for competitors..."}
{"type":"tool","name":"webSearch","query":"Acme competitors"}
{"type":"result","competitor":"RivalCo","confidence":0.92}
{"type":"result","competitor":"AltCorp","confidence":0.87}
{"type":"done","total":5,"duration":"3.2s"}

// UI updates live as each line arrives 🎯
Tomorrow: This is how Surplus streams agent results to the UI. Users see progress in real-time instead of waiting for a spinner.
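The tricky part on the consuming side is that network chunks don't arrive neatly aligned to newlines: one chunk may contain two events, or half of one. A minimal parser sketch (event shapes are illustrative):

```typescript
// Buffer incoming chunks; emit each complete line as soon as its
// newline arrives. Works no matter how the network slices the stream.
function ndjsonParser(onEvent: (event: any) => void) {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // the last piece may be a partial line
    for (const line of lines) {
      if (line.trim()) onEvent(JSON.parse(line));
    }
  };
}
```

Feed it chunks as they arrive (e.g. from a fetch `ReadableStream` reader) and the UI callback fires once per complete event, even when a JSON object is split across chunks.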