# Build or Buy Agentic AI? A Decision Framework From Someone Who Has Done Both

The most common question I get on a first call is some version of: "Should we just buy Copilot and ChatGPT licenses, or do we need something custom?"

I've done both sides of this. I've rolled out commercial AI tools across an enterprise, and I've built custom multi-agent systems — research pipelines, document copilots, and an intake-to-delivery agent factory that takes a work request and ships a reviewed deliverable with a human approving the final merge. So I don't have a horse in this race. Some weeks I'm the guy telling a team to stop building and just buy the thing.

Here's the framework I actually use.

## The short version

Buy when the workflow is generic. Build when the agent touches your proprietary data or a process that only exists inside your organization.

That's the whole decision in two sentences. The rest of this post is about why that line sits where it does, what it costs to be on either side of it in 2026, and how to score your own use case before you spend money.

## Buy when the workflow is generic

Chat, summarization, drafting emails, transcribing meetings, generating first-pass slides — these are commodity workflows. Every vendor does them, the quality differences are shrinking, and the switching costs are low. Building your own chat interface in 2026 is like building your own email client. You can, but why?

I run a platform with dozens of AI tools on it, and the honest truth is that a chunk of them are thin wrappers: a model, a prompt, a clean interface. For those, off-the-shelf tools or a lightweight internal hub gets you most of the value at a fraction of the cost. If your team just needs to summarize documents and draft content, buy licenses, write a short usage policy, and move on. You'll be productive in a week.

Buy signals, concretely:

- The workflow looks the same at your company as it does at any other company
- The output is a starting point a human will rework anyway
- No integration with your internal systems is required for the output to be useful
- A vendor demo already does 90% of what you imagined

## Build when the agent touches what makes you, you

The build case starts the moment the agent needs your data, your systems, or your judgment encoded as process.

Three patterns from my own work, described at concept level:

**Research pipelines.** I built a trend-scouting agent that monitors external sources on a schedule, filters against what the organization actually cares about, and produces a briefing. No off-the-shelf tool knows what "relevant" means to a specific business. That relevance model — the filters, the scoring, the format leadership actually reads — is the product. You can't buy it.

**Intake-to-delivery factories.** The most ambitious system I run is an agent pipeline that watches for incoming work requests, triages them, writes a requirements doc, produces the deliverable — decks, reports, sometimes working code in an isolated git branch — and then routes everything through an adversarial reviewer agent before a human approves anything that ships. Every stage encodes how *this* organization defines done. A vendor can sell you an agent framework. They cannot sell you your own definition of acceptable work.
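The gating idea in that pipeline (a reviewer that can only block, and a human who is the only one able to ship) can be sketched in a few lines. This is an illustrative toy, not the production system: `Deliverable`, `Status`, `review`, `ship`, and the `banned_terms` check are all hypothetical names, and a real reviewer would be a model call rather than a string match.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DRAFTED = "drafted"
    REVIEW_FAILED = "review_failed"
    AWAITING_HUMAN = "awaiting_human"
    SHIPPED = "shipped"


@dataclass
class Deliverable:
    request: str
    body: str
    status: Status = Status.DRAFTED
    findings: List[str] = field(default_factory=list)


def review(d: Deliverable, banned_terms: List[str]) -> Deliverable:
    """Stand-in for the adversarial reviewer: it can flag, but never ship."""
    d.findings = [term for term in banned_terms if term in d.body]
    d.status = Status.REVIEW_FAILED if d.findings else Status.AWAITING_HUMAN
    return d


def ship(d: Deliverable, approved_by: Optional[str]) -> Deliverable:
    """Only a named human can move a reviewed deliverable to SHIPPED."""
    if d.status is not Status.AWAITING_HUMAN:
        raise PermissionError(f"cannot ship from status {d.status.value}")
    if not approved_by:
        raise PermissionError("human approval is required before shipping")
    d.status = Status.SHIPPED
    return d
```

The design choice worth noting is that `ship` checks both conditions itself, so no upstream stage can skip the review or the approval by accident.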

**Domain copilots.** I built a system that lets a model search a large proprietary technical library and answer questions with citations, and another that lets a user upload a dense technical document and have the AI navigate and explain it live, by voice. The entire value is the proprietary corpus and the domain-specific behavior. Generic tools hallucinate confidently in exactly the places where these systems have to be right.

Build signals:

- The agent needs to read from or write to your internal systems
- The workflow encodes process knowledge that took your team years to develop
- Being wrong has real cost, so you need your own review gates and audit trail
- The capability would be a competitive advantage if it worked well

## What building actually costs in 2026

This is where the conversation has changed the most in two years, and where most people's intuition is stale.

Model spend is no longer the expensive part. In my agent factory, jobs run under hard budget caps that I set deliberately low: a requirements-doc run is capped at around a dollar of model spend, and a full code build, where an agent writes, compiles, and type-checks real code over dozens of turns, is capped at a few dollars. Those caps trip mid-stream if a job runs hot. We almost never hit them.
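A minimal sketch of a cap that trips mid-stream, assuming the harness learns the cost of each model turn as it completes. `BudgetCap`, `BudgetExceeded`, and `run_job` are hypothetical names for illustration, and the list of per-turn costs stands in for real agent turns.

```python
from dataclasses import dataclass
from typing import Iterable


class BudgetExceeded(RuntimeError):
    """Raised when a job's cumulative model spend passes its cap."""


@dataclass
class BudgetCap:
    limit_usd: float        # hard ceiling for the whole job (illustrative)
    spent_usd: float = 0.0  # running total across turns

    def charge(self, turn_cost_usd: float) -> None:
        """Record one turn's cost and stop the job if the cap is crossed."""
        self.spent_usd += turn_cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f} cap"
            )


def run_job(turn_costs_usd: Iterable[float], cap: BudgetCap) -> int:
    """Run turns until done or until the cap trips; return turns completed."""
    completed = 0
    for cost in turn_costs_usd:  # stand-in for real agent turns
        cap.charge(cost)         # checked after every turn, not at job end
        completed += 1
    return completed
```

Checking after every turn is what makes the cap "hard": a runaway job is stopped one turn past the limit at most, rather than being discovered on the invoice.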

Think about what that means. A piece of work that would take a person half a day runs as a twenty-minute agent job for less than the cost of a coffee. The marginal cost of agentic work has collapsed.

What hasn't collapsed is the engineering around it. The harness is the hard part: the job queue, the budget enforcement, the review agent that hunts for security and scope problems, the kill switch, the human approval gates. That took weeks of real engineering, and it's where almost all the project risk lives. The honest 2026 cost model is:

- **Buying:** per-seat licenses forever, near-zero setup, capped upside
- **Building:** dev-weeks up front for the harness, then agent-runs that cost single-digit dollars, with the ceiling set by your ambition rather than a vendor's roadmap

If a workflow runs hundreds of times a month and touches your data, the build math gets very good, very fast. If it runs occasionally and any vendor could do it, the build math never works.
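To put rough numbers on that, here is a back-of-envelope payback sketch. The function name `payback_months` and every input are placeholder assumptions to replace with your own pilot data; it deliberately ignores maintenance and license costs to keep the shape of the math visible.

```python
from typing import Optional


def payback_months(
    build_cost_usd: float,          # one-time harness engineering
    runs_per_month: int,            # how often the workflow actually runs
    hours_saved_per_run: float,     # human time each run replaces
    hourly_rate_usd: float,         # loaded cost of that human time
    model_cost_per_run_usd: float,  # model spend per agent run
) -> Optional[float]:
    """Months until the harness pays for itself, or None if it never does."""
    net_per_run = hours_saved_per_run * hourly_rate_usd - model_cost_per_run_usd
    monthly_net = runs_per_month * net_per_run
    if monthly_net <= 0:
        return None
    return build_cost_usd / monthly_net


# Made-up inputs: same harness cost, very different run frequency.
frequent = payback_months(40_000, 200, 2.0, 50.0, 3.0)
occasional = payback_months(40_000, 4, 2.0, 50.0, 3.0)
```

With these invented inputs, the workflow that runs 200 times a month pays back in about two months, while the one that runs four times a month takes more than eight years. Run frequency dominates the result far more than per-run model cost does.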

## The 80% trap

The most expensive mistake I see isn't buying or building. It's buying a tool that gets you 80% of the way and assuming the last 20% is a configuration problem.

It never is. The last 20% is your data model, your permissions, your approval chain, your edge cases — precisely the parts the vendor couldn't have known about. Teams burn months bending a generic product around a proprietary process, end up with brittle workarounds, and then build anyway, a year late.

The tell is in the demo meeting. If you hear "we can probably script around that" more than twice, you're looking at a build problem wearing a buy costume. The 20% you'd be scripting around is usually the entire reason the project matters.

The inverse trap exists too: engineers who build a custom chat interface because building is fun. If the vendor demo genuinely covers the workflow, the 80% you get on day one beats the 95% you'd ship next quarter.

## A three-question scorecard

When someone brings me a use case, I score it with three questions:

**1. Would this workflow look identical at your competitor?**
If yes, that's a buy signal. Commodity workflows deserve commodity tools. If your version is different because of your data, your process, or your standards, that's a build signal.

**2. Does the output need to land somewhere?**
A summary a human reads is a buy. An output that has to flow into your systems — create the ticket, update the record, open the pull request, route to the right approver — is a build, because integration depth is where off-the-shelf tools quietly give up.

**3. Is this a cost or a moat?**
If the goal is shaving minutes off a routine task, buy. If the capability working well would change what your business can offer — faster delivery, a service competitors can't match, institutional knowledge made queryable — build, because you don't rent a moat.

Zero or one build signals: buy, and revisit in a year. Two or three: build, but start small. Which brings me to how I de-risk that.
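If it helps to make the scoring rule explicit, the three questions reduce to a small function. The names `UseCase`, `build_signals`, and `recommend` are my own shorthand for this sketch; the thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    identical_at_competitor: bool  # Q1: "yes" is a buy signal
    output_feeds_systems: bool     # Q2: "yes" is a build signal
    is_moat: bool                  # Q3: "yes" (moat, not cost) is a build signal


def build_signals(uc: UseCase) -> int:
    """Count how many of the three questions point toward building."""
    return sum([
        not uc.identical_at_competitor,
        uc.output_feeds_systems,
        uc.is_moat,
    ])


def recommend(uc: UseCase) -> str:
    """Zero or one build signals: buy. Two or three: build, starting small."""
    if build_signals(uc) >= 2:
        return "build, but start small"
    return "buy, and revisit in a year"


# Example: meeting summaries that a human reads, to save time.
summaries = UseCase(identical_at_competitor=True,
                    output_feeds_systems=False,
                    is_moat=False)
```

For the `summaries` example all three answers point the same way, so `recommend(summaries)` lands on buying. The interesting cases are the ones that score exactly two.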

## What a four-week pilot looks like

I don't recommend anyone sign up for a six-month custom build on faith. Every build engagement I run starts as a four-week pilot scoped to one workflow. With Global Quest, it looks like this:

**Week 1 — Map and pick.** We sit with the people who do the work and map the real workflow, not the org-chart version. We pick one slice with clear inputs, a checkable output, and a human who can judge quality. We run the scorecard honestly — sometimes week one ends with "buy this instead," and that's a successful pilot too.

**Week 2 — Working skeleton on real data.** An agent doing the actual task on your actual data, end to end, ugly. No slideware. The goal is to find where the workflow fights back, because it always fights back somewhere, and finding that spot in week two is the whole point of piloting.

**Week 3 — Hardening.** Evaluation cases from real examples, budget caps on every run, logging, and a human approval gate at the point of consequence. This is the same pattern I run in production: agents propose, people approve. A pilot without review gates isn't a pilot, it's a liability with a demo.

**Week 4 — Decision.** You get a working system, the cost data from real runs (actual dollars per job, actual minutes saved), and a straight recommendation: scale it, adjust it, or stop. One of the most valuable things a pilot can produce is a cheap, fast, well-evidenced "no."

At the end of four weeks you know what this class of automation costs you, what it returns, and whether the build case holds. That beats any vendor comparison spreadsheet I've ever seen.

## Where to start

If you're weighing this decision right now: run the scorecard on your top three AI ideas before you talk to any vendor, including me. Generic workflow, human-read output, cost-saving goal? Buy it this quarter. Proprietary data, system integration, moat potential? That's worth a pilot.

If you want to see what the build side looks like in practice, the [Spark AI platform](/projects/spark-ai-platform) and [AI Business Empire](/projects/ai-business-empire) case studies show the kind of agentic systems I'm describing — including the agent factory with its review gates and budget caps. And if you'd rather just talk through your specific case, [get in touch](/contact). The first conversation is the scorecard, and sometimes the answer really is "just buy it."