How to ship a production AI chatbot in 14 days
Fourteen days from zero to a live AI chatbot your company can actually use. The schedule we follow on every client project, down to what happens on each day.
The AI chatbot pitch is easy: drop it on your site, save the support team 20 hours a week. The reality is that most teams spend three months on this and never ship. Not because it's hard, but because nobody ever wrote the schedule down. Here's our default 14-day plan, each step calibrated on our last ten client projects.
None of this is magic. Each day does one concrete thing, and if you can't finish today's, you don't move on to tomorrow's.
Don't say 'we want AI.' Say 'we want ticket volume down from 200/week to 50.' If you can't put a number on it, spend two more weeks measuring before you start. Every other decision — which data, which model, how to evaluate — flows from that one number.
The quality of an AI chatbot is the quality of its data, not the quality of its model. Pull your FAQ, the last three months of support email threads, product specs, and pricing into one folder — Drive, Notion, doesn't matter. This is the actual moat. Spend two full days on this even if it feels boring.
We build a hybrid retriever: BM25 keyword + vector + reranker. The chatbot may only answer from the retrieved chunks, always with source citations. If the retriever finds nothing, the bot refuses to answer. This is the part that lives through the next six model swaps.
```python
# Simplified retrieval flow
from dfield.retrieval import HybridRetriever

retriever = HybridRetriever(
    bm25_weight=0.4,
    vector_weight=0.6,
    reranker="bge-reranker-v2-m3",
    refuse_below_score=0.55,
)

def answer(query: str):
    chunks = retriever.search(query, top_k=8)
    if not chunks:
        return "I don't have an answer for that in the company docs."
    ...
```

Build a golden set: 50 real historical questions, pulled from the actual support archive. For each, write down what a good answer looks like (one sentence). Run the bot on all 50 and score each answer: pass / fail / almost. Gate the release on a pass rate above 85%. This is the step most teams skip, which is also why their bots silently rot.
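The golden-set gate can be sketched in a few lines. This is a minimal harness under stated assumptions: `ask_bot` and `score` are hypothetical stand-ins for your bot call and your reviewer (human or LLM-assisted), not part of any real library.

```python
# Minimal golden-set evaluation harness.
# `ask_bot(question)` and `score(answer, expected)` are hypothetical:
# plug in your own bot and a scorer returning 'pass' / 'almost' / 'fail'.
golden_set = [
    {"question": "How do I reset my password?",
     "good_answer": "Points to the self-service reset flow, with a link."},
    # ...49 more, pulled from the real support archive
]

def run_eval(golden_set, ask_bot, score):
    # Score every golden question, then compute the strict pass rate.
    results = [
        score(ask_bot(item["question"]), item["good_answer"])
        for item in golden_set
    ]
    pass_rate = results.count("pass") / len(results)
    return pass_rate, results

# Release gate: ship only if pass_rate > 0.85
```

The point of keeping it this small is that it runs on every knowledge-folder change, not just before launch.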
If pass rate is below 85%, the fix is almost always in the knowledge folder, not the model. Hallucination usually means 'the right document wasn't in the retrieval index.'
Add a PII scrubber on input and output, a prompt-injection detector, and output schema validation. Then cost routing: a small model handles the easy questions (roughly 90% of traffic), a big model only the hard ones. In practice this cuts LLM spend by 3–5× with no quality impact.
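A routing rule can be as simple as a few signals on the incoming question. This is a sketch, assuming the retriever already returns a confidence score per query; the thresholds and model names are illustrative, not production values.

```python
# Illustrative cost router: cheap model for easy questions, big model
# for hard ones. Thresholds and model names are assumptions.
def route(question: str, retrieval_score: float) -> str:
    hard = (
        retrieval_score < 0.7          # weak retrieval: needs more reasoning
        or len(question.split()) > 40  # long, multi-part question
        or "?" not in question         # vague, unformed ask
    )
    return "large-model" if hard else "small-model"
```

Usage: `route("How do I reset my password?", 0.92)` routes to the small model; the same question with a retrieval score of 0.4 goes to the large one. Start with rules like these, then tune the thresholds against the golden set.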
Deploy to the widget on your site and to Slack for the support team. Ship with a dashboard from day one: questions per day, pass rate (sampling 5% live), cost per active user, top-5 questions the bot refused. The dashboard is the thing that makes week 3 through week 20 better than week 2.
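For the 5% live pass-rate sample, a hash-based pick works better than a random draw: the same conversation always lands in or out of the sample, so reviewers see whole threads. A sketch (the helper name and rate are illustrative):

```python
import hashlib

def sampled_for_review(conversation_id: str, rate: float = 0.05) -> bool:
    # Hash the conversation id into a stable bucket in [0, 10000),
    # then keep the lowest `rate` fraction of buckets for review.
    h = int(hashlib.sha256(conversation_id.encode()).hexdigest(), 16)
    return (h % 10_000) < rate * 10_000
```

Log the sampled conversations to the same dashboard that tracks pass rate, cost per active user, and refused questions.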
The 14 days cost €10–18k depending on scope. Monthly run costs €200–800 for most businesses (LLM usage, hosting, monitoring). If your support team spends even 15 hours a week answering repeat questions, the bot pays for itself in three months.
Want to see whether a 14-day ship is realistic for your setup? A 30-minute call is free. Tell us the measurable job, we'll tell you what's possible inside two weeks and what isn't.

By
Founder, DField Solutions
I've shipped production products from fintech to creator-tooling · for startups and enterprises, from Budapest to San Francisco.