How to ship a production AI chatbot in 14 days
Fourteen days from zero to a live AI chatbot your company can actually use. The schedule we follow on every client project, down to what happens on each day.
The AI chatbot pitch is easy: drop it on your site, save the support team 20 hours a week. The reality is that most teams spend three months on this and never ship. Not because it's hard, but because nobody ever wrote the schedule down. Here's our default 14-day plan, each step calibrated on our last ten client projects.
None of this is magic. Each day does one concrete thing, and if you can't finish today's, you don't move on to tomorrow's.
Don't say 'we want AI.' Say 'we want ticket volume down from 200/week to 50.' If you can't put a number on it, spend two more weeks measuring before you start. Every other decision — which data, which model, how to evaluate — flows from that one number.
The quality of an AI chatbot is the quality of its data, not the quality of its model. Pull your FAQ, the last three months of support email threads, product specs, and pricing into one folder — Drive, Notion, doesn't matter. This is the actual moat. Spend two full days on this even if it feels boring.
We build a hybrid retriever: BM25 keyword + vector + reranker. The chatbot may only answer from the retrieved chunks, always with source citations. If the retriever finds nothing, the bot refuses to answer. This is the part that lives through the next six model swaps.
```python
# Simplified retrieval flow
from dfield.retrieval import HybridRetriever

retriever = HybridRetriever(
    bm25_weight=0.4,
    vector_weight=0.6,
    reranker="bge-reranker-v2-m3",
    refuse_below_score=0.55,
)

def answer(query: str):
    chunks = retriever.search(query, top_k=8)
    if not chunks:
        return "I don't have an answer for that in the company docs."
    ...
```

Build a golden set: 50 real historical questions, pulled from the actual support archive. For each, write down what a good answer looks like (one sentence). Run the bot on all 50 and score each answer: pass / fail / almost. Gate the release on a pass rate above 85%. This is the step most teams skip, which is also why their bots silently rot.
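The golden-set gate can be sketched in a few lines. This is a minimal harness under stated assumptions: `ask_bot` and `score` are hypothetical stand-ins for your bot call and your reviewer (human or LLM-assisted), not part of any real library.

```python
# Minimal golden-set evaluation harness.
# `ask_bot(question)` and `score(answer, expected)` are hypothetical:
# plug in your own bot and a scorer returning 'pass' / 'almost' / 'fail'.
golden_set = [
    {"question": "How do I reset my password?",
     "good_answer": "Points to the self-service reset flow, with a link."},
    # ...49 more, pulled from the real support archive
]

def run_eval(golden_set, ask_bot, score):
    # Score every golden question, then compute the strict pass rate.
    results = [
        score(ask_bot(item["question"]), item["good_answer"])
        for item in golden_set
    ]
    pass_rate = results.count("pass") / len(results)
    return pass_rate, results

# Release gate: ship only if pass_rate > 0.85
```

The point of keeping it this small is that it runs on every knowledge-folder change, not just before launch.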
If pass rate is below 85%, the fix is almost always in the knowledge folder, not the model. Hallucination usually means 'the right document wasn't in the retrieval index.'
Add a PII scrubber on input and output, a prompt-injection detector, and output schema validation. Then cost routing: a small model handles the easy questions (roughly 90% of traffic), a big model only the hard ones. In practice this cuts LLM spend by 3–5× with no quality impact.
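A routing rule can be as simple as a few signals on the incoming question. This is a sketch, assuming the retriever already returns a confidence score per query; the thresholds and model names are illustrative, not production values.

```python
# Illustrative cost router: cheap model for easy questions, big model
# for hard ones. Thresholds and model names are assumptions.
def route(question: str, retrieval_score: float) -> str:
    hard = (
        retrieval_score < 0.7          # weak retrieval: needs more reasoning
        or len(question.split()) > 40  # long, multi-part question
        or "?" not in question         # vague, unformed ask
    )
    return "large-model" if hard else "small-model"
```

Usage: `route("How do I reset my password?", 0.92)` routes to the small model; the same question with a retrieval score of 0.4 goes to the large one. Start with rules like these, then tune the thresholds against the golden set.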
Deploy to the widget on your site and to Slack for the support team. Ship with a dashboard from day one: questions per day, pass rate (sampling 5% live), cost per active user, top-5 questions the bot refused. The dashboard is the thing that makes week 3 through week 20 better than week 2.
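For the 5% live pass-rate sample, a hash-based pick works better than a random draw: the same conversation always lands in or out of the sample, so reviewers see whole threads. A sketch (the helper name and rate are illustrative):

```python
import hashlib

def sampled_for_review(conversation_id: str, rate: float = 0.05) -> bool:
    # Hash the conversation id into a stable bucket in [0, 10000),
    # then keep the lowest `rate` fraction of buckets for review.
    h = int(hashlib.sha256(conversation_id.encode()).hexdigest(), 16)
    return (h % 10_000) < rate * 10_000
```

Log the sampled conversations to the same dashboard that tracks pass rate, cost per active user, and refused questions.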
The 14 days cost €10–18k depending on scope. Monthly run costs €200–800 for most businesses (LLM usage, hosting, monitoring). If your support team spends even 15 hours a week answering repeat questions, the bot pays for itself in three months.
Want to see whether a 14-day ship is realistic for your setup? A 30-minute call is free. Tell us the measurable job, we'll tell you what's possible inside two weeks and what isn't.

By
Founder, DField Solutions
I've shipped production products from fintech to creator-tooling · for startups and enterprises, from Budapest to San Francisco.