---
title: "How to hire an AI development team in 2026 · 9 questions to ask before signing"
description: "A buyer's checklist for hiring an AI engineering studio in 2026. Nine concrete questions that separate production-grade engineering from agency overhead — covering eval harnesses, AI Act compliance, fix-PR delivery, on-call SLAs and senior-only staffing."
date: 2026-04-29
updated: 2026-04-29
author: "Dezső Mező"
tags: "AI, Hiring, Buyer guide, AI development, Procurement"
slug: how-to-hire-ai-development-team-2026
canonical: https://dfieldsolutions.com/blog/how-to-hire-ai-development-team-2026
---

# How to hire an AI development team in 2026 · 9 questions to ask before signing

Most AI agency pitches sound the same. Nine questions surface the difference between a production engineering team and a deck-driven sales motion · ask all nine before signing.
Most AI agency pitches sound the same: "production-grade · senior team · we ship outcomes not slides." Reading three of those decks back-to-back, you'd struggle to tell the studios apart. The nine questions below are how to tell.

**TL;DR**
- Two questions are about staffing (who writes the code, are they senior).
- Two are about deliverables (sample artifacts, eval harness in CI).
- Two are about compliance + ownership (paperwork timeline, who owns the weights).
- Two are about risk (NDA, exit clause).
- One is about scope honesty (will they take a small engagement).

## Why nine questions instead of "check the case studies"

Case studies on a studio's website tell you what they shipped, not how. A 14-day audit and a 14-month embedded retainer can both produce the same case-study bullet point ("shipped X, lifted Y by Z%") · the difference is in the *engagement shape* and the *staffing reality*. The nine questions surface the engagement shape · the case studies confirm it.

Below are the nine, with what a good answer sounds like and the specific red flags to listen for.

## 1. Who specifically writes the code?

Get names. Cross-reference on LinkedIn or GitHub. The good answer is a list of 1–4 senior engineers, plus the founder if it's a small studio (founders who don't ship code review every release · ask which they do). The red-flag answer is "a team" or "depends on the project" · those translate to junior subcontractors rotating through your codebase.

> **TIP:** Ask for one named senior engineer who'll be your primary contact · and the same name on every invoice. Studios that rotate engineers between accounts are running an agency, not engineering.

## 2. Can I see a sample deliverable from a similar engagement?

An NDA-redacted audit report. A fix-PR description. A piece of the eval harness with the test names visible. A studio that can't show you a real artifact is going to learn on your project · their first "sample" deliverable is going to be yours, and you'll pay for the learning curve.

What good looks like: a Markdown audit report, 8–15 pages, with reproduction steps for every finding, and a screenshot of the fix-PR diff against the client's actual repo (with the repo URL redacted). What bad looks like: a 30-page PDF deck with stock photos and zero reproducible artifacts.

## 3. Is the eval harness in CI, or a one-off?

Eval that runs once at hand-over is theatre · the model upgrade three months later silently breaks 18% of your test prompts and nobody notices until customer support drowns. A real eval pipeline runs on every release with regression tracking (giskard, promptfoo, ragas, custom suites), prompt-injection coverage, and cost telemetry so you see when the average request cost doubles overnight.

```yaml
# What 'eval in CI' looks like in practice
# .github/workflows/llm-eval.yml
name: LLM eval suite
on: [push, pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx promptfoo eval --output results.json
      - run: npx promptfoo eval-compare --baseline main --threshold 0.92
      - if: failure()
        run: echo "Eval regression vs baseline · failing build"
```

## 4. On-call window + production-incident SLA

30-day on-call window with sub-1-hour response is the European-studio standard in 2026 for any production-touching engagement. "Best effort during business hours" is a polite way to say "you babysit it." Ask explicitly: how do I reach the on-call engineer, what's the response SLA, what happens if you breach it?

## 5. AI Act / GDPR / NIS2 paperwork timeline

The compliance work should be a line item from scoping. AI Act risk classification + DPIA at scoping; GDPR DPA template + processing inventory mid-build; NIS2 incident-response runbook before launch. If 'compliance is extra' shows up halfway through the build as a separate €4–8k charge, you're being price-hiked.

> **WARN:** If the studio doesn't even mention AI Act in the scoping doc, walk. Either they're not aware (which is bad) or they're hoping you won't ask (which is worse).

## 6. Who owns the model weights, eval data, prompts?

You should own all three. Ask explicitly: "At hand-over, do I own the fine-tuned weights, the eval test set, and the prompt history?" Some studios retain a license to resell the eval suite to a competitor of yours, or keep the fine-tuned weights as their IP. Read the IP assignment clause before signing · this is the single line item where mid-tier studios most commonly play games.

## 7. NDA before the first scoping call

Yes is the only acceptable answer. If the studio pushes back ("we'll sign after the first call"), the project is sensitive enough that you don't want to share it with someone who treats NDAs as friction. We sign yours; we send ours; whichever is faster.

## 8. What does the exit look like mid-build?

Three things to read in the contract: pro-rata refund of unconsumed budget; immediate code hand-over (working state, not just current snapshot); cancellation notice period (30 days for a build sprint is reasonable; 90 days is hostage-taking). If the SOW is silent on cancellation, push back · ambiguity here is rarely accidental.

## 9. What's the smallest engagement you'd take on?

If the answer is "six-month engagement, six-figure budget", the studio isn't built for the audit / sprint shape · they only sell large contracts and may inflate your scope to fit. A serious studio takes a fixed-price two-week audit because that's where most of their good work originates · a small engagement that proves the team, then expands. "We don't do projects under €100k" is a sales-strategy answer, not an engineering one.

## Putting it together

Run all nine questions on every studio you're talking to. If you get nine confident, specific answers · you have a serious studio. If three or more answers feel evasive, slow down. The cost of a wrong studio choice (eval-less prod system, no AI Act paperwork, weights you don't own) lands in your team six months later, when it's much more expensive to fix.

If you want to run these nine questions on us · the [contact page](/contact) takes a sentence and we reply within 24 hours. The [pricing page](/pricing) has our tiers and FAQ · most of the answers above are already there.

---

Source: https://dfieldsolutions.com/blog/how-to-hire-ai-development-team-2026
Author: Dezső Mező · Founder, DField Solutions
Site: https://dfieldsolutions.com
