Stop A/B testing the colour of buttons; start A/B testing the size of the team
Most A/B tests are theatre. Here is what actually moves a startup's metrics, and why team-shape is the experiment your competitors are afraid to run.
Every product team I have audited in the last three years has the same shape of experiment backlog. Eighty percent of the queued tests are 'colour of the CTA', 'copy on the headline', 'order of these three steps'. The expected lift on each, by the team's own pre-experiment estimate, is 0.5-2%. The actual lift, after running, lands somewhere between -0.5% and +0.7%, with confidence intervals that touch zero on both sides.
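To see why those intervals keep touching zero, it helps to run the arithmetic once. Here is a minimal sketch of a 95% confidence interval for the absolute lift between two variants; the conversion numbers are hypothetical, chosen to resemble a typical CTA test at a typical startup's traffic.

```python
import math

def lift_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% CI for the absolute lift of variant B over control A,
    using the normal approximation for two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    lift = p_b - p_a
    return lift - z * se, lift + z * se

# Hypothetical numbers: 5% baseline conversion, a 'winning' variant
# at 5.1%, 10,000 users per arm.
lo, hi = lift_ci(500, 10_000, 510, 10_000)
print(f"lift CI: [{lo:+.4f}, {hi:+.4f}]")  # the interval straddles zero
```

At this traffic level the measured +0.1 percentage point lift comes with an interval roughly ten times wider than the effect, which is exactly the "touches zero on both sides" pattern above.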
The remaining twenty percent of queued experiments, the ones about team shape, on-call rotation, and who owns which feature, are never run. They are scary, they are political, and the teams that queue them do not know how to measure them. So they ship colour swaps and feel productive.
This is the counter-take: the experiments that move startup metrics by 10-30% are about team shape, not pixel shape. Here are four we have helped clients run. The numbers are real, blurred just enough to protect the clients.
A B2B SaaS client: 11 product engineers, weekly velocity of 14 story points (their measure). We split them into two squads: a 4-person 'core product' squad and a 7-person 'platform / migration' squad. Six weeks later the 4-person squad had shipped twice as much customer-facing change as the full 11-person team had managed in the prior six weeks. The 7-person squad caught up on three years of accrued debt that had been silently throttling everything.
The lesson is not 'fire half the team'. The lesson is that a single 11-person team is almost always the wrong shape: two specialised teams of 4-7 do more than one 'aligned' team of 11. The team A/B test here is splitting the team in half by responsibility and seeing which side moves the metric you care about.
A consumer app client. Support reported into ops. Engineers would ship a feature, and support would discover three weeks later that 8% of users could not get past onboarding. Reproduction took two weeks because the support team was emotionally and organisationally distant from engineering. Total time to fix: nine weeks per onboarding bug.
We moved support reporting into engineering, with a daily 15-minute sync between the support lead and the on-call engineer. Time to fix dropped to four days. NPS went from 31 to 47 in one quarter. Net product velocity did not change: the engineers were not 'distracted' by support, they were finally hearing reality.
A 30-person engineering org with 14 weekly standing meetings on the calendar, totalling 9.5 hours per attendee per week. We killed all of them for one month and replaced them with a single 30-minute Monday kickoff per squad and a Friday async written update.
Output per engineer (PRs merged, story points, your favourite measure) went up 22% in the first month. After two months we reintroduced two of the meetings, the ones people genuinely missed. Net: six hours per week per engineer reclaimed, with no measurable downside.
Most teams treat pair programming as a special occasion. We ran a six-week experiment in which every senior had a pinned junior pair, working the same ticket queue four hours a day. Junior ramp time (time to ship a first independent feature) dropped from a 14-week median to six weeks. Senior throughput dropped 12% during those six weeks but rose 18% afterwards, because seniors stopped fielding 'how do I do X' questions all day.
If your last ten experiments were all UI tweaks and the company metric did not move, the experiment to run is on yourself: stop optimising the wrong variable. Team shape is the variable.

Founder, DField Solutions
I've shipped production products from fintech to creator tooling, for startups and enterprises, from Budapest to San Francisco.
Let's talk about your project. 30 minutes, no strings.