GROWWITHBA

Email A/B Testing: What to Test, in What Order, and How to Read Results

By Arjun Mehta · Updated June 2026 · Email & SMS

Most email 'testing' is coin-flipping with extra steps: tiny samples, opens as the verdict, and no record of what won. Real testing compounds — each result feeds a playbook that makes every future send smarter.

Here's the email testing system: what to test first, how to judge it, and how to keep the learning.

Key takeaways

  • Test in impact order: offers and value propositions, then subject/preview, then structure and CTA, then timing and cadence, then cosmetics.
  • Privacy-era opens are inflated — judge tests on clicks and revenue per recipient wherever possible.
  • Sample size and patience decide whether a test means anything: small lists need bigger differences and repeated confirmation.
  • The deliverable is a playbook of confirmed patterns — untracked wins are just anecdotes.

The priority ladder

Test big levers before small ones. First: the offer and core message — what you're actually proposing moves results more than how it's dressed. Second: subject line and preview text as a unit, since they gate everything. Third: email structure — long versus short, single CTA versus multiple, plain-text feel versus designed. Fourth: send timing and cadence for your list's rhythm. Last: button colors and image swaps — the classic first tests that belong dead last because their effects are usually noise.

Judge honestly

Pick the success metric before sending: clicks for engagement tests, conversion or revenue per recipient for offer tests, and opens only for subject-line tests, and even then with skepticism about privacy inflation. Split the list randomly, change one variable per test, and size samples realistically: expecting a small list to detect a small difference is statistical fiction, so test bigger swings, or run the same test across multiple sends and look for consistency. A result that flips on rerun was never a result. Flows deserve testing too: welcome and cart sequences accumulate volume that one-off campaigns can't.
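To make the "split randomly, then judge on clicks" step concrete, here is a minimal sketch in Python. It assumes a simple two-variant test judged on click rate and uses a standard two-proportion z-test, which is one common choice and not the only valid one. The function names, the seed, and the click counts in the usage note are hypothetical illustrations, not figures from a real campaign.

```python
import math
import random


def random_split(recipients, seed=None):
    """Shuffle recipients and split them into two near-equal groups."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(recipients)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def two_proportion_z_test(clicks_a, sent_a, clicks_b, sent_b):
    """Return (z, two-sided p-value) for the difference in click rates."""
    rate_a = clicks_a / sent_a
    rate_b = clicks_b / sent_b
    # Pooled rate under the assumption that both variants perform the same
    pooled = (clicks_a + clicks_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        return 0.0, 1.0  # no clicks (or all clicks) in both groups
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: same click rates, very different list sizes
    print(two_proportion_z_test(150, 5000, 190, 5000))
    print(two_proportion_z_test(15, 500, 19, 500))
```

With these made-up counts, 3.0% versus 3.8% on 5,000 sends per variant gives a p-value below 0.05, while the same rates on 500 sends per variant give a p-value far above it. That gap is the "statistical fiction" problem in numbers: the small-list result is indistinguishable from luck.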

Bank the learning

Keep a simple test log: hypothesis, variants, sample, metric, result, decision. Patterns emerge across entries — your list's preference for direct subjects, short emails, Tuesday sends, whatever the data keeps saying — and that becomes the house playbook new campaigns start from instead of re-litigating settled questions. Re-test foundational findings yearly; lists evolve. The compounding is the point: programs that log tests get smarter every quarter, programs that don't run the same experiments forever.
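The log doesn't need special software. As one possible sketch, the six fields above map directly onto a small Python record appended to a CSV file. The class name, function names, and file layout here are assumptions for illustration; a shared spreadsheet with the same columns does the same job.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class LogEntry:
    """One row of the test log, using the six fields named in the article."""
    hypothesis: str
    variants: str
    sample: int
    metric: str
    result: str
    decision: str


def append_entry(path, entry):
    """Append one entry to the CSV log, writing the header for a new file."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(LogEntry)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(entry))


def load_entries(path):
    """Read the whole log back as a list of dicts (all values as strings)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

A hypothetical entry might look like `LogEntry("Direct subjects beat clever ones", "A: direct / B: pun", 12000, "click rate", "A won on two sends", "Default to direct subjects")`. Reading the file back with `load_entries` each quarter is where the patterns start to show.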

Frequently asked questions

What should I test first in email?

The thing furthest from certain with the biggest stakes — usually the offer framing or core value proposition, not the subject line everyone defaults to.

How big a sample do I need for a valid email test?

Enough that the winner's margin couldn't plausibly be luck — thousands per variant for modest differences. Small lists: test bold differences and confirm across repeated sends.
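For a rough sense of scale, the standard sample-size approximation for comparing two rates can be sketched in a few lines of Python. The defaults below (5% significance, 80% power) are conventional assumptions rather than recommendations from this article, and the function name and example click rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(base_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate recipients needed per variant to detect the given lift."""
    effect = expected_rate - base_rate
    if effect == 0:
        raise ValueError("base_rate and expected_rate must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two rates
    variance = base_rate * (1 - base_rate) + expected_rate * (1 - expected_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    print(sample_size_per_variant(0.03, 0.036))  # modest lift
    print(sample_size_per_variant(0.03, 0.06))   # bold lift
```

Under these assumptions, detecting a lift from a 3% to a 3.6% click rate takes roughly 14,000 recipients per variant, while a jump from 3% to 6% takes fewer than 1,000. That is why small lists are better served by bold differences.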

Are opens useless for testing now?

Not useless: they are directional, and still the right metric for subject-line tests. Just confirm meaningful decisions with click and revenue data, since privacy features such as Apple's Mail Privacy Protection pre-load tracking pixels and inflate opens unevenly across a list.