Essay
Kill Criteria Up Front
- leadership
- intellectual honesty
- zero-to-one
- executive operating
The day we killed the Semantic Router at Simple Cortex, nobody was surprised. The gate had been 90 percent accuracy. The number came back at 77.5 percent. We had written the gate down at the start and agreed what hitting or missing it would mean. When the eval landed, the conversation lasted minutes, not weeks. We shipped the postmortem and moved the resources.
The hardest conversation on a project is the one held at the end. The cheapest version is the one you hold at the beginning instead.
The default mode
I have watched the same pattern play out on team after team. The work ships. Sentiment turns. Someone, usually a quarter too late, asks the question that should have been asked on day one: is this working?
By then the question is political. Someone owns the project. Someone staked their roadmap on it. Someone is up for promotion on the back of it. The discussion stops being about evidence and starts being about face. The team that survives the cull learns the wrong lesson: keep your head down and never name your risk out loud.
Naming the off-ramp at the start removes the politics from the moment that actually matters. The decision becomes the plan you agreed to, not a verdict on the people who built the thing.
What a kill criterion actually says
In writing, before the work starts: we will ship at X. We will stop at Y. Here is the evidence that would tell us which one applies.
Numbers and qualitative signals, both. "90 percent accuracy on the eval set" is a number. "Operators are changing their daily behavior because of this" is a signal. Real criteria use both, because real decisions use both.
The artifact is short. Two or three lines next to the goal in the project brief. The team reads it on day one and again at every checkpoint. It is the answer to the only question that matters at a review: what would change our minds.
Why people resist it
Naming the conditions for failure feels like inviting failure. Sponsors worry it telegraphs a lack of confidence. Builders worry it gives the org an excuse to cut their work. Both reactions are common and both are wrong.
A kill criterion is a vote of confidence in the team's ability to read evidence. A team without one has decided, implicitly, that the only signal it trusts is sunk cost.
Two from Simple Cortex
The Semantic Router was killed at 77.5 percent against a 90 percent gate. We did not redefine the metric or move the goalposts when the number came back. We shipped the postmortem, archived the work, and pointed the resources elsewhere.
ClawDeck was archived with a public postmortem. The archive is the artifact. Future hires, partners, and customers can read it and see how this team handles negative results: in public, on the record. It tells you more about the operating culture than any case study would.
Negative results are an asset. Name the conditions for shipping and the conditions for stopping at the start.
Nibbble: criteria before revenue
Nibbble is the live test case. The project started January 26, 2026. Production v1 launched April 22, 2026 at app.nibbble.io. As of May 10, 2026, it has zero paying customers. Two restaurants are in the loop as beta testers. A customer council of four restaurants is reading the work.
For a multi-tenant SaaS at this stage, the kill criterion is not zero customers versus one. One early customer would not prove the model, and a zero count does not disprove it. The bar I am holding the work to is behavior change: are operators doing something different in their day because of the product? The two active testers are where that becomes observable. The council is where it gets corroborated or contradicted.
The criteria were named at the start. When the signals come in, we will know what to do with them. The honesty is the asset.
Where this matters most
Zero-to-one is where kill criteria pay the highest dividends. The data is the noisiest. The politics have not hardened yet. The cost of carrying dead weight is the highest. At Samsung Ads we co-founded a new ad business inside a hardware company as a four-person team, and the first year generated $20M in profit. That is the number people remember. The number behind it is the count of features and bets we walked away from when the evidence said so. That is why the work that did ship had room to grow.
A leader who has never killed anything has not led at the scale where this matters. The hiring question is not "What have you shipped?" It is "What did you kill, what was the gate, and where did the resources go?"
The bar
The work that survives kill criteria is the work worth defending. The work that does not survive becomes a postmortem and a faster path to the next bet. Both outcomes are wins.
Name the off-ramp before you start. Hold the team to it without flinching. Let the evidence carry the weight that hierarchy otherwise has to.
That is the part I would not run an org without.
