Feature-Flag-Driven Development

Feature flags are not technical debt by default, but in this codebase they became debt disguised as a feature.

They did not start as a problem. They started as the responsible thing to do.

New behavior? Put it behind a flag. New integration? Flag it. New HTTP header on an existing endpoint? Obviously, flag it.

At first, that felt like discipline. Smaller changes, gradual rollouts, one switch to turn things off when production disagreed with us. After a few burns, that kind of control is hard to argue with.

Then the flags stopped living at the edges.

The wrappers around the flag provider's SDK grew assumptions nobody really questioned: there is always a session, there is always a user ID, every request belongs to a customer we can target. That made sense for logged-in product flows. It did not make sense for everything else.

Public pages did not fit. Health checks did not fit. Cron jobs and service-to-service calls did not fit. Anything session-less had become architecturally inconvenient.

So we created another path.

Instead of evaluating those cases through the official provider, we introduced JSON blobs in buckets, hand-rolled evaluation logic, and just enough glue code to make it feel like the same idea. Now there were two ways to say if (flag): the SaaS provider for logged-in paths, and a homegrown JSON setup for everything else. When the first one failed, it could fall back to the second. Fallbacks for fallbacks. More control planes than behavior.
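In code, the two paths looked roughly like this. This is a reconstruction for illustration, not our actual code: ProviderClient, is_enabled, and the JSON blob are hypothetical stand-ins, and a real provider SDK would be a network client rather than a dictionary.

```python
import json
from typing import Optional

# Hypothetical stand-in for the SaaS provider's SDK. Like our wrappers,
# it assumes every evaluation has a user ID to target.
class ProviderClient:
    def __init__(self, flags: dict[str, bool], healthy: bool = True):
        self._flags = flags
        self.healthy = healthy

    def variation(self, key: str, user_id: str, default: bool) -> bool:
        if not self.healthy:
            raise ConnectionError("flag provider unavailable")
        return self._flags.get(key, default)

# Hypothetical JSON blob, as it might be fetched from a bucket.
JSON_BLOB = '{"new_header": false}'

def is_enabled(key: str, provider: ProviderClient, user_id: Optional[str]) -> bool:
    json_flags = json.loads(JSON_BLOB)
    if user_id is None:
        # Session-less path (public pages, health checks, cron jobs):
        # the provider cannot be used, so the JSON blob decides.
        return json_flags.get(key, False)
    try:
        # Logged-in path: the SaaS provider decides.
        return provider.variation(key, user_id, default=False)
    except ConnectionError:
        # Fallback: the second control plane quietly takes over.
        return json_flags.get(key, False)
```

If the provider says new_header is on and the blob says it is off, the same key returns True for a logged-in user and False for a health check at the same moment, and a provider outage silently swaps one answer for the other.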

At some point, someone started calling it feature-flag-driven development. FFDD.

It got a laugh because it was accurate. The architecture was no longer deciding where flags belonged. The existence of flags was deciding how the architecture evolved.

Feature flags are still not the villain. Kill switches have saved us. Rollout controls have turned bad integrations into minor blips instead of full rollbacks. When the dashboard is red, flipping a single flag and watching the error rate drop does feel like magic.

We even used flags to steer Sentry.

The requirement was reasonable: keep full-fidelity telemetry for internal and high-priority cohorts, then reduce sampling for everyone else when traffic or error volume spiked. That is a valid runtime control. The mistake was where we put it.

Instead of treating observability as its own control plane, we routed it through the same per-user flag stack that already assumed sessions, user IDs, and JSON fallbacks. The question “how much reality should we observe right now?” depended on the same machinery we were already struggling to reason about.

That is where the problem starts: when the magic becomes the default answer.

This is the cargo-cult version of feature flags: keeping the ceremony of safe delivery while losing the judgment about which decisions actually belong behind a runtime switch.

At some point, the question stops being “where should this decision live?” and quietly becomes “where do we put the flag?” Flags move from safety net to dependency. They start deciding which endpoints get which headers, how observability is configured, which services talk to which backends, and which parts of the system are allowed to exist without a logged-in user.

Each new flag still feels cheap. One more toggle. One more escape hatch. One more “temporary” branch around an uncomfortable design decision.

The cost arrives later.

Flag rot and ghost branches

Most flags do not die. They just stop changing.

Some were created as “temporary” safety nets and then quietly stayed in place for years. The UI still shows them. The code still branches on them. The JSON still carries a default value “just in case”. The only thing that disappeared was the original context.

Pinned at 100% in production, they look harmless. The “off” path is never used, nothing is on fire, and leaving an extra if feels cheaper than proving you can delete it.

The cost is not in the if. It is in everything that has to pretend that dead branch might wake up again.

Incidents take longer, because nobody is sure whether a bug is in the active path or some forgotten combination of stale defaults and fallbacks. Refactors are heavier, because touching a module means understanding what happens if a flag that has not moved in three years suddenly flips. Even code review slows down, because you cannot read a file and instantly see which behavior is real and which is legacy.

A flag that has been stuck at 100% for years is not a toggle anymore. It is an if statement pretending to have a choice. One branch is real; the other is a ghost. The file does not tell you which is which.

On the frontend, those ghosts ship all the way to the browser. The bundle still carries UI states no one will ever see. On the backend, they survive as alternate flows for data shapes no caller sends anymore, but which nobody feels safe removing.

In a clean codebase, dead code is embarrassing but simple. You delete the unused function and move on. In a flag-driven codebase, dead code is social. Deleting a branch means deleting a story: who added this, what incident did it prevent, which dashboard is still wired to that boolean. That social cost is enough to keep a lot of ghost logic around.

It only takes a few years of “just in case” to get there. Every temporary branch that never got removed is still waiting for someone to prove, beyond doubt, that it can be deleted. That proof is expensive. Leaving the branch in feels free. That is how the rot spreads.

The interest you actually pay

By this point, the debt is not philosophical. It shows up as time, attention, and coordination.

Time first. Incidents take longer when there are multiple places a behavior might be controlled. Rollouts need more QA permutations when the system can be in “flag on with JSON off”, “flag off with JSON on”, or “both on but different values” states. Even simple changes come with an invisible surcharge: how many flags touch this area, and what does each environment think they should do?
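A back-of-the-envelope count makes the surcharge concrete. This sketch assumes every flag is a plain boolean set independently in each control plane. That overstates the states the system can actually reach, but nobody on call can tell from the code which ones are unreachable. The flag names are hypothetical.

```python
from itertools import product

def configurations(flags, sources=("provider", "json")):
    """Every on/off combination QA might have to reason about.

    Assumes each flag is an independent boolean in each source; real
    systems reach fewer states, but the code does not say which.
    """
    # Possible (value-in-source-1, value-in-source-2, ...) tuples for one flag.
    per_flag = list(product([True, False], repeat=len(sources)))
    # One such tuple for every flag at once.
    return list(product(per_flag, repeat=len(flags)))

flags = ["new_header", "checkout_v2", "sentry_sampling"]
print(len(configurations(flags, sources=("provider",))))  # 8
print(len(configurations(flags)))                          # 64
```

One source of truth gives 2^n states for n flags; a second, independent source turns it into 4^n. Three flags are enough to go from 8 combinations to 64.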

Attention next. Over-flagged systems train engineers to distrust what the code says. You cannot glance at a function and know how it behaves in production, because there is always a chance that a flag or fallback is quietly changing the outcome. You have to stop and ask: what is the configuration right now, in this environment, for this path?

Then coordination.

If every team can add flags but no one feels responsible for retiring them, the backlog of “temporary” toggles only grows. New joiners learn that the safest move is to add one more flag, not to remove an old one. Product gets used to asking for switches instead of decisions. Operations inherits a flag system that looks like a nervous system diagram with no legend.

The platform is sold as a productivity tool. The dashboards are clean. The rollout graphs are pretty. The ability to “turn things off” is marketed as resilience.

The missing line item is the budget for cleanup.

If that work is not explicitly owned, it does not happen. The interest quietly compounds. Eventually you spend more engineering hours navigating the control plane than improving the product it was supposed to control.

Using flags on purpose

If feature-flag-driven development is what happens when every problem looks like a toggle, the antidote is not to swear off flags. It is to be more honest about which decisions actually deserve to be runtime-switchable.

Most of them do not.

In practice, flags earn their keep in three situations:

- rolling out a risky change that you truly want to ramp from 1% to 100% without redeploying
- running an experiment that needs a clean way to split and track traffic
- wiring a very specific kill switch for a dependency that can take production down

Those all have something in common: they are temporary. The flag exists to get you from one state of the system to another, more permanent one. When you reach that state, the flag should go away.
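One way to make "temporary" enforceable is to give every flag a declared purpose, an owner, and a removal date, and to fail a CI check when the purpose is not one of the three above or the date has passed. The registry format, FlagSpec, and check_registry are hypothetical and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass
from datetime import date

# The three situations in which a flag earns its keep.
ALLOWED_PURPOSES = {"rollout", "experiment", "kill_switch"}

@dataclass(frozen=True)
class FlagSpec:
    key: str
    purpose: str     # expected to be one of ALLOWED_PURPOSES
    owner: str       # team responsible for retiring the flag
    remove_by: date  # after this date the flag is overdue

def check_registry(flags: list[FlagSpec], today: date) -> list[str]:
    """Return one human-readable problem per violation, in registry order."""
    problems = []
    for f in flags:
        if f.purpose not in ALLOWED_PURPOSES:
            problems.append(f"{f.key}: purpose {f.purpose!r} is not a reason to flag")
        if f.remove_by < today:
            problems.append(
                f"{f.key}: past remove_by ({f.remove_by.isoformat()}), owner {f.owner}"
            )
    return problems

registry = [
    FlagSpec("checkout_v2", "rollout", "payments", date(2024, 1, 31)),
    FlagSpec("new_header", "config", "platform", date(2030, 1, 1)),
]
for problem in check_registry(registry, today=date(2024, 6, 1)):
    print(problem)
```

The check does not delete anything. It only makes the cost of keeping a flag visible and puts a name next to it, which is the part that otherwise never gets budgeted.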

Everything else is a candidate for a different answer.

Which endpoints care about a header? That is protocol and server logic. How often Sentry samples events? That is observability configuration; it might deserve a runtime knob, but not in the same per-user flag stack as product behavior. Whether a public page can render without a session? That is product design.

Flags can sit on top of those decisions. They should not be the place those decisions live.
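Applied to the Sentry case, that means sampling lives in its own small piece of configuration that does not need a session or a user ID. A minimal sketch, assuming a hypothetical SamplingSettings object loaded from observability config and a cohort label derived from the request ("internal", "priority", or "anonymous" for session-less traffic). A function of this shape could sit behind a sampling hook such as the Sentry SDK's traces_sampler option, but that wiring is not shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SamplingSettings:
    # Hypothetical observability config, owned and deployed separately
    # from product flags. Reading it requires no session or user ID.
    default_rate: float = 1.0
    degraded_rate: float = 0.1
    degraded: bool = False  # the one runtime knob, flipped during spikes
    full_fidelity_cohorts: frozenset[str] = frozenset({"internal", "priority"})

def sample_rate(cohort: str, settings: SamplingSettings) -> float:
    """Fraction of events to keep for a request in the given cohort."""
    if cohort in settings.full_fidelity_cohorts:
        # Internal and high-priority cohorts always keep full fidelity.
        return 1.0
    # Everyone else is sampled down only while degraded mode is on.
    return settings.degraded_rate if settings.degraded else settings.default_rate
```

The runtime switch still exists, but it is a single observability setting with an obvious owner, and the cohort rule is ordinary code that can be read in review instead of a per-user targeting rule hidden in a flag dashboard.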

Feature flags are too useful to give up. They let us move faster and recover more gracefully than we could with code and config alone.

The point is not to stop using them. The point is to stop letting them quietly choose the shape of the system.