Feature toggles are technical debt, but still great (Guest Blog)
Are there any development teams that haven’t felt the pressure to deliver “more”, or to deliver it to production faster? Here “more” usually means functionality or new customer-facing features. Developer efficiency is more important than ever, and feature toggles increase speed to market. Feature toggles allow a development team to decouple the release to production from the release to users. This has clear benefits for how the team releases a new feature:
- First, the new feature is tested by the team themselves, in production.
- Then the newly developed code is tested by a small subset of targeted users only. This could be early beta users.
- The new feature may also be released to a small percentage of the users, say 1%. If anything goes wrong, the issue only affects that 1% and not all users.
All of this is great. So what’s the challenge? First, as soon as a new feature is verified, its feature toggle represents technical debt that needs to be removed and cleaned up. Second, using feature toggles in production increases the number of execution paths in the code, and with it the complexity. This is another strong reason to consider feature toggles as technical debt.
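To make the point about execution paths concrete, here is a minimal Python sketch. It assumes the toggles are independent of each other, so it shows an upper bound rather than what any particular code base actually exercises. The toggle names are made up for illustration.

```python
from itertools import product


def toggle_combinations(toggle_names):
    """Return every on/off combination for the given toggle names."""
    return [
        dict(zip(toggle_names, states))
        for states in product([False, True], repeat=len(toggle_names))
    ]


# Hypothetical toggle names, used only for this example.
combinations = toggle_combinations(["new_checkout", "beta_search", "dark_mode"])
print(len(combinations))  # 2 ** 3 combinations for three independent toggles
```

Each toggle that is removed halves this number, which is one way to argue for cleaning up as soon as a feature is verified.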
Different types/categories of feature toggles
Based on the use case, feature toggles are usually put into one of the following categories:
- Release toggle - feature toggles that are used to support the roll-out of new features. The new feature is encapsulated by a feature toggle, and an activation strategy such as gradual roll-out (or canary release) or targeted roll-out is applied. A gradual roll-out strategy means only a random percentage of users will experience the new feature. A targeted roll-out exposes the new feature only to a predefined list of users. The purpose is to reduce time to market without increasing the risk of introducing issues when releasing new code.
- Experimental - experimental feature toggles are used to deliver two or more variants of a new feature to users. This is often referred to as A/B testing.
- Operational - operational feature toggles are either feature toggles that control platform-specific changes or more long-lived feature toggles that encapsulate a flaky part of the application. The best-known pattern for long-lived operational toggles is the kill switch pattern. An example of such a flaky part is a 3rd party integration. The purpose is to let the team gracefully degrade the application in case of issues.
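The activation strategies mentioned in the categories above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Unleash or any other toggle system implements them: all function names are hypothetical, and the gradual roll-out hashes the user id so that the same user keeps getting the same answer, which is one common way to pick a "random" percentage.

```python
import hashlib


def bucket(user_id: str, toggle_name: str) -> int:
    """Map a user to a stable bucket in the range 1..100 for one toggle."""
    key = f"{toggle_name}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100 + 1


def gradual_rollout(user_id: str, toggle_name: str, percentage: int) -> bool:
    """Release toggle: enable for roughly `percentage` percent of users."""
    return bucket(user_id, toggle_name) <= percentage


def targeted_rollout(user_id: str, allowed_user_ids: set) -> bool:
    """Release toggle: enable only for a predefined list of users."""
    return user_id in allowed_user_ids


def call_with_kill_switch(enabled: bool, primary, fallback):
    """Operational toggle: skip a flaky dependency when the switch is off."""
    if not enabled:
        return fallback()
    return primary()


# Hypothetical usage: degrade gracefully when a 3rd party integration is flaky.
recommendations = call_with_kill_switch(
    enabled=False,
    primary=lambda: ["personalised", "items"],
    fallback=lambda: [],
)
print(recommendations)
```

Including the toggle name in the hash means a user who falls in the first 1% for one feature is not automatically in the first 1% for every other feature.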
Expected lifetime of a feature toggle
Since feature toggles add to the system complexity, you want to monitor their lifetime. Unleash is an open source feature toggle system that was first released in 2015. Over the years, Unleash has been able to gather data and insight into how feature toggles are used, and whether teams actually remove them.
Looking at the number of days a feature toggle is live in the system, the histogram above shows that there is a huge variation. Fortunately, many teams are good at cleaning up, and a large number of feature toggles are removed within the first 50 days after creation. The histogram also shows that many feature toggles are forgotten and still exist in the system after 425 days.
Release toggles tend to have a lifetime of a few weeks or less. In some cases they might live for more than a month, but in our experience this is rarer. We also see that teams often forget to remove release toggles. Based on our studies, most release toggles can be deprecated after 40 days.
The expected lifetime of an experimental toggle is less clear. It depends on the type of experiment and the amount of traffic available. To get a statistically valid experiment, the dataset is of the utmost importance. From our observations, these experiments tend to live for weeks and sometimes even months. We advise against experiments that are active for more than a few weeks. If you don’t get a clear indication after a couple of weeks, you should reconsider the configuration of the experiment. For this reason, our recommendation is that an experimental feature toggle should also be deprecated within the first 40 days.
It is in the operational toggle category that we see the largest variance in the expected lifetime of the feature toggles. As with the release category, they are usually introduced as part of the release of a new feature, and they should be removed from the code as soon as the DevOps team is confident that there are no issues. Still, quite often we see that the DevOps team decides to keep some of the operational toggles for a long period of time, sometimes even permanently. From our experience, this decision makes sense.
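The lifetimes discussed above can be turned into a simple check. The Python sketch below is an assumption-laden illustration: the `Toggle` class, the type names and the `is_potentially_stale` function are hypothetical, and the only numbers taken from this post are the 40-day recommendations for release and experimental toggles. Operational toggles get no limit here because they may be kept deliberately.

```python
from dataclasses import dataclass
from datetime import date

# Expected lifetime in days per toggle type; None means "no automatic limit".
EXPECTED_LIFETIME_DAYS = {
    "release": 40,
    "experiment": 40,
    "operational": None,
}


@dataclass
class Toggle:
    name: str
    toggle_type: str
    created: date


def is_potentially_stale(toggle: Toggle, today: date) -> bool:
    """Return True when a toggle has outlived the expected lifetime of its type.

    Toggle types without a limit, and unknown types, are never flagged.
    """
    limit = EXPECTED_LIFETIME_DAYS.get(toggle.toggle_type)
    if limit is None:
        return False
    return (today - toggle.created).days > limit


# Hypothetical toggles, used only for this example.
toggles = [
    Toggle("new_checkout", "release", date(2024, 1, 1)),
    Toggle("payment_kill_switch", "operational", date(2023, 1, 1)),
]
today = date(2024, 3, 1)
print([t.name for t in toggles if is_potentially_stale(t, today)])
```

A flagged toggle is only a candidate for removal; the team still decides whether it has served its purpose.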
Feature toggles tech debt best practices
What are the suggested best practices for working with feature toggles? Research done by North Carolina State University shows that a recommended best practice is to use a feature toggle management system. The feature toggle management system should provide a management UI that gives the team an easy overview of the feature toggles in the system.
To handle feature toggle technical debt, the team is advised to consider early, as part of the planning, which release strategy makes sense. This requires the team to consider how much risk of issues to expect when the new feature is released. Together with product marketing management, the team should also understand whether the release requires special attention from a product marketing point of view. These elements will provide an indication of the expected lifetime of the feature toggle.
The team should also consider the Definition of Done (DoD). For the DevOps team, the DoD is usually met when the code is shipped to production. We argue that the DoD should include a step to remove the feature toggle from the source code.
The team should also start to use the functionality in the feature toggle system to stay on top of deprecated feature toggles. Unleash provides a technical debt dashboard that displays the state of all toggles in your system.
Unleash also identifies all toggles that are ready to be deprecated, and allows you to mark them as “Stale”. This indicates that the feature toggle is now ready to be removed from the source code. When a feature toggle is marked as stale, a stale event is sent in the system. The team is advised to use this event to post a message to their favorite chat tool (e.g. Slack) or to send a signal to the CI/CD deploy pipeline to produce a build warning or even break the build.
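As a rough illustration of the "break the build" idea, the Python sketch below scans a source tree for the names of toggles that have been marked as stale and returns a non-zero exit code when any are still referenced. It is a sketch under assumptions: the function names are hypothetical, the list of stale toggle names is simply passed in (how it is obtained from the toggle system is left out), and the match is a plain substring search.

```python
from pathlib import Path

# File types to scan; an assumption that would be adjusted per code base.
DEFAULT_SUFFIXES = (".py", ".js", ".ts", ".java")


def find_stale_references(source_root, stale_toggles, suffixes=DEFAULT_SUFFIXES):
    """Return (file, line number, toggle name) for each stale toggle still in the code."""
    hits = []
    for path in sorted(Path(source_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for name in stale_toggles:
                if name in line:
                    hits.append((str(path), line_number, name))
    return hits


def check(source_root, stale_toggles):
    """Print every hit and return 1 if stale toggles remain in the code, else 0."""
    hits = find_stale_references(source_root, stale_toggles)
    for file_name, line_number, name in hits:
        print(f"{file_name}:{line_number}: stale toggle '{name}' is still referenced")
    return 1 if hits else 0
```

A pipeline step could call `sys.exit(check("src", stale_names))` to break the build, or only print the hits to produce a warning instead.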
In this blog post we have seen that feature toggles are great for decreasing time to market while keeping the risk of issues at a minimum. We have also seen that feature toggles are technical debt, and that each one should be carefully removed after it has served its purpose. A carefully chosen feature toggle management system will support you both in getting an overview of the deprecated feature toggles and in cleaning up the technical debt from your source code.
About the Guest Blogger
Egil Østhus is the CEO of Bricks Software AS, the company behind the open source feature management system Unleash. Egil has held various management positions within the software industry. Egil is passionate about developer efficiency, in particular supporting software development teams and organizations in efficient product development.