Why you should use Feature Toggles with Terraform

Introduction

If you come from a traditional software development background, you're likely familiar with using feature flags or toggles.

Put simply, a feature toggle acts as a switch to turn a specific feature on or off. It allows code to be released into production without activating it immediately, or while activating it only under certain conditions.

This approach enables gradual rollouts, making it possible to introduce changes incrementally and minimize risk. It also supports quick rollbacks, allowing you to disable problematic features without redeploying.

Additionally, feature toggles are commonly used for A/B testing scenarios, where new functionality is enabled for a subset of users to compare their responses with those of a control group.

However, the context is quite different when working with Terraform, an infrastructure-as-code (IaC) tool. So why should you consider using feature flags with Terraform?

The two main reasons are flexibility and backwards compatibility. Let's see how to put feature toggles to work.

Toggling with count meta-argument

A fundamental feature toggle in HCL looks as follows.

variable "enable_feature" {
  type    = bool 
  default = false
}

resource "azurerm_resource_group" "rg" {
  count = var.enable_feature ? 1 : 0

  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

Here, the meta-argument count defines how many instances of the resource Terraform should create. In our scenario, we combine it with a conditional expression of the form condition ? true_value : false_value. When the variable enable_feature is true, a single instance (1) of the resource is created; when it is false, no instance (0) is created.

💡
You can enable this demo feature from the command line like so: terraform apply -var enable_feature=true
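
As a small sketch of an alternative to the -var flag, the same value can be set in a variable definitions file. This assumes a file named terraform.tfvars in the root module directory, which Terraform loads automatically during plan and apply.

```hcl
# terraform.tfvars (assumed file name; Terraform loads it automatically)
enable_feature = true
```

With this file in place, a plain terraform apply should pick up the toggle without any extra flags.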

This pattern needs some care when resources reference each other. For example, the following code won't work!

variable "enable_feature" {
  type    = bool 
  default = false
}

resource "azurerm_resource_group" "rg" {
  count = var.enable_feature ? 1 : 0

  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  count = var.enable_feature ? 1 : 0

  name                = "nsg-feature-toggle-demo"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
}

Trying to apply this configuration will result in the following error.

│ Error: Missing resource instance key
│ 
│   on main.tf line 39, in resource "azurerm_network_security_group" "nsg":
│   39:   resource_group_name = azurerm_resource_group.rg.name
│ 
│ Because azurerm_resource_group.rg has "count" set, its attributes must be accessed on specific instances.
│ 
│ For example, to correlate with indices of a referring resource, use:
│     azurerm_resource_group.rg[count.index]

Since the count meta-argument turns the resource into a list of instances, we need to reference a specific instance by its index. The following version fixes the error.

variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  count = var.enable_feature ? 1 : 0

  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  count = var.enable_feature ? 1 : 0

  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.rg[0].name
  location            = azurerm_resource_group.rg[0].location
}
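
The index [0] is only safe inside resources that share the same toggle. For references from places that are always evaluated, such as outputs, one option is the built-in one() function together with a splat expression. The following is a minimal sketch, assuming Terraform 0.15 or later and a configured azurerm provider; the output name resource_group_name is made up for illustration.

```hcl
variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  count = var.enable_feature ? 1 : 0

  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

# Hypothetical output name, used for illustration only.
# The splat expression yields a list with zero or one element;
# one() returns that element, or null when the list is empty (toggle off).
output "resource_group_name" {
  value = one(azurerm_resource_group.rg[*].name)
}
```

With the toggle off, the output evaluates to null instead of failing on an invalid index.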

Using for_each with a map

An alternative to count is the for_each meta-argument. Here is an example.

variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  for_each = var.enable_feature ? { "enabled" = "enabled" } : {}

  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  for_each = var.enable_feature ? { "enabled" = "enabled" } : {}

  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.rg["enabled"].name
  location            = azurerm_resource_group.rg["enabled"].location
}

Again, we are using a conditional expression: var.enable_feature ? { "enabled" = "enabled" } : {}. If the condition is true, it returns a map with a single element (a key named enabled with a value of enabled) that for_each can iterate over. If the condition is false, it returns an empty map, so no instance is created.

The version above can be made slightly more readable.

resource "azurerm_resource_group" "rg" {
  for_each = var.enable_feature ? { "enabled" = "enabled" } : {}

  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  for_each = var.enable_feature ? { "enabled" = azurerm_resource_group.rg["enabled"] } : {}

  name                = "nsg-foobar"
  resource_group_name = each.value.name
  location            = each.value.location
}

This time, we assign the referenced resource group directly to the key enabled and access its attributes through each.value.
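
Since for_each also accepts a set of strings, the toggle can be written with toset() instead of a map. This is only a sketch of a common variation, not a replacement for the examples above; it assumes a configured azurerm provider and reuses the key enabled.

```hcl
variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  # A set with one string when the toggle is on, an empty set otherwise.
  for_each = var.enable_feature ? toset(["enabled"]) : toset([])

  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  for_each = var.enable_feature ? toset(["enabled"]) : toset([])

  name = "nsg-foobar"
  # For a set, each.key and each.value are both the string "enabled".
  resource_group_name = azurerm_resource_group.rg[each.key].name
  location            = azurerm_resource_group.rg[each.key].location
}
```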

Toggling specific arguments

This time, we don't want to toggle the entire resource; we only want to switch a specific argument on and off.

In the example below, we again use a conditional expression. It returns the desired map of Azure tags if var.enable_tags is true; otherwise, it returns null.

variable "enable_tags" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location

  tags = var.enable_tags ? { environment = "dev" } : null
}

💡
Please note that this only works for optional arguments, because we can't assign null to required arguments!
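
Nested blocks can't be set to null either. A common way to toggle them is a dynamic block whose for_each receives either a one-element list or an empty list. The sketch below assumes a configured azurerm provider; the variable enable_ssh_rule and the rule values are made up for illustration.

```hcl
# Hypothetical toggle, not part of the examples above.
variable "enable_ssh_rule" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location

  # Generates one security_rule block when the toggle is on, none otherwise.
  dynamic "security_rule" {
    for_each = var.enable_ssh_rule ? [1] : []

    content {
      name                       = "allow-ssh" # illustrative values
      priority                   = 100
      direction                  = "Inbound"
      access                     = "Allow"
      protocol                   = "Tcp"
      source_port_range          = "*"
      destination_port_range     = "22"
      source_address_prefix      = "*"
      destination_address_prefix = "*"
    }
  }
}
```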

Toggling features within modules

Consider a fictional scenario where you'd like to create a vnet with a subnet and dynamically toggle the creation of a network security group.

terraform {
... 
}

provider "azurerm" {
...
}

module "vnet" {
  source = "./vnet_module"

  enable_nsg = false
  
  # ... Potentially more useful arguments here
}

The root module referencing the child

Nothing stops us from using the same count construct within a child module.

variable "enable_nsg" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "vnet" {
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = azurerm_resource_group.vnet.location

  address_space = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "snet1" {
  name                 = "snet1"
  resource_group_name  = azurerm_resource_group.vnet.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

resource "azurerm_network_security_group" "nsg" {
  count = var.enable_nsg ? 1 : 0

  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = "switzerlandnorth"
}

resource "azurerm_subnet_network_security_group_association" "example" {
  count = var.enable_nsg ? 1 : 0

  subnet_id                 = azurerm_subnet.snet1.id
  network_security_group_id = azurerm_network_security_group.nsg[0].id
}

A simple child module

Again, we use the count meta-argument to dynamically create the network security group and its subnet association.
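
The same idea can be applied one level up: since Terraform 0.13, module blocks accept count as well, so a whole child module can be switched on and off. This is a minimal sketch, assuming the ./vnet_module directory from above exists and contains no provider blocks; the variable enable_vnet is hypothetical.

```hcl
# Hypothetical toggle for the entire child module.
variable "enable_vnet" {
  type    = bool
  default = true
}

module "vnet" {
  source = "./vnet_module"
  count  = var.enable_vnet ? 1 : 0

  enable_nsg = false
}
```

As with resources, a module that has count set becomes a list of instances, so any outputs it exposes would be accessed through an index, in the form module.vnet[0].<output_name>.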

Another benefit that comes with feature toggles

Besides the flexibility that comes with this toggle, there is another benefit that might not be so obvious: backwards compatibility.

💡
Feature toggles can be used to provide backwards compatibility in your child modules.

Consider the case where multiple root modules are using your vnet child module. That's what we write modules for, right? You might not even know how many root modules in the enterprise are relying on your shiny vnet module.

Still, you need to carry on and enhance the module with a new feature. Let's say you decide new vnets should use an edge zone. When looking at the documentation, you read ...

edge_zone - (Optional) Specifies the Edge Zone within the Azure Region where this Virtual Network should exist. Changing this forces a new Virtual Network to be created.

If we simply add the argument to our child module, the next terraform apply will re-create the vnet, which is not always what we want.

Terraform will perform the following actions:

  # module.vnet.azurerm_virtual_network.vnet must be replaced
-/+ resource "azurerm_virtual_network" "vnet" {
      ~ dns_servers             = [] -> (known after apply)
      + edge_zone               = "switzerlandnorth" # forces replacement
    ...
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Instead, we can toggle the desired argument and give the toggle variable a default value of false. That way, existing users of the child module don't have to worry about their resources being recreated.

variable "enable_edge_zone" {
  type    = bool
  default = false
}

...

resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = azurerm_resource_group.vnet.location

  edge_zone     = var.enable_edge_zone ? "switzerlandnorth" : null
  address_space = ["10.0.0.0/16"]
}
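
A possible variation, sketched here under the assumption that callers should choose the edge zone themselves, is to replace the boolean with a string variable that defaults to null. The variable name edge_zone is hypothetical, and a configured azurerm provider is assumed.

```hcl
# Hypothetical variable: null (the default) leaves the argument unset,
# so existing callers of the module should see no change in their plan.
variable "edge_zone" {
  type    = string
  default = null
}

resource "azurerm_resource_group" "vnet" {
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = azurerm_resource_group.vnet.location

  # Only set when a caller passes a value explicitly.
  edge_zone     = var.edge_zone
  address_space = ["10.0.0.0/16"]
}
```

The trade-off is that the toggle and its value live in one variable, which is shorter but makes the on/off intent slightly less explicit than a dedicated boolean.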

Summary

  • Feature toggles with Terraform provide flexibility as well as backwards compatibility for child modules
  • We can use both the count and for_each constructs to implement feature toggles
  • I prefer the count version since it is more readable
  • The count meta-argument creates instances of a resource; therefore, when referenced by other resources, an instance needs to be accessed by its index

That's it for today, thanks for reading. 😎

Further reading

  • Feature Toggles, Blue-Green Deployments & Canary Tests with Terraform: In this post, we demonstrate some approaches to feature toggling, blue-green deployment, and canary testing of Terraform resources to mitigate impact to production infrastructure.
  • The count Meta-Argument - Configuration Language | Terraform | HashiCorp Developer: Count helps you efficiently manage nearly identical infrastructure resources without writing a separate block for each one.
  • Conditional Expressions - Configuration Language | Terraform | HashiCorp Developer: Conditional expressions select one of two values. You can use them to define defaults to replace invalid values.
  • The for_each Meta-Argument - Configuration Language | Terraform | HashiCorp Developer: The for_each meta-argument allows you to manage similar infrastructure resources without writing a separate block for each one.