Why you should use Feature Toggles with Terraform
Introduction
If you come from a traditional software development background, you're likely familiar with using feature flags or toggles.
Put simply, a feature toggle acts as a switch to turn a specific feature on or off. It allows code to be released into production without activating it immediately—or only under certain conditions.
This approach enables gradual rollouts, making it possible to introduce changes incrementally and minimize risk. It also supports quick rollbacks, allowing you to disable problematic features without redeploying.
Additionally, feature toggles are commonly used for A/B testing scenarios, where new functionality is enabled for a subset of users to compare their responses with those of a control group.
However, the context is quite different when working with Terraform, an infrastructure as code (IaC) tool. So, why should you consider using feature flags with Terraform?
The reasons are flexibility and backwards compatibility. Let's see how we can put them to work.
Toggling with the count meta-argument
A basic feature toggle in HCL looks as follows.
variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  count    = var.enable_feature ? 1 : 0
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}
Here, the meta-argument count defines how many instances of a resource Terraform should create. In our scenario, we combine it with a conditional expression of the form condition ? true_value : false_value. So, when the variable enable_feature is true, a single instance (1) of the resource gets created; when it is false, no instance (0) is created.
terraform apply -var enable_feature=true
This pattern gets tricky when resources reference each other. For example, the following code won't work!
variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  count    = var.enable_feature ? 1 : 0
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  count               = var.enable_feature ? 1 : 0
  name                = "nsg-feature-toggle-demo"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
}
Trying to apply this configuration will result in the following error.
│ Error: Missing resource instance key
│
│ on main.tf line 39, in resource "azurerm_network_security_group" "nsg":
│ 39: resource_group_name = azurerm_resource_group.rg.name
│
│ Because azurerm_resource_group.rg has "count" set, its attributes must be accessed on specific instances.
│
│ For example, to correlate with indices of a referring resource, use:
│ azurerm_resource_group.rg[count.index]
Since the count meta-argument turns the resource into a list of instances, we need to reference the specific instance by its index. So this will fix it.
variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  count    = var.enable_feature ? 1 : 0
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  count               = var.enable_feature ? 1 : 0
  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.rg[0].name
  location            = azurerm_resource_group.rg[0].location
}
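When another part of the configuration, such as an output, needs a value from the toggled resource without being toggled itself, indexing with [0] fails whenever the feature is off. A minimal sketch of one way to handle this, assuming Terraform 0.15 or later for the built-in one function (the output name is made up for illustration):

```hcl
variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  count    = var.enable_feature ? 1 : 0
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

# Hypothetical output, not part of the original example.
# The splat expression yields a list with zero or one names;
# one() turns that into the name itself or null.
output "resource_group_name" {
  value = one(azurerm_resource_group.rg[*].name)
}
```

With the toggle off, the output evaluates to null instead of raising an invalid index error.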
Using for_each with a map
An alternative approach to count is using the for_each meta-argument. Here is an example.
variable "enable_feature" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  for_each = var.enable_feature ? { "enabled" = "enabled" } : {}
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  for_each            = var.enable_feature ? { "enabled" = "enabled" } : {}
  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.rg["enabled"].name
  location            = azurerm_resource_group.rg["enabled"].location
}
Again, we are using a conditional expression: var.enable_feature ? { "enabled" = "enabled" } : {}. If the condition is true, a map with a single element is returned (a key named enabled with a value of enabled) that for_each can iterate over. If the condition is false, an empty map is returned and no instance is created.
The version above can be slightly rewritten for better readability.
resource "azurerm_resource_group" "rg" {
  for_each = var.enable_feature ? { "enabled" = "enabled" } : {}
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  for_each            = var.enable_feature ? { "enabled" = azurerm_resource_group.rg["enabled"] } : {}
  name                = "nsg-foobar"
  resource_group_name = each.value.name
  location            = each.value.location
}
This time, we directly assign the referenced resource group to the key named enabled and access its attributes through the each object (each.value).
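One thing to keep in mind when adding such a toggle to a resource that is already deployed: the resource address changes from azurerm_resource_group.rg to azurerm_resource_group.rg["enabled"], so Terraform would otherwise plan to destroy the old object and create a new one. A minimal sketch of how a moved block could record the new address, assuming Terraform 1.1 or later and that the resource group was previously declared without for_each:

```hcl
# Assumption: azurerm_resource_group.rg already exists in state
# from a version of the configuration without for_each.
moved {
  from = azurerm_resource_group.rg
  to   = azurerm_resource_group.rg["enabled"]
}
```

This only helps callers that set enable_feature to true; with the toggle off, Terraform still plans to remove the resource.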
Toggling specific arguments
This time, we don't want to toggle the entire resource; we only want to switch a specific argument on and off.
In the example below, we again use a conditional expression. It returns the map of Azure tags we'd like to assign if var.enable_tags is true; otherwise, it returns null.
variable "enable_tags" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"
}

resource "azurerm_network_security_group" "nsg" {
  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  tags                = var.enable_tags ? { environment = "dev" } : null
}
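If the resource already carries tags that should always be present, the toggle can add to them rather than replace the whole argument. A minimal sketch using the built-in merge function; the local base_tags and its contents are assumptions for illustration:

```hcl
variable "enable_tags" {
  type    = bool
  default = false
}

# Hypothetical tags that are always applied.
locals {
  base_tags = {
    managed_by = "terraform"
  }
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-feature-toggle-demo"
  location = "switzerlandnorth"

  # The optional tags are merged in only when the toggle is on.
  tags = merge(local.base_tags, var.enable_tags ? { environment = "dev" } : {})
}
```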
Caution: you can't use null for required arguments! Terraform treats null as if the argument were omitted.
Toggling features within modules
Consider a fictional scenario where you'd like to create a vnet including a subnet and dynamically toggle the creation of a network security group.
Nothing stops us from using the same count construct within a child module.
Again, we use the count meta-argument to dynamically create the network security group and its subnet association.
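A minimal sketch of what such a child module could look like. The variable name enable_nsg, the resource names, the address ranges, and the module path ./modules/vnet are assumptions for illustration.

```hcl
# modules/vnet/main.tf (hypothetical child module)

variable "enable_nsg" {
  type    = bool
  default = false
}

resource "azurerm_resource_group" "vnet" {
  name     = "rg-vnet-demo"
  location = "switzerlandnorth"
}

resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = azurerm_resource_group.vnet.location
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "subnet" {
  name                 = "snet-foobar"
  resource_group_name  = azurerm_resource_group.vnet.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

# Both the NSG and its association are created only when the toggle is on.
resource "azurerm_network_security_group" "nsg" {
  count               = var.enable_nsg ? 1 : 0
  name                = "nsg-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = azurerm_resource_group.vnet.location
}

resource "azurerm_subnet_network_security_group_association" "nsg" {
  count                     = var.enable_nsg ? 1 : 0
  subnet_id                 = azurerm_subnet.subnet.id
  network_security_group_id = azurerm_network_security_group.nsg[0].id
}
```

A root module could then switch the feature on when calling the child module:

```hcl
# main.tf (hypothetical root module)
provider "azurerm" {
  features {}
}

module "vnet" {
  source     = "./modules/vnet"
  enable_nsg = true
}
```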
Another benefit that comes with feature toggles
Besides the flexibility that comes with this toggle, there is another benefit, which might not be so obvious: backwards compatibility.
Consider the case where multiple root modules are using your vnet child module. That's what we write modules for, right? You might not even know how many root modules in the enterprise are relying on your shiny vnet module.
But still, you need to carry on and further enhance the module with a new feature. Let's say you decide new vnets should use an edge zone. When looking at the documentation, you read ...
edge_zone - (Optional) Specifies the Edge Zone within the Azure Region where this Virtual Network should exist. Changing this forces a new Virtual Network to be created.
If we simply add the argument to our child module, the next terraform apply will re-create the vnet, which is not always what we want.
Terraform will perform the following actions:
# module.vnet.azurerm_virtual_network.vnet must be replaced
-/+ resource "azurerm_virtual_network" "vnet" {
~ dns_servers = [] -> (known after apply)
+ edge_zone = "switzerlandnorth" # forces replacement
...
}
Plan: 1 to add, 0 to change, 1 to destroy.
Instead, we can toggle the desired argument and give the toggle variable a default value of false. That way, existing users of the child module don't have to worry about their resources being recreated.
variable "enable_edge_zone" {
  type    = bool
  default = false
}
...
resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = azurerm_resource_group.vnet.location
  edge_zone           = var.enable_edge_zone ? "switzerlandnorth" : null
  address_space       = ["10.0.0.0/16"]
}
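A variation on the same idea: instead of a boolean plus a hard-coded value, the module could expose the edge zone itself as an optional variable that defaults to null. A minimal sketch, assuming a hypothetical variable named edge_zone and the azurerm_resource_group.vnet resource already present in the module:

```hcl
# Hypothetical alternative to enable_edge_zone.
# Callers that don't set it get null, which leaves the argument unset.
variable "edge_zone" {
  type    = string
  default = null
}

resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-foobar"
  resource_group_name = azurerm_resource_group.vnet.name
  location            = azurerm_resource_group.vnet.location
  edge_zone           = var.edge_zone
  address_space       = ["10.0.0.0/16"]
}
```

Existing callers should still see no change in their plan, and new callers can pick the zone they need.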
Summary
- Feature toggles with Terraform provide flexibility as well as backwards compatibility for child modules
- We can use both count and for_each constructs to realize feature toggles
- I prefer the count version since it enhances readability
- The count meta-argument creates instances of resources and therefore, when referenced by other resources, needs to be accessed by its index
That's it for today, thanks for reading. 😎