Automate Rotating Credentials using Terraform

At Mixpanel, keeping your data secure is of the utmost importance. We strictly adhere to security best practices, including rotating credentials often. Anyone who has to rotate credentials periodically knows that, without automation, it can be a super time-consuming task. This article discusses our novel approach to automating credential rotation using Terraform.

Rotation == Toil

Without any automation in place, rotating credentials is pure toil. An engineer has to regularly go in, create a new credential, and then update everything that uses it. If the credentials don’t expire, you don’t have a strong incentive to be diligent. If they do expire, you are deliberately scheduling a future outage. Obviously, both of these options are bad. As an engineer on the DevInfra team at Mixpanel, one of my main enemies is toil, so any kind of “oh, we should manually update a dozen or more secrets with new credentials every month” is antithetical to my very role.

So let’s start working on how we can automate this process.

Terraform

Terraform is the bog-standard way of automating anything involving cloud platforms, so it seems like an obvious choice for this stuff. Let’s verify that, though, by setting up an example for testing. We run on GCP, where service accounts are commonly used both to give services access to various GCP resources and to let external services reach into our GCP stuff on our behalf.
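The snippets in this post use four providers: google, time, and tls from HashiCorp, plus Mixpanel’s own multirotate provider, which shows up later. A minimal setup might look like the sketch below. Version constraints are left out on purpose (pin whatever you have tested), and the project ID is an assumption lifted from the plan output further down.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    time = {
      source = "hashicorp/time"
    }
    tls = {
      source = "hashicorp/tls"
    }
    # Mixpanel's provider, published on the public Terraform Registry.
    multirotate = {
      source = "mixpanel/multirotate"
    }
  }
}

# Assumption: "mixpanel-tools" is the project shown in this post's plan output.
provider "google" {
  project = "mixpanel-tools"
}
```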

We’ll create a simple service account:

resource "google_service_account" "rotation-test" {
  account_id   = "rotation-test"
  display_name = "Rotation Test Account"
}

and create a service account key for that account:

resource "google_service_account_key" "rotation-test" {
  service_account_id = google_service_account.rotation-test.id
}

Run terraform plan on this, and you get:

  # google_service_account.rotation-test will be created
  + resource "google_service_account" "rotation-test" {
      + account_id   = "rotation-test"
      + disabled     = false
      + display_name = "Rotation Test Account"
      + email        = (known after apply)
      + id           = (known after apply)
      + member       = (known after apply)
      + name         = (known after apply)
      + project      = "mixpanel-tools"
      + unique_id    = (known after apply)
    }

  # google_service_account_key.rotation-test will be created
  + resource "google_service_account_key" "rotation-test" {
      + id                 = (known after apply)
      + key_algorithm      = "KEY_ALG_RSA_2048"
      + name               = (known after apply)
      + private_key        = (sensitive value)
      + private_key_type   = "TYPE_GOOGLE_CREDENTIALS_FILE"
      + public_key         = (known after apply)
      + public_key_type    = "TYPE_X509_PEM_FILE"
      + service_account_id = (known after apply)
      + valid_after        = (known after apply)
      + valid_before       = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

So we can apply this, export that private_key field somewhere, and we’ve got our service account. Let’s go ahead and store it in GCP Secret Manager right away:

resource "google_secret_manager_secret" "rotation-test" {
  secret_id = "rotation-test-key"

  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "rotation-test" {
  secret      = google_secret_manager_secret.rotation-test.id
  secret_data = google_service_account_key.rotation-test.private_key
}
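One detail worth knowing here: the Google provider documents the private_key attribute of google_service_account_key as the JSON key file, base64-encoded, so the secret above stores base64 text rather than the JSON itself. If whatever reads the secret expects the raw key file, a variant that decodes it first could look like the sketch below. It replaces the secret version resource above (same address), and the same base64decode() wrap would apply to the later secret versions that store a private_key attribute.

```hcl
resource "google_secret_manager_secret_version" "rotation-test" {
  secret = google_secret_manager_secret.rotation-test.id

  # private_key is a base64-encoded JSON key file; decode it so the secret
  # holds the same JSON you would download from the console.
  secret_data = base64decode(google_service_account_key.rotation-test.private_key)
}
```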

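And after planning and applying, we have a service account with a key, and the credentials loaded into GCP Secret Manager. But this key is static, and we need to rotate it. Simple thing first: we can taint the service account key in Terraform.

terraform taint google_service_account_key.rotation-test

(On Terraform 0.15.2 and later, taint is deprecated in favor of terraform apply -replace=google_service_account_key.rotation-test, which does the same thing in one step.) Plan that and apply it: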

google_service_account_key.rotation-test: Destroying... [id=projects/mixpanel-tools/serviceAccounts/rotation-test@mixpanel-tools.iam.gserviceaccount.com/keys/e7e71f25bfa52b39bef5273387e1f1be5b218b02]
google_secret_manager_secret.rotation-test: Creating...
google_service_account_key.rotation-test: Destruction complete after 0s
google_service_account_key.rotation-test: Creating...
google_secret_manager_secret.rotation-test: Creation complete after 0s [id=projects/mixpanel-tools/secrets/rotation-test-key]
google_service_account_key.rotation-test: Creation complete after 0s [id=projects/mixpanel-tools/serviceAccounts/rotation-test@mixpanel-tools.iam.gserviceaccount.com/keys/3dd8bd10e9a34ca4b0b115caef84311f5151ba31]
google_secret_manager_secret_version.rotation-test: Creating...
google_secret_manager_secret_version.rotation-test: Creation complete after 1s [id=projects/839233470602/secrets/rotation-test-key/versions/1]

and we’ve got a new key. But this is still manual effort, so how can we automate it? Well, Terraform has a bunch of providers, including some that don’t manage any external resources at all, like the Time provider, which has a time_rotating resource. Reading its description, it seems ideal, so let’s try it out (with a very fast rotation, so I don’t spend weeks or months writing this post):

resource "time_rotating" "rotation-test" {
  rotation_minutes = 5
}

resource "google_service_account_key" "rotation-test" {
  service_account_id = google_service_account.rotation-test.id

  keepers = {
    rotation = time_rotating.rotation-test.id
  }
}

Applying this immediately triggers the creation of a new service account key and updates the secret with the new value:

google_secret_manager_secret_version.rotation-test: Destroying... [id=projects/839233470602/secrets/rotation-test-key/versions/1]
google_secret_manager_secret_version.rotation-test: Destruction complete after 0s
google_service_account_key.rotation-test: Destroying... [id=projects/mixpanel-tools/serviceAccounts/rotation-test@mixpanel-tools.iam.gserviceaccount.com/keys/3dd8bd10e9a34ca4b0b115caef84311f5151ba31]
google_service_account_key.rotation-test: Destruction complete after 0s
time_rotating.rotation-test: Creating...
time_rotating.rotation-test: Creation complete after 0s [id=2024-07-24T20:12:33Z]
google_service_account_key.rotation-test: Creating...
google_service_account_key.rotation-test: Creation complete after 1s [id=projects/mixpanel-tools/serviceAccounts/rotation-test@mixpanel-tools.iam.gserviceaccount.com/keys/03f5bc6e8d62bc30b4739c6016374d0608053bc4]
google_secret_manager_secret_version.rotation-test: Creating...
google_secret_manager_secret_version.rotation-test: Creation complete after 0s [id=projects/839233470602/secrets/rotation-test-key/versions/2]

And then, after a little over 5 minutes, we plan and apply again:

google_secret_manager_secret_version.rotation-test: Destroying... [id=projects/839233470602/secrets/rotation-test-key/versions/2]
google_secret_manager_secret_version.rotation-test: Destruction complete after 0s
google_service_account_key.rotation-test: Destroying... [id=projects/mixpanel-tools/serviceAccounts/rotation-test@mixpanel-tools.iam.gserviceaccount.com/keys/03f5bc6e8d62bc30b4739c6016374d0608053bc4]
google_service_account_key.rotation-test: Destruction complete after 0s
time_rotating.rotation-test: Creating...
time_rotating.rotation-test: Creation complete after 0s [id=2024-07-24T20:18:29Z]
google_service_account_key.rotation-test: Creating...
google_service_account_key.rotation-test: Creation complete after 1s [id=projects/mixpanel-tools/serviceAccounts/rotation-test@mixpanel-tools.iam.gserviceaccount.com/keys/5d7dee0097010a88a1792f3fb79423ea6a0f0f77]
google_secret_manager_secret_version.rotation-test: Creating...
google_secret_manager_secret_version.rotation-test: Creation complete after 0s [id=projects/839233470602/secrets/rotation-test-key/versions/3]

Success! Terraform is rotating our service account key automatically for us!
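The 5-minute period is only for the demo. A production setup would use something like rotation_days (time_rotating also accepts minutes, hours, months, and years). Because the resource’s id is the RFC 3339 timestamp it was created at, as the apply logs above show, every new timer changes the keeper and forces a new key; the computed rotation_rfc3339 attribute tells you when that will next happen. A sketch replacing the 5-minute timer, with 30 days picked arbitrarily:

```hcl
resource "time_rotating" "rotation-test" {
  # Arbitrary example period; choose whatever your rotation policy requires.
  rotation_days = 30
}

# RFC 3339 timestamp after which the next plan replaces the time_rotating
# resource, and with it every service account key keyed on its id.
output "next_key_rotation" {
  value = time_rotating.rotation-test.rotation_rfc3339
}
```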

Invalidation

Unfortunately, now we run into a rather thorny issue: invalidation. Terraform immediately destroys the old service account key and creates a new one, which means that anything using that secret loses access to our project until it picks up the updated credentials. If you are using Kubernetes, pods can take up to a minute to see updated secrets. External Secrets Operator (which we use) can add up to an hour on top of that. Furthermore, some services only load the secret on startup. In short, our initial approach causes an outage every time it rotates the credential.

The common fix here is to have two (or more) credentials, rotate them in an offset pattern, and always hand out whichever is the newest. That way, each rotation invalidates the older key, which is (hopefully) no longer in use, and creates a new key, which is then handed out to all the services.

Let’s implement this:

resource "time_rotating" "rotation-test" {
  count            = 2
  rotation_minutes = 5
}

resource "google_service_account_key" "rotation-test" {
  count              = 2
  service_account_id = google_service_account.rotation-test.id

  keepers = {
    rotation = time_rotating.rotation-test[count.index].id
  }
}

locals {
  # time_rotating IDs are RFC 3339 creation timestamps, so the later one
  # belongs to the newer key. timecmp(a, b) returns 1 when a is later than b.
  newest = timecmp(time_rotating.rotation-test[0].id, time_rotating.rotation-test[1].id) > 0 ? 0 : 1
}

resource "google_secret_manager_secret_version" "rotation-test" {
  secret      = google_secret_manager_secret.rotation-test.id
  secret_data = google_service_account_key.rotation-test[local.newest].private_key
}

This technically works, though it’s… not fun to set up: on the first apply, both keys rotate simultaneously, so you don’t get any benefit. You can do a custom import on the time_rotating resources to get them offset, or just wait half the period and run terraform state rm time_rotating.rotation-test[1] to force it to rotate early and thus offset them, but there’s a deeper issue here. The time_rotating resource doesn’t rotate on a fixed schedule (e.g. 1:00:00 PM, then 1:05:00 PM, then 1:10:00 PM); instead, it treats the time it gets recreated as the start of a new rotation period. If the first one was created at 1:00:00 PM, but you don’t plan/apply again until 1:05:30 PM, the next expiration will be 1:10:30 PM, not 1:10:00 PM. This can compound over time, until you potentially wind up with both keys rotating at the same time again, which then becomes an outage, and a particularly nasty one, as you’ve probably got something running those applies automatically (which we do; I’ll circle back to that later).
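The drift is easy to write down. Let $T$ be the rotation period and $r_n$ the time a timer’s $n$-th expiry becomes due. Terraform only replaces the timer at the first apply at or after $r_n$, say at $a_n = r_n + \ell_n$ with a lag $\ell_n \ge 0$, and the replacement counts its period from that apply:

$$
r_{n+1} = a_n + T = r_n + T + \ell_n
\qquad\Longrightarrow\qquad
r_n = r_0 + nT + \sum_{i=0}^{n-1} \ell_i
$$

With the numbers above, $T$ = 5 min, $r_0$ = 1:05:00 PM and $a_0$ = 1:05:30 PM, so $\ell_0$ = 30 s and $r_1$ = 1:10:30 PM. For two timers $A$ and $B$ started half a period apart, the gap between their expiries after $n$ rotations each is

$$
r^B_n - r^A_n = \frac{T}{2} + \sum_{i=0}^{n-1} \left( \ell^B_i - \ell^A_i \right)
$$

Nothing pulls this gap back toward $T/2$; it simply accumulates the difference in lags. If it ever drifts close enough to a whole multiple of $T$ that both expiries fall between the same two applies, both timers are replaced in that apply, pick up essentially the same base timestamp, and rotate in lockstep from then on.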

How do we fix this? I did a bunch of research, and… couldn’t find any answers. Plenty of threads about other people running into the same issues, but no solutions in sight. So I did what any sane engineer would do…

We wrote a Terraform provider: multirotate_set

We released a Terraform provider, mixpanel/multirotate, to plug all the aforementioned gaps once and for all with its [multirotate_set](https://registry.terraform.io/providers/mixpanel/multirotate/latest/docs/resources/set) resource.

Let’s put our new multirotate_set resource into play:

resource "multirotate_set" "rotation-test" {
  rotation_period = "5m"
  number          = 2
}

resource "google_service_account_key" "rotation-test" {
  count              = 2
  service_account_id = google_service_account.rotation-test.id

  keepers = {
    rotation = multirotate_set.rotation-test.rotation_set[count.index].expiration
  }
}

resource "google_secret_manager_secret_version" "rotation-test" {
  secret      = google_secret_manager_secret.rotation-test.id
  secret_data = google_service_account_key.rotation-test[multirotate_set.rotation-test.current_rotation].private_key
}

A bit simpler than the time_rotating shenanigans. And after 5 minutes, we get:

  # multirotate_set.rotation-test will be updated in-place
  resource "multirotate_set" "rotation-test" {
      ! current_rotation = 1 -> 0
      ! last_rotate      = "2024-07-24T20:52:25Z" -> "2024-07-24T20:57:25Z"
        rotation_set     = [
            {
              ! creation   = "2024-07-24T20:47:25Z" -> "2024-07-24T20:52:42Z"
              ! expiration = "2024-07-24T20:52:25Z" -> "2024-07-24T21:02:25Z"
                # (1 unchanged attribute hidden)
            },
            # (1 unchanged element hidden)
        ]
        # (3 unchanged attributes hidden)
    }

along with the service account key being recreated, and the secret value updating. Success! Again!
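To keep an eye on the set between applies (or to feed alerting), a few outputs help. This sketch sticks to attributes visible in the configuration and plan output above: current_rotation, each slot’s expiration in rotation_set, and the service account keys keyed on them.

```hcl
# Which slot is currently handed out to consumers.
output "current_rotation" {
  value = multirotate_set.rotation-test.current_rotation
}

# Expiration of every slot, in slot order.
output "slot_expirations" {
  value = [for slot in multirotate_set.rotation-test.rotation_set : slot.expiration]
}

# GCP ID of the key currently stored in Secret Manager.
output "current_key_id" {
  value = google_service_account_key.rotation-test[multirotate_set.rotation-test.current_rotation].id
}
```

Running terraform output after a scheduled apply shows which slot is live. Note how, in the plan above, the new expiration (21:02:25) is anchored to the old one (20:52:25) rather than to the moment the plan ran (20:52:42); that fixed schedule is exactly what time_rotating lacked.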

Enhancement: Expiration

One thing that’s missing, in my eyes, is key expiration. This is more of an optional thing, but it does look better on audits and acts as a forcing function to make sure you are actually rotating keys. This starts to get into the specifics of GCP + Terraform, but let’s carry on and get it done.

First up: GCP won’t create an expiring service account key for you, so you’re stuck generating the key pair yourself and uploading the public half, but that’s what Terraform is for. Start with a private key:

resource "tls_private_key" "rotation-test" {
  count     = 2
  algorithm = "RSA"
  rsa_bits  = 2048

  lifecycle {
    replace_triggered_by = [multirotate_set.rotation-test.rotation_set[count.index]]
  }
}

The private key is replaced whenever its multirotate_set slot rotates, since we want a brand new key each time. Next up, generate a self-signed certificate from it:

resource "tls_self_signed_cert" "rotation-test" {
  count           = 2
  private_key_pem = tls_private_key.rotation-test[count.index].private_key_pem

  subject {
    common_name = "unused"
  }

  validity_period_hours = 1
  allowed_uses = [
    "key_encipherment",
  ]
}

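Note that the validity period should comfortably exceed how long each key stays in use, which with two slots is roughly two rotation periods plus however late your applies run, so the key stays valid until it is replaced. Next up, pass the certificate off to Google: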

resource "google_service_account_key" "rotation-test" {
  count              = 2
  service_account_id = google_service_account.rotation-test.id
  public_key_data    = base64encode(tls_self_signed_cert.rotation-test[count.index].cert_pem)
}

And the final step, which gets a little more complex since Google isn’t doing it for you: build a credentials file to use and stick it somewhere (GCP Secret Manager in this case).

resource "google_secret_manager_secret_version" "rotation-test" {
  secret = google_secret_manager_secret.rotation-test.id

  # Hand-built equivalent of a GCP service account key file, using whichever
  # slot multirotate_set currently marks as live.
  secret_data = jsonencode({
    "type"                        = "service_account"
    "project_id"                  = google_service_account.rotation-test.project
    "private_key_id"              = tls_private_key.rotation-test[multirotate_set.rotation-test.current_rotation].id
    "private_key"                 = tls_private_key.rotation-test[multirotate_set.rotation-test.current_rotation].private_key_pem
    "client_email"                = google_service_account.rotation-test.email
    "client_id"                   = google_service_account.rotation-test.unique_id
    "auth_uri"                    = "https://accounts.google.com/o/oauth2/auth"
    "token_uri"                   = "https://oauth2.googleapis.com/token"
    "auth_provider_x509_cert_url" = "https://www.googleapis.com/oauth2/v1/certs"
    "client_x509_cert_url"        = "https://www.googleapis.com/robot/v1/metadata/x509/${google_service_account.rotation-test.email}"
  })
}
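A compatibility note on the "private_key" field: for RSA keys, tls_private_key’s private_key_pem is a PKCS#1 block ("BEGIN RSA PRIVATE KEY"), whereas the key files GCP generates carry PKCS#8 ("BEGIN PRIVATE KEY"). If a consumer of the secret insists on the latter, the tls provider (version 4.0 and later, as far as we know) also exports private_key_pem_pkcs8. A sketch, using a hypothetical local name:

```hcl
locals {
  # Hypothetical helper local: the live slot's key as PKCS#8 PEM
  # ("BEGIN PRIVATE KEY"), the format GCP-generated key files use.
  current_private_key_pkcs8 = tls_private_key.rotation-test[multirotate_set.rotation-test.current_rotation].private_key_pem_pkcs8
}
```

You would then set "private_key" = local.current_private_key_pkcs8 in the jsonencode() call above.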

And there you go! Rotating, expiring credentials without outages. As long as you keep applying the Terraform configuration regularly, of course.
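To confirm the expiry actually made it to GCP, you can compare each certificate’s end time with the validity window GCP reports for the uploaded key. valid_before appears in the google_service_account_key plan output near the top of this post, and tls_self_signed_cert exports validity_end_time. We’d expect GCP to take valid_before from the certificate, but treat that as something to check rather than a guarantee:

```hcl
# One entry per slot: the certificate's end time next to the expiry GCP
# reports for the key built from it.
output "key_validity" {
  value = [
    for i, key in google_service_account_key.rotation-test : {
      key_id        = key.id
      cert_expires  = tls_self_signed_cert.rotation-test[i].validity_end_time
      gcp_not_after = key.valid_before
    }
  ]
}
```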

Automation

It is trivial to set up a GitHub Action or some other kind of cron job to run terraform apply on a regular schedule. You could definitely stop there and consider your problem solved. However, I want to call out a SaaS tool we use that makes this automation super trivial: Terrateam. With Terrateam, all we had to do to start rotating our credentials automatically was flip on drift detection and auto reconciliation, and bing bang boom, our credentials get rotated regularly. No more toil.

Before tackling this credential rotation problem, we had already deployed Terrateam to help us scale our Terraform usage to meet the needs of 70+ engineers working in a monorepo. By enabling a GitOps-style Terraform workflow, we’ve completely sidestepped the issues that come with multiple devs stepping on each other’s toes while changing Terraform configurations at the same time. Terrateam has been a great partner in deploying Terraform at scale, and I’m always happy to recommend them whenever I get the opportunity!

If you enjoy eliminating toil like we did in this blog — Mixpanel engineering is hiring!
