Terraform

Using the Megaport staging API with Terraform

I have been working on rearchitecting our backup cloud connectivity and am considering using Megaport’s cloud router (MCR) product. I’ll post again in the future with more details of the design and its implementation, but I wanted to write a short note of appreciation about Megaport’s provisioning interface. They provide a complete self-service portal and REST API. In addition, they provide a separate “staging” portal and API, where “all actions mirror the production system, but services will not be deployed and you will not be billed for any activity.” ...

Using a .terraformignore file

By default, a Terraform Cloud remote run will copy the entire source repository to the TFC runner before it runs the plan. If there are lots of files in the repository that aren’t needed by Terraform, this can take a long time. Using the .terraformignore file can significantly reduce the time for TFC to prepare a remote plan. A common pattern is to have a terraform/ subdirectory in a repository to deploy the infrastructure that supports the application/service/code in the repository itself. For the purposes of TFC, only that subdirectory is needed by the runner. ...

Integrating Okta SSO with NetBox

Overview NetBox is a DCIM and IPAM tool for modeling infrastructure and serving as a source of truth for the desired state of the network. Okta is an IAM company that offers a single sign-on product, which can act as a central point to manage user access. As of NetBox version 3.1.0, native support for SSO authentication was added via inclusion of python-social-auth. This library supports many backends, including Okta via both OAuth2 and OpenId Connect. Until then, the only options for an external authentication provider were LDAP, an external plugin, or moving the authentication to a proxy and passing the results to netbox via HTTP headers. ...

Migrating a Perl CGI to AWS Lambda

Motivation In migrating our NOC website to from a traditional Apache server to a serverless architecture, I’ve needed to update or replace any dynamic components. For example, replacing a Wordpress installation with Hugo to publish static content to a S3 bucket served by CloudFront. In this particular case, it was a CGI script that reads our firewall configurations and presents a web page for visualizing and searching the many object-groups and access-lists. I chose to migrate this to run as a Lambda. ...

Process flow of a GitHub AWS Connector App connecting to CodePipeline and publishing to an SNS topic

Connecting GitHub to SNS using CodePipeline

Background In the last post, I documented an approach to fan-out GitHub repository updates to AWS services using API Gateway, Lambda, and SNS. In my conclusion, I wrote: The whole time I was building and testing this, I kept thinking to myself, “I must be overlooking a more obvious solution.” I’ve asked around, and it seems that others have also run into this issue, but ended up using a different approach that didn’t involve authorization. If you know of a better/different solution, please reach out! ...

Process flow of a webhook through API Gateway using a lambda integration to publish to SNS

Publish to SNS with GitHub webhooks

Motivation and Design I have a bunch of “audit scripts” that run against the network configurations (and other data sources, such as DNS and DHCP) to check for common problems, mistakes, and inconsistencies. They run on a centralized server that periodically fetches the latest data from all these sources, runs the scripts, and emails about any discrepancies. This data sources are kept in git repositories, either updated by operations staff, or automatically. In the case of networking gear, by a tool called RANCID that collects the text configuration and output of many useful “show” commands and pushes any changes a git repository for the role/group of the device. ...

[Split](https://pixabay.com/photos/log-bark-ball-glass-ball-split-4164303/) by [manfredrichter](https://pixabay.com/users/manfredrichter-4055600/) licensed under [CC0](https://creativecommons.org/publicdomain/zero/1.0/legalcode)

Multi-homed EC2

I had an interesting design requirement for a network monitoring host. These monitoring hosts, or collectors, are used to monitor our network from an external perspective – via the Internet. They also needed to be reachable from our internal network for central management, and needed access to shared internal services, such as directory services, time servers, and central logging. Design My initial approach was to deploy the hosts in a public subnet, set the default route over the Internet, and add individual host routes via the transit gateway to the subnet routing table. This was not great from an operational perspective and violated the requirements when one of the statically-routed hosts also needed to be monitored externally. ...

Updating AzureRM templates from Terraform

Summary I have deployed some Azure SQL Managed Instances using Terraform. Since there are no native resources for this service in the Azure provider, I used an Azure Resource Manager deployment template. Recently, I had to add an output to that template (so that another workspace could set up remote logging), and wanted to note my experience with updating deployment templates from Terraform. Here, I’ll detail the original design and then walk through the update process. ...

Terraform state replace provider

I recently had a revisit an old terraform project and update it. I had built a dev environment for our applications team, and they wanted to move it to production. Typically, whenever I go through a process like this, I take the opportunity to update things like pre-commit hooks and bump the terraform version to the most recent stable release. This happened to be a migration from a 0.12.x to a 0.14.x version. After updating the provider definitions, I ran terraform init and received the following error: ...

Terraform validate list object

Since version 0.13, terraform has support for custom validation rules for input variables. The example in the documentation shows how to test a single value: variable "image_id" { type = string description = "The id of the machine image (AMI) to use for the server." validation { # regex(...) fails if it cannot find a match condition = can(regex("^ami-", var.image_id)) error_message = "The image_id value must be a valid AMI id, starting with \"ami-\"." } } But, what to do if you want to validate a more complex object, such as list(string) (or other, more complicated types)? Terraform 0.14 introduced the alltrue function that makes this much easier and readable: ...

Terraform providers lock

As of version 0.14, terraform now produces a .terraform.lock.hcl file to record which versions of dependencies – currently, just providers – were chosen when terraform init was run. They recommend adding this file to your version control system so that all future runs will use and verify those same dependencies. These can be manually upgraded by running terraform init -upgrade. I commonly will develop locally and generate the lock file on my Mac. Later, as I push to production, I will migrate the workspace to Terraform Cloud, and get the following error for each provider in the lock file: ...

Terraform Drift Detection with GitHub Actions

The Problem A common issue with infrastructure as code, is that it is often possible for someone to go in after deployment and manually change things. I still want to preserve the ability for the infrastructure folks to go in and make emergency changes, but I also want to discourage this practice as much as possible. To this end, I’ve been using a pattern where any “out of band” changes are alerted to the rest of the team. That way, everyone can be aware there was a change made, and can go back afterwards and follow the standard procedures for the change. ...