Dolphins

Leaving Bowdoin

After almost 20 years at Bowdoin College, I’ve decided to move on. This has been one of the most difficult decisions of my professional career. I thought it might be nice to mark this milestone – or is “fork in the road” a better metaphor? – with a bit of a retrospective of the projects I have initiated, led, and been involved in. In short, this has been a way for me to reflect on my experiences and process this life transition. ...

October 3, 2022 · 14 min · Jason Lavoie
Megaport and Terraform logos

Using the Megaport staging API with Terraform

I have been working on rearchitecting our backup cloud connectivity and am considering using Megaport’s cloud router (MCR) product. I’ll post again in the future with more details of the design and its implementation, but I wanted to write a short note of appreciation about Megaport’s provisioning interface. They provide a complete self-service portal and REST API. In addition, they provide a separate “staging” portal and API, where “all actions mirror the production system, but services will not be deployed and you will not be billed for any activity.” ...

July 6, 2022 · 3 min · Jason Lavoie
Diagram of data flow between NetBox, Teams, and Intrado EGW

Enhanced 911 with NetBox

Summary Over the past few months, I’ve been part of a project team to migrate an on-premises IP PBX to the Microsoft Teams cloud-based phone system. One component of this project is the Enhanced 911 (E911) service. E911 automatically provides the caller’s location information to the Public Safety Answering Point (PSAP) when an emergency call to 911 is placed. Any multi-line phone system implemented today must provide dispatchable location information. Recent regulations in Kari’s Law and RAY BAUM’S Act detail the compliance requirements. ...

April 27, 2022 · 35 min · Jason Lavoie
Big Ben

Disable time sync in VMware

Background In a recent upgrade of our monitoring infrastructure, I moved network monitoring off of physical hardware and onto virtual machines running on our VMware infrastructure. The migration was completely successful except for one small issue: clock drift. One of the many data points we monitor on servers and network gear is whether their configured time is in sync with the rest of the infrastructure. This is done by querying their current time (usually via NTP), and comparing it to the local monitoring server’s clock (also synced via NTP). If the offset is larger than a threshold, an alert is raised. The status of the NTP servers themselves, how many peers, what stratum, etc. is monitored separately. ...
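
The offset check described above can be sketched in a few lines of Python. This is only an illustration of the comparison logic: the function names and the one-second threshold are assumptions, not values taken from the post or from any particular monitoring tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical alert threshold; the post does not state the value in use.
MAX_OFFSET = timedelta(seconds=1)


def clock_offset(remote_time: datetime, local_time: datetime) -> timedelta:
    """Return the absolute difference between two clock readings."""
    return abs(remote_time - local_time)


def should_alert(remote_time: datetime, local_time: datetime,
                 threshold: timedelta = MAX_OFFSET) -> bool:
    """Raise an alert only when the offset is larger than the threshold."""
    return clock_offset(remote_time, local_time) > threshold


if __name__ == "__main__":
    local = datetime.now(timezone.utc)
    remote = local + timedelta(seconds=5)  # simulated drifting clock
    print(should_alert(remote, local))
```

Because the comparison uses the monitoring server’s own clock as the reference, drift on that server (for example, on a virtual machine) shows up as an offset on every monitored device at once.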

February 25, 2022 · 4 min · Jason Lavoie
Four Site Hub and Spoke Network Diagram

BFD over broadcast networks

Overview What is BFD? Bidirectional Forwarding Detection (BFD) as defined in RFCs 5880 and 5881 is a protocol to detect network faults between the forwarding planes of two network devices. It is designed as a low-overhead protocol that can run over media that may not have built-in failure detection, including Ethernet, tunnels, and MPLS LSPs. Multiple control plane protocols can subscribe to a BFD session to be notified when connectivity is interrupted. This can help with faster convergence after a failure, as the IGP(s) do not have to wait for a connectivity timeout in their protocol. ...
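
To make the faster-convergence point concrete, here is a small sketch of how the detection time is derived in asynchronous mode, following RFC 5880 section 6.8.4. The function name and the example timer values are assumptions chosen for illustration; they are not taken from the post.

```python
def detection_time_us(remote_detect_mult: int,
                      local_required_min_rx_us: int,
                      remote_desired_min_tx_us: int) -> int:
    """Asynchronous-mode detection time in microseconds (RFC 5880, 6.8.4).

    The remote system's agreed transmit interval is the greater of the local
    Required Min RX Interval and the remote Desired Min TX Interval. The
    detection time is that interval multiplied by the remote Detect Mult.
    """
    agreed_tx_interval = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_tx_interval


if __name__ == "__main__":
    # Hypothetical timers: 300 ms intervals with a multiplier of 3.
    print(detection_time_us(3, 300_000, 300_000))
```

With these example values a failure is declared after 900 ms without a received control packet, which is typically well below the default dead or hold timers of an IGP.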

February 8, 2022 · 7 min · Jason Lavoie
Okta and NetBox logos

Integrating Okta SSO with NetBox

Overview NetBox is a DCIM and IPAM tool for modeling infrastructure and serving as a source of truth for the desired state of the network. Okta is an IAM company that offers a single sign-on product, which can act as a central point to manage user access. As of NetBox version 3.1.0, native support for SSO authentication was added via inclusion of python-social-auth. This library supports many backends, including Okta via both OAuth2 and OpenID Connect. Until then, the only options for an external authentication provider were LDAP, an external plugin, or moving the authentication to a proxy and passing the results to NetBox via HTTP headers. ...
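
As a rough sketch of what the native SSO support looks like, the fragment below shows the kind of settings that select the Okta OpenID Connect backend from python-social-auth in NetBox’s configuration.py. The client ID, client secret, and Okta domain are placeholder values invented for illustration, and the exact settings used in the post are not shown in this excerpt.

```python
# configuration.py fragment (NetBox 3.1.0 or later) -- illustrative only.

# Select the Okta OpenID Connect backend provided by python-social-auth.
REMOTE_AUTH_BACKEND = "social_core.backends.okta_openidconnect.OktaOpenIdConnect"

# Placeholder credentials: replace with the values from the Okta application.
SOCIAL_AUTH_OKTA_OPENIDCONNECT_KEY = "example-client-id"
SOCIAL_AUTH_OKTA_OPENIDCONNECT_SECRET = "example-client-secret"

# Placeholder Okta domain (hypothetical).
SOCIAL_AUTH_OKTA_OPENIDCONNECT_API_URL = "https://example.okta.com/oauth2/"
```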

February 3, 2022 · 4 min · Jason Lavoie
Banana Pieces

Troubleshooting TFTP

Another engineer reported that “TFTP is not working” when he was trying to stage firmware upgrades on our Cisco access network. I offered to help, and ended up spending a good portion of a day troubleshooting it. Replicate the Issue Fortunately, we have lab gear that I could test this on without affecting any production service. I logged into a 3850 stack in the lab and successfully transferred a test file from a TFTP server on a bastion host. ...

December 9, 2021 · 19 min · Jason Lavoie
[Mixed](https://pixabay.com/photos/legs-feet-different-mixed-standing-362182/) by [RyanMcGuire](https://pixabay.com/users/ryanmcguire-123690/) licensed under [CC0](https://creativecommons.org/publicdomain/zero/1.0/legalcode)

Network in OSPF database but not in routing table

I needed to troubleshoot a pesky OSPF issue on a new network. It turned out to be a simple fix, but it had tripped up a couple of other network engineers, so I thought I’d lab it up and document the scenario. The problem The reported issue was that a network that was part of the OSPF process was not showing up in the routing table. Adjacencies between all routers were up and the network in question was shown in the OSPF database. ...

November 16, 2021 · 5 min · Jason Lavoie
Fire Hydrant Flushing

Filtering a packet capture by DNS Query Name

Overview An application problem was brought to me to troubleshoot. From the symptoms I observed, I was confident that the problem was an intermittent issue with the SaaS provider’s DNS. To prove this assertion, I needed to collect a packet capture of a failed query. This post details the process I went through to collect that data. Investigation When the problem was reported, we saw our recursive nameservers returning NXDOMAIN in response to queries for the domain, when manual queries (with dig) directly to the provider’s nameservers returned valid data. As soon as the entry expired from the recursive nameserver’s cache, it was queried anew, and the reported issue was temporarily resolved. Based on this, my theory was that one of the SaaS provider’s – or their DNS provider’s – nameservers was occasionally responding with a negative answer to the query. I wanted to capture this response packet to help isolate and fix the problem. ...
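
One way to match a query name in raw packet bytes is to compare against its wire-format encoding, where each label is prefixed by its length (RFC 1035, section 3.1). The sketch below is an assumption about how such a match could be done in Python; it is not the method used in the post, and the function names are hypothetical.

```python
def encode_qname(name: str) -> bytes:
    """Encode a domain name as DNS wire-format labels (RFC 1035, 3.1)."""
    encoded = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label length in {name!r}")
        encoded.append(len(raw))  # one length byte, then the label itself
        encoded += raw
    encoded.append(0)  # the zero-length root label terminates the name
    return bytes(encoded)


def query_matches(dns_payload: bytes, name: str) -> bool:
    """Return True if the first question in a DNS message is for `name`.

    The question section starts right after the fixed 12-byte header, and
    the first name in a message cannot be compressed. DNS names compare
    case-insensitively; length bytes (63 or less) are unaffected by lower().
    """
    qname = encode_qname(name)
    candidate = dns_payload[12:12 + len(qname)]
    return candidate.lower() == qname.lower()


if __name__ == "__main__":
    print(encode_qname("www.example.com"))
```

The same length-prefixed byte string is what a capture filter would have to match, which is why filtering on a query name is less direct than filtering on an address or port.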

October 28, 2021 · 6 min · Jason Lavoie
Device table showing support expiry information

Tracking vendor support status in NetBox

Timo Reimann wrote a handy NetBox plugin to collect and display support expiry information (End-of-Sale, End-of-Support, etc.) as well as the current Contract and Warranty coverage dates for all Cisco devices defined in a NetBox installation. His README does a good job showing the process for setting up the plugin, so I won’t repeat all the details here. The general process is:

1. register an app with Cisco and obtain the API ID and secret
2. install the plugin (pip install netbox-cisco-support)
3. enable the plugin (add to PLUGINS in configuration.py)
4. configure the plugin (add to PLUGINS_CONFIG in configuration.py)
5. apply the Django migrations (manage.py migrate)
6. collect the EoX data (manage.py sync_eox_data)

If all goes well, there will now be two additional tables on the UI device page for any device whose manufacturer matches the manufacturer value in PLUGINS_CONFIG (default Cisco). ...
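
The enable and configure steps might look roughly like the following fragment of configuration.py. Only PLUGINS, PLUGINS_CONFIG, and the manufacturer value come from the text above; the module name and the two credential key names are assumptions and should be checked against the plugin’s README, and the credential values are placeholders.

```python
# configuration.py fragment -- illustrative only.

# Enable the plugin (module name assumed from the pip package name).
PLUGINS = ["netbox_cisco_support"]

PLUGINS_CONFIG = {
    "netbox_cisco_support": {
        # Assumed key names for the API ID and secret issued by Cisco.
        "cisco_client_id": "example-api-id",
        "cisco_client_secret": "example-api-secret",
        # Devices whose manufacturer matches this value get the extra tables.
        "manufacturer": "Cisco",
    },
}
```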

October 20, 2021 · 3 min · Jason Lavoie
NetBox device view with additional NAPALM tabs

NetBox NAPALM automation with bastion host

NetBox has an available integration with the NAPALM automation library. For supported devices, the NetBox device view will show additional tabs for status, LLDP neighbors, and device configuration. It will also proxy any (read-only) NAPALM getters (get_environment, get_lldp_neighbors, etc.) via the REST API. The basic configuration outlined in the documentation assumes that the NetBox server has direct SSH access to these devices. That is not the case if you use a bastion host or jump host. Here is how to configure this feature to work in such an environment. ...

October 7, 2021 · 3 min · Jason Lavoie
visualization of the netbox database

Netbox database schema diagram using schemaspy

While trying to wrap my head around some of the NetBox database relationships, I found myself wishing for a database schema diagram. I looked through the documentation and code repo, but didn’t find anything. A colleague recommended trying schemaspy, so I tried it. Setup I set up a fresh install of NetBox on a Debian 10 VM, and downloaded schemaspy and its dependencies. Alternatively, they publish a Docker image. Install Java sudo apt install default-jdk JDBC Driver PostgreSQL has a download page for the JDBC driver. ...

September 14, 2021 · 3 min · Jason Lavoie
Lambda and Perl Camel

Migrating a Perl CGI to AWS Lambda

Motivation In migrating our NOC website from a traditional Apache server to a serverless architecture, I’ve needed to update or replace any dynamic components. For example, I replaced a WordPress installation with Hugo, which publishes static content to an S3 bucket served by CloudFront. In this particular case, it was a CGI script that reads our firewall configurations and presents a web page for visualizing and searching the many object-groups and access-lists. I chose to migrate this to run as a Lambda. ...

August 30, 2021 · 10 min · Jason Lavoie
Process flow of a GitHub AWS Connector App connecting to CodePipeline and publishing to an SNS topic

Connecting GitHub to SNS using CodePipeline

Background In the last post, I documented an approach to fan-out GitHub repository updates to AWS services using API Gateway, Lambda, and SNS. In my conclusion, I wrote: The whole time I was building and testing this, I kept thinking to myself, “I must be overlooking a more obvious solution.” I’ve asked around, and it seems that others have also run into this issue, but ended up using a different approach that didn’t involve authorization. If you know of a better/different solution, please reach out! ...

August 20, 2021 · 7 min · Jason Lavoie
Process flow of a webhook through API Gateway using a lambda integration to publish to SNS

Publish to SNS with GitHub webhooks

Motivation and Design I have a bunch of “audit scripts” that run against the network configurations (and other data sources, such as DNS and DHCP) to check for common problems, mistakes, and inconsistencies. They run on a centralized server that periodically fetches the latest data from all these sources, runs the scripts, and emails about any discrepancies. These data sources are kept in git repositories, which are updated either by operations staff or automatically. In the case of networking gear, the updates come from a tool called RANCID that collects the text configuration and output of many useful “show” commands and pushes any changes to a git repository for the role/group of the device. ...

August 16, 2021 · 10 min · Jason Lavoie
[Split](https://pixabay.com/photos/log-bark-ball-glass-ball-split-4164303/) by [manfredrichter](https://pixabay.com/users/manfredrichter-4055600/) licensed under [CC0](https://creativecommons.org/publicdomain/zero/1.0/legalcode)

Multi-homed EC2

I had an interesting design requirement for a network monitoring host. These monitoring hosts, or collectors, are used to monitor our network from an external perspective – via the Internet. They also needed to be reachable from our internal network for central management, and needed access to shared internal services, such as directory services, time servers, and central logging. Design My initial approach was to deploy the hosts in a public subnet, set the default route over the Internet, and add individual host routes via the transit gateway to the subnet routing table. This was not great from an operational perspective and violated the requirements when one of the statically-routed hosts also needed to be monitored externally. ...

June 22, 2021 · 10 min · Jason Lavoie
Diagram of SQL MI creation flow

Updating AzureRM templates from Terraform

Summary I have deployed some Azure SQL Managed Instances using Terraform. Since there are no native resources for this service in the Azure provider, I used an Azure Resource Manager deployment template. Recently, I had to add an output to that template (so that another workspace could set up remote logging), and wanted to note my experience with updating deployment templates from Terraform. Here, I’ll detail the original design and then walk through the update process. ...

May 19, 2021 · 10 min · Jason Lavoie
Sonus SBC 2000

Ribbon SBC interface redundancy

Single-homed SBC In planning to migrate phone traffic from PRI to SIP, we decided to use an existing pair of session border controllers (SBCs) that were already in production for another (smaller) deployment. Before cutting over the whole organization’s voice traffic, I revisited the (3-year-old) network design. While the two SBCs are in separate datacenters in separate buildings, each SBC is only single-homed. This means that there is SBC high-availability in terms of new calls, but existing calls will be dropped if there is a failure or maintenance on the switch. ...

April 29, 2021 · 8 min · Jason Lavoie
Crossroads

Direct Connect with VPN backup

The problem A common AWS connectivity design is to have a direct connect (DX) connection with a VPN backup. There are some routing concerns to consider when implementing this design to make sure that traffic prefers the DX circuit and only uses the backup VPN path if the DX is unavailable. Traffic from AWS transit gateway (TGW) will always prefer the direct connect gateway (DXGW) path, but traffic in the other direction (to AWS) is dependent on the customer gateway (CGW) routing policy. ...

April 27, 2021 · 3 min · Jason Lavoie
Multi-region dual-stack TGW/DXGW design

Where AWS IPv6 networking fails

Introduction AWS has made much progress over the years with IPv6 support. From S3, EC2, CloudFront, and Route 53 support back in 2016, to more recent updates to NLB and the EC2 API, I’ve appreciated every advancement and patiently waited for the next. Unfortunately, there are still pieces missing that prevent me from making full use of IPv6 in my employer’s current environment. Existing architecture The architecture is modeled after one of AWS’s recommended connectivity designs. VPCs attach to a per-region transit gateway (TGW) for access to each other, shared services, on-prem network, our Azure VNets, and Internet access. In practice, a set of TGW route tables (common, campus, etc.) allow association and propagation with and to these various routes. ...

April 15, 2021 · 4 min · Jason Lavoie