April 28, 2017
One of the reasons I got into networking is that it isn’t a very impactful part of IT. I mean, screwing up a WAN router configuration has minimal impact on a business, and nothing is more straightforward than a QoS configuration. I figured I could fly under the corporate radar and build my career uneventfully and in peace.
Unfortunately, that’s as far from reality as you can get. Networking is extremely impactful to the overall infrastructure and to end-users, and being a network engineer making changes on WAN routers and data center switches is by no means flying under the radar.
Even in small networks, a simple misconfiguration or not thinking through dependencies can ruin everyone’s day. A seemingly small thing like having one entry in an ACL out of place could potentially take down an entire data center. Advertising a prefix that changes the best path for a critical application could dramatically affect performance and create a firestorm of tickets for the help desk.
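To make the out-of-place ACL entry concrete, here is a minimal Python sketch of the kind of check I have in mind. It assumes a deliberately simplified, hypothetical ACL model (an ordered list of action and IPv4 source prefix pairs, evaluated top-down with first match wins), not any vendor's actual syntax, and the function name is my own invention.

```python
from ipaddress import ip_network

def find_shadowed_entries(acl):
    """Return (earlier_index, later_index) pairs where the later entry can never
    take effect, because an earlier entry with a different action already
    matches every address in its prefix.

    `acl` is a hypothetical, simplified model: a list of (action, ipv4_prefix)
    tuples evaluated top-down, first match wins.
    """
    shadowed = []
    for later_idx, (later_action, later_prefix) in enumerate(acl):
        later_net = ip_network(later_prefix)
        for earlier_idx in range(later_idx):
            earlier_action, earlier_prefix = acl[earlier_idx]
            earlier_net = ip_network(earlier_prefix)
            # The later entry is dead if a broader, conflicting entry precedes it.
            if later_net.subnet_of(earlier_net) and earlier_action != later_action:
                shadowed.append((earlier_idx, later_idx))
                break
    return shadowed

# Example: the specific deny was entered below the broad permit by mistake.
acl = [
    ("permit", "10.0.0.0/8"),
    ("deny", "10.20.30.0/24"),  # out of place: traffic already permitted above
    ("deny", "0.0.0.0/0"),
]
shadowed = find_shadowed_entries(acl)
```

In this example the check flags the second entry as shadowed by the first. A real ACL also matches on destination, protocol, and port, so treat this as an illustration of the idea rather than a complete analysis.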
We need some way to validate the configurations we push to our network devices. I don’t mean a second set of eyes from the engineer in the next cubicle. I mean something programmatic that reduces human error and doesn’t rely on one person thinking of every possible angle for each change request.
The industry is certainly moving in this direction. For example, NETCONF, which supports a rollback-on-error capability in the event of misconfiguration, is being used more frequently to configure many network devices. This is only scratching the surface, though. Considering so many network problems are a result of human error, we need mechanisms in place to mitigate this risk. This means analyzing dependencies, looking for security holes, checking configuration commands, and insisting on configuration best practices.
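As a rough illustration of the NETCONF piece, the sketch below builds an edit-config request that asks the device to roll back if any part of the change fails. The base namespace, the error-option element, and the rollback-on-error capability URN come from the NETCONF specification (RFC 6241). The VLAN payload and its urn:example:vlan namespace are hypothetical placeholders, since the real payload depends on the device's data model, and the helper names are my own.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
ROLLBACK_CAP = "urn:ietf:params:netconf:capability:rollback-on-error:1.0"

def supports_rollback(capabilities):
    """True if the device advertised the rollback-on-error capability in its hello."""
    return any(cap.startswith(ROLLBACK_CAP) for cap in capabilities)

def build_edit_config(config_payload, message_id="101"):
    """Build an <edit-config> RPC against the running datastore that requests
    rollback-on-error. Returns the request as an XML string."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}running")
    # error-option must come before <config> in the request.
    error_option = ET.SubElement(edit, f"{{{NETCONF_NS}}}error-option")
    error_option.text = "rollback-on-error"
    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    config.append(config_payload)
    return ET.tostring(rpc, encoding="unicode")

# Hypothetical payload: a real device would expect its own YANG model here.
vlan = ET.Element("{urn:example:vlan}vlan")
ET.SubElement(vlan, "{urn:example:vlan}id").text = "120"
rpc_xml = build_edit_config(vlan)
```

Only send the error option to a device for which supports_rollback returns True. In practice you would hand this to a NETCONF client library rather than build the XML by hand, but seeing the raw request makes it clear that rollback is something you ask for explicitly, not something you get for free.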
A simple task like creating an SVI and its associated VLAN on a core switch may seem like a completely non-disruptive action, but in more complex networks, a lot of questions must be asked before entering those commands on the big iron running in the midst of a data center. For example:
These are just a few of the questions I would ask myself when planning a network change. Especially in a complex environment, there are many more technical considerations that require careful planning and coordination, even for simple changes. In reality, the engineer is a single point of failure just like an appliance might be. All risk is placed on the shoulders of one person’s ability to think of all the angles. I understand that this can be somewhat mitigated by peer review and a change advisory board meeting, but even then there must be a better way to mitigate change risk.
NetBrain’s approach is to proactively guard against an engineer’s misconfiguration by using the concept of Executable Runbooks. Using an Executable Runbook, a design team can run an automated analysis in order to validate that the configuration meets a pre-determined set of criteria.
An Executable Runbook differs from a typical static playbook in that each step in the Runbook is an automated script. Runbooks are expandable by programmers and non-programmers alike, leveraging NetBrain’s visual programming environment. The key is that these analyses can even be triggered automatically by a change to a device recorded in an Event Management System. This is an example of automation which mitigates risk by largely removing the human component.
In this way, a new configuration can be automatically run through an exhaustive, pre-configured process of validation as a sanity check for a change after it’s been made. Think about that for a second. Assuming your Runbooks are well-designed, a typo, duplex mismatch, ACL overlap, incorrect next-hop in a route-map, and other easy-to-miss errors could all be easily and programmatically identified on a network before the change window closes and end-users are affected. This should be music to a CAB manager’s ears.
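To give a feel for what such checks look like in code, here is a small Python sketch of two of them: a duplex mismatch check and a next-hop sanity check. This is not NetBrain's API or Runbook format. The data structures, function names, and the rule that a next hop should sit on a directly connected subnet are all assumptions I made for illustration.

```python
from ipaddress import ip_address, ip_network

def find_duplex_mismatches(links):
    """Return (device_a, port_a, device_b, port_b) for every link whose two ends
    report different duplex settings.

    `links` is a hypothetical structure: a list of (end_a, end_b) pairs, where
    each end is a dict with 'device', 'port', and 'duplex' keys.
    """
    return [
        (a["device"], a["port"], b["device"], b["port"])
        for a, b in links
        if a["duplex"] != b["duplex"]
    ]

def find_suspect_next_hops(next_hops, connected_subnets):
    """Return next-hop addresses that are not on any directly connected subnet.
    This is a simple heuristic; recursive next hops can be legitimate."""
    networks = [ip_network(subnet) for subnet in connected_subnets]
    return [
        hop for hop in next_hops
        if not any(ip_address(hop) in net for net in networks)
    ]

# Example data, as it might be collected from devices after a change.
links = [
    ({"device": "core1", "port": "Gi1/0/1", "duplex": "full"},
     {"device": "dist1", "port": "Gi0/1", "duplex": "full"}),
    ({"device": "core1", "port": "Gi1/0/2", "duplex": "full"},
     {"device": "dist2", "port": "Gi0/1", "duplex": "half"}),
]
mismatches = find_duplex_mismatches(links)
suspects = find_suspect_next_hops(
    ["10.0.12.2", "10.0.21.2"],          # next hops referenced by route-maps
    ["10.0.12.0/30", "10.0.13.0/30"],    # subnets directly connected to the router
)
```

Here the second link is reported as a duplex mismatch, and 10.0.21.2 is flagged as a likely typo because it is not on any connected subnet. Each check is trivial on its own. The value comes from running all of them, every time, without relying on someone remembering to.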
To put this into perspective, have you ever worked on a network issue and asked the question, “have any changes recently been made to this switch/router/firewall?” I know I have, and I heard someone say it in my office today as I wrote this article. Not only could that misconfiguration or miscalculation have been avoided, but an Executable Runbook would also have logged the event in an event management system. There is software that can log changes, but having it as part of a single, integrated, and automated workflow is much more powerful than stitching together a tapestry of divergent tools.
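The "has anything changed?" question can itself be answered programmatically. As a minimal sketch, assuming you already keep before and after snapshots of a device's configuration as plain text, Python's standard difflib module can list exactly which lines were added or removed. The function name and the sample configuration are hypothetical.

```python
import difflib

def config_changes(before, after):
    """Return the added ('+') and removed ('-') lines between two config snapshots."""
    diff = list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two lines are the '--- before' / '+++ after' file headers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]

before = """interface Vlan10
 ip address 10.0.10.1 255.255.255.0
 no shutdown"""

after = """interface Vlan10
 ip address 10.0.10.1 255.255.255.0
 ip access-group BLOCK-ALL in
 no shutdown"""

changes = config_changes(before, after)
```

For this pair of snapshots the result is a single added line, the new access-group statement. Feeding that output into a ticket or an event log gives the next engineer an immediate answer instead of a guess.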
Being a network engineer isn’t flying under the radar. Almost everything we do impacts our environments and is visible to the end-user. NetBrain’s Executable Runbooks mitigate the risk associated with configuration changes and address the risk associated with our most neglected single point of failure, the engineer.