April 23, 2019
You never know how complicated your job is until you have to explain it to somebody. About two years ago, I was training my replacement and I realized how many plates I needed to keep spinning, so to speak, at any given moment of my day. Communicating this was troublesome because I often wouldn’t remember a process I was responsible for until it broke. Over the course of about two weeks, I ended up creating over a dozen documents describing network and business IT processes in a way that people other than me could understand. Funny enough, I found out those documents are still in use, helping an even newer IT guy do his job.
It makes you think: how many tasks do you do on a daily, weekly, or even occasional basis that consume time you could otherwise spend constructively? Many of them could be automated, for example by writing a Python script that generates new phone extensions, or one that performs basic checks on the routing policies of every firewall inside your demarc. The big thing that deters people is the learning curve associated with automating their tasks: it can feel like learning the shortcut takes just as long as performing the task outright.
Time is money, especially in the world of networking, and the bar to entry for automating the grunt work is rather high. Fortunately, NetBrain offers a solution even a non-programmer can use: Runbooks.
Runbook automation tools are an essential part of NetBrain that often gets overlooked. In fact, NetBrain’s customer success program spends a lot of time introducing people to runbooks and customizing them for network processes that were previously done manually.
Runbooks are script-less, meaning that the only thing the user needs to do is define the task they want to accomplish, and NetBrain will carry it out using pre-defined or customized actions.
Now, let’s look at some of the tools that make runbooks great.
When you add a step to a runbook, you choose from ten types of actions that the runbook can execute:
- Run Qapp: Execute a predefined or customized Qapp on your NetBrain platform.
- Overall Health Monitor: Execute a static Qapp known as the Overall Health Monitor in order to diagnose possible basic problems that may be causing device slowness or outages.
- Run Gapp: Execute a sequence of Qapps (known as a Gapp) for a more complex and abstract process that a Qapp may not be able to handle alone.
- Execute CLI commands: Have NetBrain remotely log into the device in order to gather specific config information for the user.
- Retrieve Live Data: Have NetBrain pull current network routing and configuration data and display it on the dynamic map.
- Ping: Have NetBrain execute a ping to a certain address, from a source address or device that the user defines.
- Traceroute: Have NetBrain execute a traceroute between two points and display the results to the user.
- A/B Path: Have NetBrain calculate the critical path between two points on the network.
- Free Text: Leave notes, instructions, or requests directly on the runbook for later reference.
- Compare: Check historical data against live data or other historical benchmarks in order to spot the changes made to device configurations and topology over time.
For the purposes of this blog, imagine that we’re troubleshooting a network outage, and I want to not only automate this process but also create a ‘how-to’ for anyone who needs to troubleshoot the network after me.
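To make the idea of "a runbook as an ordered list of actions" concrete, here is a minimal Python sketch. It is only a mental model, not NetBrain’s API: runbooks are built in the NetBrain interface without scripting, and the `Action`, `Step`, and `Runbook` names below are hypothetical. The steps mirror the troubleshooting walkthrough in the rest of this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """The ten action types listed above (identifiers are illustrative)."""
    RUN_QAPP = "Run Qapp"
    OVERALL_HEALTH_MONITOR = "Overall Health Monitor"
    RUN_GAPP = "Run Gapp"
    EXECUTE_CLI = "Execute CLI commands"
    RETRIEVE_LIVE_DATA = "Retrieve Live Data"
    PING = "Ping"
    TRACEROUTE = "Traceroute"
    AB_PATH = "A/B Path"
    FREE_TEXT = "Free Text"
    COMPARE = "Compare"


@dataclass
class Step:
    action: Action
    note: str = ""  # why the step is there; doubles as the 'how-to'


@dataclass
class Runbook:
    name: str
    steps: list[Step] = field(default_factory=list)

    def add(self, action: Action, note: str = "") -> Runbook:
        """Append a step and return self so calls can be chained."""
        self.steps.append(Step(action, note))
        return self

    def outline(self) -> list[str]:
        """Return the steps as numbered, human-readable lines."""
        lines = []
        for number, step in enumerate(self.steps, start=1):
            suffix = f": {step.note}" if step.note else ""
            lines.append(f"{number}. {step.action.value}{suffix}")
        return lines


outage_runbook = (
    Runbook("Troubleshoot a downed device")
    .add(Action.OVERALL_HEALTH_MONITOR, "rule out common device problems")
    .add(Action.RUN_QAPP, "Highlight Routing Protocol")
    .add(Action.RUN_QAPP, "Spanning Tree")
    .add(Action.RUN_GAPP, "interface errors and duplex mismatches")
    .add(Action.EXECUTE_CLI, "collect interface config from affected devices")
)
print("\n".join(outage_runbook.outline()))
```

The point of the sketch is that the outline is both the automation and the documentation: whoever picks up the runbook next can read the steps in order and see the reasoning behind them.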
The Overall Health Monitor
First, for this troubleshooting exercise, I’m looking at a basic automation tool: the Overall Health Monitor.
The Overall Health Monitor (OHM) is a pre-loaded application within NetBrain that performs tier-0 analysis of a network device to rule out common causes of device failure, eliminating the time spent diagnosing the problem manually. Specifically, the OHM checks thresholds for:
- High CPU utilization
- High Memory utilization
- Downed Interfaces
- High Input/Output traffic
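The kind of check the list above describes can be sketched in a few lines of Python. This is an illustration of threshold-based triage, not how the OHM is implemented: the threshold values, the `DeviceStats` fields, and the `health_findings` function are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds; NetBrain's actual values are not given in this post.
CPU_LIMIT_PCT = 80.0
MEMORY_LIMIT_PCT = 80.0
TRAFFIC_LIMIT_PCT = 90.0


@dataclass
class DeviceStats:
    name: str
    cpu_pct: float
    memory_pct: float
    down_interfaces: list[str]
    link_utilization_pct: float  # busiest interface, input or output


def health_findings(stats: DeviceStats) -> list[str]:
    """Return one line per threshold the device exceeds (empty if healthy)."""
    findings = []
    if stats.cpu_pct > CPU_LIMIT_PCT:
        findings.append(f"high CPU utilization ({stats.cpu_pct:.0f}%)")
    if stats.memory_pct > MEMORY_LIMIT_PCT:
        findings.append(f"high memory utilization ({stats.memory_pct:.0f}%)")
    if stats.down_interfaces:
        findings.append("downed interfaces: " + ", ".join(stats.down_interfaces))
    if stats.link_utilization_pct > TRAFFIC_LIMIT_PCT:
        findings.append(f"high input/output traffic ({stats.link_utilization_pct:.0f}%)")
    return findings


core_switch = DeviceStats("core-sw1", 95.0, 40.0, ["Gi0/2"], 10.0)
for finding in health_findings(core_switch):
    print(f"{core_switch.name}: {finding}")
```

An empty result is as useful as a full one here: it tells you the device itself is probably healthy and the problem lies elsewhere.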
I’ve created a new runbook for troubleshooting a downed network device, and the health monitor is my first step. Once the health monitor has run its course, I’ve ruled out the common causes and saved the time I would otherwise have spent troubleshooting something that probably wasn’t the problem.
Qapps are customizable applications within NetBrain that can be used to automate tasks. Most runbooks integrate Qapps into their procedural steps at some point. This is particularly useful to a user who doesn’t entirely understand how to perform a specific task, as the Qapp will illustrate the steps to the user.
Highlight Routing Protocol – Now that I’ve determined the devices aren’t experiencing health issues, I need to find out why traffic isn’t being routed. I fire off the Highlight Routing Protocol Qapp, and the system gives me detailed routing protocol information for the designated zone.
Spanning Tree – OK, so it seems like I’m dealing with an OSPF environment. Next, I need to check whether STP has been configured in the L2 environment. Maybe something was triggered on the switch interfaces that caused my outage?
Gapps are simply sequences of Qapps used to automate more complex and abstract tasks within the network. Here, I’m using NetBrain to investigate whether I have interface errors or duplex mismatches on my designated devices. This is a slightly more complicated task, and instead of searching for and chaining together a series of Qapps within my runbook, I already have the process defined in a Gapp.
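The chaining idea can be sketched as a list of small checks run in order against the same data. This is a rough Python analogy rather than NetBrain code: each function stands in for one of the smaller applications, the list stands in for the chain, and the function names, field names, and sample data are all invented for illustration.

```python
from typing import Callable, Dict, List

# Interface name -> collected values. The field names are assumptions.
Interfaces = Dict[str, dict]
Check = Callable[[Interfaces], List[str]]


def find_interface_errors(interfaces: Interfaces) -> List[str]:
    """Flag interfaces that have logged input errors."""
    return [
        f"{name}: {data['input_errors']} input errors"
        for name, data in interfaces.items()
        if data["input_errors"] > 0
    ]


def find_duplex_mismatches(interfaces: Interfaces) -> List[str]:
    """Flag links whose two ends disagree on duplex."""
    return [
        f"{name}: {data['duplex']} here, {data['neighbor_duplex']} on neighbor"
        for name, data in interfaces.items()
        if data["duplex"] != data["neighbor_duplex"]
    ]


def run_chain(checks: List[Check], interfaces: Interfaces) -> Dict[str, List[str]]:
    """Run each check in order and collect its findings under the check's name."""
    return {check.__name__: check(interfaces) for check in checks}


# The chain is defined once and reused, instead of being rebuilt per runbook.
INTERFACE_TROUBLE_CHAIN: List[Check] = [find_interface_errors, find_duplex_mismatches]

sample_interfaces = {
    "Gi0/1": {"input_errors": 0, "duplex": "full", "neighbor_duplex": "full"},
    "Gi0/2": {"input_errors": 120, "duplex": "half", "neighbor_duplex": "full"},
}
report = run_chain(INTERFACE_TROUBLE_CHAIN, sample_interfaces)
for check_name, findings in report.items():
    print(check_name, findings)
```

Defining the chain once is the design choice that matters: the runbook references a single named process, and anyone adding a new check only has to extend the list.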
Finally, once I’ve been able to identify the issue within my network, I need to make changes to all the affected devices. And, you guessed it, that can be automated too. NetBrain has a wide variety of CLI parsers (see my blog post CLI Parsers: The Basic Unit of Automation for details). Normally, you’d go into the CLI of each device manually, but in NetBrain you can run the commands from a single runbook step.
Here, you can specify which devices you want to interact with and execute commands in ‘batches’ (in other words, on all of them at once) using specific CLI parsers. I can tell NetBrain to pull specific values from each of the devices and display them to me. In this example, I’m telling NetBrain to collect config information on each device’s interfaces in order to find which areas are experiencing trouble.
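For readers curious what "pulling specific values" from CLI output involves, here is a minimal Python sketch of the parsing side only. It is not NetBrain’s parser format, and it does not log into any device: the sample text is invented, loosely modeled on Cisco-style `show interfaces` output, and the function names and regular expressions are assumptions for this example.

```python
import re

HEADER = re.compile(r"^(\S+) is ([\w ]+), line protocol is (\w+)")
DUPLEX = re.compile(r"(full|half|auto)-duplex", re.IGNORECASE)
INPUT_ERRORS = re.compile(r"(\d+) input errors")


def parse_interfaces(output):
    """Turn one device's CLI text into {interface: {field: value}}."""
    interfaces = {}
    current = None
    for line in output.splitlines():
        header = HEADER.match(line)
        if header:
            # A header line starts a new interface record.
            current = {"status": header.group(2), "protocol": header.group(3),
                       "duplex": None, "input_errors": 0}
            interfaces[header.group(1)] = current
            continue
        if current is None:
            continue  # skip any text before the first interface
        duplex = DUPLEX.search(line)
        if duplex:
            current["duplex"] = duplex.group(1).lower()
        errors = INPUT_ERRORS.search(line)
        if errors:
            current["input_errors"] = int(errors.group(1))
    return interfaces


def collect(outputs_by_device):
    """Parse every device's output in one batch."""
    return {device: parse_interfaces(text) for device, text in outputs_by_device.items()}


def trouble_spots(parsed):
    """List (device, interface) pairs with a down protocol or input errors."""
    return [
        (device, name)
        for device, interfaces in parsed.items()
        for name, data in interfaces.items()
        if data["protocol"] != "up" or data["input_errors"] > 0
    ]


# Invented sample output for two devices.
SAMPLE_OUTPUTS = {
    "switch-a": (
        "GigabitEthernet0/1 is up, line protocol is up\n"
        "  Full-duplex, 1000Mb/s\n"
        "  0 input errors, 0 CRC\n"
    ),
    "switch-b": (
        "GigabitEthernet0/1 is up, line protocol is up\n"
        "  Full-duplex, 1000Mb/s\n"
        "  0 input errors, 0 CRC\n"
        "GigabitEthernet0/2 is up, line protocol is down\n"
        "  Half-duplex, 100Mb/s\n"
        "  120 input errors, 98 CRC\n"
    ),
}
parsed = collect(SAMPLE_OUTPUTS)
print(trouble_spots(parsed))
```

Real device output varies by vendor and software version, so patterns like these would need adjusting per platform; that maintenance burden is a large part of what ready-made parsers save you.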
All of these troubleshooting steps would have taken me a while to work through one at a time, but thanks to my runbook I can go through the whole troubleshooting process in a matter of moments. Each of these tools (the Overall Health Monitor, my Qapps, my Gapps, and batch CLI) represents a task a network engineer would normally perform in the event of network trouble.
Not only does the runbook make my job easier, but having the process defined lets other people understand the thinking that went into it, so the knowledge can be shared freely and easily with anyone who needs it.