Why Problem Diagnosis Automation is so Hard
September 10, 2019
When you look at what an Operations team faces on a daily basis, it can be daunting. In a single enterprise network, teams will likely make dozens of changes in a single day, stemming from possibly hundreds of trouble tickets. As you trace those tickets back to the tools generating them, you’ll see that thousands of alarms are generated by tools like SolarWinds or Splunk. That’s because those tools are detecting and correlating millions of events happening across the network infrastructure.
When we talk to our customers, we hear that these numbers are staggering and overwhelming. Frankly, we’re seeing that this is hitting a breaking point and that teams are struggling to keep up with the volume of tickets.
And when you take a look at the workflows teams use to respond to these events, you’ll see that they just don’t scale. Within a typical NOC there’s no shortage of tools listening for alarms, but the process to respond to a ticket or an event is highly manual. This meme illustrates some of the frustration engineers face, bouncing between CLI windows and different NMS tools as they try to make sense of all the data at their disposal.
To me, what this describes is a human acting as the middleware. What do I mean by that? Engineers sit between the network, which they typically reach one device at a time through the CLI, and the information held in different tools, which they access by logging in to those tools one at a time. The result of this “human as the middleware” is a very manual workflow: manual troubleshooting, manual security defense (which is another type of troubleshooting), and ultimately a highly ineffective collaboration workflow as well.
We at NetBrain looked at this challenge and we saw an answer. The answer is to remove the humans as the middleware and replace them with software: software that has access to the live network through CLI automation. The CLI automation is used to collect and analyze data across any multi-vendor environment. That software can also read data from your existing IT tools and NMS tools, so all this information in the network becomes accessible from a single console.
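To make the “software as the middleware” idea concrete, here is a minimal Python sketch. It is not NetBrain’s implementation or API. The names Device, DeviceView, COMMANDS, and build_views are hypothetical, the run_cli callable stands in for whatever CLI automation you already have, and the alarm records are assumed to be plain dictionaries exported from an NMS tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical per-vendor command table (an assumption for illustration).
# A real environment would need many more vendors and commands.
COMMANDS: Dict[str, str] = {
    "cisco_ios": "show interfaces",
    "juniper_junos": "show interfaces terse",
}


@dataclass
class Device:
    name: str
    vendor: str


@dataclass
class DeviceView:
    """Everything known about one device, gathered in one place."""
    device: Device
    cli_output: str
    alarms: List[str] = field(default_factory=list)


def build_views(
    devices: Iterable[Device],
    run_cli: Callable[[Device, str], str],
    alarms: Iterable[Dict[str, str]],
) -> Dict[str, DeviceView]:
    """Collect CLI data per device, then attach alarms from an NMS tool.

    run_cli is a placeholder for real CLI automation; each alarm is assumed
    to be a dict with "device" and "message" keys.
    """
    views: Dict[str, DeviceView] = {}
    for device in devices:
        command = COMMANDS[device.vendor]  # pick the vendor-specific command
        views[device.name] = DeviceView(device, run_cli(device, command))
    for alarm in alarms:
        view = views.get(alarm["device"])
        if view is not None:  # ignore alarms for devices we did not collect
            view.alarms.append(alarm["message"])
    return views


if __name__ == "__main__":
    # Stand-in for a live CLI session, so the sketch runs without a network.
    def fake_cli(device: Device, command: str) -> str:
        return f"<output of '{command}' on {device.name}>"

    inventory = [Device("edge-1", "cisco_ios"), Device("core-1", "juniper_junos")]
    nms_alarms = [{"device": "edge-1", "message": "link down"}]
    for name, view in build_views(inventory, fake_cli, nms_alarms).items():
        print(name, view.cli_output, view.alarms)
```

The point of the design is that the loop, not the engineer, visits each device and each tool, and the result is keyed by device so that a single console can display it.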
With NetBrain, a network map is the primary user interface to the network.
The result that we’re trying to drive is a much more efficient and effective workflow, augmented by automation.
Check out the recorded presentation (< 3 minutes).