
Applying Automation to Reduce MTTR

by Mark Harris Jul 18, 2017

Enterprise organizations incur thousands of network incidents each month, which equates to thousands of hours of IT time spent troubleshooting and repairing. For organizations that deal with networks at that scale, reducing Mean Time to Repair (MTTR) by even small increments can make a massive difference to the bottom line. The surprising part is that an enterprise sees relatively few problem types, but each of these problem types is repeated over and over again. While most problems appear unique, they are actually quite similar to previously addressed problems. Most enterprises have not realized this, so each and every problem is addressed as a bespoke case, as if it had never been seen before. This yields a great deal of redundant effort and inconsistency.


Today, most organizations rely on manual processes to address network problems, particularly in the troubleshooting and escalation phases. What they don’t realize is that implementing network automation for all of that repetitive work can reduce MTTR by 60 percent or more, and can be accomplished in days, not weeks, months or even years!
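To make the arithmetic behind a 60 percent reduction concrete, here is a minimal Python sketch. The incident durations and the figure of 1,000 incidents per month are hypothetical numbers chosen for illustration; they are not measurements from any real network.

```python
def mean_time_to_repair(repair_hours):
    """Return the average repair time, in hours, across a list of incidents."""
    if not repair_hours:
        raise ValueError("need at least one incident")
    return sum(repair_hours) / len(repair_hours)


# Hypothetical repair times (detection to resolution), in hours.
incident_hours = [4.0, 2.5, 6.0, 3.5]

baseline_mttr = mean_time_to_repair(incident_hours)  # average of the sample
reduced_mttr = baseline_mttr * (1 - 0.60)            # MTTR after a 60% reduction

# Assumed volume: 1,000 incidents per month (illustrative only).
incidents_per_month = 1000
hours_saved = (baseline_mttr - reduced_mttr) * incidents_per_month

print(f"Baseline MTTR: {baseline_mttr:.1f} h")
print(f"Reduced MTTR:  {reduced_mttr:.1f} h")
print(f"Hours saved per month: {hours_saved:.0f}")
```

With this sample, the average drops from 4.0 hours to 1.6 hours per incident, which is why even a modest percentage adds up quickly at enterprise incident volumes.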

For most organizations, the root of the problem is a lack of end-to-end visibility: an engineer or technician will spend an hour or two just establishing the context of the network problem itself, including running preliminary diagnostics and finding topology details. Outdated network maps are useless in the event of any network incident, since even a minor inaccuracy will greatly impede problem resolution or make it impossible. And it’s time-consuming and redundant for network engineers to rebuild an accurate understanding of the problem vicinity each time.

Effective troubleshooting begins with a deep understanding of the network: not just the basic topology, but also the underlying design intent and configuration, as well as real-time performance and security characteristics. The only way to have that level of knowledge is through real-time network visibility, performance diagnostics and comparison of live data to known good baselines.
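As a rough illustration of what comparing live data to a known good baseline can look like, the Python sketch below flags any metric that drifts more than a set tolerance from its baseline value. The metric names, values and the 20 percent tolerance are assumptions made for this example; this is not NetBrain's implementation or API.

```python
def find_deviations(baseline, live, tolerance=0.20):
    """Return {metric: (expected, observed)} for metrics outside tolerance.

    tolerance is a fraction of the baseline value (0.20 means 20 percent).
    A metric missing from the live data is reported with observed=None.
    """
    deviations = {}
    for metric, expected in baseline.items():
        observed = live.get(metric)
        if observed is None:
            deviations[metric] = (expected, None)
            continue
        allowed = abs(expected) * tolerance
        if abs(observed - expected) > allowed:
            deviations[metric] = (expected, observed)
    return deviations


# Hypothetical known good baseline and a live sample for one device.
baseline = {"latency_ms": 20.0, "cpu_percent": 35.0, "bgp_neighbors": 4}
live = {"latency_ms": 95.0, "cpu_percent": 38.0, "bgp_neighbors": 3}

for metric, (expected, observed) in find_deviations(baseline, live).items():
    print(f"{metric}: expected about {expected}, observed {observed}")
```

Here latency and the BGP neighbor count are flagged, while CPU stays within tolerance. A real deployment would likely use per-metric thresholds rather than one global tolerance.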

NetBrain helps organizations reduce MTTR in three critical ways:

Accelerate Network Troubleshooting Through Automation

Troubleshooting network issues is like finding a needle in a haystack. Network monitoring tools are great for identifying device-level issues, but provide little insight into the cause of the problem or the larger view of services that may be affected. NetBrain can cut troubleshooting time in half by automating hundreds of pre-built and shared diagnostics, powered by our Network Intent technology. In fact, every part of the network may be described not only by its connectivity: with our Network Intent technology, real-time performance requirements and security profiles are visible as well. It could be said that a network of 1,000 physical or virtual services should actually be described by ten times that number of Network Intents, or 10,000 Intents! (And NetBrain makes it easy to generate all of those Network Intents, since we can apply the problem solving from similar situations across the network at scale.)

Consistency is also part of the automation value. One of the biggest challenges in traditional network troubleshooting is that it’s a very individual art, based on the knowledge and experience of each engineer. And since network engineers work through a variety of different tools, their resolutions will be highly individual and rarely transferable to additional situations that they or their colleagues may encounter in the future. In addition, NetBrain’s Dynamic Network Map becomes the single pane of glass for troubleshooting and the foundation on which automated Network Intent diagnostics are applied. This results in a more concise process and ensures that there is visibility into the entire troubleshooting process.

Automate Network Documentation in Real-Time

Network teams rely on documentation for troubleshooting. Unfortunately, most teams live without accurate diagrams because it takes months to document a large network and, once the project is complete, the maps are already obsolete. NetBrain allows you to automate network diagrams and keep them up-to-date by using patented auto-discovery technology, which interacts with each device’s L2 and L3 tables, along with a wealth of other detail, to establish the real-time view of the entire network, from edge to cloud. (Yes, it includes visibility and control into the virtual controllers and services of all of the major cloud providers.)

And this dynamic view provides vastly more detail than ever seen before! For troubleshooting network issues, engineers need to know how traffic flows across the live network, from the source to the destination, in both directions. As a diagram doesn’t exist for each traffic flow, engineers rely on traceroute to understand traffic paths. With modern networks, traceroute is very limiting, and engineers need better visibility. With NetBrain, only the source and destination IP addresses are needed to dynamically map any traffic flow, and the resulting views include all of the required performance metrics.
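To show the general idea of deriving a traffic path from collected L3 tables rather than from a live traceroute, here is a simplified Python sketch that walks hop by hop using longest-prefix matching. The device names, prefixes and the ROUTING_TABLES structure are hypothetical, and this is only an illustration of the concept, not how NetBrain computes paths.

```python
import ipaddress

# Hypothetical per-device routing tables: prefix -> next-hop device name.
# A next hop of None means the destination is directly connected.
ROUTING_TABLES = {
    "edge-rtr": {"10.0.0.0/8": "core-rtr", "0.0.0.0/0": "isp"},
    "core-rtr": {"10.1.0.0/16": "dist-sw", "10.0.0.0/8": "edge-rtr"},
    "dist-sw": {"10.1.20.0/24": None},
}


def next_hop(device, destination):
    """Pick the next hop on one device using longest-prefix match."""
    table = ROUTING_TABLES.get(device)
    if table is None:
        raise LookupError(f"no routing table collected for {device}")
    dest = ipaddress.ip_address(destination)
    best = None  # (network, hop) of the most specific match so far
    for prefix, hop in table.items():
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, hop)
    if best is None:
        raise LookupError(f"{device} has no route to {destination}")
    return best[1]


def trace_path(first_device, destination, max_hops=16):
    """Follow next hops from the first device until the destination is connected."""
    path = [first_device]
    device = first_device
    while len(path) <= max_hops:
        hop = next_hop(device, destination)
        if hop is None:
            return path
        if hop in path:
            raise RuntimeError(f"routing loop via {hop}")
        path.append(hop)
        device = hop
    raise RuntimeError("hop limit exceeded")


print(" -> ".join(trace_path("edge-rtr", "10.1.20.7")))
```

Because forward and return paths can differ, the reverse direction would need its own trace starting from the device nearest the destination.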

Improve Collaboration and Share Best Practices

Troubleshooting can be a lengthy process when various organizations must get involved. Other operational teams are typically engaged through escalation, which delays problem solving. Because NetBrain is a real-time platform, multiple engineers can collaborate in real time to address any issue. With NetBrain, NetOps, SecOps, DevOps and other operational teams can apply automation to every phase of troubleshooting, from ticket creation to data collection to sharing knowledge of best practices. In each phase, sharing key insights is critical, and this can be accomplished by sharing a single map URL. Collaboration is critical, but it also takes valuable time, and enhancing the collaboration process will ultimately reduce MTTR.

To further improve collaboration, network teams can leverage existing scripts, design notes, textbooks, and tribal knowledge to digitize best practices into Executable Runbooks. Runbooks make knowledge accessible and executable for every engineer on the team.
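For readers who want a feel for what turning tribal knowledge into something executable might mean, here is a generic Python sketch of a runbook as an ordered list of checks, each paired with the advice an experienced engineer would give when the check fails. The Step class, the check names, the thresholds and the sample data are all hypothetical; this does not represent the format of NetBrain's Executable Runbooks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str                      # what the step verifies
    check: Callable[[dict], bool]  # returns True when the check passes
    advice: str                    # what to do next if the check fails


def run_runbook(steps, device_data):
    """Run every step in order and return a list of (name, outcome) tuples."""
    report = []
    for step in steps:
        passed = step.check(device_data)
        outcome = "PASS" if passed else "FAIL: " + step.advice
        report.append((step.name, outcome))
    return report


# Hypothetical runbook capturing one engineer's habits for a slow device.
SLOW_DEVICE_RUNBOOK = [
    Step("CPU below 80%", lambda d: d["cpu_percent"] < 80, "inspect top processes"),
    Step("No input errors", lambda d: d["input_errors"] == 0, "check cabling and duplex"),
]

# Hypothetical data already collected from the device.
report = run_runbook(SLOW_DEVICE_RUNBOOK, {"cpu_percent": 92, "input_errors": 0})
for name, outcome in report:
    print(f"{name}: {outcome}")
```

Once the checks live in a shared list like this, any engineer on the team can run the same sequence and get the same guidance, which is the consistency benefit described above.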