July 18, 2017
Enterprise organizations incur thousands of network incidents each month, which equates to many hours of IT time spent troubleshooting and repairing. For organizations that run networks at that scale, reducing Mean Time to Repair (MTTR) by even small increments can make a massive difference to the bottom line.
For organizations relying on manual processes, particularly in the troubleshooting and escalation phases, implementing network automation can reduce MTTR by up to 60 percent. For most organizations, the problem stems from a lack of end-to-end visibility. Outdated network maps require manual updating in the event of a network incident, and it’s time-consuming for network engineers to redraw a map in MS Visio each time.
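To make the arithmetic concrete, here is a minimal sketch of how MTTR and the time saved by a given reduction could be estimated. The incident durations and the monthly incident count are made-up illustration values, not measurements, and the function names are hypothetical; the 60 percent figure is the upper bound quoted above.

```python
def mean_time_to_repair(repair_hours):
    """MTTR = total repair time divided by the number of incidents."""
    if not repair_hours:
        raise ValueError("need at least one incident")
    return sum(repair_hours) / len(repair_hours)


def monthly_hours_saved(incidents_per_month, mttr_hours, reduction):
    """Hours saved per month if MTTR drops by `reduction` (a fraction, 0.0 to 1.0)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0.0 and 1.0")
    return incidents_per_month * mttr_hours * reduction


# Hypothetical repair times, in hours, for three incidents.
sample_incidents = [1.0, 2.0, 3.0]
baseline_mttr = mean_time_to_repair(sample_incidents)

# Assumed volume of 1,000 incidents per month and the best-case 60% reduction.
saved = monthly_hours_saved(1000, baseline_mttr, 0.60)
print(f"Baseline MTTR: {baseline_mttr:.1f} h, hours saved per month: {saved:.0f}")
```

Even with a much smaller reduction, the savings scale linearly with incident volume, which is why small MTTR improvements matter at enterprise scale.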
Effective troubleshooting begins with a deep understanding of the network: not just the basic topology, but also the underlying design intent, the configuration, and real-time performance characteristics. The only way to have that level of knowledge is through real-time network visibility and data analysis. NetBrain helps organizations reduce MTTR in three critical ways:
Network teams rely on documentation for troubleshooting. Unfortunately, most teams work without accurate diagrams because it takes months to document a large network and, by the time the project is complete, the maps are already obsolete. NetBrain allows you to automate network diagrams and keep them up to date.
When troubleshooting network issues, engineers need to know how traffic flows across the live network, from source to destination. Because a diagram doesn’t exist for each traffic flow, engineers rely on traceroute to understand traffic paths. On modern networks, traceroute is very limiting, and engineers need better visibility. With NetBrain, only the source and destination IP addresses are needed to dynamically map any traffic flow.
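As a rough illustration of what mapping a flow from just a source and destination address involves, the sketch below walks hop by hop through per-device routing tables using longest-prefix matching. It is not NetBrain's implementation; the device names, the prefixes, and the ROUTES structure are all hypothetical, and a real network would also have to account for ACLs, NAT, load balancing, and policy routing.

```python
import ipaddress

# Hypothetical routing tables: device -> list of (prefix, next-hop device).
# A next hop of None means the prefix is directly connected to that device.
ROUTES = {
    "access-a": [("10.1.1.0/24", None), ("0.0.0.0/0", "core-1")],
    "core-1": [
        ("10.1.1.0/24", "access-a"),
        ("10.2.0.0/16", "dist-x"),    # less specific route, loses to the /24 below
        ("10.2.3.0/24", "access-b"),
    ],
    "access-b": [("10.2.3.0/24", None), ("0.0.0.0/0", "core-1")],
}


def gateway_for(src):
    """Find the device that has the source address on a connected subnet."""
    addr = ipaddress.ip_address(src)
    for device, routes in ROUTES.items():
        for prefix, hop in routes:
            if hop is None and addr in ipaddress.ip_network(prefix):
                return device
    raise LookupError(f"no device is directly connected to {src}")


def next_hop(device, dst):
    """Longest-prefix match of dst against one device's routing table."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in ROUTES.get(device, []):
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    if best is None:
        raise LookupError(f"{device} has no route to {dst}")
    return best[1]


def trace_path(src, dst, max_hops=32):
    """Return the ordered list of devices a packet would traverse."""
    device = gateway_for(src)
    path = [device]
    for _ in range(max_hops):
        hop = next_hop(device, dst)
        if hop is None:          # destination subnet is connected here
            return path
        if hop in path:
            raise RuntimeError(f"routing loop detected at {hop}")
        path.append(hop)
        device = hop
    raise RuntimeError("exceeded max_hops without reaching the destination")


print(" -> ".join(trace_path("10.1.1.10", "10.2.3.20")))
```

Unlike traceroute, a table-driven walk like this does not depend on devices answering probes, though it is only as accurate as the routing data it is fed.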
Troubleshooting network issues is like finding a needle in a haystack. Network monitoring tools are great for identifying issues, but provide little insight into the cause of the problem. NetBrain can cut troubleshooting time in half by automating hundreds of diagnoses, powered by Executable Runbooks.
One of the biggest challenges in network troubleshooting is that it’s not a centralized process: network engineers work across a variety of tools, including ticketing systems, monitoring tools, security and event management systems, and, above all, the command line interface. With NetBrain, a Dynamic Network Map becomes the single pane of glass for troubleshooting. It integrates these systems, using the map as an intuitive, visual user interface rather than dashboards of data. This results in a more concise process and ensures visibility into the entire troubleshooting workflow.
Troubleshooting is often a team effort, whether two engineers are collaborating side by side or an issue is escalated across the world. With NetBrain, teams can apply automation to every phase of troubleshooting, from ticket creation, to data collection, to sharing knowledge of best practices. In each phase, sharing key insights is critical, and this can be accomplished by sharing a single map URL. Collaboration is essential, but it also takes valuable time, so streamlining the collaboration process will ultimately reduce MTTR.
To further improve collaboration, network teams can draw on existing scripts, design notes, textbooks, and tribal knowledge to digitize best practices into Executable Runbooks. Runbooks make knowledge accessible and executable for every engineer on the team.
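The general idea of turning tribal knowledge into something executable can be sketched as an ordered list of checks, each paired with the advice a senior engineer would give when the check fails. This is only an illustration of the concept and not NetBrain's Runbook format; the Step class, the field names, and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str                        # what the step verifies
    check: Callable[[dict], bool]    # returns True when the device looks healthy
    advice: str                      # what to do next when the check fails


def run_runbook(steps, device_data):
    """Run every step in order and record either 'ok' or the step's advice."""
    findings = []
    for step in steps:
        healthy = step.check(device_data)
        findings.append((step.name, "ok" if healthy else step.advice))
    return findings


# Hypothetical runbook for a "slow application" ticket.
SLOW_APP_RUNBOOK = [
    Step("cpu below 80%", lambda d: d["cpu_percent"] < 80,
         "inspect top processes"),
    Step("no interface errors", lambda d: d["input_errors"] == 0,
         "check cabling and duplex settings"),
]

# Made-up data, standing in for values collected from a live device.
sample = {"cpu_percent": 93, "input_errors": 0}
for name, result in run_runbook(SLOW_APP_RUNBOOK, sample):
    print(f"{name}: {result}")
```

Because each step carries its own advice, a junior engineer running the same list gets the same guidance the original author would have given in person.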