Applying Automation to Problem Diagnosis

Dynamic Map as the Visual Automation Console

Dynamic Maps are real-time visual representations of network status, connectivity paths, and performance levels, with full interactive navigation and detail analysis. They provide a view of the network in real time and also act as the foundation for NetBrain’s interactive automation console.

Based on different pre-defined troubleshooting situations, users can select a Guidebook from the Decision Tree and run the selected automation to resolve issues. Users can also execute a Runbook with selected automation tasks to collect CLI command output, perform data analysis, and diagnose problems.

An incident represents the overall context of a task, such as the process of troubleshooting a problem. It serves as an integrated platform that organizes all the information needed for a network task, and as a real-time communication and collaboration platform for all network engineers during troubleshooting. Incidents provide the most effective way for multiple users to cooperate on a problem, both within NetBrain Workstation and between NetBrain Workstation and Incident Portal. They help organize the maps and devices involved in the problem area, along with users’ efforts, in an orderly manner.

Interactive Automation

With Interactive Automation, NetBrain records network engineers’ diagnostic steps to create automation for their own use. This makes engineers more productive: they can get data from multiple devices, look for changes by running a comparison automatically, and monitor the network and receive alerts when thresholds are crossed. It offers guardrails for any operator or engineer to make informed decisions based on real-time network status.

As networks get larger and more complex, automated documentation becomes more important, especially as networks incorporate technologies like SDN, SD-WAN, and public cloud. NetBrain provides end-to-end visibility across hybrid networks as well as the ability to drill down into each segment and isolate network issues on a Dynamic Map that can be updated in real time. This greatly accelerates problem identification and resolution. When devices are added to or removed from the network, the change is quickly reflected in an updated network map, which can be exported as diagrams to Microsoft Visio and Word.

With NetBrain, NetOps professionals leverage their own expertise to automatically record standardized procedures in a Runbook as they detect, diagnose, and fix issues. The steps they use to diagnose issues, including the use of CLI commands on multiple devices, are automated in Runbooks.

Collaborative Automation

Collaborative Automation is where engineers and operators leverage the knowledge of peers: the software captures subject matter experts’ knowledge as executable automation units that others can then add to their own diagnoses. Expertise is available even when the expert is not. NetBrain helps when troubleshooting a problem, assessing the state of the network, or making sense of complex technology. It allows NetOps staff to gather, analyze, and visualize thousands of KPIs in seconds. It can be coupled with NetBrain’s collaboration capabilities to allow multiple ops teams (SecOps, DevOps, NetOps, ServerOps) to interactively resolve problems that span multiple technology domains, without time-consuming handoffs that cause delays. Everyone can get online at the same time, interact, and make updates and remediations to the model through a shared analysis console.

NetBrain captures and codifies SME knowledge using Runbooks, Data Views, and Network Intents. Automatically capturing this information allows experienced engineers to codify and share their knowledge with junior staff, effectively shifting knowledge left, from experienced users to less experienced team members. The next time the problem occurs, responders can execute the Runbook without in-depth knowledge or training. Even complicated network issues no longer need to be handled exclusively by experts. You are essentially using the knowledge of these highly skilled level-3 workers when they are otherwise unavailable (due to location or availability).

Incident Portal

NetBrain’s Incident Portal enables collaboration among multiple users working on the same troubleshooting task. An incident represents a ticket in NetBrain that tracks a network problem or a network change. End users can organize and share maps, devices, individual insights, and findings for a specific troubleshooting task and collaborate with more colleagues to resolve issues, reducing MTTR. In addition, Incident Portal offers an independent portal page for each incident. External users without NetBrain Workstation seat licenses can access a portal to join the collaboration session, for example by viewing maps and posting messages.

Triggered Automation

NetBrain’s Triggered Automation responds to external events, such as tickets from an ITSM tool like ServiceNow, events from Splunk, and events from other common ticketing and monitoring applications used by today’s NetOps teams. NetBrain provides an API for these third-party systems, which trigger NetBrain to create maps and execute Runbooks, making NetBrain the centralized console for all third-party data. NetBrain automatically performs diagnostic actions and gathers information for tickets before engineers or operators get involved, shifting the operational paradigm from human-centric to automation-centric.

NetBrain’s Triggered Automation captures the context of any problem the moment it occurs and makes it available to any network engineer when they begin troubleshooting. With this critical context, the network engineer can view CLI command outputs and the diagnosis NetBrain performed to help find the root cause quickly.

NetBrain triggered automation includes:

  • mapping problem devices
  • collecting CLI command output and executing Qapps for diagnosis
  • returning a map link to the third-party system once the trigger is completed.
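
The three steps above can be sketched as a small pipeline. This is only an illustration of the flow, not NetBrain's API: `TriggerEvent`, `TriggerResult`, `handle_trigger`, `fake_cli`, and the map URL layout are all hypothetical names invented for this sketch, and Qapp execution is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TriggerEvent:
    """Hypothetical event forwarded by an ITSM or monitoring tool."""
    ticket_id: str
    devices: List[str]
    commands: List[str]


@dataclass
class TriggerResult:
    """Hypothetical result handed back to the third-party system."""
    ticket_id: str
    map_url: str
    cli_output: Dict[str, Dict[str, str]] = field(default_factory=dict)


def handle_trigger(
    event: TriggerEvent,
    run_cli: Callable[[str, str], str],
    map_base_url: str,
) -> TriggerResult:
    # Step 1: "map" the problem devices. Here that is only an ordered,
    # de-duplicated device list standing in for a real map.
    mapped = list(dict.fromkeys(event.devices))

    # Step 2: collect CLI output for each mapped device.
    output = {
        device: {cmd: run_cli(device, cmd) for cmd in event.commands}
        for device in mapped
    }

    # Step 3: build the link returned to the third-party system.
    # The URL layout is invented for this sketch.
    map_url = f"{map_base_url}/map/{event.ticket_id}"
    return TriggerResult(event.ticket_id, map_url, output)


def fake_cli(device: str, command: str) -> str:
    """Stand-in for a real CLI session; returns placeholder text."""
    return f"{device}# {command}\n(output omitted)"


event = TriggerEvent(
    "INC0012345", ["core-sw1", "edge-r1", "core-sw1"], ["show version"]
)
result = handle_trigger(event, fake_cli, "https://netbrain.example.com")
print(result.map_url)
print(sorted(result.cli_output))
```

Passing the CLI collector in as a function keeps the flow testable without any network access; a real deployment would replace `fake_cli` with whatever collection mechanism the platform provides.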

NetBrain dramatically reduces escalations by making all the information needed, including the fix, available to tier-1 engineers. With the information NetBrain inserts back into the ticket, many issues can be addressed automatically using Runbooks that engineers create and store over time.

Trigger Problem Diagnosis Automation by ITSM Systems

NetBrain integrates with ITSM tools, including ServiceNow and BMC-Remedy, which provide triggers for NetBrain to take action. NetBrain provides different integration methods for different ITSM tools. This allows NetBrain to act as the central command console for third-party ITSM information.

  • NetBrain users can integrate ServiceNow natively via the UI. The NetBrain app is available in the ServiceNow App Store.
  • NetBrain creates a dynamic map showing the vicinity of the specified ticket and allows the user to send the custom map via a fully enumerated URL back to the ServiceNow ticket. Any user can then click on the URL to leverage NetBrain to view the map as needed.
  • For other ITSM tools, including BMC-Remedy, the included scripts are required to complete the integration setup.
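
To make the "map URL back to the ticket" idea concrete, the sketch below builds (but does not send) a request that would add a work note containing a map link to a ServiceNow record through ServiceNow's Table API. It is not how the NetBrain app performs the integration. The instance name, record `sys_id`, and map URL are placeholder assumptions, `build_work_note_request` is a hypothetical helper, and authentication is deliberately omitted.

```python
import json
import urllib.request


def build_work_note_request(
    instance: str, table: str, sys_id: str, map_url: str
) -> urllib.request.Request:
    """Build a PATCH request that adds a work note to one record.

    Nothing is sent here; a real call would also need credentials.
    """
    url = f"https://{instance}/api/now/table/{table}/{sys_id}"
    body = json.dumps({"work_notes": f"NetBrain map: {map_url}"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


# Placeholder values, chosen only for illustration.
req = build_work_note_request(
    "example.service-now.com",
    "incident",
    "0123456789abcdef0123456789abcdef",
    "https://netbrain.example.com/map/INC0012345",
)
print(req.get_method(), req.full_url)
```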

Map Created by Trigger

After the ITSM tool detects an issue, NetBrain triggered automation automatically creates a map of the problem area including devices. The map acts as a starting point for engineers to troubleshoot network issues. NetBrain provides multiple ways to create maps for different network issues.

Capture Transient Network Issues by Trigger

After generating a map for network device problems, NetBrain triggered automation automatically runs a pre-defined Runbook template to collect important CLI command results for the network issue. The Runbook captures the data in real time and saves the results with the map, giving the engineer a head start on the troubleshooting process.

Execute Well-Known Diagnosis by Trigger

To diagnose well-known issues, NetBrain can trigger the Qapp in the pre-defined Runbook template to execute automatically.

Incident Creation by Trigger

Incidents can be triggered by API Stub and Event Template. The user needs to run a Python script in which the incident parameters, such as the incident subject and access code, are set appropriately.
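
A minimal sketch of what setting those parameters might look like is shown below. The field names (`subject`, `access_code`, `devices`) and the helper `build_incident_params` are assumptions made for illustration; the actual script and parameter names shipped with NetBrain may differ.

```python
import json
from typing import Dict, Iterable, List, Union


def build_incident_params(
    subject: str, access_code: str, devices: Iterable[str] = ()
) -> Dict[str, Union[str, List[str]]]:
    """Validate and collect the parameters for a new incident.

    The key names are hypothetical, not NetBrain's schema.
    """
    subject = subject.strip()
    if not subject:
        raise ValueError("incident subject must not be empty")
    if not access_code:
        raise ValueError("access code must not be empty")
    return {
        "subject": subject,
        "access_code": access_code,
        "devices": list(devices),
    }


# Example values only; a real access code should not be hard-coded.
params = build_incident_params(
    "BGP flapping on edge-r1", "example-code", ["edge-r1"]
)
print(json.dumps(params, indent=2))
```

Validating the parameters before any request is made keeps a misconfigured trigger from creating an unusable incident.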

Incident Response from NetBrain Console

A user who finds a problem while working on a map can create an incident, then invite others to join and collaborate within the workstation or portal.

Incident Response from SmartCLI

Users can publish device-specific findings from SmartCLI to the incident for collaborative troubleshooting.

Incident Response from Incident Portal

Users can enable the incident portal with an access code and invite external users (without NetBrain seat licenses) who will join the collaboration session by logging in to Incident Portal with the portal URL and access code.

Single-Pane-of-Glass View for Incident

Executable Runbooks can leverage datasets from third-party tools, enabling users to visualize information from all their existing tools on the Dynamic Map.

Once Runbooks are executed, you can share the results with anyone in the organization, facilitating collaboration and enabling higher-level engineers to encode their advanced knowledge into reusable tools.

Not only can NetBrain ingest data from third-party systems, but it can also trigger functionality on a third-party system, such as creating a new change request in a ticketing system.

For example, once a change request has been created in ServiceNow, the request is visible from the Dynamic Map. This feature allows NetBrain to act as a central command console for all relevant third-party tools.

CLI Runbook Node

Executing CLI commands is the most critical step in Runbook automation, which leverages the traditional CLI for the key commands used to understand and troubleshoot network problems. The Execute CLI Commands node is the main portal for programming in NetBrain. Users can define Data Views, set alerts for network issues, understand the network design from command results, and compare results against historical data.
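
The ideas of alerting on command results and comparing them against historical data can be sketched in plain Python. This is an illustration of the technique only, not the Execute CLI Commands node itself: the sample outputs, the regular expression, and the helper names are assumptions chosen for the example.

```python
import difflib
import re
from typing import List, Optional

# Made-up interface output standing in for stored and live CLI results.
SAMPLE_BASELINE = """\
GigabitEthernet0/1 is up, line protocol is up
  5 minute input rate 1000 bits/sec
  0 input errors, 0 CRC
"""

SAMPLE_CURRENT = """\
GigabitEthernet0/1 is up, line protocol is up
  5 minute input rate 1000 bits/sec
  42 input errors, 17 CRC
"""


def parse_input_errors(output: str) -> Optional[int]:
    """Extract the input error counter, or None if it is not present."""
    match = re.search(r"(\d+) input errors", output)
    return int(match.group(1)) if match else None


def exceeds_threshold(value: Optional[int], limit: int) -> bool:
    """Return True when a parsed value is present and above the limit."""
    return value is not None and value > limit


def changed_lines(baseline: str, current: str) -> List[str]:
    """Return only the removed ("- ") and added ("+ ") lines."""
    diff = difflib.ndiff(baseline.splitlines(), current.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


errors = parse_input_errors(SAMPLE_CURRENT)
alert = exceeds_threshold(errors, limit=10)
changes = changed_lines(SAMPLE_BASELINE, SAMPLE_CURRENT)
print(errors, alert)
for line in changes:
    print(line)
```

Keeping parsing, threshold checks, and comparison as separate functions mirrors the way a command result can feed several uses at once: a displayed value, an alert, and a historical diff.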

SmartCLI – Integrating automation into CLI

A redesigned SmartCLI application integrates the Dynamic Map, Executable Runbook, Compare, and Collaboration features on top of the traditional CLI. SmartCLI allows network engineers to intelligently analyze CLI output, automatically document troubleshooting activities, and effectively collaborate with co-workers, significantly reducing MTTR.