Automation for “Day-2” Network Operations

A typical enterprise may experience thousands of IT events every day. Many are urgent, and all require manual effort, which lengthens resolution times.

NetBrain is designed to continuously reduce mean time to repair (MTTR) by applying automation in three phases of incident response:

  1. Triggered Automation - before a human arrives on scene
  2. Interactive Automation - during active troubleshooting
  3. Proactive Automation - after the problem is resolved

Triggered Automation

A critical phase of troubleshooting is the initial response and diagnosis. Since these first steps are predictable, we believe every IT problem investigation should begin with “zero touch” or triggered automation.

How Can Automation Help?

Eliminate idle time with event-triggered automation

By connecting NetBrain with your ITSM or Event Management solution, it’s easy to implement triggered diagnostic automation – to eliminate idle time, identify problems faster, and rule out potential problems automatically. The instant an event is detected, NetBrain Automation provides two essential functions:

  1. Map the problem area dynamically
  2. Execute triage diagnostics

By applying Triggered Automation during incident response, NetBrain closes the gap between the detection of a fault and the action of diagnosing.

Map the problem area instantly

Triggered by an event, NetBrain can automatically create a map of the relevant part of the network, such as a site map or the path of an application flow. This gives responders immediate visibility across the infrastructure. A URL of this Dynamic Map is returned to your ITSM for quick access by anyone investigating the event.

Automate diagnostics across the network

NetBrain provides automation mechanisms to quickly sift through large volumes of data such as CLI outputs, device configurations, and network telemetry. This automation is fully customizable, so you can organize common troubleshooting steps into repeatable procedures called Executable Runbooks.

When you connect NetBrain with your ITSM or monitoring solution, you can define rules and conditions which trigger the execution of different diagnostics for different types of events. For example, a ticket for a slow application may trigger NetBrain to dynamically map the application flow and then execute a 3-step runbook which (1) collects common CLI data, (2) performs a health check at the device level, and (3) analyzes interface congestion.
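The idea of rules and conditions can be pictured as a small lookup from event conditions to a map type and an ordered runbook. The sketch below is a generic illustration, not NetBrain's rule engine: the `Event` fields (`category`, `summary`), the `TriggerRule` class, the rule names, and the runbook step names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical event shape: real ITSM tickets carry many more fields.
Event = Dict[str, str]


@dataclass
class TriggerRule:
    """Pairs an event condition with the map to draw and the runbook to run."""
    name: str
    condition: Callable[[Event], bool]
    map_type: str  # illustrative label, e.g. "application_path"
    runbook: List[str] = field(default_factory=list)  # ordered step names


RULES: List[TriggerRule] = [
    TriggerRule(
        name="slow-application",
        condition=lambda e: (e.get("category") == "application"
                             and "slow" in e.get("summary", "").lower()),
        map_type="application_path",
        # The 3-step runbook from the example above (hypothetical step names).
        runbook=["collect_common_cli", "device_health_check", "interface_congestion"],
    ),
    TriggerRule(
        name="device-unreachable",
        condition=lambda e: e.get("category") == "device",
        map_type="device_neighbors",
        runbook=["collect_common_cli", "device_health_check"],
    ),
]


def match_rule(event: Event, rules: List[TriggerRule] = RULES) -> Optional[TriggerRule]:
    """Return the first rule whose condition matches the event, or None."""
    for rule in rules:
        if rule.condition(event):
            return rule
    return None


ticket = {"category": "application", "summary": "CRM is slow from Boston"}
rule = match_rule(ticket)
print(rule.map_type, rule.runbook)
```

First-match-wins keeps the behavior predictable when several rules could apply; ordering the rule list from most to least specific is the usual way to control it.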

Geek Mode

How does NetBrain integrate with my ITSM or Event Management tools?
NetBrain integrates through RESTful APIs, so any tool that exposes these “hooks” can integrate with NetBrain both northbound and southbound. A common use case is to trigger the creation of a unique Dynamic Map for every new event and then run a diagnosis via an Executable Runbook. The result is returned to your event tool as an embedded map or a URL to NetBrain.
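As a rough sketch of the request an event tool might send, the code below builds (without sending) a JSON POST that asks for a map and a runbook run, plus the text that could be written back to the ticket. The URL, the payload fields (`ticket_id`, `map`, `runbook`), and the helper names are hypothetical placeholders, not NetBrain's actual API; authentication is omitted entirely.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint and payload fields, for illustration only. Check the
# NetBrain API documentation for the real resource paths, auth, and schema.
TRIGGER_URL = "https://netbrain.example.com/api/hypothetical/trigger"


def build_trigger_request(ticket_id: str, device: str, runbook: str) -> urllib.request.Request:
    """Build (but do not send) a POST from the event tool asking for a map plus a runbook run."""
    body = {
        "ticket_id": ticket_id,  # lets the result be written back to the same ticket
        "map": {"type": "device_neighbors", "device": device},
        "runbook": runbook,
    }
    return urllib.request.Request(
        TRIGGER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ticket_comment(map_url: str) -> str:
    """Text that might be posted back to the ITSM ticket once the map exists."""
    return f"NetBrain Dynamic Map: {map_url}"


req = build_trigger_request("INC0012345", "bos-core-01", "Basic Health Check")
print(req.get_method(), req.full_url)
```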
What kinds of tools does NetBrain commonly integrate with?
Common integrations for NetBrain are with ITSM solutions like ServiceNow and BMC Remedy, 24x7 monitoring solutions like SolarWinds, CA Spectrum, or PRTG, or SIEM tools like Splunk. It is common for customers to integrate their monitoring and SIEM tools directly to their ITSM. In this case, they can rely on their ITSM as one tool for correlating events and triggering NetBrain Automation.
How much programming is required to configure Triggered Automation?
Triggered Automation is relatively simple and can usually be set up in a few hours. ServiceNow users can take advantage of NetBrain’s free app in the ServiceNow app store to streamline this integration further, in as little as ten minutes.
How does NetBrain know what to automate when an event occurs?
Within NetBrain’s API Manager, you can define two types of tasks which are triggered depending on the type of event. The first task is to trigger creation of a Dynamic Map. The second is to trigger execution of a diagnostic via Executable Runbook. Since you can define any runbook, this automation is very flexible.
What types of maps can NetBrain create by a trigger?
Based on simple criteria inside a ticket or event, like a hostname or IP address, NetBrain can dynamically create three common types of maps: (1) a map of a device and its connected neighbors, (2) a map of the path between two endpoints, and (3) a map of a pre-documented site. These parameters are configurable within NetBrain or within NetBrain’s free ServiceNow app.
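Choosing among the three map types from ticket fields can be sketched as a simple priority check. The field names (`source_ip`, `dest_ip`, `site`, `hostname`) and the priority order are assumptions made for illustration; NetBrain and its ServiceNow app expose their own configuration for this.

```python
import ipaddress
from typing import Dict, Set, Tuple


def choose_map(ticket: Dict[str, str], documented_sites: Set[str]) -> Tuple[str, Tuple[str, ...]]:
    """Pick one of the three common map types from simple ticket fields.

    The field names are hypothetical. Priority: a path if both endpoints are
    present, then a pre-documented site, then a single device and its neighbors.
    """
    src, dst = ticket.get("source_ip"), ticket.get("dest_ip")
    if src and dst:
        # Normalize and validate; raises ValueError on malformed addresses.
        return ("path", (str(ipaddress.ip_address(src)), str(ipaddress.ip_address(dst))))
    site = ticket.get("site")
    if site in documented_sites:
        return ("site", (site,))
    host = ticket.get("hostname")
    if host:
        return ("device_neighbors", (host,))
    raise ValueError("ticket has no field that can seed a map")


print(choose_map({"source_ip": "10.1.1.10", "dest_ip": "10.2.2.20"}, {"Boston DC"}))
```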
What types of diagnostics is NetBrain capable of automating?
NetBrain automation is extremely flexible. Virtually any data which you want NetBrain to analyze can be collected automatically by the system. This is typically performed through automated access to CLI (e.g. show interface to diagnose interface errors), API to a central controller (e.g. in the case of SDN or public cloud), or API to a 3rd party IT tool (e.g. Splunk or SolarWinds).
Can I see a demo of Triggered Automation?
Here is a short video of Triggered Automation. To learn more, you can request a demo with one of our Solutions Engineers.

Interactive Automation

Interactive Automation is designed to augment an engineer’s workflow, even for complex multi-stage efforts. With NetBrain, a Dynamic Map serves as the user interface for automation, as an alternative to the CLI.

Interactive Automation can be used at various stages of a typical incident response workflow: when an engineer first arrives on the scene, while teams collaborate in real time, and when a fix is rolled out to the network.

How Can Automation Help?

Aid first-response engineers with guided troubleshooting

When an engineer first arrives on the scene to troubleshoot, there is a set of common questions they usually ask. NetBrain offers tools to help answer these questions:

  • What’s changed in the network?
  • Is the network in a normal or abnormal state?
  • What should I do next?

NetBrain Data View Templates put virtually any network data at your team’s fingertips. Clicking a Data View dynamically turns layers of data on and off on top of a Dynamic Map, making it easy to visualize the network from different perspectives. For example, if you’re troubleshooting BGP, turn on a BGP Data View to visualize the relevant configuration or neighbor status. If you’re diagnosing packet drops, visualize interface errors such as input drops or CRC errors.

Data Views not only display raw data but also flag abnormalities in that data across thousands of parameters. For example, the Golden Baseline may indicate that a BGP router should normally have four active neighbors. If that router loses a neighbor, an alert appears on the map, which may be a clue that something is wrong.

To guide engineers towards more advanced troubleshooting, and help minimize the need for escalation, you can also define Recommended Actions. For example, if an alert indicates a BGP neighbor dropped, the Recommended Action may be a BGP Troubleshooting Runbook.
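The link from an alert to a Recommended Action can be pictured as a lookup table. In this sketch the alert names, the runbook titles, and the neighbor-count rule are hypothetical stand-ins for what power users would configure in NetBrain.

```python
from typing import Dict, List

# Hypothetical alert names and runbook titles, for illustration only.
RECOMMENDED_ACTIONS: Dict[str, List[str]] = {
    "bgp_neighbor_down": ["BGP Troubleshooting Runbook"],
    "interface_crc_errors": ["Interface Error Runbook"],
}


def bgp_alerts(observed_neighbors: int, baseline_neighbors: int) -> List[str]:
    """Raise an alert when fewer BGP neighbors are up than the baseline expects."""
    return ["bgp_neighbor_down"] if observed_neighbors < baseline_neighbors else []


def recommend(alerts: List[str]) -> List[str]:
    """Map active alerts to recommended runbooks, de-duplicated, in first-seen order."""
    actions: List[str] = []
    for alert in alerts:
        for action in RECOMMENDED_ACTIONS.get(alert, []):
            if action not in actions:
                actions.append(action)
    return actions


print(recommend(bgp_alerts(observed_neighbors=3, baseline_neighbors=4)))
```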

Improve team collaboration during active troubleshooting

Troubleshooting is often a team effort, so everyone needs to look at the same thing at the same time to avoid redundant work.

A single NetBrain URL contains a Dynamic Map of the area under investigation and all troubleshooting steps performed against it. This troubleshooting record is documented automatically. As teams troubleshoot collaboratively, they can share this URL. Getting teams on the same page this way facilitates better handoffs and avoids duplicated work.

Automatically push changes and assess the impact

Quickly restoring business services is the primary goal of incident response. But deploying a fix introduces the risk of collateral damage. It’s critical to resolve outages effectively while also mitigating risk during remediation.

From design, to implementation, to verification, NetBrain’s Change Management automates the entire change management process. You can push complex changes to multiple devices simultaneously and even integrate with Ansible, if that’s your tool of choice.

NetBrain’s Application Assurance Engine helps to quickly assess and visualize the impact of a change on the network and the applications running on it. If any problems are discovered within the change window, you can roll back to the previous state with one click.
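The push, verify, roll back pattern can be sketched on in-memory configurations. This is a generic illustration of the pattern, not NetBrain's Change Management or Ansible: `apply_change` is a hypothetical helper, configurations are reduced to lists of lines, and `verify` stands in for whatever post-change checks you would actually run.

```python
import copy
from typing import Callable, Dict, List

Configs = Dict[str, List[str]]  # device name -> config lines (greatly simplified)


def apply_change(configs: Configs, change: Configs, verify: Callable[[Configs], bool]) -> bool:
    """Push change lines to several devices, verify, and roll back if verification fails.

    `configs` is modified in place. Returns True if the change was kept.
    """
    snapshot = copy.deepcopy(configs)  # pre-change state of every device
    for device, lines in change.items():
        configs.setdefault(device, []).extend(lines)
    if verify(configs):
        return True
    configs.clear()
    configs.update(snapshot)  # restore all devices in one step
    return False


running = {"r1": ["hostname r1"], "r2": ["hostname r2"]}
ok = apply_change(
    running,
    {"r1": ["ntp server 10.0.0.1"], "r2": ["ntp server 10.0.0.1"]},
    verify=lambda c: all("ntp server 10.0.0.1" in lines for lines in c.values()),
)
print(ok, running)
```

On real devices a rollback also has to undo what was pushed on the box itself, not just a local copy, so the snapshot would typically be the devices' saved configurations.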

Geek Mode

How is Interactive Automation different from Triggered Automation?

Triggered Automation runs without the aid of a human; instead, it is triggered by a third-party tool or event. Interactive Automation is used as part of an ongoing human workflow during troubleshooting.

Examples of Interactive Automation include an engineer entering a source and destination IP address to automatically map the path between two endpoints with A/B Path Calculator, or visualizing real-time interface errors across 60 interfaces with a Data View. The user interface for Interactive Automation is a Dynamic Map, designed to be intuitive and visual.
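To give a feel for what mapping a path between two endpoints involves, here is a heavily simplified hop-by-hop trace using longest-prefix-match lookups in per-device routing tables. It is an assumption-laden illustration, not how A/B Path Calculator works: it ignores ACLs, NAT, ECMP, VRFs, and policy routing, and the device names and tables are made up.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# device -> [(prefix, next-hop device)], where None means "directly connected"
RoutingTables = Dict[str, List[Tuple[str, Optional[str]]]]


def lookup(table: List[Tuple[str, Optional[str]]], dst: str) -> Optional[str]:
    """Longest-prefix match: return the next hop for dst from one routing table."""
    addr = ipaddress.ip_address(dst)
    best: Optional[Tuple[int, Optional[str]]] = None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    if best is None:
        raise LookupError(f"no route to {dst}")
    return best[1]


def trace_path(tables: RoutingTables, first_hop: str, dst: str, max_hops: int = 32) -> List[str]:
    """Follow next hops from first_hop until dst's subnet is directly connected."""
    path = [first_hop]
    for _ in range(max_hops):
        next_hop = lookup(tables[path[-1]], dst)
        if next_hop is None:
            return path
        path.append(next_hop)
    raise RuntimeError("hop limit exceeded (possible routing loop)")


TABLES: RoutingTables = {
    "bos-access": [("0.0.0.0/0", "bos-core")],
    "bos-core": [("10.20.0.0/16", "wan-edge"), ("0.0.0.0/0", "inet-edge")],
    "wan-edge": [("10.20.5.0/24", "nyc-core")],
    "nyc-core": [("10.20.5.0/24", None)],
}
print(" -> ".join(trace_path(TABLES, "bos-access", "10.20.5.10")))
```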

How does NetBrain collect data for diagnostics?

NetBrain collects data using CLI, SNMP, or APIs. When NetBrain is first set up, you provide the system with a list of read-only credentials, which NetBrain associates with each device in your network. When NetBrain needs data from a device, it uses the associated credentials to log in and issue CLI commands in real time. The CLI output is then “scraped” and analyzed automatically by the system.
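The “scraping” step can be illustrated with a few regular expressions over `show interfaces` text. This sketch skips credentials and the login session entirely and works on an abbreviated, Cisco IOS-style sample; real output differs across platforms and software versions, and NetBrain's own parsers are not shown here.

```python
import re
from typing import Dict

HEADER_RE = re.compile(r"^(\S+) is (.+?), line protocol is (\S+)")
ERRORS_RE = re.compile(r"(\d+) input errors, (\d+) CRC")

# Abbreviated, illustrative Cisco IOS-style output; real formats vary by platform.
SHOW_INTERFACES_SAMPLE = """\
GigabitEthernet0/1 is up, line protocol is up
  Hardware is iGbE, address is 5254.0012.3456 (bia 5254.0012.3456)
     12 input errors, 9 CRC, 0 frame, 0 overrun, 0 ignored
GigabitEthernet0/2 is administratively down, line protocol is down
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
"""


def parse_show_interfaces(output: str) -> Dict[str, Dict[str, object]]:
    """Scrape per-interface status and error counters from CLI text."""
    interfaces: Dict[str, Dict[str, object]] = {}
    current = None
    for line in output.splitlines():
        header = HEADER_RE.match(line)  # interface headers start at column 0
        if header:
            current = header.group(1)
            interfaces[current] = {"status": header.group(2), "protocol": header.group(3),
                                   "input_errors": 0, "crc": 0}
            continue
        errors = ERRORS_RE.search(line)
        if errors and current is not None:
            interfaces[current]["input_errors"] = int(errors.group(1))
            interfaces[current]["crc"] = int(errors.group(2))
    return interfaces


parsed = parse_show_interfaces(SHOW_INTERFACES_SAMPLE)
print([name for name, data in parsed.items() if data["crc"]])
```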

How does NetBrain display diagnostic data?

NetBrain uses a Dynamic Map as a user interface for visualizing IT data. Since a single network device or interface may have hundreds of attributes, the data can be turned on and off dynamically – with a technology called Data View. Power users can define what kind of data is available within a Data View and which Data Views are available from a given map. For example, if a device on the map is configured with BGP, a BGP Data View will be available.
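A Data View can be thought of as a named set of attributes plus a rule for when it is offered. The sketch below models that idea; the `DataView` class, the device record, and the attribute names are hypothetical and are not NetBrain's Data View definition format.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Device = Dict[str, Any]  # simplified device record


@dataclass(frozen=True)
class DataView:
    name: str
    applies_to: Callable[[Device], bool]  # decides whether the view is offered
    fields: Tuple[str, ...]  # attributes drawn as a layer on the map


DATA_VIEWS = [
    DataView("BGP", lambda d: "bgp" in d.get("protocols", ()), ("bgp_asn", "bgp_neighbors")),
    DataView("Interface Errors", lambda d: True, ("input_errors", "crc")),
]


def available_views(device: Device) -> List[str]:
    """Names of the Data Views offered for this device."""
    return [view.name for view in DATA_VIEWS if view.applies_to(device)]


def layer(device: Device, view: DataView) -> Dict[str, Any]:
    """Only the attributes this view overlays; everything else stays hidden."""
    return {name: device.get(name) for name in view.fields}


core = {"hostname": "bos-core-01", "protocols": ["bgp", "ospf"], "bgp_asn": 65001,
        "bgp_neighbors": 4, "input_errors": 0, "crc": 0}
print(available_views(core), layer(core, DATA_VIEWS[0]))
```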

How does NetBrain know if observed data is normal or an anomaly?

NetBrain creates and maintains a Golden Baseline across thousands of parameters for each device in your network. For example, NetBrain may learn that the core router in your Boston data center normally runs at 30% to 60% CPU and has 4 BGP neighbors. When you view this data on a Dynamic Map, any parameter that does not match the Golden Baseline is shown as an alert, which may provide a clue that something is wrong.

How does NetBrain create a Golden Baseline?

NetBrain performs a recurring snapshot, called a benchmark, across thousands of parameters in your network. Then, NetBrain uses AI techniques to look for trends across that data. For example, if NetBrain sees 7 consecutive benchmarks with CPU between 30% and 60% it may “assume” that this is the Golden Baseline. Later, if the CPU is 80%, this may trigger a Golden Baseline Alert. Users also have the ability to manually create and define Golden Baseline criteria.
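The CPU example above can be reduced to a deliberately simple rule: treat the range of the last seven benchmarks as normal and flag anything outside it. This is a stand-in for illustration only; NetBrain describes using AI techniques, which this min/max rule does not reproduce, and `learn_baseline` and `Baseline` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Baseline:
    low: float
    high: float

    def is_anomaly(self, value: float) -> bool:
        """True when a new observation falls outside the learned range."""
        return not (self.low <= value <= self.high)


def learn_baseline(benchmarks: Sequence[float], window: int = 7) -> Optional[Baseline]:
    """Treat the range of the last `window` benchmarks as normal.

    Returns None until enough benchmarks have been collected.
    """
    if len(benchmarks) < window:
        return None
    recent = benchmarks[-window:]
    return Baseline(low=min(recent), high=max(recent))


cpu = [35, 42, 58, 31, 60, 47, 30]  # seven consecutive CPU benchmarks, in percent
baseline = learn_baseline(cpu)
print(baseline, baseline.is_anomaly(80))
```

A manually defined Golden Baseline criterion would simply be a hand-built `Baseline(low, high)` used in place of the learned one.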

With so much data available to NetBrain, how do users make sense of it?

Data of a similar category can be grouped into a single Data View. For example, a BGP Data View may overlay the map with the relevant BGP configuration for each device and the number of BGP neighbors each device has, and highlight each interface configured with BGP in a distinct color. Power users can also define “Recommended Actions” for each Data View so that other users know where to look for associated data or actions.

Can I see a demo of Interactive Automation?
Here is a short video of Interactive Automation. To learn more, you can request a demo with one of our Solutions Engineers.

Proactive Automation

Aiming to “do better next time”, world-class teams hold post-mortem reviews to determine how to prevent a similar problem, or reduce its impact, in the future. Unfortunately, applying these lessons at scale is difficult.

The goal of Proactive Automation is to codify lessons learned from every incident and translate them into automation tasks which can be leveraged by the broader team in the future.

How Can Automation Help?

A history of the troubleshooting workflow is documented automatically

With NetBrain, all diagnostic steps and data from a given incident are preserved inside a runbook for review. Troubleshooting and documenting the troubleshooting process happen simultaneously and automatically. This documentation is invaluable in helping teams identify how they can do better next time.

Workloads are “shifted left”

The process by which a team empowers junior engineers to minimize escalations is known as “shifting workloads left”. When engineers document their workflows and share them as Executable Runbooks, this goal becomes much more attainable.

Executable Runbooks can be shared with the team, in the form of Interactive Automation, by offering them as “Recommended Actions” during troubleshooting. The same runbooks can be shifted even further left and configured to execute with zero human touch via Triggered Automation. Either way, shifting know-how and workloads to the left frees up senior network engineers and continuously reduces MTTR.

Geek Mode

How are runbooks created?
Runbooks can be created as a standalone process to document and share know-how. Or they can be created automatically as part of an ongoing troubleshooting process. Every task performed in NetBrain is automatically documented as a step within a runbook. At the end of a troubleshooting event, all steps within a runbook can be selected and saved to create a runbook template. This template can then be shared with the team for use in the future.
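Recording every task as a step and later saving a selection as a template can be modeled with two small classes. `Step`, `Runbook`, and `to_template` are hypothetical names; the sketch only shows the record-then-select idea, not NetBrain's runbook format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    action: str  # what the engineer did, e.g. "Data View: BGP"
    result: str = ""  # incident-specific output captured at the time


@dataclass
class Runbook:
    name: str
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, result: str = "") -> None:
        """Append each task as a step while troubleshooting."""
        self.steps.append(Step(action, result))

    def to_template(self, name: str, keep: List[int]) -> "Runbook":
        """Save the selected steps (by index) as a reusable template without their results."""
        return Runbook(name, [Step(self.steps[i].action) for i in keep])


incident = Runbook("INC0012345")
incident.record("Map path 10.1.1.10 -> 10.2.2.20", "4 hops")
incident.record("ping 10.2.2.20", "timeout")  # a dead end not worth reusing
incident.record("Data View: Interface Errors", "CRC errors on wan-edge Gi0/1")
template = incident.to_template("Slow App Triage", keep=[0, 2])
print([step.action for step in template.steps])
```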
How are traditional CLI activities documented in a runbook?
NetBrain offers a SmartCLI client, which works like a traditional CLI tool such as PuTTY. One difference is the ability to Document to NetBrain from within the SmartCLI client: any text within a CLI output can be sent to NetBrain, which documents the result to a runbook node. Once documented, NetBrain intelligently parses the text and identifies variables that it recognizes as discrete data. These variables can be used for automation.
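To show what turning CLI text into discrete variables means in practice, the sketch below extracts neighbor records from a `show ip bgp summary` style table. The sample text is illustrative Cisco IOS-style output, the parser assumes each neighbor fits on one line with a plain (asplain) AS number, and none of this reflects NetBrain's actual parsing engine.

```python
import ipaddress
from typing import Dict, List

# Illustrative Cisco IOS-style text, as it might be sent from a CLI session.
BGP_SUMMARY_SAMPLE = """\
BGP router identifier 10.0.0.1, local AS number 65001
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.2        4        65002    1234    1230       45    0    0 3d02h           12
10.0.0.3        4        65003       0       0        1    0    0 never    Active
"""


def parse_bgp_summary(text: str) -> List[Dict[str, object]]:
    """Turn neighbor rows into discrete variables; other lines are ignored."""
    neighbors = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        try:
            ipaddress.ip_address(parts[0])  # neighbor rows start with an IP address
        except ValueError:
            continue
        last = parts[-1]  # prefix count if Established, otherwise the session state
        neighbors.append({
            "neighbor": parts[0],
            "remote_as": int(parts[2]),  # assumes asplain AS numbers
            "state": "Established" if last.isdigit() else last,
            "prefixes": int(last) if last.isdigit() else 0,
        })
    return neighbors


rows = parse_bgp_summary(BGP_SUMMARY_SAMPLE)
print(sum(row["state"] == "Established" for row in rows), "of", len(rows), "neighbors up")
```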
After troubleshooting is complete, how are runbooks shared?
At any time, a NetBrain runbook can be saved as a template. You will have the opportunity to select which steps within the runbook to preserve and which to delete. Once the template is ready, it can be shared within a Runbook library, offered as a “Recommended Action” to users, or even set to trigger automatically with zero human touch.
How does Proactive Automation prevent problems from happening?
Proactive Automation is about “doing better next time”. If it took 4 hours to troubleshoot last time, will it take 4 hours next time? Proactive Automation is also about eliminating future problems. Automation can be scheduled to run against the entire network to find problems before end users experience them. For example, suppose you found a failover device that was not configured with the same QoS policy as its primary, which would cause a failover to fail. To ensure this problem doesn’t exist elsewhere in the network, you could run this diagnostic network-wide every Sunday at midnight.
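The QoS example can be expressed as a network-wide consistency check over primary/standby pairs. This is a hypothetical sketch: the device names and config snippets are made up, and `qos_policy` only looks for the first `service-policy output <name>` line rather than comparing policies per interface as a real check should.

```python
from typing import Dict, List, Optional, Tuple


def qos_policy(config_lines: List[str]) -> Optional[str]:
    """Return the first 'service-policy output <name>' found (simplified, not per-interface)."""
    for line in config_lines:
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["service-policy", "output"]:
            return parts[2]
    return None


def find_qos_mismatches(
    configs: Dict[str, List[str]], pairs: List[Tuple[str, str]]
) -> List[Tuple[str, str, Optional[str], Optional[str]]]:
    """Compare each primary/standby pair and report pairs whose output QoS policy differs."""
    mismatches = []
    for primary, standby in pairs:
        p, s = qos_policy(configs[primary]), qos_policy(configs[standby])
        if p != s:
            mismatches.append((primary, standby, p, s))
    return mismatches


CONFIGS = {
    "bos-wan-1": ["interface Gi0/0", " service-policy output WAN-QOS"],
    "bos-wan-2": ["interface Gi0/0"],  # standby is missing the policy
    "nyc-wan-1": ["interface Gi0/0", " service-policy output WAN-QOS"],
    "nyc-wan-2": ["interface Gi0/0", " service-policy output WAN-QOS"],
}
print(find_qos_mismatches(CONFIGS, [("bos-wan-1", "bos-wan-2"), ("nyc-wan-1", "nyc-wan-2")]))
```

Run outside NetBrain, the weekly schedule from the example would correspond to a cron expression such as `0 0 * * 0` (minute 0, hour 0, Sunday).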
Can I see a demo of Proactive Automation?
Here is a short video of Proactive Automation. To learn more, you can request a demo with one of our Solutions Engineers.