Codify Tribal Knowledge to Avoid Overreliance on the Network Hero

By Steve Lamont April 9, 2018 5 minute read
Steve Lamont is a senior marketing manager at NetBrain Technologies. He has been an in-house content editor writing about operational and information technologies for over 10 years.

Imagine you’ve just come back from lunch and a network outage has brought your company to a standstill. You look over to your senior network engineer’s empty desk and remember he’s enjoying a vacation on the pink sands of Bermuda. Only he knows the affected part of the network well, so the rest of your team starts sifting through network data to find the issue.

This may seem far-fetched, but such a situation is a constant reality in many IT departments. The prevalence of tribal knowledge (information that’s held closely by a small group of people, or even just one person) results in situations like this every day. Tribal knowledge is detrimental to a high-performing technical team for several reasons:

  • There’s no common knowledge repository. All network diagrams and reports are kept on someone’s personal drive or buried in email threads, making it nearly impossible to troubleshoot an outage in real time.
  • The network hero is a single point of failure. If only one person has the expertise (the “network hero”) and his or her knowledge is not shared with other team members, network management and troubleshooting become far more difficult whenever that person is unavailable.
  • No one can know everything. Consider the common issue of a slow application. There could be a number of reasons for slowness, from network latency to an under-powered app server. The network is part of a larger collection of interdependent pieces, and no one engineer can know everything about it all. Knowledge-sharing among engineers and teams is crucial.

The goal here isn’t necessarily to hire an entire team of CCIEs (though that would be cool). It’s to arm engineers with the knowledge and tools they need to address incidents and work across knowledge silos. We need to avoid teams being paralyzed because one person is unavailable, as in the case of the vacationing engineer. Or worse, what if the network hero retires or gets another job? All that critical tribal knowledge walks out the door with him. The answer is to break down barriers and make information easily available, creating an entire team of network heroes through a culture of collaboration and knowledge-sharing.

Digitize Tribal Knowledge in Runbooks

NetBrain directly addresses the issue of knowledge-sharing and collaboration at both the incident and infrastructure levels. One of the core functions of Executable Runbooks is to enable teams to digitize best practices, policies, procedures and anything else that might otherwise be living in someone’s head, on handwritten notes or in a departmental playbook.

Executable Runbooks: Codify your troubleshooting tribal knowledge into Executable Runbooks. Making this expertise executable empowers every engineer on your team to be a “Network Hero” who can solve the tough problems.

Executable Runbooks can be used to automate repetitive tasks, such as network troubleshooting steps and compliance checks — without having to write a single line of code. Runbooks make knowledge transfer not only possible but quite easy.

When you get down to it, a Runbook is a series of steps or tasks. Exactly what those steps would be depends on what you want to do. Think about our network hero in Bermuda: based on his experience and tribal knowledge, he’d go through a methodical series of steps to pinpoint the issue. Problem is, of course, that only he knows where and what to check for probable causes, and only he knows which steps need to be taken and in what kind of logical, sequential order.

That’s where Runbooks come in. Each step he would take is delineated in the Runbook, and the associated task is automatically run with a mouse click. For example, your network hero might first rule out the most common causes, such as an interface speed or duplex mismatch. If that’s not it, he’d look for other usual suspects like high interface utilization. If he sees something there, he follows certain steps. If he doesn’t find trouble with high memory or CPU utilization, he takes another troubleshooting route.

With Runbooks, the network hero’s entire decision-making process is documented in “if-then” branches so the next best steps are automated for anyone on the team to follow. In the Runbook below, each successive step is a Qapp that performs one specific task.

Build “if/then” branches into Runbooks to automate next best steps for anyone on the team to follow – e.g., if errors are increasing on an interface, check for a duplex mismatch.
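To make the branching idea concrete, here is a minimal Python sketch of the kind of decision logic such a branch encodes. It is not NetBrain's API or how Qapps are built (Runbooks are assembled visually, without code); the `Branch` class, the `facts` dictionary keys, the 80 percent thresholds and the outcome strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Branch:
    """One if/then decision: a question plus what to do for each answer."""
    question: str
    test: Callable[[dict], bool]
    if_true: Union["Branch", str]   # next branch, or a final recommendation
    if_false: Union["Branch", str]


def walk(branch, facts):
    """Follow the branches for the given facts; return (path taken, outcome)."""
    path = []
    node = branch
    while isinstance(node, Branch):
        answer = node.test(facts)
        path.append((node.question, answer))
        node = node.if_true if answer else node.if_false
    return path, node


# Hypothetical decision tree mirroring the steps described in the text.
# The 80% thresholds are illustrative assumptions, not recommended values.
RESOURCES = Branch(
    "CPU or memory utilization high?",
    lambda f: f["cpu_pct"] > 80 or f["mem_pct"] > 80,
    "Investigate CPU or memory exhaustion on the device",
    "No common cause found: escalate",
)
UTILIZATION = Branch(
    "Interface utilization high?",
    lambda f: f["util_pct"] > 80,
    "Investigate traffic on the congested interface",
    RESOURCES,
)
MISMATCH = Branch(
    "Speed or duplex mismatch?",
    lambda f: f["local_duplex"] != f["peer_duplex"]
    or f["local_speed"] != f["peer_speed"],
    "Correct the speed/duplex mismatch",
    UTILIZATION,
)
ROOT = Branch(
    "Interface errors increasing?",
    lambda f: f["errors_delta"] > 0,
    MISMATCH,
    UTILIZATION,
)

if __name__ == "__main__":
    facts = {
        "errors_delta": 42, "local_duplex": "half", "peer_duplex": "full",
        "local_speed": 1000, "peer_speed": 1000,
        "util_pct": 12.0, "cpu_pct": 20.0, "mem_pct": 35.0,
    }
    path, outcome = walk(ROOT, facts)
    for question, answer in path:
        print(f"{question} {'yes' if answer else 'no'}")
    print("Next step:", outcome)
```

Keeping the tree as data rather than nested `if` statements is what lets someone other than the original expert read, review and extend it, which is the point the article makes about documenting the hero's decisions.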

Or maybe the next step is to issue various CLI commands to collect and analyze data. Runbooks enable you to automate CLI commands across multiple devices in one fell swoop, and then visualize the output within the context of the problem at hand right on a Dynamic Network Map.

QoS - Performance: Automatically issue CLI commands across multiple devices simultaneously and get the relevant data – like HSRP active/standby status or QoS queue drops – displayed right on the map.
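For readers curious what "the same command on many devices at once" amounts to, the following Python sketch fans one command out to several devices in parallel and summarizes the HSRP state per device. It is an illustration only and does not reflect how NetBrain collects data. The device names, the canned output lines and the `run_command` function are hypothetical stand-ins; a real implementation would replace `run_command` with an SSH or API call, and real `show standby brief` output has more columns than the simplified lines used here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, simplified output keyed by (device, command).
CANNED_OUTPUT = {
    ("dist-sw1", "show standby brief"): "Vl10 10 110 P Active local 10.0.10.3 10.0.10.1",
    ("dist-sw2", "show standby brief"): "Vl10 10 100 P Standby 10.0.10.2 local 10.0.10.1",
}


def run_command(device: str, command: str) -> str:
    """Hypothetical stand-in for an SSH/API call; returns canned text."""
    return CANNED_OUTPUT[(device, command)]


def collect(devices: list, command: str, max_workers: int = 8) -> dict:
    """Run one command on every device concurrently; map device -> raw output."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each device with its output.
        outputs = pool.map(lambda d: run_command(d, command), devices)
        return dict(zip(devices, outputs))


def hsrp_state(output: str) -> str:
    """Reduce a simplified 'show standby brief' line to active/standby/unknown."""
    tokens = output.split()
    if "Active" in tokens:
        return "active"
    if "Standby" in tokens:
        return "standby"
    return "unknown"


if __name__ == "__main__":
    raw = collect(["dist-sw1", "dist-sw2"], "show standby brief")
    for device, output in raw.items():
        print(device, hsrp_state(output))
```

The per-device summary is the kind of value that is far easier to interpret when placed next to the device on a map than when read out of separate terminal sessions.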

So, instead of relying on your network hero to be available 24×7, incorporate his or her expertise into automatically executed procedures with runbook automation tools. Runbooks can be customized to any workflow, and, best of all, you don’t need to write any complicated Python scripts. The NetBrain drag-and-drop visual programming environment allows you to codify network knowledge into lightweight applications. It’s automation without coding.

Check out how easy it is to use and build a Runbook in Matt Speidel’s blog post How Executable Runbooks Work.

 

Codifying tribal knowledge is just one way NetBrain automation makes the life of a network engineer easier. Discover others in the on-demand webinar (no registration or form to fill out) 5 Ways to Enhance Your Workflows with Automation.