December 18, 2017
Imagine a network outage in the middle of the workday at your organization. Now imagine that no one has any clue what to do because your senior network engineer is on vacation scuba diving in the South Pacific, and he or she is the only person who knows anything about the network. This may seem silly, but the reality is that the prevalence of tribal knowledge in IT departments results in scenarios like this all the time.
The term “tribal knowledge” refers to information that is held closely by a small group of people, possibly even just one person, and is either unintentionally or purposefully held back from the larger group. This is common practice in today’s IT departments and is detrimental to a high-performing technical team.
The individual components of our IT infrastructures are so intertwined that engineers need to work across teams in order to resolve problems. Consider the common issue of a slow application. Is it network latency that’s hurting application performance? Could an application server itself be under-powered? Could it even be the end-user’s computer that’s the problem?
Since the network is part of a larger collection of interdependent parts, and since no one engineer can know everything about all things, knowledge-sharing among engineers and teams is critical. Nevertheless, some keep information very closely guarded. In this way they may seem like the network hero, though there’s nothing heroic about keeping important information from the team.
This can manifest itself in many ways. Perhaps an IT department has only one network engineer, and all the network diagrams and spreadsheets are on his personal drive. Perhaps an IT department has a few network engineers, but they don’t share any information with the applications team. Maybe managers purposely keep information to themselves so they can ensure that they’re always the hero.
In my experience, IT departments deal with this problem in various ways (if at all). To some the answer is formal workflows. For others the answer is culture. Ultimately, with regard to tribal knowledge and the network hero, the answer isn’t to make sure any one person knows less, but that the rest of the team knows more.
The goal isn’t necessarily to hire an entire team of CCIEs, though that would be pretty cool. It’s to enable engineers with the knowledge and tools they need to address incidents and work across knowledge silos. The answer is to break down barriers and make information easily available.
If you’ve read The Phoenix Project, by Gene Kim, Kevin Behr, and George Spafford, you’re familiar with Brent, the technical bottleneck in the IT department of Parts Unlimited. In the story, Brent does nothing wrong per se, but he is the sole keeper of certain technical knowledge and a major single point of failure for the entire team. Brent is smart, willing to work, and quick to find solutions, but his priceless knowledge is in his head alone — leaving team members shooting in the dark when Brent isn’t around.
Over time, the solution presented to solve the problem of tribal knowledge at Parts Unlimited is to develop a culture of collaboration and knowledge-sharing, even if it hurts. The authors present a solution that doesn’t involve getting rid of Brent or tearing him down, but enabling the entire team to rise to his level.
In the novel, we’re presented with a fictitious IT department with idealistic scenarios and attitudes, so how can a real IT department deal with its own Brent?
It comes down to collaboration and knowledge-sharing, but the key is that it must be easy, consistent, and part of a team’s workflow. How many times have you started using an organizational tool such as Trello, Asana, or even Post-it notes, only to give up because it wasn’t helping? The problem isn’t with these tools – it’s with the discipline to use them.
NetBrain directly addresses this issue of knowledge-sharing and collaboration, both on an incident and infrastructure level. But what really brings it together is how NetBrain can fully integrate with an IT department’s existing systems in order to make knowledge-sharing easy, consistent, and part of the daily workflow.
First, NetBrain offers rich API integration, allowing any third-party system such as a ticketing system, monitoring tool, or IDS appliance to trigger NetBrain to go into action. NetBrain can integrate into an existing database, making an IT department’s CMDB always consistent and up-to-date. NetBrain easily becomes part of an IT department’s workflow.
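To make the idea of a third-party system triggering automation a little more concrete, here is a minimal sketch of what the sending side might look like. It is not NetBrain’s actual API: the endpoint URL, the payload fields, the `map_and_diagnose` action name, and the `build_trigger_request` helper are all hypothetical, chosen only to illustrate a ticket event being turned into an API call.

```python
import json
import urllib.request

# Hypothetical endpoint for illustration only; not a real NetBrain URL.
AUTOMATION_URL = "https://automation.example.com/api/trigger"


def build_trigger_request(ticket: dict, token: str) -> urllib.request.Request:
    """Turn a ticket event into an HTTP POST for an automation platform."""
    payload = {
        "ticket_id": ticket["id"],
        "device": ticket["device"],
        "summary": ticket.get("summary", ""),
        "action": "map_and_diagnose",  # hypothetical action name
    }
    return urllib.request.Request(
        AUTOMATION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Sample ticket event; a real ticketing system would supply this.
    event = {"id": "INC-1042", "device": "core-sw-01", "summary": "Slow application"}
    req = build_trigger_request(event, token="example-token")
    # The request is only built here; urllib.request.urlopen(req) would send it.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The point of the sketch is the workflow rather than the specific fields: the ticket itself kicks off the data gathering, so nobody has to remember to do it.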
Second, Dynamic Maps allow the network to document itself. Dynamic Maps are on-demand network diagrams which you can publish as a URL for the entire team to use. This provides an entire team the detailed and specific information about the network that’s all too often held tightly by one or a very few engineers.
Third, Executable Runbooks allow teams to digitize their best practices. Runbooks can be used to automate network troubleshooting steps and embed the resulting data inside the Runbook workflow. The Runbooks are then attached to the Dynamic Maps, making it easy to share findings and track results within the context of an incident.
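As a rough illustration of what “digitizing a best practice” can mean, the sketch below models a runbook as an ordered list of named checks whose findings are recorded alongside each step. This is a generic Python sketch, not how NetBrain implements Executable Runbooks: the `Runbook` class, the two checks, the device fields, and the 90 percent utilization threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# A check takes a device record (a plain dict here) and returns a finding.
Check = Callable[[dict], str]


@dataclass
class Runbook:
    name: str
    steps: list[tuple[str, Check]] = field(default_factory=list)

    def step(self, title: str):
        """Decorator that registers a function as the next step."""
        def register(fn: Check) -> Check:
            self.steps.append((title, fn))
            return fn
        return register

    def run(self, device: dict) -> list[dict]:
        """Run every step in order and keep each finding with its step title."""
        return [{"step": title, "finding": fn(device)} for title, fn in self.steps]


slow_app = Runbook("Slow application triage")


@slow_app.step("Check interface errors")
def check_errors(device: dict) -> str:
    errors = device.get("input_errors", 0)
    return "interface errors present" if errors > 0 else "no interface errors"


@slow_app.step("Check link utilization")
def check_utilization(device: dict) -> str:
    pct = device.get("utilization_pct", 0)
    # 90 percent is an arbitrary threshold chosen for this example.
    return "link saturated" if pct >= 90 else "utilization normal"


if __name__ == "__main__":
    # Sample data; a real tool would collect this from the device itself.
    sample = {"hostname": "core-sw-01", "input_errors": 12, "utilization_pct": 45}
    for result in slow_app.run(sample):
        print(f"{result['step']}: {result['finding']}")
```

Once the senior engineer’s usual sequence of checks is written down this way, anyone on the team can run it and see the same findings in the same order.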
All the information about the network already exists inside the devices, so a programmable solution that can gather that information and present it to the team enables any engineer to make and roll back changes. Instead of an on-call engineer struggling to hear the senior engineer over a satellite phone from a beach in Fiji, anyone on the team is empowered to work the issue – thus eliminating tribal knowledge and giving the entire team a chance to be a network hero.