Troubleshooting Is a Team Sport: Automation That Promotes Collaboration

By Steve Lamont | September 10, 2019 | 3 minute read
Steve Lamont is a senior marketing manager at NetBrain Technologies. He has been an in-house content editor writing about operational and information technologies for over 10 years.

Ed. note: The following transcript is drawn from the on-demand recording of NetBrain’s Just in Time Automation for IT Operations webinar (no registration or form required). Jason Baudreau, NetBrain VP of Marketing, is your host.

Let’s talk about how automation can help teams collaborate on incident response.

When collaboration or escalation is necessary, there are a lot of inefficiencies. Engineers are often duplicating efforts because they’re just not on the same page or they’re not aware of what the other team member is doing. And the other thing we see is finger-pointing between teams. That’s not uncommon at all — whether it’s the application team and the network team, the security team, the server team — there’s always a lot of finger-pointing about whose problem it is.

[Figure: IT Events workflow --> IT Challenges]

If you look at mean time to repair (MTTR), it’s really the sum of two things: what we call MTTI, which I think is the bulk of the challenge, and the repair itself, which is less time-intensive. When I say MTTI, I’m actually referring to two things:

  • The mean time to identify a problem. In the context of collaboration, I’m going to talk about this in terms of escalation and handoff.
  • But also MTTI can be mean time to innocence. We know the network is guilty until proven innocent. Unfortunately, this challenge falls on the network team to prove that innocence to other teams – the app teams, server teams, for example. And that’s not always easy.

When I talk to engineers, this is a familiar challenge. . . . Do other teams assume that every application slowness issue is really a network problem? What percentage of the time is it really the network?

[Figure: MTTI + Repair = MTTR, where MTTI stands for Mean Time To Identify or Mean Time To Innocence]
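
To make the MTTI + Repair = MTTR relationship concrete, here is a minimal Python sketch. The `Incident` class, the `mttr_breakdown` function, and the timings are all made up for illustration; they do not come from the webinar or from any NetBrain product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    """One incident, split into the two phases described in the talk (hypothetical model)."""
    identify_minutes: float  # time spent finding the cause, or proving innocence
    repair_minutes: float    # time spent applying the fix once the cause is known


def mttr_breakdown(incidents):
    """Return mean identify time, mean repair time, and their sum (MTTR)."""
    mtti = mean(i.identify_minutes for i in incidents)
    repair = mean(i.repair_minutes for i in incidents)
    return {"mtti": mtti, "repair": repair, "mttr": mtti + repair}


# Illustrative, invented timings: identification dominates each incident.
incidents = [Incident(90, 15), Incident(45, 30), Incident(120, 15)]
breakdown = mttr_breakdown(incidents)
print(breakdown)
```

With these invented numbers, the identify phase accounts for most of the total, which is the pattern the speaker describes.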

Let’s look at how automation can address these two challenges. The answer we came up with is to automatically document user activities inside a runbook. The runbook is embedded within the map URL so that everyone can see what their colleagues are doing as they troubleshoot alongside them. This can also be used for escalation, so that the Tier-2 engineer can see what Tier 1 has already done. Basically, a network map can help everyone understand who did what, when, and with what result. Again, it gets everybody on the same page.
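
The idea of a shared, automatically documented runbook can be sketched in a few lines of Python. This is only an illustration of the concept (who did what, when, and with what result). The `Runbook` and `RunbookEntry` classes, the example URL, and the device and engineer names are hypothetical and are not NetBrain's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RunbookEntry:
    """One documented troubleshooting step (hypothetical structure)."""
    engineer: str
    action: str
    result: str
    # Recorded automatically so nobody has to write down when a step ran.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Runbook:
    """A shared log attached to one map URL (hypothetical structure)."""
    map_url: str
    entries: list = field(default_factory=list)

    def record(self, engineer, action, result):
        """Append a step to the shared log and return it."""
        entry = RunbookEntry(engineer, action, result)
        self.entries.append(entry)
        return entry

    def already_done(self, action):
        """Let a colleague check whether a step was run before repeating it."""
        return [e for e in self.entries if e.action == action]

    def handoff_summary(self):
        """One line per step, in order, for a Tier 1 to Tier 2 escalation."""
        return [f"{e.engineer}: {e.action} -> {e.result}" for e in self.entries]


# Placeholder URL and names, invented for this example.
runbook = Runbook("https://example.invalid/map/123")
runbook.record("tier1-alice", "ping core-sw1", "0% packet loss")
runbook.record("tier1-alice", "check interface errors on core-sw1", "no CRC errors")
summary = runbook.handoff_summary()
print("\n".join(summary))
```

A Tier-2 engineer picking up the escalation could call `already_done("ping core-sw1")` before repeating the check, which is the duplicated effort the shared record is meant to avoid.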

[Figure: IT Workflow - Work Together, on the Same Dynamic Map - NetBrain]

See what collaboration looks like when workflows are automatically documented and shared across the team.

Note: The live demo of how Executable Runbooks facilitate and improve collaboration begins at 2:24.

Other “chapters” from the webinar:

Make Software, Not Humans, the Middleware (overview)

How Automation Augments a Typical NetOps Workflow (introduction)

Discussion and demo of Event-Triggered Network Automation

Simplifying Network Complexity with Interactive Automation (with demo)

Make Safer Changes: Automated Change Validation (with demo)

Or watch the entire Just in Time Automation for IT Operations webinar on-demand recording (no registration or form required).