Automation in Minutes: Top-10 Assessments to Prevent Outages

by Valerie Dimartino Apr 11, 2024

Downtime is expensive. More than half (54%) of the respondents to the 2023 Uptime Institute data center survey say their most recent significant, serious, or severe outage cost more than $100,000, with 16% saying that their most recent outage cost more than $1 million. 

“Failure is not an option,” the famous line from the movie Apollo 13, is one of the most recognizable movie taglines of all time.

In network operations, it’s the same mindset. Money and reputation are on the line. Failure is not an option.

Uptime Institute data suggests that each year there are, on average, 10 to 20 high-profile IT outages or data center events globally that cause serious or severe financial loss, business and customer disruption, reputational loss, and, in extreme cases, loss of life.

So why are we still so vulnerable given all the redundancy built into networks? Why do we continue to rely so heavily on manual processes and reactive troubleshooting? Network engineers spend countless hours putting in place the foundation for service delivery, yet there’s little or no regular enforcement. Only when a problem is reported are the wheels of troubleshooting put into (slow) motion.

The answer is that we aren’t being proactive enough, and that stems from a lack of focus on network automation across the industry. We let the same problems happen over and over again, even though we know how to solve them, because we simply lack the mechanisms to harness and apply this knowledge automatically across hybrid networks.

A Major Outage Spurs Change at Saudi Telecom (stc)

In 2021, a critical application at stc suffered a major service disruption. It took nearly a month of troubleshooting across network operations, servers, applications, and security teams to identify the cause and restore service. This costly outage highlighted the need for better visibility and a more strategic approach to incident management. As a result, stc’s Group CTO pushed for an organization-wide solution that provides end-to-end visibility and automates incident management across infrastructure and applications.

Imagine capturing your engineers’ expertise and applying it proactively across your entire network without coding. Network automation is helping network operations react faster, but until today (spoiler alert) it hasn’t been advanced enough to apply that knowledge proactively and easily across the entire network. What if we could harness the vast knowledge of our network engineers and store it for use by an automation platform?

Every day, network operations teams manually assess the network for drift, compliance, health, and change. What if engineers could run these assessments regularly with the help of automation?
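To make the idea of an automated drift assessment concrete, here is a minimal sketch in Python. It is not how any particular platform implements it; the function name `config_drift` and the sample configurations are hypothetical, and it assumes a device's configuration can be treated as an unordered set of lines, which ignores ordering and section hierarchy.

```python
def config_drift(golden: str, running: str) -> dict[str, list[str]]:
    """Compare a running config against a golden baseline.

    Simplifying assumption: each config is an unordered set of lines,
    so line order and nested sections are ignored.
    """
    # Normalize whitespace and drop blank lines before comparing.
    golden_lines = {line.strip() for line in golden.splitlines() if line.strip()}
    running_lines = {line.strip() for line in running.splitlines() if line.strip()}
    return {
        # Lines the baseline requires but the device no longer has.
        "missing": sorted(golden_lines - running_lines),
        # Lines on the device that the baseline does not contain.
        "unexpected": sorted(running_lines - golden_lines),
    }


# Hypothetical sample data for illustration only.
GOLDEN = """hostname core-1
ntp server 10.0.0.1
snmp-server community ops RO"""

RUNNING = """hostname core-1
ntp server 10.0.0.9
snmp-server community ops RO"""

if __name__ == "__main__":
    print(config_drift(GOLDEN, RUNNING))
```

Run on a schedule against every device, a check like this would flag the changed NTP server before anyone reports a problem, which is the proactive posture described above.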