The Perils of a Network Upgrade

By Eyvonne Sharp, December 7, 2017
Eyvonne Sharp is a network architect for a Fortune 100 healthcare enterprise. She's a co-founder of Network Collective, a bi-weekly video roundtable for network engineers. Before working in the enterprise, Eyvonne spent 10 years working for small VARs and integrators in the SMB space. Visit her website at www.esharp.net.

The long-awaited change window for your network upgrade has arrived. Managers stack pizza on the conference room table, snacks are scattered throughout the office, and your changes have been tested as thoroughly as possible.

Your team has been planning code upgrades to your network core for weeks. It’s been challenging, to say the least. In the last few years, your mid-size enterprise has invested significant resources to increase the redundancy and stability of the network core. You’ve been working hard to consolidate a network that began as a string of daisy-chained switches, ad-hoc configurations, and undocumented network surprises.

Although you’ve requested budget to create a lab environment to mirror your production core, the project remains unfunded. Instead, you’ve used every other resource at your disposal. You’ve read through the release notes document for the new code version and researched relevant issues. You’ve reached out to peers who have performed similar upgrades in the past. You’ve backed up the configs on every device on the network. You’ve talked with the vendor SE to ensure you’re moving to a stable version of code.

On top of all this, you’ve opened a proactive ticket with the vendor to expedite support should you need help in the middle of the night. As an added measure, you’ve downloaded relevant documentation and verified software images on your local drive in case you lose Internet access during the upgrade.

In preparation, you’ve documented the changes you need to make, step by step. You’ve developed a test for every incremental change, to be certain the network behaves as expected. You’ve determined key checkpoints along the way where you need to evaluate your progress.

Immediately before the change window starts, you grab a copy of the ARP table, the routing table, and the MAC address table from the devices you are upgrading, so you can compare after the reboot.
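
That before-and-after comparison is easy to script. Here is a minimal Python sketch, assuming each table has been saved as plain text with one entry per line, and that the first two columns (for example, IP address and MAC address) identify an entry. The function names and the sample data are illustrative and do not match any particular vendor's output format.

```python
def parse_entries(text, key_fields=2):
    """Return the set of entries in a saved table.

    Only the first `key_fields` whitespace-separated columns are kept, so
    volatile columns such as age timers do not show up as differences.
    """
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= key_fields:
            entries.add(tuple(fields[:key_fields]))
    return entries


def diff_snapshots(before, after, key_fields=2):
    """Compare two saved tables and list entries that vanished or appeared."""
    pre = parse_entries(before, key_fields)
    post = parse_entries(after, key_fields)
    return {"missing": sorted(pre - post), "new": sorted(post - pre)}


# Hypothetical ARP snapshots taken before and after the reload.
arp_before = """10.0.0.1 aaaa.bbbb.0001 Vlan10
10.0.0.2 aaaa.bbbb.0002 Vlan10"""
arp_after = """10.0.0.1 aaaa.bbbb.0001 Vlan10
10.0.0.3 aaaa.bbbb.0003 Vlan10"""

arp_diff = diff_snapshots(arp_before, arp_after)
for entry in arp_diff["missing"]:
    print("missing after reload:", entry)
for entry in arp_diff["new"]:
    print("new after reload:", entry)
```

The same function can be pointed at the routing table or the MAC address table by adjusting `key_fields` to the columns that identify an entry in that table.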

You are as prepared as you can possibly be.

Everyone’s in position. You’ve executed your plan and it’s time to see if your hard work will pay off.  You enter the seven characters that will determine the fate of your night. . . .

Reload<enter>

You wait.

This scenario is familiar to every network engineer who has worked in the enterprise. Most of us have experienced the thrill of leaving the office in the wee hours of the morning with a successful network upgrade under our belts. We’ve also experienced the agony when calls come in with unexpected impacts, strange behaviors, or seemingly unrelated application errors.

So why do things still go wrong even when we did everything we could to validate the change beforehand?

Monitor Multicast Changes

One of the great challenges in networking is that we have very little control over the devices and application traffic that ride our networks. Even when we have complete control over network configuration, we don’t always have control over network state. What kinds of state conditions can cause problems during an upgrade?

  • Servers with protocols enabled that you were unaware of — multicast is a frequent culprit.
  • Flows that follow a different path through the network after a convergence event.
  • Devices you were unaware of that can form neighbor relationships.
  • Poorly timed circuit flaps, power outages, or hardware failures that are outside of your control.
  • Malformed packets which may trigger a bug in your network operating system.
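
Two of these conditions, changed paths and unexpected neighbors, can be checked with simple comparisons once you have the data. The sketch below is a minimal Python illustration, assuming you have already collected a prefix-to-next-hop mapping before and after the change, plus a list of the neighbors you expect to see. The function names, device names, and addresses are hypothetical.

```python
def changed_next_hops(before, after):
    """Return prefixes present in both tables whose next hop differs.

    `before` and `after` map a prefix string to a next-hop string.
    """
    return {
        prefix: (before[prefix], after[prefix])
        for prefix in before.keys() & after.keys()
        if before[prefix] != after[prefix]
    }


def unexpected_neighbors(expected, observed):
    """Return neighbors that formed an adjacency but were not planned for."""
    return sorted(set(observed) - set(expected))


# Hypothetical data gathered before and after a convergence event.
routes_before = {"10.1.0.0/16": "192.0.2.1", "10.2.0.0/16": "192.0.2.1"}
routes_after = {"10.1.0.0/16": "192.0.2.1", "10.2.0.0/16": "192.0.2.9"}

for prefix, (old, new) in changed_next_hops(routes_before, routes_after).items():
    print(f"{prefix}: next hop moved from {old} to {new}")

for name in unexpected_neighbors({"core-1", "core-2"}, {"core-1", "core-2", "lab-sw-7"}):
    print("unexpected neighbor:", name)
```

A changed next hop is not necessarily a problem, but it tells you which flows to test first.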

In short, the more you know about your network, the more you can do to prepare, plan, and prevent problems during an upgrade. NetBrain’s Dynamic Network Mapping can help you visualize the network in real time and discover your full network topology. You can visualize traffic flows through your network to understand where and how problems may arise. You can discover misconfigurations that do not have an impact until traffic fails over to an unused link.

As you plan your network upgrade, you can build predefined validation tests that can be automated as part of Executable Runbooks. These tools not only provide critical and timely information during your upgrade, they also build confidence with your leadership team, because you can provide specific test plans and data to prove your success.
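
To make the idea of predefined validation tests concrete, here is a minimal, vendor-neutral Python sketch. It is not NetBrain's API or runbook format. It only assumes that each test can be written as a function that returns True when the network behaves as expected. The `Check` class, `run_checks` function, and the sample checks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the network behaves as expected


def run_checks(checks: List[Check]) -> List[Tuple[str, bool]]:
    """Run every check and record pass or fail.

    A check that raises an exception is recorded as a failure so that one
    broken test does not stop the rest of the plan from running.
    """
    results = []
    for check in checks:
        try:
            passed = bool(check.run())
        except Exception:
            passed = False
        results.append((check.name, passed))
    return results


# Placeholder checks; real ones would query devices or monitoring systems.
checks = [
    Check("core routes present", lambda: True),
    Check("no unexpected neighbors", lambda: False),
]

report = run_checks(checks)
for name, passed in report:
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

A report like this, saved at each checkpoint, is the kind of specific test data you can hand to your leadership team after the change window.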