April 28, 2017
I recently had an opportunity to chat with Todd Bristol, Principal Design Architect at Move, Inc. to discuss the growth in enterprise networks, the limitations of tools that manage these evolving networks, and how NetBrain helps. Todd is a very seasoned network architect, having managed complex enterprise networks for over 20 years. Move, Inc., a subsidiary of News Corp, is a leading provider of online real estate services and operates the Move network of real estate websites and mobile experiences for real estate professionals.
Below is an excerpt of my conversation with Todd.
Question #1: What do you see as the top challenge or limitation with the network management tools today?
Designing and implementing new technologies introduces a new set of challenges and complexities. One big challenge is the ability to conduct deep network design analysis and understand failure events before a design is deployed in the production environment. There is zero room to figure out the challenges in production.
To mitigate this challenge, most people have two common environments: (a) a physical production network and (b) a virtual sandbox environment for learning. Network management tools usually run in the production environment, while the sandbox is used for planning and testing. In most cases, network management packages are purchased with the intention of monitoring, tuning, and benchmarking production environments. But we seldom experience the personality of an outage, because after all you don’t want outages happening in production. Likewise, network simulation packages run in isolated virtual environments, separate from production networks.
While virtual network environments present a long-awaited sandbox for engineers to become more familiar with commands, protocols, and technologies without having to spend money on or support stacks of physical hardware, they generally fall short when it comes to providing real-time analysis. This is because the vast majority of platforms and tools are significantly limited in their ability to show traffic engineering, compare route and ARP tables, and digest and render data in real time for truly large networks with stacks of physical and virtual devices.
Question #2: How are you currently using NetBrain? Describe the top use-cases in your environment.
Last year we were given the challenge of connecting two additional 10G circuits between our AWS environment and our data center. The design shown below is what we were eventually going to build out, focusing on AWS WEST (to be copied to AWS EAST).
Figure 1: A high-level design view showing the connectivity between the Datacenter and AWS
We deployed NetBrain in our virtual sandbox environment, using Cisco’s Virtual Internet Routing Lab (VIRL). NetBrain leverages the lab built out in VIRL, which was designed to mirror our production network. We used this environment to understand the live network’s design and performance, and to anticipate the impact a change will have on the live network before it can cause a problem.
Figure 2: Lab built out in VIRL designed to simulate the connectivity between the Datacenter and AWS
The problem comes when testing the methodology for switching from one circuit to another during a failure. Typically, one has to go and gather all the information manually. We took NetBrain and pointed it to VIRL running on Packet, which is basically VIRL running in the cloud. NetBrain effortlessly reports the state of my environment as it changes. We built this out so that normal traffic flows through the western path based on the routing design. In the event of a failure on the western path, traffic flows through the eastern path.
NetBrain provides real-time visibility into the list of devices that have changed and the number of changes made. In this case, we introduced just one change, but we can see that it actually resulted in four changes on one of the routers. We also see that another router downstream, which had nothing to do with the change and was far away from it, was impacted as well, with its ARP, CDP, and route tables affected.
Figure 3: Validate fault impact across multiple devices
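To make the idea of "one change, several affected devices" concrete, here is a minimal sketch of comparing table snapshots taken before and after a change. It is not NetBrain's implementation or API; the snapshot format, the device names, and the table entries are all hypothetical and exist only for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: device name -> table name -> set of entries.
Snapshot = Dict[str, Dict[str, Set[str]]]
# Per device and table: (entries added, entries removed).
Changes = Dict[str, Dict[str, Tuple[Set[str], Set[str]]]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> Changes:
    """Return only the devices and tables whose entries differ."""
    changes: Changes = {}
    for device in sorted(set(before) | set(after)):
        old_tables = before.get(device, {})
        new_tables = after.get(device, {})
        for table in sorted(set(old_tables) | set(new_tables)):
            old = old_tables.get(table, set())
            new = new_tables.get(table, set())
            added, removed = new - old, old - new
            if added or removed:
                changes.setdefault(device, {})[table] = (added, removed)
    return changes


def count_changes(changes: Changes) -> Dict[str, int]:
    """Count added plus removed entries per changed device."""
    return {
        device: sum(len(added) + len(removed) for added, removed in tables.values())
        for device, tables in changes.items()
    }


# Made-up sample data: the change is on edge-rtr, yet downstream-rtr moves too.
BEFORE: Snapshot = {
    "edge-rtr": {
        "route": {"10.20.0.0/16 via west", "0.0.0.0/0 via isp"},
        "arp": {"10.0.0.2 at aa:aa:aa:aa:aa:02"},
    },
    "downstream-rtr": {"route": {"10.20.0.0/16 via edge-west"}},
    "unchanged-rtr": {"route": {"10.30.0.0/16 via core"}},
}
AFTER: Snapshot = {
    "edge-rtr": {
        "route": {"10.20.0.0/16 via east", "0.0.0.0/0 via isp"},
        "arp": set(),
    },
    "downstream-rtr": {"route": {"10.20.0.0/16 via edge-east"}},
    "unchanged-rtr": {"route": {"10.30.0.0/16 via core"}},
}

for device, total in count_changes(diff_snapshots(BEFORE, AFTER)).items():
    print(f"{device}: {total} changed entries")
```

Using sets keeps the comparison independent of entry order, so only genuine additions and removals are reported, and devices with identical tables drop out of the result entirely.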
Now that the change has been made, we can see the expected traffic flow. At a high level, we can see that the network is behaving as designed after we broke the circuit.
Figure 4: Validate traffic flow after induced failure
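The failover behavior being validated here (traffic prefers the western path and moves to the eastern path when the western circuit breaks) can be sketched as a simple check. This is only an illustration under assumed names: the `Path` class, the preference values, and the link names are hypothetical, and real path selection is done by the routing protocols in the lab, not by code like this.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Path:
    name: str
    preference: int            # assumed convention: lower value is preferred
    links: Tuple[str, ...]     # links the path depends on (hypothetical names)


def select_path(paths: List[Path], failed_links: Set[str]) -> Optional[Path]:
    """Pick the most preferred path that uses no failed link, if any."""
    usable = [p for p in paths if not failed_links.intersection(p.links)]
    return min(usable, key=lambda p: p.preference, default=None)


# Hypothetical design: west is primary, east is the backup.
PATHS = [
    Path("west", preference=10, links=("dc-to-aws-west",)),
    Path("east", preference=20, links=("dc-to-aws-east",)),
]

for failed in (set(), {"dc-to-aws-west"}, {"dc-to-aws-west", "dc-to-aws-east"}):
    chosen = select_path(PATHS, failed)
    label = chosen.name if chosen else "no path available"
    print(f"failed links {sorted(failed)}: {label}")
```

A check like this states the expected outcome of each induced failure up front, which is what the traffic flow on the map is then compared against.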
The ability to build and test configurations within VIRL is impressive by itself, but to also be able to use the features of NetBrain to graphically validate routing tables, traffic management, and configuration differences is pretty huge. At the end of the day, what it does for us is that we can now actually experience an outage and get more intimate with it, which helps us understand the complexity associated with new technology.
To learn more, please visit: https://learningnetwork.cisco.com/docs/DOC-30976
Question #3: How would you summarize the key values offered by NetBrain?
NetBrain has been extremely beneficial in helping us validate network design, determine performance hotspots, and evaluate deployment scenarios, including the effects of network changes on the devices. The two key benefits are:
Improving Network Efficiency and Availability
NetBrain provides Move, Inc. the ability to understand and comprehensively validate their network design. It provides real-time visibility into their design and helps to identify serious routing issues caused by network changes. Also, it helps to graphically visualize and validate performance hotspots caused by different network scenarios and traffic patterns. The platform can highlight the changes and congestion points on the network map and ultimately tie the problem to the network design.
Accelerating Access to Applications and Services
Move, Inc. has had some challenges managing applications. NetBrain helped to improve operational preparedness by validating accessibility to network applications and services. It helped to improve the stability of the network by identifying the points of vulnerability and resolving them before they could affect the organization.
After a long and engaging discussion, Todd concluded, “NetBrain has completely revolutionized the way I approach network design staging and analysis. Like the fire department, it can be used for practice fire drills to understand what failures look like in order to better design and alert around them.”
As a network engineer, I found it gratifying and a great learning experience to talk about changing enterprise networks and the tools that manage them. I’m very appreciative of the time with Todd and look forward to a stronger partnership between Move, Inc. and NetBrain.