Automating BGP Configuration at Scale

By Phillip Gervasi, Apr 10, 2018

Manually configuring the specific attributes of BGP peers on dozens of routers doesn’t make sense anymore, and it makes even less sense on a large scale. Much of the cost associated with maintaining an enterprise network is operational expense, so why are we still spending so much time on tasks that server administrators figured out how to automate decades ago?

One of the strengths of the Border Gateway Protocol, or BGP, is that it’s highly tunable: an engineer has many settings and configuration options to work with. BGP differs from other routing protocols in that it’s a path vector protocol rather than distance vector or link state. That means BGP can base path selection decisions on more factors, such as local preference, AS path length, and MED, than a typical IGP like OSPF or EIGRP.

Because of its tunability, BGP has been making its way from the WAN into the data center as a way to segment racks of servers or even individual servers into separate layer 3 domains. Consider that one robust ESXi host can have dozens of servers running on it at the same time. Not only is BGP ubiquitous in the WAN, it’s becoming a standard in the data center as well.

Whether an engineer is using BGP in the data center to route to individual hypervisors or in the WAN to route to networks all over the world, the configuration can become complex very quickly. For example, a common network design is to use multi-protocol BGP over an MPLS core using multiple VRFs and one or more IGPs with route redistribution. In a large network, this can turn into literally hundreds of lines of configuration per router to create peers, advertise prefixes correctly, secure connections, and pass traffic using specific paths.

Imagine copying this kind of configuration onto router after router and having to remember exactly which IP address to change, which loopback to use, which ACLs need to be flipped, and which prefix-lists to modify. Automation makes this process dramatically more efficient and removes the risk that comes with a bleary-eyed engineer staring at a few dozen PuTTY windows.
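To make the copy-and-paste problem concrete, here is a minimal sketch of the alternative: keep the per-router differences as data and render the configuration from it. The `Router` and `Neighbor` classes, the hostnames, the addresses, and the AS numbers are all hypothetical, and the output only imitates IOS-style syntax. It is not how NetBrain or any particular tool generates configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Neighbor:
    address: str             # peer IP address
    remote_as: int           # peer AS number
    update_source: str = ""  # e.g. "Loopback0"; empty means "not set"


@dataclass
class Router:
    hostname: str
    local_as: int
    router_id: str
    neighbors: List[Neighbor] = field(default_factory=list)


def render_bgp(router: Router) -> str:
    """Render an IOS-style BGP stanza for one router from structured data."""
    lines = [
        f"router bgp {router.local_as}",
        f" bgp router-id {router.router_id}",
    ]
    for n in router.neighbors:
        lines.append(f" neighbor {n.address} remote-as {n.remote_as}")
        if n.update_source:
            lines.append(f" neighbor {n.address} update-source {n.update_source}")
    return "\n".join(lines)


# Hypothetical inventory entry: one iBGP peer over loopbacks, one eBGP peer.
r1 = Router(
    hostname="BGP-R1",
    local_as=65001,
    router_id="10.255.0.1",
    neighbors=[
        Neighbor("10.255.0.2", 65001, "Loopback0"),
        Neighbor("192.0.2.2", 65002),
    ],
)
config = render_bgp(r1)
print(config)
```

The only thing that changes from router to router is the inventory entry, so there is no address or loopback to remember to edit by hand.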

We run into the same problem when building a full mesh iBGP network at scale. Every BGP speaker in the full mesh must peer with every other speaker, so each one needs the complete set of neighbor statements. When you add in the complexity of additional routing protocols needed for reachability, this becomes an adventure in copying and pasting.
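The scale of the full mesh problem is easy to quantify: n speakers need n(n-1)/2 sessions, and every router carries statements for the other n-1. The sketch below counts the sessions and generates one router's neighbor statements. The router names, loopback addresses, AS number, and the choice of `Loopback0` as the update source are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def session_count(n: int) -> int:
    """Number of iBGP sessions in a full mesh of n speakers: n(n-1)/2."""
    return n * (n - 1) // 2


def full_mesh_sessions(loopbacks: Dict[str, str]) -> List[Tuple[str, str]]:
    """Every unordered pair of routers that must share an iBGP session."""
    return list(combinations(sorted(loopbacks), 2))


def neighbor_lines(hostname: str, local_as: int, loopbacks: Dict[str, str]) -> List[str]:
    """IOS-style neighbor statements one router needs to reach all the others."""
    lines = []
    for name, address in sorted(loopbacks.items()):
        if name == hostname:
            continue  # a router does not peer with itself
        lines.append(f" neighbor {address} remote-as {local_as}")
        lines.append(f" neighbor {address} update-source Loopback0")
    return lines


# Hypothetical four-router mesh.
LOOPBACKS = {
    "R1": "10.255.0.1",
    "R2": "10.255.0.2",
    "R3": "10.255.0.3",
    "R4": "10.255.0.4",
}

print(session_count(len(LOOPBACKS)))   # 6 sessions for 4 routers
print(session_count(20))               # 190 sessions for 20 routers
print("\n".join(neighbor_lines("R1", 65001, LOOPBACKS)))
```

Adding a router to the inventory regenerates every other router's statements, which is exactly the part that goes wrong when it is done by hand.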

Challenges to Automating BGP

In extremely large networks, an IT team may have a couple of sharp engineers using on-box EEM scripts or off-box Python scripts. Historically, however, most network devices haven’t supported much in the way of on-box or off-box programmability, so network automation never took off.

This is finally changing, though. For a while, network operators have been longing for an easier way to configure large numbers of devices with complex configurations, and vendors are finally responding. Though this is great news, there are some major hurdles to face.

  1. First, most networks run a variety of platforms, including some legacy devices and devices from multiple vendors. Creating some homegrown scripts to manage this type of environment is challenging to say the least, and as any programmer knows, maintaining those scripts through hardware refreshes and network changes is often neglected.
  2. Second, even if a network runs only a small variety of platforms from the same vendor, programmability options can still differ considerably from software version to software version and from device to device. For example, Cisco’s NX-OS offers on-box Python and bash as well as open APIs. Cisco IOS-XE is moving in that direction, but it’s not where NX-OS is yet.
  3. Third, the sheer cost of developers needed to create and maintain custom scripts for the gamut of devices we might be running doesn’t solve the problem of inefficiency and high operational expense. In fact, it may even exacerbate it. What we need is a dynamic solution that covers a variety of platforms, a variety of software versions, and a variety of programmability options.
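The first hurdle shows up the moment a homegrown script meets a second vendor. As a rough sketch, the same logical intent ("peer with this neighbor in this AS") has to be rendered differently per platform, and every platform you add is another function to write and maintain. The renderer names, the `PEERS` group name, and the platform keys below are hypothetical. The output imitates IOS-style and Junos-style syntax for illustration only.

```python
from typing import Callable, Dict, List

Renderer = Callable[[int, str, int], List[str]]


def render_ios(local_as: int, peer: str, peer_as: int) -> List[str]:
    """IOS-style rendering of a single BGP neighbor."""
    return [
        f"router bgp {local_as}",
        f" neighbor {peer} remote-as {peer_as}",
    ]


def render_junos(local_as: int, peer: str, peer_as: int) -> List[str]:
    """Junos-style 'set' rendering of the same intent (group name is made up)."""
    return [
        f"set routing-options autonomous-system {local_as}",
        f"set protocols bgp group PEERS neighbor {peer} peer-as {peer_as}",
    ]


# One entry per supported platform; each new platform means a new renderer.
RENDERERS: Dict[str, Renderer] = {
    "ios": render_ios,
    "junos": render_junos,
}


def render(platform: str, local_as: int, peer: str, peer_as: int) -> List[str]:
    """Dispatch one logical peering to the renderer for the device's platform."""
    try:
        renderer = RENDERERS[platform]
    except KeyError:
        raise ValueError(f"no renderer for platform {platform!r}") from None
    return renderer(local_as, peer, peer_as)


for platform in ("ios", "junos"):
    print(platform, render(platform, 65001, "192.0.2.2", 65002))
```

Two platforms fit in a few lines. Dozens of vendors, hundreds of platforms, and several software versions of each are where the maintenance cost described above comes from.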

This is why I love what NetBrain is doing. They don’t build their own switches or network operating systems: they focus on the overlay to manage everything – and I mean everything – without suffering through manual configuration. Therefore, they aren’t beholden to any particular vendor’s operating system or to any particular automation mechanism. Built into NetBrain’s platform is the ability to programmatically interact with devices from dozens of networking vendors and hundreds of platforms. And as network vendors continue to adopt the network automation paradigm, NetBrain’s automation platform will only become more robust.

Overcoming Challenges to Automating BGP

Take a look at the screenshot below of one of the built-in automated BGP workflows, called a Runbook. Notice specifically that with one automated workflow contained in a single Runbook, I can easily check a variety of real-time BGP information across many devices at once. Normally this requires logging into each device and studying the output of each show command one by one. The built-in Runbook gives me that same visibility into the network, but programmatically. And keep in mind that this is a basic built-in Runbook. You can create custom and much more elaborate workflows as well.

BGP workflow Runbook: With Runbooks, you can easily check a variety of real-time BGP information across many devices at once, instead of logging into each device and studying the output of each show command one by one.
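For a sense of what "studying the output of each show command" turns into when you script it yourself, here is a small sketch that scans summary output for neighbors that are not established. The sample text is made up for illustration and follows the general layout of Cisco's `show ip bgp summary`, where the last column holds a prefix count once the session is up and a state name otherwise. Collecting the output from each device is out of scope, and none of this reflects how a Runbook is implemented.

```python
from typing import Dict, List, Tuple

# Illustrative sample only; not captured from a real device.
SAMPLE_SUMMARY = """\
BGP router identifier 10.255.0.1, local AS number 65001
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.255.0.2      4        65001    1200    1198       45    0    0 2d03h          12
192.0.2.2       4        65002       0       0        1    0    0 never    Idle
198.51.100.2    4        65003      10      12        1    0    0 00:01:10 Active
"""


def down_neighbors(summary_text: str) -> List[Tuple[str, str, str]]:
    """Return (neighbor, remote AS, state) for every session that is not established."""
    rows_started = False
    down = []
    for line in summary_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Neighbor":
            rows_started = True  # neighbor rows follow the column header
            continue
        if not rows_started or len(fields) < 10:
            continue
        # Tenth column onward: a prefix count if established, else a state name.
        state = " ".join(fields[9:])
        if not state.isdigit():
            down.append((fields[0], fields[2], state))
    return down


# Hypothetical mapping of device name to collected command output.
outputs: Dict[str, str] = {"BGP-R1": SAMPLE_SUMMARY}
report = {device: down_neighbors(text) for device, text in outputs.items()}
print(report)
```

Even this simple check depends on one platform's column layout, which is why a parser like this tends to need rework for every vendor and software version.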

This is all well and good, but a BGP design like the one I mentioned earlier with layer upon layer can be hard to visualize even with all these outputs at our disposal. The multi-layer Visio diagrams network engineers have to contend with don’t help much, either, because it’s difficult to cram so much information into a usable diagram.

NetBrain’s automation abstraction is visualized using Dynamic Maps, a core technology along with Runbooks. Dynamic Maps aren’t a fancy Visio, though. They are, as the name suggests, dynamic in that they represent the state of the network in real time. Every node is interactive, which means all the information we would typically cram into tab after tab of an outdated Visio diagram is available on one screen. From a Dynamic Map an engineer can discover new devices, get almost instant real-time visibility, execute built-in or custom Runbooks, or drill down into individual devices if necessary.

Notice below that iBGP neighbors are denoted automatically by a dotted line after an automated network discovery. NetBrain built this dynamic map for you, and each individual item including links and devices can be analyzed further.

iBGP map: In this screenshot from NetBrain’s Trial environment, BGP neighbors are denoted automatically by a dotted line after an automated network discovery.

If you need to focus on just a single router such as BGP-R1, it’s just a matter of selecting the device and choosing BGP configuration from the menu to get an automatic output.

BGP R1 details

Troubleshooting BGP neighbor relationships usually starts with checking for configuration errors such as misconfigured eBGP multihop. Notice in the graphic below that running the built-in BGP troubleshooting Runbook programmatically checks for several common configuration errors across all devices in the network with one click. Here we can see that NetBrain discovered a misconfiguration on router BGP-R3.

BGP misconfig error
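To illustrate the kind of check involved, here is a simplified sketch of one eBGP multihop test: flag any eBGP neighbor whose address is not on a directly connected subnet and that has no multihop setting. The `Peer` class, the addresses, and the AS numbers are hypothetical, and the logic deliberately ignores other ways of permitting such sessions. It is an assumption about what such a check could look like, not NetBrain's actual Runbook logic.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    address: str
    remote_as: int
    ebgp_multihop: bool = False  # True if multihop is configured for this peer


def missing_multihop(local_as: int, connected: List[str], peers: List[Peer]) -> List[str]:
    """Return eBGP peer addresses that are not directly connected and lack multihop."""
    # strict=False lets us pass interface addresses such as "10.0.23.2/30".
    networks = [ipaddress.ip_network(c, strict=False) for c in connected]
    findings = []
    for peer in peers:
        if peer.remote_as == local_as:
            continue  # iBGP sessions are not subject to this check
        address = ipaddress.ip_address(peer.address)
        directly_connected = any(address in net for net in networks)
        if not directly_connected and not peer.ebgp_multihop:
            findings.append(peer.address)
    return findings


# Hypothetical view of router BGP-R3.
peers = [
    Peer("10.0.23.1", 65002),                       # directly connected eBGP peer
    Peer("10.255.0.4", 65004),                      # loopback peer, multihop missing
    Peer("10.255.0.5", 65005, ebgp_multihop=True),  # loopback peer, multihop set
    Peer("10.255.0.6", 65003),                      # iBGP peer, ignored
]
print(missing_multihop(65003, ["10.0.23.2/30"], peers))  # ['10.255.0.4']
```

Running a check like this against one router is trivial. The value of doing it programmatically comes from running it against every router in the network at once.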

Though BGP can be a bear to configure and troubleshoot, it’s nevertheless a very powerful routing protocol and unlikely to be replaced any time soon. The fact that it’s so customizable makes it very useful especially at scale. We just need a better way to manage it.

Experiment with NetBrain’s core technologies using the Instant Trial in order to see firsthand how automation can change enterprise network operations. The BGP labs aren’t nearly as scary as the example I gave earlier, but they do showcase how powerful Dynamic Maps and Runbooks are in managing a BGP domain.

NetBrain automates what we did manually for years. The old way doesn’t make sense anymore, and really, it never did. Automation through Dynamic Maps and Runbooks reduces the operational expense of managing an enterprise WAN running BGP without losing any of the benefits of our most beloved routing protocol.
