Only Days to Prepare
Over the past week I’ve had a lot of time to think about how things get done when everyone’s at home. Logistics I had never considered before suddenly loomed large.
The home office is not the corporate office. In my case, my home office includes barking dogs and the chaotic noises of a home. Yet all of us still have work to do, projects to manage, the day-to-day to run.
With everyone at NetBrain now working remotely, I am entirely reliant on our corporate remote access solution. I need it to collaborate with my team. I need it to access my data. I need it to view my wikis and to work with many of my applications. I need it now and our entire business needs it now. And we all need our remote access to perform well.
When the decision was made to go remote beginning on March 16, we had less than a week to prepare. Both our IT and network automation teams went into scramble mode to prepare our network for this rapid shift in workforce dynamics.
After a couple of long days of design analysis and topology review, our IT network engineers determined that we would have three areas of concern related to the performance of our remote access network.
1. Increased traffic at our corporate Internet circuit at our HQ
Our network engineers reviewed our corporate network architecture and felt that we were OK based on their traffic forecast numbers. But they were unsure about visibility across all of our Internet edge infrastructure. What if we ended up using far more video than predicted? Our IT team had valid concerns about bandwidth, network device memory and CPU, network link health, and other performance factors. These metrics needed to be tracked and baselined.
2. VPN user path verification
VPN user path verification is a common concern that has bitten many of our engineers in past roles. Our team reviewed our inbound and outbound access-control lists on our Cisco ASA firewalls for the VPN user flows and compared them with the equivalent ACLs for local corporate network users. Without that check, we’d have a ton of employees saying that they can’t access application X or Y like they could last week from the office. To be successful, we needed to consider every application path and flow.
Also, what if VPN user paths had some other characteristic, different from the normal corporate user path, that we hadn’t checked? MTU, bandwidth, QoS differences, a lower-performance firewall in the path? We knew it would be necessary to perform ongoing path verification for multiple applications and other end-to-end paths.
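To make the ACL comparison above concrete, here is a minimal Python sketch. It is not how our team or NetBrain does it; it assumes each rule set has already been reduced to a flat set of permitted flows, which ignores rule ordering, deny entries, and object groups on a real ASA. The `Flow` type, the `missing_for_vpn` helper, and the addresses (taken from the documentation range) are all hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """One permitted application flow (hypothetical, simplified model)."""
    protocol: str
    destination: str
    port: int


def missing_for_vpn(office_flows: set[Flow], vpn_flows: set[Flow]) -> list[Flow]:
    """Return flows permitted for office users but not for VPN users."""
    return sorted(office_flows - vpn_flows)


# Illustrative data only, not taken from any real configuration.
office = {
    Flow("tcp", "192.0.2.10", 443),   # wiki
    Flow("tcp", "192.0.2.20", 22),    # lab jump host
    Flow("tcp", "192.0.2.30", 1433),  # database
}
vpn = {
    Flow("tcp", "192.0.2.10", 443),
    Flow("tcp", "192.0.2.30", 1433),
}

for flow in missing_for_vpn(office, vpn):
    print(f"VPN users cannot reach {flow.destination}:{flow.port}/{flow.protocol}")
```

Anything this reports is a candidate for the "it worked last week from the office" complaint.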
3. Greatly increased VPN traffic
Perhaps of greatest concern was the great unknown of how much VPN traffic we would see. Based on forecast calculations, our IT team felt we were prepared to handle the onslaught of VPN traffic: the encrypt/decrypt packets-per-second rate, new sessions per second, concurrent sessions, and so on.
But how do we really know if the forecast will match our new reality? Can we diagnose and identify the problem quickly in the event of a performance issue? Or will our IT team inevitably be sending out emails apologizing for the hours of disruption to an entire office of remote workers while they troubleshoot our VPN performance issues?
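One simple way to check a forecast against reality is to compare observed VPN metrics with the platform's rated limits and flag anything running hot. The sketch below assumes the three metrics named above; the metric names, the numbers, and the 80% threshold are all made up for illustration, and real limits would come from the firewall's data sheet.

```python
def utilization(observed: dict[str, float], capacity: dict[str, float]) -> dict[str, float]:
    """Return observed/capacity for each metric present in both inputs."""
    return {name: observed[name] / capacity[name] for name in observed if name in capacity}


def over_threshold(observed: dict[str, float], capacity: dict[str, float],
                   threshold: float = 0.8) -> list[str]:
    """List metrics whose utilization meets or exceeds the threshold."""
    ratios = utilization(observed, capacity)
    return sorted(name for name, ratio in ratios.items() if ratio >= threshold)


# Hypothetical limits and readings, for illustration only.
capacity = {"concurrent_sessions": 500, "new_sessions_per_sec": 50, "encrypted_pps": 200_000}
observed = {"concurrent_sessions": 430, "new_sessions_per_sec": 12, "encrypted_pps": 90_000}

print(over_threshold(observed, capacity))
```

With these numbers only concurrent sessions (430 of 500, or 86%) crosses the threshold, which is the kind of early warning we wanted before users start to notice.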
Adaptive Network Automation to the Rescue
Thankfully, the members of our network automation team are not just experienced in building NetBrain automations but are also seasoned professionals with a wide range of multivendor network engineering expertise. So they were able to codify their knowledge into repeatable automation, building a Remote Work Toolkit composed of NetBrain automations to help our internal IT team with this sudden shift.
This is where the flexibility of a solution is critical. I’ve used tons of tools over a couple of decades as a network engineer, and so many are rigid and difficult to use, or unintuitive and frustrating. How do you prepare for the unpredictable? Flexibility and adaptability are the strongest traits when considering a NetOps automation solution. Ease of building automation is the key to its adoption.
Baselining the Internet Edge
Troubleshooting the Internet edge is relatively straightforward: we concern ourselves with platform health (router and switch CPU, memory, and other metrics), link health (our Internet circuits, our southbound connectivity), and maybe some other subsystems (firewalls, load balancers). But it’s still a lot to keep track of.
Here’s where the NetBrain Golden Baseline comes in. Powered by clever AI algorithms, our platform can baseline any conceivable network data type, from a numerical variable to a string to a full table of rows and columns. Discovering, tracking, and baselining thousands of data points in your environment enables engineers to quickly know if something is normal or unexpected.
After adding a number of newly baselined and tracked performance metrics related to our Internet routers, circuits, edge switches, and firewalls, our automation engineers were able to provide IT with holistic views and granular performance baselines.
The ease with which this was done impressed me; despite the large number of individual CLI commands and associated data points, this was put together in a matter of hours! Our IT team now felt confident that in the event of a problem, they could quickly identify or rule out the Internet infrastructure as a culprit.
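For a feel of what "normal or unexpected" means for a numeric metric, here is a deliberately simple Python sketch. It is not the Golden Baseline algorithm, which this post does not describe; it swaps in a plain mean-and-standard-deviation check, and the function name, the tolerance of three deviations, and the CPU samples are assumptions of mine.

```python
from statistics import mean, stdev


def is_unexpected(history: list[float], current: float, tolerance: float = 3.0) -> bool:
    """Flag a value more than `tolerance` standard deviations from the baseline mean."""
    if len(history) < 2:
        raise ValueError("need at least two samples to build a baseline")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unexpected.
        return current != baseline
    return abs(current - baseline) > tolerance * spread


cpu_history = [21.0, 24.0, 22.0, 25.0, 23.0]  # made-up router CPU % samples
print(is_unexpected(cpu_history, 24.0))  # within the usual range
print(is_unexpected(cpu_history, 71.0))  # far outside it
```

String and table data would need a different comparison (equality or a row-by-row diff), but the idea is the same: record what normal looks like, then compare.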
Analyzing our VPN User Paths
This has happened to me when working remotely: I try to get to an app I haven’t used in a while and it just times out. Is it down? Or is it because I’m on the VPN?
The NetBrain automation team was able to populate our critical business application paths, enabling what we refer to as the tracking of pre-defined “Golden Paths.” This enables our IT team to easily compare historical data against current, live paths using the NetBrain Path Calculator.
The Path Calculator feature is comprehensive and quick, checking the forwarding logic on every device in the path and verifying MPLS, NAT, Layer 3 forwarding, IPsec, firewall security policies, and much more in mere seconds.
Our IT team can use the Path Calculator to quickly assess the full VPN user path, any firewalls and associated security policies in the path, and other link characteristics that may affect performance, such as MTU, bandwidth, and QoS configuration.
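The idea of comparing a recorded path against the live one can be sketched in a few lines of Python. This is not the Path Calculator; it only assumes that a path can be represented as an ordered list of hops, each with a device name and an egress MTU. The `Hop` class, both helpers, and the device names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    device: str
    mtu: int  # MTU of the egress interface, in bytes


def path_mtu(path: list[Hop]) -> int:
    """The usable MTU of a path is the smallest MTU of any hop."""
    return min(hop.mtu for hop in path)


def changed_hops(golden: list[Hop], live: list[Hop]) -> list[tuple[int, str, str]]:
    """Return (position, golden device, live device) wherever the paths differ."""
    diffs = []
    for i in range(max(len(golden), len(live))):
        golden_dev = golden[i].device if i < len(golden) else "<none>"
        live_dev = live[i].device if i < len(live) else "<none>"
        if golden_dev != live_dev:
            diffs.append((i, golden_dev, live_dev))
    return diffs


# Made-up paths for illustration.
golden = [Hop("vpn-asa", 1500), Hop("core-sw1", 1500), Hop("app-fw", 1500)]
live = [Hop("vpn-asa", 1400), Hop("core-sw2", 1500), Hop("app-fw", 1500)]

print(changed_hops(golden, live))
print(path_mtu(golden), path_mtu(live))
```

Here the live path takes a different core switch and carries a lower MTU at the VPN firewall, two differences that would be easy to miss when checking only whether the application is reachable.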
Will Our VPN Handle the Flood?
I’ll be honest, I only work from home occasionally. Until now. Having the full workforce on the corporate remote access solution is not something that many companies have considered in advance.
To help our IT team manage and troubleshoot the VPN, and to give them the tools needed to quickly isolate the root cause of a performance problem, our automation engineers populated our Remote Work Toolkit with a series of Data View Templates designed to collect and visualize pre-defined sets of VPN-related network data.
This is the perfect answer to our IT team’s VPN performance management concerns. Data View Templates allow you to mix and match any conceivable piece of network data, whether it comes from a CLI command or from a third-party integrated network management tool. VPN design is the ultimate use-case for this type of automated diagnostic collection and visualization given the sheer complexity of what can go wrong when VPNs encounter problems.
Our IT team now has real-time access to baselined data points, allowing them to drill down into detailed VPN connection tables, track the number of VPN sessions and total connection counts to support trending analysis, dynamically map all of our VPN-enabled devices, or instantly scrape a slew of other helpful real-time diagnostics from our VPN devices.
Ready for the Unpredictable
Thanks to our fast-moving IT and network automation teams, we have a robust set of automation and visualization tools that give us total visibility, and the means to fix things if we overtax our suddenly crucial VPN infrastructure.
The beauty is that we accomplished our VPN readiness using our core adaptive automation elements:
- Baselining to know if this is normal
- Path Analysis to drill down into real-time, hop-by-hop forwarding operations
- Data View Templates to automatically collect and display all the info I want to see, layer it on the map as a single data layer, and enable me to quickly identify problems
Meanwhile, I’ll keep doing my best to overtax our VPN system. And try to keep my barking dogs off our conference calls.
Want access to our newly built Remote Work Toolkit? Please contact our NetBrain support team for the automation kit and associated documentation:
Email: [email protected],