The following article is part 2 in a 3-part series about troubleshooting DDoS attacks, and it’s a guest post by Matt Conran of Network Insight. In this article, Matt covers the challenges of the manual approach to DDoS troubleshooting.
A DDoS (Distributed Denial of Service) attack is an attempt to make online services unavailable by flooding them with enormous traffic from numerous sources. In this kind of cyber-attack, the attacker employs many unique IP addresses. So, when we think about DDoS, the foremost elements that come to mind are the detection and mitigation components. After all, this is where the money is being spent and where the fancy new features are being introduced. However, it’s the effectiveness of the entire end-to-end solution that stops a DDoS attack in time.
The missing piece of the puzzle is troubleshooting. Many perceive mapping a network and troubleshooting it as a technical task requiring only the hands of an engineer. But there is an entire company process that needs to come together, leading to the final hands-on configuration work that stops the DDoS.
The procedure involves the coordination of people from different teams and technical backgrounds to efficiently blend and form a solution. This is not something that happens automatically or by chance. It needs to be rehearsed and accurately controlled. One of the biggest problems with DDoS attacks is the lack of preparation. Lack of preparation leads to panic, and once panic sets in, nothing gets fixed.
All the technical experience in the world is not going to help you unless the teams work well together. And with a DDoS event, you will certainly need multiple teams operating together.
The Art of Networking
The art of networking has led to thousands of different network designs, often referred to as unique snowflakes. In service provider (SP) networks, these designs often serve the same end goal of connectivity, such as an L2VPN or L3VPN. However, the type of network design varies considerably from provider to provider.
Many designs live only in the designer’s head when they should be in a central repository with tracked changes. Troubleshooting varied network designs and configurations becomes tough when you are under a DDoS attack with volumes reaching terabyte scale.
Remotely-Triggered Black Hole (RTBH)
Remotely-Triggered Black Hole (RTBH) is one of the most common ways to mitigate an attack. It uses Border Gateway Protocol (BGP) within the network to install rules into the forwarding plane that drop traffic to the attacked destination. Because all traffic to that destination is dropped, legitimate traffic included, it essentially completes the DDoS attack on behalf of the attacker. RTBH has been useful in the past, but one of its flaws is that it combines mission-critical routing and security functions on the same device.
Firewall rules that are used to block an attack are placed into the network device that is performing the core role of routing traffic from point A to point B. So from a technical standpoint, we already have challenges. To compound this further, more often than not we have to combine two teams, security and network, to work on the same device. From an operational standpoint, the most common approach to DDoS mitigation mixes two technical hats on the same device.
At the peak of a DDoS attack, team collaboration is critical, and existing DDoS mitigation techniques such as RTBH can pose team coordination challenges. To protect efficiently against DDoS, a solution can never be viewed solely from the technical viewpoint. It’s the entire troubleshooting process and the collaboration of teams that wins the race.
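To make the idea concrete, here is a minimal sketch in Python of what destination-based RTBH does to the forwarding table. It is a simulation only, not router configuration: the route list, next hops and the `build_fib` and `lookup` helpers are made up for illustration. The one real-world value is the well-known BLACKHOLE community 65535:666 from RFC 7999; your network may well use its own community instead.

```python
import ipaddress

# Well-known BLACKHOLE community from RFC 7999. Many networks use their
# own operator-defined community instead; this choice is an assumption.
BLACKHOLE = "65535:666"


def build_fib(routes):
    """Turn (prefix, next_hop, communities) routes into a forwarding table.

    Any route carrying the blackhole community is installed with a
    discard action instead of its next hop.
    """
    fib = []
    for prefix, next_hop, communities in routes:
        action = "discard" if BLACKHOLE in communities else next_hop
        fib.append((ipaddress.ip_network(prefix), action))
    # Longest prefix first, so the first match is the most specific one.
    fib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return fib


def lookup(fib, destination):
    """Return the forwarding action for a destination address."""
    address = ipaddress.ip_address(destination)
    for network, action in fib:
        if address in network:
            return action
    return "no route"


# Hypothetical routes using documentation address ranges.
routes = [
    ("203.0.113.0/24", "10.0.0.1", set()),         # normal customer prefix
    ("203.0.113.10/32", "10.0.0.1", {BLACKHOLE}),  # victim host under attack
]
fib = build_fib(routes)
print(lookup(fib, "203.0.113.10"))  # discard
print(lookup(fib, "203.0.113.20"))  # 10.0.0.1
```

Notice that the lookup never asks who sent the packet. Every packet to the victim address is discarded, good or bad, which is exactly why RTBH finishes the attacker’s job for that one destination while protecting the rest of the prefix.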
The Ability to Look at Everything
For efficient DDoS protection, you need to examine all the layers of the Open Systems Interconnection (OSI) model. It’s not just about one layer anymore; cyber criminals are using multiple layers to penetrate a network. Parallel attacks are often used, combining a volumetric Layer 4 attack with a Layer 7 application attack. So essentially, we have two layers of the stack that need troubleshooting: Layer 4 and Layer 7.
This could also potentially involve two separate teams: a networking team for Layer 4 and an application team for Layer 7. Add in a few firewalls and load balancers and we have a nice mix of dispersed team interaction. This type of collaboration needs to be streamlined and coordinated efficiently.
Manual Approach Example
The manual approach to DDoS contains a number of steps before actually getting to the source of the problem. All of these may need to be carried out by different people and teams, in different places and at different times. Most surely the individuals involved won’t be sitting next to each other. What needs to be done depends on the attack and how it was signalled. If it’s an application level attack, an administrator may need to view an individual server’s logs by manually logging in or via a central log analyser.
Another administrator may have to trace through a group of machines and issue commands like netstat to determine what connections are open on that server or appliance. Another administrator has to check whether machine load is high or determine the number of HTTP processes running. A Linux or firewall administrator has to create custom scripts and run them against iptables to gather other types of security information.
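As an illustration of the kind of one-off script this step tends to produce, here is a small Python sketch that summarizes netstat output by TCP state and by remote address. The sample text is invented and only mimics the column layout of `netstat -ant` on Linux; real output differs between platforms, so treat the parsing as an assumption rather than a general-purpose parser.

```python
from collections import Counter

# Invented sample in the style of Linux `netstat -ant` output.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:51512      SYN_RECV
tcp        0      0 192.0.2.10:80           198.51.100.7:51513      SYN_RECV
tcp        0      0 192.0.2.10:80           198.51.100.9:40022      ESTABLISHED
"""


def summarize(netstat_text):
    """Count TCP connections per state and per remote IP address."""
    states = Counter()
    remotes = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip banner and header lines and anything that is not TCP.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        states[state] += 1
        if state != "LISTEN":
            # Drop the port; keep only the remote address.
            remotes[foreign.rsplit(":", 1)[0]] += 1
    return states, remotes


states, remotes = summarize(SAMPLE)
print(states.most_common())
print(remotes.most_common())
```

A pile of half-open SYN_RECV connections from a handful of addresses is one of the things an administrator would be looking for here. The catch is that this runs on one machine at a time, which is the whole problem with the manual approach.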
A Note on Custom Scripts
Custom scripts created by individual engineers are great for one-off jobs, but if you want them to be automated across teams, things get trickier. The person who creates the script is usually the one who manages the script. What happens when this person leaves, or if there are bugs or changes that are required?
The additional work, which is beyond that engineer’s normal remit, gets pooled into their day-to-day operations and will eventually be left to the side. It’s far better to keep all your scripting in a centrally managed database where no single person is in charge. DDoS attacks are evolving rapidly. For the attacker, changing the attack type at random (TCP -> UDP) is a simple command on the command and control (C2) server. Trying to track all this with custom scripts is a disservice to you and your company.
A more advanced engineer may need to be called in to look at the HTTP Host headers or session tables on a load balancer to determine if it’s a more sophisticated attack. Layer 7 attacks do not necessarily exceed packet-per-second thresholds, so we need to go deeper than the standard 5-tuple to pin down intermittent problems.
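Going deeper than the 5-tuple can be as simple as grouping requests by what they ask for rather than by who sent them. The Python sketch below assumes a hypothetical, heavily simplified record format of client IP, Host header and path; real load balancer or web server logs carry far more fields, and the `hot_targets` helper and its threshold are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified request records: (client_ip, host_header, path).
RECORDS = [
    ("198.51.100.7", "shop.example.com", "/search"),
    ("198.51.100.8", "shop.example.com", "/search"),
    ("198.51.100.9", "shop.example.com", "/search"),
    ("198.51.100.7", "shop.example.com", "/search"),
    ("203.0.113.5", "www.example.com", "/"),
]


def hot_targets(records, min_requests):
    """Map each (host, path) with enough hits to (hits, distinct clients)."""
    hits = Counter()
    clients = defaultdict(set)
    for client_ip, host, path in records:
        hits[(host, path)] += 1
        clients[(host, path)].add(client_ip)
    return {
        target: (count, len(clients[target]))
        for target, count in hits.items()
        if count >= min_requests
    }


print(hot_targets(RECORDS, min_requests=3))
# {('shop.example.com', '/search'): (4, 3)}
```

No single client in this sample looks busy on its own, yet together they concentrate on one expensive URL. That pattern is invisible at Layer 4 and only shows up once the Host header and path are in the picture.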
At times we may need to go to additional lengths to examine congestion counts, congestion windows, TCP RTT or other advanced features. On a web server, we may need to look for specific headers to write to Null0.
BGP communities are often suitable for DDoS mitigation. A BGP community is set on a route, and that route is then blocked at the WAN edge. If the community is set incorrectly, it takes a lot of manual work to find the source of the problem.
The manual approach to troubleshooting eventually needs to be carried out, and then all the diagnostic information must somehow be pooled into one place in order to decide what to do. Trying to repeat these steps with the same people and without automation is not the way to smooth operations. There are a lot of different steps involved that require many different teams, combining many different technology hats.
Some of the devices required for troubleshooting may even be on different networks with different logins. It could be the case that you need to file a troubleshooting ticket just to get on the device and start troubleshooting. We already have our hands tied and DDoS is already winning by a mile. This type of troubleshooting is doing you a disservice, especially when we are already in a cat and mouse game. Combining teams, technical hats and skill sets is a hard task and requires a good team and good project management. Instead of leaving your fate to the gods, it’s better to have a solution in place that makes things work for you. So when a DDoS happens, and it will certainly happen, you are ready to mitigate it with the click of a button.
NetBrain offers a unique approach to network automation, thereby bridging the gap between DDoS detection and mitigation. The ability of NetBrain to integrate with both detection and mitigation systems completes the missing piece of the Distributed Denial of Service (DDoS) puzzle. NetBrain is not a mapping company but an automation company that leverages Dynamic Network Mapping, Executable Runbooks and a rich integration framework of APIs and workflows to troubleshoot any type of DDoS.
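The manual hunt for a wrongly set community could be sketched as a simple audit like the one below. It is only an illustration: the route table is a hand-written Python dictionary rather than anything pulled from a router, the `audit_blackholes` helper is hypothetical, and the blackhole community is assumed to be the well-known 65535:666 value from RFC 7999.

```python
# Assumed blackhole community (RFC 7999 well-known value); many networks
# define their own.
BLACKHOLE = "65535:666"


def audit_blackholes(routes, intended):
    """Compare routes tagged for blackholing against the intended set.

    Returns (missing, unexpected): prefixes that should carry the
    community but do not, and prefixes that carry it by mistake.
    """
    missing = sorted(
        prefix for prefix in intended
        if BLACKHOLE not in routes.get(prefix, set())
    )
    unexpected = sorted(
        prefix for prefix, communities in routes.items()
        if BLACKHOLE in communities and prefix not in intended
    )
    return missing, unexpected


# Hypothetical snapshot of prefixes and the communities attached to them.
routes = {
    "203.0.113.10/32": {"64500:100"},             # should be blackholed, tag missing
    "203.0.113.20/32": {BLACKHOLE},               # correctly tagged
    "198.51.100.0/24": {BLACKHOLE, "64500:200"},  # tagged by mistake
}
intended = {"203.0.113.10/32", "203.0.113.20/32"}

missing, unexpected = audit_blackholes(routes, intended)
print("missing:", missing)        # ['203.0.113.10/32']
print("unexpected:", unexpected)  # ['198.51.100.0/24']
```

The second list is the dangerous one: a whole prefix tagged by mistake is dropped at the WAN edge just as faithfully as the intended victim address.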