SNMP (Simple Network Management Protocol) is an Internet Standard protocol for collecting information about managed devices on IP networks. It became a vital component in many networks for monitoring the health and resource utilization of devices and connections. For a long time, SNMP was the standard tool for monitoring bandwidth and interface utilization. In this capacity, it is used to detect line saturation events caused by volumetric DDoS attacks on an organization’s internet connection. SNMP is adequate as a sensor for threshold-based volumetric attack detection and allows automated redirection of internet traffic through cloud scrubbing centers when under attack. Automating detection considerably reduces the time to mitigation, and volumetric attacks can then be mitigated through on-demand cloud DDoS services. SNMP has minimal impact on a device’s configuration and works with almost any network device and vendor. This convenience made it popular for deployments of automatic diversion.
In recent years, though, DDoS attacks have evolved and attackers have become more automated. The typical sustained high-volume flood has given way to more effective, automated attack campaigns consisting of repeated short, high-volume bursts with changing attack vectors. These burst attacks, also known as hit-and-run DDoS, consist of several very short bursts, each lasting only a few seconds, at random intervals.
Monitoring bandwidth utilization with SNMP
The SNMP protocol provides access to a (network) device’s system counters, which are organized in a Management Information Base (MIB). A MIB contains a set of hierarchically organized Object Identifiers (OIDs) describing the accessible objects. Interface counters are described in the Interface MIB (IF-MIB) as a number of octets (bytes) in and out. These counters can be 32-bit (ifInOctets) or 64-bit (ifHCInOctets) and roll over to zero when they reach their maximum value. For interface bandwidth monitoring, these byte counters are collected at regular intervals and averaged over time. Typical intervals for querying interface counters are on the order of minutes.
To calculate the ingress bandwidth utilization at time T1, the byte count at time T0 is subtracted from the counter at T1, giving the number of bytes received by the interface over the sampling period (T1-T0). Multiplying this number by 8 to convert bytes to bits, dividing by the sampling period in seconds, and normalizing the result to the available bandwidth or interface speed expressed in bits per second (bps) gives the bandwidth utilization at T1:
utilization(T1) = 8 * (ifHCInOctets(T1) - ifHCInOctets(T0)) / ((T1 - T0) * interface speed in bps)
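The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a production poller: the function name `utilization` and its parameters are hypothetical, and it assumes the counter rolled over at most once between the two samples.

```python
def utilization(octets_t0, octets_t1, t0, t1, if_speed_bps, counter_bits=64):
    """Return ingress utilization (0.0-1.0) between two octet counter samples.

    Hypothetical helper for illustration; t0 and t1 are timestamps in seconds.
    """
    if t1 <= t0:
        raise ValueError("t1 must be later than t0")
    modulus = 1 << counter_bits
    # Modular subtraction absorbs a single counter rollover between samples.
    delta_octets = (octets_t1 - octets_t0) % modulus
    # Octets are bytes: multiply by 8 to get bits, then divide by the period.
    bits_per_second = delta_octets * 8 / (t1 - t0)
    return bits_per_second / if_speed_bps


# 1.25e9 bytes received in 60 seconds on a 1Gbps interface.
print(round(utilization(0, 1_250_000_000, 0, 60, 1_000_000_000), 3))  # 0.167
```

Note that at 1Gbps a 32-bit octet counter wraps roughly every 34 seconds, so with one-minute polling it can wrap more than once between samples and the delta becomes ambiguous. That is one reason to prefer the 64-bit ifHCInOctets counter on fast links.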
Evading SNMP-based threshold detection
Using burst attacks, attackers found a way to evade SNMP-based detection systems. Not by limiting traffic levels or volumes, but by shortening the attack duration.
Consider a router connected with a downstream internet capacity of 1Gbps. The SNMP detector is configured with a periodic sampling interval of one minute and will detect volumetric attack events based on a threshold when total ingress bandwidth is above 900Mbps (90% utilization). Now consider a single burst, part of a larger burst attack, of 10 seconds at full saturation level (100% utilization). This case is illustrated in the picture below.
The blue area corresponds to the actual ingress bandwidth utilization while the green areas represent the SNMP measured bandwidth utilization. The area below the blue curve and the area below the green line are equal for every polling interval.
The picture makes clear that the attack burst will not be detected as a saturation level event. The net impact of this attack (1Gbps for a duration of 10 seconds) on the SNMP-calculated bandwidth is 10sec/60sec * 1Gbps = 167Mbps, which does not come close to the 900Mbps threshold defined earlier.
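The averaging effect can be reproduced with a short sketch. The helper `polled_rate_bps` and its `baseline_bps` parameter are hypothetical names introduced here; the sketch assumes the whole burst falls inside one polling interval and that any other traffic flows at a constant baseline rate.

```python
def polled_rate_bps(burst_seconds, burst_rate_bps, poll_seconds, baseline_bps=0.0):
    """Average rate one poll reports when the whole burst lies inside the interval."""
    if burst_seconds > poll_seconds:
        raise ValueError("this sketch assumes the burst fits in one interval")
    burst_bits = burst_seconds * burst_rate_bps
    # Assumption: outside the burst, traffic flows at a constant baseline rate.
    baseline_bits = (poll_seconds - burst_seconds) * baseline_bps
    return (burst_bits + baseline_bits) / poll_seconds


THRESHOLD_BPS = 900e6  # 90% of the 1Gbps link from the example

measured = polled_rate_bps(10, 1e9, 60)
print(f"{measured / 1e6:.0f} Mbps, detected: {measured > THRESHOLD_BPS}")
# prints "167 Mbps, detected: False"
```

Under these assumptions, the one-minute average only reaches 900Mbps if the link already carries about 880Mbps of traffic outside the burst, so on a lightly loaded link the burst is invisible to the threshold.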
Reducing the polling interval (higher sampling rate) for SNMP will have a positive effect on the detection sensitivity. But by how much do we need to reduce the polling period before the system can adequately detect small bursts of 10 seconds?
For the sake of illustration, let’s bring down the polling period of the previous example to match the duration of the attack itself, so 10 seconds. From the two illustrations below, one can clearly see that if most of the burst happens within a single polling interval, we will detect a saturation level event. However, if more than 10% of the burst falls outside of a polling period, it will not be detected within that period. If 10% or less of the burst falls into one period, at least 90% falls into the adjacent period, so the burst still shows up as 90% utilization in that poll. But if the burst straddles two polling periods with more than 10% in each, and in the extreme case 50% in each as in the right illustration below, neither poll reaches the threshold and the attack will not be detected.
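The effect of burst alignment can be explored with a small simulation. This is a sketch under stated assumptions: the function names are hypothetical, the burst runs at full line rate with no other traffic, and a reading that reaches the 90% threshold counts as a detection.

```python
def poll_utilizations(burst_start, burst_seconds, poll_seconds, n_intervals):
    """Utilization (0.0-1.0) each poll reports for one full-line-rate burst."""
    burst_end = burst_start + burst_seconds
    readings = []
    for i in range(n_intervals):
        start, end = i * poll_seconds, (i + 1) * poll_seconds
        # Seconds of the burst that fall inside this polling interval.
        overlap = max(0.0, min(end, burst_end) - max(start, burst_start))
        readings.append(overlap / poll_seconds)
    return readings


def is_detected(burst_start, burst_seconds=10.0, poll_seconds=10.0, threshold=0.9):
    """True if any single poll reaches the threshold (assumed detection rule)."""
    n_intervals = int((burst_start + burst_seconds) // poll_seconds) + 1
    readings = poll_utilizations(burst_start, burst_seconds, poll_seconds, n_intervals)
    return any(r >= threshold for r in readings)


for offset in (0.0, 0.5, 2.0, 5.0, 8.0, 9.5):
    print(f"burst starts {offset:>4.1f}s into the interval -> "
          f"detected: {is_detected(offset)}")
```

With a 10-second poll and a 10-second burst, only bursts that start within the first or the last second of a polling interval reach the threshold in this model. Every other alignment splits the burst across two polls that each stay below 90%.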
So, what would be the adequate polling period for SNMP not to miss an attack?
Applying Nyquist to SNMP sampling
The Nyquist Sampling Theorem is fundamental to the digital signal processing field. Nyquist’s Sampling Theorem states that:
“A bandlimited continuous-time signal can be sampled and perfectly reconstructed from its samples if the waveform is sampled over twice as fast as its highest frequency component.”
Or put another way: if you want to transform an analog signal into its digital representation, the sampling period needs to be at most half the period of the highest frequency component in the analog signal.
Now consider the incoming traffic curve (blue area in our illustration above) as a continuous signal we would like to transform to a sampled representation, a representation that would allow us to detect a burst attack. We are not interested in the actual form of the wave; we merely want to be able to detect its level with certainty. So higher order frequencies that make up the waveform can be safely ignored, and the period of the wave we would like to detect can be considered as the time between the rising and falling edge of the attack. In that case, we consider the duration of the attack as the period of the highest frequency component in the signal we want to reconstruct. Given this, the sampling period for SNMP should be at most half the duration of the shortest attack burst. Or in other words:
The shortest burst attack that can be reliably detected using SNMP lasts twice the polling period.
In the case of our example above, to detect the 10 second attack burst with 100% certainty, we need to increase the polling rate of SNMP to collect a new sample every five seconds.
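The rule can be checked numerically with a short sketch. The function `worst_case_fraction` is a hypothetical helper: for a burst at full line rate with no other traffic, it tries a grid of burst alignments and returns the lowest "best poll" reading among them, which is the reading an unlucky defender would get.

```python
def worst_case_fraction(burst_seconds, poll_seconds, steps=20):
    """Lowest peak poll reading (0.0-1.0) over a grid of burst alignments."""
    worst = 1.0
    for k in range(steps):
        # Slide the burst start across one polling interval.
        burst_start = k * poll_seconds / steps
        burst_end = burst_start + burst_seconds
        n_intervals = int(burst_end // poll_seconds) + 1
        best = 0.0
        for i in range(n_intervals):
            start, end = i * poll_seconds, (i + 1) * poll_seconds
            overlap = max(0.0, min(end, burst_end) - max(start, burst_start))
            best = max(best, overlap / poll_seconds)
        worst = min(worst, best)
    return worst


print(worst_case_fraction(10, 5))   # 1.0: some poll always sees full saturation
print(worst_case_fraction(10, 10))  # 0.5: the 50/50 split is missed
print(worst_case_fraction(6, 5))    # 0.6: a 6-second burst can stay below 90%
```

In this model, a burst lasting at least twice the polling period always covers one polling interval completely, so that poll reports full saturation regardless of alignment. Shorter bursts can straddle interval boundaries and stay under the threshold.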
Staying ahead of the bad guys
Polling from the cloud every five seconds might not be the way one wants to build one’s attack detection. And even if one does, it is limited to detecting attacks where the shortest burst lasts at least 10 seconds. What to do when the burst is six seconds, or less? The SNMP polling method simply does not scale for the detection of burst attacks, and we need to move away from pull-based analytics to real-time, event-based methods.
On-box RMON rules with threshold detection, generating SNMP traps, provide one alternative without introducing new technologies or protocols. However, what is possible in terms of detections and triggers for SNMP traps will depend on the capabilities of your device. That said, most network equipment manufacturers provide performance management and streaming analytics that far exceed the possibilities of SNMP. Now would be a good time to look at those alternatives and implement on- or off-box automation that detects attacks and triggers traffic redirection through API calls to the cloud service.
The off-the-shelf alternatives to ‘roll-your-own’ detection that have been battle-tested and are readily available are Hybrid DDoS Protection and NetFlow-based monitoring. Hybrid DDoS Protection covers far more than just volumetric and burst attack detection and provides the highest level of integration and confidence of detection with real-time messaging to the cloud.
As attackers mature and their attacks become more automated and increasingly complex, we need to continue to evolve our solutions and architectures to stay ahead.