BGP perforating wound
It was an ordinary Thursday, 4.04.2019, except that around midday AS60280, which belongs to Belarus' NTEC, leaked 18,600 prefixes originating from approximately 1,400 ASes.
Those routes were learned from the transit provider RETN (AS9002) and then announced to NTEC's provider, RU-telecom's AS205540, which in turn accepted all of them and spread the leak further.
As a result, some services in the Russian internet segment were disrupted for almost half an hour, as this particular loop constricted the traffic of the biggest content providers, such as Yandex. Its prefixes, leaked by RU-telecom, were also accepted by Megafon, one of the three biggest carriers.
NTEC's AS60280 most likely leaked those routes by accident; it had never happened before. The better question is why AS205540 accepted a leak from one of its customers and spread most of it to many other ISPs.
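To make that question concrete, here is a minimal sketch of the two checks that would normally stop such a leak: the export rule on the leaking side and a prefix filter on the provider side. It follows the commonly described "valley-free" convention and is not the actual configuration of either network. The function names, the "self" label and the registered-prefix list are hypothetical, and the prefixes come from the documentation ranges.

```python
import ipaddress
from typing import Iterable


def may_export(learned_from: str, send_to: str) -> bool:
    """Valley-free export rule, seen from our own AS.

    Routes we originate ("self") or learned from customers may go to anyone.
    Routes learned from peers or providers may go only to customers.
    """
    if learned_from in ("customer", "self"):
        return True
    return send_to == "customer"


def accept_from_customer(prefix: str, registered: Iterable[str]) -> bool:
    """Provider-side filter: accept a customer announcement only if it falls
    inside a prefix the customer is known to hold (hypothetical list)."""
    net = ipaddress.ip_network(prefix)
    for entry in registered:
        allowed = ipaddress.ip_network(entry)
        # subnet_of() raises TypeError across IP versions, so check first.
        if net.version == allowed.version and net.subnet_of(allowed):
            return True
    return False


# A route learned from one provider and sent to another is a leak.
leak_allowed = may_export(learned_from="provider", send_to="provider")

# Assumed customer prefix list, using documentation address space.
customer_prefixes = ["198.51.100.0/24"]
own_ok = accept_from_customer("198.51.100.0/24", customer_prefixes)
foreign_ok = accept_from_customer("203.0.113.0/24", customer_prefixes)
```

Either check on its own would have contained the incident described here: the first keeps provider-learned routes from going back upstream, and the second keeps the upstream from accepting prefixes its customer does not hold.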
The leak spread widely across Russian operators, and we can highlight the most interesting ones:
After 25 minutes the misconfiguration at AS60280 was resolved and things returned to normal, though that was long enough for the Russian mass media to notice the outage of probably the biggest regional content provider on one of the largest carriers.
As seen in the previous table, most of the top or well-known Russian operators accepted the leaked routes. However, judging by the news, only Megafon users were affected. It turns out that, for Yandex prefixes specifically, the wrong routes had a limited distribution. Why?
Two main factors determine the choice between routes. The first is the direction from which the route was received: operators quite often set a different preference on customer links than on all others. The second applies when the preference is identical: the length of the AS_PATH comes into play, and the route with the shorter AS_PATH wins. So, if you have good connectivity in a region, your legitimate routes should win most of the time, since a leaked route contains additional ASNs and is therefore longer.
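The two factors can be sketched as a small comparison function. This is a simplification: real BGP best-path selection has more tie-breakers. It also rests on assumptions, since the preference values, the Route class and the function name are hypothetical, and the AS numbers are taken from the documentation range rather than from the incident.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative preference values (an assumption, not any operator's real
# policy): customer routes are preferred over peers, peers over providers.
LOCAL_PREF = {"customer": 200, "peer": 100, "provider": 50}


@dataclass(frozen=True)
class Route:
    learned_from: str         # "customer", "peer" or "provider"
    as_path: Tuple[int, ...]  # nearest AS first, origin AS last


def best_route(candidates: List[Route]) -> Route:
    """Highest preference wins; on equal preference the shorter AS_PATH wins."""
    return max(
        candidates,
        key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)),
    )


# AS64500 plays the content provider; all ASNs are documentation numbers.
direct_peer = Route("peer", (64500,))
leak_via_peer = Route("peer", (64501, 64502, 64503, 64500))
leak_via_customer = Route("customer", (64502, 64503, 64500))

# An operator that peers directly with the origin keeps the short route.
well_peered_choice = best_route([direct_peer, leak_via_peer])

# An operator that hears the leak from its own customer prefers the leak,
# because preference is compared before AS_PATH length.
provider_choice = best_route([direct_peer, leak_via_customer])
```

In this toy setup the directly peered operator keeps the legitimate one-hop route, while the operator that receives the leak over a customer link selects it despite the longer path.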
In this incident, Megafon was a direct provider of AS205540, which accepted the leak, so both rules came into play. On the other hand, our recent report shows that Yandex has the highest number of peering partners in the region, which is why acceptance of the leak was so limited for its prefixes.
We have mentioned before that a route leak can be actively countered, and as recently as a month ago we described a path to eliminating such events. What we have never told ISPs and their customers operating an AS is what they can do to mitigate the adverse effects of a leak. Improving your connectivity through peering is a viable way.
Finally, we still think that this event was exaggerated by the fact that everything happened on 4.04, the day of the Not Found page. Next year, April 4 falls on a Saturday; we recommend that everyone treat it as a full working day.