Wednesday, September 3, 2008

On inferring autonomous system relationships in the Internet

As its name suggests, the purpose of the work is to infer relationships (peering, transit etc.) between ASes in the Internet, by analyzing BGP routing tables. I have to quickly admit that I do not understand the purpose of such an endeavour. The author hints at the role that such relationships play on the "structure of the Internet and the end-to-end performance characteristics", but its not clear how elucidating the actual relationships between ASes will improve that understanding. The author does give two other examples of using this information in the conclusion, although I think the paper would have been better served by putting it in the introduction in order to better motivate the problem. Anyway, the two examples are that "ISPs can reduce misconfiguration and debug router configuration files" and to "plan for future contractual agreements."

The work uses some interesting heuristics. First is the heuristic that the size of an AS can be determined by the degree of its (logical) connections. Second, the work assumes a reasonable route export/import policy based on the nature of the relationship between ASes. This assumption is then used to prove a key theorem, that is, all AS paths in any BGP routing entry satisfies the following:
"(i) A provider-to-customer edge can be followed by only provide-to-customer or sibling-to-sibling edges;
(ii) A peer-to-peer edge can be followed by only provide-to-customer or sibling-to-sibling edges"

The basic algorithm looks at each AS path, and guesses the top provider to be the one with the highest degree. Pairs of routers before (after) the top provider are then inferred to be either sibling-to-sibling or customer-to-provider (provider-to-customer). Sibling-to-sibling are simply connections where a transit relationship occur in both directions. A more refined algorithm makes use of soft information, in that each AS path provides evidence in support of a transit relationship, but a minimum number of such evidence needs to be present. A separate step is used to infer peering relationship based on certain heuristics, which I admit I do not fully understand (hence a suitable point of discussion =p).

It was difficult to make sense of the experimental results. In V.A., the main claim is that the numbers are fairly consistent over 3 different readings, but perhaps what this is saying is just that given fairly similar routing entries (how similar??), the algorithm would not give wildly different results. The results presented in V.B. could also be misleading. For example, even if confirmation of customers is 100%, it is just saying that there were no false positives of identifying of customers; there could have been a missed detection (please correct me if I'm wrong).

One possible extension is a refinement of the refined algorithm. Instead of using a simple count, perhaps some notion of either AS path reliability can be used in a weighting. Also, as the author pointed out, the heuristic that a top provider is one with the highest degree may in fact not be true. Is some kind of consistency check among the AS paths able to more reliably find the top providers? At the end of the day, it might boil down to the algorithms doing a effective one pass through all the AS paths, whereas an iterative inference algorithm could have sorted things out better.
highest degree - top provider

No comments: