Thursday, November 20, 2008

Scalable application layer multicast

This paper adopts an application overlay approach to multicast. The main focus of the paper is on building a suitable hierarchical structure of nodes such that the average case control overhead is O(1) and the worst case is O(log N), where N is the number of nodes. (There is also a parameter that controls the size of each cluster, which can creep into those numbers.) Care is taken to ensure that the parent of a set of nodes is about equally "far" from all of them, that this property and the cluster size bounds are maintained when nodes join or leave (a cluster is a set of nodes to which its parent is responsible for distributing data), and that a tree structure is preserved (to avoid loops/data duplication).

The host joining algorithm reminds me of a tree VQ - a node finds the "right" cluster by going down the tree, selecting at each layer the cluster with the "closest" cluster head. However, it is not clear how the cluster leader is selected when there are changes either in membership or in network conditions. It probably cannot be the cluster leader itself that determines when to relinquish its role, since it has an incentive to cheat in return for lower latency. Do the nodes do some form of voting based on the heartbeat messages? Allowing a node to immediately receive data from the layer it is currently querying is a nice touch that reduces the latency before data first starts flowing.
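
To fix my own understanding of the join procedure, here is a minimal sketch in Python, assuming a Cluster object with a leader, a member list, and a way to enumerate the clusters its leader heads one layer down - all of these names are my own, not the paper's:

    # Hypothetical sketch of a NICE-style join: starting from the top layer,
    # the joining host probes the leaders of the clusters at the current
    # layer, descends into the cluster whose leader is closest, and repeats
    # until it reaches (and joins) a bottom-layer cluster.
    def join(new_host, top_layer_clusters, rtt):
        """rtt(a, b) is an assumed helper that measures round-trip time."""
        clusters = top_layer_clusters
        while True:
            best = min(clusters, key=lambda c: rtt(new_host, c.leader))
            if best.is_bottom_layer():
                best.members.append(new_host)   # join at layer L0
                return best
            # Descend: probe the clusters this leader heads one layer down.
            clusters = best.leader.children_clusters()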

So this seems to be more of a framework that describes "best practices" for building a multicast tree, rather than a protocol that can be used off the shelf. What would be the best way to implement the ideas here such that a content provider can use it with minimal fuss?

I would be concerned with overhead traffic, since building and maintaining the tree require a fair amount of control coordination. The simulation results do not quite agree with the theory that the average control overhead is constant in the number of nodes. The standard deviation grows, but that is to be expected due to the worst case - in fact, that should be the number reported. The other comparison I would have made is with an "ideal topology" for each set of conditions - i.e. when nodes leave/join, the "ideal topology" is the one that a tree builder with central control would have built. This would be more meaningful than IP Multicast for judging how good NICE is at maintaining good hierarchies.

Wednesday, November 19, 2008

A reliable multicast framework for light-weight sessions and application level framing

Despite the grand-sounding title, this paper is really about repair mechanisms suitable for multicast applications, with the whiteboard application as a specific example. Some means of doing multicast is assumed, and the paper does not quite specify how that is done. The main issues are to prevent the sender from seeing an "ACK implosion" and to exploit the fact that there are multiple receivers. Two key design choices are (i) after loss detection, a receiver should only request a repair after a delay that depends on its distance from the sender; (ii) that delay should be randomized to prevent synchronization of repair requests. This allows receivers with lost data to benefit from repair requests already issued by nodes closer to the sender.
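
A minimal sketch of how I read design choices (i) and (ii); the constants c1/c2 and the delay estimate are my own assumptions rather than values taken from the paper:

    import random

    def schedule_repair_request(delay_to_sender, c1=2.0, c2=2.0):
        """Wait an amount of time proportional to the estimated one-way delay
        to the sender, randomized over an interval to desynchronize requests."""
        return random.uniform(c1 * delay_to_sender,
                              (c1 + c2) * delay_to_sender)

    def on_loss_detected(delay_to_sender, repair_request_heard):
        wait = schedule_repair_request(delay_to_sender)
        # ...sleep for `wait` seconds in a real implementation...
        if repair_request_heard():
            return None    # suppress: a node closer to the sender already asked
        return "send repair request"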

Honestly, I think the paper is a little long for what are two simple, though useful, ideas. I was really expecting something else, especially given the title and the rather general introduction. No rough gauge of timings for the whiteboard application is provided; I wonder how interactive the application can be if it relies on such a repair mechanism. To me, a countdown timer that depends on how far a receiver is from the sender is one alternative; what about something that depends on how far one is from the fault (if that can be determined)?

I also didn't think it was right to just brush the actual implementation of multicast under the rug. If anything, how repairs are made should depend on the actual implementation. For example, if a tree-based multicast is used, why can't a node simply forward a repair request to its parent as soon as it detects a loss? The parent can suppress the request if necessary, so the sender still would not be overwhelmed by repair requests.

Monday, November 17, 2008

A delay-tolerant network architecture for challenged internets

This paper tries to address the poor performance of Internet protocols in "challenged conditions", such as those with some combination of low bandwidth, high delay, frequent down-time, and long queuing delay. These cause problems due to mismatches with some of the implicit assumptions made in developing those protocols. The authors propose an overlay network architecture that relies on existing links/protocols to implement a store-and-forward solution. DTN gateways control access to regions, and either bridge regions or provide a way to communicate with nodes that are not in the same region.

This sounds very similar to an email setup. A sender sends email to a mail gateway, which then routes it to the mail server of the receiver. Finally, the receiver pulls the email from the mail server. In this case, the DTN gateway would push the message to the receiver - isn't that what RIM does for BlackBerry? Another difference is that in the DTN architecture, hosts can send data directly if they are in the same region.

An architecture for internet data transfer

In this paper, the authors propose an abstraction for data transfer. Control logic still stays with applications, and in particular, communication for control logic is separated from the data transfer. On the other hand, data transfer can be initiated as a separate service, where the application does not have to worry about the actual means of transfer, just that the receiver will eventually get the data transmitted by the sender. By providing this abstraction, the authors hope that various types of transfer techniques can be used and adopted widely.

To me, this is one of those papers that sounds like a good idea at first, but a little further thinking overturns that initial impression. I don't think all applications would want a generic data transfer service; each application has its own specific needs, and would probably need to choose an appropriate type of transfer technique. Web surfing, downloading, or video streaming would each require different types of service. One possibility would be for the application to specify its preference, but that defeats the purpose of the abstraction. Besides, applications can simply link against libraries implementing the desired transfer technique.

I'm also not sure how this would incorporate data streaming. Right now, the entire piece of data needs to be provided by the application before a hash can be computed for its ID. Obviously this is not possible for live streaming, where the application does not even know what the data is ahead of time, and the receiver wants to get the data as soon as it is available. One possibility would be to consider each chunk a separate piece of data and use the construct here, but that's clunky.
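
As a sketch of that clunky workaround (the chunk size and the use of SHA-1 are my assumptions, not anything from the paper):

    import hashlib

    def chunk_ids(stream, chunk_size=64 * 1024):
        """Treat each chunk of a live stream as an independent transfer
        object, named by a hash of its own content."""
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            yield hashlib.sha1(chunk).hexdigest(), chunk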

However, benefits such as load-balancing over multiple paths and caching of duplicate data do seem tempting. Perhaps a different abstraction that provides these benefits while letting the application choose the type of transfer would work better?

Thursday, November 13, 2008

X-Trace: A pervasive network tracing framework

This paper presents an architecture for logging network requests and responses by inserting task-identifiable metadata into the request. The catch is that all the network elements/protocols that lie in the path would need to be modified in order to handle and propagate such metadata. Logging of data is handled by individual devices, thus ensuring that sensitive information is not shared unwittingly, while the task id in the metadata allows for cooperation across domains. A (more than) proof-of-concept implementation is also presented.
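
My rough mental model of the metadata propagation, sketched in Python - the field names and the way ids are minted here are my own approximation, not X-Trace's actual wire format:

    import os

    def propagate(metadata):
        """Keep the task id, mint a fresh operation id, and remember the
        parent operation, so that per-device reports can later be stitched
        into a tree covering the whole task."""
        return {
            "task_id": metadata["task_id"],       # constant across the task
            "op_id": os.urandom(4).hex(),         # new id for this hop/layer
            "parent_op_id": metadata["op_id"],    # where we came from
        }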

Given that there is a hit in performance, no matter how small, I wonder if X-Trace would be used mostly for debugging rather than for logging of all network operations. Even if devices were X-Trace enabled, they would only propagate metadata when it is present.

As the paper mentions, a user study on how much such a tool improves fault finding would provide much motivation for deploying such an architecture.

With respect to security, how easy would it be for a malicious node to hide its activity by propagating modified X-Trace data such that everything seems normal? This could make it difficult to pin-point where the true problem lies.

Wednesday, November 12, 2008

End-to-end internet packet dynamics

This is an interesting (but long) paper on determining end-to-end performance in the Internet. Analysis is done on packet traces of TCP transfers between pairs of end-nodes, where both sender and receiver are under the control of the author. Knowledge of the TCP protocol is also used to undo its effects in order for analysis of the underlying network to be performed.

If there is one key lesson from this paper, it is that there is no single number that describes characteristics such as the percentage of out-of-order packets, bottleneck bandwidth, packet loss rates, and jitter. These change over time, and depend on sites, network elements, TCP implementation, among other things. For an internet application, this probably means constant probing and monitoring of such metrics.

The paper is generally full of interesting tidbits, although I cannot say that there is a consistent theme throughout. Among the many things I learnt was how to robustly measure bottleneck bandwidth.

I wonder if someone has done a similar thing for UDP packets, which might be interesting for streaming applications. While the author points out that TCP has this "harm-no-others" property, is that really a good reason for not looking at UDP? I think a practical reality is deciding what rate to send UDP packets at, since obviously the characteristics would be very different at different transmission rates.

Wednesday, November 5, 2008

Internet indirection infrastructure (i3)

The basic idea in this paper is as follows. Instead of a sender directly transmitting packets to a receiver, the receiver first indicates through a trigger what data id it is interested in and where to send it, while a sender simply sends data, conceptually, to a data id. In practice, the sender first looks up which server is responsible for handling triggers corresponding to a data id by using some DHT (Chord in this paper) and sends the data to that server. An extension is that the data id and receiver addresses can be stacked in order to provide other functionality such as third-party processing of data (similar to the outsourced firewall?). All of this takes place on an overlay network.
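
A toy version of the rendezvous, just to make the trigger idea concrete (class and method names are mine, and the DHT lookup that decides which server owns an id is elided):

    class I3Server:
        """Receivers insert triggers (id -> address); senders send (id, data);
        the server owning the id forwards the data to every matching trigger."""
        def __init__(self):
            self.triggers = {}                    # data id -> receiver addresses

        def insert_trigger(self, data_id, receiver_addr):
            self.triggers.setdefault(data_id, []).append(receiver_addr)

        def send(self, data_id, payload, deliver):
            # deliver(addr, payload) is an assumed transport hook (e.g. UDP send)
            for addr in self.triggers.get(data_id, []):
                deliver(addr, payload)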

Naturally, a key question is how performance is impacted by such an arrangement. Simulations show that latency goes up by about 1.5-4 times, depending on the number of samples used in a heuristic to find an identifier with a closest server. This is for the case when the sender already knows which forwarding server to use. For the first packet, where the sender needs to look up the server, the latency can be up to 15 times larger.

As before, who participates in the DHT for trigger lookup? All the folks who wish to use i3, or a dedicated set of servers? I suppose to figure out what id to use for a certain content, users would have to use some out-of-band communications.

Middleboxes no longer considered harmful

In this paper, the authors propose an architectural solution to accommodating "middleboxes", or network elements that violate certain Internet layering, such as NATs and firewalls. They argue that these middleboxes are likely to be useful even with migration to IPv6, for reasons such as the need to maintain private address realms or to outsource firewall operations. To do this, Delegation-Oriented Architecture (DOA) is proposed, which sits on top of the IP layer.

From my understanding, each host is given a unique 160-bit identifier (a cryptographic hash?) which makes each host uniquely addressable. EIDs can be resolved through a DHT to either an IP address of a delegate or another EID. Essentially, the architecture allows a sender to address a packet using the EID of the receiver, while the receiver can delegate responsibility for handling/forwarding its packets to some intermediary (addressed by either an EID or an IP address). I am tempted to think of this as providing a mixture of routing and DNS on top of the "normal" Internet.
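
The delegation chain I describe would resolve roughly like this (a sketch under my own assumptions about the lookup interface, which is taken to return either an IP address or a further EID):

    def resolve(eid, dht_lookup):
        """Follow the delegation chain until an IP address is reached.
        dht_lookup(eid) is assumed to return ('ip', addr) or ('eid', next_eid)."""
        kind, value = dht_lookup(eid)
        while kind == "eid":
            kind, value = dht_lookup(value)
        return value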

What I am not clear about is who maintains responsibility for operating the DHT for EID resolution. I am also torn between whether this is meant to be operated among a small network of users (i.e. each group of hosts could operate their own DOA) or as a network-wide deployment, where everyone would share the same EID namespace. In the latter case, how would the DHT scale?

Overall, it is a very intriguing idea. Does it have additional value over IPv6? I'm not sure if I would agree with the authors' claims that it does. Outsourcing of firewall functions can probably still take place without DOA, though it is likely that NATs will still have value (multiple hosts sharing a single internet connection).

Monday, November 3, 2008

DNS performance and the effectiveness of caching

This paper presents a study in which traces of DNS lookups and responses, as well as TCP connection starts and ends, were collected for analysis and for driving simulations to test out ideas. To be honest, it is not clear to me why some of the analysis they performed would be interesting, but it is probably just me being unable to appreciate what they have done. The more interesting portions are the experiments where the traces are used to drive simulations in which various parameters or ideas can be tested. The main results seem to be that reducing the TTL of address records down to a few hundred seconds has limited ill effect on cache hit rates. Also, while sharing a DNS cache among clients might be useful, the benefit saturates at about 10-20 clients.

One concern I have is with the time periods over which data was collected. It's not clear why data was not collected over the same time period (i.e. the same week) in two different years (perhaps a deadline?). The number of students/users on campus from Dec 4 to Dec 11 (during the regular semester) would be different from Jan 3 to Jan 10 (corresponding to an optional month-long independent activities period). The number of lookups is higher in mit-dec00, but of course this could be in part due to increasing numbers of hosts and websites. The other problem with different periods is that the mix of students could be different, e.g. more international students staying behind in January, which might contribute to a different set of domains being looked up. This is of course speculative, but it would be more convincing (to me) to keep the period the same, especially for some of the trends that the authors claim in the paper. With just two time-points that are not even at the same time of the year, I would not believe any of the trends stated in the paper.

The other issue for me is that the authors admit that, in hindsight, certain collection could have been done differently. For example, the 128 bytes collected per packet were insufficient; how did this affect the analysis? Also, collecting UDP flows as well could have verified some of the claims made. I imagine that the collection of data packets is not a trivial task, but if all the collection software/hardware is already in place, what would prevent the authors from re-collecting the data? The only reason I can think of is bureaucratic hoops that the authors would have to jump through for each collection.

Development of the Domain Name System

This paper describes the design decisions behind DNS. It grew out of a need to change from a centralized hostname-to-address mapping (i.e. HOSTS.TXT) to a distributed system. I am rather amused (though not surprised) that the original centralized system is called HOSTS.TXT - after the name of the file that contains the list of hostnames and addresses. Since I was not too familiar with how DNS works, I would have preferred the paper to briefly describe the process of lookups instead of launching into why things are done in certain ways. I suppose the intended audience is one that already knows the lookup process.

The over-arching design aims seem to be distributed management and lean queries. Distributed management is mainly achieved through a hierarchical approach to delegating responsibility for maintaining the name space. A lean service is achieved by constraining the types of database operations that are permitted. Such a distributed database service reminds me of a DHT, except that here, responsibility for domains needs to be delegated, since domain name administrators presumably want control over who can use their domains.

The other key feature of DNS seems to be caching of query responses, which prevents overloading of the root servers. There is, however, a tension between long caching (larger TTL), which allows for more reuse, and short caching (smaller TTL), which allows for more dynamic DNS responses. Negative caching seems like a smart thing to do, since an attack with bogus addresses can potentially overload DNS resolver servers; I wonder if this is something that is done in practice these days.
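
A toy resolver cache capturing both of these points - TTL-bounded reuse and negative caching - with everything here (class name, TTL handling) being my own sketch rather than anything from the paper:

    import time

    class DnsCache:
        def __init__(self):
            self.entries = {}                      # name -> (answer, expiry time)

        def put(self, name, answer, ttl):
            # answer=None records a negative result (e.g. NXDOMAIN)
            self.entries[name] = (answer, time.time() + ttl)

        def get(self, name):
            if name not in self.entries:
                return ("miss", None)
            answer, expiry = self.entries[name]
            if time.time() >= expiry:
                del self.entries[name]             # entry expired with its TTL
                return ("miss", None)
            return ("hit", answer)                 # None means a cached negative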

Overall, I like this paper. It provides a historical context for the development of DNS, while explaining not just the design goals and approaches, but also talks about implementation issues and the lessons learnt.

Tuesday, October 28, 2008

Prof. R Jain talks at ACM MM

I'm at ACM MM 2008 (in Vancouver) right now, and the keynote talk is by Prof. Jain on Internet 3.0. Main points of concern: security, mobility, energy.

Monday, October 20, 2008

A first-principles approach to understanding the Internet's router-level topology

This paper tries to articulate the dangers of over-relying on probabilistic network topology models that match macro statistics such as power laws on degree distributions. Its key observations (also shown in the experiments) are that different network topologies can have the same node degree distribution, and that a graph that is "more likely" according to a degree distribution often has poor performance metrics due to the presence of highly connected hub nodes. The authors also try to make the point that even when the core routing topology stays the same, different end-user bandwidth demands would drive different edge network topologies, leading to very different degree distributions - thus there may not be a single degree distribution for the Internet.

I think that the thesis of the paper goes deeper than the above arguments, but I had a hard time trying to grasp it (and may still not have). The authors claim to be pushing for a new methodology to study and characterize Internet topology, but they beat around the bush quite a bit. They state that technology and economic constraints are drivers for network topology, but the assertions made are qualitative statements, and are only weakly followed through by the design of a "heuristically optimal" network which performs well but may not be "likely" according to a probabilistic topology model.

What I had hoped to see was a paper that says something along the following lines: if I have these technical constraints, and I want to satisfy certain economic conditions and some user demand (possibly heterogeneous), this is the kind of Internet network topology that I would see. There would then be certain topology metrics that can be predicted and verified from collected data.

On power-law relationships of the internet topology

In this paper, the authors describe several power law relationships of quantities of interest in studying Internet topology, based on analysis of collected Internet data. If there are indeed such power laws, it would be useful in verifying or deciding between graph models used in simulations, in studying the theoretical performance of Internet protocols, or in modeling the evolution of the Internet. The power laws described in this paper are:
  • Node out-degree and its rank (in order of decreasing out-degree)
  • Out-degree frequency and out-degree
  • Number of pairs of nodes within h hops and h
  • Eigenvalue of connectivity graph and its order (in order of decreasing eigenvalue)
From these power laws, other relationships can also be elucidated, such as between edges and rank exponent and between average size of neighborhood within h hops and hop exponent.
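
For concreteness, the standard way to estimate such an exponent is a least-squares fit in log-log space; the numbers below are fabricated purely for illustration:

    import math

    def fit_exponent(xs, ys):
        """If y ~ C * x^a, then log y = log C + a log x, so the regression
        slope in log-log space is the exponent a."""
        lx, ly = [math.log(x) for x in xs], [math.log(y) for y in ys]
        n = len(lx)
        mx, my = sum(lx) / n, sum(ly) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
        var = sum((a - mx) ** 2 for a in lx)
        return cov / var

    # e.g. rank vs. out-degree pairs would give the rank exponent
    ranks, degrees = [1, 2, 3, 4, 5], [100, 50, 33, 25, 20]   # made-up numbers
    print(fit_exponent(ranks, degrees))                       # roughly -1 here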

It is interesting that such relationships exist, at least in the data that the authors examine. My biggest question would be how applicable these laws are in the years since this study was carried out. Related to this is how time-invariant the exponents really are. My guess would be that the power laws have exponents that are fairly constant over time, since they have to do with how hosts are connected to the internet through routers, and the choices that are made are constrained by cost-effectiveness and hardware. Of course, if there was a major shift in technology or cost structure, this should also show up in the evolution of the exponents. The key point is that these laws should hold under plain scaling.

I think that Section 5.1 would have been interesting to expand on. Namely, I would like to know if some of these power laws can be derived from first principles. As the authors pointed out, cost-effectiveness in designing networks can drive a common solution in network topology, hence leading to some of the power laws described in this paper.

Wednesday, October 8, 2008

XORSs in the air: Practical wireless network coding

The title says it all: the authors are interested in demonstrating network coding, a topic of much recent interest in network information theory, in a practical wireless testbed. Theoretical results have shown that allowing coding/mixing at the intermediary nodes of a network achieves the multicast capacity of that network (while routing might not). As the authors note, the capacity and optimality of network coding are open questions for unicast traffic in an arbitrary network. As it turns out, that is also the scenario of interest in this paper. Another key difference is that a broadcast transmission from a node can be heard by many surrounding nodes, whereas in traditional network coding, a transmission is only heard by the node at the other end of the link.

I have to admit that I was expecting something a little different before reading this paper. I thought the paper was going to pick up from the previous paper and describe a coding approach for exploiting path diversity for higher throughput for a single unicast flow. What the paper seems to be describing is the case of multiple unicast flows. If the packets from each flow are not shared, it's not immediately clear to me why network coding helps. (The butterfly example is one where the receivers all want the packets injected by the various sources.) COPE seems to assume that the neighbors of a node would want to have all the packets that it has; why that is the case is not immediately obvious to me.
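
The canonical two-flow example that makes the XOR trick click for me (the textbook scenario, not anything from the paper's experiments): a relay holds packet A headed to node 2 and packet B headed to node 1, and each node already has the packet it originally sent, so one coded broadcast serves both.

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    packet_a = b"payload from node 1"      # node 1 -> node 2, via the relay
    packet_b = b"payload from node 2"      # node 2 -> node 1, via the relay
    coded = xor_bytes(packet_a, packet_b)  # the relay broadcasts this one packet

    # node 1 still has packet_a, so it recovers packet_b, and vice versa:
    assert xor_bytes(coded, packet_a) == packet_b
    assert xor_bytes(coded, packet_b) == packet_a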

ExOR: Opportunistic multi-hop routing for wireless networks

This paper is based on the following idea: in a wireless network, a transmitting node can be heard by more than one neighboring node. Instead of depending on just one forwarding node as in traditional routing, why not let any node that received the data do the forwarding? This can reduce the number of re-transmissions and increase throughput. More concisely, this paper aims at making practical use of path diversity in a wireless network.

Alas, this simple idea needs plenty of book-keeping to make it worthwhile, and the rest of the paper describes one such approach. First, to reduce packet overhead, only a small number of forwarding nodes are considered for each hop. Second, to avoid duplicate transmissions, there needs to be some mechanism for nodes to know what other potential forwarding nodes have done. Here, the authors choose to send a batch of packets at a time, to pipeline the transmission and the agreement on which packets have already been sent by forwarding nodes.
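
My rough mental model of that per-batch agreement, with all the structures below being my own approximation of the scheme rather than the paper's exact data format:

    def packets_to_forward(my_rank, forwarders, batch_map, received):
        """forwarders is ordered from highest priority (closest to the
        destination) down, and my_rank is this node's index in that list.
        A node forwards only the packets it received that, according to
        its local batch map, no higher-priority forwarder already has."""
        already_covered = set()
        for node in forwarders[:my_rank]:              # nodes ahead of me
            already_covered |= batch_map.get(node, set())
        return [pkt for pkt in received if pkt not in already_covered]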

Partly due to the complexity of the book-keeping, I was not able to quite follow how this works beyond one hop. Clearly, from the experimental results, ExOR works for multiple hops. I also could not make up my mind how centralized/distributed this scheme is. A server seemed to be necessary to obtain a complete global picture of the network and send it to all nodes such that the source node can pre-compute which are good forwarding nodes. For dynamic networks, a periodic route finding seems expensive and unsatisfactory. Is there a way to discover multiple paths in a distributed fashion in a dynamic environment?

My overall sense of the paper is this: so many things seem to need to happen for it to work that it's a wonder the results are as good as those shown in the paper. It is nice, though, that the authors point out that path diversity works well only if there is indeed some diversity in the loss patterns observed by forwarding nodes; if the losses are highly correlated (perhaps a localized jammer?), there is a performance gap. Still, all it takes is one forwarding node receiving the packet for it to go through.

Monday, October 6, 2008

A performance comparison of multi-hop wireless ad hoc network routing protocols

As its name suggests, this paper compares four routing protocols in an ad hoc network scenario: DSDV (hop-by-hop routing, periodic broadcast updates), TORA (route discovery on demand), DSR (source routing, discovery on demand) and AODV (hop-by-hop routing, discovery on demand). The evaluations are performed on a network simulator, ns, using an updated physical and data link layer model.

The experimental methodology studies routing performance with varying node mobility, packet rates, and number of transmitting sources. Nodes use a "random waypoint" movement model, with a pause time (when the node is stationary) before moving to the next waypoint, at maximum speeds of 20 m/s (~45 mph) or 1 m/s (~2.25 mph). Transmitting sources are assumed to be constant bitrate sources. It's not clear what real-life scenarios this simulation reflects at speeds of 20 m/s (perhaps vehicular ad hoc networks?). If the goal is to evaluate performance under topology changes at various speeds, then maybe limiting transmission to when sources/sinks are stationary would be more realistic.
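
For reference, a toy version of the random waypoint model as I understand it (the area size, speed range, and one-position-per-second discretization are my own choices, not the paper's setup):

    import random

    def random_waypoint(area, max_speed, pause_time, steps):
        """Yield (x, y) positions, one per simulated second: pick a random
        destination, move toward it at a random speed up to max_speed,
        pause for pause_time seconds, then repeat."""
        x, y = random.uniform(0, area), random.uniform(0, area)
        while steps > 0:
            wx, wy = random.uniform(0, area), random.uniform(0, area)
            speed = random.uniform(0.1, max_speed)
            dist = ((wx - x) ** 2 + (wy - y) ** 2) ** 0.5
            ux, uy = (wx - x) / dist, (wy - y) / dist
            moves = [(speed * ux, speed * uy)] * int(dist / speed)
            for dx, dy in moves + [(0.0, 0.0)] * pause_time:
                x, y = x + dx, y + dy
                yield (x, y)
                steps -= 1
                if steps <= 0:
                    return
            x, y = wx, wy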

The main results are that:
  • Packet delivery ratio is high (0.9-1) and relatively stable for TORA, DSR and AODV over a range of pause times (i.e. topology change rate), but is lower for DSDV at pause times less than 300s. However, at low speeds (1 m/s), all protocols have fairly similar performance.
  • TORA performance in packet delivery ratio degrades when the number of sources is high (i.e. 30)
  • Routing overhead is constant with respect to mobility for DSDV (as expected). Routing overhead scales with mobility for the other routing protocols, with TORA > AODV > DSR
  • Routing overhead is constant with respect to number of transmitting sources for DSDV (again as expected). It scales with number of transmitting sources for the other routing protocols. At 30 sources, TORA >> AODV >> DSR, with an order of magnitude of difference between each of them.
  • In terms of path optimality, both DSDV and DSR do very well, while TORA and AODV have a significant number of packets routed through paths with 3/4 extra hops compared to the shortest path.
The main message is that there are no clear winners, and which protocol to use may well depend on what the operating conditions are.

Sunday, October 5, 2008

A high-throughput path metric for multi-hop wireless routing

The main contribution of this paper is in recognizing that choosing a route based on minimum hop-count may not always be the right thing to do, since it does not take into account the link loss rates. Furthermore, doing so might also tend to pick hops with long distances, which is likely to be correlated with high losses. Instead, the authors suggest using a metric that is based on minimizing the expected number of transmissions along the route. This is an additive metric, where the metric of each link is the expected number of transmissions (and re-transmissions) along that link, given by 1/(df*dr), where df and dr are the delivery ratios for the forward and backward directions. Both ratios can be estimated by using regularly broadcasted probe packets from each node.
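
The metric itself is simple enough to write down directly; here is a small sketch with made-up delivery ratios, showing why a two-hop route over good links can beat a one-hop route over a lossy link:

    def link_etx(df, dr):
        """Expected transmissions on one link, given forward and reverse
        delivery ratios (both in (0, 1]): the 1/(df*dr) formula above."""
        return 1.0 / (df * dr)

    def route_etx(links):
        """ETX is additive over the links of a route."""
        return sum(link_etx(df, dr) for df, dr in links)

    print(route_etx([(0.95, 0.95), (0.90, 0.95)]))   # two good hops: ~2.28
    print(route_etx([(0.50, 0.60)]))                 # one lossy hop:  ~3.33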

What I thought was missing from the paper was an intuition of why this expected transmission count (ETX) is the right thing to do, under the assumptions that the radio uses a fixed power, and that the link layer uses re-transmissions as an error control mechanism. Ideally, one would want to choose a path that maximizes throughput. However, ETX seems to be minimizing the expected link latency (assuming no queueing delay). Even then, ETX does not take into account the turn-around time at each hop, although that probably is a second-order consideration. And why not use the expected transmission count of the bottleneck link as the metric? My best guess is that ETX accounts for both the number of hops (in a high hop count route) and the bottleneck (in a low hop count route) in a single easy to describe/implement package.

The use of the experimental wireless testbed is quite convincing, first in motivating the need to move away from a hop-count metric in routing. I'm surprised by the significant asymmetry in links; I wonder how "omni-directional" the antennas really are. The CDF also paints a convincing picture of the improvement provided by using ETX.

Wednesday, October 1, 2008

Architecture and evaluation of an unplanned 802.11b mesh network

As its title suggests, this paper discusses an implementation and evaluation of a wireless mesh network. I like the overall presentation: it is clear in explaining what was done, the authors actually carried out the system experiments, and they show plenty of experimental results. Despite that, I do have some concerns.

First, the assumption that "community wireless networks typically share a few wired Internet connections" is a little dubious. Perhaps there might be good reasons to back up such an assertion (maybe in a rural or undeveloped setting?), but these were not made clear. On the other hand, a deployed mesh node requires power at the very least, and in the urban setting where the experiments were carried out (deployment in buildings), it is hard to imagine that the nodes cannot have access to a wired internet connection. I suppose there could be cost pressure against having many wired internet connections, so it could make sense to reduce that number. In that case, it should be made clear, and the trade-off between the cost of wired connections and internet access throughput could be studied.

Second, and related to above, the experiments measure throughput between pairs of mesh nodes. While the results are well and good, it doesn't answer what users would care most about, that is, how fast internet access is. Given that wired connections are limited, the gateways represent a bottleneck to the internet. This point is important because the key assumption of the study is the limited number of nodes with wired internet access.

Third, it was not clear whether throughput was measured for all pairs simultaneously or one pair at a time. Section 3.7 mentions inter-hop interference, but that seems to be due to intermediate nodes performing hops on the same path between two sender/receiver nodes. Given that a typical use case would have at least some number of nodes trying to communicate with the gateway, it would have been nice to see how average throughput varies with the number of simultaneous communications.

A minor note on the effect of density (section 3.4). It would have been interesting to see how the empirical density for full connectivity compares with known theoretical results for full-connectivity in random graph theory, given the maximum range of the communications.

Now, to end on a positive note. I think the routing protocol presented here is useful not just for wireless mesh networks, but would also be useful for an overlay network. I'm not sure how routing for overlay networks work, but I'm guessing it won't be too different.

Modeling wireless links for transport protocols

In my opinion, the key contribution of this paper is in carefully reviewing aspects of wireless links (cellular, WLAN, satellite, etc.) and proposing how the salient features of each can be expressed in simulations. The proposals are also implemented in ns-2. One hope is that by standardizing simulation parameters, it makes it possible to compare results across papers. This is in turn motivated by the desire to understand how existing protocols are affected by wireless links, since the assumption has been that a packet loss is due to congestion rather than link errors, and how to modify protocols appropriately.

The key modeling proposals are:
  • Link errors: packet drops based on specified per-packet/per-bit errors, with possible use of a Gilbert-Elliott channel model (see the sketch after this list). Where packets are dropped also depends on whether loss is caused by link corruption (after transmission) or by handover (before transmission).
  • Delay: suspend packet transmission
  • Reordering: swapping packets or delaying a packet while letting others pass. Also, whether or not to model it could depend on whether it is allowed.
  • Bandwidth allocation/variation: applicable for GPRS
  • Bandwidth asymmetry
  • Queue management
  • Handovers (as a result of mobility): the level of modeling details depend on what is being studied. (e.g. full fledged vs modifying link characteristics)
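
Here is the sketch referred to above: a two-state Markov ("Gilbert-Elliott") loss model of the kind the paper points to for bursty link errors. All four parameter defaults are made up by me, not taken from the paper.

    import random

    def gilbert_elliott_drops(n_packets, p_good_to_bad=0.01, p_bad_to_good=0.3,
                              loss_good=0.001, loss_bad=0.3):
        """Return a list of booleans, True where a packet is dropped."""
        state, drops = "good", []
        for _ in range(n_packets):
            loss = loss_good if state == "good" else loss_bad
            drops.append(random.random() < loss)
            flip = p_good_to_bad if state == "good" else p_bad_to_good
            if random.random() < flip:
                state = "bad" if state == "good" else "good"
        return drops
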
It seems strange that the authors make a big deal about the dangers of modeling a wireless link as duplex when it actually is not, but then do not quite describe what approach they took in the end. They do mention two possibilities, either reducing the advertised link rate or modeling the half-duplex nature explicitly. But what did they do at the end of the day?

I do think that a good model of wireless links is a worthy enterprise, not just for pure network research, but also for applications-oriented research such as video streaming over wireless. A good model allows for quick prototyping and simulations before building the actual system to try it out. I also do wonder if there are any models of the end-to-end link, e.g. a model of the link between a server sending data over the internet with a wireless last-hop, where both congestion and link errors/delay come into play. Again, such a model would be useful for doing research on video streaming for example.

I also wish that the paper was written in a more systematic manner. For example, I think sections 2 and 3 should be swapped, but perhaps it's not possible to point out the shortcomings of prior simulations without first describing the link properties. Section 7 seems like an add-on, which I thought was not totally relevant; perhaps it's there to convince readers of the need for such a model. I also wish that there was a summary to inform readers of all the modeling aspects and parameters, maybe a table of some kind.

Tuesday, September 30, 2008

MACAW: A media access protocol for wireless LANs

This paper reads like a cookbook of modifications made to a previously proposed media access protocol, MACA, to improve both total media utilization and "fairness". While media utilization is fairly well-defined, as the authors point out, it's not quite clear what form "fairness" should take. Should it be bandwidth per transmitter, bandwidth per receiver, or bandwidth per stream? In this paper, it seems the authors chose the last.

The structure of the paper seems to follow this pattern: point out a shortcoming of MACA, fix it with a proposed modification, repeat. In fact, sometimes the proposed modifications bring problems of their own, and yet another modification is proposed. While the proposed modifications are well thought out, I can't help but wonder (sitting where I am instead of having to design a wireless protocol) whether there could have been a more holistic approach, where one looks at the various usage scenarios and comes up with a scheme that addresses all concerns. Of course, the process of putting together all the modifications could still be the same "poke-finger-into-holes-in-dikes" approach, but it would feel more satisfying. I do, however, like the style of the paper because it works very well as a tutorial.

Anyway, the summary (section 5) of this paper gives a good wrap-up of all the issues and solutions, but I'll just repeat them for completeness - in paper chronological order:
  • The baseline, MACA, uses a RTS-CTS-DATA protocol, with a binary exponential backoff retransmission when CTS is not received.
  • The backoff counter is copied and shared by all stations, so that congestion information is known to all.
  • The backoff follows multiplicative increase and linear decrease, to avoid big oscillations while still backing off quickly (see the sketch after this list)
  • Each transmitter maintains separate queues for each stream (or perhaps destination) each serviced by a separate backoff algorithm such that bandwidth can be equally shared by all streams in the cell.
  • ACK is added (to get RTS-CTS-DATA-ACK) for fast recovery
  • DS is added (to get RTS-CTS-DS-DATA-ACK) such that stations that cannot hear CTS will know that data is about to be transmitted (since RTS-CTS is successful) and defer transmission
  • RRTS is added, such that when CTS cannot be sent due to other stations transmitting, the receiver can contend on behalf of the transmitter.
  • A separate backoff counter is maintained for each end of each stream, but copied in such a way that all stations trying to talk with a receiver have the same backoff value. This is done to reduce leakage of backoff counters across cells and allow high utilization in each cell.
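
The multiplicative-increase, linear-decrease adjustment referred to above would look roughly like this; the 1.5x factor and the bounds are my assumptions about reasonable values rather than a quote of the paper's exact constants:

    def mild_backoff(current, success, b_min=2, b_max=64):
        """Adjust the backoff counter after each transmission attempt."""
        if success:
            return max(b_min, current - 1)        # gentle linear decrease
        return min(b_max, int(current * 1.5))     # sharp multiplicative increase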

Sunday, September 21, 2008

A fast switched backplane for a gigabit switched router

This is a white paper arguing why a switched backplane is needed in high-performance routers (instead of a shared bus backplane), and it describes and explains some of the design choices for its architecture and scheduling algorithms. Because it's a white paper, it eschews much technical detail and includes easy-to-understand explanations and analogies. For that reason, I really enjoyed reading this paper, given my limited (actually zero) exposure to router hardware. I also wish I had read this paper before the other ("Scaling internet routers...") because it would have made that reading experience so much less painful; in particular, this white paper actually does a good job of explaining what backplanes, switching crossbars, linecards, etc. are. I would recommend that their order be switched in the future. (I also should have seen the words "white paper" on this paper and picked it to read first.)

Anyway, the author makes the argument that a shared backplane based on a common bus is the main bottleneck to increasing router capacity rather than the processing for packet forwarding. Thus, a switched backplane based on a centrally controlled crossbar switch is more suitable since each input-output link can be operated at a much higher speed, and allows multiple input-output links to be operated simultaneously ("internally non-blocking"). This increases overall aggregate bandwidth.

Next, the author goes into several design choices for a switched backplane. First, a fixed packet length allows for simplicity in the overall design and high performance. Second, virtual output queues (VOQs) are used in the input ports to avoid the problem of head-of-line (HOL) blocking, since the scheduler is able to configure the crossbar more efficiently if it knows which outputs would be useful to which inputs. Third, a simple (thus fast) crossbar scheduling algorithm known as iSLIP is used to configure the crossbar switch given the VOQ construct. It is an iterative algorithm that assigns outputs to inputs based on output requests by the input ports, and serves both inputs and outputs in a round-robin fashion. Finally, to control delay, the author suggests a speedup of the switching fabric (running it faster than the external line rate).
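
To convince myself I understood the round-robin request/grant/accept idea, here is a stripped-down, single-iteration sketch; the pointer-update details follow my reading and may differ from the real iSLIP in corner cases:

    def islip_iteration(requests, n_ports, grant_ptr, accept_ptr):
        """requests[i] is the set of outputs input i has cells for (non-empty
        VOQs).  Returns a list of (input, output) matches for this cell time."""
        # Grant phase: each output grants the requesting input found next
        # at or after its round-robin pointer.
        grants = {}                                        # output -> input
        for out in range(n_ports):
            requesters = [i for i in range(n_ports) if out in requests[i]]
            if requesters:
                grants[out] = min(requesters,
                                  key=lambda i: (i - grant_ptr[out]) % n_ports)

        # Accept phase: each input accepts the granting output found next
        # at or after its own pointer; pointers only advance on a match.
        granted_by = {}                                    # input -> outputs
        for out, inp in grants.items():
            granted_by.setdefault(inp, []).append(out)
        matches = []
        for inp, outs in granted_by.items():
            out = min(outs, key=lambda o: (o - accept_ptr[inp]) % n_ports)
            matches.append((inp, out))
            grant_ptr[out] = (inp + 1) % n_ports
            accept_ptr[inp] = (out + 1) % n_ports
        return matches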

It was also personally satisfying to see multicast mentioned in the white paper; I was thinking that since a crossbar switch is used, it should be easy to connect a single input to multiple outputs, as in multicast. And the author does go over the support of multicast in such a router, together with a discussion of a modified iSLIP for scheduling, based on the observation that the switch need not service the multiple outputs simultaneously to avoid blocking.

This paper actually lays a lot of the groundwork for the other paper. For example, the question of whether re-configuration of the crossbar can scale with increasing line rate and increasing numbers of ports arises naturally. Now maybe I should go back and re-read the other paper...

Saturday, September 20, 2008

Scaling internet routers using optics

In this paper, the authors set out to design a 100 Tb/s router, motivated by the increasing demand for high router capacity which is not being met by current router design developments. They target three main issues with existing technologies: high power consumption, unpredictable performance and poor scalability. To overcome power consumption issues, they argue for optical switching; to overcome high power consumption and unpredictable performance, they propose a solution that is based on a previously proposed load-balanced switch, but modified to handle four main problems: (i) the need for rapid re-configuration of the switch fabric; (ii) possible mis-sequencing of packets; (iii) throughput that can be made arbitrarily small; and (iv) breakdown when linecards fail.

The basic architecture is based on a load-balanced switch, which has a load-balancing stage implemented by a switch, followed by a single stage of buffering, and then another switching stage to the final outputs. The switching is done in a deterministic manner, which makes it easy to implement in optics with an arrayed waveguide grating router, thus eliminating the need for rapid reconfiguration. The load-balancing is performed by the input linecard, using a Full Ordered Frames First (FOFF) scheme that spreads input packets over all intermediate linecards through the switching inputs. This also mitigates the possible mis-sequencing of packets and prevents pathological traffic patterns from arbitrarily reducing throughput. The actual switching of packets from input to output is performed by the intermediate linecards, which place them in the appropriate queue so that they will end up at the right output after the second switching stage. This is possible since the switching is done in a deterministic manner. Finally, by partitioning the switching into multiple stages and using a pre-determined routing of the optical switches, the design can adapt to linecard additions/deletions.

I unfortunately do not follow some parts of the paper. For example, I am rather confused by the role that linecards play. The authors mention both input linecards and intermediate linecards. Are these a conceptual distinction, or does the actual physical implementation use separate linecards for the two different purposes? This was especially confusing to me since some diagrams show a linecard doing both of these things. If so, how does the linecard know from its inputs which packets are intermediate and which are meant for output?

The authors also list some open challenges, such as high speed address lookups due to processing and memory speed bottlenecks.

I felt that the paper started off fairly easy to follow. The introduction motivated and introduced the problem pretty well. However, the technical sections were very hard to understand, possibly (probably) because this is an area I had no exposure to previously, and many of the terms and concepts, such as crossbar, are alien to me. The various sections seem rather disjointed, perhaps because they were written by different people? By the time I got to the end of the paper, I felt rather exhausted, and yet had the feeling that I did not really get too much out of it.

Wednesday, September 17, 2008

Supporting real-time applications in an integrated services packet network

As its title suggests, this paper proposes an architecture and mechanism for providing service guarantees. It provides two classes of services: guaranteed delays, which is suitable for clients that are intolerant to highly varying delays and fixes its operation based on an assumed delay, and predictive delays, which is suitable for clients that are tolerant to varying delays and is able to adapt to it. 

The mechanism to support such an architecture is based on a hierarchy of scheduling algorithms. For the guaranteed class of services, each client is given its own flow, while for the predictive class of services, all clients are aggregated into a virtual flow. Within that virtual flow, clients that request a particular delay constraint are grouped into the same priority class. The scheduling mechanism performs WFQ across all flows, including the virtual flow. This provides isolation of each flow, assuring that a misbehaving flow would have limited impact on the other flows. Within the virtual flow, FIFO/FIFO+ together with priority is used to schedule all the individual flows. This provides sharing of the link in order to distribute delay.

There are two further interesting details (among others). First, FIFO+ extends FIFO by attempting to emulate FIFO in aggregate across all hops to provide more effective sharing. However, this requires modification of the packet header to include some state about the delay seen by the packet up to the current hop. Second, for the predictive service, clients have to specify their traffic generation patterns using the parameters of a token bucket filter, which gives some idea of the average and peak rates (or the amount of burstiness). This is used in conjunction with delay and loss requirements to appropriately assign a priority class. This seems to be a hybrid of the explicit and implicit service selection schemes mentioned in the previous paper. The network infers the appropriate service class using a set of application-provided parameters, thus avoiding the need to know the service requirements of each application class. The application is also explicitly requesting certain service requirements, but in a more flexible and extensible manner.
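
The (rate, depth) token bucket characterization is simple enough to sketch directly; the class and parameter names here are generic, not the paper's notation:

    class TokenBucket:
        def __init__(self, rate, depth):
            self.rate, self.depth = rate, depth        # tokens/sec, max tokens
            self.tokens, self.last = depth, 0.0

        def conforms(self, packet_size, now):
            """A packet conforms if enough tokens have accumulated by `now`."""
            self.tokens = min(self.depth,
                              self.tokens + self.rate * (now - self.last))
            self.last = now
            if packet_size <= self.tokens:
                self.tokens -= packet_size
                return True
            return False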

The paper does assume a play-back application as the one that requires real-time service. In real-time playback, as in telephony, there is very limited buffering due to tight latency constraints, so it is not clear to me how adaptive the schemes can be with respect to shifting the playback time. (Contrast this with streaming recorded content.) Also, my experience with video is that it is not too tolerant of packet losses, although various things such as error correction and concealment can be done to somewhat mitigate them. Given a choice between loss, delay and bandwidth, it is usually much better to take a hit in bandwidth, because the resulting video experience would be better. (Delay and loss are somewhat equivalent in a real-time scenario.)

Fundamental design issues for the future Internet

In this paper, the author's thesis is that, due to the emergence of multimedia applications such as voice/video telephony with very different delay/bandwidth service requirements from traditional data applications, the following things need to be done:
  1. IP should extend its service model, since services which are aligned with service requirements increases overall utility of the network.
  2. The application should request the appropriate service instead of it being estimated by the network, to maintain separation between the network and application layers.
  3. Admission control, instead of over-provisioning, should be adopted to prevent network overload, since it is economically more sound.
The author looks at the Internet from a high level, and uses macro-level analysis to make his arguments. To do so, a framework is introduced whereby the design objective is to maximize the total utility of users of the Internet, subject to the physical constraints of the network. This is not meant as a way to optimize resource allocation or choose network parameters, but rather as a tool to characterize various design choices. Unfortunately, the outcome of the analysis sometimes depends on the various cost and utility parameters, such as the relative cost of over-provisioning versus changing network mechanisms, and the shape of the utility function when certain service requirements are not met. For example, in trying to show that service models should be extended, the argument assumes that over-provisioning cannot be done and that the cost of adding a service model is negligible.

The author argues for an explicit choosing of service by an application, rather than the service being implicitly assigned by the network, based on a layer separation argument. While an implicit assignment might be bad, it is not clear if an explicit assignment would work in practice either, due to issues such as incentives (and authentication, accounting, billing to establish financial incentives). In other words, even if it was established that separate services is the way to go, there might not be a good way to actually assign a service to a packet.

The discussion on admission control seems to me to be the most sound. I like the use of the utility maximizing framework here, since the outcome is independent of actual quantities and depends only on the characteristics of the utility function. The logical outcome of best-effort service for data and admission control for real-time applications was very satisfying. Unfortunately, it is not clear to me how admission control is to be done for the Internet, since the user (or gateway) has to in effect probe the state of the network, which might not be easy, or stationary. Also, such a mechanism could be abused to "excommunicate" users - a chilling thought.

I'm not sure how different the Internet was during the 1990s, but with hindsight, it is interesting to note that various things we accept today, such as spam, web-crawling, IP (video)telephony, were mentioned in this paper.

Monday, September 15, 2008

Congestion control for high bandwidth-delay product networks

This paper proposes the eXplicit Control Protocol (XCP) to address some of the performance shortcomings of TCP, particularly as the delay-bandwidth product increases. For example, with high RTT, TCP's policy of additive increase grows bandwidth utilization very slowly. Another scenario is short flows, which are unable to use available bandwidth faster than slow start allows. The authors also point out that TCP uses implicit congestion feedback through packet loss, which only indicates the presence/absence of congestion and not its amount, so sources have to continuously push the link into congestion before backing off.

As implied by its name, XCP provides feedback about the state of congestion at the routers to the senders, by indicating how much they should adjust their congestion windows by. Two key elements of XCP are the efficiency controller and the fairness controller, which operate independently. The efficiency controller aims at maximizing link utilization, and uses a feedback controller based on spare bandwidth and queue size to set the aggregate change in rate. The parameters of this feedback controller are chosen based on the feedback delay to ensure that the system is stable. The fairness controller then parcels out the aggregate change to the various packets to achieve fairness, based on an additive-increase-multiplicative-decrease mechanism. As the authors note, XCP can converge to fairness much faster, since the adjustment is done every RTT instead of on every packet loss.
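
The efficiency controller, as I understand it, boils down to one line per control interval; the gain constants below are the stability-derived values I recall from the paper, so treat them (and this paraphrase of the rule) as my own sketch rather than a definitive implementation:

    def aggregate_feedback(spare_bandwidth, persistent_queue, avg_rtt,
                           alpha=0.4, beta=0.226):
        """Total number of bytes by which the aggregate traffic should change
        over the next control interval: fill spare capacity while draining
        any standing queue.  The fairness controller then splits this total
        across flows in an additive-increase/multiplicative-decrease way."""
        return alpha * avg_rtt * spare_bandwidth - beta * persistent_queue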

This paper is full of useful ideas that can be applicable to other problems as well. First, there is the use of control theory to examine how a network (the plant) should be controlled and how parameters should be chosen. Second, there is the use of a fluid model to allow the analysis of protocols by approximating a discrete system with a continuous one. (How good the approximation is would be debatable, though.) Third, the mention of a shadow pricing model is an interesting way of determining how to allocate resources based on how much users are willing to pay for them.

On the other hand, the policing of misbehaving sources based on border agents sounds like an afterthought to address the vulnerability of the system to non-compliant senders. While it is possible (but expensive) to have gateways shut down the bad sources, it would be far better to have a scheme which provides no inherent incentive for a source to misbehave - i.e. the compliant sender policy should also be the one that gives the sender the highest throughput.

The experimental results are also to be expected. If a router is able to control how sources should adjust their transmission rate, then it is easy to believe that it would be able to achieve high link utilization.

Random early detection gateways for congestion avoidance

This is another paper on congestion avoidance at the gateway. However, here the goal is not so much to enforce fair queuing (although this is done in a way), but to control the average packet queue size so as to ensure a reasonable amount of latency while maintaining high throughput. The basic idea is to mark/drop packets probabilistically when the average queue size is in some range (and mark/drop all packets if it exceeds some maximum threshold). The authors list the following main design goals:
  1. Avoid bias against bursty traffic
  2. Avoid global synchronization (i.e. sources reducing windows at the same time)
  3. Control queue size in absence of cooperating sources
There are two key elements in the RED algorithm. The first is to compute the average queue size on each incoming packet. This is done via an exponentially weighted moving average. There is a slight twist in that it also tries to take into account idle time when the queue is empty, by computing the average as though some number of small packets entered the gateway during the idle period. This to me seems like a hack, since the average is only updated per incoming packet, whereas what was probably intended is an average queue size over time. This could be done in a more principled manner, perhaps using the approach in the "Core-Stateless Fair Queuing" paper, but this method is probably easier and does not make too much of a difference in the end.

The second is to determine the probability with which to drop/mark packets. The desired probability is computed as a linear function of the average queue size - intuitively, the larger the queue size, the more packets the gateway should mark. The reason for choosing this feedback mechanism is not made clear, and I think some control theory could be applied to determine an ideal probability based on the desired time constant. However, again, this need not be an exact science. What is interesting is their next step, in which the actual probability is computed, based on the goals of not having too many packets dropped close to each other and of avoiding long intervals between dropped packets. Instead, the process is modified such that drops are spread out roughly uniformly.
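
Putting the two elements together, a sketch of the per-packet computation (the threshold, weight and max_p values are typical-looking defaults I picked, not necessarily the paper's):

    class RedQueue:
        def __init__(self, min_th=5, max_th=15, max_p=0.02, w_q=0.002):
            self.min_th, self.max_th = min_th, max_th
            self.max_p, self.w_q = max_p, w_q
            self.avg, self.count = 0.0, 0       # EWMA and packets since last drop

        def should_mark(self, queue_len, rand):
            """rand is a uniform(0,1) draw; returns True to mark/drop the packet."""
            self.avg = (1 - self.w_q) * self.avg + self.w_q * queue_len
            if self.avg < self.min_th:
                self.count = 0
                return False
            if self.avg >= self.max_th:
                self.count = 0
                return True
            p_b = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            p_a = p_b / max(1e-9, 1 - self.count * p_b)   # spreads drops out evenly
            self.count += 1
            if rand < p_a:
                self.count = 0
                return True
            return False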


One advantage of this approach is that there is no need to track individual flow rates, so it scales well with the number of connections. Of course, fair queueing is not one of its objectives, so there was no need to do so in the first place. On the other hand, because the packet dropping process is applied probabilistically to every incoming packet, it also has the effect of dropping more packets from connections using higher bandwidth, although this only ensures that each connection is penalized by the same fraction.

Wednesday, September 10, 2008

Core-Stateless fair queuing

While starting to read the next paper, a question popped into my head: does the fair queuing algorithm violate some end-to-end principle? While congestion avoidance is something that seems best taken care of by the hosts (assuming that the hosts want reliable communication), I would think that trying to allocate bandwidth fairly among flows is something that needs to be in the network, since a host does not know anything about other hosts and may not even be cooperative in the first place.

Another issue seems to be that for a router to implement fair queuing, it needs to keep track of all flows, which requires state and grows in complexity with the number of flows. Also, when should a router stop keeping track of a flow? The proposed Core-Stateless Fair Queuing (CSFQ) tries to resolve this by pushing the work out from the core routers to the edge routers. The key idea is for the edge routers to label incoming packets with their flow's rate, while the core routers probabilistically drop packets based on the labeled information and an estimated fair rate. The simulations show that CSFQ closely approaches a fair queuing variant, even outperforming it in one scenario.
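
My understanding of the core-router side, sketched below under those assumptions (the edge-side rate estimation and the fair-share estimation are separate pieces I am not showing): a packet labeled with flow rate r is dropped with probability max(0, 1 - alpha/r), where alpha is the router's current estimate of the fair share, and a forwarded packet is relabeled with the smaller of the two.

  import random

  # Sketch of the per-packet decision at a CSFQ core router, assuming the
  # packet already carries its flow's estimated arrival rate as a label.
  def forward_or_drop(label_rate, alpha):
      drop_prob = max(0.0, 1.0 - alpha / label_rate)
      if random.random() < drop_prob:
          return None                     # drop the packet
      return min(label_rate, alpha)       # new label on the outgoing packet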

I am not quite sure why this might be better than fair queuing from a complexity point of view. I suppose all the workload is on the edge routers, which are next to senders, while the core routers do something much simpler on a per-packet basis. Would an edge router see fewer flows than a core router, so that this is a way of distributing the flow-tracking work?

As the authors point out, it is not clear how easy it is to track flows. How much of a problem does it cause if this is not done properly? Also, how easy is it to label a packet with its flow ID and rate, and how does one assign a unique flow ID (some long random number, perhaps)?

Analysis and Simulation of a Fair Queueing Algorithm

This paper considers a fair queuing algorithm for gateways to decide how to forward incoming packets. The authors claim that a queuing algorithm allocates bandwidth, buffer space and promptness, and that first-come-first-serve (FCFS) ties all of them together. While the congestion avoidance papers from the previous class consider source/node flow control to vary transmission rate, they assumed that hosts are cooperative. Another desired property of a queuing algorithm is therefore to allow gateways to control congestion even when certain hosts are non-compliant.

The proposed fair queueing (FQ) algorithm essentially tries to emulate bitwise round-robin servicing; i.e., each conversation (a source-destination pair) takes turns sending a bit. For practical reasons, packet-by-packet transmission is done instead, with packet arrival times and packet sizes used to decide when each packet is sent. A separate parameter regulates promptness by taking into account when a packet was last transmitted for that conversation. Finally, if a packet arrives when the buffer is full, the latest packet from the conversation using the most buffer space is dropped, BUT that conversation is still charged for the packet. This essentially punishes out-of-control senders.
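
If I read the emulation correctly, each arriving packet is assigned a number based on when it would finish under bit-by-bit round robin, and the gateway always transmits the queued packet with the smallest such number. Roughly, with my own names and delta as the promptness parameter:

  # Sketch of the per-packet bookkeeping in packet-by-packet fair queuing;
  # round_number tracks the bit-by-bit round-robin emulation, prev_finish is
  # the finish number of this conversation's previous packet, and delta > 0
  # gives recently idle conversations a promptness bonus.
  def bid_number(prev_finish, round_number, pkt_size, delta=0.0):
      return pkt_size + max(prev_finish - delta, round_number)

The gateway then services whichever queued packet currently has the smallest bid.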

The simulations essentially show that FQ works with a range of flow control algorithms to provide fair bandwidth distribution for high-throughput FTP and low delay for Telnet, in contrast to FCFS, which could produce "winners" among the FTP flows and high delay for Telnet. It also very nicely shuts down an ill-behaved source, which seemed to me a most desirable outcome.

I liked this paper because it addressed a question I had from the earlier readings, which is how to program gateways to deal with misbehaving hosts and somewhat control congestion. There are also many questions the authors posed which could lead to interesting research. For example, it is not clear whether a game-theoretic analysis of fair queuing would lead to the conclusion that the best thing hosts can do is to adopt some reasonable flow control, and that they cannot do any better (i.e. cheat to get more bandwidth).

Monday, September 8, 2008

Congestion avoidance and control

The previous paper ("Analysis of Increase/Decrease...") could be thought of as providing a theoretical reason for additive-increase/multiplicative-decrease. This paper seem to be addressing practical implementation issues, describing a number of algorithms for TCP while providing some empirical reasons as to why they make sense. On the whole, I found this paper to be less rigorous, oftentimes hand-waving arguments through, and taking a rather condescending tone towards readers. Note to self: do not ever put your readers in a foul mode through your writing.

The paper assumes that rate control is achieved with transmit windows (unlike the previous paper, which was agnostic to how rate control was performed). The first algorithm the authors explain is slow-start, which ramps the rate up toward the advertised window to prevent a sender from overwhelming network resources when starting transmission. This is of course not the same as AIMD congestion avoidance, so the authors describe how slow-start and AIMD can be used together, though only in the Appendix. Anyway, the key idea is that in multiplicative decrease, instead of simply using a congestion window that is halved, slow-start is used to reach the halved congestion window before switching to the more conservative additive increase mode (if possible).
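
A toy sketch of that combined behaviour, in units of segments (real TCP counts bytes and has many more details; the function names are mine):

  # cwnd grows exponentially (slow start) until it reaches ssthresh, then
  # linearly (congestion avoidance); on a timeout, ssthresh is set to half
  # the current window and slow start restarts from one segment.
  def on_ack(cwnd, ssthresh):
      if cwnd < ssthresh:
          return cwnd + 1            # slow start: +1 per ACK, ~doubles per RTT
      return cwnd + 1.0 / cwnd       # congestion avoidance: ~+1 per RTT

  def on_timeout(cwnd):
      ssthresh = max(cwnd / 2, 2)    # remember half the window that was in flight
      return 1, ssthresh             # restart slow start from cwnd = 1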

Another issue that is addressed is estimating RTT, which is necessary for determining when to perform retransmission. Along the way, the authors also mention the use of exponential back-off in retransmission.
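
For my own reference, a sketch along the lines of the mean-plus-deviation estimator the paper argues for; the gains and the deviation multiplier here are the values commonly used in practice, which may differ from the paper's exact constants.

  # srtt: smoothed RTT estimate, rttvar: smoothed mean deviation.
  def update_rtt(srtt, rttvar, measurement):
      err = measurement - srtt
      srtt = srtt + 0.125 * err                      # gain of 1/8 on the mean
      rttvar = rttvar + 0.25 * (abs(err) - rttvar)   # gain of 1/4 on the deviation
      rto = srtt + 4 * rttvar                        # retransmit timeout
      return srtt, rttvar, rto

  def backoff(rto, max_rto=64.0):
      return min(2 * rto, max_rto)                   # exponential backoff on retransmit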

Here, the paper mentions reliably detecting knee points as an open problem, and mentions the possibility of using RTT/queue-length prediction as one possible method. Another issue is that of coping with misbehaving hosts (whether accidental or intentional). This was mentioned early in the introduction, leading me to hope that it would address the questions I had about the previous paper. However, it was not brought up again until the end of the paper, when the authors discussed future work on gateway control, though there did not seem to be any concrete ideas.

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

This paper analyzes increase/decrease algorithms for congestion avoidance. Early in the paper, a key distinction is made between congestion control, which tries to prevent the network from entering a congested state, and congestion avoidance, which tries to keep the network around the "knee", or "sweet spot", before throughput saturates and response time increases with increasing load. Increase/decrease algorithms are ones where users increase/decrease their load in response to feedback from the network resource. In the paper, the authors focus on binary feedback (overloaded/underloaded), and look at linear control for the most part.

Through the use of efficiency, fairness and distributedness criteria (and the lack of global knowledge), the authors show algebraically that the increase stage must have an additive component greater than 0 and a multiplicative component of at least 1, while the decrease stage must have an additive component of 0 and a multiplicative component at least 0 and less than 1. The actual parameters can then be chosen based on convergence criteria, and essentially lead to additive increase and multiplicative decrease. It is also nice to see that there is an algebraic trade-off between responsiveness and smoothness - something that is to be expected.
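
Concretely, the surviving family collapses to the familiar update rule; a minimal sketch with illustrative parameter values a_I and b_D:

  # load is the user's current demand; overloaded is the binary feedback
  # bit from the resource.
  def next_load(load, overloaded, a_I=1.0, b_D=0.5):
      if overloaded:
          return b_D * load     # multiplicative decrease: a_D = 0, 0 < b_D < 1
      return load + a_I         # additive increase: a_I > 0, b_I = 1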

What I liked a lot about this paper was the use of the vector diagrams to demonstrate the criteria, and to give intuition about how different linear increase/decrease algorithms would behave with respect to them. The paper was very well written and easy to read, and it very quickly describes the problem at hand, narrows the scope, and delivers the punchlines effectively. There were no surprises along the way.

I think this paper provides a convincing argument as to why additive-increase/multiplicative-decrease is the best choice for congestion avoidance. The scheme is simple to implement, is distributed among hosts, requires only low-rate feedback that is easily generated by network resources, and yet is provably shown to satisfy efficiency and fairness criteria (albeit ones chosen by the authors) while achieving optimal (?) convergence, at least among linear increase/decrease control algorithms.

One issue that was not discussed is how to identify the knee point, which would be necessary for generating the binary feedback. This could be a straightforward thing to do, but it is not clear to me. The efficiency criterion is closely tied to this knee point, and if the knee point is not chosen correctly, then being overloaded would be extremely disastrous.

Because the scheme is so devastatingly simple, it could have the effect of closing off research on congestion avoidance schemes, since even if another scheme could be shown to perform better, it would probably not be simpler to implement in practice. On the other hand, a possible weakness seems to be the assumption of compliance and cooperativeness. What would happen if a host cheats and continues to increase its load even as it knows (from feedback) that others will decrease theirs? Is there another congestion avoidance scheme that could punish non-compliant hosts?

Wednesday, September 3, 2008

On inferring autonomous system relationships in the Internet

As its name suggests, the purpose of the work is to infer relationships (peering, transit, etc.) between ASes in the Internet by analyzing BGP routing tables. I have to quickly admit that I do not understand the purpose of such an endeavour. The author hints at the role that such relationships play in the "structure of the Internet and the end-to-end performance characteristics", but it's not clear how elucidating the actual relationships between ASes will improve that understanding. The author does give two other examples of using this information in the conclusion, although I think the paper would have been better served by putting them in the introduction to better motivate the problem. Anyway, the two examples are that "ISPs can reduce misconfiguration and debug router configuration files" and "plan for future contractual agreements."

The work uses some interesting heuristics. First is the heuristic that the size of an AS can be determined by the degree of its (logical) connections. Second, the work assumes a reasonable route export/import policy based on the nature of the relationship between ASes. This assumption is then used to prove a key theorem, namely that all AS paths in any BGP routing entry satisfy the following:
"(i) A provider-to-customer edge can be followed by only provider-to-customer or sibling-to-sibling edges;
(ii) A peer-to-peer edge can be followed by only provider-to-customer or sibling-to-sibling edges"

The basic algorithm looks at each AS path and guesses the top provider to be the AS with the highest degree. Pairs of ASes before (after) the top provider are then inferred to be either sibling-to-sibling or customer-to-provider (provider-to-customer). Sibling-to-sibling edges are simply connections where a transit relationship occurs in both directions. A more refined algorithm makes use of soft information, in that each AS path provides evidence in support of a transit relationship, but a minimum amount of such evidence needs to be present. A separate step is used to infer peering relationships based on certain heuristics, which I admit I do not fully understand (hence a suitable point of discussion =p).
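
My reading of the basic pass, sketched below with my own names ("degree" maps each AS to its number of neighbours in the AS graph; the output records which ASes appear to provide transit for which). Pairs with transit inferred in both directions would then be labeled sibling-to-sibling, and the rest provider-to-customer.

  from collections import defaultdict

  # Rough sketch of the basic inference pass over AS paths.
  def infer_transit(as_paths, degree):
      transit = defaultdict(set)   # transit[a] holds b if a appears to transit for b
      for path in as_paths:
          top = max(range(len(path)), key=lambda i: degree[path[i]])
          for i in range(top):                    # uphill: customer -> provider
              transit[path[i + 1]].add(path[i])
          for i in range(top, len(path) - 1):     # downhill: provider -> customer
              transit[path[i]].add(path[i + 1])
      return transit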

It was difficult to make sense of the experimental results. In V.A., the main claim is that the numbers are fairly consistent over 3 different readings, but perhaps all this is saying is that, given fairly similar routing entries (how similar?), the algorithm does not give wildly different results. The results presented in V.B. could also be misleading. For example, even if the confirmation rate for customers is 100%, that only says there were no false positives in identifying customers; there could still have been missed detections (please correct me if I'm wrong).

One possible extension is a refinement of the refined algorithm. Instead of using a simple count, perhaps some notion of AS path reliability could be used as a weighting. Also, as the author pointed out, the heuristic that the top provider is the AS with the highest degree may in fact not be true. Could some kind of consistency check among the AS paths find the top providers more reliably? At the end of the day, it might boil down to the algorithms effectively doing a single pass through all the AS paths, whereas an iterative inference algorithm could have sorted things out better.

Interdomain Internet Routing

Coming to this with minimal background in computer networks, I find each paper a fascinating study into the inner workings of the Internet. The lecture "Interdomain Internet Routing" discusses how routing works between autonomous systems (ASes), which consist of networks of nodes, within the context of the commercial operation of ASes by ISPs. The 0th-order point the paper makes is that a routing entry equals connectivity, and that in effect an ISP charges a node for advertising routing information to it (which was non-obvious to me). Related to that are the concepts of transit, where an ISP charges for access to its routing table, and peering, where two ASes exchange a subset of their routing tables (with or without a financial arrangement). Exporting and importing routes (connectivity) costs the ISP money, so it requires a set of policies that are commercially sound.

The paper then goes on to describe the Border Gateway Protocol (BGP), which is used primarily by routers of different ASes to exchange route information, along with its design goals and how it can be used to implement routing policies. The main design goals are scalability, policy expressiveness, and cooperation under competitive circumstances (imho, ensuring things go well in a distributed manner even among selfish ISPs).

The paper also addresses how BGP can be used in internal sessions (iBGP) by routers within an AS to distribute the routing information about the external world learned by the border (eBGP) routers. While the eBGP routers could be fully meshed to exchange such information, this does not scale. Thus, two techniques, route reflectors and confederations, are discussed. Both, however, have to be configured manually, which naturally leads to the question of whether this process (exchanging routing information among all eBGP routers) can be done autonomously for ease of deployment.

Finally, two open problems are discussed. First is multi-homing, where a customer might want access through two ISPs, e.g. for redundancy. The two main issues seem to be scalability, since hacks to make multi-homing work by tweaking the ASPATH seem to require some knowledge of the AS topology, and size constraints on ISP routing tables, which impose a high minimum on the number of addresses the customer must pay for. Second is the slow convergence of BGP after a fault occurs.

This piece is obviously meant as a tutorial, and for me its main value is in shedding light on how ISPs actually interconnect (and the role that routing plays), in a way that makes sense commercially. For obvious reasons, it assumes that the reader has sufficient exposure to the material, so reading it involved a lot of searching on Wikipedia for me.