Tuesday, September 30, 2008

MACAW: A media access protocol for wireless LANs

This paper reads like a cookbook of modifications made to a previously proposed media access protocol, MACA, to improve both total media utilization and "fairness". While media utilization is fairly well defined, as the authors point out, it's not quite clear what form "fairness" should take. Should it be bandwidth per transmitter, bandwidth per receiver, or bandwidth per stream? In this paper, it seems the authors chose the last.

The structure of the paper seems to follow this pattern: point out a shortcoming of MACA, fix it with a proposed modification, repeat. In fact, sometimes the proposed modifications bring problems of their own, and yet another modification is proposed. While the proposed modifications are well thought out, I can't help but wonder (sitting where I am instead of having to design a wireless protocol) whether there could have been a more holistic approach, where one looks at the various usage scenarios and comes up with a scheme that addresses all concerns at once. Of course, the process of putting together all the modifications could still be the same "poke-finger-into-holes-in-dikes" approach, but it would feel more satisfying. I do, however, like the style of the paper because it works very well as a tutorial.

Anyway, the summary (section 5) of this paper gives a good wrap-up of all the issues and solutions, but I'll repeat them here for completeness, in the paper's order:
  • The baseline, MACA, uses an RTS-CTS-DATA exchange, with binary exponential backoff of retransmission when no CTS is received.
  • The backoff counter is copied to and shared by all stations, so that congestion information is known to all.
  • The backoff follows multiplicative increase and linear decrease (MILD) to avoid big oscillations while still backing off quickly (a small sketch of this follows the list).
  • Each transmitter maintains separate queues for each stream (or perhaps destination), each serviced by a separate backoff algorithm, so that bandwidth can be shared equally by all streams in the cell.
  • ACK is added (to get RTS-CTS-DATA-ACK) for fast recovery.
  • DS is added (to get RTS-CTS-DS-DATA-ACK) so that stations that cannot hear the CTS will know that data is about to be transmitted (since the RTS-CTS exchange was successful) and defer transmission.
  • RRTS is added, so that when a CTS cannot be sent because other stations are transmitting, the receiver can later contend on behalf of the transmitter.
  • A separate backoff counter is maintained for each end of each stream, but copied in such a way that all stations trying to talk to a receiver have the same backoff value. This is done to reduce leakage of backoff values across cells and allow high utilization in each cell.
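
To make the MILD backoff concrete, here is a small sketch of how I picture it; the constants (multiply by 1.5 on failure, subtract 1 on success, bounds of 2 and 64 slots) are what I recall from the paper, so treat them as assumptions rather than gospel.

    import random

    BO_MIN, BO_MAX = 2, 64  # assumed backoff bounds, in slots

    def on_failure(backoff):
        """RTS sent but no CTS received: increase the backoff multiplicatively."""
        return min(int(backoff * 1.5), BO_MAX)

    def on_success(backoff):
        """Exchange completed: decrease linearly, staying near the sweet spot."""
        return max(backoff - 1, BO_MIN)

    def pick_wait(backoff):
        """Wait a uniformly chosen number of slots before the next attempt."""
        return random.randint(1, backoff)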

Sunday, September 21, 2008

A fast switched backplane for a gigabit switched router

This is a white paper arguing for why a switched backplane is needed in high-performance routers (instead of a shared bus backplane), and it describes and explains some of the design choices for its architecture and scheduling algorithms. Because it's a white paper, it eschews much of the technical detail and includes easy-to-understand explanations and analogies. For that reason, I really enjoyed reading this paper, given my limited (actually zero) exposure to router hardware. I also wish I had read this paper before the other ("Scaling internet routers...") because it would have made that reading experience so much less painful; in particular, this white paper actually does a good job of explaining what backplanes, switching crossbars, linecards etc. are. I would recommend that their order be switched in the future. (I also should have seen the words "white paper" on this paper and picked it to read first.)

Anyway, the author argues that a shared backplane based on a common bus, rather than the processing for packet forwarding, is the main bottleneck to increasing router capacity. Thus, a switched backplane based on a centrally controlled crossbar switch is more suitable, since each input-output link can be operated at a much higher speed and multiple input-output links can operate simultaneously ("internally non-blocking"). This increases the overall aggregate bandwidth.

Next, the author goes into several design choices for a switched backplane. First, a fixed packet (cell) length allows for simplicity in the overall design and high performance. Second, virtual output queues (VOQs) are used in the input ports to avoid the problem of head-of-line (HOL) blocking, since the scheduler can configure the crossbar more efficiently if it knows which outputs each input has traffic for. Third, a simple (thus fast) crossbar scheduling algorithm known as iSLIP is used to configure the crossbar switch over the VOQ construct. It is an iterative algorithm that assigns outputs to inputs based on the requests made by the input ports, and serves both inputs and outputs in a round-robin fashion. Finally, to control delay, the author suggests running the switching fabric at a speedup over the external line rate.
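
Since iSLIP is only described in words above, here is a rough single-iteration sketch of the request-grant-accept round as I understand it. The names and structure are mine; the real algorithm runs several iterations and only advances the round-robin pointers for matches made in the first iteration.

    def islip_round(requests, grant_ptr, accept_ptr, n):
        """One request-grant-accept round over an n x n crossbar.

        requests[i] is the set of outputs that input i has queued cells for (its VOQs);
        grant_ptr / accept_ptr are the per-output and per-input round-robin pointers.
        """
        # Grant phase: each output picks, round-robin, among the inputs requesting it.
        grants = {}  # output -> granted input
        for o in range(n):
            for k in range(n):
                i = (grant_ptr[o] + k) % n
                if o in requests[i]:
                    grants[o] = i
                    break
        # Accept phase: each input picks, round-robin, among the outputs granting it.
        accepts = {}  # input -> accepted output
        for i in range(n):
            granting = [o for o, g in grants.items() if g == i]
            for k in range(n):
                o = (accept_ptr[i] + k) % n
                if o in granting:
                    accepts[i] = o
                    # Pointers move one beyond the match (first iteration only).
                    accept_ptr[i] = (o + 1) % n
                    grant_ptr[o] = (i + 1) % n
                    break
        return accepts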

It was also personally satisfying to see the mention of multicast in the white paper; I was thinking that since a crossbar switch is used, it should be easy to connect a single input to multiple outputs, as in multicast. And the author does go over the support of multicast in such a router, together with a discussion of a modified iSLIP for scheduling it, based on the observation that the switch need not service all of a multicast cell's outputs simultaneously to avoid blocking.

This paper actually lays a lot of the groundwork for the other paper. For example, the question of whether re-configuration of the crossbar can scale with increasing line rate and increasing number of ports arises naturally. Now maybe I should go back and re-read the other paper...

Saturday, September 20, 2008

Scaling internet routers using optics

In this paper, the authors set out to design a 100Tb/s router, motivated by the increasing demand for router capacity that is not being met by current router design developments. They target three main issues with existing technologies: high power consumption, unpredictable performance and poor scalability. To overcome the power consumption issue, they argue for optical switching; to address unpredictable performance and scalability, they propose a solution that is based on a previously proposed load-balanced switch, but modified to handle four main problems: (i) the need for rapid re-configuration of the switch fabric; (ii) possible mis-sequencing of packets; (iii) throughput that can be made arbitrarily small by pathological traffic; and (iv) breaking down when linecards fail.

The basic architecture is based on a load-balanced switch, which has a load-balancing switching stage, followed by a single stage of buffering, followed by another switching stage to the final outputs. The switching is done in a deterministic manner, which makes it easy to implement in optics with an arrayed waveguide grating router, thus eliminating the need for rapid reconfiguration. The load-balancing is performed by the input linecards, using a Full Ordered Frames First (FOFF) scheme that spreads input packets over all intermediate linecards through the first switching stage. This also mitigates the possible mis-sequencing of packets and prevents pathological traffic patterns from arbitrarily degrading throughput. The actual switching of packets from input to output is performed by the intermediate linecards, which place them in the appropriate queue so that they end up at the right output after the second switching stage. This is possible because the switching is deterministic. Finally, by partitioning the switching into multiple stages and using a pre-determined routing through the optical switches, the design can adapt to linecard additions/removals.
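
To make the "deterministic switching" point concrete, here is a toy sketch of the round-robin spreading idea behind a load-balanced switch. This is my own idealization rather than the paper's FOFF scheme, and N is just an assumed number of linecards.

    N = 4  # assumed number of linecards

    def first_stage(input_port, timeslot):
        """Load-balancing stage: spread arrivals evenly over intermediate linecards."""
        return (input_port + timeslot) % N

    def second_stage(intermediate, timeslot):
        """Switching stage: each intermediate reaches every output once every N slots."""
        return (intermediate + timeslot) % N

    # An intermediate linecard holding a packet destined for output d just waits
    # for a timeslot t at which second_stage(intermediate, t) == d; no scheduler
    # has to reconfigure anything on a per-packet basis.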

I unfortunately do not follow some parts of the paper. For example, I am rather confused by the role that linecards play. The authors mention both input linecards and intermediate linecards. Is this a conceptual distinction, or does the actual physical implementation use separate linecards for the two different purposes? This was especially confusing to me since some diagrams show a linecard doing both of these things. If it is only conceptual, how does a linecard tell which of the packets it receives are intermediate packets and which are meant to go out?

The authors also list some open challenges, such as high speed address lookups due to processing and memory speed bottlenecks.

I felt that the paper started off being fairly easy to follow. The introduction motivated and introduced the problem pretty well. However, the technical sections were very hard to understand, possibly (probably) because this is an area I had no exposure to previously, and many of the terms and concepts, such as crossbar, are alien to me. The various sections seem rather disjointed, perhaps because they were written by different people? By the time I got to the end of the paper, I felt rather exhausted, and yet had the feeling that I did not really get too much out of it.

Wednesday, September 17, 2008

Supporting real-time applications in an integrated services packet network

As its title suggests, this paper proposes an architecture and mechanism for providing service guarantees. It provides two classes of service: guaranteed delay, which is suitable for clients that are intolerant of highly varying delays and fix their operation based on an assumed delay bound, and predictive delay, which is suitable for clients that are tolerant of varying delays and are able to adapt to them.

The mechanism to support such an architecture is based on a hierarchy of scheduling algorithms. For the guaranteed class of service, each client is given its own flow, while for the predictive class of service, all clients are aggregated into a virtual flow. Within that virtual flow, clients that request a particular delay constraint are grouped into the same priority class. The scheduling mechanism performs WFQ across all flows, including the virtual flow. This provides isolation of each flow, ensuring that a misbehaving flow has limited impact on the other flows. Within the virtual flow, FIFO/FIFO+ together with priorities is used to schedule the individual flows. This provides sharing of the link in order to distribute delay.

There are two further interesting details (among others). First, FIFO+ extends FIFO by attempting to emulate FIFO in aggregate across all hops to provide more effective sharing. However, this requires modifying the packet header to include some state information about the delay seen by the packet up to the current hop. Second, for the predictive service, clients have to specify their traffic generation patterns using the parameters of a token bucket filter, which gives some idea of the average and peak rates (or the amount of burstiness). This is used in conjunction with delay and loss requirements to appropriately assign a priority class. This seems to be a hybrid of the explicit and implicit service selection schemes mentioned in the previous paper. The network is inferring the appropriate service class using a set of application-provided parameters, thus avoiding the need to know the service requirements of each application class. The application is also explicitly requesting certain service requirements, but in a more flexible and extensible manner.
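
Since the token bucket filter is doing a lot of work here, a minimal sketch of one, with the usual rate r and depth b parameters (which may not match the paper's notation):

    class TokenBucket:
        def __init__(self, rate, depth):
            self.rate = rate      # average rate r (tokens, e.g. bytes, per second)
            self.depth = depth    # bucket depth b, i.e. the maximum burst size
            self.tokens = depth
            self.last = 0.0

        def conforms(self, size, now):
            """True if a packet of `size` bytes fits the declared (r, b) profile."""
            self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if size <= self.tokens:
                self.tokens -= size
                return True
            return False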

The paper does assume a playback application as the one that requires real-time service. In real-time playback, as in telephony, there is very limited buffering due to tight latency constraints. Hence, it is not clear to me how adaptive the schemes are with respect to being able to shift the playback time. (Contrast this with streaming recorded content.) Also, my experience with video is that it is not too tolerant of packet losses, although various things such as error correction and concealment can be done to somewhat mitigate them. Given a choice between loss, delay and bandwidth, it is usually much better to take a hit in bandwidth, because the resulting video experience would be better. (Delay and loss are somewhat equivalent in a real-time scenario.)

Fundamental design issues for the future Internet

In this paper, the author's thesis is that due to the emergence of multimedia applications such as voice/video telephony, with very different delay/bandwidth service requirements from traditional data applications, the following things need to be done:
  1. IP should extend its service model, since services which are aligned with applications' service requirements increase the overall utility of the network.
  2. The application should request the appropriate service instead of it being estimated by the network, to maintain the separation between network and application layers.
  3. Admission control, instead of over-provisioning, should be adopted to prevent network overload, since it is economically more sound.
The author looks at the Internet from a high level and uses a macro type of analysis to make his arguments. To do so, a framework is introduced whereby the design objective is to maximize the total utility of users of the Internet, subject to the physical constraints of the network. This is not meant as a way to optimize resource allocation or choose network parameters, but rather as a tool to characterize various design choices. Unfortunately, the outcome of the analysis sometimes depends on the various cost and utility parameters, such as the relative cost of over-provisioning versus changing network mechanisms, and the shape of the utility function when certain service requirements are not met. For example, in trying to show that the service model should be extended, the argument assumes that over-provisioning cannot be done and that the cost of adding a service model is negligible.
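
As I read it, the framework boils down to something like the following, where s_i is the service delivered to user i, U_i its utility function, and C_l the capacity of link l (my notation, not necessarily the author's):

    \max_{\{s_i\}} \; \sum_i U_i(s_i)
    \qquad \text{subject to} \qquad
    \sum_{i \,\in\, \text{flows on link } l} s_i \;\le\; C_l \quad \forall\, l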

The author argues for explicit selection of a service by the application, rather than the service being implicitly assigned by the network, based on a layer-separation argument. While implicit assignment might be bad, it is not clear that explicit assignment would work in practice either, due to issues such as incentives (and the authentication, accounting, and billing needed to establish financial incentives). In other words, even if it were established that separate services are the way to go, there might not be a good way to actually assign a service to a packet.

The discussion of admission control seems to me to be the most sound. I like the use of the utility-maximizing framework here, since the outcome does not depend on exact quantities, just on the characteristics of the utility function. The logical outcome of best-effort service for data and admission control for real-time applications was very satisfying. Unfortunately, it is not clear to me how admission control is to be done for the Internet, since the user (or gateway) has to in effect probe the state of the network, which might not be easy (and that state might not even be stationary). Also, such a mechanism can be abused to "excommunicate" users - a chilling thought.

I'm not sure how different the Internet was during the 1990s, but with hindsight, it is interesting to note that various things we accept today, such as spam, web-crawling, IP (video)telephony, were mentioned in this paper.

Monday, September 15, 2008

Congestion control for high bandwidth-delay product networks

This paper proposes the eXplicit Control Protocol (XCP) to address some of the performance shortcomings of TCP, particularly as the bandwidth-delay product increases. For example, with high RTT, TCP's policy of additive increase grows bandwidth utilization very slowly. Another scenario is short flows, which are unable to use available bandwidth faster than slow start allows. The authors also point out that TCP uses implicit congestion feedback through packet loss, which only indicates the presence/absence of congestion and not its amount, so sources have to continuously push the link into congestion before backing off.

As implied by its name, XCP provides feedback about the state of congestion at the routers to senders, by indicating to each sender how much to adjust its congestion window by. Two key elements of XCP are the efficiency controller and the fairness controller, which operate independently. The efficiency controller aims at maximizing link utilization, and uses a feedback controller based on spare bandwidth and queue size to set the aggregate change in rate. The parameters of this feedback controller are chosen based on the feedback delay to ensure that the system is stable. The fairness controller then parcels out the aggregate change across packets (and thus flows) to achieve fairness, based on an additive-increase/multiplicative-decrease mechanism. As the authors note, XCP can converge to fairness much faster, since the adjustment is done every RTT instead of on every packet loss.
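
A minimal sketch of the efficiency controller's feedback rule as I understand it; the gains 0.4 and 0.226 are the stability-motivated values I remember from the paper, so take them as assumptions:

    ALPHA, BETA = 0.4, 0.226  # assumed stability-motivated gains

    def aggregate_feedback(avg_rtt, spare_bandwidth, persistent_queue):
        """Total change in traffic (bytes) to apportion among flows over the next
        control interval: proportional to spare bandwidth, minus a term that
        drains any standing queue."""
        return ALPHA * avg_rtt * spare_bandwidth - BETA * persistent_queue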

This paper is full of useful ideas that are applicable to other problems as well. First, there is the use of control theory to examine how a network (the plant) should be controlled and how its parameters should be chosen. Second, the use of a fluid model allows the analysis of protocols by approximating a discrete system with a continuous one. (How good the approximation is would be debatable, though.) Third, the mention of a shadow pricing model is an interesting way of determining how to allocate resources based on how much users are willing to pay for them.

On the other hand, the policing of misbehaving sources by border agents sounds like an afterthought to address the vulnerability of the system to non-compliant senders. While it is possible (but expensive) to have gateways shut down the bad sources, it would be far better to have a scheme which provides no inherent incentive for a source to misbehave - i.e. the compliant sender policy should also be the one that gives the sender the highest throughput.

The experimental results are also as expected. If a router is able to control how sources should adjust their transmission rates, then it is easy to believe that it would be able to achieve high link utilization.

Random early detection gateways for congestion avoidance

This is another paper on congestion avoidance at the gateway. However, here the goal is not so much to enforce fair queuing (although this happens in a way), but to control the average packet queue size so as to ensure a reasonable amount of latency while maintaining high throughput. The basic idea is to mark/drop packets probabilistically when the average queue size is in some range (and mark/drop all packets if it exceeds some maximum threshold). The authors list four main design goals, including:
  1. Avoid bias against bursty traffic
  2. Avoid global synchronization (i.e. sources reducing windows at the same time)
  3. Control queue size in absence of cooperating sources
There are two key elements in the RED algorithm. The first is to compute the average queue size at each incoming packet. This is done via an exponentially weighted moving average. There is a slight twist in that it also tries to take into account idle time when the queue is empty, by computing the average as though some number of small packets had entered the gateway during the idle period. This to me seems like a hack, since the average is only updated per incoming packet, whereas what was probably intended is an average queue size over time. This could be done in a more principled manner, perhaps using the approach in the "Core-Stateless fair queuing" paper, but this method is probably easier and does not make too much of a difference in the end.
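
For concreteness, a small sketch of the estimator, including the idle-time hack described above; the weight and "small packet time" are illustrative values, not necessarily the paper's recommendations:

    W_Q = 0.002  # EWMA weight (illustrative)

    def update_avg(avg, queue_len, idle_time=0.0, small_pkt_time=0.001):
        """Update the average queue size on a packet arrival."""
        if queue_len == 0 and idle_time > 0:
            # Pretend m small packets arrived while the queue sat empty.
            m = idle_time / small_pkt_time
            return avg * (1 - W_Q) ** m
        return (1 - W_Q) * avg + W_Q * queue_len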

The second is to determine the probability with which to drop/mark packets. The desired probability is computed as a linear function of the average queue size - intuitively, the larger the queue, the more packets the gateway should mark. The reason for choosing this particular feedback function is not made clear, and I think some control theory could be applied to determine an ideal probability based on the desired time constant. However, again, this need not be an exact science. What is interesting is the next step, in which the actual probability is computed, based on the goals of not having too many packets dropped close to each other and of avoiding long intervals between dropped packets. The process is modified such that drops are spread out roughly uniformly.
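
Here is roughly how I understand the two-step marking decision, with illustrative thresholds; the packet count since the last mark is what spreads the marks out:

    import random

    MIN_TH, MAX_TH, MAX_P = 5, 15, 0.02  # illustrative thresholds (packets) and max probability

    def should_mark(avg, count):
        """Return (mark?, new_count) given the average queue size and the number
        of packets since the last mark."""
        if avg < MIN_TH:
            return False, -1
        if avg >= MAX_TH:
            return True, 0
        p_b = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)   # linear in the average
        p_a = p_b / max(1e-9, 1 - count * p_b)             # spreads marks out roughly uniformly
        if random.random() < p_a:
            return True, 0
        return False, count + 1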


One advantage of this approach is that there is no need to track individual flow rates, so it scales well with the number of connections. Of course, fair queueing is not one of its objectives, so there was no need to do so in the first place. On the other hand, because the packet dropping process is applied probabilistically to every incoming packet, it also has the effect of dropping more packets from connections using higher bandwidth, although this only ensures that each connection is penalized by the same fraction.

Wednesday, September 10, 2008

Core-Stateless fair queuing

While starting to read the next paper, a question popped into my head: does the fair queuing algorithm violate some end-to-end principle? While congestion avoidance is something that seems best taken care of by the hosts (assuming that the hosts want reliable communication), I would think that trying to allocate bandwidth fairly among flows is something that needs to be in the network, since a host does not know anything about other hosts and may not even be cooperative in the first place.

Another issue seems to be that for a router to implement fair queuing, it needs to keep track of all flows, which requires state and grows in complexity with the number of flows. Also, when should a router stop keeping track of a flow? The proposed Core-Stateless Fair Queuing (CSFQ) tries to resolve this by pushing the work out from the core routers to the edge routers. The key idea is for the edge routers to label incoming packets with their flow's estimated rate, while the core routers probabilistically drop packets based on the labeled information and an estimated fair rate. The simulations show that CSFQ closely approaches a fair queuing variant, even outperforming it in one scenario.
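
The core-router behaviour, as I picture it, is small enough to sketch; estimating the fair share rate at the core is the part I am glossing over here:

    import random

    def core_forward(label_rate, fair_rate):
        """Forward (True) or drop (False) a packet whose edge-assigned label says its
        flow is arriving at label_rate, given the link's estimated fair share rate.
        Flows sending above the fair share are trimmed down to it on average."""
        drop_prob = max(0.0, 1.0 - fair_rate / label_rate)
        return random.random() >= drop_prob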

I am not quite sure why this might be better than fair queuing from a complexity point of view. I suppose all the workload is on the edge routers, which are next to the senders, while the core routers can do something much simpler on a per-packet basis. Would an edge router see fewer flows than a core router, so that this is a way of distributing the flow-tracking work?

As the authors point out, it is not clear how easy it is to track flows. How much of a problem does it cause if this is not done properly? Also, how easy is it to label a packet with this flow ID and rate, and how do you assign a unique flow ID (some long random number, perhaps)?

Analysis and Simulation of a Fair Queueing Algorithm

This paper considers a fair queuing algorithm for gateways, for deciding how to forward incoming packets. The authors argue that a queuing algorithm allocates bandwidth, buffer space and promptness, and that first-come-first-served (FCFS) ties all of them together. While the congestion avoidance papers from the previous class consider source/node flow control to vary the transmission rate, they assume that hosts are cooperative. Another desired property of a queuing algorithm is to allow gateways to control congestion even when certain hosts are non-compliant.

The proposed fair queueing (FQ) algorithm essentially tries to emulate bitwise round-robin service; i.e. each conversation (source-destination pair) takes turns sending a bit. For practical reasons, packet-by-packet transmission is done instead, but the packet arrival time and packet size are used to decide when it is to be sent. A separate parameter is used to regulate promptness, by taking into account when a packet was last transmitted for that conversation. Finally, if a packet arrives when the buffer is full, the latest packet from the conversation using the most buffer space is dropped, BUT the conversation is still charged for that packet. This essentially punishes out-of-control senders.
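
For my own benefit, a sketch of the packet-by-packet emulation: each packet gets a virtual finish time, and the packet with the smallest finish time goes next. I am simplifying the round number R(t) to a value the gateway maintains, and leaving out the promptness (delta) term.

    def finish_time(prev_finish, round_number, packet_size):
        """F_i = max(F_{i-1}, R(arrival time)) + packet size, in bit-rounds, computed
        per conversation as each packet arrives."""
        return max(prev_finish, round_number) + packet_size

    def next_to_send(head_finish_times):
        """head_finish_times maps each conversation to the finish time of its head
        packet; the conversation with the smallest one is served next."""
        return min(head_finish_times, key=head_finish_times.get) if head_finish_times else None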

The simulations show essentially that FQ works with a range of flow control algorithms to provide fair bandwidth distribution for high-throughput FTP and low delay for Telnet, in contrast to FCFS, which can produce "winners" among the FTP flows and high delay for Telnet. It also very nicely shuts down an ill-behaved source, which seemed to me a most desirable outcome.

I liked this paper because it addressed a question I had from the earlier readings, which is how to program gateways to deal with misbehaving hosts and somewhat control congestion. There are also many questions the authors pose which could lead to interesting research. For example, it is not clear whether a game-theoretic analysis of fair queuing would lead to the conclusion that the best thing hosts can do is to adopt some reasonable flow control, and that they cannot do any better (i.e. cheat to get more bandwidth).

Monday, September 8, 2008

Congestion avoidance and control

The previous paper ("Analysis of Increase/Decrease...") could be thought of as providing a theoretical reason for additive-increase/multiplicative-decrease. This paper seem to be addressing practical implementation issues, describing a number of algorithms for TCP while providing some empirical reasons as to why they make sense. On the whole, I found this paper to be less rigorous, oftentimes hand-waving arguments through, and taking a rather condescending tone towards readers. Note to self: do not ever put your readers in a foul mode through your writing.

The paper assumes that rate control is achieved with transmit windows (unlike the previous paper, which was agnostic to how rate control was performed). The first algorithm the authors explain is slow start, which gradually increases the rate up to the advertised window to prevent a sender from overwhelming network resources when starting transmission. This is of course not the same as AIMD congestion avoidance, so the authors describe how slow start and AIMD can be used together, but in the Appendix. Anyway, the key idea is that on multiplicative decrease, instead of simply using a congestion window that is halved, slow start is used to climb back to the halved congestion window, before switching to the more conservative additive increase mode (if possible).
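
A minimal sketch of how I understand the two modes fitting together (units are segments, and details like fast retransmit are left out):

    class Tcpish:
        def __init__(self):
            self.cwnd = 1.0       # congestion window, in segments
            self.ssthresh = 64.0  # slow-start threshold (assumed initial value)

        def on_ack(self):
            if self.cwnd < self.ssthresh:
                self.cwnd += 1.0              # slow start: roughly doubles per RTT
            else:
                self.cwnd += 1.0 / self.cwnd  # additive increase: about +1 per RTT

        def on_timeout(self):
            self.ssthresh = max(self.cwnd / 2.0, 2.0)  # remember half the window
            self.cwnd = 1.0                            # and slow-start back up to it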

Another issue that is addressed is estimating RTT, which is necessary for determining when to perform retransmission. Along the way, the authors also mention the use of exponential back-off in retransmission.
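
Since the RTT estimator drives the retransmit timer, here is a sketch in the spirit of the paper: an EWMA of the RTT plus an EWMA of its deviation, with the timeout set a few deviations above the mean and exponential backoff on retransmission. The gains (1/8, 1/4) and the factor of 4 are the commonly used values, not necessarily the paper's exact constants.

    class RttEstimator:
        def __init__(self):
            self.srtt = None   # smoothed RTT
            self.rttvar = 0.0  # smoothed mean deviation

        def sample(self, rtt):
            """Fold in a new RTT measurement and return the current timeout."""
            if self.srtt is None:
                self.srtt, self.rttvar = rtt, rtt / 2.0
            else:
                err = rtt - self.srtt
                self.srtt += err / 8.0
                self.rttvar += (abs(err) - self.rttvar) / 4.0
            return self.rto()

        def rto(self, backoff=0):
            """Timeout = mean + 4 deviations, doubled for each retransmission."""
            return (self.srtt + 4.0 * self.rttvar) * (2 ** backoff)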

Here, the paper mentions reliably detecting knee points as an open problem, and mentions the possibility of using RTT/queue-length prediction as one possible method. Another issue is that of coping with misbehaving hosts (whether accidental or intentional). This was mentioned early in the introduction, leading me to hope that it would address the questions I had about the previous paper. However, it was not brought up again until the end of the paper, when the authors discussed future work on gateway control, though there did not seem to be any concrete ideas.

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

This paper analyzes increase/decrease algorithms for congestion avoidance. Early in the paper, a key distinction is made between congestion control, which tries to prevent the network from entering a congested state, and congestion avoidance, which tries to keep the network around the "knee", or "sweet spot", before throughput saturates and response time increases with increasing load. Increase/decrease algorithms are ones where users increase/decrease their load in response to feedback from the network resource. In the paper, the authors focus on binary feedback (overloaded/underloaded), and look at linear control for the most part.

Through the use of efficiency, fairness and distributedness criteria (and the lack of global knowledge), the authors show algebraically that the increase stage must have an additive component greater than 0 and a multiplicative component of at least 1, and the decrease stage must have an additive component of 0 and a multiplicative component at least 0 and less than 1. The actual parameters can be chosen based on convergence criteria, and essentially lead to additive increase and multiplicative decrease. It is also nice to see that there is an algebraic trade-off between responsiveness and smoothness - something that is to be expected.
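
In symbols (my notation, which may differ from the paper's), the linear controls and the constraints the authors derive look like this:

    x_i(t+1) =
    \begin{cases}
      a_I + b_I \, x_i(t), & \text{feedback says underloaded (increase)} \\
      a_D + b_D \, x_i(t), & \text{feedback says overloaded (decrease)}
    \end{cases}
    \qquad\text{with}\qquad
    a_I > 0,\; b_I \ge 1, \qquad a_D = 0,\; 0 \le b_D < 1.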

What I liked a lot about this paper was the use of vector diagrams to illustrate the criteria, and to give intuition about how different linear increase/decrease algorithms would behave with respect to them. The paper was very well written and easy to read, and very quickly describes the problem at hand, narrows the scope, and delivers the punchlines effectively. There were no surprises along the way.

I think this paper provides a convincing argument as to why additive-increase/multiplicative-decrease is the best choice for congestion avoidance. The scheme is simple to implement, is distributed among hosts, requires only low-rate feedback that is easily generated by network resources, and yet is provably shown to satisfy the efficiency and fairness criteria (albeit ones chosen by the authors) while achieving optimal (?) convergence, at least among linear increase/decrease control algorithms.

An issue that was not discussed is how to identify the knee point, which would be necessary for generating the binary feedback. This could be a straightforward thing to do, but it is not clear to me. The efficiency criterion is closely related to this knee point, and if the knee point is not chosen correctly, then being overloaded could be extremely disastrous.

Because the scheme is so devastatingly simple, it could have the effect of closing off research on congestion avoidance schemes, since even if another scheme could be shown to perform better, it would probably not be simpler to implement in practice. On the other hand, a possible weakness seems to be the assumption of compliance and cooperativeness. What would happen if a host cheats and continues to increase its load even as it knows (from feedback) that others will decrease theirs? Would there be another congestion avoidance scheme that could punish non-compliant hosts?

Wednesday, September 3, 2008

On inferring autonomous system relationships in the Internet

As its name suggests, the purpose of the work is to infer relationships (peering, transit etc.) between ASes in the Internet by analyzing BGP routing tables. I have to quickly admit that I do not understand the purpose of such an endeavour. The author hints at the role that such relationships play in the "structure of the Internet and the end-to-end performance characteristics", but it's not clear how elucidating the actual relationships between ASes will improve that understanding. The author does give two other examples of using this information in the conclusion, although I think the paper would have been better served by putting them in the introduction in order to better motivate the problem. Anyway, the two examples are that "ISPs can reduce misconfiguration and debug router configuration files" and that they can "plan for future contractual agreements."

The work uses some interesting heuristics. The first is that the size of an AS can be determined by the degree of its (logical) connections. Second, the work assumes a reasonable route export/import policy based on the nature of the relationship between ASes. This assumption is then used to prove a key theorem, that is, the AS path in any BGP routing entry satisfies the following:
"(i) A provider-to-customer edge can be followed by only provider-to-customer or sibling-to-sibling edges;
(ii) A peer-to-peer edge can be followed by only provider-to-customer or sibling-to-sibling edges"
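
The theorem is essentially a "no valleys" condition on AS paths. A small sketch of the check, with edge labels I made up ("c2p" for customer-to-provider, "p2c" for provider-to-customer, "p2p" for peering, "s2s" for siblings):

    def valley_free(edge_types):
        """True if the sequence of edge labels along an AS path obeys the theorem:
        once the path has gone downhill (p2c) or across a peering edge (p2p), it may
        only continue with p2c or s2s edges."""
        seen_downhill_or_peer = False
        for e in edge_types:
            if seen_downhill_or_peer and e in ("c2p", "p2p"):
                return False
            if e in ("p2c", "p2p"):
                seen_downhill_or_peer = True
        return True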

The basic algorithm looks at each AS path and guesses the top provider to be the AS with the highest degree. Pairs of ASes before (after) the top provider are then inferred to be either sibling-to-sibling or customer-to-provider (provider-to-customer). Sibling-to-sibling relationships are simply connections where a transit relationship occurs in both directions. A more refined algorithm makes use of soft information, in that each AS path provides evidence in support of a transit relationship, but a minimum amount of such evidence needs to be present. A separate step is used to infer peering relationships based on certain heuristics, which I admit I do not fully understand (hence a suitable point of discussion =p).

It was difficult to make sense of the experimental results. In V.A., the main claim is that the numbers are fairly consistent over 3 different readings, but perhaps this is just saying that given fairly similar routing entries (how similar??), the algorithm would not give wildly different results. The results presented in V.B. could also be misleading. For example, even if the confirmation rate for customers is 100%, that just says there were no false positives in identifying customers; there could still have been missed detections (please correct me if I'm wrong).

One possible extension is a refinement of the refined algorithm. Instead of using a simple count, perhaps some notion of AS path reliability could be used as a weighting. Also, as the author pointed out, the heuristic that the top provider is the one with the highest degree may in fact not be true. Could some kind of consistency check among the AS paths find the top providers more reliably? At the end of the day, it might boil down to the algorithms making, in effect, a single pass through all the AS paths, whereas an iterative inference algorithm could have sorted things out better.

Interdomain Internet Routing

Coming in with minimal background in computer networks, I find each paper a fascinating study of the inner workings of the Internet. The lecture "Interdomain Internet Routing" discusses how routing works between autonomous systems (ASes), which consist of networks of nodes, within the context of the commercial operation of ASes by ISPs. The zeroth-order item that the paper points out is that a routing entry equals connectivity, and that in effect an ISP charges a node for advertising routing information to it (which was non-obvious to me). Related to that are the concepts of transit, where an ISP charges for access to its routing table, and peering, where two ASes exchange a subset of their routing tables (with or without financial settlement). Exporting and importing routes (connectivity) costs the ISP money, so it requires a set of policies that are commercially sound.
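
The commercially sound export rule, as I understand it from the lecture (with my own naming), is short enough to write down: routes learned from customers, who pay, are advertised to everyone, while routes learned from peers or providers are advertised only to customers.

    def should_export(learned_from, advertise_to):
        """learned_from / advertise_to are each one of 'customer', 'peer', 'provider'."""
        if learned_from == "customer":
            return True                    # customer routes go to all neighbors
        return advertise_to == "customer"  # peer/provider routes go only to customers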

The paper then goes on to describe the Border Gateway Protocol (BGP), which is used primarily by routers of different ASes to exchange route information, its design goals, and how it can be used to implement routing policies. The main design goals are scalability, policy expressiveness, and cooperation under competitive circumstances (imho, ensuring things go well in a distributed manner even among selfish ISPs).

The paper also addresses how BGP can be used in internal sessions (iBGP) among routers within an AS, so that the routers on the edge (speaking eBGP) can share routing information about the external world. While the eBGP routers could be fully connected to exchange such information, this does not scale. Thus, two techniques, route reflectors and confederations, are discussed. Both, however, have to be configured manually, which naturally leads to the question of whether such a process (exchanging routing information among all eBGP routers) can be done automatically for ease of deployment.

Finally, two open problems are discussed. The first is multi-homing, where a customer might want access through two ISPs, e.g. for redundancy. The two main issues seem to be scalability, since hacks to make multi-homing work by tweaking the ASPATH seem to require some knowledge of the AS topology, and size constraints on ISP routing tables, which impose a high minimum on the number of addresses the customer pays for. The second is the slow convergence of BGP after the occurrence of a fault.

This piece is obviously meant as a tutorial, and for me, its main value is in shedding light on how ISPs actually interconnect (and the role that routing plays), and in a way that makes sense commercially. For obvious reasons, it assumes that the reader has sufficient exposure to the material, so for me, reading it involved a lot of searching on Wikipedia.