Thursday, November 20, 2008

Scalable application layer multicast

This paper adopts an application overlay approach to multicast. The main focus of the paper is on building a suitable hierarchical structure of nodes such that the average-case control overhead is O(1) and the worst case is O(log N), where N is the number of nodes. (There is also a parameter controlling the size of each cluster that can creep into those numbers.) Care is taken to ensure that the parent of a set of nodes is roughly equally "far" from each of them, that this property and the cluster sizes are maintained as nodes join or leave (a cluster is essentially the set of nodes to which its leader is responsible for distributing data), and that a tree structure is preserved (to avoid loops and data duplication).

The host joining algorithm reminds me of a tree VQ - a node finds the "right" cluster by going down the tree, selecting at each layer the cluster with the "closest" cluster head. However, it is not clear how the cluster leader is selected when there are changes either in membership or in network conditions. It probably cannot be the cluster leader itself that decides when to relinquish the role, since it has an incentive to cheat in return for lower latency. Do the nodes do some form of voting based on the heartbeat messages? Allowing a node to immediately receive data from the layer it is currently querying is a nice touch that reduces the latency before data first starts flowing.
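To make the tree-VQ analogy concrete, here is a minimal sketch of how I picture the descent. The names (Cluster, rtt_to) are mine, not the paper's, and the RTT probe is a stand-in:

```
# Hypothetical sketch of a NICE-style join descent: at each layer the joining
# node probes the cluster leaders it was told about, then recurses into the
# cluster whose leader is closest, until it reaches the bottom layer.
import random

class Cluster:
    def __init__(self, leader, members, children=None):
        self.leader = leader             # node id of the cluster leader
        self.members = members           # node ids in this cluster
        self.children = children or []   # clusters one layer down, led by members

def rtt_to(joiner, node):
    """Stand-in for an actual RTT probe (assumption: measured by pinging)."""
    return random.uniform(5, 200)

def find_join_cluster(joiner, top_cluster):
    cluster = top_cluster
    while cluster.children:              # not yet at the bottom layer
        cluster = min(cluster.children,
                      key=lambda c: rtt_to(joiner, c.leader))
    return cluster                       # the layer-0 cluster to join
```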

So this seems to be more of a framework that describes "best practices" for building a multicast tree, rather than a protocol that can be used off the shelf. What would be the best way to implement the ideas here such that a content provider can use it with minimal fuss?

I would be concerned with overhead traffic, since building and maintaining the tree require a fair amount of control coordination. The simulation results do not quite agree with the theory that the average control overhead is constant in the number of nodes. The standard deviation grows, but that is to be expected given the worst case - in fact, that is the number that should be reported. The other comparison I would have made is against an "ideal topology" for each set of conditions - i.e. when nodes leave/join, the "ideal topology" is the one that a tree builder with central control would have built. This would be more meaningful than IP Multicast for judging how good NICE is at maintaining good hierarchies.

Wednesday, November 19, 2008

A reliable multicast framework for light-weight sessions and application level framing

Despite the grand-sounding title, this paper is really about repair mechanisms suitable for multicast applications, with the whiteboard application as a specific example. Some means of doing multicast is assumed, and the paper does not quite specify how that is done. The main issues are to prevent the sender from seeing an "ACK implosion" and to exploit the fact that there are multiple receivers. Two key design choices are (i) after loss detection, a receiver should only request repair after a delay that depends on its distance from the sender; (ii) that delay should be randomized to prevent synchronization of repair requests. This allows receivers with lost data to benefit from repair requests issued by nodes closer to the sender.
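If I recall the mechanism correctly, the request timer is drawn uniformly from an interval that scales with the receiver's estimated one-way delay to the sender; the constants are tunable parameters, and the values below are just placeholders:

```
import random

def schedule_repair_request(delay_to_sender, C1=1.0, C2=1.0):
    """Pick when to send a repair request after detecting a loss.

    delay_to_sender: estimated one-way delay (seconds) from this receiver to
    the original sender. Receivers closer to the sender tend to fire first;
    the random spread avoids synchronized bursts of identical requests.
    """
    return random.uniform(C1 * delay_to_sender,
                          (C1 + C2) * delay_to_sender)

# If another receiver's request (or the repair itself) is heard before the
# timer fires, this receiver suppresses its own request.
```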

Honestly, I think the paper is a little long for what are two simple, though useful, ideas. I was really expecting something else, especially given the title and the rather general introduction. No rough gauge of timings for the whiteboard application is provided. I wonder how interactive the application can be if it relies on such a repair mechanism. To me, a countdown timer that depends on how far a receiver is from the sender is one alternative; what about something that depends on how far one is from the fault (if that can be determined)?

I also didn't think it's right to just brush the actual implementation of multicast under the rug. If anything, how repairs are made should depend on the actual implementation. For example, if a tree-based multicast is used, why can't a node simply forward a repair request to its parent as soon as it detects a loss? The parent can suppress the request if necessary, so the sender still would not be overwhelmed by repair requests.

Monday, November 17, 2008

A delay-tolerant network architecture for challenged internets

This paper tries to address the poor performance of Internet protocols under "challenged conditions", such as some combination of low bandwidth, high delay, frequent down-time, and long queuing delay. These cause problems due to mismatches with some of the implicit assumptions made in developing those protocols. The author proposes an overlay network architecture that relies on existing links/protocols to implement a store-and-forward solution. DTN gateways control access to regions, and either bridge regions or provide a way to communicate with nodes that are not in the same region.
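A toy illustration of the store-and-forward idea as I understand it, with hypothetical link_up/send stand-ins rather than anything from the paper's actual interfaces:

```
import collections

def link_up(hop):
    """Stand-in for a contact/connectivity check (assumption)."""
    return False

def send(hop, bundle):
    """Stand-in for actually transmitting a bundle over the underlying link."""
    print(f"forwarding {bundle!r} to {hop}")

class DTNGateway:
    """Toy store-and-forward hop: hold bundles until the next hop is reachable."""
    def __init__(self, next_hop):
        self.next_hop = next_hop
        self.queue = collections.deque()   # persistent storage in a real system

    def accept(self, bundle):
        self.queue.append(bundle)          # take custody of the message

    def pump(self):
        while self.queue and link_up(self.next_hop):
            send(self.next_hop, self.queue.popleft())
        # otherwise keep the bundles stored until connectivity returns
```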

This sounds very similar to an email setup. A sender sends email to a mail gateway, which then routes it to the mail server of the receiver. Finally, the receiver pulls the email from the mail server. In this case, the DTN gateway would push the message to the receiver - isn't that what RIM does for Blackberry? Another difference is that in the DTN architecture, hosts can send data directly to one another if they are in the same region.

An architecture for internet data transfer

In this paper, the authors propose an abstraction for data transfer. Control logic still stays with applications, and in particular, communication for control logic is separated from the data transfer. Data transfer can then be invoked as a separate service, where the application does not have to worry about the actual means of transfer, just that the receiver will eventually get the data transmitted by the sender. By providing this abstraction, the authors hope that various transfer techniques can be used and adopted widely.

To me, this is one of those papers that sounds like a good idea at first, but a little further thinking overturns that initial impression. I don't think all applications would want just any kind of data transfer service; each application has its own specific needs, and would probably need to choose an appropriate transfer technique. Web surfing, downloading, and video streaming would each require different types of service. One possibility would be for the application to specify its preference, but that defeats the purpose of the abstraction. Besides, applications can simply link against libraries for the desired type of transfer.

I'm also not sure how this would accommodate data streaming. Right now, the entire piece of data needs to be provided by the application before a hash can be computed for its ID. Obviously this is not possible for live streaming, where the application would not even know what the data is ahead of time, and the receiver wants to get the data as soon as it's available. One possibility would be to treat each chunk as a separate piece of data and use the construct here, but that's clunky.
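The per-chunk workaround might look something like this, hashing each chunk as it becomes available instead of waiting for the whole object; the names and chunk size here are mine, not DOT's API:

```
import hashlib

CHUNK_SIZE = 64 * 1024   # assumption: fixed-size chunks

def chunk_ids(stream):
    """Yield (chunk_id, chunk) pairs as data becomes available.

    Each chunk gets its own content hash, so the receiver can start fetching
    before the sender knows the rest of the stream; the cost is that the
    "object" is now a sequence of ids rather than a single id.
    """
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        yield hashlib.sha1(chunk).hexdigest(), chunk
```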

However, benefits such as load-balancing over multiple paths and caching of duplicate data does seem tempting. Perhaps a different abstraction that provides these benefits while letting the application choose the type of transfer would work better?

Thursday, November 13, 2008

X-Trace: A pervasive network tracing framework

This paper presents an architecture for logging network requests and responses by inserting task-identifiable metadata into the request. The catch is that all the network elements/protocols that lie in the request's path would need to be modified in order to handle and propagate such metadata. Logging of data is handled by individual devices, ensuring that sensitive information is not shared unwittingly, while the task id in the metadata allows for cooperation across domains. A (more than) proof-of-concept implementation is also presented.
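My rough mental model of the propagation rule: carry the task id unchanged, record your own operation id, and remember the id you received as the parent. A hedged sketch - the field names are approximations, not the exact X-Trace metadata format:

```
import os

def propagate(incoming_metadata):
    """Produce the metadata to attach to the next request or report.

    incoming_metadata: dict with 'task_id' and 'op_id' from the caller.
    The task id is carried unchanged so reports from different devices and
    layers can later be stitched into one task tree; the parent/op ids give
    the edges of that tree.
    """
    my_op_id = os.urandom(4).hex()
    return {
        "task_id": incoming_metadata["task_id"],   # same task end to end
        "parent_id": incoming_metadata["op_id"],   # who handed us the request
        "op_id": my_op_id,                         # this device's operation
    }
```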

Given that there is a hit in performance, no matter how small, I wonder if X-Trace would be used mostly for debugging, rather than for logging of all network operations. Even if devices were X-Trace enabled, they would only propagate metadata when it is present.

As the paper mentions, a user study on how much such a tool improves fault finding would provide much motivation for deploying such an architecture.

With respect to security, how easy would it be for a malicious node to hide its activity by propagating modified X-Trace data such that everything seems normal? This could make it difficult to pin-point where the true problem lies.

Wednesday, November 12, 2008

End-to-end internet packet dynamics

This is an interesting (but long) paper on determining end-to-end performance in the Internet. Analysis is done on packet traces of TCP transfers between pairs of end-nodes, where both sender and receiver are under the control of the author. Knowledge of the TCP protocol is also used to undo its effects in order for analysis of the underlying network to be performed.

If there is one key lesson from this paper, it is that there is no single number that describes characteristics such as the percentage of out-of-order packets, bottleneck bandwidth, packet loss rates, and jitter. These change over time, and depend on sites, network elements, and TCP implementation, among other things. For an internet application, this probably means constant probing and monitoring of such metrics.

The paper is generally full of interesting tidbits, although I cannot say that there is a consistent theme throughout. Among the many things I learnt was how to robustly measure bottleneck bandwidth.
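On bottleneck bandwidth: the basic packet-pair estimate is just packet size divided by the spacing the bottleneck imposes on back-to-back packets; the paper's contribution is making this robust (clustering many estimates, coping with reordering and multi-channel links), but the core calculation is this simple:

```
def packet_pair_estimate(arrival_gap_s, packet_size_bytes):
    """Naive bottleneck-bandwidth estimate from one back-to-back packet pair.

    arrival_gap_s: spacing between the two packets at the receiver (seconds).
    Returns bits per second. A single pair is easily fooled by queuing and
    clock noise, which is why many such estimates have to be combined.
    """
    return packet_size_bytes * 8 / arrival_gap_s

# e.g. 1500-byte packets arriving 1.2 ms apart suggest roughly 10 Mb/s
print(packet_pair_estimate(0.0012, 1500))   # -> 10000000.0
```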

I wonder if someone has done a similar thing for UDP packets, which might be interesting for streaming applications. While the author points out that TCP has this "harm-no-others" property, is that really a good reason for not looking at UDP? I think a practical question is what rate to send UDP packets at, since the characteristics would obviously be very different at different transmission rates.

Wednesday, November 5, 2008

Internet indirection infrastructure (i3)

The basic idea in this paper is as follows. Instead of a sender directly transmitting packets to a receiver, the receiver first indicates through a trigger what data id it is interested in and where to send the data, while a sender simply sends data, conceptually, to a data id. In practice, the sender first looks up which server is responsible for handling triggers for that data id using some DHT (Chord in this paper) and sends the data to that server. An extension allows the data ids and receiver addresses to be stacked in order to provide other functionality, such as third-party processing of data (similar to an outsourced firewall?). All this takes place on an overlay network.
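A toy version of the rendezvous semantics, ignoring Chord and the overlay entirely and keeping a single in-memory trigger table (so this is just the idea, not the system):

```
# Toy i3 semantics: receivers insert (id -> address) triggers, senders send
# (id, data), and the infrastructure forwards data to whoever registered.
triggers = {}   # data id -> list of receiver addresses

def insert_trigger(data_id, receiver_addr):
    triggers.setdefault(data_id, []).append(receiver_addr)

def send(data_id, data):
    for addr in triggers.get(data_id, []):   # sender never learns the receivers
        deliver(addr, data)

def deliver(addr, data):
    print(f"-> {addr}: {data!r}")

insert_trigger("news/weather", "10.0.0.7")
send("news/weather", b"rain tomorrow")
```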

Naturally, a key question is how performance is impacted by such an arrangement. Simulations show that latency goes up by about 1.5-4 times, depending on the number of samples used in a heuristic to find an identifier whose responsible server is nearby. This is for the case where the sender already knows which forwarding server to use. For the first packet, where the sender needs to look up the server, the latency can be up to 15 times larger.

As before, who participates in the DHT for trigger lookup? All the folks who wish to use i3, or a dedicated set of servers? I suppose that to figure out which id to use for a certain piece of content, users would have to rely on some out-of-band communication.

Middleboxes no longer considered harmful

In this paper, the authors propose an architectural solution to accommodating "middleboxes" - network elements, such as NATs and firewalls, that violate certain Internet layering principles. They argue that these middleboxes are likely to remain useful even with the migration to IPv6, for reasons such as the need to maintain private address realms or to outsource firewall operations. To this end, the Delegation-Oriented Architecture (DOA) is proposed, which sits on top of the IP layer.

From my understanding, each host is given a unique 160-bit identifier (a cryptographic hash?) which makes each host uniquely addressable. EIDs can be resolved through a DHT to either the IP address of a delegate or another EID. Essentially, the architecture allows a sender to address a packet using the EID of the receiver, while the receiver can delegate responsibility for handling/forwarding its packets to some intermediary (addressed by either an EID or an IP address). I am tempted to think of this as providing a mixture of routing and DNS on top of the "normal" Internet.
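My mental model of the resolution step, as a sketch: keep resolving EIDs through the (assumed) mapping service until you hit an IP address, which may belong to a delegate rather than the end host. The table and names below are invented for illustration:

```
# Hypothetical EID resolution: the mapping service (a DHT in the paper) maps
# an EID either to an IP address (possibly of a delegate) or to another EID.
mapping = {
    "eid:host-A": "eid:firewall-X",   # host A delegates to a firewall box
    "eid:firewall-X": "192.0.2.10",   # the firewall's IP address
}

def resolve(eid, max_hops=8):
    target = eid
    for _ in range(max_hops):
        target = mapping[target]
        if not target.startswith("eid:"):   # reached an IP address, stop
            return target
    raise RuntimeError("delegation chain too long")

print(resolve("eid:host-A"))   # packets for host A go via 192.0.2.10
```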

What I am not clear about is who maintains responsibility for operating the DHT for EID resolution. I am also torn between whether this is meant to be operated among small networks of users (i.e. each group of hosts could operate its own DOA) or is meant to be a net-wide deployment, where everyone would share the same EID namespace. In the latter case, how would the DHT scale?

Overall, it is a very intriguing idea. Does it have additional value over IPv6? I'm not sure if I would agree with the authors' claims that it does. Outsourcing of firewall functions can probably still take place without DOA, though it is likely that NATs will still have value (multiple hosts sharing a single internet connection).

Monday, November 3, 2008

DNS performance and the effectiveness of caching

This paper presents a study in which traces of DNS lookups and responses, as well as TCP connection starts and ends, were collected for analysis and for trace-driven simulations to test out ideas. To be honest, it is not clear to me why some of their analyses would be interesting, but perhaps I am just unable to appreciate what they have done. The more interesting portions are the experiments where the traces drive simulations in which various parameters or ideas can be tested. The main results seem to be that reducing the TTL of address records down to a few hundred seconds has limited ill effect on cache hit rates. Also, while sharing a DNS cache among clients might be useful, the benefit saturates at about 10-20 clients.
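The TTL experiment is easy to picture as a trace-driven simulation: replay the lookups against a cache whose entries expire after the chosen TTL and count the hits. A minimal sketch, with an invented trace format:

```
def hit_rate(trace, ttl):
    """trace: iterable of (timestamp_seconds, name) lookups in time order.
    Returns the fraction of lookups answered from a cache with the given TTL."""
    expires = {}            # name -> time at which the cached answer expires
    hits = total = 0
    for t, name in trace:
        total += 1
        if expires.get(name, 0) > t:
            hits += 1                       # still cached
        else:
            expires[name] = t + ttl         # miss: fetch and cache
    return hits / total if total else 0.0

trace = [(0, "a.com"), (100, "a.com"), (400, "a.com")]
print(hit_rate(trace, ttl=300))   # 1 of 3 lookups hits with a 300 s TTL
```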

One concern I have is with the time periods in which data was collected. It's not clear why data was not collected over the same time period (i.e. the same week) in two different years (perhaps a deadline?). The number of students/users on campus from Dec 4 to Dec 11 (during the regular semester) would be different from Jan 3 to Jan 10 (corresponding to an optional month-long independent activities period). The number of lookups is higher in mit-dec00, but of course this could be in part due to the growing number of hosts and websites. The other problem with different periods is that the mix of students could be different, e.g. more international students staying behind in January, which might contribute to a different set of domains being looked up. This is of course speculative, but it would be more convincing (to me) to keep the period the same, especially for some of the trends that the authors claim in the paper. With just two time-points that are not even at the same time of year, I would not believe any of the trends stated in the paper.

The other issue for me is that the authors admit that, in hindsight, certain data collection could have been done differently. For example, the 128 bytes collected per packet turned out to be insufficient; how did this affect the analysis? Also, collecting UDP flows as well could have verified some of the claims made. I imagine that the collection of packet data is not a trivial task, but if all the collection software/hardware is already in place, what would prevent the authors from re-collecting the data? The only reason I can think of is the bureaucratic hoops that the authors would have to jump through for each collection.

Development of the Domain Name System

This paper describes the design decisions behind DNS. It grew out of a need to change from a centralized hostname-to-address mapping (i.e. HOSTS.TXT) to a distributed system. I am rather amused (though not surprised) that the original centralized system is called HOSTS.TXT - the name of the file that contains the list of hostnames and addresses. Since I was not too familiar with how DNS works, I would have preferred the paper to briefly describe the lookup process instead of launching straight into why things are done in certain ways. I suppose the intended audience is one that already knows the lookup process.

The over-arching design aims seem to be distributed management and lean queries. Distributed management is mainly achieved through a hierarchical approach to delegating responsibility for maintaining the name space. A lean service is achieved by constraining the types of database operations that are permitted. Such a distributed database service reminds me of a DHT, except that here, responsibility for domains needs to be delegated explicitly, since domain administrators presumably want control over who can use their domains.

The other key feature of DNS seems to be caching of query responses, which prevents overloading of the root servers. There is, however, a tension between long caching (larger TTL), which allows for more reuse, and short caching (smaller TTL), which allows for more dynamic DNS responses. Negative caching seems like a smart thing to do, since an attack with bogus names could otherwise overload DNS resolver servers; I wonder if this is something that is done in practice these days.
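Negative caching just means remembering "no such name" answers too, typically with their own (often shorter) TTL. A sketch of how a resolver cache might treat it - the structure and parameter names here are hypothetical, not any real resolver's:

```
NXDOMAIN = object()   # sentinel for a cached negative answer

cache = {}            # name -> (answer_or_NXDOMAIN, expiry_time)

def lookup(name, now, resolve_upstream, ttl=300, neg_ttl=60):
    entry = cache.get(name)
    if entry and entry[1] > now:
        return entry[0]                    # positive or negative, both served
    answer = resolve_upstream(name)        # goes out to authoritative servers
    if answer is None:
        cache[name] = (NXDOMAIN, now + neg_ttl)   # remember the failure too
        return NXDOMAIN
    cache[name] = (answer, now + ttl)
    return answer

# e.g. a bogus name only costs one upstream query per neg_ttl window:
print(lookup("nosuch.example", now=0, resolve_upstream=lambda n: None) is NXDOMAIN)
```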

Overall, I like this paper. It provides a historical context for the development of DNS, while explaining not just the design goals and approaches, but also talks about implementation issues and the lessons learnt.