Thursday, November 20, 2008

Scalable application layer multicast

This paper adopts an application-layer overlay approach to multicast. The main focus of the paper is on building a suitable hierarchical structure of nodes such that the average-case control overhead is O(1) and the worst case is O(log N), where N is the number of nodes. (There is also a parameter controlling the size of each cluster that can creep into those numbers.) Care is taken to ensure that the parent of a set of nodes is about equally "far" from all of them, and that this property is maintained as nodes join or leave. The same goes for the size of each cluster (a cluster being a set of nodes whose parent is responsible for distributing data to them) and for the tree structure itself (to avoid loops and data duplication).

The host joining algorithm reminds me of a tree VQ - a node finds the "right" cluster by going down the tree, at each layer selecting the cluster with the "closest" cluster head. However, it is not clear how the cluster leader is selected when there are changes in membership or in network conditions. It probably cannot be the cluster leader itself that decides when to relinquish the role, since it has an incentive to cheat in return for lower latency. Do the nodes do some form of voting based on the heartbeat messages? Allowing a node to immediately receive data from the layer it is currently querying is a nice touch that reduces the latency before data first starts flowing.
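To make the tree-VQ analogy concrete, here is a rough sketch of how I picture the join procedure: the joining host descends the hierarchy one layer at a time, probing the leaders of the clusters at each layer and recursing into the closest one. The names (`Cluster`, `rtt_to`) and the fake distance function are my own, not from the paper.

```python
class Cluster:
    def __init__(self, leader, members=None, children=None):
        self.leader = leader              # cluster leader (a host id)
        self.members = members or []      # hosts in this cluster
        self.children = children or []    # clusters in the layer below

def rtt_to(host, leader):
    """Stand-in for an actual RTT probe to a cluster leader."""
    return abs(hash((host, leader))) % 100  # fake, deterministic "distance"

def join(host, top_cluster):
    """Descend layer by layer toward the closest leaders, then join the
    bottom-layer cluster that is reached."""
    cluster = top_cluster
    while cluster.children:
        cluster = min(cluster.children, key=lambda c: rtt_to(host, c.leader))
    cluster.members.append(host)
    return cluster
```

With a cluster size bound of k, each layer has O(log N) depth, which is where the worst-case overhead comes from.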

So this seems to be more of a framework that describes "best practices" for building a multicast tree, rather than a protocol that can be used off the shelf. What would be the best way to implement the ideas here such that a content provider can use it with minimal fuss?

I would be concerned with overhead traffic, since building and maintaining the tree require a fair amount of control coordination. The simulation results do not quite agree with the theory that the average control overhead is constant in the number of nodes. The standard deviation grows, but that is to be expected given the worst case - in fact, that worst case should be the number reported. The other comparison I would have made is with an "ideal topology" for each set of conditions - i.e., as nodes leave and join, the "ideal topology" is the one that a tree builder with central control would have built. This would be more meaningful than IP Multicast for judging how good NICE is at maintaining good hierarchies.

Wednesday, November 19, 2008

A reliable multicast framework for light-weight sessions and application level framing

Despite the grand-sounding title, this paper is really about repair mechanisms suitable for multicast applications, with the whiteboard application as a specific example. Some means of doing multicast is assumed, and the paper does not quite specify how that is done. The main issues are to prevent the sender from seeing an "ACK implosion", and to exploit the fact that there are multiple receivers. Two key design choices are (i) after loss detection, a receiver should request repair only after a delay that depends on its distance from the sender; (ii) that delay should be randomized to prevent synchronization of repair requests. This allows receivers with lost data to benefit from repair requests made by nodes closer to the sender.
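The two design choices boil down to a one-line timer, which I'd sketch like this: on detecting a loss, a receiver schedules its repair request after a delay drawn uniformly from [C1*d, (C1+C2)*d], where d is its estimated distance to the sender, and suppresses its own request if it overhears another for the same data first. The suppression logic is omitted here; C1 and C2 are protocol constants.

```python
import random

def repair_request_delay(distance, c1=1.0, c2=1.0):
    """Randomized, distance-scaled backoff before sending a repair request.

    distance: estimated one-way delay to the sender.
    The randomization spreads out requests from equidistant receivers,
    while the distance scaling lets closer receivers ask first.
    """
    return random.uniform(c1 * distance, (c1 + c2) * distance)
```

A receiver far from the sender thus almost always hears a nearer receiver's request (or the resulting repair) before its own timer fires.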

Honestly, I think the paper is a little long for what are two simple, though useful, ideas. I was really expecting something else, especially given the title and the rather general introduction. No rough gauge of timings for the whiteboard application is provided, and I wonder how interactive the application can be if it relies on such a repair mechanism. To me, a countdown timer that depends on how far a receiver is from the sender is just one alternative; what about something that depends on how far one is from the fault (if that can be determined)?

I also didn't think it's right to just brush the actual implementation of multicast under the rug. If anything, how repairs are made should depend on that implementation. For example, if a tree-based multicast is used, why can't a node simply forward a repair request to its parent as soon as it detects a loss? The parent can suppress the request if necessary, so the source still would not be overwhelmed by repair requests.
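The alternative I'm imagining looks roughly like this: each node forwards a repair request straight to its parent, and every node on the way up suppresses duplicates for the same data, so the source sees at most one request per loss. This is entirely my own sketch, not something from the paper.

```python
class TreeNode:
    def __init__(self, parent=None):
        self.parent = parent
        self.pending = set()   # data ids with an outstanding repair request

    def request_repair(self, data_id):
        """Forward a repair request upward unless one is already pending.

        Returns True if this node sent (or originated) the request,
        False if it was suppressed as a duplicate.
        """
        if data_id in self.pending:
            return False       # suppressed: someone already asked
        self.pending.add(data_id)
        if self.parent is not None:
            self.parent.request_repair(data_id)
        return True
```

Here suppression happens at every interior node, so even correlated losses in one subtree produce a single request above that subtree.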

Monday, November 17, 2008

A delay-tolerant network architecture for challenged internets

This paper tries to address the poor performance of Internet protocols under "challenged conditions", such as some combination of low bandwidth, high delay, frequent downtime, and long queuing delays. These cause problems due to mismatches with some of the implicit assumptions made in developing those protocols. The authors propose an overlay network architecture that relies on existing links/protocols to implement a store-and-forward solution. DTN gateways control access to regions, and either bridge regions or provide a way to communicate with nodes that are not in the same region.

This sounds very similar to an email setup: a sender sends email to a mail gateway, which routes it to the receiver's mail server, and finally the receiver pulls the email from that server. In this case, though, the DTN gateway would push the message to the receiver - isn't that what RIM does for Blackberry? Another difference is that in the DTN architecture, hosts can send data directly if they are in the same region.

An architecture for internet data transfer

In this paper, the authors propose an abstraction for data transfer. Control logic still stays with the application, and in particular, communication for control logic is separated from the data transfer. Data transfer can then be invoked as a separate service, where the application does not have to worry about the actual means of transfer, just that the receiver will eventually get the data transmitted by the sender. By providing this abstraction, the authors hope that various types of transfer techniques can be developed and adopted widely.

To me, this is one of those papers that sounds like a good idea at first, but a little further thinking overturns that initial impression. I don't think all applications would want just any kind of data transfer service; each application has its own specific needs and would probably need to choose an appropriate transfer technique. Web surfing, downloading, and video streaming would each require different types of service. One possibility would be for the application to specify its preference, but that defeats the purpose of the abstraction. Besides, applications can simply link against libraries for the desired transfer.

I'm also not sure how this would accommodate data streaming. Right now, the entire piece of data needs to be provided by the application before a hash can be computed for its ID. Obviously this is not possible for live streaming, where the application does not even know what the data is ahead of time, and the receiver wants the data as soon as it's available. One possibility would be to treat each chunk as a separate piece of data and use the construct here, but that's clunky.
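The clunky workaround I have in mind would look something like this: treat each chunk of the live stream as an independent object, named by a hash of its contents, so the receiver can fetch chunks as they become available. Purely illustrative; the `chunk_stream` helper (and the choice of SHA-1) is mine, not part of the paper's API.

```python
import hashlib

def chunk_stream(stream, chunk_size=64 * 1024):
    """Yield (content_hash, chunk) pairs as data arrives from the stream.

    Each chunk becomes a separately named transfer object, so a receiver
    can request it without waiting for the whole stream to exist.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield hashlib.sha1(chunk).hexdigest(), chunk
```

The obvious downside is that the receiver now needs some out-of-band way to learn the sequence of chunk IDs, which is exactly the kind of control logic the abstraction was supposed to keep out of the transfer path.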

However, benefits such as load balancing over multiple paths and caching of duplicate data do seem tempting. Perhaps a different abstraction that provides these benefits while letting the application choose the type of transfer would work better?

Thursday, November 13, 2008

X-Trace: A pervasive network tracing framework

This paper presents an architecture for logging network requests and responses by inserting task-identifiable metadata into each request. The catch is that all the network elements/protocols lying in the path would need to be modified to handle and propagate such metadata. Logging is handled by individual devices, ensuring that sensitive information is not shared unwittingly, while the task id in the metadata allows for cooperation across domains. A (more than) proof-of-concept implementation is also presented.
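Roughly how I understand the propagation requirement, as a sketch: each element that handles a request logs an event locally, keyed by the task id the request carries, and copies the metadata (with a fresh per-operation id) into any request it issues on the task's behalf. The field and method names here are my guesses, not X-Trace's actual wire format.

```python
import uuid

class XTraceMetadata:
    def __init__(self, task_id=None, op_id=None):
        self.task_id = task_id or uuid.uuid4().hex  # shared by the whole task
        self.op_id = op_id or uuid.uuid4().hex      # unique per operation

    def next_op(self):
        """Metadata to attach to the next request in the same task."""
        return XTraceMetadata(self.task_id)

def handle(request_metadata, local_log):
    """A device logs locally (data stays on the device) and propagates
    metadata so downstream elements can correlate by task id."""
    local_log.append((request_metadata.task_id, request_metadata.op_id))
    return request_metadata.next_op()
```

The local log is the key privacy point: each domain keeps its own events and only the opaque task id crosses domain boundaries.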

Given that there is a hit in performance, no matter how small, I wonder if X-Trace would mostly be used for debugging, rather than for logging all network operations. Even if devices were X-Trace enabled, they would only propagate metadata when it is present.

As the paper mentions, a user study on how much such a tool improves fault finding would provide much motivation for deploying such an architecture.

With respect to security, how easy would it be for a malicious node to hide its activity by propagating modified X-Trace data such that everything seems normal? This could make it difficult to pin-point where the true problem lies.

Wednesday, November 12, 2008

End-to-end internet packet dynamics

This is an interesting (but long) paper on determining end-to-end performance in the Internet. Analysis is done on packet traces of TCP transfers between pairs of end nodes, where both sender and receiver are under the author's control. Knowledge of the TCP protocol is used to undo its effects so that the underlying network can be analyzed.

If there is one key lesson from this paper, it is that there is no single number that describes characteristics such as the percentage of out-of-order packets, bottleneck bandwidth, packet loss rates, and jitter. These change over time, and depend on sites, network elements, and TCP implementations, among other things. For an Internet application, this probably means constant probing and monitoring of such metrics.

The paper is generally full of interesting tidbits, although I cannot say that there is a consistent theme throughout. Among the many things I learnt was how to robustly measure bottleneck bandwidth.

I wonder if someone has done a similar study for UDP packets, which might be interesting for streaming applications. While the author points out that TCP has this "harm-no-others" property, is that really a good reason for not looking at UDP? A practical difficulty is what rate to choose for sending UDP packets, since the characteristics would obviously be very different at different transmission rates.

Wednesday, November 5, 2008

Internet indirection infrastructure (i3)

The basic idea in this paper is as follows. Instead of a sender transmitting packets directly to a receiver, the receiver first indicates through a trigger what data id it is interested in and where to send matching data, while the sender simply sends data, conceptually, to a data id. In practice, the sender looks up which server is responsible for handling triggers for that data id using a DHT (Chord in this paper) and sends the data to that server. An extension allows data ids and receiver addresses to be stacked in order to provide other functionality, such as third-party processing of data (similar to an outsourced firewall?). All this takes place on an overlay network.
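A toy version of the rendezvous idea as I understand it: receivers insert (id, addr) triggers at the responsible server, senders send (id, data), and the server matches them. This sketch ignores the Chord lookup that locates the responsible server, and the class/method names are mine.

```python
class I3Node:
    """One i3 server: holds triggers and forwards matching data."""

    def __init__(self):
        self.triggers = {}   # data id -> list of receiver addresses

    def insert_trigger(self, data_id, addr):
        """A receiver registers interest in data_id, delivered to addr."""
        self.triggers.setdefault(data_id, []).append(addr)

    def send(self, data_id, data, deliver):
        """A sender pushes data to an id; the node forwards it to every
        receiver with a matching trigger (multicast falls out for free)."""
        for addr in self.triggers.get(data_id, []):
            deliver(addr, data)
```

Note how mobility and multicast both follow from the indirection: a receiver that moves just re-inserts its trigger with a new address, and multiple triggers on one id give multicast.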

Naturally, a key question is how performance is affected by such an arrangement. Simulations show that latency goes up by about 1.5-4 times, depending on the number of samples used in a heuristic for finding an identifier whose responsible server is nearby. This is for the case where the sender already knows which forwarding server to use; for the first packet, where the sender still needs to look up the server, the latency can be up to 15 times larger.

As before, who participates in the DHT for trigger lookup? All the folks who wish to use i3, or a dedicated set of servers? I suppose that to figure out what id to use for certain content, users would have to rely on some out-of-band communication.