Wednesday, November 19, 2008

A reliable multicast framework for light-weight sessions and application level framing

Despite the grand-sounding title, this paper is really about repair mechanisms suitable for multicast applications, with the whiteboard application as a specific example. Some means of doing multicast is assumed, but the paper does not quite specify how it is done. The main issues are to keep the sender from seeing an "ACK implosion" and to exploit the fact that there are multiple receivers. Two key design choices are: (i) after detecting a loss, a receiver waits for a time that depends on its distance from the sender before requesting a repair; and (ii) that time is randomized to prevent synchronized repair requests. Together these let receivers that are missing data benefit from repair requests sent by nodes closer to the sender.
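To make the two design choices concrete, here is a minimal Python sketch of a distance-scaled, randomized request timer with suppression. It is my own toy illustration, not the paper's code: the names `request_delay` and `who_requests` and the constants `c1`/`c2` are hypothetical, the interval `[c1*d, (c1+c2)*d]` is an assumed form of "a time proportional to distance, plus randomization", and the simulation assumes every receiver detects the loss at the same moment and that a multicast request reaches every other receiver after a fixed `hop_delay`.

```python
import random


def request_delay(dist, c1, c2, rng):
    """Draw a repair-request timer from [c1*dist, (c1+c2)*dist].

    dist is the receiver's estimated one-way delay to the sender; c1 and c2
    are hypothetical tuning constants (c2 = 0 removes the randomization).
    """
    return rng.uniform(c1 * dist, (c1 + c2) * dist)


def who_requests(dists, hop_delay, c1=1.0, c2=1.0, seed=0):
    """Return the receivers that actually multicast a repair request.

    Simplifying assumptions (not from the paper): every receiver detects the
    loss at time 0, and a multicast request reaches every other receiver
    exactly hop_delay seconds after it is sent.
    """
    rng = random.Random(seed)
    timers = {name: request_delay(d, c1, c2, rng) for name, d in dists.items()}
    first = min(timers.values())
    # The earliest request suppresses everyone whose timer has not fired by the
    # time it arrives; later requests arrive even later, so they suppress no one new.
    return sorted(name for name, t in timers.items() if t <= first + hop_delay)


# The receiver nearest the sender fires first and suppresses the others.
print(who_requests({"near": 0.01, "mid": 0.10, "far": 0.20}, hop_delay=0.01))  # → ['near']
# Equidistant receivers with no randomization all fire at once.
print(who_requests({"a": 0.05, "b": 0.05, "c": 0.05}, hop_delay=0.001, c2=0.0))  # → ['a', 'b', 'c']
```

The second call shows why choice (ii) matters: with identical distances and no random spread, every timer expires at the same instant and suppression never gets a chance to work.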

Honestly, I think the paper is a little long for what amounts to two simple, though useful, ideas. I was really expecting something else, especially given the title and the rather general introduction. The paper provides no rough gauge of timings for the whiteboard application, and I wonder how interactive an application can be if it relies on such a repair mechanism. A countdown timer that depends on how far a receiver is from the sender is one option; what about one that depends on how far the receiver is from the fault (if that can be determined)?

I also didn't think it's right to just sweep the actual implementation of multicast under the rug. If anything, how repairs are made should depend on that implementation. For example, if tree-based multicast is used, why can't a node simply forward a repair request to its parent as soon as it detects a loss? The parent can suppress duplicate requests as necessary, so the sender would still not be overwhelmed by repair requests.
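The tree-based alternative can be sketched in a few lines of Python. This is a hypothetical illustration of my suggestion, not anything from the paper: it assumes a static tree in which each node knows its parent, requests are keyed by (source, sequence number), and requests travel upstream reliably. The class `TreeNode` and its fields are made up for the example.

```python
class TreeNode:
    """A node in a hypothetical tree-based multicast overlay (not from the paper)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.pending = set()  # (source, seqno) requests already sent upstream
        self.handled = []     # requests this node processed rather than suppressed

    def request_repair(self, source, seqno):
        key = (source, seqno)
        if key in self.pending:
            return  # duplicate from another child: suppress it here
        self.pending.add(key)
        self.handled.append(key)
        if self.parent is not None:
            # Forward immediately on loss detection; no countdown timer needed.
            self.parent.request_repair(source, seqno)
        # At the root (the sender) this is where the retransmission would be
        # multicast back down the tree.


sender = TreeNode("sender")
router = TreeNode("router", parent=sender)
leaves = [TreeNode(f"leaf{i}", parent=router) for i in range(5)]
for leaf in leaves:
    leaf.request_repair("S", 7)  # all five leaves lost packet 7
print(len(sender.handled))  # → 1
```

Five losses produce one request at the sender, and no receiver waits on a timer. A real design would still have to clear `pending` once the repair arrives and cope with lost requests or failed parents, which is exactly where the actual multicast implementation starts to matter.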
