Another PRCS/XDELTA question for Josh
Jonathan S. Shapiro (shap@eros-os.org)
Mon, 17 Apr 2000 22:39:04 -0400

Josh:

I want to make sure the server interface expresses enough information that XDELTA or XDFS can be applied later. At the moment, I simply have an interface of the form put(universal-name, byte buffer). I think this should probably be augmented with:

revise(old-entity-universal-name, new-universal-name, byte buffer)
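To make the question concrete, here is a minimal sketch of a store exposing put() and revise(). Everything in it is hypothetical: the Store class, the in-memory dictionary, and the copy/insert delta format are assumptions, and Python's difflib.SequenceMatcher stands in for XDELTA's delta generator. What it shows is that revise() hands the store a suggested base (old-entity-universal-name) that the store may use or ignore, while every version keeps its own universal name.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert instructions against base (stand-in for XDELTA)."""
    ops = []
    sm = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # literal bytes new in target
        # "delete": those base bytes are simply not copied
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from base plus copy/insert instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

class Store:
    """Hypothetical in-memory store: every version is a distinct named object."""

    def __init__(self):
        # name -> ("full", data) or ("delta", base_name, ops)
        self._objs = {}

    def put(self, name: str, data: bytes) -> None:
        self._objs[name] = ("full", data)

    def revise(self, old_name: str, new_name: str, data: bytes) -> None:
        # old_name is only a hint about a good delta base; no ordering is implied.
        if old_name in self._objs:
            ops = make_delta(self.get_expanded(old_name), data)
            self._objs[new_name] = ("delta", old_name, ops)
        else:
            self.put(new_name, data)  # unknown base: fall back to a full copy

    def get_expanded(self, name: str) -> bytes:
        """Always able to produce the complete (undelta'd) contents."""
        entry = self._objs[name]
        if entry[0] == "full":
            return entry[1]
        _, base_name, ops = entry
        return apply_delta(self.get_expanded(base_name), ops)
```

For example, after `s.put("v1", b"hello world")` and `s.revise("v1", "v2", b"hello brave world")`, the store holds v2 as a small delta against v1, yet `s.get_expanded("v2")` still returns the full bytes.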

If I use an interface in this style, does it provide enough information that XDELTA and your cool DB-based file system can get the job done?

The reason I ask is that your master's thesis appears to assume that files are sequenced objects (I may have failed to understand something here). This creates a problem in a distributed setting: two sites can independently create the "next" version of the same file, so sequence numbers cannot be assigned locally without global coordination. I have therefore adopted a design in which every object version is conceptually distinct, and delta management is an artifact of the store implementation. The API clearly needs to provide enough hints that the server can do something vaguely intelligent about creating deltas, but it must not introduce a serialization requirement. I can see no inherent reason why the "base" file of any given delta cannot be chosen arbitrarily. The most efficient delta is likely to come from a previous version of the same file, but there is no law of nature that says this will always be true.
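The point that a delta's base can be chosen arbitrarily can be illustrated with a small sketch, assuming the store can enumerate candidate objects by universal name. The cost function (the number of target bytes that cannot be copied from a candidate, computed with difflib) is only a rough proxy for real delta size, and choose_base() is a hypothetical name, not part of any existing API.

```python
from difflib import SequenceMatcher

def literal_cost(base: bytes, target: bytes) -> int:
    """Bytes of target that could NOT be copied from base (rough delta-size proxy)."""
    sm = SequenceMatcher(None, base, target, autojunk=False)
    matched = sum(block.size for block in sm.get_matching_blocks())
    return len(target) - matched

def choose_base(candidates: dict, target: bytes):
    """Pick any stored object (name -> bytes) as the delta base.

    Returns None when no candidate beats storing the target in full.
    Nothing here requires the base to be an earlier version of the same file.
    """
    best_name, best_cost = None, len(target)  # storing in full costs len(target)
    for name, data in candidates.items():
        cost = literal_cost(data, target)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name
```

A previous version of the same file will usually win this comparison, but if a new file is a near-copy of some unrelated object, that object is selected instead, and no sequence numbering is consulted.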

So: is the revise() interface above sufficient, or would integrating with your stuff require something more? If the latter, can you say what that might be?

Hmm. I'm also making a naive assumption, which is that the rsync algorithm does a pretty good job of minimizing traffic across a wire, provided the store's delta generator did a decent job. The exception is that I may not want to sync all of a repository, so at some point I need to obtain a complete file (a baseline) from which all of my deltas can be resolved.
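For reference, the heart of the rsync algorithm is a weak checksum that can be rolled forward one byte at a time, backed by a strong hash that confirms candidate matches. The following is a simplified sketch of that matching step, not rsync's actual code: the function names are hypothetical, MD5 from hashlib serves as the strong hash for convenience, and it only reports which fixed-size blocks of the old file reappear in the new one (it does not emit the literal data between matches).

```python
import hashlib

M = 1 << 16  # modulus for the two 16-bit halves of the weak checksum

def weak_checksum(block: bytes) -> tuple:
    """a = sum of bytes; b = position-weighted sum (first byte weighted len(block))."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple:
    """Slide the window one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def block_signatures(old: bytes, block_len: int) -> dict:
    """Receiver side: weak checksum -> [(strong hash, block index)] for each full block."""
    sigs = {}
    for idx in range(len(old) // block_len):
        block = old[idx * block_len:(idx + 1) * block_len]
        sigs.setdefault(weak_checksum(block), []).append(
            (hashlib.md5(block).digest(), idx))
    return sigs

def find_matches(new: bytes, sigs: dict, block_len: int) -> list:
    """Sender side: slide over new, returning (offset in new, old block index) pairs."""
    matches = []
    if len(new) < block_len:
        return matches
    k = 0
    a, b = weak_checksum(new[:block_len])
    while True:
        hit = None
        for strong, idx in sigs.get((a, b), []):
            # Weak checksums can collide, so confirm with the strong hash.
            if hashlib.md5(new[k:k + block_len]).digest() == strong:
                hit = idx
                break
        if hit is not None:
            matches.append((k, hit))
            k += block_len  # skip past the matched block
            if k + block_len > len(new):
                break
            a, b = weak_checksum(new[k:k + block_len])
        else:
            if k + block_len >= len(new):
                break  # no further byte to roll in
            a, b = roll(a, b, new[k], new[k + block_len], block_len)
            k += 1
    return matches
```

With old = b"AAAABBBBCCCC" and 4-byte blocks, find_matches(b"xBBBBCCCCy", ...) locates old blocks 1 and 2 at offsets 1 and 5 of the new data, so only the bytes "x" and "y" would need to cross the wire as literals. This is also why the baseline matters: the receiver can only compute block signatures over a file it already holds in full.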

I'm not currently worrying much about optimizing the wire protocol. The current protocol has:

get(universal-name) => byte-buffer

The server is free to ship the response as either a delta or a complete file. I will shortly be adding:

get-expanded(universal-name) => byte-buffer

This can be used by the client to obtain a baseline version (undelta'd) of a file. It is the responsibility of the store to ensure that it is always capable of generating a file in expanded form.
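On the client side, a get() response that may be either a delta or a complete file could be handled roughly as follows. This is a sketch under assumptions: the Full and Delta response types, the copy/insert delta format, and the get_expanded callback (standing in for the get-expanded() call) are all hypothetical. A baseline is fetched in expanded form only when the delta's base is not already held locally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

@dataclass
class Full:
    data: bytes

@dataclass
class Delta:
    base_name: str
    ops: List[Tuple]  # ("copy", start, end) against the base, or ("insert", literal_bytes)

Response = Union[Full, Delta]

def apply_delta(base: bytes, ops: List[Tuple]) -> bytes:
    """Rebuild contents from a base plus copy/insert instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

def materialize(resp: Response, cache: Dict[str, bytes],
                get_expanded: Callable[[str], bytes]) -> bytes:
    """Turn a get() response into file contents, fetching a baseline only when needed."""
    if isinstance(resp, Full):
        return resp.data
    base = cache.get(resp.base_name)
    if base is None:
        # Base not held locally: ask the server for it in expanded (undelta'd) form.
        base = get_expanded(resp.base_name)
        cache[resp.base_name] = base
    return apply_delta(base, resp.ops)
```

Because get-expanded() always returns a complete file, the client never has to chase a chain of deltas itself; it needs at most one extra round trip per missing baseline.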

Your design appears to want very much to support:

get-delta(from-universal-name, to-universal-name)

I certainly see why this is useful, but I don't plan to leverage it in the first version of the DCMS wire protocol. The algorithms for deciding how to leverage a get-delta() function are non-trivial, and here again my goal is to get something that works first. I am inclined to suspect that get-delta() is a second-order optimization on the wire transfer.

Reactions, critiques, and suggestions would be greatly appreciated, both in general and with respect to how well this will facilitate meshing with your work.

Jonathan