Re: "file type" support Jonathan S. Shapiro (shap@eros-os.org)
Wed, 12 Apr 2000 10:10:14 -0400

I have actually given this some thought. :-)

First, I agree that a text file should be checked out in a fashion appropriate to the client system. This means that newline conventions should be updated. It also means that for purposes of computing the hash there must be a canonical representation of text file newlines. The file need not actually be stored that way.

My initial thought was to have the user agent mark which files are text and convert all text files to UNIX conventions on checkin, and to client conventions on checkout, computing the hash in terms of the UNIX conventions. Generally, my experience is that files with line endings preserved arise only when dealing with non-native files (e.g. the info file for CD-ROM autorun that I built for the last SOSP, which needed to be in DOS format in spite of the fact that I was working on a UNIX system). My current view is that such files are best treated as binary files.

You definitely want to be able to override things.
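
A minimal sketch of that checkin/checkout rule, for illustration only. The function names are hypothetical, and SHA-256 is an assumption, since no hash function is named here. The `is_text` flag stands in for the user-supplied override.

```python
import hashlib


def to_canonical(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF (the UNIX convention)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_client(data: bytes, newline: bytes) -> bytes:
    """Convert canonical LF text to the client's newline convention."""
    return data.replace(b"\n", newline)


def content_hash(data: bytes, is_text: bool) -> str:
    """Hash text in canonical form; hash binary files byte for byte."""
    canonical = to_canonical(data) if is_text else data
    # SHA-256 is an assumption made for this sketch.
    return hashlib.sha256(canonical).hexdigest()
```

With this rule a text file hashes the same whether it was checked in from a DOS or a UNIX client, while a file marked binary (such as the autorun file mentioned above) keeps its bytes and its line endings.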

After some thought, though, I concluded that this view wasn't ideal. The notions of "text" and "binary" are useful human conventions (and ones we should support), but for the purposes of the CM system what we are really trying to do is specify the input transformer rule and the output transformer rule. That is, we are specifying a filter program. In the limit, I want to be able to write filter programs in Scheme and specify an arbitrary filter. This divorces keyword expansion from any particular CM system policy about keywords, for example. Under this model, if you change the transformer rule, you really are changing the content in the repository (even if the bytes are accidentally the same), so a new checkin is appropriate.
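
One way to picture the filter model, sketched in Python rather than Scheme. Everything named here (`Filter`, `text_filter`, `entity_id`, the use of SHA-256) is a hypothetical illustration, not the actual design. The point it shows is that the identity of an entity covers the transformer rule as well as the bytes.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

Transform = Callable[[bytes], bytes]


@dataclass(frozen=True)
class Filter:
    name: str            # name of the transformer rule
    checkin: Transform   # input transformer: working file -> repository form
    checkout: Transform  # output transformer: repository form -> working file


def identity(data: bytes) -> bytes:
    return data


def text_filter(client_newline: bytes) -> Filter:
    """Build the 't' filter for a client with the given newline convention."""
    return Filter(
        name="t",
        checkin=lambda d: d.replace(b"\r\n", b"\n").replace(b"\r", b"\n"),
        checkout=lambda d: d.replace(b"\n", client_newline),
    )


BINARY = Filter(name="b", checkin=identity, checkout=identity)


def entity_id(flt: Filter, working_bytes: bytes) -> str:
    """Hash the rule name together with the transformed content."""
    stored = flt.checkin(working_bytes)
    return hashlib.sha256(flt.name.encode() + b"\0" + stored).hexdigest()
```

Under these assumptions, re-designating a file from 'b' to 't' yields a new identity even when the stored bytes happen to be identical, which is why a new checkin is needed.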

The current configuration file captures:

(fs-name, true name, type,...)

where type is currently one of 't' or 'b', but will eventually become the name of a Scheme code entity.
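
A rough sketch of how such an entry could drive checkout. The field meanings in the comments, the `CHECKOUT` registry, and the DOS-style client are all assumptions made for illustration. The real configuration file has more fields than the three shown.

```python
from typing import Callable, Dict, NamedTuple


class Entry(NamedTuple):
    fs_name: str    # assumed: name of the file in the working tree
    true_name: str  # assumed: repository name of the content (opaque here)
    type: str       # 't' or 'b' today; later the name of a filter


def lf_to_crlf(data: bytes) -> bytes:
    return data.replace(b"\n", b"\r\n")


# Hypothetical registry mapping a type tag to its output transformer.
CHECKOUT: Dict[str, Callable[[bytes], bytes]] = {
    "t": lf_to_crlf,          # assumes a client that uses DOS newlines
    "b": lambda data: data,   # binary files pass through untouched
}


def checkout(entry: Entry, stored: bytes) -> bytes:
    """Apply the output transformer selected by the entry's type field."""
    try:
        transform = CHECKOUT[entry.type]
    except KeyError:
        raise ValueError(f"unknown file type {entry.type!r} for {entry.fs_name}")
    return transform(stored)
```

Moving from the fixed 't'/'b' tags to named filters would then only mean adding entries to the registry.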

shap



From: "Justin Mason" <jm@jmason.org>
To: "Bill Frantz" <frantz@netcom.com>
Cc: "Jonathan S. Shapiro" <shap@eros-os.org>; "DCMS List" <dcms-dev@eros-os.org>
Sent: Wednesday, April 12, 2000 7:27 AM
Subject: "file type" support

>
> Bill Frantz said:
>
> > Unfortunately, for source files we must face up to the line end issue. (I
> > deeply wish one system had won. Even the one I most dislike.) My
> > suggestion is to define a canonical line ending convention, and convert all
> > incoming source files to that convention.
> >
> > There may be a need to support binary files as well. An example might be
> > project files for some of the popular IDEs.
>
> Definitely! Also images, etc.
>
> Note that some projects may require so-called "text" files, but with the
> line-endings preserved; so being able to specify a file as being of type
> text or binary would be beneficial, without any CMS file-identification
> heuristics getting in the way. Being able to re-designate an existing file
> as being a different type would be good too, but I suppose a delete op
> followed by an add -t type op would be acceptable.
>
> There will probably be more required types of course, as this discussion
> goes on. I can think of one other useful one, used by Perforce to support
> RCS-style $Id$ tags; Perforce uses a default file-format type of "text"
> which will not interpolate these tags, and is faster as a result; then
> "ktext", which will, and is obviously slightly slower.
>
> It also uses a file type to represent if the file should be executable or
> not (which IMHO sounds like a workaround requiring another layer of
> metadata in their system ;), and a file type for a symlink (first time
> I've seen a CMS that allows check-ins of symbolic links!). Also, it has 2
> file types for binary files, one of which is stored compressed in the
> repository, the other uncompressed (for already-compressed binaries),
> which is a nice tweak.
>
> Their binary format does not handle deltas. The entire binary file for
> each rev is stored in the repository.
>
>
> BTW, on another issue: regarding encryption of repository files -- would
> it be possible to simply allow optional encryption of the on-disk
> representation? Then, if the synchronize/commit/label operations, and
> other ops that require network traffic, are able to be used over an ssh
> tunnel, you've got a reasonably secure system without too much work being
> involved, and you won't have crypto throughout the code, just in the file
> IO code.
>
> BBTW being able to run the network aspects over an ssh tunnel will greatly
> simplify the usability of it too IMHO.
>
> --j.
>
>
>