US Patent 5,301,316

Mark S. Miller markm@caplet.com
Sun, 18 Apr 1999 12:52:45 -0700


At 11:50 AM 4/18/99 , Charles Landau wrote:
>An obvious simplification is: Ask A to return an object C that refers to
>A but will only compare. (C is a weakened proxy for A.) Then ask B
>whether it is equivalent to the object for which C is a proxy. If B
>recognizes C and can open it up, it can do the comparison.

An important difference between their protocol and your simplification is
that theirs can do grant matching
http://www.erights.org/elib/capability/grant-matcher/index.html , while
still being an extensible form of equality.  However, in order for B to be
able to meaningfully say that it agrees with the choice of C, you still
need a primitive symmetric equality operation (like DISCRIM).  In your
simplification, B can simply lie.
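
A minimal sketch of the point about lying, under assumptions: the class
names, the `matches` method, and `same_object` are all hypothetical and
are not taken from the patent, from KeyKOS, or from any real capability
system.  `same_object` only stands in for the kind of trusted symmetric
equality primitive mentioned above.

```python
class CompareOnlyProxy:
    """Hypothetical weakened proxy C: refers to A but can only compare."""

    def __init__(self, target):
        self._target = target

    def is_proxy_for(self, other):
        return self._target is other


class HonestB:
    def matches(self, proxy):
        # B opens the proxy and performs the real comparison.
        return proxy.is_proxy_for(self)


class LyingB:
    def matches(self, proxy):
        # B never consults the proxy; it simply claims to be equivalent.
        return True


def same_object(x, y):
    """Assumed trusted primitive: neither x nor y gets to run code here."""
    return x is y


a = HonestB()
c = CompareOnlyProxy(a)
liar = LyingB()

print(a.matches(c))          # True, and the claim is genuine
print(liar.matches(c))       # True, but the claim is false
print(same_object(a, liar))  # False: the primitive is not fooled
```

The asker cannot tell the first two answers apart, because in both cases
the answer comes from code that B controls.  Only the third check runs
outside B's control.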

What's Muse?


	Cheers,
	--MarkM


From: Eric S. Raymond <esr@snark.thyrsus.com>
To: <linux-kernel@vger.rutgers.edu>
Subject: Some very thought-provoking ideas about OS architecture.
Date: Sunday, June 20, 1999 12:59 AM

(Please copy any replies to me explicitly, as I'm not presently subscribed
to the linux-kernel list -- it's not practical when I'm spending so
much time on the road.)

Gents and ladies, I believe I may have seen what comes after
Unix. Not a half-step like Plan 9, but an advance in OS architecture
as fundamental as Multics or Unix was in its day.

As an old Unix hand myself, I don't make this claim lightly; I've
been wrestling with it for a couple of weeks now.  Nor am I suggesting
we ought to drop what we're doing and hare off in a new direction.
What I am suggesting is that Linus and the other kernel architects
should be taking a hard look at this stuff and thinking about it.  It
may take a while for all the implications to sink in.  They're huge.

What comes after Unix will, I now believe, probably resemble at least
in concept an experimental operating system called EROS.  Full details
are available at <http://www.eros-os.org/>, but for the impatient I'll
review the high points here.

EROS is built around two fundamental and intertwined ideas.  One is
that all data and code persistence is handled directly by the OS.
There is no file system.  Yes, I said *no file system*.  Instead, 
everything is structures built in virtual memory and checkpointed out
to disk every so often (every five minutes in EROS).  Want something?
Chase a pointer to it; EROS memory management does the rest.
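
A rough user-space imitation of the checkpointing idea, for illustration
only.  The class name `CheckpointedStore`, the file layout, and the use
of `pickle` are assumptions of this sketch; as described above, the real
mechanism lives in the OS's memory management, not in a library.

```python
import os
import pickle


class CheckpointedStore:
    """Toy store: state lives in memory and is saved as a whole snapshot."""

    def __init__(self, path):
        self.path = path
        self.root = {}
        # "Reboot": pick up whatever the last completed checkpoint holds.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.root = pickle.load(f)

    def checkpoint(self):
        # Write the snapshot to a side file, then rename it into place so
        # a crash mid-write leaves the previous checkpoint untouched.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.root, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

Application code just mutates `store.root`; nothing in it opens or writes
files.  Anything changed after the most recent `checkpoint()` call is
lost if the process dies, which is the trade-off the periodic interval
implies.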

The second fundamental idea is that of a pure capability architecture
with provably correct security.  This is something like ACLs, except
that an OS with ACLs on a file system has a hole in it; programs can
communicate (in ways intended or unintended) through the file system
that everybody shares access to.

Capabilities plus checkpointing is a combination that turns out to
have huge synergies.  Obviously programming is a lot simpler -- no
more hours and hours spent writing persistence/pickling/marshalling
code.  The OS kernel is a lot simpler too; I can't find the figure to
be sure, but I believe EROS's is supposed to clock in at about 50K of code.

Here's another: All disk I/O is huge sequential BLTs done as part of
checkpoint operations.  You can actually use close to 100% of your
controller's bandwidth, as opposed to the 30%-50% typical for
explicit-I/O operating systems that are doing seeks a lot of the time.
This means the maximum I/O throughput the OS can handle effectively
more than doubles.  With simpler code.  You could even afford the time
to verify each checkpoint write...
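
The "more than doubles" figure can be checked with a little arithmetic.
This takes the 30%-50% and "close to 100%" utilization numbers above at
face value; they are the message's estimates, not measurements.

```python
def throughput_gain(old_utilization, new_utilization=1.0):
    """Ratio of usable bandwidth after vs. before, on the same controller."""
    return new_utilization / old_utilization


for old in (0.30, 0.50):
    print(f"{old:.0%} -> 100%: {throughput_gain(old):.2f}x")
# 30% -> 100%: 3.33x
# 50% -> 100%: 2.00x
```

So the gain is a bit over 3x at the low end of the range and 2x at the
high end, slightly less in practice since utilization would be close to,
not exactly, 100%.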

Here's a third: Had a crash or power-out?  On reboot, the system
simply picks up pointers to the last checkpointed state.  Your OS, and
all your applications, are back in thirty seconds.  No fscks, ever
again!

And I haven't even talked about the advantages of capabilities over
userids yet.  I would, but I just realized I'm running out of time --
gotta get ready to fly to Seattle tomorrow to upset some stomachs
at Microsoft.

www.eros-os.org.  Eric sez check it out.  Mind-blowing stuff once
you've had a few days to digest it.
-- 
		<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

The Bible is not my book, and Christianity is not my religion.  I could never
give assent to the long, complicated statements of Christian dogma.
	-- Abraham Lincoln

From: Eric S. Raymond <esr@snark.thyrsus.com>
To: <alan@lxorguk.ukuu.org.uk>
Cc: <linux-kernel@vger.rutgers.edu>; <jsshapiro@earthlink.net>; <markm@foresight.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: Sunday, June 20, 1999 8:41 AM

(Apologies for losing the thread ID.  Alan's mail to me bounced.)

Alan Cox writes:
> > EROS is built around two fundamental and intertwined ideas.  One is
> > that all data and code persistence is handled directly by the OS.
> > There is no file system.  Yes, I said *no file system*.  Instead,
> > everything is structures built in virtual memory and checkpointed out
> > to disk every so often (every five minutes in EROS).  Want something?
> > Chase a pointer to it; EROS memory management does the rest.
> 
> This is actually an old idea. The problem that has never been solved well is
> recovery from errors. You lose 1% of your object store. How do you tidy up.
> 20% of your object store dies in a disk crash, how do you run an fscobject
> tool. You can do it, but you end up back with file system complexity and
> all the other fs stuff.

Accepting your analysis, it still seems to me there's a difference,
though.  In an EROS-like world, you would only pay the complexity cost of doing
fscobject-like things in the postmortem analyzer that's trying to
stitch together the remaining pieces.  You wouldn't have to pay that same
cost in the kernel for each and every access to persistent stuff;  no
namespace management to worry about.

So, yes.  An EROS-like architecture has the same error-recovery
problem that fsck addresses. But it appears to me, at least as far as
I've taken the logic, that that problem would be better contained than
in a Unix-like system.

> Another peril is that external interfaces don't always like replay of events.

A much more serious objection, I agree.

> You still end up with a lot of your objects having checkpoint/restart aware
> methods.

Yes, I grant that's true.  (The way I'd put it is that you still need
something like commit/rollback in database-land.)  But this is a solvable
problem.  Butler Lampson showed years ago how to do provably correct
serialization of access to shared critical regions with timestamps
even in the absence of reliable locks.  So as long as your
hypothetical user can't futz with the system clock...
 
> Moving just some objects between systems is fun too. You then get into
> cluster checkpointing, which is a field that requires you wear a pointy hat,
> have a beard and work for SGI or Digital.

Not something I have opinions about -- or am qualified to. :-)

> Their numbers are for a microkernelish core. They are still very good, but
> that includes basically no drivers, no network stack, no graphics and apparently
> no real checkpoint/restart in the face of corruption. I may be wrong on the
> last item.

You're probably right; I'm told all EROS actually does at this point
is run its own debugging and benchmarking tools.  Still, the fact that
the test kernel can be that small is IMO an argument that the design
is sound.
 
> The nature of I/O is no different. If you always do large sequential
> block writes tell me how it will outperform a conventional OS if only
> a small number of changes in a small number of objects occur.

No seeks to read inodes, because the map from EROS's virtual-space
blocks to disk blocks is almost trivial (essentially the disks get
treated like a honkin' big series of swap volumes).  So the disk
access pattern would be quite different, I think.
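
To make the contrast concrete, here is a toy comparison of the two
lookup styles.  Both data layouts are invented for this sketch (the
range table, the nested-dict "file system", and the read counting); they
are not EROS's or any real file system's on-disk format.

```python
def swap_style_lookup(ranges, vblock):
    """Map a virtual block to (disk, block) through a small range table.

    ranges: list of (first_vblock, count, disk, first_disk_block).
    The table is small enough to stay in memory, so no disk reads occur.
    """
    for first, count, disk, base in ranges:
        if first <= vblock < first + count:
            return disk, base + (vblock - first)
    raise KeyError(vblock)


def fs_style_lookup(root, path, block_index):
    """Walk a path; count one directory read and one inode read per step."""
    metadata_reads = 0
    node = root
    for name in path.strip("/").split("/"):
        metadata_reads += 1          # read the directory's entries
        node = node["entries"][name]
        metadata_reads += 1          # read the child's inode
    return node["blocks"][block_index], metadata_reads


ranges = [(0, 1000, "disk0", 64), (1000, 1000, "disk1", 64)]
print(swap_style_lookup(ranges, 1005))             # ('disk1', 69)

root = {"entries": {"home": {"entries": {"mail": {"blocks": [907, 908]}}}}}
print(fs_style_lookup(root, "/home/mail", 1))      # (908, 4)
```

The sketch only holds while objects keep a fixed place in the virtual
space; it says nothing about what happens once they grow or are freed.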
 
> Object stores are great models for some applications, that's why libraries
> for doing persistent object stores in application space exist (eg texas)
> 
> Another way to look at this
> 
>                 File System                     Object Store
> 
> Index           Inode Number                    Object ID
> Update          Look in directory               Look in an object
>                 Find item                       Find item location
>                 Write(maybe COW)                Write(maybe COW)
> Page In         Look in directory               Look in an object
>                 Find item                       Find item location
>                 Write(maybe COW)                Write(maybe COW)
> Granularity     User controlled                 Enforced by OS
> 
> 
> So if I promise to call my inodes object ids, call the directory structure
> "objects" and I have a checkpointing scheme  - what is the great new concept.

That, under most circumstances, you don't have to manage persistence
yourself (or to put it more concretely, no explicit disk I/O in most
applications).  That's clearly a huge win, even if you end up having to do 
more conventional-looking things in applications that require
commit/rollback.

And it's not clear to me that you do end up there; with one single
added atomic-flush primitive, I think you could use Lampson's
timestamp trickery to do reliable journalling without having to go all
the way to fs-like namespace management.

> o       I don't think the object model is the good stuff

Even if you're right...
 
> o       The security model is very very interesting indeed.

...this is still very true.

> o       They are making it hard to help them however.

This is indeed true.  However, I may have some leverage on a win-win
solution.  But that's a topic for another day.

What I'm thinking is this: remember RT-Linux?  Suppose the kernel were
a process running over an EROS-like layer...
--- 
		<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

To make inexpensive guns impossible to get is to say that you're
putting a money test on getting a gun.  It's racism in its worst form.
        -- Roy Innis, president of the Congress of Racial Equality (CORE), 1988

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


From: Alan Cox <alan@lxorguk.ukuu.org.uk>
To: Eric S. Raymond <esr@snark.thyrsus.com>
Cc: <alan@lxorguk.ukuu.org.uk>; <linux-kernel@vger.rutgers.edu>; <jsshapiro@earthlink.net>; <markm@foresight.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: Sunday, June 20, 1999 9:20 AM

> Accepting your analysis, it still seems to me there's a difference,
> though.  In an EROS-like world, you would only pay the complexity cost of doing
> fscobject-like things in the postmortem analyzer that's trying to
> stitch together the remaining pieces.  You wouldn't have to pay that same

That depends on if you want to consistency check your object store after a
crash. Unless you journal the object store - which btw is hard. If you have
two thousand inter-related objects you need to dump the set of them 
consistently and in a snapshotted state.

Funnily enough "object system" and file system journalling are identical.

> is run its own debugging and benchmarking tools.  Still, the fact that
> the test kernel can be that small is IMO an argument that the design
> is sound.

V7 Unix is smaller than that. System III is about 60K on a 68010 CPU.

> > block writes tell me how it will outperform a conventional OS if only
> > a small number of changes in a small number of objects occur.
> 
> No seeks to read inodes, because the map from EROS's virtual-space
> blocks to disk blocks is almost trivial (essentially the disks get
> treated like a honkin' big series of swap volumes).  So the disk
> access pattern would be quite different, I think.

Not really. The mapping of objects to disks is complex the moment you allow
an object to expand or want to reclaim space. Again it is the same problems 

1.	You need an indexing object - in a filesystem it is a directory or
	directory tree
2.	You need to position objects/files for locality
3.	You need fragmentation resistant algorithms

There are arguments about how you provide this - ranging from conventional
cylinder groups through log structured file systems to stuff like reiserfs
where the fs is viewed much more in a database fashion.

It doesn't matter if your email is an object or a file or a database record,
they are the same thing. Your handle just changes name.

> That, under most circumstances, you don't have to manage persistence
> yourself (or to put it more concretely, no explicit disk I/O in most
> applications).  That's clearly a huge win, even if you end up having to do 
> more conventional-looking things in applications that require
> commit/rollback.

Why is having persistence managed by a library that is playing guessing games
of your intent a good idea ? It has to know about object relationships, 
potentially it has to blindly snapshot the entire system. It has to do a lot
of work to know in detail what has changed.

Again this is identical to conventional file system problems. If you've ever
tried to send someone a file not knowing what other files it happens to include
or need, you've met the object relationship problems.

So all you have to do is export every object that this object refers to. Like
the windowing environment, whoops oh dear. I've worked with persistent object
stores of sorts. AberMUD5 is a multiuser game that is basically a persistent
object store. The "how do I split something off from the rest" problem isn't
fun. You can do most of it by making a logical copy of the object store
marking the objects you want as needed and garbage collecting the rest. However
even if you then are smart about some of the object linkages you still find
large numbers of objects that won't go away and whose connectivity and relation
to the stuff you want is very complex even to visualise.
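
The mark-and-collect approach described above can be sketched as a
reachability walk.  The object names and the dict-of-lists store are
made up for illustration and have nothing to do with AberMUD5's actual
data structures.

```python
from collections import deque


def reachable(store, roots):
    """Return every object reachable from roots by following references."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for ref in store[obj]:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


# Hypothetical store: each object lists the objects it refers to.
store = {
    "letter": ["font", "window"],
    "font": [],
    "window": ["display"],
    "display": ["window", "other_app"],
    "other_app": ["other_doc"],
    "other_doc": [],
    "unrelated": [],
}

kept = reachable(store, ["letter"])
collected = set(store) - kept
print(sorted(kept))
print(sorted(collected))   # ['unrelated']
```

Asking for just "letter" keeps six of the seven objects, because one
reference into the windowing side pulls in everything behind it.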

> What I'm thinking is this: remember RT-Linux?  Suppose the kernel were
> a process running over an EROS-like layer...

Suppose Eros was just a set of persistent object libraries that ran on
top of numerous other platforms too, could be downloaded off the net and worked
pretty well within the limits of the "programmer lazy, do more work than
needed" paradigm.

	ftp://ftp.cs.utexas.edu/pub/garbage/texas/README

And that is demonstrably the right way up. If you put a "lazy programmer"
system at the bottom of an environment you prevent the smart programmer doing
smart things. If your bottom layer is fundamentally ignorant of programmer
provided clues you cripple the smart.

Alan



From: Eric S. Raymond <esr@thyrsus.com>
To: <alan@lxorguk.ukuu.org.uk>
Cc: <linux-kernel@vger.rutgers.edu>; <jsshapiro@earthlink.net>; <markm@foresight.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: Sunday, June 20, 1999 10:06 AM

Alan:
> That depends on if you want to consistency check your object store after a
> crash. Unless you journal the object store - which btw is hard. If you have
> two thousand inter-related objects you need to dump the set of them 
> consistently and in a snapshotted state.

I've been thinking about this since your last post. Seems to me the
primitive one needs is the ability to say "This object and all its
dependents need to be written atomically".  Not too hard to imagine
how to do that given you already have enough of a VM system to do
copy-on-write.  OK, you end up having to allocate in two different
spaces, one with atomicity constraints and one without. But it's
solvable.  (See below on why this doesn't mean you end up journaling
everything).
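
One way to picture such a primitive, as a sketch under assumptions: the
`ShadowStore` class and its `write_group` method are hypothetical, and
this models copy-on-write with an in-memory table swap rather than
anything EROS actually provides.

```python
class ShadowStore:
    """Toy store where a group of updates becomes visible all at once."""

    def __init__(self, objects):
        self._committed = dict(objects)

    def read(self, name):
        return self._committed[name]

    def write_group(self, updates, fail_midway=False):
        # Copy-on-write: work on a shadow table, never on the live one.
        shadow = dict(self._committed)
        for i, (name, value) in enumerate(updates.items()):
            if fail_midway and i == 1:
                raise RuntimeError("simulated crash during group write")
            shadow[name] = value
        # Commit point: a single reference swap publishes every update.
        self._committed = shadow


store = ShadowStore({"account_a": 100, "account_b": 0})
store.write_group({"account_a": 50, "account_b": 50})

try:
    store.write_group({"account_a": 0, "account_b": 100}, fail_midway=True)
except RuntimeError:
    pass

# The interrupted group left no partial update behind.
print(store.read("account_a"), store.read("account_b"))   # 50 50
```

Objects outside a group never touch the shadow path, which matches the
idea of two allocation spaces, only one of which pays for atomicity.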

> Why is having persistence managed by a library that is playing guessing games
> of your intent a good idea ? It has to know about object relationships, 
> potentially it has to blindly snapshot the entire system. It has to do a lot
> of work to know in detail what has changed.

For the *exact same* reasons that automatic memory management with
garbage collection is preferable to slinging your own buffers.  Perl
and Python and Tcl are on the rise because, outside the kernel, accepting
all that complexity and the potential for buffer overruns just doesn't
make any damn sense with clocks and memory as cheap as they are now.

Remember, the name of the game in OS design is really to optimize for
least complexity overhead for the *application programmer* and *user*.
If this means accepting a marginally more complex and less efficient
OS substructure (like the difference between a journaled object store
and a file system with explicit I/O) then that's fine.  But in fact I
think Shapiro makes strong arguments that an object store, done
properly, is *more* efficient.

> So all you have to do is export every object that this object refers to. Like
> the windowing environment, whoops oh dear.

Now you know it's not that bad in practice.  Not all object references are
pointers.  Some are capabilities and cookies that are persistent without
prearrangement.  That's especially likely to be true of OS services, and 
especially if you design your API with that in mind.

> Suppose Eros was just a set of persistent object libraries that ran on
> top of numerous other platforms too, could be downloaded off the net and worked
> pretty well within the limits of the "programmer lazy, do more work than
> needed" paradigm.
> 
> 	ftp://ftp.cs.utexas.edu/pub/garbage/texas/README
> 
> And that is demonstrably the right way up. If you put a "lazy programmer"
> system at the bottom of an environment you prevent the smart programmer doing
> smart things. If your bottom layer is fundamentally ignorant of programmer
> provided clues you cripple the smart.

If that's true, why is Perl a success?

That's not intended to be a snarky question.  Your argument here is
essentially the argument for malloc(3) as opposed to unlimited-extent
types and garbage collection.  And the answer is the same: there comes
a point where the value of the optimization you can do with hints no
longer pays for the complexity overhead of having to do the storage
management yourself.

The EROS papers implicitly argue that we've reached that point not
just in memory management but with respect to the entire persistence 
problem.  I'm inclined to agree with them.

At the very least, it's something that I think we'd all be better off
doing a little forward thinking about.  As I said at the beginning of
the thread, I'm not after changing the whole architecture of Linux
right away; that would be silly and futile.  But this exchange will
have achieved my purposes if it only plants a few conceptual seeds.
-- 
		<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

The right of self-defense is the first law of nature: in most
governments it has been the study of rulers to confine this right
within the narrowest limits possible.  Wherever standing armies
are kept up, and when the right of the people to keep and bear
arms is, under any color or pretext whatsoever, prohibited,
liberty, if not already annihilated, is on the brink of
destruction.
	-- Henry St. George Tucker (in Blackstone's Commentaries)

From: Eric S. Raymond <esr@thyrsus.com>
To: <alan@lxorguk.ukuu.org.uk>
Cc: <linux-kernel@vger.rutgers.edu>; <jsshapiro@earthlink.net>; <markm@foresight.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: Sunday, June 20, 1999 10:11 AM

Alan:
> That depends on if you want to consistency check your object store after a
> crash. Unless you journal the object store - which btw is hard. If you have
> two thousand inter-related objects you need to dump the set of them 
> consistently and in a snapshotted state.

I've been thinking about this since your last post. Seems to me the
primitive one needs is the ability to say "This object and all its
dependents need to be written atomically".  Not too hard to imagine
how to do that given you already have enough of a VM system to do
copy-on-write.  OK, you end up having to allocate in two different
spaces, one with atomicity constraints and one without. But it's
solvable.  (See below on why this doesn't mean you end up journaling
everything).

> Why is having persistence managed by a library that is playing guessing games
> of your intent a good idea ? It has to know about object relationships, 
> potentially it has to blindly snapshot the entire system. It has to do a lot
> of work to know in detail what has changed.

For the *exact same* reasons that automatic memory management with
garbage collection is preferable to slinging your own buffers.  Perl
and Python and Tcl are on the rise because, outside the kernel, accepting
all that complexity and the potential for buffer overruns just doesn't
make any damn sense with clocks and memory as cheap as they are now.

Remember, the name of the game in OS design is really to optimize for
least complexity overhead for the *application programmer* and *user*.
If this means accepting a marginally more complex and less efficient
OS substructure (like the difference between a journaled object store
and a file system with explicit I/O) then that's fine.  But in fact I
think Shapiro makes strong arguments that an object store, done
properly, is *more* efficient.

> So all you have to do is export every object that this object refers to. Like
> the windowing environment, whoops oh dear.

Now you know it's not that bad in practice.  Not all object references are
pointers.  Some are capabilities and cookies that are persistent without
prearrangement.  That's especially likely to be true of OS services, and 
especially if you design your API with that in mind.
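
To make the cookie idea concrete, here is a rough sketch using the
persistent_id and persistent_load hooks of Python's pickle module. The
WindowService class and the SERVICES registry are assumptions invented for
the example; the point is only that a reference to a service can be saved as
a small name and re-bound on load, without exporting the service itself.

```python
import io
import pickle

class WindowService:
    """Stand-in for a long-lived OS service that must not be serialized."""
    def __init__(self, name):
        self.name = name

# Hypothetical registry mapping stable cookies to live services.
SERVICES = {"window-system": WindowService("window-system")}

class CookiePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Save a service as a cookie instead of pickling its state.
        if isinstance(obj, WindowService):
            return ("service", obj.name)
        return None  # everything else is pickled normally

class CookieUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Re-bind the cookie to whatever service is live at load time.
        kind, name = pid
        if kind == "service":
            return SERVICES[name]
        raise pickle.UnpicklingError(f"unknown persistent id: {pid!r}")

def save(obj):
    buf = io.BytesIO()
    CookiePickler(buf).dump(obj)
    return buf.getvalue()

def load(data):
    return CookieUnpickler(io.BytesIO(data)).load()

document = {"title": "notes", "display": SERVICES["window-system"]}
restored = load(save(document))
```

After the round trip, restored["display"] is the live service object from the
registry rather than a copy of it.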

> Suppose Eros was just a set of persistent object libraries that ran on
> top of numerous other platforms too, could be downloaded off the net and
> pretty well within the limits of the "programmer lazy, do more work than
> worked needed" paradigm.
> 
> 	ftp://ftp.cs.utexas.edu/pub/garbage/texas/README
> 
> And that is demonstrably the right way up. If you put a "lazy programmer"
> system at the bottom of an environment you prevent the smart programmer doing
> smart things. If your bottom layer is fundamentally ignorant of programmer
> provided clues you cripple the smart.

If that's true, why is Perl a success?

That's not intended to be a snarky question.  Your argument here is
essentially the argument for malloc(3) as opposed to unlimited-extent
types and garbage collection.  And the answer is the same: there comes
a point where the value of the optimization you can do with hints no
longer pays for the complexity overhead of having to do the storage
management yourself.

The EROS papers implicitly argue that we've reached that point not
just in memory management but with respect to the entire persistence 
problem.  I'm inclined to agree with them.

At the very least, it's something that I think we'd all be better off
doing a little forward thinking about.  As I said at the beginning of
the thread, I'm not after changing the whole architecture of Linux
right away; that would be silly and futile.  But this exchange will
have achieved my purposes if it only plants a few conceptual seeds.
-- 
		<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

"The right of self-defense is the first law of nature: in most
governments it has been the study of rulers to confine this right
within the narrowest limits possible.  Wherever standing armies
are kept up, and when the right of the people to keep and bear
arms is, under any color or pretext whatsoever, prohibited,
liberty, if not already annihilated, is on the brink of
destruction." 
	-- Henry St. George Tucker (in Blackstone's Commentaries)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


From: Alan Cox <alan@lxorguk.ukuu.org.uk>
To: Eric S. Raymond <esr@thyrsus.com>
Cc: <alan@lxorguk.ukuu.org.uk>; <linux-kernel@vger.rutgers.edu>; <jsshapiro@earthlink.net>; <markm@foresight.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: Sunday, June 20, 1999 12:49 PM

> For the *exact same* reasons that automatic memory management with
> garbage collection is preferable to slinging your own buffers.  Perl
> and Python and Tcl are on the rise because, outside the kernel, accepting
> all that complexity and the potential for buffer overruns just doesn't
> make any damn sense with clocks and memory as cheap as they are now.

But would you write your kernel in Perl - no. That's the issue.

> Remember, the name of the game in OS design is really to optimize for
> least complexity overhead for the *application programmer* and *user*.

If you want Windows, yes, maybe. The role of the OS core is

o	To make the common very efficient
o	To make the uncommon possible and if possible efficient

Nothing, nothing, nothing about making life cute for Visual Basic programmers.

If you put automatic garbage collection in the kernel for example then every
application is forced to pay the cost of this, even things like webservers
using hand tuned static buffers for speed and cache optimisation. 

If you shovel it into libraries and tools your web server goes like smoke
and you can still write python apps.

> and a file system with explicit I/O) then that's fine.  But in fact I
> think Shapiro makes strong arguments that an object store, done
> properly, is *more* efficient.

I don't think so. Nobody has ever demonstrated a working object store based
OS handling a real evil job load. When Shapiro can run innd under a full load
faster than a conventional OS can, and can network share the article objects
to 16 front end servers, then I'll be more convinced.

> > And that is demonstrably the right way up. If you put a "lazy programmer"
> > system at the bottom of an environment you prevent the smart programmer doing
> > smart things. If your bottom layer is fundamentally ignorant of programmer
> > provided clues you cripple the smart.
> 
> If that's true, why is Perl a success?

Because Perl doesn't cripple the smart. People routinely rewrite Perl programs
into C when they become performance issues. Most of the time the performance
issue is programming rate, not program execution rate. If the claims of the EROS
people were correct the Corel Java office would not have failed.

> types and garbage collection.  And the answer is the same: there comes
> a point where the value of the optimization you can do with hints no
> longer pays for the complexity overhead of having to do the storage
> management yourself.

You miss the most important point of this.

The situation you are describing does *not* occur on a global system space
scale. Performance critical applications are written in high performance
languages. Other stuff is often written in tools like Python.

The "conventional" model of memory allocation requires address space and page
allocation services. Page protection services help. The object based model
needs address and page allocation services and really wants page protection
services.

So they share a common underlying set of needs. Should the OS provide the
conventional model, the object model, or the underlying service both need? The
answer is obvious, and it's precisely what Unix does supply.
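
A toy illustration of that layering, under stated assumptions: PageService
stands in for the kernel's address space and page allocation service, using
an anonymous mmap, and both a malloc-style allocator and a small object store
are built on it as library code. All three class names are hypothetical, and
page protection is left out to keep the sketch short.

```python
import mmap

PAGE = mmap.PAGESIZE

class PageService:
    """Stand-in for the OS: hands out page-aligned regions of one mapping."""
    def __init__(self, pages):
        self.mem = mmap.mmap(-1, pages * PAGE)  # anonymous mapping
        self.pages = pages
        self.next_page = 0

    def alloc_pages(self, n):
        if self.next_page + n > self.pages:
            raise MemoryError("out of pages")
        start = self.next_page * PAGE
        self.next_page += n
        return start  # byte offset of the first page handed out

class BumpAllocator:
    """'Conventional' model: the caller manages raw byte ranges itself."""
    def __init__(self, service, pages=1):
        self.mem = service.mem
        self.base = service.alloc_pages(pages)
        self.limit = self.base + pages * PAGE
        self.top = self.base

    def malloc(self, size):
        if self.top + size > self.limit:
            raise MemoryError("allocator region exhausted")
        addr = self.top
        self.top += size
        return addr

class ObjectStore:
    """Object model: named byte strings, placement hidden from the caller."""
    def __init__(self, service, pages=1):
        self.alloc = BumpAllocator(service, pages)
        self.index = {}

    def put(self, name, data):
        addr = self.alloc.malloc(len(data))
        self.alloc.mem[addr:addr + len(data)] = data
        self.index[name] = (addr, len(data))

    def get(self, name):
        addr, n = self.index[name]
        return bytes(self.alloc.mem[addr:addr + n])
```

Both clients draw pages from the same service, so a program that wants
hand-managed buffers never touches the object store code.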

Alan

