[cap-talk] Secure Restart or Trusted Recovery?
Jed Donnelley
capability at webstart.com
Thu Jan 4 20:33:47 CST 2007
At 05:35 PM 1/4/2007, Valerio Bellizzomi wrote:
>On 04/01/2007, at 19.52, Jonathan S. Shapiro wrote:
>
> >On Fri, 2007-01-05 at 01:11 +0100, Valerio Bellizzomi wrote:
> >> On 04/01/2007, at 9.34, Jonathan S. Shapiro wrote:
> >>
> >> >On Thu, 2007-01-04 at 10:54 +0100, Neal H. Walfield wrote:
> >> >> Is what you describe as secure restart essentially trusted recovery?
> >> >> Is there any reason for the term rotation?
> >> >
> >> >I believe the two terms mean the same thing. Some of the KeyKOS
> >> >terminology came out of the IBM world, which never shared a lexicon
> >> >with the rest of the world. Not sure if this is an example or not.
> >>
> >> Are we talking about "whole system+application restart" or only "kernel
> >> restart" ?
> >
> >Definitely whole system. Whether applications are recovered or restarted
> >depends on the system, of course...
>
>Of course :)
>In my understanding "recovery" comes after kernel restart, but this may
>not be the case after a disk crash.
>I think the term "recovery" is too general; there are a variety of
>ways of doing a recovery. The term "restart" is more appropriate in this
>case.
I'm hesitant to get involved in this Restart/Recovery thread given my heavy
involvement in some other threads, but I'll mention some terminology
we used for our NLTSS system (from memory ;-):
1. Cold start - build only an initial set of system processes as if
after a base system build to the disk. Fairly safe, but of course even
the code for the kernel and the base system processes could have
been corrupted on disk.
2. Warm start - recover all processes that have no components
in memory. Process state is checked for minimal consistency
(e.g. memory mapping, etc. when pulled in from disk - this happens
in any case). Processes that had contents in memory are faulted.
Processes whose state was entirely on the disk can begin to run.
This "recovery" might be considered to do its best to check
and recover from process state on disk.
3. Hot start - a system initializer paws through what's in
memory (overlaying lower memory where the kernel code
resides), checking for consistency with process state that's
on disk and tries to write out the memory content to the appropriate
process states on disk. Any processes successfully recovered
can begin to run right away as well as those whose state was entirely
on rotating storage.
This "recovery" can be considered to do its best to check
and recover from process state on disk and from process
state in memory.
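
To make the distinction between the three start modes concrete, here is a
rough sketch in Python. It is not NLTSS code; every name in it (StartMode,
Process, plan_restart, and the action strings) is made up for illustration.
It also assumes two things the description above does not state: that a
process whose on-disk state fails the minimal consistency check is faulted,
and that on a hot start a process whose memory content cannot be reconciled
with its disk state is faulted just as it would be on a warm start.

```python
from dataclasses import dataclass
from enum import Enum


class StartMode(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


@dataclass
class Process:
    name: str
    base_system: bool               # one of the initial system processes
    had_memory_state: bool          # some components were in memory at the crash
    disk_state_ok: bool             # on-disk state passes the minimal consistency check
    memory_state_ok: bool = False   # memory content is consistent with the disk state


def plan_restart(mode, processes):
    """Return {process name: action} for the given start mode.

    Actions (hypothetical labels): 'rebuild', 'not_recovered', 'run', 'fault'.
    """
    plan = {}
    for p in processes:
        if mode is StartMode.COLD:
            # Cold start: only the initial set of system processes is built.
            plan[p.name] = "rebuild" if p.base_system else "not_recovered"
        elif not p.disk_state_ok:
            # Assumption: failing the minimal consistency check faults the process.
            plan[p.name] = "fault"
        elif not p.had_memory_state:
            # State was entirely on disk: the process can begin to run.
            plan[p.name] = "run"
        elif mode is StartMode.HOT and p.memory_state_ok:
            # Hot start: memory content is written back to the process state on disk.
            plan[p.name] = "run"
        else:
            # Warm start, or a hot start that could not reconcile memory with disk.
            plan[p.name] = "fault"
    return plan
```

For example, a process that had components in memory at the time of the
crash comes out as "fault" under StartMode.WARM, and as "run" under
StartMode.HOT only if its memory content checks out against its disk state.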
I don't expect these terms to be necessarily helpful or
relevant, but I throw them out just in case. Naturally there are
risks with any such recovery mechanisms. If, for example,
an incorrect read from disk overwrote part of a process'
memory resident address space before a crash and then
the system was Hot started, that data would show up,
inappropriately, within that running process' memory space.
Of course the risk is the same from such a horrific event
(which we never saw to my knowledge) if the system doesn't
recognize such a problem and crash. Such problems can also
spray data to disk with serious consequences.
Our system crashes in those days were pretty evenly split
between inconsistencies recognized by code for situations
that we believed couldn't happen but that did, and legitimately
should, happen, and situations that we believed couldn't happen but that
did happen because of a hardware failure. One of the most
famous of the latter was the frustratingly random "Bits from
outer space" problem:
http://www.computer-history.info/Page5.dir/pages/Chronicles.dir/images/bits-from-space.gif
which I believe later picked up the moniker "12th Bit Problem":
http://www.computer-history.info/Page5.dir/pages/Chronicles.dir/images/12th-bit.gif
that we pinned on a failing disk controller that was passing all
the hardware checks that were available from software.
I believe there's really very little that can be done in such situations
except isolate the problem and fix it. Our users wanted us to do our best to
recover their running processes even in the face of the obvious dangers
from such problems. We obliged. I find it interesting that even our
users were willing to give up these warm and hot start facilities when
Unix came along. There was a little grumbling, but what had been considered
vital one year (< 1994) was gone the next (>= 1995) in light of COTS value.
--Jed http://www.webstart.com/jed/