Hardware for EROS?
Mark S. Miller
Tue, 11 Apr 2000 11:33:30 -0700
At 09:10 AM 4/11/00, Jonathan S. Shapiro wrote:
> > This technology is based on *volatile* memory chips (unlike the one that
> > Constantine pointed out.) While it has a battery backup, an extended
> > outage means you lose everything.
>Still, using it for the checkpoint log is probably worth looking into.
I agree. The military seems to be interested in the exotic technology
because they want instant-on. I don't care about instant-on, but for
distributed computing I'm desperate for low-latency stable storage so that
I can do instant commit. Then I could afford to have machine X not release
message M until the state from which M was computed had been committed.
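If a disk seek is the cost of a commit, then many realistic distributed
computing scenarios cannot wait that long before releasing a message. But
if the message is released early and the state from which it was computed
then fails to be committed (say, because the machine crashes and rolls back
to its last checkpoint), other machines have acted on a message from a state
that no longer exists, and we've got an "interesting" (in the sense of the
Chinese curse) problem to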
solve. (For obscure reasons, the E project calls this problem "hangover
inconsistency".) There are solutions -- Jonathan has a good one -- but it
would be great to get rid of the problem.
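To make the release rule concrete, here is a minimal Python sketch of holding
outbound messages until the checkpoint covering the state that produced them
is stable. The class and method names (`OutputCommitBuffer`, `begin_commit`,
`commit_done`) are hypothetical and not part of EROS or E, and "epoch" here
simply means the interval between checkpoints. The delay between `send` and
the matching `commit_done` is exactly the commit latency discussed above: a
disk seek today, much less with low-latency stable storage.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OutputCommitBuffer:
    """Hypothetical sketch: hold outbound messages until the state that
    computed them is known to be committed to stable storage."""
    current_epoch: int = 0       # epoch of the state being computed now
    committed_epoch: int = -1    # highest epoch known to be stable
    pending: deque = field(default_factory=deque)  # (epoch, message) pairs
    released: list = field(default_factory=list)   # messages actually sent

    def send(self, message):
        # Tag the message with the epoch whose state produced it; do not release yet.
        self.pending.append((self.current_epoch, message))

    def begin_commit(self):
        # Freeze the current epoch for committing; later sends belong to the next one.
        frozen = self.current_epoch
        self.current_epoch += 1
        return frozen

    def commit_done(self, epoch):
        # The state for `epoch` is now stable: release every message it produced.
        self.committed_epoch = max(self.committed_epoch, epoch)
        while self.pending and self.pending[0][0] <= self.committed_epoch:
            _, message = self.pending.popleft()
            self.released.append(message)


buf = OutputCommitBuffer()
buf.send("M1")
e = buf.begin_commit()
buf.send("M2")          # computed from state after the snapshot
buf.commit_done(e)
print(buf.released)     # ['M1']; M2 waits for the next commit
```

If the machine crashes before `commit_done`, nothing in `pending` has left
the machine, so rolling back cannot create hangover inconsistency; the price
is that every message waits for a commit.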
If all we're interested in is low-latency stable storage, the obvious answer
to the battery-exhaustion problem is to have a disk and enough battery to
run it as well, and to use the battery-backed RAM only as a holding area
until the data is safely on disk. It would seem that a vanilla EROS system
plus a UPS sufficient to keep things running through the next two commits
(i.e., until the next checkpoint is stable) could validly be considered an
EROS that never needs to roll back because of a power outage.
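A minimal Python sketch of the holding-area idea, assuming a simple
append-only log of records: a commit completes as soon as the record is in
battery-backed RAM, and a later flush copies it to disk and frees the slot.
`NvramStagedLog` and its methods are hypothetical names for illustration, and
the in-memory dicts merely stand in for real NVRAM and disk. On a power
failure, the battery or UPS only has to last long enough for `drain()` to
finish.

```python
from collections import OrderedDict


class NvramStagedLog:
    """Hypothetical sketch: battery-backed RAM as a low-latency holding area
    in front of a slower disk."""

    def __init__(self, capacity):
        self.capacity = capacity     # number of records the NVRAM can hold
        self.nvram = OrderedDict()   # seq -> record, oldest first
        self.disk = {}               # seq -> record, the durable copy
        self.next_seq = 0

    def commit(self, record):
        # Fast path is a RAM write; if NVRAM is full, flush the oldest
        # record to disk first (the slow, disk-latency path).
        if len(self.nvram) >= self.capacity:
            self.flush_one()
        seq = self.next_seq
        self.next_seq += 1
        self.nvram[seq] = record
        return seq  # stable as long as the battery holds

    def flush_one(self):
        # Copy the oldest staged record to disk, then release its NVRAM slot.
        seq, record = self.nvram.popitem(last=False)
        self.disk[seq] = record

    def drain(self):
        # What the battery/UPS must power during an outage: flush everything.
        while self.nvram:
            self.flush_one()

    def recover(self):
        # After restart: whatever reached disk plus whatever NVRAM still holds.
        merged = dict(self.disk)
        merged.update(self.nvram)
        return [merged[s] for s in sorted(merged)]


log = NvramStagedLog(capacity=2)
for record in ["a", "b", "c"]:
    log.commit(record)      # the third commit forces one synchronous flush
print(log.recover())        # ['a', 'b', 'c']
```

The caller sees RAM latency except when the holding area is full, at which
point commits fall back to disk speed; sizing the NVRAM against the write rate
decides how often that happens.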
Does EROS check invariants before taking a checkpoint? If so, is an
invariant-test failure another possible source of rollback? To where (the
last RAM checkpoint or the last disk checkpoint)? What other sources of
possible rollback are there?
Is it worth thinking separately about low-latency journaling?