Persistence and transactions shapj@us.ibm.com
Mon, 27 Sep 1999 09:23:06 -0400

One of the challenges with capability systems comes from storing them. If you allow objects containing capabilities to be written to disk, then you get into a bunch of update ordering issues. The general rule is that you want the object to get written before any of its capabilities get written, but in a capability system there can be circular write dependencies. This is one of the reasons that EROS uses transparent systemwide persistence.

However, persistence seems to generate confusion (in humans, often including me) about how things like transactions work. The other day somebody asked me to explain that to them and I botched it horribly. This note is an attempt to get it right.

I. TRANSACTIONS IN NON-PERSISTENT SYSTEMS

Before proceeding to a discussion of transactions in persistent systems, I want to draw attention to three things that all correct transaction clients must deal with:

  1. Network Failure
  2. Commit Agreement Failure
  3. No Blind Updates

I want to describe each in turn and what happens when they occur. Then I'll get to persistent systems.

1. Network Failure

Routers may go down at any time. Therefore, at any point where there is an interaction between transaction client and transaction server -- up to and including the commit -- either client or server may get the error "your network connection has died". Generally, connection keepalives are used in this situation to allow both sides to do an orderly timeout and independently abort the connection.

The point is that a correct transaction client must be programmed to allow for the possibility of a network failure.

2. Commit Agreement Failure

I'm sure there is a technical term for this, and I don't know what it is. The issue is that in any commit architecture, it is possible for a failure to occur in such a way that (a) the commit has succeeded, and (b) the client isn't told. Consider, for example, the following two-phase commit scenario:

     Client         Server         Phone Company
     Prepare
     to commit
               Ready to
               Commit

     Commit
                         Cuts line with backhoe

               Committed

     ????

Don't think this doesn't happen -- it's happened to me.

This is really just a special case of case (1). What is interesting about it is that it occurs *after* the commit. It is unknown to the client whether the commit succeeded or failed.
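The practical consequence is that a commit call has three outcomes from the client's side, not two. Here is a minimal sketch of that idea in Python. Everything in it (FlakyServer, try_commit, the Outcome names) is made up for illustration and is not any real database API; the "server" is an in-memory stand-in whose line can be cut after it commits.

```python
import enum


class Outcome(enum.Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    UNKNOWN = "unknown"  # connection died before the reply arrived


class FlakyServer:
    """Hypothetical in-memory server whose link can be cut after a commit."""

    def __init__(self, cut_after_commit=False):
        self.cut_after_commit = cut_after_commit
        self.committed = []

    def commit(self, txn_id):
        self.committed.append(txn_id)  # the commit really happens...
        if self.cut_after_commit:
            raise ConnectionError("line cut")  # ...but the reply is lost
        return True


def try_commit(server, txn_id):
    """Report what the client can actually know about the commit."""
    try:
        ok = server.commit(txn_id)
    except ConnectionError:
        # Neither success nor failure can be assumed here.
        return Outcome.UNKNOWN
    return Outcome.COMMITTED if ok else Outcome.ABORTED
```

A correct client has to have a code path for Outcome.UNKNOWN, typically reconnecting and checking whether the work was done before deciding to redo it.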

3. No Blind Updates

The following code logic is NOT correct:

     begin transaction
          check if some update has occurred
     end transaction
     ....
     begin transaction
          do update
     end transaction

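The problem is that someone else might perform the update between the two transactions, so the result of the check is stale by the time the update runs. The check and the update must happen inside the same transaction.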

The one case where blind update is arguably "safe" is the case where you are doing an initial data load of a dataset. Even there, humans forget and do the process more than once, and I'd argue that the first step of the transaction ought to be a check to see if the table already exists.
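For concreteness, here is a small sketch of the correct shape, with the check and the update inside one transaction. It uses Python's sqlite3 module only because it is easy to run; the schema (accounts, applied) and the helper names are assumptions invented for the example.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    # isolation_level=None puts the sqlite3 module in autocommit mode, so
    # the only transactions are the ones we BEGIN explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE applied (job_id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    return conn


def apply_update_once(conn, job_id, name, amount):
    """Check and update inside ONE transaction. Return True if applied."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before checking
    try:
        row = conn.execute(
            "SELECT 1 FROM applied WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, name),
            )
            conn.execute("INSERT INTO applied (job_id) VALUES (?)", (job_id,))
        conn.execute("COMMIT")
        return row is None
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Calling apply_update_once twice with the same job_id applies the update once; the second call sees the record left by the first and does nothing.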

II. PERSISTENCE AND TRANSACTIONS: THE SERVER PERSPECTIVE

From the server perspective, persistence doesn't introduce any complications, but rollback does. The problem is that commitments must not be undone if the machine crashes. The bad sequence is

     take checkpoint
     begin transaction A
     end transaction A (committing)
     crash

After the failure, the commit of transaction A must not be lost.

EROS solves this problem by building an exception into the checkpoint mechanism called journalling. This allows a database to say "this here page must come back, even if a failure occurs." When the database system does a commit, it first writes the necessary information into a write-ahead log, journals those pages, and then announces that the commit has occurred. On restart, any modifications in the write-ahead log are re-applied, so transactions are not lost by the server.
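The ordering can be sketched with a toy model. This is not EROS code: the Database class, its fields, and crash_and_restart are hypothetical names, the checkpoint is simulated with a deep copy, and the journalled log is simply a list that the simulated crash does not touch. For simplicity the log is never truncated, which is harmless here because replaying an assignment twice gives the same result.

```python
import copy


class Database:
    """Toy server: `tables` is checkpointed state, `wal` is journalled."""

    def __init__(self):
        self.tables = {}       # reverts to the last checkpoint on a crash
        self.wal = []          # journalled pages: come back after a crash
        self._checkpoint = {}  # snapshot of `tables` at the last checkpoint

    def take_checkpoint(self):
        self._checkpoint = copy.deepcopy(self.tables)

    def commit(self, updates):
        self.wal.append(dict(updates))  # 1. write-ahead log, journalled
        self.tables.update(updates)     # 2. apply to ordinary state
        return "committed"              # 3. only now announce the commit

    def crash_and_restart(self):
        # Ordinary state rolls back to the checkpoint...
        self.tables = copy.deepcopy(self._checkpoint)
        # ...and the journalled log is re-applied in order.
        for entry in self.wal:
            self.tables.update(entry)
```

Running the bad sequence from above (checkpoint, commit, crash) against this model leaves the committed values in place after the restart.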

III. PERSISTENCE AND TRANSACTIONS: THE CLIENT PERSPECTIVE

I'll go through the cases in a minute, but the crucial thing to remember is that network connections are NOT included in the persistence contract. The upshot of this is that the client will see what appears to be a network failure when the system recovers from a checkpoint.

Here are the scenarios and consequences as seen by the client:

     Client                   System
                              Take Checkpoint A
       (failures here are not seen by the client)
     Open server connection
       (failures after the connection manifest as network failures)
     Begin transaction
                              Take Checkpoint B
       (failures after ckpt B still manifest as network failures)
     Commit
       (rollback to Ckpt B will see network failure, but commit has
        definitely occurred. See Note 1)
                              Take Checkpoint C
       (failures from here will see that the network failed after
        the commit)
     Close Connection

Note 1: This case must already be handled by correct clients, per failures (1) and (2) described above.

The only remaining problem is the situation where a client has some sort of work queue of pending requests, and sits in a loop of the form

     while (more in work queue)
          grab next job
          begin transaction
               handle job
          end transaction

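In this situation, items in the work queue might be processed more than once if a checkpoint and a later crash occur as follows. On recovery the client resumes from the checkpoint, with the job in hand and no memory of the commit, and handles it again: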

     while (more in work queue)
          grab next job
          TAKE_CHECKPOINT()
          begin transaction
               handle job
          end transaction
          SYSTEM_CRASH()

There is, however, an easy fix to this, which is to *simulate* a network failure. What you do is build a transaction into the initial connection to the database, and have both sides count the successful commits. You then alter the begin_transaction() code to send the number of commits that the client believes have occurred in the current session. If the server does not agree, it immediately aborts the transaction.
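Here is a rough sketch of the counting scheme, again as a toy model rather than a real protocol. Server, Client, and SessionMismatch are invented names, the server is assumed to be remote and so is not rolled back, and the client's rollback to a checkpoint is simulated by restoring a deep copy.

```python
import copy


class SessionMismatch(Exception):
    """Client and server disagree on the session's commit count."""


class Server:
    def __init__(self):
        self.commits = 0   # successful commits in the current session
        self.handled = []  # committed work; not rolled back with the client

    def begin_transaction(self, client_commits):
        # Abort immediately if the client's view of the session is stale.
        if client_commits != self.commits:
            raise SessionMismatch(
                f"client says {client_commits}, server says {self.commits}"
            )

    def commit(self, job):
        self.handled.append(job)
        self.commits += 1


class Client:
    def __init__(self, jobs):
        self.queue = list(jobs)
        self.commits = 0  # commits the client believes have occurred

    def run_one(self, server):
        job = self.queue[0]
        server.begin_transaction(self.commits)
        server.commit(job)
        self.commits += 1
        self.queue.pop(0)


# Simulate: checkpoint the client, commit one job, crash, recover.
server = Server()
client = Client(["a", "b"])
snapshot = copy.deepcopy(client)   # TAKE_CHECKPOINT()
client.run_one(server)             # job "a" commits on the server
client = copy.deepcopy(snapshot)   # SYSTEM_CRASH() and recovery
try:
    client.run_one(server)         # stale count: server refuses
    replay_refused = False
except SessionMismatch:
    replay_refused = True
```

The recovered client gets an abort instead of silently redoing job "a", and from there it is in the same position as after a network failure.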

Where the remote database does not directly support such a protocol, it can be synthesized on the client side using a database front end that is built on the logging mechanism.

Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595