DataComm bug Bill Frantz (frantz@communities.com)
Thu, 10 Dec 1998 17:06:16 -0800

I was getting the DataComm test programs in shape to ship out to MarkM for distribution and I discovered a significant bug in suspend/resume. (Bit rot strikes again. This test used to work.) This bug also "turned on the light bulb" about a bug which existed in the old Electric Communities ECHabitats release 167 (R167) from almost a year ago. Since at least MarkM and I have been baffled by that bug, I thought I would write it up for the list.

The symptoms of both bugs are the same. RecvThread reports a "Checksum mismatch" when trying to verify the Message Authentication Code (MAC) on a message. This error causes both the received and calculated MACs to be printed. Careful examination shows that they differ by one bit, usually in byte 6 (counting from 0).
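
For anyone who wants to do that comparison mechanically rather than by eye, here is a minimal sketch. The helper name bit_differences and the two MAC values are made up for illustration; they are not taken from DataComm or from a real trace.

```python
def bit_differences(received: bytes, calculated: bytes):
    """Return (byte_index, bit_index) for every bit where the two MACs differ.

    byte_index counts from 0; bit_index 0 is the least significant bit.
    """
    if len(received) != len(calculated):
        raise ValueError("MACs must have the same length")
    diffs = []
    for i, (a, b) in enumerate(zip(received, calculated)):
        x = a ^ b  # set bits mark the positions that differ in this byte
        for bit in range(8):
            if x & (1 << bit):
                diffs.append((i, bit))
    return diffs


# Made-up values for illustration only: they differ in one bit of byte 6.
received_mac = bytes.fromhex("0123456789abcdef0123")
calculated_mac = bytes.fromhex("0123456789abddef0123")

print(bit_differences(received_mac, calculated_mac))
```

A single entry in the result is the signature described above; a long list would point at some other kind of corruption.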

This bug occurs only when encryption is in use. (With authentication alone, many more bits of the MAC would differ.) What is happening in the current case is that when the connection is suspended, DataComm saves the receive-side message sequence number one count short of that of the last message. After the connection is resumed, the send side uses an encryption IV which is one higher than the one the receive side uses. This results in a one-bit error in the received data (which always falls in the MAC). (The error propagation characteristics of Cipher Block Chaining are such that the rest of the blocks are not affected.)

In E, this bug reproduces reliably with the current test case. I believe that in R167 the bug occurred only if one side actually sent a message after the suspend, and that happened only because of an obscure timing race.