[cap-talk] use of hashcodes?

David Barbour dmbarbour at gmail.com
Sat Feb 20 21:55:38 PST 2010


I'm not fond of signatures / hash based security, for a few reasons:

First, I'm unconvinced of the protection hash signatures offer.

Read http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/

The attack described there isn't generally applicable. SHA-256 hasn't
been broken or weakened - not yet. And HMAC is effectively a secret
key scheme: one prepends some secret content to the document before
hashing it; one isn't just signing the hash, and one double-hashes
(again with the key) to reduce probability of using the hash to break
the key. HMAC can work even with broken hash functions. But, still,
the linked document was enough to make me question the viability of
hashes as a security mechanism.

Second, and more importantly, I don't believe hash semantics are very
good for performance - not given a proper environment, and not on any
scale (small or large). Both hashes and encryption can be used for
authentication, yet the fundamental reason to use hashes over regular
encryption is performance related. If the performance benefits of
hashes can be defeated in a proper environment, then perhaps it would
be better to set up said proper environment than it would be to
continue using of hashes.

The relevant semantics for asymmetric encryption is captured by E
style Seal/Unseal pairs, with the semantics that for arbitrary message
M, M==Unseal(Seal(M)).
* Seal(M) is usually not comparable. That is, Seal(M)==Seal(M) is
undefined and naive (bitwise) comparison may easily yield False; this
allows injection of some random data to help protect the Seal
function.
* Necessarily, Seal(M) must have an equal information content to M,
and therefore must be of a size equal to or greater than the
compressed size of M.

The relevant semantics for asymmetric hashes are captured by
Stamp/Verify pairs, with the semantics that for arbitrary message M,
Verify(M,Stamp(M)) is True.
* Stamp(M) is not necessarily comparable. That is, Stamp(M) ==
Stamp(M) is undefined and (if compared bitwise) might return false due
to different stamping contexts.
* Note that one does NOT promise that M1 != M2 implies
Verify(M1,Stamp(M2)) is False. However, a secure Stamp makes it
cryptographically expensive to produce such collisions.
* Use of hashes allows Stamp(M) to be much smaller than M.

It is trivial to produce a Stamp/Verify pair from a Sealer/Unsealer pair:
* Stamp = Seal
* Verify = function(Msg,Sig) { return (Msg == Unseal(Sig)) }

However, with Seal/Unseal, one doesn't actually need the Verify
function; the message can usually be determined authentic simply by
the fact that Unseal could extract something sensibly typed from the
Sealed message.

Symmetric approaches are similar to the asymmetric designs, except
that Stamp/Verify or Seal/Unseal must be distributed as a pair, can't
be separated from one another, which effectively limits how they are
used. OTOH, they tend to be a bit cheaper. But the asymmetric case is
inherently more general than the symmetric case.

Hashes presumably offer a performance benefit because Stamp(M) can be
much smaller than Seal(M) for sufficiently large values of M. Of
course, to authenticate a message with a stamp, one must send (M,
Stamp(M)), whereas to authenticate a message with a seal one must send
only Seal(M). So in the single-auth case, the communications cost are
about equal. But if three different agencies wish to authenticate a
single message, one could send (M,[Stamp1(M),Stamp2(M),Stamp3(M)]),
which in the communications environments of today is likely much
cheaper than [Seal1(M),Seal2(M),Seal3(M)] for a sufficiently large
'M'.

A relevant question is: in which environments is a sealer-based
authentication cheaper than equivalently-secure hash-based
authentication?

Well, assume an environment with code-distribution of various forms,
where sometimes sender and receiver happen to be on the same host, and
sometimes they happen to be on different hosts. Having senders and
receivers on the same host is the 'small scale' variation of
authentication. If they are on different hosts, that's 'large scale'.

In E language if the seal and unseal behaviors both occur on the same
host (same vat), one can logically eliminate the overhead of actually
encrypting anything - taking advantage of the fact that
Unseal(Seal(M))==M is guaranteed. Presumably, one could even treat
Seal and Unseal as creating static phantom types, such that the
small-scale authentication is proven at compile-time without any
runtime overhead. One could do something similar for
Verify(M,Stamp(M)), and this would often be the case. However, what
about Verify(M1,Stamp(M2)) where M1 != M2? With a sealer-based stamp,
knowing that M1 != M2 would be sufficient to guarantee that
Verify(M1,Stamp(M2)) would be false - one can leverage this for
partial-evaluations and a number of other local optimizations, and one
can guarantee the semantics will be the same whether the seal was
performed remotely or locally. With hash-based authentication,
however, you must either give up identical semantics between remote
and local users OR actually perform the hash operation to know whether
Verify(M1,Stamp(M2)) will be true or false.  So, at the small scale,
it seems that Seal/Unseal has at least a marginal advantage over
Stamp/Verify, simply due to better mathematical properties for
optimization.

But we're also interested in large scale, distributed communications.

At the large scale, I believe that Seal/Unseal can easily do just as
well as hash-based approaches. I noted in passing earlier that Seal(M)
must be of a size greater or equal to the "compressed size" of M. But
what sort of compression is allowed?

In English, we can communicate a lot of information by using Names
that reference mutually accessible information resources, and one
might call this 'name-based' compression. Rather than sending a large
blob of data, one sends a mutually accessible reference - a capability
- that identifies the large blob of data. Then one can start
communicating using names-for-values rather than the values
themselves. If certain names become common in the protocol - used
often - one essentially has developed new words: one can talk of
applying MPEG-4 encoding to a video stream... without actually
communicating the entire MPEG-4 algorithm. Names can be used to
compress common values repeatedly, multiplying the savings in
communications. Ensuring mutual access to the value can be as trivial
as holding it on your own machine (which is secure, but might not
survive destruction of the node), or can be secure and redundant
distributed across a cloud in small chunks that require accessing N
different hosts to fetch the whole thing (similar to Tahoe).
Relevantly, names can naturally grow from belonging to just a pair of
hosts, to becoming distributed and turned into true distributed
capabilities.

Seal(NameForM) may, reasonably, be much smaller than Seal(M), for a
sufficiently large M. And thus
[Seal1(NameForM),Seal2(NameForM),Seal3(NameForM)] isn't all that much
more expensive than [Stamp1(M),Stamp2(M),Stamp3(M)], though producing
names - especially secure names for more than pairwise use between
hosts - isn't 'cheap' and so the absolute cost may be a bit larger. In
both cases, the actual value for M need only be communicated to a
particular host just once, then it may be cached.

The name-based protocols are useful for a number of other designs,
such as secure names for streaming or lazily produced data (allowing
intermediate processes to actually redirect and relocate the streams
without seeing the data). So it is something that we'd likely want
even without the benefits for 'compressing' Sealed messages. But for
use in the wild, we'll need capability-secure protocols to become more
common... I'm thinking that some combination of Pluribus and Tapestry
or Chord would be nice. And those protocols will benefit from
first-class support for several 'types' of capabilities, including:
* named-immutable-value caps, used for name-based compression,
cursors, lazy value streaming, et. al; one can leverage the
'immutability' a great deal.
* object caps
* single-use caps (suitable for replies), allowing cheaper proxies
where desired, perhaps with option recommending true secure exchange
protocols (cap passes between trust-domains is renamed) for exclusive
access
* reactive data resource caps (protocol know caching and multi-cast)
* simple event resource caps (protocol keeps a brief history, also
knows multi-cast)
* each in pair-wise (cheap incremental numbers) and distributed (large
random number) variations, with the ability to transparently escalate
a pairwise cap to a distributed one
* * any encrypted cap must be escalated to a distributed one unless
the unsealer is only owned by one host.

Since I'm convinced we'll want named value caps regardless - even in
the absence of encryption - it hardly hurts to leverage them to
achieve cheap authentication of messages!

Intermediate to large and small scale is the 'trust domain' - multiple
hosts that are separate from one another, but are trusted.
Interestingly, using Sealer/Unsealers, one can produce a 'trusted
domain' automatically:
* I Seal a message and hand it to you.
* You realize that you're consuming the message locally (not
forwarding it to another party) and ask to be part of a trusted domain
wrgt. a particular Unsealer and Capability.
* I Seal a challenge for proof that you have a particular Unsealer.
* You answer the challenge.
* I stop Sealing messages to you on a particular Capability, since you
can unseal them anyway. (The communications stream might still be
encrypted, though, separate from the Sealer/Unsealer, as part of the
host-to-host protocol.)
* Thus we avoid some encryption costs, without loss of security.

Thus, we can also reduce Sealer/Unsealer costs even between hosts or
vats, should our protocols become sufficiently complex to allow it.

Costs aside, though, it's the small-scale semantics-vs.-performance
issues of Verify(M1,Stamp(M2)) that really convinced me to shy away
from hashes. The possibility of including such issues in my language
semantics made me feel dreadfully unclean, even apart from the whole
issue of 'hashing' capabilities that might later need to be renamed.
The protocol approaches I describe above were my TRIZ-based solution
to avoid compromise and tradeoffs between clean semantics and
performance.


More information about the cap-talk mailing list