[cap-talk] FW: x.509 -- MD5 considered harmful today
zooko
zooko at zooko.com
Wed Dec 31 11:52:25 EST 2008
On Dec 31, 2008, at 6:34 AM, Toby Murray wrote:
> The real upshot of this is that phasing out crypto algorithms is hard;
Certainly.
> but crypto algorithms are broken overnight, often without warning.
Waitaminute -- Dobbertin found collisions in the MD5 compression
function in 1996 (and already by then there was reason to suspect
that such feats against MD5 might be possible).
My reading of history suggests that cryptographers and engineers then
gradually drifted apart over the next dozen years, with
cryptographers deciding that MD5 was too weak and going on to invent
and analyze SHA-1, then deciding that SHA-1 was too weak and going on
to invent and analyze SHA-2, and currently they are deciding that
SHA-2 is of uncertain reliability and they are engaged in inventing
and analyzing SHA-3. Meanwhile, engineers seem to have mostly
settled into using either MD5, or SHA-1, or Tiger.
Do you have some other examples of sudden, catastrophic breaks of
crypto algorithms that were formerly considered safe by cryptographers?
Not coincidentally, I posted a note about this to the SHA-3
discussion list this week. I'll append my note to this message.
> Another point to take home is that the entire HTTPS / PKI
> infrastructure is only as strong as the weakest Certificate Authority.
This is a good point.
> The only real assumption to make here is, given how easily (in
> relative terms) they pulled this off, that someone else with bigger
> pockets and a stronger incentive to do so must have already done it.
That's interesting! But if anybody had done *this* then we would
have probably found out about it -- this particular attack leaves
undeniable evidence lying around.
I imagine that groups such as nation-state "cyber war" agencies,
wealthy criminals, etc. may have plenty of tools like these all
primed and ready but they aren't using them because once they are
revealed then people will build defenses against them.
Regards,
Zooko
------- begin appended message that I posted to the SHA-3 discussion
group
From: zooko at zooko.com
To: multiple recipients of list
Subject: will SHA-3 replace the current standard secure hash
algorithm -- MD5?
Date: December 22, 2008 15:02:40 PM MST
Folks:
Below, I re-post a letter that I wrote to this list last February. I
think that this letter, which some of you may not have seen, casts
light on why we have different assumptions about valid security/
performance trade-offs. It is because secure hashes are used today
for more than just their original purpose.
Since I wrote this letter, I did some more snooping around, and I
learned that the situation is even more extreme than I thought -- for
some areas of endeavour, MD5 is actually the standard secure hash
algorithm in 2008.
I chatted with a couple of friends who are information security
consultants -- they get paid big bucks by household-name corporations
to audit source code and systems for security flaws. I asked them
what kinds of secure hash functions they see used in the wild. They
answered that MD5 was the most common, occasionally SHA-1, in large
part because it is a default value on the Java Cryptography
Extensions, and they have never seen any other secure hash functions
in client systems.
I chatted with a friend who works at the Internet Archive -- all
files stored at the Internet Archive are identified by their MD5 hashes.
I noticed that there was a new release of the Haskell compiler GHC.
One of the new features is that it uses MD5 to identify code modules.
I learned more about the "computer forensics" field. MD5 appears to
be the standard mechanism to identify files in that field. I read
discussion forums in which computer forensics practitioners asked
each other whether the cryptographic attacks on MD5 that they had
heard about meant that they needed to change their practice. The
consensus seemed to be that they could continue using MD5 for now.
Finally, I was intrigued to see that NIST, of all organizations, uses
and recommends the use of MD5 (in addition to SHA-1), as part of its
"National Software Reference Library", which supports digital
forensics. This document explaining why NIST believes that this is
safe is fascinating:
http://www.nsrl.nist.gov/Documents/analysis/draft-060530.pdf
The wide gap between the performance needs of using a secure hash
function for public key cryptography versus using it for bulk data
identification and integrity checking (which is what I use it for at
my day job), makes me wonder whether SHA-3 should include variants or
officially recommended tuning parameters so that people identifying
large files can use a SHA-3 which is at least as fast as MD5 or
Tiger, while people who are signing thousand-year documents can use a
SHA-3 which is more expensive but safer. (By the way, I tend to
think that HMAC shouldn't be weighted heavily as a use case for SHA-3
simply because people should stop using HMAC and start using
Carter-Wegman MACs such as Poly1305 or VMAC instead.)
Regards,
Zooko O'Whielacronx
------- begin appended message re-post
From: zooko
Date: February 5, 2008 12:15:53 PM MST
To: Multiple recipients of list
Subject: bulk data use cases -- SHA-256 is too slow
Folks:
Cryptographic hash functions were invented for hashing small variable-
length strings, such as human-readable text documents, public keys,
or certificates, into tiny fixed-length strings in order to sign
them. When considering such usage, the inputs to the hash function
are short -- often only hundreds or thousands of bytes, rarely as
much as a million bytes. Also, the computational cost of the hash
function is likely to be swamped by the computational cost of the
public key operation.
Later, hash functions were pressed into service in MACs as
exemplified by HMAC. In that usage, the inputs to the hash function
tend to be small -- typically hundreds of bytes in a network packet.
Also, the network is often the limiting factor on performance, in
which case the time to compute the MAC is not the performance
bottleneck.
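For concreteness, here is a minimal sketch of that MAC use case in
Python, assuming only the standard hmac and hashlib modules. The
function names, the key, and the packet contents are hypothetical
placeholders chosen for illustration; SHA-256 is picked arbitrarily as
the underlying hash.

```python
import hashlib
import hmac


def tag_packet(key: bytes, packet: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over a (typically small) packet."""
    return hmac.new(key, packet, hashlib.sha256).digest()


def verify_packet(key: bytes, packet: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_packet(key, packet), tag)


if __name__ == "__main__":
    key = b"hypothetical shared secret"   # placeholder, not a real key
    packet = b"a few hundred bytes of payload would go here"
    tag = tag_packet(key, packet)
    print(verify_packet(key, packet, tag))
```

The input here is a few dozen bytes, so the hash cost is negligible
next to a network round trip, which is the point of the paragraph
above.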
I would like to draw your attention to another way that cryptographic
hash functions have been pressed into service -- as core security
mechanisms in a myriad of bulk data systems. Examples include local
filesystems (e.g. ZFS [1]), decentralized filesystems (e.g. a project
that I hack on: allmydata.org [2]), p2p file-sharing tools (e.g.
BitTorrent [3], Bitzi [4]), decentralized revision control tools
(e.g. monotone [5], git [6], mercurial [7], darcs [8]), intrusion
detection systems (e.g. Samhain [9]), and software package tools
(e.g. Microsoft CLR strong names [10], Python setuptools [11], Debian
control files [12], Ubuntu system-integrity-check [13]).
Commonly in this third category of uses the size of the data being
hashed can be large -- millions, billions or even trillions of bytes
at once -- and there is no public key operation or network delay to
hide the cost of the hash function. The hash function typically sits
squarely on the critical path of certain operations, and the speed of
the hash function is the limiting factor for the speed of those
operations.
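To get a rough feel for this on one's own machine, a minimal sketch in
Python follows. It assumes only the standard hashlib and time modules;
the function names, the 16 MiB total, and the 64 KiB chunk size are
illustrative choices, not taken from any of the systems cited here.
The figures it reports depend entirely on the hardware and on how the
underlying hash library was built, and some locked-down builds may
refuse to provide MD5 at all.

```python
import hashlib
import time


def hash_stream(chunks, algorithm="sha256"):
    """Hash an iterable of byte chunks incrementally; return the hex digest."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def throughput_mib_per_s(algorithm, total_mib=16, chunk_kib=64):
    """Time hashing total_mib MiB of zero bytes fed in chunk_kib KiB chunks."""
    chunk = bytes(chunk_kib * 1024)
    n_chunks = (total_mib * 1024) // chunk_kib
    start = time.perf_counter()
    hash_stream((chunk for _ in range(n_chunks)), algorithm)
    elapsed = time.perf_counter() - start
    return total_mib / elapsed


if __name__ == "__main__":
    # Compare the functions discussed in this thread on in-memory data,
    # so that neither disk nor network hides the cost of the hash.
    for name in ("md5", "sha1", "sha256"):
        print(f"{name}: {throughput_mib_per_s(name):.0f} MiB/s")
```

Feeding the data in chunks mirrors how a bulk data tool would hash a
file too large to hold in memory; the digest is the same as hashing
the whole input at once.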
Something else common about these applications is that the designers
are cryptographically unsophisticated, compared to designers in the
earlier two use cases. It is not uncommon within those communities
for the designers to believe that hash collisions are not a problem
as long as second pre-image attacks are impossible, or to believe
that the natural redundancy and structure of their formats protect
them ("only meaningless files can have hash collisions", they say).
A consequence of these conditions is that raw speed of a hash
function is very important for adoption in these systems. If you
browse the references I've given above, you'll find that SHA-1,
Tiger, and MD5 (!!) are commonly used, and SHA-256 is rare. In fact,
of all the examples listed above, SHA-256 is used only in my own
project -- allmydata.org. It is available in ZFS, but it is never
turned on because it is too slow compared to the alternative non-
cryptographic checksum.
I should emphasize that this is not just a matter of legacy -- it is
not just that these older hash functions have been "grandfathered
in". Legacy is certainly a very important part of it, but newly
designed and deployed systems often use SHA-1. Linus Torvalds chose
to use SHA-1 in his newly designed "git" decentralized revision
control tool, *after* the original 2005-02-15 Wang et al. attack was
announced, and roundly mocked people who suggested that he choose a
more secure alternative [6]. I recently pleaded with the developers of
the "darcs" revision control tool that they should not use SHA-1 for
their new, backwards-incompatible design. (The issue currently hangs
on whether I can find a sufficiently fast implementation of SHA-256
or Tiger with Haskell bindings.)
Because of my exposure to these systems, I was surprised to see a few
comments recently on this mailing list that SHA-256 is fast enough.
My surprise abated when I decided that the commenters are coming from
a background where the first two use cases -- public key signatures
and MACs -- are common, and they may not be aware that SHA-256 is
potentially too slow for some other use cases.
Regards,
Zooko O'Whielacronx
[1] http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_ZFS_Checksums
[2] http://allmydata.org
[3] http://en.wikipedia.org/wiki/BitTorrent_%28protocol%29
[4] http://bitzi.com/developer/bitprint
[5] http://www.venge.net/mtn-wiki/FutureCryptography
[6] http://www.gelato.unsw.edu.au/archives/git/0506/5299.html
[7] http://www.selenic.com/pipermail/mercurial/2005-August/003832.html
[8] http://www.nabble.com/announcing-darcs-2.0.0pre3-tt15027931.html#a15048993
[9] http://la-samhna.de/samhain/manual/hash-function.html
[10] http://blogs.msdn.com/shawnfa/archive/2005/02/28/382027.aspx
[11] http://peak.telecommunity.com/DevCenter/setuptools
[12] http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Files
[13] https://wiki.ubuntu.com/IntegrityCheck