[cap-talk] FW: x.509 -- MD5 considered harmful today

zooko zooko at zooko.com
Wed Dec 31 11:52:25 EST 2008

On Dec 31, 2008, at 6:34 AM, Toby Murray wrote:

> The real upshot of this is that phasing out crypto algorithms is hard;
> but crypto algorithms are broken overnight, often without warning.

Waitaminute -- Dobbertin found collisions in the MD5 compression  
function in 1996 (and already by then there was reason to suspect  
that such feats against MD5 might be possible).

My reading of history suggests that cryptographers and engineers then  
gradually drifted apart over the next dozen years, with  
cryptographers deciding that MD5 was too weak and going on to invent  
and analyze SHA-1, then deciding that SHA-1 was too weak and going on  
to invent and analyze SHA-2, and currently they are deciding that  
SHA-2 is of uncertain reliability and they are engaged in inventing  
and analyzing SHA-3.  Meanwhile, engineers seem to have mostly  
settled into using either MD5, or SHA-1, or Tiger.

Do you have some other examples of sudden, catastrophic breaks of  
crypto algorithms that were formerly considered safe by cryptographers?

Not coincidentally, I posted a note about this to the SHA-3  
discussion list this week.  I'll append my note to this message.

> Another point to take home is that the entire HTTPS / PKI  
> infrastructure is only as strong as the weakest Certificate Authority.

This is a good point.

> The only real assumption to make here is, given how easily (in  
> relative terms) they pulled this off, that someone else with bigger  
> pockets and a stronger incentive to do so must have already done it.

That's interesting!  But if anybody had done *this* then we would  
probably have found out about it -- this particular attack leaves  
undeniable evidence lying around.

I imagine that groups such as nation-state "cyber war" agencies,  
wealthy criminals, etc. may have plenty of tools like these all  
primed and ready, but they aren't using them, because once they are  
revealed people will build defenses against them.



------- begin appended message that I posted to the SHA-3 discussion list

	From: 	  zooko at zooko.com
	To:		multiple recipients of list
	Subject: 	will SHA-3 replace the current standard secure hash  
algorithm -- MD5?
	Date: 	December 22, 2008 15:02:40 MST


Below, I re-post a letter that I wrote to this list last February.  I  
think that this letter, which some of you may not have seen, casts  
light on why we have different assumptions about valid  
security/performance trade-offs.  It is because secure hashes are used today  
for more than just their original purpose.

Since I wrote this letter, I did some more snooping around, and I  
learned that the situation is even more extreme than I thought -- for  
some areas of endeavour, MD5 is actually the standard secure hash  
algorithm in 2008.

I chatted with a couple of friends who are information security  
consultants -- they get paid big bucks by household-name corporations  
to audit source code and systems for security flaws.  I asked them  
what kinds of secure hash functions they see used in the wild.  They  
answered that MD5 was the most common, occasionally SHA-1, in large  
part because it is a default value in the Java Cryptography  
Extensions, and that they have never seen any other secure hash functions  
in client systems.

I chatted with a friend who works at the Internet Archive -- all  
files stored at the Internet Archive are identified by their MD5 hashes.

I noticed that there was a new release of the Haskell compiler GHC.   
One of the new features is that it uses MD5 to identify code modules.

I learned more about the "computer forensics" field.  MD5 appears to  
be the standard mechanism to identify files in that field.  I read  
discussion forums in which computer forensics practitioners asked  
each other whether the cryptographic attacks on MD5 that they had  
heard about meant that they needed to change their practice.  The  
consensus seemed to be that they could continue using MD5 for now.

Finally, I was intrigued to see that NIST, of all organizations, uses  
and recommends the use of MD5 (in addition to SHA-1), as part of its  
"National Software Reference Library", which supports digital  
forensics.  This document explaining why NIST believes that this is  
safe is fascinating:


The wide gap between the performance needs of using a secure hash  
function for public key cryptography versus using it for bulk data  
identification and integrity checking (which is what I use it for at  
my day job), make me wonder if SHA-3 should include variants or  
officially recommended tuning parameters so that people identifying  
large files can use a SHA-3 which is at least as fast as MD5 or  
Tiger, while people who are signing thousand-year documents can use a  
SHA-3 which is more expensive but safer.  (By the way, I tend to  
think that HMAC shouldn't be weighted heavily as a use case for SHA-3  
simply because people should stop using HMAC and start using  
Carter-Wegman MACs instead, such as Poly1305 or VMAC.)


Zooko O'Whielacronx

------- begin appended message re-post
From: zooko
Date: February 5, 2008 12:15:53 PM MST
To: Multiple recipients of list
Subject: bulk data use cases -- SHA-256 is too slow

Cryptographic hash functions were invented for hashing small  
variable-length strings, such as human-readable text documents, public keys,  
or certificates, into tiny fixed-length strings in order to sign  
them. When considering such usage, the inputs to the hash function  
are short -- often only hundreds or thousands of bytes, rarely as  
much as a million bytes. Also, the computational cost of the hash  
function is likely to be swamped by the computational cost of the  
public key operation.

Later, hash functions were pressed into service in MACs as  
exemplified by HMAC. In that usage, the inputs to the hash function  
tend to be small -- typically hundreds of bytes in a network packet.  
Also, the network is often the limiting factor on performance, in  
which case the time to compute the MAC is not the performance bottleneck.
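
This per-packet MAC pattern is easy to sketch with Python's standard  
hmac module.  (The key and packet below are made-up placeholders; this  
illustrates the usage pattern, not a recommendation of HMAC-SHA1.)

```python
import hashlib
import hmac

def mac_packet(key: bytes, packet: bytes) -> bytes:
    """Compute an HMAC-SHA1 tag for a (typically small) network packet."""
    return hmac.new(key, packet, hashlib.sha1).digest()

def verify_packet(key: bytes, packet: bytes, tag: bytes) -> bool:
    """Use a constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(mac_packet(key, packet), tag)

key = b"sixteen byte key"
packet = b"GET /index.html HTTP/1.1\r\n"  # a few hundred bytes at most
tag = mac_packet(key, packet)
print(verify_packet(key, packet, tag))                   # True
print(verify_packet(b"wrong key!!!!!!!", packet, tag))   # False
```

Note that the input to the underlying hash is tiny, so the hash  
function's bulk throughput barely matters here.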

I would like to draw your attention to another way that cryptographic  
hash functions have been pressed into service -- as core security  
mechanisms in a myriad of bulk data systems. Examples include local  
filesystems (e.g. ZFS [1]), decentralized filesystems (e.g. a project  
that I hack on: allmydata.org [2]), p2p file-sharing tools (e.g.  
BitTorrent [3], Bitzi [4]), decentralized revision control tools  
(e.g. monotone [5], git [6], mercurial [7], darcs [8]), intrusion  
detection systems (e.g. Samhain [9]), and software package tools  
(e.g. Microsoft CLR strong names [10], Python setuptools [11], Debian  
control files [12], Ubuntu system-integrity-check [13]).

Commonly in this third category of uses the size of the data being  
hashed can be large -- millions, billions or even trillions of bytes  
at once -- and there is no public key operation or network delay to  
hide the cost of the hash function. The hash function typically sits  
squarely on the critical path of certain operations, and the speed of  
the hash function is the limiting factor for the speed of those operations.
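
The core operation in all of these systems looks roughly like the  
following sketch: incrementally hash a file of arbitrary size, in  
fixed-size chunks so memory stays flat, and use the digest as the  
file's identity.  (The chunk size and algorithm name are illustrative  
choices, not taken from any particular system above.)

```python
import hashlib

CHUNK = 1 << 20  # 1 MiB per read; memory use stays flat for any file size

def file_digest(path: str, algorithm: str = "sha1") -> str:
    """Identify a file by its hash, the way bulk-data systems
    (filesystems, p2p tools, revision control) typically do."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            h.update(block)
    return h.hexdigest()
```

Unlike the signature and MAC cases, the hash here is invoked on the  
whole (possibly multi-gigabyte) input with nothing else to hide its cost.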

Something else these applications have in common is that their designers  
are cryptographically unsophisticated compared to designers in the  
earlier two use cases.  It is not uncommon within those communities  
for the designers to believe that hash collisions are not a problem  
as long as second pre-image attacks are impossible, or to believe  
that the natural redundancy and structure of their formats protect  
them ("only meaningless files can have hash collisions", they say).

A consequence of these conditions is that raw speed of a hash  
function is very important for adoption in these systems. If you  
browse the references I've given above, you'll find that SHA-1,  
Tiger, and MD5 (!!) are commonly used, and SHA-256 is rare. In fact,  
of all the examples listed above, SHA-256 is used only in my own  
project -- allmydata.org. It is available in ZFS, but it is never  
turned on because it is too slow compared to the alternative  
non-cryptographic checksum.

I should emphasize that this is not just a matter of legacy -- it is  
not just that these older hash functions have been "grandfathered  
in". Legacy is certainly a very important part of it, but newly  
designed and deployed systems often use SHA-1. Linus Torvalds chose  
to use SHA-1 in his newly designed "git" decentralized revision  
control tool, *after* the original 2005-02-15 Wang et al. attack was  
announced, and roundly mocked people who suggested that he choose a  
more secure alternative [6].  I recently pleaded with the developers of  
the "darcs" revision control tool that they should not use SHA-1 for  
their new, backwards-incompatible design. (The issue currently hangs  
on whether I can find a sufficiently fast implementation of SHA-256  
or Tiger with Haskell bindings.)

Because of my exposure to these systems, I was surprised to see a few  
comments recently on this mailing list that SHA-256 is fast enough.  
My surprise abated when I decided that the commenters are coming from  
a background where the first two use cases -- public key signatures  
and MACs -- are common, and they may not be aware that SHA-256 is  
potentially too slow for some other use cases.


Zooko O'Whielacronx

[1] http://www.solarisinternals.com/wiki/index.php/ 
[2] http://allmydata.org
[3] http://en.wikipedia.org/wiki/BitTorrent_%28protocol%29
[4] http://bitzi.com/developer/bitprint
[5] http://www.venge.net/mtn-wiki/FutureCryptography
[6] http://www.gelato.unsw.edu.au/archives/git/0506/5299.html
[7] http://www.selenic.com/pipermail/mercurial/2005-August/003832.html
[8] http://www.nabble.com/announcing-darcs-2.0.0pre3- 
[9] http://la-samhna.de/samhain/manual/hash-function.html
[10] http://blogs.msdn.com/shawnfa/archive/2005/02/28/382027.aspx
[11] http://peak.telecommunity.com/DevCenter/setuptools
[12] http://www.debian.org/doc/debian-policy/ch-controlfields.html#s- 
[13] https://wiki.ubuntu.com/IntegrityCheck
