Talk:SHA-1/Archive 1

Latest comment: 14 years ago by Intgr in topic SHA-1 and SHA-2 split
Archive 1

rot? shr?

In the "comparison" section, the primitive operations listed include "rot" and "shr". If this is rotation (i.e. left bit shift) and shift-right, then these abbreviations should, I think, be clearer. If they're something else ... then that should be clearer, too. --Jim Mahoney (talk) 14:42, 28 April 2009 (UTC)

80 000 CPU hours?!

(80 000/24)/7 = 476.190476190476.. Slightly more than a year.

Has such itanium2-based supercomputers even existed for that long???

The computer had 256 processors running in parallel, so you'll have to divide the time by that number. Fredrik | talk 23:38, 7 Apr 2005 (UTC)
Which gives a number slightly less than two days of continuous running assuming that such a multi-core network has no loss of efficiency. That said, the cost of such a supercomputer would be rather large. --I (talk) 06:15, 28 January 2009 (UTC)

Patent status?

But primary question is, is this method unencumbered by patents or other protections? ~ender 2003-04-19 01:50 MST

It is generally considered to be patent-free (although that doesn't mean it is). In particular, see this letter from NIST. --Zundark 20:53, 2 Jun 2004 (UTC)

What's the deal with the patent referred to in the article? The article does appear to relate to the SHA variants, yet I can't find anybody talking about it on the web, and the IETF lists no IP disclosures relating to the SHA-224 RFC. What gives? Lunkwill 00:33, 27 September 2005 (UTC)

As I understand - they are free to use. Some implemintations can be covered with patent, not algorithm itself. Check this out: http://grouper.ieee.org/groups/1363/P1363/letters/NIST.txt--213.208.190.162 15:45, 30 October 2005 (UTC)
The letter linked above is from NIST and refers to SHA and SHA-1, not SHA-256 and SHA-512, which were patented by NSA and after NIST's letter was written (patent #6,829,355).
Patent, yes, but there's no copyright in the USA, since works of the United States Government inherently are public domain.

Naming

This article is about five or six different variants of SHA, which is fine, but we could probably do with a slightly more general name. Suggestions include:

— Matt 07:50, 4 Aug 2004 (UTC)

Matt, One problem will be that people will have only heard of (or read about) SHA or SHA-1 or something. Perhaps a forest of redirects? ww 20:48, 4 Aug 2004 (UTC)
Oh certainly (moving, of course, automatically creates a redirect from the old location anyway). I think SHA might be best, as it's short. — Matt 12:15, 5 Aug 2004 (UTC)
The problem with SHA is that it will very likely end up being a disambiguation page, as there are other things it can stand for (e.g., the Society for Historical Archaeology, which is currently the first Google hit for SHA). So Secure Hash Algorithm seems like a better option. --Zundark 13:02, 5 Aug 2004 (UTC)
Good point; any nay-sayers to Secure Hash Algorithm? — Matt 13:34, 5 Aug 2004 (UTC)
I suppose the best choice would be SHA family as there are several, including the insecure SHA-0. Or maybe Secure Hash Algorithm Family, maybe. ww 18:32, 6 Aug 2004 (UTC)
OK, I've gone with SHA family, and made a few redirects from the other suggestions. — Matt 13:26, 13 Aug 2004 (UTC)

Pseudocode error

There seems to be an error in the Pseudocode: The assignments:

a = h0
b = h1
c = h2
d = h3
e = h4

must be done for every 512 bit chunk, not only at initial startup of the algorithm. At least that's what I had to change when implementing this algorithm before it produced the same results as the SHA-1 checksum tool I already had. 11:25, 10 Aug 2004 80.129.56.5

You're exactly right... I fixed it. You can fix it too, you know =^_^= -- Myria 05:38, 2 Nov 2004 (UTC)

SHA-2 hashes

Hi folks, do you find it useful to have additional testvectors for the SHA-2 variants on the page? I don't want to add them until I have heard some opinions. Jonelo 08:04, 6 Oct 2004 (UTC)

Further F3 optimizations

Colin Plumb discovered that F3,

(40 ≤ i ≤ 59): f(b,c,d) = ((b and c) or (d and (b or c))

can also be expressed as

(40 ≤ i ≤ 59): f(b,c,d) = ((b and c) + (d and (b xor c))

This allows the compiler to optimize to make use of the associativity of addition, as well as enables the compiler to use the x86 lea instruction to do 3-operand addition (x = y + z).

  • Someone had changed the range to 60 ≤ i ≤ 79, changed that back. 200.175.25.158 18:32, 24 July 2005 (UTC)
  • The expressions given above have one more right than left pren in each. So at the very least they've been typo'd. No source is given, so I can check neither the original form, nor whether Colin Plumb authored them. (comment by Nahaj though not originally marked)
    • Not anymore =) Because the expression is usually a #define, the expression is traditionally written with parentheses around it. Somehow I forgot the one on the left. I wrote this section and put this F3 function into the article, but forgot to tag it here. Many versions of SHA-1 on the Internet have these comments before the F-functions:
      • The SHA f()-functions. The f1 and f3 functions can be optimized to save one boolean operation each - thanks to Rich Schroeppel, rcs@cs.arizona.edu for discovering this.
      • The f3 function can be modified to use an addition to combine the two halves rather than OR, allowing more opportunity for using associativity in optimization. (Colin Plumb)
      • One such version: [1]
That answers my request for a citation quite well. Nahaj 17:37, 28 October 2005 (UTC)
    • ... I would like to cite them but I don't know how and I don't know what we would use as the source, since all I've seen in the source is this. By the way, you can prove that the functions are identical by combining induction with a truth table. -- Myria 06:43, 28 October 2005 (UTC)

Hi guys. JulesH slabbed a "citation needed" tag on these f computing "equivalent expressions" and I too wasn't so sure they were equivalent. So I spent some minutes making truth tables with them. And the result is that all of them are correct. Even the "+" one worked right! Here are the tables:

(0 <= i <= 19): 

f := (b and c) or ((not b) and d)

B C D 

0 0 0    (0 and 0) or (1 and 0)  =  0 or 0  =  0
0 0 1    (0 and 0) or (1 and 1)  =  0 or 1  =  1
0 1 0    (0 and 1) or (1 and 0)  =  0 or 0  =  0
0 1 1    (0 and 1) or (1 and 1)  =  0 or 1  =  1
1 0 0    (1 and 0) or (0 and 0)  =  0 or 0  =  0
1 0 1    (1 and 0) or (0 and 1)  =  0 or 0  =  0
1 1 0    (1 and 1) or (0 and 0)  =  1 or 0  =  1
1 1 1    (1 and 1) or (0 and 1)  =  1 or 0  =  1


f := d xor (b and (c xor d))

B C D 

0 0 0     0 xor (0 and (0 xor 0))  =  0 
0 0 1     1 xor (0 and (0 xor 1))  =  1 
0 1 0     0 xor (0 and (1 xor 0))  =  0
0 1 1     1 xor (0 and (1 xor 1))  =  1
1 0 0     0 xor (1 and (0 xor 0))  =  0
1 0 1     1 xor (1 and (0 xor 1))  =  0
1 1 0     0 xor (1 and (1 xor 0))  =  1
1 1 1     1 xor (1 and (1 xor 1))  =  1


Both functions do give  0 1 0 1 0 0 1 1  thus all is ok.
(40 <= i <= 59):

f := (b and c) or (b and d) or (c and d)

B C D 

0 0 0     (0 and 0) or (0 and 0) or (0 and 0)  =  0
0 0 1     (0 and 0) or (0 and 1) or (0 and 1)  =  0
0 1 0     (0 and 1) or (0 and 0) or (1 and 0)  =  0
0 1 1     (0 and 1) or (0 and 1) or (1 and 1)  =  1
1 0 0     (1 and 0) or (1 and 0) or (0 and 0)  =  0
1 0 1     (1 and 0) or (1 and 1) or (0 and 1)  =  1
1 1 0     (1 and 1) or (1 and 0) or (1 and 0)  =  1
1 1 1     (1 and 1) or (1 and 1) or (1 and 1)  =  1


f := (b and c) or (d and (b or c))

B C D 

0 0 0     (0 and 0) or (0 and (0 or 0))  =  0
0 0 1     (0 and 0) or (1 and (0 or 0))  =  0
0 1 0     (0 and 1) or (0 and (0 or 1))  =  0
0 1 1     (0 and 1) or (1 and (0 or 1))  =  1
1 0 0     (1 and 0) or (0 and (1 or 0))  =  0
1 0 1     (1 and 0) or (1 and (1 or 0))  =  1
1 1 0     (1 and 1) or (0 and (1 or 1))  =  1
1 1 1     (1 and 1) or (1 and (1 or 1))  =  1


f := (b and c) or (d and (b xor c))

B C D 

0 0 0     (0 and 0) or (0 and (0 xor 0))  =  0
0 0 1     (0 and 0) or (1 and (0 xor 0))  =  0
0 1 0     (0 and 1) or (0 and (0 xor 1))  =  0
0 1 1     (0 and 1) or (1 and (0 xor 1))  =  1
1 0 0     (1 and 0) or (0 and (1 xor 0))  =  0
1 0 1     (1 and 0) or (1 and (1 xor 0))  =  1
1 1 0     (1 and 1) or (0 and (1 xor 1))  =  1
1 1 1     (1 and 1) or (1 and (1 xor 1))  =  1


f := (b and c) + (d and (b xor c))

B C D 

0 0 0     (0 and 0) + (0 and (0 xor 0))  =  0 + 0  =  0
0 0 1     (0 and 0) + (1 and (0 xor 0))  =  0 + 0  =  0
0 1 0     (0 and 1) + (0 and (0 xor 1))  =  0 + 0  =  0
0 1 1     (0 and 1) + (1 and (0 xor 1))  =  0 + 1  =  1
1 0 0     (1 and 0) + (0 and (1 xor 0))  =  0 + 0  =  0
1 0 1     (1 and 0) + (1 and (1 xor 0))  =  0 + 1  =  1
1 1 0     (1 and 1) + (0 and (1 xor 1))  =  1 + 0  =  1
1 1 1     (1 and 1) + (1 and (1 xor 1))  =  1 + 0  =  1

All four functions do give  0 0 0 1 0 1 1 1  thus all is ok.

Note, we never did get 1+1 thus no bit did overflow
to the next bit in the word. So the "+" here sort of 
just works as a bitwise "or".

As you can see, they are correct. Funny. --David Göthberg 07:09, 27 January 2007 (UTC)

This is original research, you missed the point of the {{fact}} tag. -- intgr 13:18, 27 January 2007 (UTC)
Intgr: Which part do you mean is original research? That people over time added these extra variants to the article? Or that I spent some minutes doing a trivial check if the variants are correct? (That is, trivial for me as a programmer.) If some one states in an article that 1234 * 2345 = 2893730 then it would be silly to ask for a reference, instead we would just check it on our calculators. This case is really the same thing.
But yeah, since those variants do look a bit weird some kind of link to an explanation would probably be good but I don't find it that necessary. By the way, I just checked several very old SHA1 source codes I have lying around and I found all but one of the variants in those source codes. So apparently these optimisations have been known for years.
--David Göthberg 14:08, 27 January 2007 (UTC)
Well, strictly, both are original research as long as they don't come from a source. But I didn't really give them a thought earlier, and they do appear quite straightforward boolean function optimizations. So I agree that they probably don't need to be sourced.
And by the way, why did you omit the 7th combination (b=1, c=1, d=0) from your proof? Although all these alternatives also produce consistent results in this case. I was first confused by this, as I assumed their equivalence was somehow data-dependent. -- intgr 15:43, 27 January 2007 (UTC)
Oops! Slightly embarrassing for me... Ah well, just goes to show that "proofs" are way too susceptible to the human factor to really be proofs. But thankfully this is Wikipedia where many eyeballs make all bugs shallow. Now I updated the truth table with the 1 1 0 case. And yes, thankfully the expressions are still equivalent.
And yeah, if those expressions didn't come from a source then they would in the strict sense be original research. But I have found all but one of them in different SHA1 libs that I have laying around. Thus at least most of them are well known and there are sources. Question is if we should bother to state which crypto libs use which variant? That is, to state some sources? Seems unnecessary to me since this is as you said "straight forward boolean function optimizations". And any one implementing SHA1 should anyway test their implementation carefully by comparing with test vectors and other crypto libs and should then discover any bugs in their implementation.
--David Göthberg 05:37, 28 January 2007 (UTC)

Explanation

The article really doesn't explain why there are so many SHA versions, is SHA-512 better (but slower to do) then SHA-1? If someone knows, please add it in. --ShaunMacPherson 12:52, 6 Nov 2004 (UTC)

(I'll try and check this then add it to the article): As I understand it, I believe that the different SHA versions are used to correspond to different block cipher key lengths, so that all the components in a cryptosystem provide the same level of security. Say you have a block cipher with an n-bit key size, then to match that security with a hash function that you use in the same system, you need a hash size of 2n, because even a perfect hash function is subject to birthday attacks with a complexity corresponding to the square root of the hash size — i.e. n if the hash size is 2n. So SHA-1 has 160-bits, which matches Skipjack (cipher) (keysize 80 bits). SHA-256, SHA-384 and SHA-512 match AES with its possible key sizes of 128, 192 and 256 bits. SHA-224 was introduced to match the security of two-key Triple DES (2×56=112-bit key). — Matt 17:58, 6 Nov 2004 (UTC)

Design questions

Does anyone have any insight into why one of the arguments of the rotations in the compression function is a factor of the other, and why the 30-bit left rotation (taken as a 2-bit right rotation) divides the 32-bit word size? I don't have time to analyze it in depth at the moment, but it seems under casual inspection that you would get better mixing if the arguments were coprime... User:Inkling 10:51, 17 Jan 2005 (UTC)

Extra SHA-pseudocode

An anonymous user has kindly contributed pseudocode for SHA-256, in addition to the pseudocode for SHA-1. This is great to have, but I'm concerned that the article is a little over-burdened with pseudo-code at present. We don't want to lose this information, so perhaps we could move the SHA-256 description elsewhere: maybe a SHA-256 pseudocode page, or onto WikiSource...what do people think? — Matt Crypto 19:18, 14 Mar 2005 (UTC)

Hi, I recently added hash-it.net to the list of external links, and it was removed and marked as "spam". I would like to discuss this because I feel it is a valuable resource to people who would like to hash relatively short strings. I think that hash 'em all is fantastic because of it's wide support, but there is no need for a page request every time you want to hash something. Also, it's kind of weird to have "Online Hash Calculators" as a large heading and with just one link underneath it... —Preceding unsigned comment added by 98.216.116.86 (talk) 05:00, 7 September 2008 (UTC)

Hi -- I replied to your comments here. Feel free to re-add the link. Feezo (Talk) 05:48, 7 September 2008 (UTC)

It doesn't matter much, but I think a good rule of thumb is to have only one good link for a certain type of thing -- "Wikipedia is not a link repository", and all that. If we already have one link to a Javascript SHA calculator, why do we need another? Eventually, the External Links section starts to clutter up as developers inevitably add more links to their particular SHA implementations. To me, it seems easier just to say, "we only need one of those, thanks" at the outset. — Matt Crypto 01:07, 18 Mar 2005 (UTC)

Yes, right now we have three different Javascript tools, one should be enough also in my humble opinion, because all of them do the same: they calculate a SHA1 from a user input. However, the Javascript tools don't really take the character encoding into account and they are not able to calculate the message digest from a file, but the Java tool can. IMHO a Wikipedia user should have the freedom to choose the tool she/he needs, because usually there is a good reason why she/he went to Wikipedia. Therefore I think we should keep not only one Javascript link, but also the C and the Java link (the Java link has been added by myself ;-) Jonelo 18:57, 18 Mar 2005 (UTC).
I've removed a few links that provide bad hashes. Here are the correct SHA-1 hashes for some characters online calculators usually get tripped up on:
' = bb589d0621e5472f470fa3425a234c74b1e202e8
" = 2ace62c1befa19e3ea37dd52be9f6d508c5163e6
# = d08f88df745fa7950b104e4a707a31cfce7b5841
\ = 08534f33c201a45017b502e90a800f1b708ebcb3
& = 7c4d33785daa5c2370201ffa236b427aa37c9996
? = 5bab61eb53176449e25c2c82f172b82cb13ffb9d
'"#\&? = 699b7d43c2ceb01791d98817222b96c35714d1f6
I'll get around to cleaning up that section, eventually. ~MDD4696 23:06, 8 February 2006 (UTC)
at least http://serversniff.net/hash.php should provide correct hashes now - thanks for the hint, Mdd4696! Thomas Springer 12:26, 24 February 2006 (UTC)

QUESTION

I'd like to know more about how these functions are used and why they are important. Is it true that you run one of these algorithms over some text and have a code that cannot be converted back to the text? How is that useful? What are some common applications?

If this is is explained elsewhere perhaps you could link me in the right direction.

How about the cryptographic hash function article, and the hash function article linked from there? Lunkwill 07:38, 8 September 2005 (UTC)

Pseudocode clarification

This line in the code (both versions) is unclear.

"append length of message, in bits as 64-bit big-endian integer to message"

Does it refer to the original message length, or the message length after it has been padded and had the length added? I assume it is probably the original length. If so can someone change it to "appeng length of original message, in bits...."?

I thought it was pretty obvious when I wrote it. Yes, it's the length of the original message. What do others think about this request? -- Myria 04:30, 26 September 2005 (UTC)
I think "...append original length..." is better than "...append length...". Nahaj 12:24, 28 October 2005 (UTC)

Numbers, cant figure them out.

Taken from the article 25/09/05: "SHA-0 and SHA-1 produce a 160-bit digest"

If it's 160 bit, what does that mean ? does it mean that a input like "Red juice" has 160 bit diffrent outputs?

Taken from the article 25/09/05: "from a message with a maximum size of 2^64 bits" so i cant digest a message above 2^64 bit (2,3 exabytes) ?

Taken from the article 25/09/05: "At CRYPTO 98, two French researchers presented an attack on SHA-0 (Chabaud and Joux, 1998): collisions can be found with complexity 2^61; less than the 2^80 for an ideal hash function of the same size."

Where does 2^80 come from ? SHA0 is 160 bit, that would be 2^160 right ?

160 bit output means that the function mashes, stretches and squishes any input string into a 160-bit (20 byte) long output string. Correct, the hash functions won't work on messages longer than 2^64 bits (which, I might add, are rather hard to come by...). And 2^80 comes from the birthday paradox. Lunkwill 20:58, 25 September 2005 (UTC)

Optimizations

There are machines where a rotate or shift by N eats N cycles. Obviously the rotate left 30 could be replaced by a rotate right 2, speeding it up on such machines, and not slowing it on any machines I'm aware of. (: And there is a Validated implementation that does the rotate right 2 :) How much optimization is worth doing on pseudo code? (comment by Nahaj, tagged by Myria)

  • I personally draw the line at the point at which it ought to be the compiler's responsibility. The optimizations to the F-functions are universal - they improve the calculation on all systems, or at least grant a theoretical advantage (such as the associativity of addition). No reasonable compiler could look at the original F-functions and immediately deduce the rather odd changes needed to remove a boolean operation. However, using the symmetry of the rotate commands is very obvious and is easy to program a compiler to do. -- Myria 07:16, 28 October 2005 (UTC)
    • On another note, I wonder if we should mention that the expression (x << y) | (x >> (N - y)), where N == sizeof(x) * 8, automatically gets optimized to a rotate instruction on many compilers for platforms that have it like x86-GCC. -- Myria 07:16, 28 October 2005 (UTC)

How much does adding salt increase security

I'm assuming the SHA-1 collisions were done without adding a salt to the original message. Does anyone know how much a salt of a given size would add to the difficulty of creating a collision? e.g. would adding a 16 bit salt to the original message increase the best collision algorithm iterations from 2 to the 63 power to maybe 2 to the 66 power? -- Jweiler 19:24, 10 January 2006

These answers are based on the attack they found in 2005. But note that I don't know how the new optimised attack works, but I heard it is similar. Also note that I am not a cryptanalyst so I might be totally wrong about this.
Short answer: No, in some usage cases adding a salt does not make the attack harder. But under some circumstances a salt can stop the attack entirely. It depends on when and where the salt is added.
Long answer: As I understood it the attack they found in 2005 on SHA-1 is similar to the MD5 attack from the same year. That means they are changing some bytes in a special position close to the beginning of the message. That is, the attack method changes the same position, no matter what message you are working with at the moment. That means if the attacker creates the message, then he can change bytes in that position to create two messages that collide.
Due to the nature of hashes if you append something to the end of two colliding messages you will get a new hash, but the same hash for both messages. So still a collision. Thus appending the same salt to both messages will keep the collision and thus not protect you.
If you prepend anything to the two messages the position of the changed bytes will move. That means the attack won't work at all. So the two messages will not collide any more. So if you insert a salt or a message header or whatever to the very beginning of the message then you are sort of ok. At least against the 2005 attack.
However, if the attacker at the time he creates the two messages knows what header you are going to prepend to the message he can take that into account when he makes the message pair. That is, he will add the same header and then manipulate bytes in the proper position to find colliding messages. (And then of course remove the header before he gives you the messages.)
So this means that you must create an unpredictable random salt AFTER the messages are given/uploaded/sent to you by the attacker. And prepend it to the message, not append it.
Another approach is to prepend a fixed header that is so long that it runs over the bytes that the attacker needs to manipulate to do the attack. Thus he can not manipulate those bytes. And the header can be well known and public.
But note, now that they found two attacks it is likely that they will find more attacks. For instance, I think there are many more positions further into the message where you can manipulate bytes. So these "fixes" are not even worth the trouble. You better use something better instead. I think the wise thing is to do like the TLS standard does: Use two different hashes and combine them. That approach is probably still secure even if there are some attacks against both the used hashes.
Oh, but of course salts are a good thing for many other reasons.
--David Göthberg 19:48, 30 October 2006 (UTC)

Removed "implementations" section

There are 370 implementations of SHA-1 according to the article. We can't list them all, and the most notable ones are the ones we don't need to link to because they are bundled with Perl, Python, Java and so forth. The only reasonable way to avoid this section growing without bound is to remove it altogether.

I think the same goes for "web-based tools" which I also removed, but perhaps I'm wrong? — ciphergoth 21:40, 19 February 2006 (UTC)

It would be nice if someone could find an online directory that lists the web-based tools, and link to that instead. While the section was getting quite cluttered, I did find the links useful. ~MDD4696 01:38, 20 February 2006 (UTC)
I don't see any reason why we can't arbitrarily pick one or two good implementations and web based tools. It's a natural next step for people reading the article. Lunkwill 08:07, 20 February 2006 (UTC)
I think that would have some advantages, but it's very hard to be arbitrary here. When someone wants to add their pet implementation to the list, what reason can we give for reverting it? — ciphergoth 09:28, 20 February 2006 (UTC)
See WP:EL. We can't link arbitrarily, and there are too many good online hashers available to list them all. ~MDD4696 23:44, 20 February 2006 (UTC)

We do seem to have a lot of software authors adding their particular hashing site/program to Wikipedia; a comparable situation is MD5. I think it would be useful to include a link to one or two implementations, but no more. The question is, how do we decide? I believe we can be arbitrary, but the easiest route is to pick a few links which seem to be amongst of the best of its type (as Jason suggests), and then be very strict with new additions. If the authors are upset or think it's unfair, then stuff 'em; there is no obligation for us to advertise their project at all. Most people shouldn't be adding their site anyway as per the WP:EL guideline. The ideal would be to have a link to a web site which lists implementations, but I don't think one exists for SHA. — Matt Crypto 17:07, 21 February 2006 (UTC)


I don't know how someone would know there there are 370 implimentations if there wasn't list somewhere.

I guess I am still trying to understand why this is a problem. I think that the implimentations section and the web-based tools are very helpful to programmers and others who are seeking to do a bit more hands-on research. Removing those sections would greatly reduce the value that this resource provides. Johnmaguire 19:04, 24 February 2006 (UTC)

The thing is, Wikipedia isn't a repository for any data that might be useful to someone; it's first and foremost an encyclopedia. An infinitely better place to list and maintain web links is the Open Directory Project; it's simply outside our scope. — Matt Crypto 19:13, 24 February 2006 (UTC)
I suspect that the 370 number comes from NIST's list of validated implementation at the time (Up to 450 now... although the removed implimentations list leaned to unvalidated versions for some reason. There are at least two open source implementations in NIST's list, so I'd expect that if you were going to list one or two they would be good candidates. (Disclaimer, I'm the author of one.) (My bias towards validated ones are because a number of flat out missimplementations exists on the net.) My personal opinion is that the article shouldn't have an implementations section, it should just link to NIST's list, since they have a legal mandate to keep it up to date... any list in the article is going to be rapidly out of date.Nahaj 02:24, 5 April 2006 (UTC)

I agree if we talk about very popular pages like SHA and MD5, because those algorihms are well known to the public already and too many links (both free and nonfree, both trivial and complex) have been posted in the past to those topics. Furthermore it is easy to find SHA and MD5 software by a seach engine easily today rather than using Wikipedia. However, for not so famous pages, a link where Wikipedia users can find source and/or software, make absolutely sense in my opinion. BTW, I absolutely disagree with your argument called "we don't need to link to because they are bundled with Perl, Python, Java", because actually in most cases those software is open source, platform independent and run also on entirely free platforms rather than just only on one particular OS. Jonelo 18:02, 29 March 2006 (UTC)

Licensing of the pseudocode

Can I derive proprietary code from the psuedo code on this page? I am concerned that the Free Document License won't allow it, but what is psuedo code for if not to be derived from? 168.215.177.66 21:37, 6 July 2006 (UTC)

Combining SHA1 / MD5

I am a cryptography novice, but wonder given the recent break of SHA1 and break of MD5 if the security of both can be improved by combining them?

For example, if the function SHA1(data+MD5(data)) were used (i.e. append the MD5 of the data to the data, and take the SHA1 of the combination), would it be a lot harder to find collisions? My (rudimentary) understanding of the collision finding process is that to break the algorithm a data stream to fill the internal state with specific data is desired - and that the complexity of finding this data stream depends on the size of the internal state. Thus by effectively adding MD5 internal state onto SHA1 this problem would become much harder.

This would be practical in the following (common) situation : you have to interface with a legacy application that does not support newer hash functions (e.g. SHA-2 family), and yet you are worried about the potential for a SHA-1 break (you don't use MD5, because breaking that is far easier according to the above quoted articles). A good example would be MySQL, which has internal functions for MD5 and SHA1, while stating in the manual "Exploits for the MD5 and SHA-1 algorithms have become known. You may wish to consider using one of the other encryption functions described in this section instead" - without providing any other cryptographic hash functions in version 5.0.

I assume (possibly naively) that the combination will be at least as strong as the outer function. Is this a reasonable assumption? Second, I intuitively think (which probably means it is wrong) that the combination will be stronger than the outer function. Any thoughts?

Shamus Husheer - s dot husheer at cantab dot net - 22:46, 30 August 2006

I've asked the gurus at sci.crypt under the thread "Probably naive question - SHA1 + MD5 combination" on 31 Aug 2006, and had the following answer:
(1) See http://ai.stanford.edu/~xb/crypto06b/index.html Boneh and Boyen's paper `On the impossiblity of efficiently combining collision resistant hash functions', which shows that if both functions are good hash functions, then combining them in this way is not useful,
(2) The only "at least as strong as" construction I [Kristian Gjøsteen] know of is "g(m) = h_1(m) || ... || h_n(m)" [appending one to the other]. If you find a collision in this hash function, you've also found a collision for all the constituent hash functions. Kristian also suggested a way to more practically break the SHA1(data+MD5(data)) function in the discussion.
Hopefully this helps anyone who has come searching for the same information.
Shamus - 06:59, 4 September 2006
Hi Shamus. I am not a cryptanalyst or so but have some knowledge. To me the method you suggest to combine two hashes seems pretty good: SHA1(data + MD5(data) ). But there is a better variant of it, see below. Another well known method is SHA1(data) XOR MD5(data). Of course, the most secure one is to concatenate the two hashes: SHA1(data) + MD5(data). However that gives a rather big hash and some systems don't use the whole result but instead just chop off enough bytes they want. Thus it is nice if all bits in the result depends on both hashes. So the two first methods here seems better for normal use. For the XOR method there is a slight problem that MD5 gives a 16 byte hash while SHA1 gives a 20 byte hash. If a program then chops of and uses the wrong end of the result then not all bits depend on both hashes. But that can be handled by reusing 4 bytes of the MD5 to XOR over the other part of the SHA1. I have seen many other suggested methods too. Here I list some of them including the three above:
hash = SHA1( data + MD5(data) )
hash = SHA1(data) XOR MD5(data)
longhash = SHA1(data) + MD5(data)
tmp = MD5(data), hash = SHA1(tmp + data + tmp)
hash = MD5(SHA1(data)) XOR SHA1(MD5(data))
hash = SHA1(MD5(data) + SHA1(data))
"+" here means concatenation.
Oh, and shamus and every one else, if you like to talk more about crypto come to the IRC chat #crypto on irc.freenode.net.
--David Göthberg 22:27, 21 September 2006 (UTC)

Great job!

I just wanted to thank everyone worked on this page because it's probably one of the bests I've found on the wiki. In particular, I needed to implement a checking routine for file integrity (non critical performance) and obviously started from CRC and Md5. I was very surprised SHA-512 to be "so easy" to implement (I'm not in cryptography at all) so I guess I'll give it a try.

As another note, maybe it would be good to note other methods to be "somewhat surpassed"... i bounced on many links before getting this one. I would do this myself, but I'm not really sure what's the best way to do it.
81.211.227.153 20:11, 21 September 2006 (UTC)

Are SHA hashes keyed?

Hi, I've got what is probably a basic question: Can you key SHA hashes? That is, using a secret key of some sort in the hash algorithm so that no one can calculate the hash in question without knowing the key used in the hashing? (That might be what the initializations are for in the pseudocode.) -- Joebeone (Talk) 16:15, 14 October 2006 (UTC)

I've poked around some more and it appears that hash functions are generally unkeyed... sorry for the clutter. -- Joebeone (Talk) 18:06, 14 October 2006 (UTC)
Don't be sorry, it is a good question. Yes, you can key any cryptographic hash function and that turns it into a message authentication code (MAC). The most popular (and also very secure) method to do that is the HMAC method. Usually when we send or store messages we do MAC them and thus make it "impossible" for an attacker to tamper with (change) our messages without us detecting it. --David Göthberg 05:32, 16 October 2006 (UTC)

ccv?

Hi,

in the pseudocode for SHA-2, someone changed "for i from 16 to 63" to "for i from 16 to 63ccv" again (I had removed "ccv"); before reverting it, does anyone think that it has a meaning? —Gennaro Prota•Talk 23:23, 26 November 2006 (UTC)

Date linking

Hi again,

would anyone object to delinking all year-only dates in the article? I don't think the links add anything relevant in this context (see Partial dates). —Gennaro Prota•Talk 06:27, 2 December 2006 (UTC)

"Applications" section

The "applications" section looks like it's starting to get out of hand – it already mentions some specific products where it's used, such as Xbox and git. If we were going to link all the products that SHA functions are used then the list would be incomprehensible. Thus, I'm removing these for now. -- intgr 04:00, 17 December 2006 (UTC)

I support the idea of not listing all the products supporting a specific feature. Being SHA a very common algo, the list would be huge. I believe this is not a problem right now and I think the actual few examples provide useful information, especially to the casual reader. I would consider adding back the notes: a few real world examples are important. The inclusion of another one should be considered on a case by case basis.
MaxDZ8 talk 10:18, 17 December 2006 (UTC)

I feel the same. The section didn't seem out of hand yet, though we should consider the long term: what about creating a list page? —Gennaro Prota•Talk 10:50, 17 December 2006 (UTC)
I agree, the list wasn't out of hand. It's true that listing every place where SHA is used would result in a huge list, but some real-world usage examples are needed. rst 14:37, 17 December 2006 (UTC)
I added a link to cryptographic hash function#Applications of hash functions, do you think that is not enough for background, for what hash functions are generally used for? There are a lot more notable uses of SHA algorithms than Xbox and git, which the article used to reflect. (While the git example was indeed useful, the Xbox example did not even mention where or how it was used)
As for a "list of" article: there used to be a category of programs that used the AES cipher, but it was deleted as it was considered useless. -- intgr 16:01, 17 December 2006 (UTC)
All good points, IMHO. I think we need further discussion. What I can notice is: a) the "Applications of hash functions" section gives (as you say, too) a general overview of hash functions applications. The section in this article instead was specifically about "notable usages" of SHA. So they had different intents. In practice a mention like the Xbox one is pretty much useless but a mention of Git like the one we had may well "open a whole new world" to the reader (that's something that has happened often to me: I reached some article only because of a -strictly non necessary- link from some other page and become passioned about the new subject... it's one of the strength of Wikipedias over more traditional encyclopedias) b) linking to a specific section of another article (or even of the same article) is a bit dangerous in that there's no effective way to detect if the link breaks (if the target article's section is renamed, for instance). —Gennaro Prota•Talk 16:32, 17 December 2006 (UTC)
The statment about how it is used by Linus in git is totally out of place and adds nothing to the section IMHO.

Pronunciation

For all these various tech acryonyms, providing a clue as to the common pronounciation would be helpful. Eg, for SHA-1, it's /shah-one/ or /shaw-one/ or the like. Perhaps providing the IPA or whatever the standard is for pronunciation. (I'll defer this to those more knowledgeable of wikipedia standards on the issue.) - Michael (talk|contrib) 02:08, 19 January 2007 (UTC)

Good question. How do you guys really pronounce SHA-1 etc in English? I know how we pronounce it in Swedish but not really what the most common way is in English. (In Swedish we spell it out so translated to English that would mean this pronunciation: Ess Eich Ei One. But it is much harder to say that in English.) --David Göthberg 12:40, 28 January 2007 (UTC)

The only person I've heard pronounce it like shaw-1 was from Canada. You don't want to get me started on that... The best, most logical in my opinion comes from Paolo Perrotta who has Git training on Pluralsight. He says SHA-1 as if it were SHAONE, or like the word 'shown'. Makes so much more sense than adding a 'w' out of nowhere.


New SHA-1 vulnerabilities

Professor Xiaoyun Wang, the same person who found the SHA-1 vulnerability in August 2005, has recently published a paper [2] that appears to make it feasible to find SHA-1 collisions in practice. I don't have the technical understanding to make sense of the paper myself, but perhaps someone else here can read it and update the article appropriately. 71.224.239.10 23:50, 20 January 2007 (UTC)

There is an (English) article about the subject. --JVersteeg 10:43, 21 January 2007 (UTC)
This article is useless and misleading, it doesn't mention a single relevant detail about the attack itself. -- intgr 15:23, 21 January 2007 (UTC)
The paper that you refer to, does not even talk about SHA-1. Rather it shows attaks against MD5, RIPEMD and SHA-0. How is this supposed to show a new SHA-1 vulnerability? The article in epochtimes is so badly written that it is not possible to determine whether they are talking about a new attack or one that has already been published. If there are no other reports of a new attack, then it is probably safe to assume that epochtime is just recycling an old story. 85.2.45.127 21:59, 27 January 2007 (UTC)
I've updated the page with my paper explaining the details of Wang's 2^63 attack on SHA-1. After over 2 years of lack of details about the methods, I decided to figure it out myself from her slides and type it up. Martin.cochran (talk) 23:49, 6 February 2008 (UTC)

VB implementation

I have trawled through John Taylor's VB implementation quite a bit, and I think it's unnecessarily complicated. I'm currently adapting it for my own purposes, and it quickly gets shorter, more efficient, and more readable as well. Shinobu 07:24, 19 April 2007 (UTC)

I'd agree that this code is too complicated and not efficient. Especially, the rotates implemented using loops must be killing performance. Even tutorial code should take advantage of existing language features and not neglect them. Real code is very often more educational than tutorial code, because the authors have spend some time optimizing the code. 85.2.38.56 10:00, 19 April 2007 (UTC)

That's not all :-) I've got it working, and I changed a lot. Of course I added my own bit of iffyness: since only a few rotates are used, I used three or so special functions like U32RotateLeft30 etc. The generic rotate needs a 2^n function, and although I did write a reasonably fast implementation for that, I decided in the end that it's a waste if not really used. The addition bit was shortened and I changed datatypes and such around. Partly because it's conventional to use Byte arrays for data, not Strings, and partly for efficiency and versatility (didn't tie myself to strings containing hex numbers). Perhaps I should post it somewhere. Thing is I'm half afraid I changed so much, that it might go wonky in some obscure situation. Anyway, it works for me, and in any case, people shouldn't trust random code grabbed from the web - and I would like the feedback. I'll think about it. Shinobu 12:03, 19 April 2007 (UTC)

I'm not sure if I understood you. A rotate left by n bits of a Uint32 variable x is usually implemented as (x<<n)|(x>>(32-n)). Decent compilers recognize this construction, optimize it and generate just one assembly instruction: a rotate. I'd expect to learn such implementation details from a good tutorial code. 85.2.114.9 13:05, 19 April 2007 (UTC)

You probably understood me... but what you perhaps don't know is that VB doesn't have native rotates and shifts, or unsigned operations. I'm not sure, but it probably was a design decision, like in Java. VB numbers are all signed (except bytes, which are unsigned, unlike in Java). It's a bit of a dilemma, really. Leave them out and you get people like us complaining and working around it, but leave them in and you get the rest of the world doing utterly wrong things with it. *sigh* Shinobu 22:32, 19 April 2007 (UTC)

Are you sure? I did check the online documentation before making my comments. VB does seem to support both unsigned int types and shifts since Visual Studio 2005 .net 2.0. I can't say how I'd implement these functions without unsigned support, because the VB documentation lacks details. But as you say, the sample code by Taylor can be improved in a number of places. Assuming, there are many people that use VB versions without unsigned support, it might indeed be interesting to see what the most efficient work arounds look like. 85.2.127.155 09:29, 20 April 2007 (UTC)

I'm pretty sure unsigneds were added in .NET, which is incompatible with old-style VB in a number of ways. For various reasons people sometimes stick with older versions of software. Reasons like compatibility, size, memory usage, etc. As for "the best"... how do you ascertain what the best is? I mean

Function U32Add(ByVal A As Long, ByVal B As Long) As Long
If (A Xor B) < 0 Then
    U32Add = A + B
Else
    U32Add = (A Xor &H80000000) + B Xor &H80000000
End If
End Function

looks okay, and certainly a lot better than

Public Function i32add(operanda As Long, operandb As Long) As Long
'//////////////////////////////////////////////////////////////////////////////////////
'does 32 bit add of two 32 bit numbers as if they were unsigned int32's
'this has to be done this way because of quirk in VB where an add overflow
'into the sign bit is kicked out as an error.  This would not be a problem if
'an unsigned int32 were allowed in VB!
'
'result=operanda + operandb
'
'use this function for (a+b)
'
'Not, And, Or, Xor all work the same for signed and unsigned int32 because there are
'no carries or borrows for VB to deal with
'//////////////////////////////////////////////////////////////////////////////////////
Dim operand_ax As Long
Dim operand_bx As Long
Dim upper_a As Integer
Dim upper_b As Integer
Dim result As Long
Dim topbits As Integer
'//////////////////////////////////////////////////////////////////////////////////////
'trim off offending bits
operand_ax = operanda And &H3FFFFFFF
operand_bx = operandb And &H3FFFFFFF
upper_a = ((operanda And &HC0000000) / &H40000000) And 3
upper_b = ((operandb And &HC0000000) / &H40000000) And 3

'do math on lower order bits
result = operand_ax + operand_bx

'do math on upper order bits
topbits = upper_a + upper_b

'if there was an overflow into upper 2 bits, increment the accumulator
If result And &H40000000 Then
  topbits = topbits + 1
End If
  
'get rid of an overflow into upper 2 bits in lieu of separate math below
result = result And &H3FFFFFFF

'now adjust the upper bits for the side calculation results
If topbits And 1 Then
  result = result Or &H40000000
End If
If topbits And 2 Then
  result = result Or &H80000000
End If

i32add = result

End Function

but how can you tell it's best? Shinobu 03:34, 21 April 2007 (UTC)

At some point it becomes necessary to know the constructs that can be compiled into efficient code by the compiler. E.g., your proposal, which btw is a really nice trick, could also be written as
Function U32Add(ByVal A As Long, ByVal B As Long) As Long
Dim HiBit as Long
HiBit = (A Xor B Xor &H8000000) And &H80000000
U32Add = ((A Xor HiBit) + B) Xor HiBit
End Function
Depending on how good the compiler is this should take 1-2 operations more on average, but it avoids an unpredictable jump and hence might be faster in some cases. However, I don't think all code has to be optimized completely. I'm still looking for some guidelines on what kind of code should be linked to in wikipedia and what should better be avoided. I'd think that if a reader generally could find improvements by comparing his code with some sample code then this sample code would be worth to be listed here. Otherwise there is no point of having a link. Taylor's code doesn't contain any non-obvious tricks, your code does. Hence your code would seem to be better qualified to be listed here. 85.2.69.234 18:36, 22 April 2007 (UTC)

I'd put it in my user space, but there might be a policy against linking from article space to user space. I'll ask around about what to do. Shinobu 10:18, 23 April 2007 (UTC)

Okay, please have a look here: http://vb.wikia.com/wiki/SHA-1.bas

Apparently they don't mind it being there. Before you add a link here, please read it through a few times. It works for me, but I'd rather be sure it works for others as well before the link is added here. Try a few test vectors too. You know what it's like... a little cut 'n paste error can cause horrible bugs. And sorry for the lack of comments. I don't like excessive commenting and most of the code speaks for itself. Shinobu 14:31, 9 May 2007 (UTC)

Your code looks definitively better to me than Taylor's code. I'd support to replace Taylor's code with this implementation. 84.73.231.90 21:26, 23 May 2007 (UTC)

SHA-1 Expansion

The crypto analysis is not exactly a cogent explanation of how the algorithm works. This article says nothing about the actual working of the SHA-1. I think this needs to be included. -Weedrat 09:40, 21 May 2007 (UTC)

Did you notice the pseudocode close to the end of the article? It looks good enough to me to actually implement SHA-1. 84.73.231.90 10:25, 21 May 2007 (UTC)

Basic language question

The introduction describes SHA-1 as "a condensed digital representation...that is, to a high degree of probability, unique for a given input data sequence." Shouldn't it say "unique *to* a given input..." (i.e. there is likely only a single input that goes with that output) rather than "unique for" (i.e. there likely is only a single output for a given input)? The current use of "for" says simply that there's just one output for each input, which is true of many mathematical functions... MacScoop 18:27, 23 May 2007 (UTC)

Agreed, sounds better to me, though I am not a native English speaker. -- intgr #%@! 21:11, 23 May 2007 (UTC)
Sounds incorrect to me. Since there are many more possible inputs than outputs it is very likely that for every possible output there exists many inputs mapping to that output. The whole first sentence is incomprehensible. Hash functions are defined as functions for which it is computationally infeasible to invert them and to find collisions (as described later). 84.73.231.90 21:34, 23 May 2007 (UTC)

I copy-edited the intro to make things clearer, I hope.--agr 14:06, 30 July 2007 (UTC)

FIPS 180-3 Draft

The draft version of FIPS 180-3 is available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf. he comment period closes on September 10, 2007. Reid Hochstedler 19:11, 2 July 2007 (UTC)

Question

How do commitment schemes work? They seem handy, but complicated. How do I get a committed identity? ionas68224|talk|contribs|email 11:03, 21 July 2007 (UTC)

  • Alice takes a random input x, and computes h(x) where h() represents a cryptographic hash function. Alice then asks Bob to guess the first bit of x. Because h(x) is computationally too hard to invert, Bob has only 50% chance of guessing this bit correctly. After Bob gives Alice his guess, Alice reveals x. If Bob guessed the bit right, he wins this "coin flip", otherwise he loses. Note that if Alice can find a collision in h, Alice can always win by giving Bob messages x or x' depending on his guess. If the cryptographic hash function compresses the input and represents a random mapping, Bob cannot gain any information on the first bit of x due to information theory.

—Preceding unsigned comment added by 193.190.253.144 (talkcontribs)

      • Maybe you're thinking of zero-knowledge proofs instead of bit commitment schemes? The latter are for commitment to values, but zero-knowledge proofs can be used to prove your identity. Or are you thinking about the use of hash functions in digital signatures? Your question is unclear to me, but I think I've given you enough links now to find the answer.

I think the questioner is referring to the technique described at Template:User committed identity.--agr 11:41, 2 August 2007 (UTC)

Hi. There are probably different links for SHA-1, SHA-2, SHA-512, etc. The problem is, what exactly are those links? The Template:User committed identity gives the link to SHA-1, and the article gives the links to a few, but where are the links to all the SHA websites, say for example where is the link for calculating SHA-512? Can someone list me the links of those websites, and where you got it from? I don't want links to sites other than those used to directly calculate a string into a hash code, so I don't want any sites with info about the SHA websites, I want the actual links themselves. I will also ask this question on the refernce desk. Thanks. ~AH1(TCU) 20:41, 4 October 2007 (UTC)

TOC on the right?

Is there a specific reason why the TOC is on the right?

It could be considered a minor issue, though when I opened the article, for a second I was like "errrrm... Where's the TOC?" (spatial memory anyone?), then I saw it was on the right, and it just seemed odd.

It's also not underneath the intro, but rather beside it. It seems inconsistent with about every other article I ever saw in WP (and I've been here for years) --W2bh 20:40, 15 November 2007 (UTC)

You're right—It should be on the left: default. The manual of style suggests right TOC placement for articles where a photo appears at the left (suggested for biographical articles with a right facing headshot). But there is no such consideration in this article. —EncMstr 20:47, 15 November 2007 (UTC)
I agree. --Morten LJ 09:31, 16 November 2007 (UTC)

This might be it...

The sentence "Gilbert and Handschuh (2003) have studied the newer variants and found no weaknesses" has a fact tag. It looks like it's referring to a paper, I did a quick search, and I think this might be it:

Henri Gilbert, Helena Handschuh: Security Analysis of SHA-256 and Sisters. Selected Areas in Cryptography 2003: 175-193.

Anyway, I did a quick google search and didn't find any other 2003 papers by them. Can someone with more knowledge on the topic who maybe is able to access the paper take a look and see? It'd be nice to get rid of that fact tag. Thanks, 4.21.209.231 (talk) 04:19, 1 January 2008 (UTC)

Merger proposal

I suggest that sha1sum be merged into this article, under an "implementations" section, or similar.
--Jerome Potts (talk) 19:01, 22 March 2008 (UTC)

I'm not opposed to this in theory, but the content should be seriously revised upon the merge. most of the content is meaningless uninformative or redundant, especially in the context of a subsection of this article. The revised content should also make an effort to establish that this is one of the most common tools used for calculating sha1 hashes at the execution level (a presumption on my part). Similar merges could be possibly proposed for md5sum and chksum. Those are my thoughts anyway. -Verdatum (talk) 20:37, 23 April 2008 (UTC)
I don't think a merger is appropriate. This article discusses a Unix utility for calculating SHA1 check sums. There are similar utilities for other operating systems and many other programs that calculate SHA hashes. The main article on SHA is not a suitable place for all this information.--agr (talk) 17:25, 20 August 2008 (UTC)

Operations

Is "rotfl" a binary operation? I'd suggest to revise the alg/operation table :( —Preceding unsigned comment added by 81.182.160.76 (talk) 07:08, 7 April 2008 (UTC)

Unclear/incorrect verbage

I'm only vaguely familiar with the subject matter so I hesitate to edit, but I think the first sentence under Cryptanalysis, "For a hash function which violates the first criterion listed above..." should have read, "For a hash function that does NOT violate the first criterion above..."

Here is where I'm on shakier ground, but does that statement not apply to all hashes, whether they violate the first criterion or not? I think 2^L is an upper bound, isn't it? A hash function that violates the first criterion (do they have names? if so we should use them instead of "first criterion" etc...) allows you to find a message for a given hash value in less than 2^L.

If that's correct, then the first sentence should read, "For a hash function which meets the first criterion listed above, finding a message that corresponds to a given message digest requires a brute force search in 2^L..."

Eshafto (talk) 15:34, 23 May 2008 (UTC)

Naming redundancy

The current name for this article is "SHA hash functions", which means "Secure Hash Algorithm hash functions". The name is obviously redundant, and should be renamed something more appropriate, such as "SHA functions" or "Secure Hash Algorithm functions" (although I prefer the former simply because it is shorter, and the acronym is more common). miro modo 00:42, 4 August 2008 (UTC)

In a cryptography book this title would indeed be a little odd. But here on wikipedia the title has some disambiguation purposes. It should be clear from the the title that the article is about the hash function SHA and not the airport SHA or anything else that uses the same acronym. Hence the redundancy in the title may actually be necessary. 85.2.17.222 (talk) 05:28, 4 August 2008 (UTC)
Yeah, but I don't see a reason not to use the title Secure Hash Algorithm functions, or Secure Hash Algorithm family or just Secure Hash Algorithm. The full name is preferable to an acronym per Wikipedia:NAME#Prefer_spelled-out_phrases_to_abbreviations.--agr (talk) 11:17, 4 August 2008 (UTC)
Naming conventions do say to use the spelled-out name "unless the term you are naming is almost exclusively known only by its abbreviation and is widely known and used in that form", so I retract my former statement and concede that the full title is preferable. "Secure Hash Algorithm" (or the other suggestions of ArnoldReinhold all seem fine). By not using the acronym, we should be able to avoid confusion or need for disambiguation. miro modo 13:10, 4 August 2008 (UTC) —Preceding unsigned comment added by Miro modo (talkcontribs)
But I claim that the SHA hash functions are almost exclusively known only by the abbreviation. Seriously, how often have you ever seen Secure Hash Algorithm written out? Ntsimp (talk) 14:36, 4 August 2008 (UTC)
A Google search on "Secure Hash Algorithm" (quoted string) gives 110,000 hits. I've advertised this discussion on the WikiProject Cryptography discussion page to get some more voices.--agr (talk) 15:07, 4 August 2008 (UTC)
I agree with Ntsimp; WP:COMMONNAME suggests that "Use the most common name of a person or thing that does not conflict with the names of other people or things"; as far as I can tell, the only thing that "SHA" conflicts with is the Shanghai airport IATA code, which is adequately served by the note at the beginning of the article. And there is no question about it that the abbreviation "SHA" is used nearly exclusively.
However, I think the best "common name" for the article would actually be SHA-1 and SHA-2; given that most of the content is specific to one of these algorithms, does anyone think that maybe a split would be worthwhile? I think that the unified article may be more confusing. Compare this with MD4/MD5, or RC4/RC5 for example. -- intgr [talk] 15:15, 4 August 2008 (UTC)
The policy also says "Avoid the use of abbreviations, including acronyms, in page naming unless the term you are naming is almost exclusively known only by its abbreviation and is widely known and used in that form." I would agree with you if the article was about a single algorithm, such as SHA-1. But it isn't and the full name is widley used when refering to the family as a whole. Even if we had articles on the individual algorithms, we would still want an article on the family, since much of the history, etc. is common.--agr (talk) 08:41, 5 August 2008 (UTC)
Well, I did the Google test: "Secure Hash Algorithm" gets 107,000 hits. While 'SHA hash' gets 4,120,000 hits. (I added 'hash' to avoid the hits for other things that is named SHA.) So using the short form "SHA" is about 40 times more common than using the full name "Secure Hash Algorithm". Also, now that we know that some of the SHA hashes are not secure anymore the full name is kind of a misnomer. To me the name is "SHA", and to make the article name clear we need to add the "hash functions" part, so I find the current article name "SHA hash functions" is the best. (And I find it better to use the plural "functions" instead of the confusing word "family".) And we already have redirects from a bunch of other names and links from other hash related articles so should be no problem for readers to find the article. And take a look at the interwiki links, almost all the other language Wikipedias use the short form "SHA".
--David Göthberg (talk) 13:12, 5 August 2008 (UTC)
When I did a Google search on"SHA hash" with the two words quoted to look for occurrences of the phrase, I got 29,100 hits. The WP:NAME preference for the spelled out title does not depend on frequency. But this is not the most pressing issue of the day and if there isn't more support for the change, I'm willing to drop it.--agr (talk) 19:04, 5 August 2008 (UTC)
I should have explained better. You should search for "SHA hash" without the quotes, since then you get pages with the word "SHA" and the word "hash" but not necessarily right next to each other. Thus you avoid most hits for other SHA things that are not SHA hashes, but still get pretty much all SHA hash related pages. If you search like that then you get 4,120,000 hits.
--David Göthberg (talk) 21:24, 5 August 2008 (UTC)

Comparison of SHA functions table

The "Collision" column is not clear at all; what is this supposed to represent? Surely not whether the hashing function has collisions or not, since that is a given from /being/ a hashing function. Some more commentary or a better header would be good. W (talk) 10:35, 2 September 2008 (UTC)

It means whether an attack or an actual collision sample has ever been published. I renamed the column header to "Collisions found", is this any better? -- intgr [talk] 16:57, 3 September 2008 (UTC)

Question about reversibility

I'm probably missing something here, but I was wondering, what makes the SHA algorithm difficult to reverse? In the "Cryptographic hash functions" article, it says that good cryptographic hash functions are computationally difficult to reverse. I.e., given an output of a hash function, it would take a tremendous of computing power to find an input that corresponds to that output. But it seems to me that the only way to create such a hash function would be to use a task for which no polynomial algorithm is known. For example, consider this hash function:

  • Take the input and map it to two large prime numbers
  • Multiply the two prime numbers together; this product of the two primes is the output of the hash function.

To me, the above algorithm actually does seem difficult to reverse, because reversing it would involve factoring the integer, and there is no known polynomial algorithm for integer factorization. But the SHA algorithms contain no multiplication of large primes, or NP-complete problems, or anything like that. They only contain AND, XOR, OR, and leftrotate operations. It seems like all four of those operations would be easy to reverse. If one can reverse every step of the algorithm, then one can reverse the whole thing. Is that right? Again, I feel like I must be missing something, because SHA is widely believed to be secure. Navigatr85 17:34, 8 September 2008 (UTC)

Confusion and diffusion. While the article talks about block ciphers, all symmetric cryptography primitives share this in their design. The basic idea is to just interleave a large amount of linear operations (addition, XOR, bitwise rotation etc) and non-linear substitutions (S-boxes) — both very fast on conventional CPUs — so that no useful reverse function can be derived.
Problems like integer factorization are commonly used in asymmetric cryptography, but they are very slow — so oftentimes, instead of digitally signing a long message, instead the application calculates a hash of the longer message, and then signs signs the resulting hash. The same is done with encryption.
Another quality of symmetric cryptography is that its security increases exponentially with the size; for a block cipher with a 256-bit key it will take 2256 operations to search the whole key space — if a faster attack is found, the cipher is denounced broken. In comparison, the complexity function of most asymmetric attacks (like GNFS) is sub-exponential and very elaborate. (Yes, SHA-1 is considered broken now that it has a 263 collision attack — much less than its natural birthday bound, 280). -- intgr [talk] 07:38, 9 September 2008 (UTC)
Navigatr85: I guess it is time for me to bring out my trusty old beginners example. XOR is usually considered a very reversible operation. But consider this case:
a xor b = c
If a and b is part of the secret internal state and c is the output that the attacker gets to see, then the attacker can not figure out any of a or b based on just knowing c.
Some find it easier to understand if we use addition and actual numbers. Say we use numbers between 0 and 9. Then consider this case:
5 + 3 = 8
If 5 and 3 is part of the secret internal state and 8 is the output that the attacker gets to see, then the attacker can not figure out any of 5 or 3 based on just knowing 8. Since 0 + 8 = 8 and 1 + 7 = 8 and 2 + 6 = 8 and so on.
--David Göthberg (talk) 08:14, 9 September 2008 (UTC)
Thanks David, that helps a little bit, but I'm still confused. Isn't a hash considered to be insecure when one is able to find ANY input that hashes to a given output? Using your example, "53" is the secret internal state, and the hash function is to add the digits. So I, the attacker, see the output "8". Then, knowing the hash function, I can easily determine, without searching through the whole keyspace, that the input "17" will have the hash output value 8. So, assume the secret internal state is a password. Then I just use "17" as my password, and then the system hashes it. The system will think it's a valid password because it hashes to the correct value, even though the real password is "53".
The example above would be like finding a "collision," if I understand correctly. And the ability to find collisions makes a hash function insecure. More formally, a collision is two different messages with the same digest. Using your addition example, we can take the output "8" and easily reverse it in two different ways to get two different inputs with the same output, like 1 + 7 = 8 and 2 + 6 = 8. It seems to me like one could do the same thing for any algorithm that is simply a combination of addition, AND, XOR, etc. Navigatr85 22:30, 13 September 2008 (UTC)
That is indeed a collision. With such a trivial hash algorithm, it's easy to exhaustively guess passwords and find a collision rather quickly. That's why the number of bits in the hash is the leading indicator of the strength of the hash. If 16 bit values are used in David's example, there are 65,536 combinations of a and b giving a particular hash value (assuming overflow is ignored). The purpose of addition is for that kind of irreversibility. Imagine a multi-stage obfuscation function which adds some bits, rotates others into play and adds different bits. After not many stages, it's very difficult to analyze the permutations, let alone decode any of them. —EncMstr (talk) 22:58, 13 September 2008 (UTC)
I wasn't talking about exhaustively guessing passwords, I was talking about using another algorithm to reverse the hash function. It still seems to me like any algorithm that uses simple functions would be relatively easy to reverse. For example, you said to "add some bits, rotate others into play, and add." Here's an example of an algorithm that does that:
1. Add zeroes to the end of the input until the number of bits is 48. Assign this to the variable x.
2. Split x into three 16-bit chunks, x[0], x[1], x[2]
3. b = x[0] leftrotate 3
4. y = b + x[1]
5. h = y XOR x[2]
6. h is the output value.
Now here's an algorithm that reverses the above algorithm and finds a collision:
1. x[2] = 0x0000
2. x[1] = 0x0000
3. b = h
4. x[0] = b rightrotate 3
5. f = x[0] append x[1] append x[2]
I guess one could say that the above algorithm is also trivial. So, I guess what I'm asking is, what makes the SHA algorithm nontrivial? The operations AND, XOR, left rotation, addition, etc. are all easy to reverse. And SHA is just made up of many reversible steps, so why isn't the whole thing reversible? Could someone give an example of a short hash function that's not reversible like the one above? Navigatr85 00:01, 21 September 2008 (UTC)
No one has responded after a week, whereas my other posts got responses within one day. Does anyone have any thoughts? Navigatr85 19:22, 29 September 2008 (UTC) —Preceding unsigned comment added by Navigatr85 (talkcontribs)
Consider the difficulty of reversing the function a + b: The result is 197383812. What is a, where b is a secret internal value? This is the essence of the irreversibility of functions like SHA-1 and SHA-2. All that is known is a and b have the same number of bits. —EncMstr (talk) 22:20, 29 September 2008 (UTC)
OK, you're saying the function is a + b, and the input consists of both a and b. In that case, you're right, we can't get either a or b. But we can easily find a collision. To find a collision, we simply set a to be any value we want and then subtract a from the result. We won't find your secret internal value, but we'll find another input that hashes to the given output. For your example of 197383812, I could set a=111111111, and then b = 197383812 - 111111111 = 86272701. Navigatr85 16:38, 3 October 2008 (UTC) —Preceding unsigned comment added by Navigatr85 (talkcontribs)
Navigatr85: Okay, brace yourself, you made me bring out the Davies-Meyer one-way compression function. Let's look at a slightly simplified version of it, a "one-way" function but without the "compression" part. Here goes:
Assume you have a decent mixer function. It takes a 128 bit input, mixes it with good avalanche effect so the output looks random and all output bits depend on all input bits, and the output is also 128 bits. Like this:
output = mix( input )
Say this mixer function is fully reversible. It can for instance be a block cipher with a fixed key that everyone knows about. So you have an unmix function that everyone knows about, like this:
input = unmix( output )
Now to make a one-way function of this mixer function do like this:
output = mix( input ) XOR input
Or you can do like this:
output = mix( input ) + input
Where "+" here means addition with modular arithmetic. That is, values that become larger than 128 bits are wrapped around and start over from 0.
Now that is much harder to reverse. Since if you know the output you can of course split it into two values, a and b, that when added becomes the output. But one of the two values must be unmixed, and then it has to become equal to the other value.
output = a + b
unmix( a ) == b ?
But finding two values that equals each other after one of them is unmixed is hard. If the mix function is complex enough then the easiest way to find a working value is to try all 2128 possible inputs. This makes it hard to find out what the input was. And even if you know the input it is hard to find another input that gives the same output. That is, it is hard to find collisions.
After you have pondered the above for a while I suggest you take a look at the one-way compression function article. Worth noting is that SHA hash functions internally do use the "Davies-Meyer one-way compression function". (I just double checked, it is there in the pseudo code, just not so easy to see.)
--David Göthberg (talk) 20:17, 3 October 2008 (UTC)
Thanks, David, that clears things up. After reading your post, I invented a short function, to help convince myself. I considered the following function, with a four-bit input:
output = input + (0101 XOR input)
There really does seem to be no algorithm to reverse that, without searching through all 16 possible inputs. Incidentally, one possibility you suggested was using a block cipher for the "mix" function, and also doing output = mix(input) XOR input. If I understand correctly, a block cipher involves XORing the input with some known key. So for that example, you would have:
output = mix(input) XOR input = (key XOR input) XOR input
That actually wouldn't work, because the output would always be the key, regardless of the input. So I guess the combination of different operations, like XOR combined with addition, is what makes these algorithms hard to reverse.
The reason this was bugging me was that reversing the SHA function doesn't involve solving an NP-complete problem or anything like that. Since the NP-complete problems are the ones most likely to be outside of P, it would seem that a hash function based on an NP-complete problem would be the most secure. It seems to me that the reversal of the SHA algorithm is in NP, and believed to be outside of P, but not proven to be outside of P, and also not proven to be NP-complete. Is that right? Navigatr85 18:29, 21 October 2008 (UTC) —Preceding unsigned comment added by Navigatr85 (talkcontribs)
Navigatr85: You really should learn to sign your comments. If you look above the edit window there is a special button for it if you have javascript enabled in your browser. Otherwise just add four tildes: ~~~~, which gets translated to your username and time and date when you save your edit.
Nice four bit example. And right, the mixer function must be different from the extra adding or xoring in of the input, otherwise you just undo the job of the mixer function.
And regarding block ciphers: No, they do much more than just xoring in the key. They do a lot of mixing. Usually every output bit depends on every input bit. (Read about that in avalanche effect.) You can build the mixer function from a block cipher like this:
output = mix( input ) = encrypt( key, input)
input = unmix( input ) = decrypt( key, output)
So the full one-way function can look like this:
output = mix( input ) + input = encrypt( key, input) + input
And as I said, the key can be well known by everyone. The key can even be standardised and published. The key isn't really needed, just that the block cipher expects a key. For most decent block ciphers you can even simply just feed "0000" as the key in this case. (Although some block ciphers do bad mixing if they get such a key, so usually a more random looking key is used.)
And regarding the NP an N: I don't know much about that, since I am not a mathematician. But I think I know this: Sure, building cryptographic primitives by using the (believed to be) NP-complete problems probably gives an extra security guarantee. Although as you say the algorithms used in the cryptographic building blocks of today perhaps are just as strong, only that we don't know that for sure. But the problem is that as far as I understand the known NP-complete problems are very slow to handle in computers and usually need huge keys. That's why the NP-complete problems normally are not used. (Well, they are used for public-key cryptography, which indeed is very slow and needs huge keys...) But don't quote me on anything NP and N related, since I really don't know much about that.
--David Göthberg (talk) 13:17, 23 October 2008 (UTC)

Bit-rotted text

Under "Cryptanalysis and Validation," the phrase "the first criterion listed above" refers to some text which is no longer present in the article (see the intro paragraphs in this revision: http://en.wikipedia.org/w/index.php?title=SHA_hash_functions&oldid=213893308). The paragraph should be rewritten in light of this missing antecedent. 131.179.192.130 (talk) 20:12, 14 October 2008 (UTC)

Your opinion: is this useless fancruft?

I see this article was tagged as "fancruft" briefly -- as containing details almost nobody cares about. Interesting -- the most detailed part I can see is the pseudocode, which matches what we have for other important algorithms (e.g., RC4).

The history, analysis info, and so forth seem relevant to many real-world engineers using crypto, and isn't collected in one place under a free-content license. It's also useful as context for anyone trying to understand the current NIST hash competition, the 2008 MD5 crack, and future changes in the SSL protocol. We could perhaps improve the article by removing redundancy, but most of what's here is useful to real people. It's a little analogous to the math articles describing obscure (to non-mathematicians) functions, theorems, and equations: not for everyone but clearly valuable to a real audience.

And the most boring sentence probably provides as much value to humanity as the Lost episode descriptions that seemingly *aren't* fancruft. :)

(Incidentally, I'd be for deleting articles about some ciphers that were published and then forgotten. "Published" doesn't mean "notable.")

I'm certainly a crypto fanboy, though. Any other perspectives? Particular sections people think don't help? 67.119.195.43 (talk) 22:10, 2 February 2009 (UTC)

sha-1 collision down to 2^52 now claims Macquarie University and Qualcomm, Australia

see paper at: http://eurocrypt2009rump.cr.yp.to/837a0a8086fa6ca714249409ddfae43d.pdf —Preceding unsigned comment added by 87.145.199.71 (talk) 19:15, 29 April 2009 (UTC)

This paper has since been withdrawn because the authors state that their 252 estimation was wrong. -- intgr [talk] 17:47, 31 March 2010 (UTC)

Critique removal

I removed the following unsourced "critique": "It is well known that if a hash of bit length m is produced from messages of bit length n, then on average $2^(n-m)$ messages will share the same hash key, and consequently for a small fixed value of m, say, 1024, and messages with length of more than 1024/8=128 characters, collisions are going to be extremely likely." It is certainly true that the set of all 1024 bit messages, if SHA1 hashed, would produce large numbers of collisions. All cryptographic hashes promise is that finding two different messages with the same hash is very difficult. Also note that if every human on Earth sends one message a second, that amounts to about 2^58 messages a year, not enough to raise the probability of a single 128-bit hash collision above 50%.--agr (talk) 23:15, 27 October 2009 (UTC)

Pseudocode formatting

In the 11:23, 2 October 2009 update, the pseudo code was reformatted in such a way that the loops are no longer clear when they occur. The previous version used indentation to show when the loops occur, now it's left up to the reader to determine. I would like to change it back, but want to see if others agree that it is a problem. Xylaan (talk) 19:12, 6 November 2009 (UTC)

New competition dates

A new hash standard, SHA-3, is currently under development – the function will be selected via an open competition running between fall 2008 and 2012.

"fall", is that autumn? Of which hemisphere? Southern or Northern? Is it during the months of "fall 2008", and then again during the months of "fall 2012"? Jesselong (talk) 16:47, 12 January 2010 (UTC)

Good catch. Too many people forget about the whole round-planet thing! Also distracting is the use of future tense for something that started in the past. CosineKitty (talk) 20:48, 12 January 2010 (UTC)
I just made a revision based on your comments. What do you think about it? CosineKitty (talk) 21:07, 12 January 2010 (UTC)
Awesome, thanks for that, looks great! —Preceding unsigned comment added by Jesselong (talkcontribs) 11:10, 14 January 2010 (UTC)

SHA-1 and SHA-2 split

I find that the article is needlessly confusing because most of the information actually only applies to SHA-1. I also think that SHA-0 shouldn't be listed as a hash function in its own right, but rather just a preliminary version of SHA-1, because it was never actually employed in any public standards (as far as I know).

Reasons for splitting the article:

  • The two algorithms aren't really very similar.
  • SHA-2 is currently considered "the" replacement for the insecure SHA-1, so it helps to distinguish the broken function from the secure one.
  • We can now add hash infoboxes to the article without needing to cover both functions.
  • SHA-1 article can be listed in Category: Broken hash functions without implying that SHA-2 is broken.
  • Is consistent with other crypto articles where versions have separate articles, like MD4 and MD5 (the two are quite similar, moreso than SHA-1 and SHA-2), or TEA and XTEA, or Blowfish and Twofish.
  • SHA-3 will be chosen in a competition and will have no resemblance nor relation to SHA-1 or SHA-2, so it will need a separate article anyway.

I have posted initial drafts in my user space for review:

Currently I haven't done too many changes to the either of the forks, just created it to get a rough idea. Obviously, if the split will be carried out, the articles will diverge further (I'm willing to undertake a modest amount of work). But I want somene else's opinion first. -- intgr [talk] 18:41, 31 March 2010 (UTC)

Agree It's a worthwhile split. I would, however, mover some of the SHA-1/SHA2/SHA-3 discussion to a "Secure Hash Algorithm" pseudo-disambiguation page which explains the differences nd summarizes the history. In particular, there was originally "SHA". Very quickly, "SHA-1" was announced, with no reason given. The original was given the retronym SHA-0. May I edit your drafts? 71.41.210.146 (talk) 05:48, 1 April 2010 (UTC)
I was thinking that the SHA-1 lead section can server as such a "pseudo disambiguation". IMO the naming history isn't relevant for SHA-2 because it was published years later.
Sure, you can edit the drafts, but I'd hold off any larger edits until the new articles are in place. -- intgr [talk] 07:20, 1 April 2010 (UTC)

Well nobody has opposed so far and I'm really itching to finish this, so I'm going ahead with the split. -- intgr [talk]

I completed the split today (sorry, wasn't around for the weekend) -- intgr [talk] 17:45, 4 April 2010 (UTC)

Disambiguation text (proposed)

The Secure Hash Algorithm is one of a number of cryptographic hash functions published by the National Institute of Standards and Technology as a U.S. Federal Information Processing Standard. There are currently three generations of Secure Hash Algorithm:

  • SHA-1 is the original 160-bit hash function. Resembling the earlier MD5 algorithm, this was designed by the National Security Agency (NSA) to be part of the Digital Signature Algorithm. Originally just called "SHA", it was withdrawn shortly after publication due to an undisclosed "significant flaw" and replaced by the slightly revised version SHA-1. The original withdrawn algorithm is now known by the retronym "SHA-0".
  • SHA-2 is a family of two similar hash functions, with different block sizes, known as SHA-256 and SHA-512. They differ in the word size; SHA-256 uses 32-bit words where SHA-512 uses 64-bit words. There are also truncated versions of each standardized, known as SHA-224 and SHA-384. These were also designed by the NSA.
  • SHA-3 is a future hash function standard still in development. This is being chosen in a public review process from non-government designers. An ongoing NIST hash function competition is scheduled to end with the selection of a winning function, which will be given the name SHA-3, in 2012.

I'd title the overview article "Secure Hash Algorithm" and make "SHA hash functions" a redirect to that. 71.41.210.146 (talk) 05:29, 2 April 2010 (UTC)

Sounds good. However, previous discussions have preferred using "SHA" in the name, rather than "Secure Hash Algorithm" — because the acronym SHA is ubiquitous, but people often don't even know the expansion. See prior discussions: #Naming and #Naming redundancy.
How about creating the "disambiguation" at SHA hash functions? Or maybe even merge with Sha (disambiguation)? -- intgr [talk] 17:51, 4 April 2010 (UTC)
Er... that's what redirects are for. I strongly prefer (and am about to implement with edits) having the "real" article title being the full formal name, even if that is unfamiliar, and the familiar names can be redirects. 71.41.210.146 (talk) 17:52, 5 April 2010 (UTC)
Well that's against established Wikipedia practice, see WP:COMMONNAME. -- intgr [talk] 19:00, 5 April 2010 (UTC)
You're quite right. Still, given that SHA is out of bounds because it's too short, the copetition is actually between "SHA hash algorithms" and "Secure Hash Algorithm". I don't see how the former is better; It's only marginally shorter, and less obvious if you do know the initialism. (One question is whether the primary page should be Secure Hash Algorithm or Secure hash algorithm; it is a proper name...) 71.41.210.146 (talk) 21:21, 5 April 2010 (UTC)
True, I have changed my mind. Let's leave it there and see if anyone else notices. ;)
Most articles now link to SHA-1 and SHA-2 directly, so I think it doesn't even matter much.
Upper-cased "Secure Hash Algorithm" is correct (like Advanced Encryption Standard, Data Encryption Standard etc) -- intgr [talk] 21:34, 5 April 2010 (UTC)

Archive 1