On 6/17/07, Ira Rosen wrote:
>
> "Daniel Berlin" wrote on 16/06/2007:
>
> > On 6/16/07, Dorit Nuzman wrote:
> > >
> > > Do you have specific examples where SLP helps performance out of loops?
> >
> > hash calculations.
> >
> > For md5, you can get a 2x performance improvement by straight-line
> > vectorizing it
> > sha1 is about 2-2.5x
> >
> > (This assumes you do good pack/unpack placement using something like
> > lazy code motion)
> >
> > See, for example, http://arctic.org/~dean/crypto/sha1.html
> >
> > (The page is out of date, the technique they explain where they are
> > doing straight line computation of the hash in parallel, is exactly
> > what SLP would provide out of loops)
>
> I looked at the above page (and also at MD5 and SHA1 implementations). I
> found only computations inside loops.

?????
As I said, the implementations rarely use loops, and instead unroll
the entire thing.

While SHA1 can be written (in part) as

0 <= t <= 15:   W[t] = M[t]   // the input message block
16 <= t <= 79:  W[t] = ROL(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1)

For t = 0 to 79:
{
  T = ROL(a, 5) + f_t(b,c,d) + e + K[t] + W[t]
  e = d
  d = c
  c = ROL(b, 30)
  b = a
  a = T
}

You will never see this. Instead, you will see the loop fully unrolled
(all 80 rounds).
Feel free to look at the source to sha1 implementations.

It usually looks like this:

sha1_calculate (block)
{
  /* straight-line round computations, no loops */
}

sha1_input (input)
{
  For each input_block
    sha1_calculate (block);
}

> Could you please explain what exactly you refer to as SLP out of loops in
> this benchmark?

In the actual source code to every single high performance MD5/SHA1
implementation, there are no loops in the calculation function.

I have attached one for you.

Here is an md5 implementation with no loops:

http://www.google.com/codesearch?hl=en&q=+md5.c+show:ZVngmiY1toU:AvE0QNdzNpo:BLUu-vu0yzU&sa=N&cd=2&ct=rc&cs_p=http://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.7.6/source/mozilla-source-1.7.6.tar.bz2&cs_f=mozilla/security/nss/lib/freebl/md5.c#a0

and another:

http://www.google.com/codesearch?hl=en&q=+md5.c+show:N5imOwqyfOo:uFsNBMdLlb0:uDlvT_lhB9A&sa=N&cd=1&ct=rc&cs_p=http://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.8b1/source/mozilla-source-1.8b1.tar.bz2&cs_f=mozilla/db/sqlite3/src/md5.c#a0

These are not hard to find.

They do, in some other function, loop around the input (they have to),
but the reality is that whether or not these are vectorized should not
depend on whether they have loops and whether we inline or not.

The whole purpose of SLP was to enable straight line code
vectorization outside of loops.

Simply because you can't find cases in SPEC2000 doesn't mean it's not
useful.
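
To make the loop-versus-straight-line contrast concrete, here is a minimal sketch in C of the SHA1 round recurrence written both ways. It is not taken from any of the implementations linked above. It covers only rounds 0..19 (where f_t is the "choose" function and K is 0x5A827999) to stay short, and the names rounds_loop, rounds_straight, R0 and sha1_forms_agree are hypothetical, made up for this illustration. The straight-line version uses the common trick of rotating the roles of the five variables instead of moving values between them, so the function body contains no loop at all.

```c
#include <assert.h>
#include <stdint.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Rounds 0..19 only: f_t = (b & c) | (~b & d), K = 0x5A827999. */
#define F0(b, c, d) (((b) & (c)) | (~(b) & (d)))
#define K0 0x5A827999u

/* Loop form: the recurrence as written in the pseudocode. */
static void rounds_loop(uint32_t s[5], const uint32_t W[20])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], e = s[4];
    for (int t = 0; t < 20; t++) {
        uint32_t T = ROL(a, 5) + F0(b, c, d) + e + K0 + W[t];
        e = d;
        d = c;
        c = ROL(b, 30);
        b = a;
        a = T;
    }
    s[0] = a; s[1] = b; s[2] = c; s[3] = d; s[4] = e;
}

/* One round with no data movement: the result accumulates into e,
   and the caller rotates the argument order for the next round. */
#define R0(a, b, c, d, e, t)                                   \
    do {                                                       \
        (e) += ROL((a), 5) + F0((b), (c), (d)) + K0 + W[(t)];  \
        (b) = ROL((b), 30);                                    \
    } while (0)

/* Straight-line form: the same 20 rounds, fully unrolled. */
static void rounds_straight(uint32_t s[5], const uint32_t W[20])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], e = s[4];

    R0(a, b, c, d, e,  0); R0(e, a, b, c, d,  1); R0(d, e, a, b, c,  2);
    R0(c, d, e, a, b,  3); R0(b, c, d, e, a,  4);
    R0(a, b, c, d, e,  5); R0(e, a, b, c, d,  6); R0(d, e, a, b, c,  7);
    R0(c, d, e, a, b,  8); R0(b, c, d, e, a,  9);
    R0(a, b, c, d, e, 10); R0(e, a, b, c, d, 11); R0(d, e, a, b, c, 12);
    R0(c, d, e, a, b, 13); R0(b, c, d, e, a, 14);
    R0(a, b, c, d, e, 15); R0(e, a, b, c, d, 16); R0(d, e, a, b, c, 17);
    R0(c, d, e, a, b, 18); R0(b, c, d, e, a, 19);

    s[0] = a; s[1] = b; s[2] = c; s[3] = d; s[4] = e;
}

/* Returns 1 if both forms produce the same state for an arbitrary input. */
static int sha1_forms_agree(void)
{
    uint32_t W[20];
    uint32_t s1[5] = { 0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                       0x10325476u, 0xC3D2E1F0u };
    uint32_t s2[5] = { 0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                       0x10325476u, 0xC3D2E1F0u };

    for (int t = 0; t < 20; t++)
        W[t] = (uint32_t)t * 0x9E3779B9u;  /* arbitrary test words */

    rounds_loop(s1, W);
    rounds_straight(s2, W);

    for (int i = 0; i < 5; i++)
        if (s1[i] != s2[i])
            return 0;
    return 1;
}
```

Calling sha1_forms_agree() from a test harness should return 1. The point of the sketch is only that rounds_straight is the shape the compiler actually sees in these implementations: one long basic block with no loop to hang a loop vectorizer on. Whether a particular SLP pass finds profitable packs in it is a separate question that this sketch does not answer.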