|
It is currently 30 Jul 2010, 11:53
|
View unanswered posts | View active topics
 |
|
 |
|
| Author |
Message |
|
Sc00bz
|
Post subject: Speed of Rainbow Tables Posted: 13 Aug 2008, 07:30 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
|
So it's been about 3 weeks since I said I'd get real benchmarks of the fastest method for a CPU I could make for a rainbow table generator.
I tested it with MD5 loweralpha-numeric lengths 1-6 chain length of 10,000 with one 32 bit thread. Xero 2.8 GHz got 9.9 million links/sec "990 chains/sec" (CentOS 5.2). Xero 3.2 GHz got 10.8 million links/sec "1,080 chains/sec" (CentOS 5.2, this computer is running Xen with a bunch of virtual machines which is probably why it didn't get 11.3 million links/sec "9.9/2.8*3.2"). P4 2.8 GHz got 10.0 million links/sec "1,000 chains/sec" (Knoppix 5.1).
Yeah I know it's a small key space I was just testing it. Only problem is that it is set to work only for this specific character set (well any character set that is 36 characters) with password lengths 1 to 6. I cheated a little by reversing the MD5 which only works for password lengths 1 to 7 and 8 to 8 (or smaller). If you use a larger length then those you will need to do 62.5 MD5 steps/link instead of 55.5 MD5 steps/link. So the MD5 part will take 12.6% longer. Just like to share that using the prefetchT1 instruction helps a lot for speed.
As for NTLM which uses MD4 it will be faster than the MD5 version since it will only need 43.5 MD4 steps/link.
Good news, the rt calculator is almost done.
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
neinbrucke
|
Post subject: Re: Speed of Rainbow Tables Posted: 13 Aug 2008, 08:38 |
Joined: 30 Mar 2008, 15:37 Posts: 847
|
hey welcome back  if i'm not mistaken, those are some nice figures  a link is a hash+reduction right? i'm curious, what exactly did you reverse&cheat on? and did you rewrite the reduction function, using multiplications only? still compatible with the current rainbowtables?
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Speed of Rainbow Tables Posted: 13 Aug 2008, 14:25 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
neinbrucke wrote: if i'm not mistaken, those are some nice figures  Yes actually it's faster than most brute forcers. neinbrucke wrote: a link is a hash+reduction right? yes. neinbrucke wrote: i'm curious, what exactly did you reverse&cheat on? There are 4 variables a, b, c, and d. Only one variable is modified each step and it repeats every 4 steps with "a, d, c, b" being the pattern. MD4 and MD5 work on 16, 4 byte chunks 00 to 15. Messages are padded with 0x80 and then all zeros. The last 64 bits is the size of the message in bits encoded in little endian. For a password that is 7 characters long it will fill the first 2 (00 and 01), 4 byte chunks with the password then a 0x80 and the size is in chunk 14 and the rest are just zeros. MD5 rounds: Round 1: 00, 01,02,03,04,05,06,07,08,09,10,11,12,13, 14,15 Round 2: 01,06,11, 00,05,10,15,04,09, 14,03,08,13,02,07,12 Round 3: 05,08,11, 14, 01,04,07,10,13, 00,03,06, 09,12,15,02 Round 4: 00,07, 14,05,12, 03,10, 01,08,15,06,13, 04,11, 02,09 << is left rotate Round 1: a = b + ((a + ((b and c) or ((not b) and d)) + data + const) << s) Round 2: a = b + ((a + ((b and d) or (c and (not d))) + data + const) << s) Round 3: a = b + ((a + (b xor c xor d) + data + const) << s) Round 4: a = b + ((a + (c xor (b or (not d))) + data + const) << s) Brute forcing: data[09] in round 3: Last full step you need to do then check variable a (1 in 4 billion chance will need to do the next 3 steps). You hold all the data chunks constant and change only data[00]. Therefore you can reverse almost all of round 4 because all the data besides data[00] is constant. You are left with a = a + data[00]. So at each guess you do a - data[00] which gives you the value of a after the step that uses data[09] in round 3. When you have finish changing data[00] you then need to modify the other data chunk(s) and re-reverse round 4 with the new constant data. Rainbow tables: 64 bit hash needed (Instead of doing a full step for the last step you can do: a = a + (c xor (b or (not d)))): data[03] in round 4: last step you need to do for lengths 1-7 or 8-8 chars use a and d for the 64 bit hash. data[04] in round 4: last step you need to do for lengths 1-35 chars use a and b for the 64 bit hash. 128 bit hash needed (Instead of doing a full step for the last step you can do: a = a + data): data[01] in round 4: last step you need to do for lengths 1-7 or 8-8 chars use a, b, c and d for the 128 bit hash. data[02] in round 4: last step you need to do for lengths 1-35 chars use a, b, c and d for the 128 bit hash. Almost forgot the half of the first step can be pre-calculated (more than likely done for you if you write this in c++ but for my case I did the whole MD5 in x86). a = b + (( a + ((b and c) or ((not b) and d)) + 0xd76aa478 + data[0]) << 7); a = b + (( 0x67452301 + ((0xefcdab89 and 0x98badcfe) or ((not 0xefcdab89) and 0x10325476)) + 0xd76aa478 + data[0]) << 7); // This is just one big constant a = b + (( 0x56e21bef + data[0]) << 7); I only added the data from 00, 01, and 14 since the rest are just zero. As noted before you need to do more steps if the character length is 1-8 or larger (technically 7-8 or larger). MD4 works the same except the data is added in at different steps, only as 3 rounds, and the steps are slightly different. << is left rotate Round 1: a = (a + ((b and c) or ((not b) and d)) + data) << s Round 2: a = (a + (((b and c) or d) and (b or c)) + data + 0x5a827999) << s Round 3: a = (a + (b xor c xor d) + data + 0x6ed9eba1) << s Note that round 2 is a simplified version which is equivalent to the original "a = (a + ((b and c) or (b and d) or (c and d)) + data + 0x5a827999) << s" You can also simplify round 1 to "a = (a + (((c xor d) and b) xor d) + data) << s" but since SSE2 has a "andn" instruction, which does "((not b) and d)", you should use that to make it flow better. This also applies to MD5 rounds 1 and 2. I glossed over this a little if you need more information just look up the MD5 and MD4 algorithms. Reversing is just start at the end with a hash, do every thing in reverse order, and when there's an add you subtract. neinbrucke wrote: and did you rewrite the reduction function, using multiplications only? Yes, I may change it to be compatible with a similar method that uses CUDA's instruction/function __umul24(). Only problem is that this will slow my code down a little. neinbrucke wrote: still compatible with the current rainbowtables? No, you can't have everything it's either speed or compatibility. Well there's the middle ground with maybe 7 million links/sec which would be a few times faster then WinRTGen (1.8 million steps/sec ahh I call them links since we're talking about chains and each "step" would be a link in a chain) which is faster than FRT's but the fast compatible version will only prevent people from going to the really fast version that will work nicely with CUDA. Side note: I sooooo wish there was a SSE2 instruction to multiply 2 vectors of 4, 32 bit integers and return the lower 32 bits then I would totally use the CUDA method. Ah found it in SSE4.1 PMULLD I'm pretty sure this is exactly what I wanted. SSE4.1 isn't supported on AMD processors well I guess that doesn't effect me.
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
PowerBlade
|
Post subject: Re: Speed of Rainbow Tables Posted: 13 Aug 2008, 16:41 |
Joined: 11 Oct 2007, 21:17 Posts: 1218 Location: Copenhagen, Denmark
|
|
Hi Sc00bz.
It sounds like you did some nice work. I fully understand why the tables aren't backwards compatible, but I'm just curious to how the reduction function works now. If your using multiplications as opposed to divisions, i suppose you needed to increase the index size as well to be able to make tables with high keyspaces. Is that what you did? And how does it affect the size of the rainbow tables?
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Speed of Rainbow Tables Posted: 13 Aug 2008, 23:07 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
PowerBlade wrote: If your using multiplications as opposed to divisions, i suppose you needed to increase the index size as well to be able to make tables with high keyspaces. Is that what you did? I did two extra steps for every chain: one at the beginning and one at the end. The step at the beginning is basically the rainbow crack method that uses divide to turn an index into a password. Then the last step turns the hash into an index. So it goes index to password to hash to password to ... to hash to password to hash to index... instead of going index to password to hash to index to ... to password to hash to index. Ohh right the answer is the index size is exactly the same.
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Speed of Rainbow Tables Posted: 14 Aug 2008, 00:15 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
|
Ohh wow umm apparently I'm on crack WTF is "Xero" but anyway the Xeon 3.2 GHz is actually a 3.06 GHz. Which is why it got it's expected 10.8 million links/sec. I found this out because I was trying to see if it had SSE4.1 which was highly unlikely but I tried anyway. I also found out that it doesn't even have SSE3 so this CPU is old as dirt.
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
neinbrucke
|
Post subject: Re: Speed of Rainbow Tables Posted: 14 Aug 2008, 00:37 |
Joined: 30 Mar 2008, 15:37 Posts: 847
|
you really did some cool things, although i'm somewhat doubting about the use in this project. On the other hand, the new rainbowtables aren't compatible with cain and old rcrack either. So as we already entered this state of 'incompatibility', it shouldn't be a problem to have new tables generated in this optimized way : ). I'm curious about powerblades thoughts about this one. Sc00bz wrote: Side note: I sooooo wish there was a SSE2 instruction to multiply 2 vectors of 4, 32 bit integers and return the lower 32 bits then I would totally use the CUDA method. Ah found it in SSE4.1 PMULLD I'm pretty sure this is exactly what I wanted. SSE4.1 isn't supported on AMD processors well I guess that doesn't effect me. i did not know exactly why, but i really wanted one of the new intel cpu's (8xxx and q9xxx series) because of SSE4.1 support... now i'm glad i did 
|
|
|
|
 |
|
the_drag0n
|
Post subject: Re: Speed of Rainbow Tables Posted: 14 Aug 2008, 07:11 |
| Perfect Table |
 |
Joined: 12 May 2008, 11:02 Posts: 829
|
Sc00bz wrote: Side note: I sooooo wish there was a SSE2 instruction to multiply 2 vectors of 4, 32 bit integers and return the lower 32 bits then I would totally use the CUDA method. Ah found it in SSE4.1 PMULLD I'm pretty sure this is exactly what I wanted. SSE4.1 isn't supported on AMD processors well I guess that doesn't effect me. side note: if you do limit the rt production to intels then freert.com will fail. I think there are few people with amd (including me) btw : SSE4.1==SSE4A ??
_________________
|
|
|
|
 |
|
PowerBlade
|
Post subject: Re: Speed of Rainbow Tables Posted: 14 Aug 2008, 07:42 |
Joined: 11 Oct 2007, 21:17 Posts: 1218 Location: Copenhagen, Denmark
|
Sc00bz wrote: PowerBlade wrote: If your using multiplications as opposed to divisions, i suppose you needed to increase the index size as well to be able to make tables with high keyspaces. Is that what you did? I did two extra steps for every chain: one at the beginning and one at the end. The step at the beginning is basically the rainbow crack method that uses divide to turn an index into a password. Then the last step turns the hash into an index. So it goes index to password to hash to password to ... to hash to password to hash to index... instead of going index to password to hash to index to ... to password to hash to index. Ohh right the answer is the index size is exactly the same. Ah.. thats a nice way to do it  Good thinking
|
|
|
|
 |
|
PowerBlade
|
Post subject: Re: Speed of Rainbow Tables Posted: 14 Aug 2008, 07:45 |
Joined: 11 Oct 2007, 21:17 Posts: 1218 Location: Copenhagen, Denmark
|
neinbrucke wrote: you really did some cool things, although i'm somewhat doubting about the use in this project. On the other hand, the new rainbowtables aren't compatible with cain and old rcrack either. So as we already entered this state of 'incompatibility', it shouldn't be a problem to have new tables generated in this optimized way : ). I'm curious about powerblades thoughts about this one.
Well, my thought is that it would be useful. As long as the code can be integrated into rcracki, i don't see a problem with it. How is the cracking time with the new code, Sc00bz? I guess its faster aswell?
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Speed of Rainbow Tables Posted: 14 Aug 2008, 07:49 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
SSE4.1 is not the same as SSE4a. Also there would be different versions. I already know how to check if your CPU supports SSE4.1. I'm not completely sure how to check if your CPU is 64 bit capable. I'd really only use it so I can be like go download the 64 bit version because it's faster and then continue anyway. #include <stdio.h>
int main() { unsigned int out1, out2, out3; unsigned int mmx, sse, sse2, sse41, x86_64, long_mode, long_mode_intel;
#ifdef _WIN32 __asm { mov eax,0x80000001 cpuid mov out3,edx mov eax,1 cpuid mov out1,ecx mov out2,edx } #else asm( "movl $2147483649,%%eax\n\t" "cpuid\n\t" "movl %%edx,%2\n\t" "movl $1,%%eax\n\t" "cpuid\n\t" : "=c"(out1), "=d"(out2), "=m"(out3) // output : // input : "eax", "ebx"); // used #endif sse41 = (out1 >> 19) & 1; mmx = (out2 >> 23) & 1; sse = (out2 >> 25) & 1; sse2 = (out2 >> 26) & 1; x86_64 = (out2 >> 30) & 1; long_mode = (out3 >> 29) & 1; long_mode_intel = (out3 >> 11) & 1; printf("MMX: %u\n" "SSE: %u\n" "SSE2: %u\n" "SSE4.1: %u\n" "x86-64?: %u\n" "long mode?: %u\n" "long mode intel?: %u\n", mmx, sse, sse2, sse41, x86_64, long_mode, long_mode_intel); /* if (sse41) { genChainsSSE41(); } else if (sse2) { genChainsSSE2(); } else if (mmx) { genChainsMMX(); } else { genChains(); } */ return 0; }
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Speed of Rainbow Tables Posted: 14 Aug 2008, 07:54 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
PowerBlade wrote: How is the cracking time with the new code, Sc00bz? I guess its faster aswell? Haven't tried it since it only does loweralpha-numeric 1-6. It will be faster because you can generate the indexes to lookup faster. Well it really depends on which is the bottle neck the CPU or the hard drive.
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Speed of Rainbow Tables Posted: 15 Aug 2008, 10:49 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
|
New benchmark on a CPU that doesn't suck. I always knew a Core 2 would run faster per clock than a P4, but I did not expect it to be over twice as faster per clock.
I tested it with MD5 loweralpha-numeric lengths 1-6 chain length of 10,000 with one 32 bit thread. I just tested it on a core 2 quad 2.5 GHz and got 21.9 million links/sec "2,190 chains/sec" (Knoppix 5.1). That's per core so in total it gets 87.6 million links/sec "8,760 chains/sec."
I thought it might be doing something weird like using multiple cores to run a single thread (some how). So I ran 4 different instances at once with each saying 21.9 million links/sec. Good news is I can test out SSE4.1 and x86-64 code on it too.
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
the_drag0n
|
Post subject: Re: Speed of Rainbow Tables Posted: 15 Aug 2008, 12:06 |
| Perfect Table |
 |
Joined: 12 May 2008, 11:02 Posts: 829
|
THAT is fast  i only get 650 chains/s with 4 * 2,2Ghz  if this will be implented in a newer linux client id like to test it on my quadcore.
_________________
|
|
|
|
 |
|
neinbrucke
|
Post subject: Re: Speed of Rainbow Tables Posted: 15 Aug 2008, 13:36 |
Joined: 30 Mar 2008, 15:37 Posts: 847
|
lol sweet  if you need a core2quad 2.66 ghz... got one 
|
|
|
|
 |
|
|
 |
|
 |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot post attachments in this forum
|
|