jci
|
Post subject: Re: Speed of Rainbow Tables Posted: 21 Nov 2008, 21:21 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
In reply to both of the posts above: I would love to have a go at it, but I am too broke to get myself a new card... I'm praying for some dough at Christmas... Regarding "worrying about CPU implementations", why not? All I have done is implement MD4/MD5 as per the RFCs, plus a few optimizations for short lengths (0 to 16). The full, clean source code for MD4 is ~300 lines, and it's way faster than openssl's implementation...
|
|
|
|
 |
|
jci
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 00:14 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
This just popped into my mind:
    aMask32 = crc32(nChainPos) ^ nTableIndex;
    aMask128 = _mm_set1_epi32(aMask32);
Simple, yet effective!
Update: adding bench/corrected values
    $ gcc -O3 -march=prescott -msse2 crc32.c -o crc32
    $ ./crc32
    Running benchmark, please wait...
    Hash performance: 68.293M/sec
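A minimal, portable sketch of the mask idea above, for readers who want to try it without SSE. It assumes that crc32() is the standard CRC-32 (reflected polynomial 0xEDB88320) and that the chain position is fed to it as 4 little-endian bytes; neither detail is stated in the post. The helper names crc32_bytes, chain_mask32 and apply_mask128 are made up for illustration. An SSE2 build would replace the loop in apply_mask128 with _mm_set1_epi32 and _mm_xor_si128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Slow but table-free. */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Assumption: the chain position is hashed as 4 little-endian bytes. */
static uint32_t chain_mask32(uint32_t nChainPos, uint32_t nTableIndex)
{
    const uint8_t pos_bytes[4] = {
        (uint8_t)(nChainPos & 0xFFu),
        (uint8_t)((nChainPos >> 8) & 0xFFu),
        (uint8_t)((nChainPos >> 16) & 0xFFu),
        (uint8_t)((nChainPos >> 24) & 0xFFu)
    };
    return crc32_bytes(pos_bytes, sizeof pos_bytes) ^ nTableIndex;
}

/* Scalar stand-in for a 128-bit XOR: the same 32-bit mask is applied
 * to each of the four 32-bit words of an MD4/MD5 digest. */
static void apply_mask128(uint32_t digest[4], uint32_t mask32)
{
    assert(digest != NULL);
    for (int i = 0; i < 4; i++)
        digest[i] ^= mask32;
}
```

Since the mask depends on both the chain position and the table index, two chains that hit the same hash at different positions (or in different tables) get different masked values, which is what a reduction function needs.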
|
|
|
|
 |
|
PowerBlade
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 01:14 |
Joined: 11 Oct 2007, 21:17 Posts: 1218 Location: Copenhagen, Denmark
|
the_drag0n wrote:
"it is true but as there is noone to write cuda stuff we cant use it."

Not exactly true. As Bitweasil says, it's not that hard. The reasons the RainbowCrack code has not been ported to CUDA are:
1) I was busy with the DistrRTgen client (believe me, it took a lot of time!)
2) The current RainbowCrack code doesn't translate very well to CUDA. Doing 64-bit divides on CUDA is almost impossible, according to my results.
So there won't be CUDA code before we find a new reduction function, which is a work in progress.
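To illustrate where the 64-bit divide comes from, here is a rough sketch of a modulo-based reduction of the kind described above: the first 8 bytes of the hash plus the chain position are reduced modulo the keyspace size, and the index is then turned into a plaintext by repeated division. This is an illustration written from the description in this thread, not the actual RainbowCrack source; the function names, the little-endian load and the fixed-length plaintext are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hash -> index. The '%' on a 64-bit value is the expensive step on a GPU. */
static uint64_t hash_to_index(const uint8_t hash[8], uint64_t reduce_offset,
                              uint64_t pos, uint64_t keyspace)
{
    assert(keyspace != 0);
    uint64_t h = 0;
    for (int i = 7; i >= 0; i--)        /* little-endian load of 8 bytes */
        h = (h << 8) | hash[i];
    return (h + reduce_offset + pos) % keyspace;
}

/* Index -> fixed-length plaintext (assumed fixed length for simplicity).
 * 'out' must have room for len + 1 bytes. */
static void index_to_plain(uint64_t index, const char *charset,
                           size_t len, char *out)
{
    size_t n = strlen(charset);
    assert(n > 0);
    for (size_t i = len; i-- > 0; ) {
        out[i] = charset[index % n];    /* more 64-bit divides per character */
        index /= n;
    }
    out[len] = '\0';
}
```

Every chain link pays for one 64-bit modulo in hash_to_index plus one 64-bit divide and modulo per plaintext character, which is why a divide-free reduction function is attractive for a CUDA port.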
|
|
|
|
 |
|
jci
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 16:21 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
Here are preliminary bench results for "octo" MD5 (SSE2) on a single core of a C2D 4500 at 2.2 GHz:
    ./md5_sse
    Running benchmark, please wait...
    Hash performance: 2.347M/sec (18.779M/sec total)
My guess is that a fast quad core CPU could do about 100M/s, which is of the same order of magnitude as a CUDA implementation. I will update with MD4 results soon.
|
|
|
|
 |
|
Corni
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 16:53 |
Joined: 15 Nov 2008, 20:45 Posts: 34
|
|
Hi, will you post your implementation of MD4 as well? It would be interesting to see how much better it performs than the one from Bitweasil. When you say you optimize for lengths <16, do you mean it's not possible to use lengths >=16, or just that it's slower?
Corni
|
|
|
|
 |
|
jci
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 17:08 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
Corni wrote:
"Hi, will you post your implementation of MD4 as well? It would be interesting to see how much better it performs than the one from Bitweasil. When you say you optimize for lengths <16, do you mean it's not possible to use lengths >=16, or just that it's slower? Corni"

It works for any length, though 0 through 15 is faster. The code isn't "production ready" (it needs cleaning up, a merge of the 4/8 hash SSE variants, etc.), so there is no actual release yet, but PM me for a link to the current working version. I'd like to benchmark the one from Bitweasil, if I can find it...
|
|
|
|
 |
|
Corni
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 17:40 |
Joined: 15 Nov 2008, 20:45 Posts: 34
|
|
|
|
 |
|
jci
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 17:56 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
Corni wrote:
"you'll get a PM in a minute. here's Bitweasil's implementation: viewtopic.php?f=6&t=904&start=0#p7940
edit: you have disallowed PMs by normal members..."

Oh, that's just the "reference" algorithm, not a full implementation. If you want a full MD4 (and not a version with reversed steps), you pretty much have to follow that... The optimization lies in how variables are prepared, how memory is moved from one place to another, the order of instructions, etc.
|
|
|
|
 |
|
Bitweasil
|
Post subject: Re: Speed of Rainbow Tables Posted: 22 Nov 2008, 23:52 |
Joined: 02 Aug 2008, 08:09 Posts: 214
|
|
I can benchmark code on a Core 2 Quad 2.33, if anyone is interested. I've got access to a few of those. PM me.
My code is just the reference implementation, with unused inputs zeroed out, and only a single pass. I didn't do anything fancy with it.
With the reduction function, you're probably looking at around 80M links/s on a quadcore, which is quite respectable and a very significant improvement on the RainbowCrack implementation.
|
|
|
|
 |
|
neinbrucke
|
Post subject: Re: Speed of Rainbow Tables Posted: 23 Nov 2008, 00:42 |
Joined: 30 Mar 2008, 15:37 Posts: 847
|
I have a Core 2 Quad 2.66 @ 3.2 GHz on WinXP 32-bit if you want to run a test.
|
|
|
|
 |
|
Bitweasil
|
Post subject: Re: Speed of Rainbow Tables Posted: 23 Nov 2008, 01:21 |
Joined: 02 Aug 2008, 08:09 Posts: 214
|
|
... far, FAR too many people run Windows on otherwise perfectly good hardware.
I suppose I need a copy of Visual Studio.
|
|
|
|
 |
|
mleo2003
|
Post subject: Re: Speed of Rainbow Tables Posted: 23 Nov 2008, 01:37 |
Joined: 19 Sep 2008, 04:42 Posts: 15
|
|
Visual Studio Express is free, and should have everything you need on it.
|
|
|
|
 |
|
jci
|
Post subject: Re: Speed of Rainbow Tables Posted: 23 Nov 2008, 16:10 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
Bitweasil wrote:
"... far, FAR too many people run Windows on otherwise perfectly good hardware.
I suppose I need a copy of Visual Studio."

So true. I just run XP on VirtualBox if I really need to; it takes 3 seconds from click to working desktop. I do my coding on Linux, and I also need to get VS installed for building/testing.
|
|
|
|
 |
|
jci
|
Post subject: Re: Speed of Rainbow Tables Posted: 25 Nov 2008, 01:22 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
Just to sum up a few things on possible optimizations:

- New reduction function
  - one possible method is an XOR mask, along with a fixed plaintext length and a "mapping" of each of at most 16 bytes to the plaintext character at that charset position.
  - another method would be using a PRNG; this would need more details on the specific implementation.
  - expected gain: depends on the hash algorithm being used, but a figure of 30% for the whole chain isn't too far off.
- Computation of chains in parallel
  - to allow the use of SSE-optimized (and possibly CUDA) versions, hashes need to be computed in parallel, each belonging to a different chain.
  - also, the same app could be multi-threaded, using all the CPU cores in a single instance.
  - expected gain: from some benchmarks of an 8-in-1 MD5 SSE2/3 version, expect a 200% to 350% gain for a single core (for the hashing alone).
- Optimized hash routines
  - implementation of optimized hash routines and dropping of openssl.
  - speed gains for single-hash versions vary, but I would expect anything between 10% and 100%. Also, see above for multiple-hash routines.

A note on the first item:
- that idea for a new reduction function would mean fixed-length plaintexts... not sure how this would impact others (1 table per length).
- the relative speed gains can be much higher if hashing (using the new hash routines) is also faster, as the current reduction function would then be "relatively slower", i.e. take a larger share of the time per chain link (not sure if what I typed makes any sense, hopefully it does).

Feel free to discuss these points / add more ideas.
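As a rough illustration of the first item (XOR mask plus fixed-length plaintext), the sketch below XORs the leading hash bytes with a 32-bit mask and looks each result up in a 256-entry byte-to-character table, so no 64-bit divide is needed per chain link. The table is built once with a byte modulo; that choice, the way the mask bytes are cycled, and the names build_byte_map and reduce_xor_mask are assumptions made for this example, not something specified in the post. Note that the modulo mapping is slightly biased whenever 256 is not a multiple of the charset length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PLAIN_LEN 16   /* an MD4/MD5 digest provides 16 bytes */

/* Precompute byte -> charset character once per table (hypothetical helper). */
static void build_byte_map(const char *charset, char map[256])
{
    size_t n = strlen(charset);
    assert(n > 0 && n <= 256);
    for (int b = 0; b < 256; b++)
        map[b] = charset[(size_t)b % n];
}

/* Reduce a 16-byte hash to a fixed-length plaintext.
 * 'out' must have room for plain_len + 1 bytes. */
static void reduce_xor_mask(const uint8_t hash[16], uint32_t mask32,
                            const char map[256], size_t plain_len, char *out)
{
    assert(plain_len <= MAX_PLAIN_LEN);
    for (size_t i = 0; i < plain_len; i++) {
        /* Cycle through the 4 mask bytes, lowest byte first. */
        uint8_t mask_byte = (uint8_t)(mask32 >> (8 * (i % 4)));
        out[i] = map[(uint8_t)(hash[i] ^ mask_byte)];
    }
    out[plain_len] = '\0';
}
```

Because each plaintext character depends on only one hash byte, the same loop can be run over several chains side by side, which fits the "chains in parallel" item as well.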