|
It is currently 30 Jul 2010, 11:57
|
View unanswered posts | View active topics
 |
|
 |
|
| Author |
Message |
|
jci
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 09:22 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
Sc00bz wrote: BarsMonster wrote: Make sure to compare your result to BarsWF SSE2  Pshh, you can't compare brute force to rainbow tables and expect it to be near. Brute force does 69% of an MD5 while rainbow tables need to do 100% of an MD5 (to be compatible with RainbowCrack). Also generating the next password for brute force is as simple as adding a constant while rainbow tables need to divide N times. Exactly. It's a real MD5 implementation, not a fake one  Anyhow, the SSE2 thing seems the most interesting, as a multi-hash function could be used in chain generation (do multiple chains at once) I will have a go at MD4, should be simple...
|
|
|
|
 |
|
jci
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 14:13 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
Bitweasil wrote: No real loss to me to post this, I suppose.
Just the reference implementation, single stage. Hardly "optimized." Though a good bit faster than libssl's MD4, as it isn't doing nearly the same amount of work.
b0-15, abcd are 32-bit unsigned integer types.
Data goes in b0 to wherever, padding bit gets set per RFC, length gets set. Remember your little endian byte ordering.
If you have a good compiler, it will optimize out unneeded registers. If not, do it manually.
Thanks for the help, I've got an implementation benchmarking at 8.5 - 9M/sec 
|
|
|
|
 |
|
neinbrucke
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 14:34 |
Joined: 30 Mar 2008, 15:37 Posts: 847
|
|
somehow i failed to implement that code, didn't get the right results :$ (bare with me please, i might miss the obvious) i don't really get what is should put in b0-b15
i started by converting unsigned chars to UINT4's and then put a 4 character (1 uint4) into b0.
you set: // LOAD DATA INTO b0 ... whatever here. b0 = 0x01;
// Padding bit per RFC b4 = 0x00000080; // Length b14 = 128;
are those examples? or should i use the same values? (like b0 = input, b1 = 0x00000080 for padding ?)
|
|
|
|
 |
|
jci
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 14:46 |
Joined: 05 Nov 2007, 01:55 Posts: 103
|
neinbrucke wrote: somehow i failed to implement that code, didn't get the right results :$ (bare with me please, i might miss the obvious) i don't really get what is should put in b0-b15
i started by converting unsigned chars to UINT4's and then put a 4 character (1 uint4) into b0.
you set: // LOAD DATA INTO b0 ... whatever here. b0 = 0x01;
// Padding bit per RFC b4 = 0x00000080; // Length b14 = 128;
are those examples? or should i use the same values? (like b0 = input, b1 = 0x00000080 for padding ?) - First, don't forget processing is done in 64-byte blocks - b0, b1, b2 etc are 4-byte words, and you're getting N chars from your "text" data (char = 1 byte), so you will need to "convert" them - either by pointer reference, copying memory, a loop with either case, etc... - The padding, 0x80 is put after the last char - if your data is 4 chars, the 5th should be 0x80. - b14, Length is data length * 8 (or << 3)
|
|
|
|
 |
|
neinbrucke
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 15:40 |
Joined: 30 Mar 2008, 15:37 Posts: 847
|
so if i have an unsigned char * input which contains "test": i first tried code from the rfc to encode/decode: b0 = ((UINT4)input[0]) | (((UINT4)input[1]) << 8) | (((UINT4)input[2]) << 16) | (((UINT4)input[3]) << 24); but easier was: UINT4 * pUiIn = (UINT4 *) input; b0 = pUiIn[0]; (which i stole from md5_new) gave the same result. so then i should probably do: b1 = 0x80 and then length, is that the default 128 because of padding? or is length, the length of the input, thus 4... and thus; b14 = 4 << 3; (can't test right now, but does it seem like i understand now?  )
|
|
|
|
 |
|
BarsMonster
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 16:50 |
Joined: 30 Aug 2008, 20:01 Posts: 34
|
Quote: Pshh, you can't compare brute force to rainbow tables and expect it to be near. Sorry, you right  I've missed that you are doing that for RT. Anyway, consider working with 12 keys at once - 3 SSE2 registers (Tss! That's top secred optimization)
|
|
|
|
 |
|
Bitweasil
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 17:26 |
Joined: 02 Aug 2008, 08:09 Posts: 214
|
neinbrucke wrote: UINT4 * pUiIn = (UINT4 *) input; b0 = pUiIn[0]; ... b1 = 0x80 ... and then length, is that the default 128 because of padding? or is length, the length of the input, thus 4... and thus; b14 = 4 << 3;
You appear to be pretty close. Yes, if you have 4 input bytes in b0, b1 will be 0x00000080, and length will be the length of your input in bits, so 4*8 = 32. You don't count the padding bit. And, yes, the UINT4 pointer conversion works fine. The RFC implementation uses that bit shifting method so it will work properly on both a big endian and little endian system, but as x86 CPUs are all little endian and video cards as well, that option is just a memcpy, which is faster. The reason for my values was that it was for an 8 byte NTLM input, which is 16 bytes when converted to UTF16, so 128 bits. I've not benchmarked it on a CPU, but I suspect your numbers are around where it should be running. I may throw together a CPU implementation for benchmarking, as it would be good to know what my algorithm runs at on a host CPU for comparison purposes.
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 18:10 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
BarsMonster wrote: Quote: Pshh, you can't compare brute force to rainbow tables and expect it to be near. Sorry, you right  I've missed that you are doing that for RT. Anyway, consider working with 12 keys at once - 3 SSE2 registers (Tss! That's top secred optimization) How is that top secret? 16 registers you have 3*4 working registers and you need 3*1 temp registers to do anything then you can use the extra register for loading the current constant into. Oh no I gave away the secret. Well then on the forth round you need to load -1 into the extra register so you can do a not operation. The top top secret thing is to interlace ... and ... on 32 bit systems. Oh then the other super top top secret thing for brute forcing is to do the thing with the thing to generate the next password. These two things are real and I'll probably tell you if you can't get them. PS BarsMonster it's two things you aren't doing. Ohh right I already said the second one somewhere on this forum (I think).
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 18:19 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
Hmm can't find the post so it may have been lost when the server crashed last time because I know a few posts of mine were lost  or I have no sense of time and it was a long time ago
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
PowerBlade
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 18:38 |
Joined: 11 Oct 2007, 21:17 Posts: 1218 Location: Copenhagen, Denmark
|
Sc00bz wrote: Hmm can't find the post so it may have been lost when the server crashed last time because I know a few posts of mine were lost  or I have no sense of time and it was a long time ago I'm running DB backup everyday around 0:00 UTC so everything from that time and until the crash is lost.
|
|
|
|
 |
|
PowerBlade
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 18:39 |
Joined: 11 Oct 2007, 21:17 Posts: 1218 Location: Copenhagen, Denmark
|
|
What would be the recommendation regarding chain generation to utilize these advanced features? Should each thread work on 2 or more chains in parallel and then take the hash routine combined with the chains together?
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 20:54 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
Each thread for a 64 bit system you would generate 12 chains in parallel (SSE2 interlaced 3 times) and for a 32 bit system you would generate 6 chains in parallel (using "top top secret thing"  ). The hash routine would hash several passwords in parallel. Then you would need to generate the next password from each of the hashes (one at a time  ). Then repeat. When AVX comes around, in 2010, the 32 bit version might be most optimal doing 4 chains in parallel. Ohh right the LM hash routine would do 64, 128 or 192 chains in parallel (you can steal the 64 and 128 code from JTR) and 256 chains in parallel with AVX. I don't know if the 192 chains in parallel code would be any faster than the 128 chains in parallel (probably only faster on 32 bit systems). If someone smart could create an optimized bit slicing algorithm geared towards 8 and 16 registers that would be awesome. Since the current best optimized bit slicing algorithm doesn't take into account a limited number of registers which is OK if you're on a PS3 (128 registers).
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
mleo2003
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 21:15 |
Joined: 19 Sep 2008, 04:42 Posts: 15
|
I'm not sure how helpful this would be, but I found this the other day while searching for more information on Bitslice: http://www.eseidel.com/dev/current/bitslice/His paper goes into great detail on DES, implementation issues, how it could apply to other works, and shows his version that works on AltiVec VPU. If nothing else, it might help others understand it more, as it did me. I think he discussed efficient use of registers in the paper, but I don't believe he found a good solution to it.
|
|
|
|
 |
|
Sc00bz
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 21:29 |
Joined: 03 Dec 2007, 11:37 Posts: 725
|
I just realized that most people probably don't know what bit slicing is. Well DES, the encryption algorithm behind LM hashes, was made to be very slow in software but very fast in hardware. The way they did this is with several P-Boxes (permutation boxes). P-Boxes shuffle the bits of a variable which requires several bit shifts, ands, and ors. With bit slicing each variable is it's own bit of the variable so when you go through a P-Box you don't need to anything besides know that variable X is now the Yth bit. The only problem with DES is the S-Boxes (substitution boxes). S-Boxes take input bits and then substitute them with a value in a lookup table. You can't use a simple lookup table with bit slicing because all the input bits are spread out between several variables. The way around that is to create a logical function to create the same output as the lookup table. This is the heart of bit slicing. The current best is an average of 56 operations/S-Box with standard gates (and, or, and xor) and 51 operations/S-Box with non-standard gates (and, or, xor, andn, and nxor). Since there is no nxor instruction these would be replaced by 2 instructions meaning it's 53.375 operations/S-Box. For more info and code go to http://www.darkside.com.au/bitslice/
_________________ http://www.tobtu.com/
|
|
|
|
 |
|
mleo2003
|
Post subject: Re: Optimized hash routines needed! Posted: 19 Nov 2008, 22:27 |
Joined: 19 Sep 2008, 04:42 Posts: 15
|
|
What finally helped me figure out bitslicing was the fact that it does multiple hashes at one time, by changing the way the data is stored. If you think of the normal way of storing the info to hash as going horizontal, bitslice takes that data, and instead runs it vertically (in a round-about way). Doing it that way, the bit shifts that are so hard to do horizontally one hash at a time can all be done just like Scoobz said, by just moving whole variables around, which does it for every variable there, as well as the xors and such can be done all at one time as well. It makes very efficient use of things, and works a whole lot smarter, not harder.
|
|
|
|
 |
|
|
 |
|
 |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot post attachments in this forum
|
|