It is currently 30 Jul 2010, 11:57

All times are UTC + 1 hour [ DST ]




 Page 2 of 5 [ 65 posts ]  Go to page Previous  1, 2, 3, 4, 5  Next
Author Message
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 09:22 
Brute Force

Joined: 05 Nov 2007, 01:55
Posts: 103
Sc00bz wrote:
BarsMonster wrote:
Make sure to compare your result to BarsWF SSE2 :-)

Pshh, you can't compare brute force to rainbow tables and expect it to be near. Brute force does 69% of an MD5 while rainbow tables need to do 100% of an MD5 (to be compatible with RainbowCrack). Also generating the next password for brute force is as simple as adding a constant while rainbow tables need to divide N times.


Exactly. It's a real MD5 implementation, not a fake one ;)
Anyhow, the SSE2 thing seems the most interesting, as a multi-hash function could be used in chain generation (do multiple chains at once)

I will have a go at MD4, should be simple...


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 14:13 
Brute Force

Joined: 05 Nov 2007, 01:55
Posts: 103
Bitweasil wrote:
No real loss to me to post this, I suppose.

Just the reference implementation, single stage. Hardly "optimized." Though a good bit faster than libssl's MD4, as it isn't doing nearly the same amount of work.

b0-15, abcd are 32-bit unsigned integer types.

Data goes in b0 to wherever, padding bit gets set per RFC, length gets set. Remember your little endian byte ordering.

If you have a good compiler, it will optimize out unneeded registers. If not, do it manually.


Thanks for the help, I've got an implementation benchmarking at 8.5 - 9M/sec :D


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 14:34 
Developer

Joined: 30 Mar 2008, 15:37
Posts: 847
somehow i failed to implement that code, didn't get the right results :$ (bare with me please, i might miss the obvious)
i don't really get what is should put in b0-b15

i started by converting unsigned chars to UINT4's and then put a 4 character (1 uint4) into b0.

you set:
// LOAD DATA INTO b0 ... whatever here.
b0 = 0x01;

// Padding bit per RFC
b4 = 0x00000080;
// Length
b14 = 128;

are those examples? or should i use the same values? (like b0 = input, b1 = 0x00000080 for padding ?)


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 14:46 
Brute Force

Joined: 05 Nov 2007, 01:55
Posts: 103
neinbrucke wrote:
somehow i failed to implement that code, didn't get the right results :$ (bare with me please, i might miss the obvious)
i don't really get what is should put in b0-b15

i started by converting unsigned chars to UINT4's and then put a 4 character (1 uint4) into b0.

you set:
// LOAD DATA INTO b0 ... whatever here.
b0 = 0x01;

// Padding bit per RFC
b4 = 0x00000080;
// Length
b14 = 128;

are those examples? or should i use the same values? (like b0 = input, b1 = 0x00000080 for padding ?)


- First, don't forget processing is done in 64-byte blocks
- b0, b1, b2 etc are 4-byte words, and you're getting N chars from your "text" data (char = 1 byte), so you will need to "convert" them - either by pointer reference, copying memory, a loop with either case, etc...
- The padding, 0x80 is put after the last char - if your data is 4 chars, the 5th should be 0x80.
- b14, Length is data length * 8 (or << 3)


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 15:40 
Developer

Joined: 30 Mar 2008, 15:37
Posts: 847
so if i have an unsigned char * input which contains "test":
i first tried code from the rfc to encode/decode:
b0 = ((UINT4)input[0]) | (((UINT4)input[1]) << 8) | (((UINT4)input[2]) << 16) | (((UINT4)input[3]) << 24);

but easier was:
UINT4 * pUiIn = (UINT4 *) input;
b0 = pUiIn[0];
(which i stole from md5_new)
gave the same result.

so then i should probably do:
b1 = 0x80

and then length, is that the default 128 because of padding? or is length, the length of the input, thus 4... and thus; b14 = 4 << 3;

(can't test right now, but does it seem like i understand now? :))


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 16:50 
Guesser

Joined: 30 Aug 2008, 20:01
Posts: 34
Quote:
Pshh, you can't compare brute force to rainbow tables and expect it to be near.


Sorry, you right :-) I've missed that you are doing that for RT.
Anyway, consider working with 12 keys at once - 3 SSE2 registers (Tss! That's top secred optimization)


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 17:26 
Rainbow Table

Joined: 02 Aug 2008, 08:09
Posts: 214
neinbrucke wrote:
UINT4 * pUiIn = (UINT4 *) input;
b0 = pUiIn[0];
...
b1 = 0x80
...
and then length, is that the default 128 because of padding? or is length, the length of the input, thus 4... and thus; b14 = 4 << 3;


You appear to be pretty close. Yes, if you have 4 input bytes in b0, b1 will be 0x00000080, and length will be the length of your input in bits, so 4*8 = 32. You don't count the padding bit. And, yes, the UINT4 pointer conversion works fine. The RFC implementation uses that bit shifting method so it will work properly on both a big endian and little endian system, but as x86 CPUs are all little endian and video cards as well, that option is just a memcpy, which is faster.

The reason for my values was that it was for an 8 byte NTLM input, which is 16 bytes when converted to UTF16, so 128 bits.

I've not benchmarked it on a CPU, but I suspect your numbers are around where it should be running. I may throw together a CPU implementation for benchmarking, as it would be good to know what my algorithm runs at on a host CPU for comparison purposes.


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 18:10 
Developer

Joined: 03 Dec 2007, 11:37
Posts: 725
BarsMonster wrote:
Quote:
Pshh, you can't compare brute force to rainbow tables and expect it to be near.


Sorry, you right :-) I've missed that you are doing that for RT.
Anyway, consider working with 12 keys at once - 3 SSE2 registers (Tss! That's top secred optimization)

How is that top secret? 16 registers you have 3*4 working registers and you need 3*1 temp registers to do anything then you can use the extra register for loading the current constant into. Oh no I gave away the secret. Well then on the forth round you need to load -1 into the extra register so you can do a not operation.

The top top secret thing is to interlace ... and ... on 32 bit systems. Oh then the other super top top secret thing for brute forcing is to do the thing with the thing to generate the next password.

These two things are real and I'll probably tell you if you can't get them. PS BarsMonster it's two things you aren't doing. Ohh right I already said the second one somewhere on this forum (I think).



_________________
http://www.tobtu.com/
Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 18:19 
Developer

Joined: 03 Dec 2007, 11:37
Posts: 725
Hmm can't find the post so it may have been lost when the server crashed last time because I know a few posts of mine were lost :( or I have no sense of time and it was a long time ago



_________________
http://www.tobtu.com/
Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 18:38 
Site Admin

Joined: 11 Oct 2007, 21:17
Posts: 1218
Location: Copenhagen, Denmark
Sc00bz wrote:
Hmm can't find the post so it may have been lost when the server crashed last time because I know a few posts of mine were lost :( or I have no sense of time and it was a long time ago


I'm running DB backup everyday around 0:00 UTC so everything from that time and until the crash is lost.


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 18:39 
Site Admin

Joined: 11 Oct 2007, 21:17
Posts: 1218
Location: Copenhagen, Denmark
What would be the recommendation regarding chain generation to utilize these advanced features? Should each thread work on 2 or more chains in parallel and then take the hash routine combined with the chains together?


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 20:54 
Developer

Joined: 03 Dec 2007, 11:37
Posts: 725
Each thread for a 64 bit system you would generate 12 chains in parallel (SSE2 interlaced 3 times) and for a 32 bit system you would generate 6 chains in parallel (using "top top secret thing" :)).

The hash routine would hash several passwords in parallel.
Then you would need to generate the next password from each of the hashes (one at a time :().
Then repeat.

When AVX comes around, in 2010, the 32 bit version might be most optimal doing 4 chains in parallel.

Ohh right the LM hash routine would do 64, 128 or 192 chains in parallel (you can steal the 64 and 128 code from JTR) and 256 chains in parallel with AVX. I don't know if the 192 chains in parallel code would be any faster than the 128 chains in parallel (probably only faster on 32 bit systems).

If someone smart could create an optimized bit slicing algorithm geared towards 8 and 16 registers that would be awesome. Since the current best optimized bit slicing algorithm doesn't take into account a limited number of registers which is OK if you're on a PS3 (128 registers).



_________________
http://www.tobtu.com/
Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 21:15 
Shoulder Surfer

Joined: 19 Sep 2008, 04:42
Posts: 15
I'm not sure how helpful this would be, but I found this the other day while searching for more information on Bitslice:

http://www.eseidel.com/dev/current/bitslice/

His paper goes into great detail on DES, implementation issues, how it could apply to other works, and shows his version that works on AltiVec VPU. If nothing else, it might help others understand it more, as it did me. I think he discussed efficient use of registers in the paper, but I don't believe he found a good solution to it.


Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 21:29 
Developer

Joined: 03 Dec 2007, 11:37
Posts: 725
I just realized that most people probably don't know what bit slicing is. Well DES, the encryption algorithm behind LM hashes, was made to be very slow in software but very fast in hardware. The way they did this is with several P-Boxes (permutation boxes). P-Boxes shuffle the bits of a variable which requires several bit shifts, ands, and ors. With bit slicing each variable is it's own bit of the variable so when you go through a P-Box you don't need to anything besides know that variable X is now the Yth bit. The only problem with DES is the S-Boxes (substitution boxes). S-Boxes take input bits and then substitute them with a value in a lookup table. You can't use a simple lookup table with bit slicing because all the input bits are spread out between several variables. The way around that is to create a logical function to create the same output as the lookup table. This is the heart of bit slicing. The current best is an average of 56 operations/S-Box with standard gates (and, or, and xor) and 51 operations/S-Box with non-standard gates (and, or, xor, andn, and nxor). Since there is no nxor instruction these would be replaced by 2 instructions meaning it's 53.375 operations/S-Box.

For more info and code go to http://www.darkside.com.au/bitslice/



_________________
http://www.tobtu.com/
Offline
 Profile  
 
 Post subject: Re: Optimized hash routines needed!
PostPosted: 19 Nov 2008, 22:27 
Shoulder Surfer

Joined: 19 Sep 2008, 04:42
Posts: 15
What finally helped me figure out bitslicing was the fact that it does multiple hashes at one time, by changing the way the data is stored. If you think of the normal way of storing the info to hash as going horizontal, bitslice takes that data, and instead runs it vertically (in a round-about way). Doing it that way, the bit shifts that are so hard to do horizontally one hash at a time can all be done just like Scoobz said, by just moving whole variables around, which does it for every variable there, as well as the xors and such can be done all at one time as well. It makes very efficient use of things, and works a whole lot smarter, not harder.


Offline
 Profile  
 
Display posts from previous:  Sort by  
 Page 2 of 5 [ 65 posts ]  Go to page Previous  1, 2, 3, 4, 5  Next

All times are UTC + 1 hour [ DST ]


Who is online

Users browsing this forum: No registered users and 0 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  

cron