Free Rainbow Tables | Forum

Home of the Distributed Generator and Cracker
It is currently 16 Apr 2014, 23:54

All times are UTC + 1 hour [ DST ]




Post new topic Reply to topic  [ 11 posts ] 
Author Message
 Post subject: Compilers
PostPosted: 09 Nov 2010, 02:34 
Offline
MΩth √G∑∏∫∪≤

Joined: 03 Dec 2007, 11:37
Posts: 1059
I think GCC got better at compiling. A few years ago when I tested this GCC would compile "r = a % b; a = a / b;" as two divides if b was not in the stack. I tried testing this in a small scale program but failed to get GCC to generate two divides when using -O3. I thought it was something I was doing so compiled ChainWalkContext.cpp from rcracki_mt 0.6.5.2 and got this:

I compiled it with:
g++ -O3 -S ChainWalkContext.cpp

This is the original:
Code:
...
.L93:
#APP
   mov -4(%rsp), %eax
   xor %edx, %edx
   divl 256(%rcx)
   mov %eax, -4(%rsp)
   mov %edx, -8(%rsp)
#NO_APP
   mov   -8(%rsp), %edx
   movslq   %r11d,%rax
   movzbl   (%rcx,%rdx), %edx
   movb   %dl, 16(%rax,%rbx)
   jmp   .L88
.L71:
...

Then I replaced the inline assembly with:
Code:
__asm__ __volatile__ ("nop"); // marker
nTemp = nIndexOfX32 % m_vCharset[j].m_nPlainCharsetLen;
nIndexOfX32 /= m_vCharset[j].m_nPlainCharsetLen;
__asm__ __volatile__ ("nop"); // marker
m_Plain[i] = m_vCharset[j].m_PlainCharset[nTemp];

and compiled to this which is the same except it has two less movs and two movs using registers instead of memory:
Code:
...
.L93:
#APP
   nop
#NO_APP
   xorl   %edx, %edx
   movl   %r8d, %eax
   divl   256(%r9)
   movl   %eax, %r8d
#APP
   nop
#NO_APP
   mov   %edx, %edx
   movslq   %r11d,%rcx
   movzbl   (%r9,%rdx), %eax
   movb   %al, 16(%rcx,%rbx)
   jmp   .L88
.L71:
...


At the very least change it to:
Code:
__asm__ __volatile__ ("xor %%edx, %%edx\n\t"
                      "divl %3"
                      : "=a"(nIndexOfX32), "=d"(nTemp)
                      : "a"(nIndexOfX32), "rm"(m_vCharset[j].m_nPlainCharsetLen)
                      : );

Which compiles to only one extra register mov which doesn't make sense to me... you'll see what I mean:
Code:
.L93:
   movl   %r8d, %eax
#APP
   xor %edx, %edx
   divl 256(%r9)
#NO_APP
   mov   %edx, %edx  <---- ?
   movl   %eax, %r8d
   movslq   %r11d,%rcx
   movzbl   (%r9,%rdx), %eax
   movb   %al, 16(%rcx,%rbx)
   jmp   .L88
.L71:

_________________
http://www.tobtu.com/


Top
 Profile  
 
 Post subject:
Posted: 10 Nov 2010, 05:21 


Top
  
 
 Post subject: Re: Compilers
PostPosted: 10 Nov 2010, 05:21 
Offline
MΩth √G∑∏∫∪≤

Joined: 03 Dec 2007, 11:37
Posts: 1059
Apparently GCC understands "c64 = (uint64) a32 * b32;" as just one instruction 32 bit multiply "mull."

main.cpp:
Code:
#include <stdio.h>

#ifdef _WIN32
   #define uint32    unsigned __int32
   #define uint64    unsigned __int64
#else
   #define uint32    unsigned int
   #define uint64    unsigned long long
#endif

union uint64u
{
   uint64 u64;
   struct
   {
//#ifdef BIG_ENDIAN // TODO find how to test if compiled for big or little endian
//      uint32 hi32, lo32; // Hmm apparently BIG_ENDIAN is 4321 on some systems
//#else
      uint32 lo32, hi32;
//#endif
   };
};

int main()
{
   uint64u r64;
   unsigned int index1_32 = 0x4621d373, charSetLen = 36, chars;
   char charSet[] = "0123456789abcdefghijklmnopqrstuvwxyz";
   char pw[32];

   scanf("%u", &chars);
   if (chars > 31)
   {
      chars = 31;
   }
   r64.lo32 = index1_32;
   for (int a = 0; a < chars; a++)
   {
      r64.u64 = (uint64) r64.lo32 * charSetLen;
      pw[a] = charSet[r64.hi32];
   }
   pw[chars] = 0;
   printf("\"%s\"\n", pw);
   return 0;
}

g++ -S -O3 main.cpp
Code:
...
L3:
   movl   -32(%ebp), %eax
   mull   %edi
   movl   %eax, -32(%ebp)
   movl   %edx, -28(%ebp)
   movl   -28(%ebp), %eax
   movzbl   -101(%ebp,%eax), %eax
   movb   %al, (%ebx,%ecx)
   addl   $1, %ecx
   cmpl   %ecx, %esi
   ja   L3
...

_________________
http://www.tobtu.com/


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 10 Nov 2010, 05:26 
Offline
MΩth √G∑∏∫∪≤

Joined: 03 Dec 2007, 11:37
Posts: 1059
OK I guess I have no clue how to test for things during compiling.

So how do you test for:
* Big or little endian
* x86 vs non-x86, #ifdef __i386__ ?
* 64 bit x86 vs 32 bit x86, #ifdef _LP64 ?

_________________
http://www.tobtu.com/


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 10 Nov 2010, 05:30 
Offline
Total Hash Enlightenment

Joined: 15 Jul 2009, 22:38
Posts: 1483
Location: Dallas, TX, USA
[quote="Sc00bz"]

Code:
//#ifdef BIG_ENDIAN // TODO find how to test if compiled for big or little endian
//      uint32 hi32, lo32; // Hmm apparently BIG_ENDIAN is 4321 on some systems
//#else
      uint32 lo32, hi32;
//#endif


Doing it during runtime is trivial but during build time is a bit more of a pain. This poor hack is something I came up with a while back but I dropped the big endian thing for a while. This is if you want to do it at build time and you don't want to do the whole configure/autoconf thing. Feel free to consider this file to be in the public domain with no (C) attribution necessary.


Attachments:
File comment: endian testing at build time
endian.hpp [1.89 KiB]
Downloaded 225 times
Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 10 Nov 2010, 05:33 
Offline
Total Hash Enlightenment

Joined: 15 Jul 2009, 22:38
Posts: 1483
Location: Dallas, TX, USA
Sc00bz wrote:
OK I guess I have no clue how to test for things during compiling.

So how do you test for:
* Big or little endian
* x86 vs non-x86, #ifdef __i386__ ?
* 64 bit x86 vs 32 bit x86, #ifdef _LP64 ?


Endianess is ugly and in another post.

For VC++/gcc (and some others):

64-bit
VC++: #ifdef _M_X64
gcc: #ifdef __x86_64__

32-bit
VC++: #ifdef _M_IX86
gcc: #ifdef __i386__


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 10 Nov 2010, 05:39 
Offline
Total Hash Enlightenment

Joined: 15 Jul 2009, 22:38
Posts: 1483
Location: Dallas, TX, USA
Sc00bz wrote:
So how do you test for:
* x86 vs non-x86, #ifdef __i386__ ?
* 64 bit x86 vs 32 bit x86, #ifdef _LP64 ?


_LP64 is VC++ only I think (maybe others?) but includes things like itanium and x86_64 which is kind of great hilarity.

For the x86/x86_64 split see global.h from rcracki_mt. You only really have to watch your use of size_t, time_t, long, etc. I posted about the whole long, etc. 32/64 hilarity in the never ending CUDA thread I believe.


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 10 Nov 2010, 05:42 
Offline
MΩth √G∑∏∫∪≤

Joined: 03 Dec 2007, 11:37
Posts: 1059
Awesome thanks a bunch!

_________________
http://www.tobtu.com/


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 10 Nov 2010, 05:45 
Offline
Total Hash Enlightenment

Joined: 15 Jul 2009, 22:38
Posts: 1483
Location: Dallas, TX, USA
Sc00bz wrote:
Apparently GCC understands "c64 = (uint64) a32 * b32;" as just one instruction 32 bit multiply "mull."


It should if a32 and b32 are both 32-bit. In c both * and type casts have the same precedence. In c++ if you use items like dynamic_cast, static_cast, etc. they have higher precedence than *.


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 15 Nov 2010, 02:33 
Offline
Total Hash Enlightenment

Joined: 15 Jul 2009, 22:38
Posts: 1483
Location: Dallas, TX, USA
Sc00bz wrote:
Then I replaced the inline assembly with:
Code:
__asm__ __volatile__ ("nop"); // marker
nTemp = nIndexOfX32 % m_vCharset[j].m_nPlainCharsetLen;
nIndexOfX32 /= m_vCharset[j].m_nPlainCharsetLen;
__asm__ __volatile__ ("nop"); // marker
m_Plain[i] = m_vCharset[j].m_PlainCharset[nTemp];

and compiled to this which is the same except it has two less movs and two movs using registers instead of memory:
Code:
...
.L93:
#APP
   nop
#NO_APP
   xorl   %edx, %edx
   movl   %r8d, %eax
   divl   256(%r9)
   movl   %eax, %r8d
#APP
   nop
#NO_APP
   mov   %edx, %edx
   movslq   %r11d,%rcx
   movzbl   (%r9,%rdx), %eax
   movb   %al, 16(%rcx,%rbx)
   jmp   .L88
.L71:
...


Which version of gcc? Also, temp registers to avoid data dependency stalls in the pipeline like r8, r9, etc. are used a lot on x86_64. That won't help the win32 or i686 users. Also, as much as I use to put faith into gcc -S...you can't on x86/x86_64 not with all the micro-ops and re-ordering it does at execution. While that code would seem to be optimal when you compile with -g and run the thing through gdb and then disas once you get to the relevant break point, the picture isn't so rosy.

Sc00bz wrote:
At the very least change it to:
Code:
__asm__ __volatile__ ("xor %%edx, %%edx\n\t"
                      "divl %3"
                      : "=a"(nIndexOfX32), "=d"(nTemp)
                      : "a"(nIndexOfX32), "rm"(m_vCharset[j].m_nPlainCharsetLen)
                      : );

Which compiles to only one extra register mov which doesn't make sense to me... you'll see what I mean:
Code:
.L93:
   movl   %r8d, %eax
#APP
   xor %edx, %edx
   divl 256(%r9)
#NO_APP
   mov   %edx, %edx  <---- ?
   movl   %eax, %r8d
   movslq   %r11d,%rcx
   movzbl   (%r9,%rdx), %eax
   movb   %al, 16(%rcx,%rbx)
   jmp   .L88
.L71:


That does look nicer and is easier to read. I benchmarked the changes on x86_64 linux and could find no measurable difference running single thread runs against all hash algorithms with hashes designed not to be found so it had to go through the entire set. Though, not all table sets of course. I did multiple runs of the old version and the new version and averaged the runs. I repeated the previous mentioned tests with the straight c non-asm version and suffered 20% or worse execution times across every algorithm and every single run.

That mov %edx, %edx it threw in might be some sort of pipeline data dependency nop or cache alignment fu that we don't grok.

I did think your changes would give a boost which is why I ran multiple tests because I was boggled that it didn't. Especially, on the input side changing "m"(m_vCharset[j].m_nPlainCharsetLen) to "rm"(m_vCharset[j].m_nPlainCharsetLen) on x86_64 should have given it more register flexibility tho GPRs on x86 is such a misnomer since divide needs edx:eax or rdx:rax so perhaps I shouldn't have expected a change.

I haven't tried this on i686 builds but I have no hope of it helping there if it didn't help on x86_64. Also, how in the world do you convert such proper, strict, and nice AT&T asm syntax to the equiv Intel x86? Are you always at the whim of the assembler for hints about registers, memory, etc?


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 15 Nov 2010, 05:27 
Offline
MΩth √G∑∏∫∪≤

Joined: 03 Dec 2007, 11:37
Posts: 1059
quel wrote:
Which version of gcc?

g++ (GCC) 4.1.2 20080704

quel wrote:
Also, how in the world do you convert such proper, strict, and nice AT&T asm syntax to the equiv Intel x86?

AT&T syntax has the inputs reverse order of Intel syntax: "movl %eax,%ebx" vs "mov ebx,eax" both are doing "ebx = eax;"

For things where there are no registers to tell the assembler the size like "divl 256(%r9)" size is only known because of the L in divL but in Intel it's "div DWORD PTR [r9+256]" (there's BYTE, WORD, DWORD, and QWORD vs b, w, L, and LL).

Memory addresses "segment_register:offset(base_register,index_register,scale)" vs "[segment_register:base_register+scale*index_register+offset]"

Almost forgot immediate values "movl $0xff,%eax" vs "mov eax,ffh"

quel wrote:
Are you always at the whim of the assembler for hints about registers, memory, etc?

For GCC you can say exactly which register a, b, c, d, D, and S (case sensitive) for ?ax, ?bx, ?cx, ?dx, ?di, and ?si (I don't know if you can tell it to use r8 to r15) or you can use g?... (haha just looked this one up it's the exact opposite of what I thought g is anything that is not a general purpose register).
Go here for more info on the constraints for inputs and outputs: http://www.ibiblio.org/gferg/ldp/GCC-In ... TO.html#s6

For VC++ you just pick them and the compiler works around you, but sometimes it says things like warning ebx is used in your inline assembly blah blah.

_________________
http://www.tobtu.com/


Top
 Profile  
 
 Post subject: Re: Compilers
PostPosted: 15 Nov 2010, 07:32 
Offline
Total Hash Enlightenment

Joined: 15 Jul 2009, 22:38
Posts: 1483
Location: Dallas, TX, USA
Sc00bz wrote:
g++ (GCC) 4.1.2 20080704


You really should upgrade my "ancient" debian stable is 4.3.2 (4.3.x, 4.4.x, and 4.5.x all have marked improvement in the optimizations.)

Sc00bz wrote:
AT&T syntax has the inputs reverse order of Intel syntax: "movl %eax,%ebx" vs "mov ebx,eax" both are doing "ebx = eax;"


That part I remember - that asm is right above the AT&T in the code and if this was irc I'd smack you with a trout :P

Sc00bz wrote:
For things where there are no registers to tell the assembler the size like "divl 256(%r9)" size is only known because of the L in divL but in Intel it's "div DWORD PTR [r9+256]" (there's BYTE, WORD, DWORD, and QWORD vs b, w, L, and LL).


Oh ya I don't miss that sort of syntax DWORD PTR (I did have to do intel x86 assembly for my CS degree a while ago guh the horror lives on.)

Sc00bz wrote:
For GCC you can say exactly which register a, b, c, d, D, and S (case sensitive) for ?ax, ?bx, ?cx, ?dx, ?di, and ?si (I don't know if you can tell it to use r8 to r15) or you can use g?... (haha just looked this one up it's the exact opposite of what I thought g is anything that is not a general purpose register).
Go here for more info on the constraints for inputs and outputs: http://www.ibiblio.org/gferg/ldp/GCC-In ... TO.html#s6


Just plain r is any GPR I believe. r8 to r15 are only 20+ years late. I think MIPS, PPC, and SPARC64 all have 32+ GPRs. Also, their GPR really is...we still have all the legacy of fighting for eax/ebx/ecx/edx and now rax/rbx/rcx/rdx as hard coded registers for some opcodes. Register starvation is just annoying as it makes me think of the Celeron...lets just starve it of L2 cache and laugh.

Ah yes I looked up the SPARC64...160 GPRs though only 32 visible at a time (and only 31 usable) intended for rapid context switching as you don't have to push/pop the stack. Apparently IA-64 implemented this too. This also reminds that this is why amd64/em64t went +8 instead of +24 because every additional register means a bigger penalty on context switches for the x86/x86_64 arch.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 11 posts ] 

All times are UTC + 1 hour [ DST ]


Who is online

Users browsing this forum: Google [Bot] and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group