public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/53133] New: XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
@ 2012-04-27  3:42 adam at consulting dot net.nz
From: adam at consulting dot net.nz @ 2012-04-27  3:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133

             Bug #: 53133
           Summary: XOR AL,AL to zero lower 8 bits of EAX/RAX causes
                    partial register stall (Intel Core 2)
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: adam@consulting.net.nz


Processor is Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

#include <stdint.h>
#include <stdio.h>

uint32_t mem = 0;

int main(void) {
  uint64_t sum=0;
  for (uint32_t i=3000000000; i>0; --i) {
    asm volatile ("" : : : "memory"); //load data from memory each time
    uint64_t data = mem;

    //partial register stall
    sum += (data & UINT64_C(0xFFFFFFFFFFFFFF00)) >> 2;

    //no partial register stall
    //sum += (data >> 2) & UINT64_C(0xFFFFFFFFFFFFFFC0);
  }
  printf("sum is %llu\n", (unsigned long long)sum); //uint64_t need not be unsigned long long
}

$ gcc-4.7 -O3 -std=gnu99 partial_register_stall.c && time ./a.out 
sum is 0

real    0m4.504s
user    0m4.500s
sys    0m0.000s

Each loop iteration is 4.5 cycles.

Relevant assembly code:

  400410:       8b 05 ee 04 20 00       mov    eax,DWORD PTR [rip+0x2004ee]        # 600904 <mem>
  400416:       30 c0                   xor    al,al
  400418:       48 c1 e8 02             shr    rax,0x2
  40041c:       48 01 c6                add    rsi,rax
  40041f:       83 ea 01                sub    edx,0x1
  400422:       75 ec                   jne    400410 <main+0x10>

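mem is zero-extended into RAX by the 32-bit MOV. The lower 8 bits of RAX are
then zeroed via XOR AL,AL, and the result is shifted right by two. Because
XOR AL,AL writes only the low byte, the 64-bit SHR that follows has to read a
register whose parts were last written separately, which on Core 2 causes the
partial register stall.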

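An equivalent way of computing this is to shift right by two first and then
clear the lower six bits: the bits removed by the original mask (bits 0-7) are
exactly those that either fall off the end (bits 0-1) or land in bits 0-5 after
the shift. That is, replace the line: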
   sum += (data & UINT64_C(0xFFFFFFFFFFFFFF00)) >> 2;
with:
   sum += (data >> 2) & UINT64_C(0xFFFFFFFFFFFFFFC0);

$ gcc-4.7 -O3 -std=gnu99 partial_register_stall.c && time ./a.out 
sum is 0

real    0m2.002s
user    0m2.000s
sys    0m0.000s

Each loop iteration is now 2 cycles.

Relevant assembly code:

  400410:       8b 05 fe 04 20 00       mov    eax,DWORD PTR [rip+0x2004fe]        # 600914 <mem>
  400416:       48 c1 e8 02             shr    rax,0x2
  40041a:       48 83 e0 c0             and    rax,0xffffffffffffffc0
  40041e:       48 01 c6                add    rsi,rax
  400421:       83 ea 01                sub    edx,0x1
  400424:       75 ea                   jne    400410 <main+0x10>



Thread overview: 6 messages
2012-04-27  3:42 [Bug tree-optimization/53133] New: XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2) adam at consulting dot net.nz
2012-04-27  9:15 ` [Bug target/53133] " rguenth at gcc dot gnu.org
2012-04-30 13:20 ` ubizjak at gmail dot com
2012-04-30 13:27 ` hjl.tools at gmail dot com
2012-05-01 16:42 ` hjl.tools at gmail dot com
2021-08-15  5:20 ` pinskia at gcc dot gnu.org
