public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/53133] New: XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
@ 2012-04-27  3:42 adam at consulting dot net.nz
  2012-04-27  9:15 ` [Bug target/53133] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: adam at consulting dot net.nz @ 2012-04-27  3:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133

             Bug #: 53133
           Summary: XOR AL,AL to zero lower 8 bits of EAX/RAX causes
                    partial register stall (Intel Core 2)
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: adam@consulting.net.nz


Processor is Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

uint32_t mem = 0;

int main(void) {
  uint64_t sum=0;
  for (uint32_t i=3000000000; i>0; --i) {
    asm volatile ("" : : : "memory"); //load data from memory each time
    uint64_t data = mem;

    //partial register stall
    sum += (data & UINT64_C(0xFFFFFFFFFFFFFF00)) >> 2;

    //no partial register stall
    //sum += (data >> 2) & UINT64_C(0xFFFFFFFFFFFFFFC0);
  }
  //PRIu64 is the portable format for uint64_t (%llu mismatches on LP64)
  printf("sum is %" PRIu64 "\n", sum);
}

$ gcc-4.7 -O3 -std=gnu99 partial_register_stall.c && time ./a.out 
sum is 0

real    0m4.504s
user    0m4.500s
sys    0m0.000s

Each loop iteration is 4.5 cycles.

Relevant assembly code:

  400410:       8b 05 ee 04 20 00       mov    eax,DWORD PTR [rip+0x2004ee]    # 600904 <mem>
  400416:       30 c0                   xor    al,al
  400418:       48 c1 e8 02             shr    rax,0x2
  40041c:       48 01 c6                add    rsi,rax
  40041f:       83 ea 01                sub    edx,0x1
  400422:       75 ec                   jne    400410 <main+0x10>

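mem is zero-extended into RAX. The lower 8 bits of RAX are then zeroed via
XOR AL,AL, and the result is shifted right by two. The SHR reads the full RAX
immediately after the partial write to AL, which is where the stall occurs.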

An equivalent way of computing this is to first shift down by two and then mask
the lower six bits to zero. That is, replace the line:
   sum += (data & UINT64_C(0xFFFFFFFFFFFFFF00)) >> 2;
with:
   sum += (data >> 2) & UINT64_C(0xFFFFFFFFFFFFFFC0);

$ gcc-4.7 -O3 -std=gnu99 partial_register_stall.c && time ./a.out 
sum is 0

real    0m2.002s
user    0m2.000s
sys    0m0.000s

Each loop iteration is now 2 cycles.

Relevant assembly code:

  400410:       8b 05 fe 04 20 00       mov    eax,DWORD PTR [rip+0x2004fe]    # 600914 <mem>
  400416:       48 c1 e8 02             shr    rax,0x2
  40041a:       48 83 e0 c0             and    rax,0xffffffffffffffc0
  40041e:       48 01 c6                add    rsi,rax
  400421:       83 ea 01                sub    edx,0x1
  400424:       75 ea                   jne    400410 <main+0x10>



* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: rguenth at gcc dot gnu.org @ 2012-04-27  9:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-04-27
     Ever Confirmed|0                           |1
      Known to fail|                            |4.3.6, 4.6.2, 4.7.0

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-04-27 09:14:45 UTC ---
Confirmed.



* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: ubizjak at gmail dot com @ 2012-04-30 13:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133

--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2012-04-30 13:19:56 UTC ---
This is due to the following splitter in i386.md:

(define_split
  [(set (match_operand 0 "ext_register_operand")
    (and (match_dup 0)
         (const_int -256)))
   (clobber (reg:CC FLAGS_REG))]
  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
   && reload_completed"
  [(set (strict_low_part (match_dup 1)) (const_int 0))]
  "operands[1] = gen_lowpart (QImode, operands[0]);")

However, the Core architecture is not listed under X86_TUNE_PARTIAL_REG_STALL,
although my documentation says that the following latency should be added due
to a partial reg stall:

PPro, P2, P3  : 5
Core          : 1-5
Core2, Corei7 : 1-6

H.J., should we consider these processors as affected by partial reg stall?



* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: hjl.tools at gmail dot com @ 2012-04-30 13:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133

--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> 2012-04-30 13:26:32 UTC ---
(In reply to comment #2)
> 
> H.J., should we consider these processors as affected by partial reg stall?

We will investigate.



* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: hjl.tools at gmail dot com @ 2012-05-01 16:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> 2012-05-01 16:42:32 UTC ---
(In reply to comment #2)
> However, Core architecture is not listed under X86_TUNE_PARTIAL_REG_STALL,
> although my documentation says that following latency should be added due to
> partial reg stall:
> 
> PPro, P2, P3  : 5
> Core          : 1-5
> Core2, Corei7 : 1-6
> 
> H.J., should we consider these processors as affected by partial reg stall?

8-bit/16-bit load ops need to save and restore the upper bits when
updating the lower 8/16 bits.  They are expensive ops on Intel
Core, Core 2 and Core i7 processors.  We will check the overall
impact of X86_TUNE_PARTIAL_REG_STALL on Core i7.



* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: pinskia at gcc dot gnu.org @ 2021-08-15  5:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |5.5.0
             Status|NEW                         |RESOLVED
      Known to work|                            |6.1.0
   Target Milestone|---                         |6.0
           Keywords|                            |missed-optimization
         Resolution|---                         |FIXED

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.L2:
        movl    mem(%rip), %eax
        shrq    $2, %rax
        andq    %rcx, %rax
        addq    %rax, %rsi
        subl    $1, %edx
        jne     .L2

Both versions now generate the same code because of r6-3841, so closing as
fixed.

