public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/53133] New: XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
@ 2012-04-27 3:42 adam at consulting dot net.nz
2012-04-27 9:15 ` [Bug target/53133] " rguenth at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: adam at consulting dot net.nz @ 2012-04-27 3:42 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133
Bug #: 53133
Summary: XOR AL,AL to zero lower 8 bits of EAX/RAX causes
partial register stall (Intel Core 2)
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: adam@consulting.net.nz
Processor is Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
#include <stdint.h>
#include <stdio.h>
uint32_t mem = 0;
int main(void) {
    uint64_t sum = 0;
    for (uint32_t i = 3000000000; i > 0; --i) {
        asm volatile ("" : : : "memory"); //load data from memory each time
        uint64_t data = mem;
        //partial register stall
        sum += (data & UINT64_C(0xFFFFFFFFFFFFFF00)) >> 2;
        //no partial register stall
        //sum += (data >> 2) & UINT64_C(0xFFFFFFFFFFFFFFC0);
    }
    //cast so the argument type matches %llu on every platform
    printf("sum is %llu\n", (unsigned long long)sum);
}
$ gcc-4.7 -O3 -std=gnu99 partial_register_stall.c && time ./a.out
sum is 0
real 0m4.504s
user 0m4.500s
sys 0m0.000s
Each loop iteration is 4.5 cycles.
Relevant assembly code:
  400410:  8b 05 ee 04 20 00    mov    eax,DWORD PTR [rip+0x2004ee]   # 600904 <mem>
  400416:  30 c0                xor    al,al
  400418:  48 c1 e8 02          shr    rax,0x2
  40041c:  48 01 c6             add    rsi,rax
  40041f:  83 ea 01             sub    edx,0x1
  400422:  75 ec                jne    400410 <main+0x10>
mem is zero-extended into RAX. The lower 8 bits of RAX are zeroed via XOR AL,AL.
The result is then shifted right by two.
An equivalent way of computing this is to first shift right by two and then mask
the lower six bits to zero. That is, replace the line:
    sum += (data & UINT64_C(0xFFFFFFFFFFFFFF00)) >> 2;
with:
    sum += (data >> 2) & UINT64_C(0xFFFFFFFFFFFFFFC0);
$ gcc-4.7 -O3 -std=gnu99 partial_register_stall.c && time ./a.out
sum is 0
real 0m2.002s
user 0m2.000s
sys 0m0.000s
Each loop iteration is now 2 cycles.
Relevant assembly code:
  400410:  8b 05 fe 04 20 00    mov    eax,DWORD PTR [rip+0x2004fe]   # 600914 <mem>
  400416:  48 c1 e8 02          shr    rax,0x2
  40041a:  48 83 e0 c0          and    rax,0xffffffffffffffc0
  40041e:  48 01 c6             add    rsi,rax
  400421:  83 ea 01             sub    edx,0x1
  400424:  75 ea                jne    400410 <main+0x10>
* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: rguenth at gcc dot gnu.org @ 2012-04-27 9:15 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-04-27
     Ever Confirmed|0                           |1
      Known to fail|                            |4.3.6, 4.6.2, 4.7.0
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-04-27 09:14:45 UTC ---
Confirmed.
* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: ubizjak at gmail dot com @ 2012-04-30 13:20 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133
--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2012-04-30 13:19:56 UTC ---
This is due to the following splitter in i386.md:
(define_split
  [(set (match_operand 0 "ext_register_operand")
        (and (match_dup 0)
             (const_int -256)))
   (clobber (reg:CC FLAGS_REG))]
  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
   && reload_completed"
  [(set (strict_low_part (match_dup 1)) (const_int 0))]
  "operands[1] = gen_lowpart (QImode, operands[0]);")
However, the Core architecture is not listed under X86_TUNE_PARTIAL_REG_STALL,
although my documentation says that the following latency should be added due
to a partial register stall:
  PPro, P2, P3  : 5
  Core          : 1-5
  Core2, Corei7 : 1-6
H.J., should we consider these processors as affected by partial reg stall?
* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: hjl.tools at gmail dot com @ 2012-04-30 13:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133
--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> 2012-04-30 13:26:32 UTC ---
(In reply to comment #2)
>
> H.J., should we consider these processors as affected by partial reg stall?
We will investigate.
* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: hjl.tools at gmail dot com @ 2012-05-01 16:42 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133
--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> 2012-05-01 16:42:32 UTC ---
(In reply to comment #2)
> However, Core architecture is not listed under X86_TUNE_PARTIAL_REG_STALL,
> although my documentation says that following latency should be added due to
> partial reg stall:
>
> PPro, P2, P3 : 5
> Core : 1-5
> Core2, Corei7 : 1-6
>
> H.J., should we consider these processors as affected by partial reg stall?
8-bit/16-bit load ops need to save and restore the upper bits when
updating the lower 8/16 bits. These are expensive ops on Intel
Core, Core 2 and Core i7 processors. We will check the overall
impact of X86_TUNE_PARTIAL_REG_STALL on Core i7.
* [Bug target/53133] XOR AL,AL to zero lower 8 bits of EAX/RAX causes partial register stall (Intel Core 2)
From: pinskia at gcc dot gnu.org @ 2021-08-15 5:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53133
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |5.5.0
             Status|NEW                         |RESOLVED
      Known to work|                            |6.1.0
   Target Milestone|---                         |6.0
           Keywords|                            |missed-optimization
         Resolution|---                         |FIXED
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.L2:
        movl    mem(%rip), %eax
        shrq    $2, %rax
        andq    %rcx, %rax
        addq    %rax, %rsi
        subl    $1, %edx
        jne     .L2
Both versions now produce the same code because of r6-3841, so closing as
fixed.