public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/50984] New: Boolean return value expression clears register too often
@ 2011-11-03 18:09 drepper.fsp at gmail dot com
2011-11-03 18:11 ` [Bug target/50984] " pinskia at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: drepper.fsp at gmail dot com @ 2011-11-03 18:09 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50984
Bug #: 50984
Summary: Boolean return value expression clears register too often
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: drepper.fsp@gmail.com
Target: x86_64-linux
Compile this code with current HEAD gcc (or 4.5, which I also tried) and
you get less-than-optimal code:
int
f(int a, int b)
{
  return a & 8 && b & 4;
}
For x86-64 I see this asm code:
        xorl    %eax, %eax
        andl    $8, %edi
        je      .L2
        xorl    %eax, %eax      <----- Unnecessary !!!
        andl    $4, %esi
        setne   %al
.L2:
        rep
        ret
The compiler should realize that the second xor is unnecessary.
* [Bug target/50984] Boolean return value expression clears register too often
From: pinskia at gcc dot gnu.org @ 2011-11-03 18:11 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50984
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Component|tree-optimization |target
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-03 18:11:22 UTC ---
IIRC this is a target issue.
* [Bug rtl-optimization/50984] Boolean return value expression clears register too often
From: ubizjak at gmail dot com @ 2011-11-03 20:01 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50984
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2011-11-03
Component|target |rtl-optimization
Ever Confirmed|0 |1
--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2011-11-03 20:01:14 UTC ---
(In reply to comment #1)
> IIRC this is a target issue.
Partially true....
Current tree generates:
        xorl    %eax, %eax      # 50 *movdi_xor [length = 2]
        andl    $8, %edi        # 55 *andsi_2/1 [length = 3]
        je      .L2             # 10 *jcc_1 [length = 2]
        xorl    %eax, %eax      # 46 *movsi_xor [length = 2]
        andl    $4, %esi        # 54 *andsi_2/1 [length = 3]
        setne   %al             # 48 *setcc_qi_slp [length = 3]
.L2:
        rep                     # 56 simple_return_internal_long [length = 2]
        ret
The first XOR is in fact a load of zero in DImode; the second XOR is a load of
zero in SImode. This all happens in the peephole2 pass, which converts
    5 ax:SI=0
      REG_EQUAL: 0
    ...
   40 ax:QI=flags:CCZ!=0
      REG_DEAD: flags:CCZ
   41 ax:SI=zero_extend(ax:QI)
to
   50 {ax:DI=0;clobber flags:CC;}
    ...
   46 {ax:SI=0;clobber flags:CC;}
   49 {flags:CCZ=cmp(si:QI&0x4,0);si:QI=si:QI&0x4;}
   48 strict_low_part=flags:CCZ!=0
We can perform both clears in SImode, as with the following patch:
--cut here--
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 180840)
+++ config/i386/i386.md (working copy)
@@ -17331,7 +17331,7 @@
    && peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 0) (const_int 0))
               (clobber (reg:CC FLAGS_REG))])]
-  "operands[0] = gen_lowpart (word_mode, operands[0]);")
+  "operands[0] = gen_lowpart (SImode, operands[0]);")
 
 (define_peephole2
   [(set (strict_low_part (match_operand 0 "register_operand" ""))
--cut here--
This results in the same assembly, but the _.202r.peephole2 dump is now:
   50 {ax:SI=0;clobber flags:CC;}
    ...
   46 {ax:SI=0;clobber flags:CC;}
   49 {flags:CCZ=cmp(si:QI&0x4,0);si:QI=si:QI&0x4;}
   48 strict_low_part=flags:CCZ!=0
I'd expect the CE3 pass that follows the peephole2 pass to eliminate (insn 46),
but for some reason this doesn't happen.
Confirmed as an rtl-optimization issue.
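To make the redundancy concrete, here is a minimal C model of the emitted sequence. It is only a sketch: the helper names (setcc_low_byte, f_model, check_second_clear_is_redundant) are hypothetical and are not GCC code. It relies on the fact that setne %al writes only the low 8 bits of the register, which is why a clear (or a zero_extend) has to accompany it, and it checks that skipping the second clear never changes the result because %eax is already zero on that path.

```c
#include <assert.h>
#include <stdint.h>

/* Model of setne %al: only bits 0..7 of the register are replaced. */
static uint32_t setcc_low_byte(uint32_t reg, int cond)
{
    return (reg & 0xFFFFFF00u) | (cond ? 1u : 0u);
}

/* Model of the emitted sequence. second_clear selects whether the
   second xorl (insn 46) is executed. */
static uint32_t f_model(uint32_t a, uint32_t b, int second_clear)
{
    uint32_t eax = 0;                          /* xorl %eax, %eax (insn 50) */
    if ((a & 8) == 0)                          /* andl $8, %edi; je .L2 */
        return eax;
    if (second_clear)
        eax = 0;                               /* xorl %eax, %eax (insn 46) */
    return setcc_low_byte(eax, (b & 4) != 0);  /* andl $4, %esi; setne %al */
}

/* Asserts that both variants agree with the C expression from the report
   for every combination of the low four bits of a and b. */
static void check_second_clear_is_redundant(void)
{
    for (uint32_t a = 0; a < 16; a++) {
        for (uint32_t b = 0; b < 16; b++) {
            uint32_t expected = ((a & 8) && (b & 4)) ? 1u : 0u;
            assert(f_model(a, b, 1) == expected);
            assert(f_model(a, b, 0) == expected);
        }
    }
}
```

Calling check_second_clear_is_redundant() from a test driver exercises all 256 input combinations. The model says nothing about why CE3 fails to remove the insn; it only shows that removing it would be safe.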
* [Bug rtl-optimization/50984] Boolean return value expression clears register too often
From: svfuerst at gmail dot com @ 2011-11-05 17:05 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50984
Steven Fuerst <svfuerst at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |svfuerst at gmail dot com
--- Comment #3 from Steven Fuerst <svfuerst at gmail dot com> 2011-11-05 17:05:03 UTC ---
It can be done with zero clears of %eax via the magic of the lea instruction:
        andl    $8, %edi
        andl    $4, %esi
        leal    (%edi, %esi, 2), %eax
        shr     $4, %eax
        ret
However, if the first condition is usually false, this method will be slower.
On the other hand, this version can be easily manipulated to have less register
pressure. (So it may always be better when the function is inevitably
inlined.)
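The arithmetic behind the lea sequence can be sketched in C. This is an illustration only: f_ref, f_lea and check_lea_form are hypothetical names, and unsigned operands are assumed for simplicity. After masking, the first operand is 0 or 8 and the second is 0 or 4, so first + 2*second is 0, 8, or 16, and bit 4 is set only when both tested bits were set.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the expression from the original report. */
static int f_ref(uint32_t a, uint32_t b)
{
    return (a & 8) && (b & 4);
}

/* Branch-free form mirroring andl / andl / leal / shr. */
static int f_lea(uint32_t a, uint32_t b)
{
    uint32_t x = a & 8;        /* andl $8, %edi  -> 0 or 8 */
    uint32_t y = b & 4;        /* andl $4, %esi  -> 0 or 4 */
    uint32_t sum = x + 2 * y;  /* leal (%edi,%esi,2), %eax -> 0, 8, or 16 */
    return (int)(sum >> 4);    /* shr $4, %eax   -> 1 only when sum == 16 */
}

/* Asserts that both forms agree on every combination of the low four bits. */
static void check_lea_form(void)
{
    for (uint32_t a = 0; a < 16; a++)
        for (uint32_t b = 0; b < 16; b++)
            assert(f_lea(a, b) == f_ref(a, b));
}
```

Higher bits of a and b are masked off before the addition, so checking the low four bits covers every case that matters.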
* [Bug rtl-optimization/50984] Boolean return value expression clears register too often
From: pinskia at gcc dot gnu.org @ 2021-08-18 5:33 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50984
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Known to fail| |4.7.4
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So after r0-118261 (this just happens to have that side effect, but it is a
good one), uncprop can happen and we can produce:
  _3 = a_2(D) & 8;
  if (_3 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;
  <bb 3>:
  _5 = b_4(D) & 4;
  _1 = _5 != 0;
  _6 = (int) _1;
  <bb 4>:
  # prephitmp_8 = PHI <_3(2), _6(3)>
The PHI now takes _3 on the edge from bb 2 rather than the constant 0,
and that removes the extra xor.
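A source-level sketch of that difference, assuming the only change is which value flows into the merge point on the false edge. The function names are hypothetical, and an optimizing compiler is free to turn either version into the other, so this illustrates the idea and is not a picture of GCC's internal representation.

```c
#include <assert.h>

/* Constant in the merge: the false path supplies a fresh 0, which on
   x86 can show up as a separate register clear. */
static int f_const_in_phi(int a, int b)
{
    int t = a & 8;
    int r;
    if (t != 0)
        r = (b & 4) != 0;
    else
        r = 0;
    return r;
}

/* Un-propagated form: the false path reuses t, which is known to be 0
   on that edge, so no separate zero has to be materialized. */
static int f_name_in_phi(int a, int b)
{
    int t = a & 8;
    int r;
    if (t != 0)
        r = (b & 4) != 0;
    else
        r = t;
    return r;
}

/* Asserts that both forms compute the same value as a & 8 && b & 4. */
static void check_phi_forms(void)
{
    for (int a = 0; a < 16; a++)
        for (int b = 0; b < 16; b++) {
            int expected = (a & 8) && (b & 4);
            assert(f_const_in_phi(a, b) == expected);
            assert(f_name_in_phi(a, b) == expected);
        }
}
```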
As mentioned, we should do better here though. This form is the best:
unsigned
f1(unsigned a, unsigned b)
{
  return (((a & 8)>>3) & ((b & 4)>>2));
}
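A quick way to convince yourself that f1 computes the same value as the original f is an exhaustive check over the bits that matter. The sketch below copies f and f1 from this thread; count_mismatches is a hypothetical helper added for illustration.

```c
#include <assert.h>

/* Original function from the report. */
static int f(int a, int b)
{
    return a & 8 && b & 4;
}

/* Shift-and-mask form from comment #4. */
static unsigned f1(unsigned a, unsigned b)
{
    return (((a & 8) >> 3) & ((b & 4) >> 2));
}

/* Counts inputs (low four bits of each operand) where f and f1 differ. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (unsigned a = 0; a < 16; a++)
        for (unsigned b = 0; b < 16; b++)
            if ((unsigned)f((int)a, (int)b) != f1(a, b))
                mismatches++;
    return mismatches;
}
```

Both functions look only at bit 3 of a and bit 2 of b, so the 256 combinations tested here cover every distinct case.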