public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0)
@ 2022-04-21 17:47 denis.campredon at gmail dot com
2022-04-22 6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
` (14 more replies)
0 siblings, 15 replies; 16+ messages in thread
From: denis.campredon at gmail dot com @ 2022-04-21 17:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
Bug ID: 105338
Summary: Regression: jump or cmove generated for pattern (x ?
CST : 0)
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: denis.campredon at gmail dot com
Target Milestone: ---
int f(int i) {
return i ? 5 : 0;
}
int g(int i) {
return i ? -2 : 0;
}
int h(int b) {
return !!b * -2;
}
int i(int b) {
return !!b * 5;
}
int j(int b) {
if (!b) return 0;
return -2;
}
-------------
With -02 gcc 11.2 the five functions above output branchless code like:
f(int):
neg edi
sbb eax, eax
and eax, 5
ret
g(int):
neg edi
sbb eax, eax
and eax, -2
ret
...
--------------
Whereas -02 gcc 12 now outputs a cmov or jump depending of sign of the
constant:
f(int):
mov eax, edi
test edi, edi
mov edx, 5
cmovne eax, edx
ret
g(int):
mov eax, edi
test edi, edi
jne .L11
ret
.L11:
mov eax, -2
ret
h(int):
mov eax, edi
test edi, edi
jne .L17
ret
.L17:
mov eax, -2
ret
i(int):
mov eax, edi
test edi, edi
mov edx, 5
cmovne eax, edx
ret
j(int):
mov eax, edi
test edi, edi
mov edx, -2
cmovne eax, edx
ret
---------------------------
Alternatively, with the following code
int k(int b) {
bool b2 = b;
return b2 * 5;
}
----------------
Both gcc 12 and 11.2 are outputing
k(int):
xor eax, eax
test edi, edi
setne al
lea eax, [rax+rax*4]
ret
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
@ 2022-04-22 6:13 ` rguenth at gcc dot gnu.org
2022-04-22 9:17 ` jakub at gcc dot gnu.org
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22 6:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Regression: jump or cmove |[12 Regression] Regression:
|generated for pattern (x ? |jump or cmove generated for
|CST : 0) |pattern (x ? CST : 0)
Target Milestone|--- |12.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
2022-04-22 6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
@ 2022-04-22 9:17 ` jakub at gcc dot gnu.org
2022-04-22 9:48 ` pinskia at gcc dot gnu.org
` (12 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 9:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|needs-bisection |
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r12-7687-g3a7ba8fd0cda387809e4902328af2473662b6a4a
All the *.optimized dump differences with that change look like:
@@ -8,14 +8,14 @@ int f (int i)
<bb 2> [local count: 1073741824]:
if (i_2(D) != 0)
- goto <bb 4>; [35.00%]
+ goto <bb 3>; [35.00%]
else
- goto <bb 3>; [65.00%]
+ goto <bb 4>; [65.00%]
- <bb 3> [local count: 697932184]:
+ <bb 3> [local count: 375809640]:
<bb 4> [local count: 1073741824]:
- # iftmp.0_1 = PHI <5(2), i_2(D)(3)>
+ # iftmp.0_1 = PHI <5(3), i_2(D)(2)>
return iftmp.0_1;
}
i.e. swapping whether true or false edge goes through the empty bb. Wonder why
we emit different code based on that...
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
2022-04-22 6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
2022-04-22 9:17 ` jakub at gcc dot gnu.org
@ 2022-04-22 9:48 ` pinskia at gcc dot gnu.org
2022-04-22 11:00 ` jakub at gcc dot gnu.org
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-04-22 9:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this is the same issue as PR 105314 really.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (2 preceding siblings ...)
2022-04-22 9:48 ` pinskia at gcc dot gnu.org
@ 2022-04-22 11:00 ` jakub at gcc dot gnu.org
2022-04-22 11:11 ` rguenth at gcc dot gnu.org
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 11:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, if I change passes.def
NEXT_PASS (pass_sink_code, true /* unsplit edges */);
line
NEXT_PASS (pass_sink_code, false /* unsplit edges */);
I get back the 11.x code.
Before ce1 pass, in that case say f looks like:
8: flags:CCZ=cmp(r83:SI,0)
9: pc={(flags:CCZ!=0)?L22:pc}
REG_DEAD flags:CCZ
REG_BR_PROB 375809644
10: NOTE_INSN_BASIC_BLOCK 3
5: r82:SI=0
REG_DEAD r83:SI
; pc falls through to BB 5
22: L22:
21: NOTE_INSN_BASIC_BLOCK 4
4: r82:SI=0x5
15: L15:
18: NOTE_INSN_BASIC_BLOCK 5
use (r82)
i.e. essentially
if (tmp83 != 0)
tmp82 = 5;
else
tmp82 = 0;
But with vanilla trunk, it is instead:
7: flags:CCZ=cmp(r83:SI,0)
8: pc={(flags:CCZ==0)?L10:pc}
REG_DEAD flags:CCZ
REG_BR_PROB 697932188
9: NOTE_INSN_BASIC_BLOCK 4
4: r83:SI=0x5
10: L10:
11: NOTE_INSN_BASIC_BLOCK 5
use (r83)
i.e.
if (tmp83 != 0)
tmp83 = 5;
Before fwprop1, the code looks roughly the same except for the swapped
branches,
so in C
if (tmp83 != 0)
tmp82 = 5;
else
tmp82 = tmp83;
vs.
if (tmp83 == 0)
tmp82 = tmp83;
else
tmp82 = 5;
I think fwprop1 only works on extended basic blocks and therefore turns the
latter into
if (tmp83 == 0)
tmp82 = 0;
else
tmp82 = 5;
and not the former.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (3 preceding siblings ...)
2022-04-22 11:00 ` jakub at gcc dot gnu.org
@ 2022-04-22 11:11 ` rguenth at gcc dot gnu.org
2022-04-22 11:14 ` rguenth at gcc dot gnu.org
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22 11:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Last reconfirmed| |2022-04-22
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think none is more canonical so I wonder if there's an easy way to have ifcvt
recognize both? But yes, it's probably a duplicate, but at least for x86 and
with complete testcases.
Note one of the issues is that creating/removing forwarders can end up swapping
things. To avoid this (for GCC 13) we can try to not as aggressively remove
them (basically keep critical edges split) and maybe mark (temporarily) created
forwarders with a flag so that CFG cleanup can pick the "correct" one when
it is used as vehicle to remove them again (as in this case).
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (4 preceding siblings ...)
2022-04-22 11:11 ` rguenth at gcc dot gnu.org
@ 2022-04-22 11:14 ` rguenth at gcc dot gnu.org
2022-04-22 11:20 ` rguenth at gcc dot gnu.org
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22 11:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #3)
> Before fwprop1, the code looks roughly the same except for the swapped
> branches,
> so in C
> if (tmp83 != 0)
> tmp82 = 5;
> else
> tmp82 = tmp83;
> vs.
> if (tmp83 == 0)
> tmp82 = tmp83;
> else
> tmp82 = 5;
Specifically for this we could also look to canonicalize this in
uncprop which introduces the copies, making sure to rewrite this into
an equality compare?
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (5 preceding siblings ...)
2022-04-22 11:14 ` rguenth at gcc dot gnu.org
@ 2022-04-22 11:20 ` rguenth at gcc dot gnu.org
2022-04-22 12:28 ` jakub at gcc dot gnu.org
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22 11:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Jakub Jelinek from comment #3)
> > Before fwprop1, the code looks roughly the same except for the swapped
> > branches,
> > so in C
> > if (tmp83 != 0)
> > tmp82 = 5;
> > else
> > tmp82 = tmp83;
> > vs.
> > if (tmp83 == 0)
> > tmp82 = tmp83;
> > else
> > tmp82 = 5;
>
> Specifically for this we could also look to canonicalize this in
> uncprop which introduces the copies, making sure to rewrite this into
> an equality compare?
When un-propagating into a PHI in uncprop_into_successor_phis mark the
BB of the PHI node.
In uncprop_dom_walker::before_dom_children, when it is marked, match
a diamond/half-diamond with the immediate dominator and do the rewrite
of the conditional and swapping of edge flags.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (6 preceding siblings ...)
2022-04-22 11:20 ` rguenth at gcc dot gnu.org
@ 2022-04-22 12:28 ` jakub at gcc dot gnu.org
2022-04-22 13:59 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 12:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/config/i386/i386-expand.cc.jj 2022-04-13 15:42:39.000000000 +0200
+++ gcc/config/i386/i386-expand.cc 2022-04-22 14:18:27.347135185 +0200
@@ -3136,6 +3136,8 @@ ix86_expand_int_movcc (rtx operands[])
bool sign_bit_compare_p = false;
rtx op0 = XEXP (operands[1], 0);
rtx op1 = XEXP (operands[1], 1);
+ rtx op2 = operands[2];
+ rtx op3 = operands[3];
if (GET_MODE (op0) == TImode
|| (GET_MODE (op0) == DImode
@@ -3153,17 +3155,29 @@ ix86_expand_int_movcc (rtx operands[])
|| (op1 == constm1_rtx && (code == GT || code == LE)))
sign_bit_compare_p = true;
+ /* op0 == op1 ? op0 : op3 is equivalent to op0 == op1 ? op1 : op3,
+ but if op1 is a constant, the latter form allows more optimizations,
+ either through the last 2 ops being constant handling, or the one
+ constant and one variable cases. On the other side, for cmov the
+ former might be better as we don't need to load the constant into
+ another register. */
+ if (code == EQ && CONST_INT_P (op1) && rtx_equal_p (op0, op2))
+ op2 = op1;
+ /* Similarly for op0 != op1 ? op2 : op0 and op0 != op1 ? op2 : op1. */
+ else if (code == NE && CONST_INT_P (op1) && rtx_equal_p (op0, op3))
+ op3 = op1;
+
/* Don't attempt mode expansion here -- if we had to expand 5 or 6
HImode insns, we'd be swallowed in word prefix ops. */
if ((mode != HImode || TARGET_FAST_PREFIX)
&& (mode != (TARGET_64BIT ? TImode : DImode))
- && CONST_INT_P (operands[2])
- && CONST_INT_P (operands[3]))
+ && CONST_INT_P (op2)
+ && CONST_INT_P (op3))
{
rtx out = operands[0];
- HOST_WIDE_INT ct = INTVAL (operands[2]);
- HOST_WIDE_INT cf = INTVAL (operands[3]);
+ HOST_WIDE_INT ct = INTVAL (op2);
+ HOST_WIDE_INT cf = INTVAL (op3);
HOST_WIDE_INT diff;
diff = ct - cf;
@@ -3559,6 +3573,9 @@ ix86_expand_int_movcc (rtx operands[])
if (BRANCH_COST (optimize_insn_for_speed_p (), false) <= 2)
return false;
+ operands[2] = op2;
+ operands[3] = op3;
+
/* If one of the two operands is an interesting constant, load a
constant with the above and mask it in with a logical operation. */
fixes the first and last 2 functions, now to look what's going on with the
second and third one...
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (7 preceding siblings ...)
2022-04-22 12:28 ` jakub at gcc dot gnu.org
@ 2022-04-22 13:59 ` jakub at gcc dot gnu.org
2022-04-22 14:11 ` jakub at gcc dot gnu.org
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 13:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ok, so at least for the 2nd function, the problem is that we reject it from
noce_try_cmove as well from noce_try_cmove_arith based on costs.
That is the case for the NEXT_PASS (pass_sink_code, false /* unsplit edges */);
compilation too, though it seems the cost is 4 higher in the vanilla trunk
case.
I think that can be perhaps fixed with:
--- gcc/config/i386/i386-expand.cc.jj 2022-04-22 14:18:27.000000000 +0200
+++ gcc/config/i386/i386-expand.cc 2022-04-22 15:13:47.263829089 +0200
@@ -3224,8 +3224,7 @@ ix86_expand_int_movcc (rtx operands[])
}
diff = ct - cf;
- if (reg_overlap_mentioned_p (out, op0)
- || reg_overlap_mentioned_p (out, op1))
+ if (reg_overlap_mentioned_p (out, compare_op))
tmp = gen_reg_rtx (mode);
if (mode == DImode)
- at least I don't really see the point of using yet another temporary when we
already emitted the comparison earlier and all we emit is compare_op and
assignment to out.
Anyway, with NEXT_PASS (pass_sink_code, false /* unsplit edges */); it succeeds
then with cond_move_process_if_block. But it doesn't with vanilla trunk,
because it punts on:
4060 /* Don't try to handle this if the condition uses the
4061 destination register. */
4062 if (reg_overlap_mentioned_p (dest, cond))
4063 return FALSE;
I'd say it is reasonable to punt on that, because the whole
cond_move_process_if_block is meant to handle multiple cmoves, not just one,
and we handle all of them with the same cond. The only case that could be
handled is if it is the very last set in then_bb and else_bb is not present, or
if it is the last set in else_bb. I must say I'm also quite surprised we don't
really check any costs in the cond_move_process_if_block and just blindly
assume it will always be a win, so it seems it happily handles even the cases
of a single dest assignment that earlier noce_* routines attempt (but those do
check the costs and in this case punt).
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (8 preceding siblings ...)
2022-04-22 13:59 ` jakub at gcc dot gnu.org
@ 2022-04-22 14:11 ` jakub at gcc dot gnu.org
2022-04-22 14:24 ` [Bug target/105338] " jakub at gcc dot gnu.org
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 14:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, in both the first and second function if_info->original_cost is smaller
than seq_cost of the new sequence, but in the first function
if_info->max_seq_cost is larger than seq_cost of the new sequence, in the
second it is 0.
And the reason for that difference is that predictable_p is true in the second
function and not the first one, I presume because of the negative return value
predictor that predicts returning -2 as unlikely. But if the negative value
return heuristic is right (at least most of the time), then using conditional
jump seems to be the right choice.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (9 preceding siblings ...)
2022-04-22 14:11 ` jakub at gcc dot gnu.org
@ 2022-04-22 14:24 ` jakub at gcc dot gnu.org
2022-04-23 8:26 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 14:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 52850
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52850&action=edit
gcc12-pr105338.patch
Full untested patch for everything but the predictable cases.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (10 preceding siblings ...)
2022-04-22 14:24 ` [Bug target/105338] " jakub at gcc dot gnu.org
@ 2022-04-23 8:26 ` cvs-commit at gcc dot gnu.org
2022-04-25 14:48 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-04-23 8:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:1ceddd7497e15d262ead6f673f8f5ce79dd63714
commit r12-8233-g1ceddd7497e15d262ead6f673f8f5ce79dd63714
Author: Jakub Jelinek <jakub@redhat.com>
Date: Sat Apr 23 10:25:31 2022 +0200
i386: Improve ix86_expand_int_movcc [PR105338]
The following testcase regressed on x86_64 on the trunk, due to some GIMPLE
pass changes (r12-7687) we end up an *.optimized dump difference of:
@@ -8,14 +8,14 @@ int foo (int i)
<bb 2> [local count: 1073741824]:
if (i_2(D) != 0)
- goto <bb 4>; [35.00%]
+ goto <bb 3>; [35.00%]
else
- goto <bb 3>; [65.00%]
+ goto <bb 4>; [65.00%]
- <bb 3> [local count: 697932184]:
+ <bb 3> [local count: 375809640]:
<bb 4> [local count: 1073741824]:
- # iftmp.0_1 = PHI <5(2), i_2(D)(3)>
+ # iftmp.0_1 = PHI <5(3), i_2(D)(2)>
return iftmp.0_1;
}
and similarly for the other functions. That is functionally equivalent and
there is no canonical form for those. The reason for i_2(D) in the PHI
argument as opposed to 0 is the uncprop pass, that is in many cases
beneficial for expansion as we don't need to load the value into some
pseudo
in one of the if blocks.
Now, for the 11.x ordering we have the pseudo = i insn in the extended
basic
block (it comes first) and so forwprop1 undoes what uncprop does by
propagating constant 0 there. But for the 12.x ordering, the extended
basic
block contains pseudo = 5 and pseudo = i is in the other bb and so fwprop1
doesn't change it.
During the ce1 pass, we attempt to emit a conditional move and we have very
nice code for the cases where both last operands of ?: are constant, and
yet
another for !TARGET_CMOVE if at least one of them is.
The following patch will undo the uncprop behavior during
ix86_expand_int_movcc, but just for those spots that can benefit from both
or at least one operands being constant, leaving the pure cmov case as is
(because then it is useful not to have to load a constant into a pseudo
as it already is in one). We can do that in the
op0 == op1 ? op0 : op3
or
op0 != op1 ? op2 : op0
cases if op1 is a CONST_INT by pretending it is
op0 == op1 ? op1 : op3
or
op0 != op1 ? op2 : op1
2022-04-23 Jakub Jelinek <jakub@redhat.com>
PR target/105338
* config/i386/i386-expand.cc (ix86_expand_int_movcc): Handle
op0 == cst1 ? op0 : op3 like op0 == cst1 ? cst1 : op3 for the
non-cmov
cases.
* gcc.target/i386/pr105338.c: New test.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (11 preceding siblings ...)
2022-04-23 8:26 ` cvs-commit at gcc dot gnu.org
@ 2022-04-25 14:48 ` jakub at gcc dot gnu.org
2022-04-26 5:05 ` denis.campredon at gmail dot com
2022-04-26 8:21 ` jakub at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-25 14:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed. I think you can use if (__builtin_expect_with_probability (!b, 0, 0.5))
return -2; else return 0; if you want to override the return negative
heuristics.
Or if you use it in places where that heuristics doesn't trigger, say:
int bar (int);
int g(int i) {
return bar (i ? -2 : 0);
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (12 preceding siblings ...)
2022-04-25 14:48 ` jakub at gcc dot gnu.org
@ 2022-04-26 5:05 ` denis.campredon at gmail dot com
2022-04-26 8:21 ` jakub at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: denis.campredon at gmail dot com @ 2022-04-26 5:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #13 from denis.campredon at gmail dot com ---
Thanks a lots.
I have a question though: foo and bar are similar, foo produces a branchless
code whereas bar uses a jump.
int foo(int i) {
return !i ? 0 : -2;
}
int bar(int i) {
return i ? -2 : 0;
}
If I'm readding correctly in the two functions the probabilities are the same.
Is this "normal" or worth a new ticket ?
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
` (13 preceding siblings ...)
2022-04-26 5:05 ` denis.campredon at gmail dot com
@ 2022-04-26 8:21 ` jakub at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-26 8:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to denis.campredon from comment #13)
> Thanks a lots.
>
> I have a question though: foo and bar are similar, foo produces a branchless
> code whereas bar uses a jump.
>
> int foo(int i) {
> return !i ? 0 : -2;
> }
>
> int bar(int i) {
> return i ? -2 : 0;
> }
>
> If I'm readding correctly in the two functions the probabilities are the
> same. Is this "normal" or worth a new ticket ?
There is the bug where cond_move_process_if_block ignores the costs and another
where if the cost checks are added, it might be worth for the destination
overlap with comparison operand case force the comparison operand or
destination into a new temporary and add one more first or last instead of
giving up. The costs comparison would then catch that.
So yes, it is worth a new PR, but not something that will be addressed for GCC
12.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2022-04-26 8:21 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
2022-04-22 6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
2022-04-22 9:17 ` jakub at gcc dot gnu.org
2022-04-22 9:48 ` pinskia at gcc dot gnu.org
2022-04-22 11:00 ` jakub at gcc dot gnu.org
2022-04-22 11:11 ` rguenth at gcc dot gnu.org
2022-04-22 11:14 ` rguenth at gcc dot gnu.org
2022-04-22 11:20 ` rguenth at gcc dot gnu.org
2022-04-22 12:28 ` jakub at gcc dot gnu.org
2022-04-22 13:59 ` jakub at gcc dot gnu.org
2022-04-22 14:11 ` jakub at gcc dot gnu.org
2022-04-22 14:24 ` [Bug target/105338] " jakub at gcc dot gnu.org
2022-04-23 8:26 ` cvs-commit at gcc dot gnu.org
2022-04-25 14:48 ` jakub at gcc dot gnu.org
2022-04-26 5:05 ` denis.campredon at gmail dot com
2022-04-26 8:21 ` jakub at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).