* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
@ 2023-09-12 10:43 ` rguenth at gcc dot gnu.org
2023-10-17 12:34 ` rguenth at gcc dot gnu.org
` (15 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-09-12 10:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Codegen regression from |[14 Regression] Codegen
|i386 argument passing |regression from i386
|changes |argument passing changes
Target Milestone|--- |14.0
Keywords| |missed-optimization
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
2023-09-12 10:43 ` [Bug rtl-optimization/111267] [14 Regression] " rguenth at gcc dot gnu.org
@ 2023-10-17 12:34 ` rguenth at gcc dot gnu.org
2023-10-20 22:42 ` roger at nextmovesoftware dot com
` (14 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-17 12:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Last reconfirmed| |2023-10-17
Priority|P3 |P1
Status|UNCONFIRMED |NEW
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. We should at least understand what's going on.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
2023-09-12 10:43 ` [Bug rtl-optimization/111267] [14 Regression] " rguenth at gcc dot gnu.org
2023-10-17 12:34 ` rguenth at gcc dot gnu.org
@ 2023-10-20 22:42 ` roger at nextmovesoftware dot com
2023-10-20 22:49 ` roger at nextmovesoftware dot com
` (13 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-10-20 22:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #2 from Roger Sayle <roger at nextmovesoftware dot com> ---
Created attachment 56162
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56162&action=edit
proof-of-concept patch
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (2 preceding siblings ...)
2023-10-20 22:42 ` roger at nextmovesoftware dot com
@ 2023-10-20 22:49 ` roger at nextmovesoftware dot com
2024-01-05 10:43 ` jakub at gcc dot gnu.org
` (12 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-10-20 22:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
This patch addresses the regression, but probably isn't the correct fix.
The issue is that the backend now has a way of representing the concatenation
of two registers (for example, TI is constructed for two DI mode registers):
(set (reg:TI 111 [ bD.2764 ])
(ior:TI (ashift:TI (zero_extend:TI (reg:DI 142))
(const_int 64 [0x40]))
(zero_extend:TI (reg:DI 141))))
But combine is unable to cleanly extract the (original) DI mode components back
out of this using SUBREGs. Currently combine gets confused and attempts to
match things like:
Trying 10 -> 74:
10: r111:TI=zero_extend(r142:DI)<<0x40|zero_extend(r141:DI)
REG_DEAD r141:DI
REG_DEAD r142:DI
74: r137:DI=r111:TI#0
Failed to match this instruction:
(parallel [
(set (reg:DI 137 [ bD.2764 ])
(reg:DI 141))
(set (reg:TI 111 [ bD.2764 ])
(ior:TI (ashift:TI (zero_extend:TI (reg:DI 142))
(const_int 64 [0x40]))
(zero_extend:TI (reg:DI 141))))
])
which contains the simplification we want, "reg:DI 137 := reg:DI 141", but
along with stuff that combine should really take care off (strip/duplicate).
I'll work on a more acceptable middle-end fix, but this patch demonstrates
progress, and can be used if a more general solution can't be found.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (3 preceding siblings ...)
2023-10-20 22:49 ` roger at nextmovesoftware dot com
@ 2024-01-05 10:43 ` jakub at gcc dot gnu.org
2024-01-12 15:49 ` jakub at gcc dot gnu.org
` (11 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-01-05 10:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #3)
> This patch addresses the regression, but probably isn't the correct fix.
Why? To me it looks like the correct fix.
> The issue is that the backend now has a way of representing the
> concatenation of two registers (for example, TI is constructed for two DI
> mode registers):
>
> (set (reg:TI 111 [ bD.2764 ])
> (ior:TI (ashift:TI (zero_extend:TI (reg:DI 142))
> (const_int 64 [0x40]))
> (zero_extend:TI (reg:DI 141))))
>
> But combine is unable to cleanly extract the (original) DI mode components
> back out of this using SUBREGs. Currently combine gets confused and
> attempts to match things like:
>
> Trying 10 -> 74:
> 10: r111:TI=zero_extend(r142:DI)<<0x40|zero_extend(r141:DI)
> REG_DEAD r141:DI
> REG_DEAD r142:DI
> 74: r137:DI=r111:TI#0
> Failed to match this instruction:
> (parallel [
> (set (reg:DI 137 [ bD.2764 ])
> (reg:DI 141))
> (set (reg:TI 111 [ bD.2764 ])
> (ior:TI (ashift:TI (zero_extend:TI (reg:DI 142))
> (const_int 64 [0x40]))
> (zero_extend:TI (reg:DI 141))))
> ])
>
> which contains the simplification we want, "reg:DI 137 := reg:DI 141", but
> along with stuff that combine should really take care off (strip/duplicate).
How it could do anything else? It simplifies the computation of r137, but
because r111 isn't dead but used later, it has to preserve the previous
computation. And, combine only considers the 2-4 instructions being simplified
together, so doesn't know that the other use only extracts the highpart subreg
and can be therefore also simplified.
Combine simply never tries to simplify folding say 10 -> 74 + 75.
Does your patch help with the testcase?
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (4 preceding siblings ...)
2024-01-05 10:43 ` jakub at gcc dot gnu.org
@ 2024-01-12 15:49 ` jakub at gcc dot gnu.org
2024-01-12 19:06 ` roger at nextmovesoftware dot com
` (10 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-01-12 15:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |segher at gcc dot gnu.org
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Slightly cleaned up testcase:
struct S { float a, b, c, d; };
int
foo (struct S x, struct S y)
{
return x.a <= y.c && x.b <= y.d && x.c >= y.a && x.c >= y.a;
}
int
bar (struct S x, struct S y)
{
return x.b <= y.d && x.c >= y.a;
}
I think the pattern be using reg_overlap_mentioned_p, register_operand doesn't
really guarantee the operand is REG_P on which REGNO can be used.
So something like:
--- gcc/config/i386/i386.md.jj 2024-01-03 12:01:11.649644854 +0100
+++ gcc/config/i386/i386.md 2024-01-12 16:24:10.806783889 +0100
@@ -13347,6 +13347,27 @@
DONE;
}
[(set_attr "isa" "*,nox64,x64,*")])
+
+;; Extract from concat
+
+(define_insn_and_split "*extv<dwi><mode>2_concat"
+ [(set (match_operand:DWIH 0 "register_operand")
+ (match_operand:DWIH 1 "register_operand"))
+ (set (match_operand:<DWI> 2 "register_operand")
+ (any_or_plus:<DWI>
+ (ashift:<DWI>
+ (zero_extend:<DWI> (match_operand:DWIH 3 "register_operand"))
+ (match_operand:QI 4 "const_scalar_int_operand"))
+ (zero_extend:<DWI> (match_operand:DWIH 5 "register_operand"))))]
+ "INTVAL (operands[4]) == <MODE_SIZE> * BITS_PER_UNIT
+ && !reg_overlap_mentioned_p (operands[0], operands[3])
+ && !reg_overlap_mentioned_p (operands[0], operands[5])"
+ "#"
+ "&& 1"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2) (ior:<DWI> (ashift:<DWI> (zero_extend:<DWI> (match_dup
3))
+ (match_dup 4))
+ (zero_extend:<DWI> (match_dup 5))))])
;; Negation instructions
Note, while that improves the generated code for the first function (almost but
not 100% to what GCC 13 emitted), the second function still results in much
larger code than before.
Bet it would be even better if we could just define_split the pattern instead
of define_insn_and_split, but unfortunately this is a 2 insn combination and
that doesn't allow to be split into 2 new instructions.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (5 preceding siblings ...)
2024-01-12 15:49 ` jakub at gcc dot gnu.org
@ 2024-01-12 19:06 ` roger at nextmovesoftware dot com
2024-01-14 0:09 ` roger at nextmovesoftware dot com
` (9 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-01-12 19:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #6 from Roger Sayle <roger at nextmovesoftware dot com> ---
Sorry for the delay in replying/answering Jakub's questions/comments. Yes,
using a define_insn_and_split in the backend fixes/works around the issue (and
I agree your implementation/refinement in comment #5 is better than mine in
comment #2), but I've a feeling that this approach isn't the ideal solution.
Nothing about this split, is specific to these x86 instructions or even to the
i386 backend.
A more generic fix might be teach combine.cc that it can split parallels of two
independent sets, with no inter dependencies, into two insns if the total cost
of the two instructions is less than the original two, i.e. a 2 insn -> 2 insn
combination.
But then even this doesn't feel like the perfect approach... the reason combine
doesn't already support 2->2 combinations is that they're not normally
required, these types of problems are usually handled by GCSE or CSE or PRE (or
?).
The pattern is insn1 defines REG1 to a complicated expression, that is live in
several locations, so this instruction can't be eliminated. However, if the
definition of REG1 is provided to insn2 that sets REG2, this second instruction
can be significantly simplified. This feels like a classic (non-)constant
propagation problem. I'm thinking perhaps want_to_gcse_p (or somewhere
similar) could be tweaked.
For people just joining the discussion (hopefully Jeff or a Richard):
(set (REG:DI 1) (concat:DI (REG:SI 2) (REG:SI 3))
...
(set (REG:SI 4) (low_part (REG:DI 1))
can be simplified so that the second assignment becomes just:
(set (REG:SI 4) (REG:SI 2))
and similarly for high_part vs. low_part. These don't even
need to be in the same basic block.
In actuality, "concat" is a large ugly expression, and high_part/low_part are
actually SUBREGs (or could be TRUNCATE or SHIFT+TRUNCATE), but the theory
should remain the same.
I'm trying to figure out which pass (or cselib?) is normally responsible for
handling this type of pseudo-reg propagation.
But the define_insn_and_split certainly papers over the deficiency in the
middle-end's RTL optimizers and fixes this (very) specific case/regression.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (6 preceding siblings ...)
2024-01-12 19:06 ` roger at nextmovesoftware dot com
@ 2024-01-14 0:09 ` roger at nextmovesoftware dot com
2024-01-14 12:10 ` roger at nextmovesoftware dot com
` (8 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-01-14 0:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
Very many thanks to Jeff Law for pointing me to fwprop. The following simple
patch also fixes this regression.
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0c588f8..cbba44e 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -449,15 +449,6 @@ try_fwprop_subst_pattern (obstack_watermark &attempt,
insn_
change &use_change,
if (prop.num_replacements == 0)
return false;
- if (!prop.profitable_p ())
- {
- if (dump_file && (dump_flags & TDF_DETAILS))
- fprintf (dump_file, "cannot propagate from insn %d into"
- " insn %d: %s\n", def_insn->uid (), use_insn->uid (),
- "would increase complexity of pattern");
- return false;
- }
-
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "\npropagating insn %d into insn %d, replacing:\n",
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (7 preceding siblings ...)
2024-01-14 0:09 ` roger at nextmovesoftware dot com
@ 2024-01-14 12:10 ` roger at nextmovesoftware dot com
2024-01-15 7:32 ` rguenth at gcc dot gnu.org
` (7 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-01-14 12:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
--- Comment #8 from Roger Sayle <roger at nextmovesoftware dot com> ---
Now we're in stage4, I'll take this. I'm bootstrapping and regression testing
a variant of my tweak to try_fwprop_subst_pattern. The change in comment #7
can hang loop indefinitely if the transformation results in the same cost as
the original, so the logic on when to forward-propagate needed to be tweaked a
little.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (8 preceding siblings ...)
2024-01-14 12:10 ` roger at nextmovesoftware dot com
@ 2024-01-15 7:32 ` rguenth at gcc dot gnu.org
2024-01-16 1:15 ` roger at nextmovesoftware dot com
` (6 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-15 7:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rsandifo at gcc dot gnu.org
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yes, this is basically "folding", and across multiple insns this requires
fwprop or combine.
9: r111:TI=zero_extend(r112:DI)
10: r111:TI=r111:TI&<0,0xffffffffffffffff>|zero_extend(r113:DI)<<0x40
74: r137:DI=r111:TI#0
75: r138:DI=r111:TI#8
I'm not sure fwprop even tries to do 9->10->74 though.
The suggested change to drop the call to "profitable" seems to remove any
cost checking done (and is also not consistent with the other call we do
on notes?).
Would ssa-combine have catched the above?
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (9 preceding siblings ...)
2024-01-15 7:32 ` rguenth at gcc dot gnu.org
@ 2024-01-16 1:15 ` roger at nextmovesoftware dot com
2024-01-21 21:25 ` cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-01-16 1:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #10 from Roger Sayle <roger at nextmovesoftware dot com> ---
A revised and improved patch has been posted for review at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643062.html
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (10 preceding siblings ...)
2024-01-16 1:15 ` roger at nextmovesoftware dot com
@ 2024-01-21 21:25 ` cvs-commit at gcc dot gnu.org
2024-01-22 9:11 ` rsandifo at gcc dot gnu.org
` (4 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-21 21:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:86de9b66480b710202a2898cf513db105d8c432f
commit r14-8319-g86de9b66480b710202a2898cf513db105d8c432f
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Sun Jan 21 21:22:28 2024 +0000
PR rtl-optimization/111267: Improved forward propagation.
This patch resolves PR rtl-optimization/111267 by improving RTL-level
forward propagation. This x86_64 code quality regression was caused
(exposed) by my changes to improve how x86's (TImode) argument passing
is represented at the RTL-level (reducing the use of SUBREGs to catch
more optimization opportunities in combine). The pitfall is that the
more complex RTL representations expose a limitation in RTL's fwprop
pass.
At the heart of fwprop, in try_fwprop_subst_pattern, the logic can
be summarized as three steps. Step 1 is a heuristic that rejects the
propagation attempt if the expression is too complex, step 2 calls
the backend's recog to see if the propagated/simplified instruction
is recognizable/valid, and step 3 then calls set_src_cost to compare
the rtx costs of the replacement vs. the original, and accepts the
transformation if the final cost is the same of better.
The logic error (or missed optimization opportunity) is that the
step 1 heuristic that attempts to predict (second guess) the
process is flawed. Ultimately the decision on whether to fwprop
or not should depend solely on actual improvement, as measured
by RTX costs. Hence the prototype fix in the bugzilla PR removes
the heuristic of calling prop.profitable_p entirely, relying
entirely on the cost comparison in step 3.
Unfortunately, things are a tiny bit more complicated. The cost
comparison in fwprop uses the older set_src_cost API and not the
newer (preffered) insn_cost API as currently used in combine.
This means that the cost improvement comparisons are only done
for single_set instructions (more complex PARALLELs etc. aren't
supported). Hence we can only rely on skipping step 1 for that
subset of instructions actually evaluated by step 3.
The other subtlety is that to avoid potential infinite loops
in fwprop we should only reply purely on rtx costs when the
transformation is obviously an improvement. If the replacement
has the same cost as the original, we can use the prop.profitable_p
test to preserve the current behavior.
Finally, to answer Richard Biener's remaining question about this
approach: yes, there is an asymmetry between how patterns are
handled and how REG_EQUAL notes are handled. For example, at
the moment propagation into notes doesn't use rtx costs at all,
and ultimately when fwprop is updated to use insn_cost, this
(and recog) obviously isn't applicable to notes. There's no reason
the logic need be identical between patterns and notes, and during
stage4 we only need update propagation into patterns to fix this
P1 regression (notes and use of cost_insn can be done for GCC 15).
For Jakub's reduced testcase:
struct S { float a, b, c, d; };
int bar (struct S x, struct S y) {
return x.b <= y.d && x.c >= y.a;
}
On x86_64-pc-linux-gnu with -O2 gcc currently generates:
bar: movq %xmm2, %rdx
movq %xmm3, %rax
movq %xmm0, %rsi
xchgq %rdx, %rax
movq %rsi, %rcx
movq %rax, %rsi
movq %rdx, %rax
shrq $32, %rcx
shrq $32, %rax
movd %ecx, %xmm4
movd %eax, %xmm0
comiss %xmm4, %xmm0
jb .L6
movd %esi, %xmm0
xorl %eax, %eax
comiss %xmm0, %xmm1
setnb %al
ret
.L6: xorl %eax, %eax
ret
with this simple patch to fwprop, we now generate:
bar: shufps $85, %xmm0, %xmm0
shufps $85, %xmm3, %xmm3
comiss %xmm0, %xmm3
jb .L6
xorl %eax, %eax
comiss %xmm2, %xmm1
setnb %al
ret
.L6: xorl %eax, %eax
ret
2024-01-21 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR rtl-optimization/111267
* fwprop.cc (fwprop_propagation::profitabe_p): Rename
profitable_p method to likely_profitable_p.
(try_fwprop_subst_node): Update call to likely_profitable_p.
Only bail-out early when !prop.likely_profitable_p for instructions
that are not single sets. When comparing costs, bail-out if the
cost is unchanged and !prop.likely_profitable_p.
gcc/testsuite/ChangeLog
PR rtl-optimization/111267
* gcc.target/i386/pr111267.c: New test case.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (11 preceding siblings ...)
2024-01-21 21:25 ` cvs-commit at gcc dot gnu.org
@ 2024-01-22 9:11 ` rsandifo at gcc dot gnu.org
2024-01-25 7:51 ` mkuvyrkov at gcc dot gnu.org
` (3 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-01-22 9:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #12 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
I don't object to the patch, but for the record: the current heuristics go back
a long way. Although I reworked the pass to use rtl-ssa a few years ago, I
tried as far as possible to preserve the old heuristics (tested by making sure
that there were no unexplained differences over a large set of targets).
I wouldn't characterise the old heuristics as a logic error. Although I didn't
write them, my understanding is that they were being deliberately conservative,
in particular due to the risk of introducing excess register pressure.
So this change seems potentially quite invasive for stage 4. Perhaps it'll
work out — if so, great! But if there is some fallout, I think we should lean
towards reverting the patch and revisiting in GCC 15.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (12 preceding siblings ...)
2024-01-22 9:11 ` rsandifo at gcc dot gnu.org
@ 2024-01-25 7:51 ` mkuvyrkov at gcc dot gnu.org
2024-01-25 7:55 ` clyon at gcc dot gnu.org
` (2 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: mkuvyrkov at gcc dot gnu.org @ 2024-01-25 7:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mkuvyrkov at gcc dot gnu.org
--- Comment #13 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
We are seeing scan-assembler failures in a single 32-bit arm test. This
affects both linux and bare-metal targets: arm-linux-gnueabihf and
arm-none-eabi.
=== gcc tests ===
Running gcc:gcc.target/arm/arm.exp ...
FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
r[0-9]+ 2
FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
r[0-9]+, .sl #2 1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (13 preceding siblings ...)
2024-01-25 7:51 ` mkuvyrkov at gcc dot gnu.org
@ 2024-01-25 7:55 ` clyon at gcc dot gnu.org
2024-01-25 9:48 ` mkuvyrkov at gcc dot gnu.org
2024-02-16 9:12 ` roger at nextmovesoftware dot com
16 siblings, 0 replies; 18+ messages in thread
From: clyon at gcc dot gnu.org @ 2024-01-25 7:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Christophe Lyon <clyon at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |clyon at gcc dot gnu.org
--- Comment #14 from Christophe Lyon <clyon at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #13)
> We are seeing scan-assembler failures in a single 32-bit arm test. This
> affects both linux and bare-metal targets: arm-linux-gnueabihf and
> arm-none-eabi.
>
> === gcc tests ===
>
> Running gcc:gcc.target/arm/arm.exp ...
> FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
> r[0-9]+ 2
> FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
> r[0-9]+, .sl #2 1
I think this is reported as:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (14 preceding siblings ...)
2024-01-25 7:55 ` clyon at gcc dot gnu.org
@ 2024-01-25 9:48 ` mkuvyrkov at gcc dot gnu.org
2024-02-16 9:12 ` roger at nextmovesoftware dot com
16 siblings, 0 replies; 18+ messages in thread
From: mkuvyrkov at gcc dot gnu.org @ 2024-01-25 9:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #15 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #13)
> We are seeing scan-assembler failures in a single 32-bit arm test. This
> affects both linux and bare-metal targets: arm-linux-gnueabihf and
> arm-none-eabi.
>
> === gcc tests ===
>
> Running gcc:gcc.target/arm/arm.exp ...
> FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
> r[0-9]+ 2
> FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
> r[0-9]+, .sl #2 1
I've looked into the reason for the above failures, and it seems to be not an
issue.
After the patch fwprop1 decides to do an additional propagation, which was
considered as "would increase complexity of pattern" before the patch. This
results in change from "bics; mov" to "bic; subs". If I understand ARM
assembler correctly, handling of sign was shifted from "bics" to "subs"
instruction.
This is the actual code: BEFORE:
bics r0, r0, r1 @ 9 [c=4 l=4]
*andsi_notsi_si_compare0_scratch
mov r0, #1 @ 23 [c=4 l=4] *thumb2_movsi_vfp/1
it eq
moveq r0, #0 @ 26 [c=8 l=4] *p *thumb2_movsi_vfp/2
bx lr @ 29 [c=8 l=4] *thumb2_return
and AFTER:
bic r0, r0, r1 @ 8 [c=4 l=4] andsi_notsi_si
subs r0, r0, #0 @ 22 [c=4 l=4] cmpsi2_addneg/0
it ne
movne r0, #1 @ 23 [c=8 l=4] *p *thumb2_movsi_vfp/2
bx lr @ 26 [c=8 l=4] *thumb2_return
If I don't hear anything to the contrary, I'll update the testcase to accept
both "bic" and "bics".
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes
2023-09-01 12:55 [Bug rtl-optimization/111267] New: Codegen regression from i386 argument passing changes manolis.tsamis at vrull dot eu
` (15 preceding siblings ...)
2024-01-25 9:48 ` mkuvyrkov at gcc dot gnu.org
@ 2024-02-16 9:12 ` roger at nextmovesoftware dot com
16 siblings, 0 replies; 18+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-02-16 9:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #16 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed on mainline. The testsuite regressions (on non-x86
targets) are cosmetic, i.e. neither wrong code nor worse performance/size, just
differences in expected code generation.
^ permalink raw reply [flat|nested] 18+ messages in thread