public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/114515] New: [14 Regression] Failure to use aarch64 lane forms after PR101523
@ 2024-03-28 10:01 rsandifo at gcc dot gnu.org
  2024-03-28 10:05 ` [Bug rtl-optimization/114515] " rguenth at gcc dot gnu.org
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-03-28 10:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

            Bug ID: 114515
           Summary: [14 Regression] Failure to use aarch64 lane forms
                    after PR101523
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

The following test regressed on aarch64 after
g:839bc42772ba7af66af3bd16efed4a69511312ae (the fix for PR101523):

typedef float v4sf __attribute__((vector_size(16)));
void f (v4sf *ptr, float f)
{
  ptr[0] = ptr[0] * (v4sf) { f, f, f, f };
  ptr[1] = ptr[1] * (v4sf) { f, f, f, f };
}

Compiled with -O2, we previously generated:

        ldp     q1, q31, [x0]
        fmul    v1.4s, v1.4s, v0.s[0]
        fmul    v31.4s, v31.4s, v0.s[0]
        stp     q1, q31, [x0]
        ret

Now we generate:

        ldp     q1, q31, [x0]
        dup     v0.4s, v0.s[0]
        fmul    v1.4s, v1.4s, v0.4s
        fmul    v31.4s, v31.4s, v0.4s
        stp     q1, q31, [x0]
        ret

with the extra dup.
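
As a sanity check on what the test computes (this is not part of the original report): both sequences multiply every lane of ptr[0] and ptr[1] by the same scalar, so the regression affects code quality only, not results. A minimal sketch, assuming a compiler that supports GCC vector extensions; scale_pair mirrors the body of f under a hypothetical name, and scale_ref and scale_pair_matches_ref are hypothetical helpers added for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef float v4sf __attribute__((vector_size(16)));

/* Same body as the test case's f(), renamed for illustration.  */
void scale_pair (v4sf *ptr, float f)
{
  ptr[0] = ptr[0] * (v4sf) { f, f, f, f };
  ptr[1] = ptr[1] * (v4sf) { f, f, f, f };
}

/* Hypothetical scalar reference: multiply n floats by f, one at a time.  */
void scale_ref (float *p, size_t n, float f)
{
  assert (p != NULL);
  for (size_t i = 0; i < n; i++)
    p[i] *= f;
}

/* Returns 1 if the vector and scalar versions agree on a small input.  */
int scale_pair_matches_ref (float f)
{
  v4sf v[2] = { { 1.0f, 2.0f, 3.0f, 4.0f }, { 5.0f, 6.0f, 7.0f, 8.0f } };
  float s[8] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };

  scale_pair (v, f);
  scale_ref (s, 8, f);
  for (int i = 0; i < 8; i++)
    if (v[i / 4][i % 4] != s[i])
      return 0;
  return 1;
}
```

Whether the compiler emits the lane form or the dup form for scale_pair, a check like scale_pair_matches_ref (2.0f) should still pass.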

The patch is trying to avoid cases where i3 is canonicalised by contextual
information provided by i2.  But here we place a full copy of i2 into i3
(creating an instruction that is no more expensive).  This is a benefit in its
own right because the two instructions can then execute in parallel rather than
serially.  But it also means that, as here, we might be able to remove i2 with
later combinations.

Perhaps we could also check whether i3 still contains the destination of i2?
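
The suggested check can be sketched on a toy model. This is only an illustration of the idea, not code from combine.cc: struct toy_insn and all toy_* functions are hypothetical, and real RTL is far richer than one destination plus a list of source registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SRCS 4

/* Hypothetical, heavily simplified stand-in for an RTL insn.  */
struct toy_insn
{
  int op;                  /* arbitrary opcode tag */
  int dest;                /* destination register number */
  int srcs[TOY_MAX_SRCS];  /* source register numbers */
  size_t nsrcs;            /* number of valid entries in srcs */
};

/* True if the two toy insns are identical.  */
static bool toy_insn_equal (const struct toy_insn *a, const struct toy_insn *b)
{
  if (a->op != b->op || a->dest != b->dest || a->nsrcs != b->nsrcs)
    return false;
  for (size_t i = 0; i < a->nsrcs; i++)
    if (a->srcs[i] != b->srcs[i])
      return false;
  return true;
}

/* True if INSN reads register REG.  */
static bool toy_reads_reg (const struct toy_insn *insn, int reg)
{
  assert (insn->nsrcs <= TOY_MAX_SRCS);
  for (size_t i = 0; i < insn->nsrcs; i++)
    if (insn->srcs[i] == reg)
      return true;
  return false;
}

/* Decide whether to keep a 2->2 combination.  If i2 changed, keep it.
   If i2 came back unchanged, keep it only when the new i3 no longer
   reads i2's destination, i.e. i2 was fully copied into i3.  */
bool toy_keep_combination (const struct toy_insn *old_i2,
                           const struct toy_insn *new_i2,
                           const struct toy_insn *new_i3)
{
  if (!toy_insn_equal (old_i2, new_i2))
    return true;
  return !toy_reads_reg (new_i3, old_i2->dest);
}
```

In this model the case from the report is kept: i2 is unchanged, but the new i3 reads the scalar input directly rather than i2's destination. Whether such a check also avoids the oscillation that PR101523 addressed is not something this sketch establishes.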

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: rguenth at gcc dot gnu.org @ 2024-03-28 10:05 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
   Target Milestone|---                         |14.0
                 CC|                            |rguenth at gcc dot gnu.org

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: rguenth at gcc dot gnu.org @ 2024-03-28 10:06 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, why does forwprop not do this?

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: segher at gcc dot gnu.org @ 2024-03-28 10:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
The PR101523 fix makes sure we do not get the same I2 back, because that
violates algorithmic assumptions of combine.  Importantly, the way it was,
things could be changed back time and time again, and that actually happened.
There is no "canonical form" in combine; which form combine prefers depends
on what little piece of context is and is not considered.  Things
can -- and DID -- oscillate.

So, what is happening here?  The "dup" here is really a "splat"?  Should the
backend have some extra define_insn or define_split, or maybe even a peephole?

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: rsandifo at gcc dot gnu.org @ 2024-03-28 10:19 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #3 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
In RTL terms, the dup is vec_duplicate.  The combination is:

Trying 10 -> 13:
   10: r107:V4SF=vec_duplicate(r115:SF)
      REG_DEAD r115:SF
   13: r110:V4SF=r111:V4SF*r107:V4SF
      REG_DEAD r111:V4SF
Failed to match this instruction:
(parallel [
        (set (reg:V4SF 110 [ _2 ])
            (mult:V4SF (vec_duplicate:V4SF (reg:SF 115))
                (reg:V4SF 111 [ *ptr_6(D) ])))
        (set (reg:V4SF 107)
            (vec_duplicate:V4SF (reg:SF 115)))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:V4SF 110 [ _2 ])
            (mult:V4SF (vec_duplicate:V4SF (reg:SF 115))
                (reg:V4SF 111 [ *ptr_6(D) ])))
        (set (reg:V4SF 107)
            (vec_duplicate:V4SF (reg:SF 115)))
    ])
Successfully matched this instruction:
(set (reg:V4SF 107)
    (vec_duplicate:V4SF (reg:SF 115)))
Successfully matched this instruction:
(set (reg:V4SF 110 [ _2 ])
    (mult:V4SF (vec_duplicate:V4SF (reg:SF 115))
        (reg:V4SF 111 [ *ptr_6(D) ])))
allowing combination of insns 10 and 13
original costs 8 + 20 = 28
replacement costs 8 + 20 = 28
modifying insn i2    10: r107:V4SF=vec_duplicate(r115:SF)
deferring rescan insn with uid = 10.
modifying insn i3    13: r110:V4SF=vec_duplicate(r115:SF)*r111:V4SF
      REG_DEAD r115:SF
      REG_DEAD r111:V4SF
deferring rescan insn with uid = 13.

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: rsandifo at gcc dot gnu.org @ 2024-03-28 10:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #4 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Btw, why does forwprop not do this?
Not 100% sure (I wasn't involved in choosing the current heuristics).  But
fwprop can propagate across blocks, so there is probably more risk of
increasing register pressure.

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: rsandifo at gcc dot gnu.org @ 2024-03-28 12:43 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
For the record, the associated new testsuite failures are:

FAIL: gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 3
FAIL: gcc.target/aarch64/asimd-mull-elem.c scan-assembler-times \\s+fmul\\tv[0-9]+\\.4s, v[0-9]+\\.4s, v[0-9]+\\.s\\[0\\] 4
FAIL: gcc.target/aarch64/asimd-mull-elem.c scan-assembler-times \\s+mul\\tv[0-9]+\\.4s, v[0-9]+\\.4s, v[0-9]+\\.s\\[0\\] 4
FAIL: gcc.target/aarch64/ccmp_3.c scan-assembler-not \tcbnz\t
FAIL: gcc.target/aarch64/pr100056.c scan-assembler-times \\t[us]bfiz\\tw[0-9]+, w[0-9]+, 11 2
FAIL: gcc.target/aarch64/pr100056.c scan-assembler-times \\tadd\\tw[0-9]+, w[0-9]+, w[0-9]+, uxtb\\n 2
FAIL: gcc.target/aarch64/pr108840.c scan-assembler-not and\\tw[0-9]+, w[0-9]+, 31
FAIL: gcc.target/aarch64/pr112105.c scan-assembler-not \\tdup\\t
FAIL: gcc.target/aarch64/pr112105.c scan-assembler-times (?n)\\tfmul\\t.*v[0-9]+\\.s\\[0\\]\\n 2
FAIL: gcc.target/aarch64/rev16_2.c scan-assembler-times rev16\\tx[0-9]+ 2
FAIL: gcc.target/aarch64/vaddX_high_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/vmul_element_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/vmul_high_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/vsubX_high_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/sve/pr98119.c scan-assembler \\tand\\tx[0-9]+, x[0-9]+, #?-31\\n
FAIL: gcc.target/aarch64/sve/pred-not-gen-1.c scan-assembler-not \\tbic\\t
FAIL: gcc.target/aarch64/sve/pred-not-gen-1.c scan-assembler-times \\tnot\\tp[0-9]+\\.b, p[0-9]+/z, p[0-9]+\\.b\\n 1
FAIL: gcc.target/aarch64/sve/pred-not-gen-4.c scan-assembler-not \\tbic\\t
FAIL: gcc.target/aarch64/sve/pred-not-gen-4.c scan-assembler-times \\tnot\\tp[0-9]+\\.b, p[0-9]+/z, p[0-9]+\\.b\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_2.c scan-assembler-times \\tubfiz\\tx[0-9]+, x2, 10, 16\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_2.c scan-assembler-times \\tubfiz\\tx[0-9]+, x3, 10, 16\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_4.c scan-assembler-times \\tsbfiz\\tx[0-9]+, x2, 10, 32\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_4.c scan-assembler-times \\tsbfiz\\tx[0-9]+, x3, 10, 32\\n 1

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: law at gcc dot gnu.org @ 2024-03-29 23:47 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: rguenth at gcc dot gnu.org @ 2024-04-02  8:05 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-04-02
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that, given the offending revision fixed a very old bug, I think we should
eventually revert the fix and rework it during the next stage1.  The fallout was
at least unexpectedly big, AFAIU.

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: rdapp at gcc dot gnu.org @ 2024-04-02 18:42 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Robin Dapp <rdapp at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ewlu at rivosinc dot com,
                   |                            |rdapp at gcc dot gnu.org

--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---
There is some riscv fallout as well.  Edwin has the details.

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: ewlu at rivosinc dot com @ 2024-04-02 20:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #8 from Edwin Lu <ewlu at rivosinc dot com> ---
(In reply to Robin Dapp from comment #7)
> There is some riscv fallout as well.  Edwin has the details.

I haven't done an in-depth analysis, but the full list of new riscv scan-dump
failures can be found here:
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/694

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: law at gcc dot gnu.org @ 2024-04-02 20:45 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #9 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Thanks for that info Edwin -- my tester flagged them too and mentally I'd
figured it was most likely the combiner change.

* [Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
From: tnfchris at gcc dot gnu.org @ 2024-04-03 15:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=114575

--- Comment #10 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
This has also broken our addressing modes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575

* [Bug rtl-optimization/114515] [15 Regression] Failure to use aarch64 lane forms after PR101523
From: rguenth at gcc dot gnu.org @ 2024-04-10  6:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |15.0
            Summary|[14 Regression] Failure to  |[15 Regression] Failure to
                   |use aarch64 lane forms      |use aarch64 lane forms
                   |after PR101523              |after PR101523

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Reverted for GCC 14 but will re-appear for GCC 15.

* [Bug rtl-optimization/114515] [15 Regression] Failure to use aarch64 lane forms after PR101523
From: law at gcc dot gnu.org @ 2024-06-16  3:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=114996

--- Comment #12 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Reading the RTL dumps from Richard S., this looks like the exact same problem
we're still seeing on the RISC-V port, affecting 557.xz.

Specifically, we get the same I2 back, but I3 has changed.  The change in I3 in
turn allows I2 to combine into a different instruction, and the net result is a
clear improvement.  ISTM that allowing this combination when we get the same I2
back, but a different I3, would be sufficient to fix both the aarch64 and riscv
problems.

Unfortunately Segher has gone radio silent on this issue.

* [Bug rtl-optimization/114515] [15 Regression] Failure to use aarch64 lane forms after PR101523
From: cvs-commit at gcc dot gnu.org @ 2024-06-24  7:43 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Jun 24 08:43:19 2024 +0100

    Add a late-combine pass [PR106594]

    This patch adds a combine pass that runs late in the pipeline.
    There are two instances: one between combine and split1, and one
    after postreload.

    The pass currently has a single objective: remove definitions by
    substituting into all uses.  The pre-RA version tries to restrict
    itself to cases that are likely to have a neutral or beneficial
    effect on register pressure.

    The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
    in the aarch64 test results, mostly due to making proper use of
    MOVPRFX in cases where we didn't previously.

    This is just a first step.  I'm hoping that the pass could be
    used for other combine-related optimisations in future.  In particular,
    the post-RA version doesn't need to restrict itself to cases where all
    uses are substitutable, since it doesn't have to worry about register
    pressure.  If we did that, and if we extended it to handle multi-register
    REGs, the pass might be a viable replacement for regcprop, which in
    turn might reduce the cost of having a post-RA instance of the new pass.

    On most targets, the pass is enabled by default at -O2 and above.
    However, it has a tendency to undo x86's STV and RPAD passes,
    by folding the more complex post-STV/RPAD form back into the
    simpler pre-pass form.

    Also, running a pass after register allocation means that we can
    now match define_insn_and_splits that were previously only matched
    before register allocation.  This trips things like:

      (define_insn_and_split "..."
        [...pattern...]
        "...cond..."
        "#"
        "&& 1"
        [...pattern...]
        {
          ...unconditional use of gen_reg_rtx ()...;
        }

    because matching and splitting after RA will call gen_reg_rtx when
    pseudos are no longer allowed.  rs6000 has several instances of this.

    xtensa has a variation in which the split condition is:

        "&& can_create_pseudo_p ()"

    The failure then is that, if we match after RA, we'll never be
    able to split the instruction.

    The patch therefore disables the pass by default on i386, rs6000
    and xtensa.  Hopefully we can fix those ports later (if their
    maintainers want).  It seems better to add the pass first, though,
    to make it easier to test any such fixes.

    gcc.target/aarch64/bitfield-bitint-abi-align{16,8}.c would need
    quite a few updates for the late-combine output.  That might be
    worth doing, but it seems too complex to do as part of this patch.

    I tried compiling at least one target per CPU directory and comparing
    the assembly output for parts of the GCC testsuite.  This is just a way
    of getting a flavour of how the pass performs; it obviously isn't a
    meaningful benchmark.  All targets seemed to improve on average:

    Target                 Tests   Good    Bad   %Good   Delta  Median
    ======                 =====   ====    ===   =====   =====  ======
    aarch64-linux-gnu       2215   1975    240  89.16%   -4159      -1
    aarch64_be-linux-gnu    1569   1483     86  94.52%  -10117      -1
    alpha-linux-gnu         1454   1370     84  94.22%   -9502      -1
    amdgcn-amdhsa           5122   4671    451  91.19%  -35737      -1
    arc-elf                 2166   1932    234  89.20%  -37742      -1
    arm-linux-gnueabi       1953   1661    292  85.05%  -12415      -1
    arm-linux-gnueabihf     1834   1549    285  84.46%  -11137      -1
    avr-elf                 4789   4330    459  90.42% -441276      -4
    bfin-elf                2795   2394    401  85.65%  -19252      -1
    bpf-elf                 3122   2928    194  93.79%   -8785      -1
    c6x-elf                 2227   1929    298  86.62%  -17339      -1
    cris-elf                3464   3270    194  94.40%  -23263      -2
    csky-elf                2915   2591    324  88.89%  -22146      -1
    epiphany-elf            2399   2304     95  96.04%  -28698      -2
    fr30-elf                7712   7299    413  94.64%  -99830      -2
    frv-linux-gnu           3332   2877    455  86.34%  -25108      -1
    ft32-elf                2775   2667    108  96.11%  -25029      -1
    h8300-elf               3176   2862    314  90.11%  -29305      -2
    hppa64-hp-hpux11.23     4287   4247     40  99.07%  -45963      -2
    ia64-linux-gnu          2343   1946    397  83.06%   -9907      -2
    iq2000-elf              9684   9637     47  99.51% -126557      -2
    lm32-elf                2681   2608     73  97.28%  -59884      -3
    loongarch64-linux-gnu   1303   1218     85  93.48%  -13375      -2
    m32r-elf                1626   1517    109  93.30%   -9323      -2
    m68k-linux-gnu          3022   2620    402  86.70%  -21531      -1
    mcore-elf               2315   2085    230  90.06%  -24160      -1
    microblaze-elf          2782   2585    197  92.92%  -16530      -1
    mipsel-linux-gnu        1958   1827    131  93.31%  -15462      -1
    mipsisa64-linux-gnu     1655   1488    167  89.91%  -16592      -2
    mmix                    4914   4814    100  97.96%  -63021      -1
    mn10300-elf             3639   3320    319  91.23%  -34752      -2
    moxie-rtems             3497   3252    245  92.99%  -87305      -3
    msp430-elf              4353   3876    477  89.04%  -23780      -1
    nds32le-elf             3042   2780    262  91.39%  -27320      -1
    nios2-linux-gnu         1683   1355    328  80.51%   -8065      -1
    nvptx-none              2114   1781    333  84.25%  -12589      -2
    or1k-elf                3045   2699    346  88.64%  -14328      -2
    pdp11                   4515   4146    369  91.83%  -26047      -2
    pru-elf                 1585   1245    340  78.55%   -5225      -1
    riscv32-elf             2122   2000    122  94.25% -101162      -2
    riscv64-elf             1841   1726    115  93.75%  -49997      -2
    rl78-elf                2823   2530    293  89.62%  -40742      -4
    rx-elf                  2614   2480    134  94.87%  -18863      -1
    s390-linux-gnu          1591   1393    198  87.55%  -16696      -1
    s390x-linux-gnu         2015   1879    136  93.25%  -21134      -1
    sh-linux-gnu            1870   1507    363  80.59%   -9491      -1
    sparc-linux-gnu         1123   1075     48  95.73%  -14503      -1
    sparc-wrs-vxworks       1121   1073     48  95.72%  -14578      -1
    sparc64-linux-gnu       1096   1021     75  93.16%  -15003      -1
    v850-elf                1897   1728    169  91.09%  -11078      -1
    vax-netbsdelf           3035   2995     40  98.68%  -27642      -1
    visium-elf              1392   1106    286  79.45%   -7984      -2
    xstormy16-elf           2577   2071    506  80.36%  -13061      -1
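
A note on reading the table: in the rows spot-checked, Good + Bad equals Tests, and %Good matches Good / Tests expressed as a percentage with two decimals. This is an inference from the numbers, since the commit message does not define the columns. A minimal sketch of that arithmetic, using a hypothetical helper good_share_x100 and integer maths to avoid floating-point rounding surprises:

```c
#include <assert.h>

/* Share of good results in hundredths of a percent, rounded to nearest.
   For example, a return value of 8916 means 89.16%.  */
unsigned long good_share_x100 (unsigned long good, unsigned long tests)
{
  assert (tests > 0 && good <= tests);
  return (good * 10000UL + tests / 2) / tests;
}
```

For the aarch64-linux-gnu row, good_share_x100 (1975, 2215) gives 8916, i.e. 89.16%, which agrees with the table.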

    gcc/
            PR rtl-optimization/106594
            PR rtl-optimization/114515
            PR rtl-optimization/114575
            PR rtl-optimization/114996
            PR rtl-optimization/115104
            * Makefile.in (OBJS): Add late-combine.o.
            * common.opt (flate-combine-instructions): New option.
            * doc/invoke.texi: Document it.
            * opts.cc (default_options_table): Enable it by default at -O2
            and above.
            * tree-pass.h (make_pass_late_combine): Declare.
            * late-combine.cc: New file.
            * passes.def: Add two instances of late_combine.
            * doc/passes.texi: Document the new passes.
            * config/i386/i386-options.cc (ix86_override_options_after_change):
            Disable late-combine by default.
            * config/rs6000/rs6000.cc (rs6000_option_override_internal):
            Likewise.
            * config/xtensa/xtensa.cc (xtensa_option_override): Likewise.

    gcc/testsuite/
            PR rtl-optimization/106594
            * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
            targets.
            * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
            * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
            * gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
            -fno-late-combine-instructions.
            * gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
            * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
            * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
            * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
            * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
            described in the comment.
            * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
            * gcc.target/aarch64/pr106594_1.c: New test.

* [Bug rtl-optimization/114515] [15 Regression] Failure to use aarch64 lane forms after PR101523
From: rsandifo at gcc dot gnu.org @ 2024-06-24  8:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Fixed.
