public inbox for gcc-bugs@sourceware.org
* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
@ 2022-10-27 23:00 ` pinskia at gcc dot gnu.org
  2022-10-27 23:12 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-10-27 23:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 107445 has been marked as a duplicate of this bug. ***


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
  2022-10-27 23:00 ` [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases pinskia at gcc dot gnu.org
@ 2022-10-27 23:12 ` pinskia at gcc dot gnu.org
  2022-10-27 23:15 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-10-27 23:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect you could get a similar testcase with ARM neon too.


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
  2022-10-27 23:00 ` [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases pinskia at gcc dot gnu.org
  2022-10-27 23:12 ` pinskia at gcc dot gnu.org
@ 2022-10-27 23:15 ` pinskia at gcc dot gnu.org
  2023-11-07 22:06 ` rsandifo at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-10-27 23:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-10-27

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Oh yes PR 89967.


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2022-10-27 23:15 ` pinskia at gcc dot gnu.org
@ 2023-11-07 22:06 ` rsandifo at gcc dot gnu.org
  2023-11-07 22:08 ` juzhe.zhong at rivai dot ai
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-11-07 22:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rsandifo at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #10 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Some of the SME changes I'm working on fix this, but I'm not sure how widely
we'll be able to use them on non-SME code.  Assigning myself just in case.


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2023-11-07 22:06 ` rsandifo at gcc dot gnu.org
@ 2023-11-07 22:08 ` juzhe.zhong at rivai dot ai
  2023-11-08  3:58 ` juzhe.zhong at rivai dot ai
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-07 22:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

--- Comment #11 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Sandiford from comment #10)
> Some of the SME changes I'm working on fix this, but I'm not sure how widely
> we'll be able to use them on non-SME code.  Assigning myself just in case.

Hi Richard. We are going to fix the subreg issue by adding subreg liveness
tracking to IRA/LRA, hopefully today.


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2023-11-07 22:08 ` juzhe.zhong at rivai dot ai
@ 2023-11-08  3:58 ` juzhe.zhong at rivai dot ai
  2023-12-07 19:41 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-08  3:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Sandiford from comment #10)
> Some of the SME changes I'm working on fix this, but I'm not sure how widely
> we'll be able to use them on non-SME code.  Assigning myself just in case.

Hi Richard. My colleague Lehua has sent patches for general subreg liveness
tracking.
We are sure they can fix all the subreg issues for RVV and ARM SVE.

We are not sure about SME code, since we weren't able to test it.

This is a general optimization, and we hope it can land.


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2023-11-08  3:58 ` juzhe.zhong at rivai dot ai
@ 2023-12-07 19:41 ` cvs-commit at gcc dot gnu.org
  2023-12-07 19:52 ` rsandifo at gcc dot gnu.org
  2024-02-27  8:37 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-07 19:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:9f0f7d802482a8958d6cdc72f1fe0c8549db2182

commit r14-6290-g9f0f7d802482a8958d6cdc72f1fe0c8549db2182
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Thu Dec 7 19:41:19 2023 +0000

    aarch64: Add an early RA for strided registers

    This pass adds a simple register allocator for FP & SIMD registers.
    Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4
    instructions, which require a very specific grouping structure,
    and so would be difficult to exploit with general allocation.

    The allocator is very simple.  It gives up on anything that would
    require spilling, or that it might not handle well for other reasons.

    The allocator needs to track liveness at the level of individual FPRs.
    Doing that fixes a lot of the PRs relating to redundant moves caused by
    structure loads and stores.  That particular problem is going to be
    fixed more generally for GCC 15 by Lehua's RA patches.
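
As a rough illustration of why per-FPR liveness matters (a toy model, not code from the pass): if a two-register tuple is tracked as a single unit, both registers stay reserved until the last use of either half, whereas per-register tracking frees each half at its own last use. The struct and function names below are hypothetical.

```c
/* Toy model of a 2-register tuple pseudo.
   def         : index of the instruction that defines the tuple
   last_use[k] : index of the last instruction that reads half k
   All names here are illustrative assumptions, not GCC identifiers. */
struct tuple_range {
    int def;
    int last_use[2];
};

/* Whole-tuple tracking: both FPRs are treated as live until the
   later of the two last uses, so each is busy for that whole span. */
static int busy_slots_whole(struct tuple_range t)
{
    int end = t.last_use[0] > t.last_use[1] ? t.last_use[0] : t.last_use[1];
    return 2 * (end - t.def);
}

/* Per-FPR tracking: each half is released at its own last use. */
static int busy_slots_per_fpr(struct tuple_range t)
{
    return (t.last_use[0] - t.def) + (t.last_use[1] - t.def);
}
```

With a tuple defined at instruction 0 whose halves are last used at instructions 2 and 10, the whole-tuple model keeps 20 register slots busy and the per-FPR model keeps 12. The difference is register time that an allocator with coarser liveness cannot hand to other values, which is one way extra copies can arise.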

    However, the early-RA pass runs before scheduling, so it has a chance
    to bag a spill-free allocation of vector code before the scheduler moves
    things around.  It could therefore still be useful for non-SME code
    (e.g. for hand-scheduled ACLE code) even after Lehua's patches are in.

    The pass is controlled by a tristate switch:

    - -mearly-ra=all: run on all functions
    - -mearly-ra=strided: run on functions that have access to strided
      registers
    - -mearly-ra=none: don't run on any function
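
The tristate described above can be modelled as a small enum and predicate. This is a sketch only: the enumerator names and the has_strided_regs parameter are assumptions for illustration, not the identifiers used in aarch64-opts.h or aarch64.cc.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the three -mearly-ra= values. */
enum early_ra_scope {
    EARLY_RA_ALL,      /* -mearly-ra=all */
    EARLY_RA_STRIDED,  /* -mearly-ra=strided */
    EARLY_RA_NONE      /* -mearly-ra=none */
};

/* Return true if the pass would run on a function.  has_strided_regs
   stands in for whatever target query reports that the function has
   access to strided registers. */
static bool early_ra_runs(enum early_ra_scope scope, bool has_strided_regs)
{
    switch (scope) {
    case EARLY_RA_ALL:
        return true;
    case EARLY_RA_STRIDED:
        return has_strided_regs;
    case EARLY_RA_NONE:
        return false;
    }
    return false;
}
```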

    The patch makes -mearly-ra=all the default at -O2 and above for now.
    We can revisit this for GCC 15 once Lehua's patches are in;
    -mearly-ra=strided might then be more appropriate.

    As said previously, the pass is very naive.  There's much more that we
    could do, such as handling invariants better.  The main focus is on not
    committing to a bad allocation, rather than on handling as much as
    possible.

    gcc/
            PR rtl-optimization/106694
            PR rtl-optimization/109078
            PR rtl-optimization/109391
            * config.gcc: Add aarch64-early-ra.o for AArch64 targets.
            * config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule.
            * config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum.
            * config/aarch64/aarch64.opt (mearly_ra): New option.
            * doc/invoke.texi: Document it.
            * common/config/aarch64/aarch64-common.cc
            (aarch_option_optimization_table): Use -mearly-ra=strided by
            default for -O2 and above.
            * config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New
            pass.
            * config/aarch64/aarch64-protos.h (aarch64_strided_registers_p)
            (make_pass_aarch64_early_ra): Declare.
            * config/aarch64/aarch64-sme.md
            (@aarch64_sme_lut<LUTI_BITS><mode>): Add a stride_type attribute.
            (@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern.
            (@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
            * config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand)
            (svldnt1_impl::expand, svst1_impl::expand, svstn1_impl::expand):
            Handle new way of defining multi-register loads and stores.
            * config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
            (@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
            (@aarch64_stnt1<SVE_FULLx24:mode>): Delete.
            * config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>)
            (@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns.
            (@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise.
            (@aarch64_<ST1_COUNT:optab><mode>): Likewise.
            (@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise.
            (@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise.
            * config/aarch64/aarch64.cc (aarch64_strided_registers_p): New
            function.
            * config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete.
            (UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
            (UNSPEC_STNT1_SVE_COUNT): Likewise.
            (stride_type): New attribute.
            * config/aarch64/constraints.md (Uwd, Uwt): New constraints.
            * config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT)
            (UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs.
            (optab): Handle them.
            (LD1_COUNT, ST1_COUNT): New iterators.
            * config/aarch64/aarch64-early-ra.cc: New file.

    gcc/testsuite/
            PR rtl-optimization/106694
            PR rtl-optimization/109078
            PR rtl-optimization/109391
            * gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected
            output test.
            * gcc.target/aarch64/sve/shift_1.c: Allow reversed shifts for .s
            as well as .d.
            * gcc.target/aarch64/sme/strided_1.c: New test.
            * gcc.target/aarch64/pr109078.c: Likewise.
            * gcc.target/aarch64/pr109391.c: Likewise.
            * gcc.target/aarch64/sve/pr106694.c: Likewise.


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2023-12-07 19:41 ` cvs-commit at gcc dot gnu.org
@ 2023-12-07 19:52 ` rsandifo at gcc dot gnu.org
  2024-02-27  8:37 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 9+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-12-07 19:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #14 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Fixed for this case.  The patch only deals with cases that can be allocated
without spilling, but Lehua has a more general fix that should go into GCC 15.
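
For context, the general shape of code affected by this class of PR is a structure load whose tuple elements are then used individually. The following is a hypothetical sketch, not the testcase from this bug; add_pairs is an illustrative name, and the scalar fallback exists only so the example builds on targets without SVE.

```c
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* dst[i] = src[2*i] + src[2*i + 1], using an SVE structure load.
   svld2 yields a 2-vector tuple; each half is then read separately,
   which is the kind of tuple access the discussion is about. */
void add_pairs(const float *src, float *dst, int n)
{
    for (int i = 0; i < n; i += (int)svcntw()) {
        svbool_t pg = svwhilelt_b32(i, n);
        svfloat32x2_t v = svld2_f32(pg, src + 2 * i);
        svfloat32_t sum = svadd_f32_x(pg, svget2_f32(v, 0), svget2_f32(v, 1));
        svst1_f32(pg, dst + i, sum);
    }
}
#else
/* Scalar fallback with the same behaviour, for non-SVE builds. */
void add_pairs(const float *src, float *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[2 * i] + src[2 * i + 1];
}
#endif
```

One way to look for the effect, assuming an AArch64 compiler that has this commit, is to build the SVE path at -O2 with -march=armv8.2-a+sve, once with the default setting and once with -mearly-ra=none, and compare the moves emitted around the ld2w.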


* [Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases
       [not found] <bug-106694-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2023-12-07 19:52 ` rsandifo at gcc dot gnu.org
@ 2024-02-27  8:37 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-27  8:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694
Bug 106694 depends on bug 99161, which changed state.

Bug 99161 Summary: Suboptimal SVE code for ld4/st4 MLA code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99161

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

