public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109072] [12/13 Regression] SLP costs for vec duplicate too high since g:4963079769c99c4073adfd799885410ad484cbbe
Date: Tue, 28 Mar 2023 11:35:50 +0000
Message-ID: <bug-109072-4-0rldLaeMjj@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109072-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109072

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:fcb411564a655a01f759eea3bb16bfd1bc879bfd

commit r13-6903-gfcb411564a655a01f759eea3bb16bfd1bc879bfd
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Tue Mar 28 12:34:51 2023 +0100

    aarch64: Restore vectorisation of vld1 inputs [PR109072]

    Before GCC 12, we would vectorize:

      int32_t arr[] = { x, x, x, x };

    at -O3.  Vectorizing the store on its own is often a loss, particularly
    for integers, so g:4963079769c99c4073adfd799885410ad484cbbe suppressed it.
    This was necessary to fix regressions from enabling vectorisation at -O2.

    However, the vectorisation is important if the code subsequently loads
    from the array using vld1:

      return vld1q_s32 (arr);

    This approach of initialising an array and loading from it is the
    recommended endian-agnostic way of constructing an ACLE vector.
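
For reference, the idiom described above can be written out as a complete function. This is a minimal sketch, not the testcase added by the patch: the name splat_s32, the output-array interface, and the non-AArch64 fallback loop are assumptions added so that the snippet builds on any host. Only the array initialisation and the vld1q_s32 call come from the text above.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: fill OUT with four copies of X.  On AArch64 this
   goes through the array + vld1 idiom; elsewhere it uses a plain loop
   (an assumption made only to keep the sketch portable).  */
void
splat_s32 (int32_t x, int32_t out[4])
{
#if defined(__aarch64__)
  /* Initialise an array in memory order, then load it as a vector.
     Lane i of the result is arr[i] on both little- and big-endian.  */
  int32_t arr[] = { x, x, x, x };
  int32x4_t v = vld1q_s32 (arr);

  /* Store the vector back out so the caller can inspect the lanes.  */
  vst1q_s32 (out, v);
#else
  for (int i = 0; i < 4; i++)
    out[i] = x;
#endif
}
```

Calling splat_s32 (7, out) leaves all four elements of out equal to 7. On AArch64, whether the array store is vectorised and then folded away depends on the compiler version and options in use.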

    As discussed in the PR notes, the general fix would be to fold the
    store and load-back to a constructor (preferably before vectorisation).
    But that's clearly not stage 4 material.

    This patch instead delays folding vld1 until after inlining and
    records which decls a vld1 loads from.  It then treats vector
    stores to those decls as free, on the optimistic assumption that
    they will be removed later.  The patch also brute-forces
    vectorization of plain constructor+store sequences, since some
    of the CPU costs make that (dubiously) expensive even when the
    store is discounted.
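
The costing rule in the paragraph above can be pictured with a toy model. Everything in it is hypothetical: the struct, the function names, the string-based decl matching, and the numeric costs are illustrative assumptions and are much simpler than GCC's aarch64_vector_costs. It only shows the two decisions described: a vector store to a decl that a vld1 loads from counts as free, and if the vector code consists of nothing but such stores, the prologue is treated as free too.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these types and names are hypothetical, not GCC's.  */
struct toy_stmt
{
  bool is_vector_store;   /* true if the statement is a vector store */
  const char *decl;       /* name of the variable stored to, or NULL */
  unsigned cost;          /* nominal cost of the statement */
};

/* Return true if DECL is one of the N_DECLS names that a vld1 loads from.  */
static bool
toy_vector_load_decl_p (const char *decl, const char *const *decls,
                        size_t n_decls)
{
  if (!decl)
    return false;
  for (size_t i = 0; i < n_decls; i++)
    if (strcmp (decl, decls[i]) == 0)
      return true;
  return false;
}

/* Return the total cost of vector code with prologue cost PROLOGUE and
   body statements STMTS, discounting stores to recorded decls.  */
unsigned
toy_vector_cost (unsigned prologue, const struct toy_stmt *stmts,
                 size_t n_stmts, const char *const *decls, size_t n_decls)
{
  unsigned body = 0;
  bool saw_free_store = false;
  bool only_free_stores = true;

  for (size_t i = 0; i < n_stmts; i++)
    {
      if (stmts[i].is_vector_store
          && toy_vector_load_decl_p (stmts[i].decl, decls, n_decls))
        /* Optimistically assume this store will be removed later.  */
        saw_free_store = true;
      else
        {
          only_free_stores = false;
          body += stmts[i].cost;
        }
    }

  /* Nothing but discounted stores: treat the prologue as free as well.  */
  if (saw_free_store && only_free_stores)
    prologue = 0;

  return prologue + body;
}
```

In this model, a lone store to a recorded decl costs nothing overall, while the same store to an unrecorded decl pays both its own cost and the prologue cost.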

    Delaying folding showed that we were failing to update the vops.
    The patch fixes that too.

    Thanks to Tamar for discussion & help with testing.

    gcc/
            PR target/109072
            * config/aarch64/aarch64-protos.h (aarch64_vector_load_decl): Declare.
            * config/aarch64/aarch64.h (machine_function::vector_load_decls): New
            variable.
            * config/aarch64/aarch64-builtins.cc (aarch64_record_vector_load_arg):
            New function.
            (aarch64_general_gimple_fold_builtin): Delay folding of vld1 until
            after inlining.  Record which decls are loaded from.  Fix handling
            of vops for loads and stores.
            * config/aarch64/aarch64.cc (aarch64_vector_load_decl): New function.
            (aarch64_accesses_vector_load_decl_p): Likewise.
            (aarch64_vector_costs::m_stores_to_vector_load_decl): New member
            variable.
            (aarch64_vector_costs::add_stmt_cost): If the function has a vld1
            that loads from a decl, treat vector stores to those decls as
            zero cost.
            (aarch64_vector_costs::finish_cost): ...and in that case,
            if the vector code does nothing more than a store, give the
            prologue a zero cost as well.

    gcc/testsuite/
            PR target/109072
            * gcc.target/aarch64/pr109072_1.c: New test.
            * gcc.target/aarch64/pr109072_2.c: Likewise.

Thread overview: 15+ messages
2023-03-08 22:32 [Bug target/109072] New: " tnfchris at gcc dot gnu.org
2023-03-09 10:19 ` [Bug target/109072] " rguenth at gcc dot gnu.org
2023-03-09 14:13 ` rsandifo at gcc dot gnu.org
2023-03-09 14:30 ` tnfchris at gcc dot gnu.org
2023-03-09 14:46 ` rsandifo at gcc dot gnu.org
2023-03-09 15:04 ` tnfchris at gcc dot gnu.org
2023-03-09 16:22 ` rsandifo at gcc dot gnu.org
2023-03-09 17:35 ` rsandifo at gcc dot gnu.org
2023-03-10  7:40 ` rguenth at gcc dot gnu.org
2023-03-10  8:50 ` rsandifo at gcc dot gnu.org
2023-03-15 14:29 ` rguenth at gcc dot gnu.org
2023-03-28 11:35 ` cvs-commit at gcc dot gnu.org [this message]
2023-03-28 12:59 ` [Bug target/109072] [12 " rsandifo at gcc dot gnu.org
2023-04-03  8:58 ` cvs-commit at gcc dot gnu.org
2023-04-03  9:03 ` rsandifo at gcc dot gnu.org
