public inbox for gcc-bugs@sourceware.org
From: "rsandifo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
Date: Mon, 04 Mar 2024 12:07:36 +0000
Message-ID: <bug-113441-4-keJFhamxL2@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113441-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #35 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Maybe I've misunderstood the flow of the ticket, but it looks to me like we do
still correctly recognise the truncating scatter stores.  And, on their own, we
would be able to convert them into masked scatters.
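
For readers less familiar with the terminology, here is a scalar C model of what a masked truncating scatter store and a masked extending gather load do in each lane. It is only an illustration. The lane count VF, the int32_t/int8_t element types and the function names are assumptions chosen for the example. They are not GCC internals or any target's intrinsics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of lanes per vector, for illustration only.  */
#define VF 4

/* Masked truncating scatter store: for each active lane i, narrow the
   32-bit value to 8 bits and write it to base[idx[i]].  Inactive lanes
   do not touch memory, so a partial final iteration can be handled by
   the mask rather than by a scalar epilogue.  */
static void
masked_trunc_scatter (int8_t *base, const int32_t idx[VF],
                      const int32_t val[VF], const bool mask[VF])
{
  for (size_t i = 0; i < VF; i++)
    if (mask[i])
      {
        assert (idx[i] >= 0);   /* this model only handles non-negative offsets */
        base[idx[i]] = (int8_t) val[i];
      }
}

/* Masked extending gather load: for each active lane i, read an 8-bit
   element from base[idx[i]] and sign-extend it to 32 bits.  Inactive
   lanes yield 0 and do not read memory.  */
static void
masked_ext_gather (int32_t out[VF], const int8_t *base,
                   const int32_t idx[VF], const bool mask[VF])
{
  for (size_t i = 0; i < VF; i++)
    out[i] = mask[i] ? (int32_t) base[idx[i]] : 0;
}
```

With three elements left and VF = 4, a mask of {1, 1, 1, 0} lets the last, partial vector iteration handle the tail. That is what removes the need for a scalar epilogue.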

The reason for the epilogue is instead on the load side.  There we have a
non-strided grouped load, and currently we hard-code the assumption that it is
better to use contiguous loads and permutes rather than gather loads where
possible.  So we have:

      /* As a last resort, trying using a gather load or scatter store.

         ??? Although the code can handle all group sizes correctly,
         it probably isn't a win to use separate strided accesses based
         on nearby locations.  Or, even if it's a win over scalar code,
         it might not be a win over vectorizing at a lower VF, if that
         allows us to use contiguous accesses.  */
      if (*memory_access_type == VMAT_ELEMENTWISE
          && single_element_p
          && loop_vinfo
          && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
                                                 masked_p, gs_info))
        *memory_access_type = VMAT_GATHER_SCATTER;

only after we've tried and failed to use load lanes or load+permute.  If
instead I change the order so that the code above is tried first, then we do
use extending gather loads and truncating scatter stores as before, with no
epilogue loop.
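
To show why the load side matters, the sketch below models in scalar C the two ways of loading an assumed group. In this group, elements 3*i and 3*i+1 are read and 3*i+2 is an unused gap. The group layout, VF and function names are hypothetical and are not taken from this PR's testcase. Both strategies produce the same lane values. The contiguous-load-plus-permute version, however, also reads the trailing gap element of the last group. When that element could lie past the end of the data the scalar loop accesses, the vectorizer must peel at least one final iteration into a scalar epilogue ("peeling for gaps"). The gather version only touches elements the scalar loop also reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4      /* hypothetical lanes per vector */
#define GROUP 3   /* elements 3*i and 3*i+1 are used; 3*i+2 is a gap */

/* Strategy 1: contiguous loads + permute.  Load GROUP*VF consecutive
   elements (including each group's gap element), then de-interleave
   them into one vector per used group member.  Returns the highest
   index read.  */
static size_t
load_permute (const int32_t *a, size_t i0, int32_t lo[VF], int32_t hi[VF])
{
  int32_t buf[GROUP * VF];
  for (size_t k = 0; k < GROUP * VF; k++)
    buf[k] = a[GROUP * i0 + k];          /* contiguous block */
  for (size_t l = 0; l < VF; l++)
    {
      lo[l] = buf[GROUP * l];            /* permute: member 0 */
      hi[l] = buf[GROUP * l + 1];        /* permute: member 1 */
    }
  return GROUP * i0 + GROUP * VF - 1;
}

/* Strategy 2: gather loads.  Each lane reads exactly the element it
   needs, so gap elements are never touched.  Returns the highest
   index read.  */
static size_t
load_gather (const int32_t *a, size_t i0, int32_t lo[VF], int32_t hi[VF])
{
  for (size_t l = 0; l < VF; l++)
    {
      lo[l] = a[GROUP * (i0 + l)];
      hi[l] = a[GROUP * (i0 + l) + 1];
    }
  return GROUP * (i0 + VF - 1) + 1;
}

/* Both strategies must yield the same lane values; only the set of
   memory locations they read differs.  */
static void
check_same (const int32_t *a, size_t i0)
{
  int32_t lo1[VF], hi1[VF], lo2[VF], hi2[VF];
  load_permute (a, i0, lo1, hi1);
  load_gather (a, i0, lo2, hi2);
  for (size_t l = 0; l < VF; l++)
    assert (lo1[l] == lo2[l] && hi1[l] == hi2[l]);
}
```

For i0 = 0, load_permute reads a[0] to a[11], while load_gather stops at a[10]. The one-element difference is the trailing gap.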

So I suppose the question is: if we do prefer to use gathers over load+permute
for some cases, how do we decide which to use?  And can it be done on a per-load
basis, or should it instead be a per-loop decision?  E.g., if we end up with a
loop that needs peeling for gaps, perhaps we should try again and forbid
peeling for gaps.  Then, if that succeeds, see which loop gives the better
overall cost.
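
The per-loop alternative can be sketched as a cost comparison between two complete analyses of the same loop. Everything below is hypothetical and is not GCC's actual cost hooks:

- the loop_plan fields and the cost units;
- the simple model itself, in which peeling for gaps reserves one scalar iteration and a masked tail folds the remaining partial iteration into the vector loop.

Whether a given target can combine masking with peeling for gaps is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one vectorization attempt.  */
struct loop_plan
{
  unsigned vf;                /* lanes per vector iteration */
  unsigned vec_iter_cost;     /* cost of one vector iteration */
  unsigned scalar_iter_cost;  /* cost of one scalar epilogue iteration */
  bool peel_for_gaps;         /* one final iteration must run as scalar code */
  bool masked_tail;           /* partial last iteration handled by masking */
};

/* Estimated total cost of N scalar iterations' worth of work.  */
static unsigned long
plan_cost (const struct loop_plan *p, unsigned long n)
{
  assert (p->vf > 0);
  /* Iterations reserved for the scalar epilogue by peeling for gaps.  */
  unsigned long peeled = (p->peel_for_gaps && n > 0) ? 1 : 0;
  unsigned long rest = n - peeled;
  unsigned long vec_iters, epilogue;
  if (p->masked_tail)
    {
      vec_iters = (rest + p->vf - 1) / p->vf;  /* ceil: last one masked */
      epilogue = peeled;
    }
  else
    {
      vec_iters = rest / p->vf;
      epilogue = peeled + rest % p->vf;
    }
  return vec_iters * p->vec_iter_cost + epilogue * p->scalar_iter_cost;
}

/* Keep plan A unless plan B is strictly cheaper.  */
static const struct loop_plan *
pick_plan (const struct loop_plan *a, const struct loop_plan *b,
           unsigned long n)
{
  return plan_cost (b, n) < plan_cost (a, n) ? b : a;
}
```

Take some example numbers: load+permute with VF 4, vector cost 6 and peeling for gaps; gathers with VF 4, vector cost 8 and no peeling; scalar cost 4 for both. With these, the load+permute plan wins for n = 16 (28 vs 32), but the gather plan wins for n = 4 (8 vs 10). So the preferred plan can depend on the trip count.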

Of course, trying more things means more compile time…


Thread overview: 48+ messages
2024-01-17 12:38 [Bug c/113441] New: [14 Regression] Fail to fold the last element with multiple loop juzhe.zhong at rivai dot ai
2024-01-17 12:45 ` [Bug tree-optimization/113441] " juzhe.zhong at rivai dot ai
2024-01-17 13:22 ` [Bug tree-optimization/113441] [13/14 " rguenth at gcc dot gnu.org
2024-01-17 14:07 ` juzhe.zhong at rivai dot ai
2024-01-17 14:35 ` rguenth at gcc dot gnu.org
2024-01-22 12:38 ` juzhe.zhong at rivai dot ai
2024-01-22 12:41 ` tnfchris at gcc dot gnu.org
2024-01-22 12:42 ` juzhe.zhong at rivai dot ai
2024-01-22 13:19 ` juzhe.zhong at rivai dot ai
2024-01-22 13:52 ` [Bug tree-optimization/113441] [14 " rguenth at gcc dot gnu.org
2024-01-22 16:16 ` tnfchris at gcc dot gnu.org
2024-01-22 22:16 ` juzhe.zhong at rivai dot ai
2024-01-23  6:42 ` rguenth at gcc dot gnu.org
2024-01-23  8:15 ` juzhe.zhong at rivai dot ai
2024-01-23  8:17 ` rguenther at suse dot de
2024-01-23  8:25 ` juzhe.zhong at rivai dot ai
2024-01-23 10:29 ` rguenther at suse dot de
2024-01-23 10:30 ` tnfchris at gcc dot gnu.org
2024-01-23 12:32 ` tnfchris at gcc dot gnu.org
2024-01-23 12:50 ` rguenth at gcc dot gnu.org
2024-01-23 12:52 ` rguenth at gcc dot gnu.org
2024-01-23 12:56 ` rguenth at gcc dot gnu.org
2024-01-23 13:02 ` rguenth at gcc dot gnu.org
2024-01-23 13:05 ` tnfchris at gcc dot gnu.org
2024-01-23 13:12 ` tnfchris at gcc dot gnu.org
2024-01-23 13:21 ` juzhe.zhong at rivai dot ai
2024-01-23 13:28 ` tnfchris at gcc dot gnu.org
2024-02-22 16:18 ` [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7 tnfchris at gcc dot gnu.org
2024-02-26  8:10 ` tnfchris at gcc dot gnu.org
2024-02-26  8:17 ` rguenther at suse dot de
2024-02-27  8:01 ` tnfchris at gcc dot gnu.org
2024-02-27  8:08 ` rguenth at gcc dot gnu.org
2024-02-29 22:18 ` rsandifo at gcc dot gnu.org
2024-03-01  9:44 ` rguenth at gcc dot gnu.org
2024-03-01  9:53 ` rsandifo at gcc dot gnu.org
2024-03-01 10:44 ` rguenther at suse dot de
2024-03-04 12:07 ` rsandifo at gcc dot gnu.org [this message]
2024-03-04 13:26 ` rsandifo at gcc dot gnu.org
2024-03-04 14:28 ` rguenth at gcc dot gnu.org
2024-03-04 14:48 ` rsandifo at gcc dot gnu.org
2024-03-04 15:01 ` rsandifo at gcc dot gnu.org
2024-03-04 15:10 ` rguenth at gcc dot gnu.org
2024-03-04 16:16 ` rsandifo at gcc dot gnu.org
2024-03-04 22:52 ` rsandifo at gcc dot gnu.org
2024-03-05  8:21 ` rguenther at suse dot de
2024-03-05 10:44 ` rguenth at gcc dot gnu.org
2024-03-07 20:50 ` law at gcc dot gnu.org
2024-05-07  7:43 ` [Bug tree-optimization/113441] [14/15 " rguenth at gcc dot gnu.org
