public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "rsandifo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
Date: Mon, 04 Mar 2024 16:16:26 +0000	[thread overview]
Message-ID: <bug-113441-4-yZl7vygU6I@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-113441-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #41 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #40)
> So I wonder if we can use "local costing" to decide a gather is always OK
> compared to the alternative with peeling for gaps.  On x86 gather tends
> to be slow compared to open-coding it.
Yeah, on SVE gathers are generally “enabling” instructions rather than
something to use for their own sake.

I suppose one problem is that we currently only try to use gathers for
single-element groups.  If we make a local decision to use gathers while
keeping that restriction, we could end up using gathers “unnecessarily” while
still needing to peel for gaps for (say) a two-element group.

That is, it's only better to use gathers than contiguous loads if by doing that
we avoid all need to peel for gaps (and if the cost of peeling for gaps was
high enough to justify the cost of using gathers over consecutive loads).

One of the things on the list to do (once everything is SLP!) is to support
loads with gaps directly via predication, so that we never load elements that
aren't needed.  E.g. on SVE, a 64-bit predicate (PTRUE .D) can be used with a
32-bit load (LD1W .S) to load only even-indexed elements.  So a single-element
group with a group size of 2 could be done cheaply with just consecutive loads,
without peeling for gaps.

  parent reply	other threads:[~2024-03-04 16:16 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17 12:38 [Bug c/113441] New: [14 Regression] Fail to fold the last element with multiple loop juzhe.zhong at rivai dot ai
2024-01-17 12:45 ` [Bug tree-optimization/113441] " juzhe.zhong at rivai dot ai
2024-01-17 13:22 ` [Bug tree-optimization/113441] [13/14 " rguenth at gcc dot gnu.org
2024-01-17 14:07 ` juzhe.zhong at rivai dot ai
2024-01-17 14:35 ` rguenth at gcc dot gnu.org
2024-01-22 12:38 ` juzhe.zhong at rivai dot ai
2024-01-22 12:41 ` tnfchris at gcc dot gnu.org
2024-01-22 12:42 ` juzhe.zhong at rivai dot ai
2024-01-22 13:19 ` juzhe.zhong at rivai dot ai
2024-01-22 13:52 ` [Bug tree-optimization/113441] [14 " rguenth at gcc dot gnu.org
2024-01-22 16:16 ` tnfchris at gcc dot gnu.org
2024-01-22 22:16 ` juzhe.zhong at rivai dot ai
2024-01-23  6:42 ` rguenth at gcc dot gnu.org
2024-01-23  8:15 ` juzhe.zhong at rivai dot ai
2024-01-23  8:17 ` rguenther at suse dot de
2024-01-23  8:25 ` juzhe.zhong at rivai dot ai
2024-01-23 10:29 ` rguenther at suse dot de
2024-01-23 10:30 ` tnfchris at gcc dot gnu.org
2024-01-23 12:32 ` tnfchris at gcc dot gnu.org
2024-01-23 12:50 ` rguenth at gcc dot gnu.org
2024-01-23 12:52 ` rguenth at gcc dot gnu.org
2024-01-23 12:56 ` rguenth at gcc dot gnu.org
2024-01-23 13:02 ` rguenth at gcc dot gnu.org
2024-01-23 13:05 ` tnfchris at gcc dot gnu.org
2024-01-23 13:12 ` tnfchris at gcc dot gnu.org
2024-01-23 13:21 ` juzhe.zhong at rivai dot ai
2024-01-23 13:28 ` tnfchris at gcc dot gnu.org
2024-02-22 16:18 ` [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7 tnfchris at gcc dot gnu.org
2024-02-26  8:10 ` tnfchris at gcc dot gnu.org
2024-02-26  8:17 ` rguenther at suse dot de
2024-02-27  8:01 ` tnfchris at gcc dot gnu.org
2024-02-27  8:08 ` rguenth at gcc dot gnu.org
2024-02-29 22:18 ` rsandifo at gcc dot gnu.org
2024-03-01  9:44 ` rguenth at gcc dot gnu.org
2024-03-01  9:53 ` rsandifo at gcc dot gnu.org
2024-03-01 10:44 ` rguenther at suse dot de
2024-03-04 12:07 ` rsandifo at gcc dot gnu.org
2024-03-04 13:26 ` rsandifo at gcc dot gnu.org
2024-03-04 14:28 ` rguenth at gcc dot gnu.org
2024-03-04 14:48 ` rsandifo at gcc dot gnu.org
2024-03-04 15:01 ` rsandifo at gcc dot gnu.org
2024-03-04 15:10 ` rguenth at gcc dot gnu.org
2024-03-04 16:16 ` rsandifo at gcc dot gnu.org [this message]
2024-03-04 22:52 ` rsandifo at gcc dot gnu.org
2024-03-05  8:21 ` rguenther at suse dot de
2024-03-05 10:44 ` rguenth at gcc dot gnu.org
2024-03-07 20:50 ` law at gcc dot gnu.org
2024-05-07  7:43 ` [Bug tree-optimization/113441] [14/15 " rguenth at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-113441-4-yZl7vygU6I@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).