[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

From: "rsandifo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
Date: Mon, 04 Mar 2024 14:48:37 +0000
Message-ID: <bug-113441-4-AKOzUBB9p8@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113441-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #38 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #37)
> Even more iteration looks bad.  I do wonder why when gather can avoid
> peeling for GAPs using load-lanes cannot?
Like you say, we don't realise that all the loads from array3[i] form a single
group.

Note that we're not using load-lanes in either case, since the group size (8)
is too big for that.  But load-lanes and load-and-permute have the same
restriction about when peeling for gaps is required.

In contrast, gather loads only ever load data that they actually need.
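
To illustrate the difference, here is a simplified model (not GCC's actual logic; the function names and parameters are hypothetical) of the highest element index each strategy touches when a loop reads src[i * stride] for i in [0, n).  It assumes that the contiguous strategy loads vf * stride elements per vector iteration and then permutes out the ones it needs:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of load-and-permute (and load-lanes): every vector
   iteration loads vf * stride contiguous elements, including the unused
   gap elements after the last element the loop actually needs.  */
size_t max_index_contiguous(size_t n, size_t stride, size_t vf)
{
    /* Assumption: n is a nonzero multiple of vf.  */
    assert(n >= vf && n % vf == 0);
    size_t last_iter_start = (n - vf) * stride;
    return last_iter_start + vf * stride - 1;
}

/* Hypothetical model of a gather load: only src[i * stride] is touched.  */
size_t max_index_gather(size_t n, size_t stride)
{
    assert(n > 0);
    return (n - 1) * stride;
}
```

With n = 8, stride = 4 and vf = 4, the contiguous model reaches index 31 while the gather model stops at index 28.  Those extra trailing elements are the accesses that peeling for gaps is there to avoid.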

> Also for the stores we seem to use elementwise stores rather than store-lanes.
What configuration are you trying?  The original report was about SVE, so I was
trying that.  There we use a scatter store.

> To me the most obvious thing to try optimizing in this testcase is DR
> analysis.  With -march=armv8.3-a I still see
> 
> t.c:26:22: note:   === vect_analyze_data_ref_accesses ===
> t.c:26:22: note:   Detected single element interleaving array1[0][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[1][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[2][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[3][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[0][_1] step 4
> t.c:26:22: note:   Detected single element interleaving array1[1][_1] step 4
> t.c:26:22: note:   Detected single element interleaving array1[2][_1] step 4
> t.c:26:22: note:   Detected single element interleaving array1[3][_1] step 4
> t.c:26:22: missed:   not consecutive access array2[_4][_8] = _69;
> t.c:26:22: note:   using strided accesses
> t.c:26:22: missed:   not consecutive access array2[_4][_1] = _67;
> t.c:26:22: note:   using strided accesses
> 
> so we don't figure
> 
> Creating dr for array1[0][_1]
>         base_address: &array1
>         offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2)
>         constant offset from base address: 0
>         step: 4
>         base alignment: 16
>         base misalignment: 0
>         offset alignment: 4
>         step alignment: 4
>         base_object: array1
>         Access function 0: {m_111 * 2, +, 2}<nw>_4
>         Access function 1: 0
> Creating dr for array1[0][_8]
> analyze_innermost: success.
>         base_address: &array1
>         offset from base address: (ssizetype) ((sizetype) (m_111 * 2 + 1) * 2)
>         constant offset from base address: 0
>         step: 4
>         base alignment: 16
>         base misalignment: 0
>         offset alignment: 2
>         step alignment: 4
>         base_object: array1
>         Access function 0: {m_111 * 2 + 1, +, 2}<nw>_4
>         Access function 1: 0
> 
> belong to the same group (but the access functions tell us it worked out).
> Above we fail to split the + 1 to the constant offset.
OK, but this is moving the question on to how we should optimise the testcase
for Advanced SIMD rather than SVE, and how we should optimise the testcase in
general, rather than simply recover what we could do before.  (SVE is only
enabled for -march=armv9-a and above, in case armv8.3-a was intended to enable
SVE too.)
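
For reference, the grouping point in the quoted dump can be modelled roughly as follows.  This is only a sketch under assumptions: struct dr_model, split_constant and same_group are hypothetical names, not the vectorizer's real data structures, and the byte offset is assumed to have the form (a * m + b) * scale + c for a symbolic m.  The two references from the dump have offsets (m * 2) * 2 and (m * 2 + 1) * 2, both with step 4:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a data reference.  */
struct dr_model {
    long a, b, scale;  /* variable offset: (a * m + b) * scale */
    long c;            /* constant offset in bytes */
    long step;         /* bytes advanced per scalar iteration */
};

/* Move the "+ b" term out of the variable offset and into the
   constant offset: (a * m + b) * scale == (a * m) * scale + b * scale.  */
struct dr_model split_constant(struct dr_model d)
{
    d.c += d.b * d.scale;
    d.b = 0;
    return d;
}

/* Simplified grouping test: the variable parts must match exactly and
   the constant offsets must differ by less than one step.  */
bool same_group(struct dr_model x, struct dr_model y)
{
    long diff = y.c - x.c;
    if (diff < 0)
        diff = -diff;
    return x.a == y.a && x.b == y.b && x.scale == y.scale
           && x.step == y.step && diff < x.step;
}
```

In this model, {2, 0, 2, 0, 4} and {2, 1, 2, 0, 4} are not grouped as they stand, but they are once split_constant has been applied to both, since the constant offsets then differ by 2 bytes against a step of 4.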
