public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/114515] New: [14 Regression] Failure to use aarch64 lane forms after PR101523
@ 2024-03-28 10:01 rsandifo at gcc dot gnu.org
  2024-03-28 10:05 ` [Bug rtl-optimization/114515] " rguenth at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-03-28 10:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

            Bug ID: 114515
           Summary: [14 Regression] Failure to use aarch64 lane forms
                    after PR101523
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

The following test regressed on aarch64 after
g:839bc42772ba7af66af3bd16efed4a69511312ae (the fix for PR101523):

typedef float v4sf __attribute__((vector_size(16)));
void f (v4sf *ptr, float f)
{
  ptr[0] = ptr[0] * (v4sf) { f, f, f, f };
  ptr[1] = ptr[1] * (v4sf) { f, f, f, f };
}

With -O2, we previously generated:

        ldp     q1, q31, [x0]
        fmul    v1.4s, v1.4s, v0.s[0]
        fmul    v31.4s, v31.4s, v0.s[0]
        stp     q1, q31, [x0]
        ret

Now we generate:

        ldp     q1, q31, [x0]
        dup     v0.4s, v0.s[0]
        fmul    v1.4s, v1.4s, v0.4s
        fmul    v31.4s, v31.4s, v0.4s
        stp     q1, q31, [x0]
        ret

with the extra dup.
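
Both sequences compute the same values; only the instruction count differs.
As a rough aid for anyone reproducing this, the test case can be wrapped in a
small self-checking harness so that the result is verified while the assembly
is inspected (for example with -O2 -S).  This is only a sketch: the function f
is taken from the report unchanged, while count_mismatches, run_self_check and
the input values are hypothetical additions that assume the GNU vector
extensions (GCC or Clang).

```c
#include <assert.h>

typedef float v4sf __attribute__((vector_size(16)));

/* The function from the report, unchanged.  */
void f (v4sf *ptr, float f)
{
  ptr[0] = ptr[0] * (v4sf) { f, f, f, f };
  ptr[1] = ptr[1] * (v4sf) { f, f, f, f };
}

/* Hypothetical helper (not part of the report): runs f on known inputs
   and returns the number of lanes that differ from a scalar multiply.
   The inputs are small integers, so the products are exact.  */
int count_mismatches (void)
{
  v4sf data[2] = { { 1.0f, 2.0f, 3.0f, 4.0f }, { 5.0f, 6.0f, 7.0f, 8.0f } };
  const float scale = 2.0f;
  int bad = 0;

  f (data, scale);
  for (int v = 0; v < 2; v++)
    for (int lane = 0; lane < 4; lane++)
      {
        float expected = (float) (v * 4 + lane + 1) * scale;
        if (data[v][lane] != expected)
          bad++;
      }
  return bad;
}

/* Hypothetical convenience wrapper: aborts if any lane is wrong.  */
void run_self_check (void)
{
  assert (count_mismatches () == 0);
}
```

Calling run_self_check from main should pass regardless of whether the
compiler picks the lane form or the dup form, since the regression is about
code quality rather than correctness.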

The patch is trying to avoid cases where i3 is canonicalised by contextual
information provided by i2.  But here we place a full copy of i2 into i3
(creating an instruction that is no more expensive).  This is a benefit in its
own right because the two instructions can then execute in parallel rather than
serially.  But it also means that, as here, we might be able to remove i2 with
later combinations.

Perhaps we could also check whether i3 still contains the destination of i2?
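
To make the suggested check concrete, here is a toy model in plain C.  It is
not GCC code and does not use combine's data structures: struct toy_insn, the
register numbers and toy_mentions_reg are all hypothetical stand-ins invented
for illustration.  It only shows the question being asked, namely whether the
rewritten i3 still reads the register that i2 sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an instruction: one destination register and up to
   three source registers.  Hypothetical; unrelated to GCC's rtx.  */
struct toy_insn
{
  int dest;
  int srcs[3];
  size_t n_srcs;
};

/* Hypothetical register numbers for the example in the report.  */
enum { REG_F = 0, REG_DATA = 1, REG_DUP = 2 };

/* i2: REG_DUP = dup (REG_F).  */
static const struct toy_insn toy_i2 = { REG_DUP, { REG_F }, 1 };

/* i3 before combination: REG_DATA = REG_DATA * REG_DUP.  */
static const struct toy_insn toy_old_i3 = { REG_DATA, { REG_DATA, REG_DUP }, 2 };

/* i3 after a full copy of i2 is placed into it:
   REG_DATA = REG_DATA * dup (REG_F), which reads REG_F directly.  */
static const struct toy_insn toy_new_i3 = { REG_DATA, { REG_DATA, REG_F }, 2 };

/* Return true if INSN reads register REG.  */
static bool toy_mentions_reg (const struct toy_insn *insn, int reg)
{
  for (size_t i = 0; i < insn->n_srcs; i++)
    if (insn->srcs[i] == reg)
      return true;
  return false;
}

/* Hypothetical self-check: the old i3 depends on i2's destination,
   the new i3 does not.  */
static void toy_self_check (void)
{
  assert (toy_mentions_reg (&toy_old_i3, toy_i2.dest));
  assert (!toy_mentions_reg (&toy_new_i3, toy_i2.dest));
}
```

In this model the new i3 no longer depends on i2, which is the situation
described above: the two instructions can run in parallel, and i2 becomes
removable once its other users are rewritten in the same way.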



Thread overview: 14+ messages
2024-03-28 10:01 [Bug rtl-optimization/114515] New: [14 Regression] Failure to use aarch64 lane forms after PR101523 rsandifo at gcc dot gnu.org
2024-03-28 10:05 ` [Bug rtl-optimization/114515] " rguenth at gcc dot gnu.org
2024-03-28 10:06 ` rguenth at gcc dot gnu.org
2024-03-28 10:09 ` segher at gcc dot gnu.org
2024-03-28 10:19 ` rsandifo at gcc dot gnu.org
2024-03-28 10:29 ` rsandifo at gcc dot gnu.org
2024-03-28 12:43 ` rsandifo at gcc dot gnu.org
2024-03-29 23:47 ` law at gcc dot gnu.org
2024-04-02  8:05 ` rguenth at gcc dot gnu.org
2024-04-02 18:42 ` rdapp at gcc dot gnu.org
2024-04-02 20:24 ` ewlu at rivosinc dot com
2024-04-02 20:45 ` law at gcc dot gnu.org
2024-04-03 15:20 ` tnfchris at gcc dot gnu.org
2024-04-10  6:01 ` [Bug rtl-optimization/114515] [15 " rguenth at gcc dot gnu.org
