[Bug target/109499] New: Unnecessary zeroing in SVE loops

public inbox for gcc-bugs@sourceware.org

* [Bug target/109499] New: Unnecessary zeroing in SVE loops
@ 2023-04-13 12:35 rsandifo at gcc dot gnu.org
  2023-04-13 13:06 ` [Bug target/109499] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-04-13 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109499

            Bug ID: 109499
           Summary: Unnecessary zeroing in SVE loops
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

The following two loops contain unnecessary zeroing operations:

// -march=armv8.2-a+sve -O2
void
f (int *__restrict x, int *__restrict y, int n)
{
  for (int i = 0; i < n; i++)
    x[i] = x[i] ? y[i] : 0;
}

void
g (int *__restrict x, int *__restrict y, int n)
{
  for (int i = 0; i < n; i++)
    x[i] = x[i] ? y[i] & 15 : 0;
}

Output:

f(int*, int*, int):
        cmp     w2, 0
        ble     .L1
        mov     x3, 0
        cntw    x4
        whilelo p0.s, wzr, w2
        mov     z1.b, #0
.L3:
        ld1w    z0.s, p0/z, [x0, x3, lsl 2]
        cmpne   p1.s, p0/z, z0.s, #0
        ld1w    z0.s, p1/z, [x1, x3, lsl 2]   // Sets inactive lanes to zero
        sel     z0.s, p1, z0.s, z1.s          // Not needed
        st1w    z0.s, p0, [x0, x3, lsl 2]
        add     x3, x3, x4
        whilelo p0.s, w3, w2
        b.any   .L3
.L1:
        ret
g(int*, int*, int):
        cmp     w2, 0
        ble     .L6
        mov     x3, 0
        cntw    x4
        whilelo p0.s, wzr, w2
        mov     z1.s, #15
.L8:
        ld1w    z0.s, p0/z, [x0, x3, lsl 2]
        cmpne   p1.s, p0/z, z0.s, #0
        ld1w    z0.s, p1/z, [x1, x3, lsl 2]   // Sets inactive lanes to zero
        movprfx z0.s, p1/z, z0.s              // Not needed
        and     z0.s, p1/m, z0.s, z1.s        // Could be AND (immediate)
        st1w    z0.s, p0, [x0, x3, lsl 2]
        add     x3, x3, x4
        whilelo p0.s, w3, w2
        b.any   .L8
.L6:
        ret

It would be good to model the fact that IFN_MASK_LOAD zeroes inactive lanes on
AArch64, so that this is exposed at the gimple level.  At the same time, we
probably don't want the behaviour of the ifn to depend on target hooks.  I'm not
sure what the best design is here.
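A minimal scalar model of one vector iteration of the first loop may help show why the SEL is redundant. It assumes a fixed vector length of 4 lanes (real SVE vector lengths are implementation-defined) and uses hypothetical helper names (`mask_load_zero`, `sel`, `f_step`) that only mimic the lane-wise behaviour of `ld1w ... p1/z` and `sel`; it is a sketch, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed fixed vector length, for illustration only. */
#define LANES 4
static_assert(LANES > 0, "need at least one lane");

/* Zeroing masked load (like LD1W with a /z predicate):
   active lanes read memory, inactive lanes become zero. */
static void mask_load_zero(int dst[LANES], const int *src,
                           const bool pred[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        dst[i] = pred[i] ? src[i] : 0;
}

/* Lane-wise select (like SEL): pred ? a : b. */
static void sel(int dst[LANES], const bool pred[LANES],
                const int a[LANES], const int b[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        dst[i] = pred[i] ? a[i] : b[i];
}

/* One full vector iteration of loop f (all lanes in range).
   With use_sel the steps mirror the current code generation;
   without it, the zeroing load's result is stored directly. */
static void f_step(int out[LANES], const int *x, const int *y, bool use_sel)
{
    bool p1[LANES];
    int loaded[LANES];
    const int zero[LANES] = {0};

    for (size_t i = 0; i < LANES; i++)
        p1[i] = x[i] != 0;             /* cmpne p1.s, p0/z, z0.s, #0 */

    mask_load_zero(loaded, y, p1);     /* ld1w z0.s, p1/z, [x1, ...] */

    if (use_sel)
        sel(out, p1, loaded, zero);    /* sel z0.s, p1, z0.s, z1.s */
    else
        memcpy(out, loaded, sizeof loaded);
}
```

Calling `f_step` with `use_sel` set to true and to false gives identical results for any inputs, because every lane the SEL would set to zero has already been zeroed by the load.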


* [Bug target/109499] Unnecessary zeroing in SVE loops
From: rguenth at gcc dot gnu.org @ 2023-04-13 13:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109499

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Is there not enough info to catch this on the RTL level with a peephole?
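To illustrate the kind of local pattern an RTL peephole could catch, here is a toy sketch over a made-up instruction list. It is not GCC's RTL or its `define_peephole2` machinery; the `insn` struct, the opcode names and `drop_redundant_sel` are all hypothetical. It deletes a select-against-zero that immediately follows a zeroing load under the same predicate into the same register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy opcodes, not GCC RTL. */
enum opcode {
    OP_LOAD_Z,    /* dest = pred ? mem : 0   (zeroing masked load) */
    OP_SEL_ZERO,  /* dest = pred ? src : 0   (select against zero) */
    OP_OTHER      /* anything else */
};

struct insn {
    enum opcode op;
    int dest;     /* destination vector register */
    int src;      /* source vector register (OP_SEL_ZERO only) */
    int pred;     /* governing predicate register */
    bool deleted;
};

/* Two-instruction window: a zeroing load followed immediately by a
   select against zero under the same predicate, reading and writing
   the load's destination, is redundant.  Adjacency guarantees that
   neither the register nor the predicate was redefined in between.
   Returns the number of instructions deleted. */
static size_t drop_redundant_sel(struct insn *insns, size_t n)
{
    size_t removed = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        const struct insn *ld = &insns[i];
        struct insn *sel = &insns[i + 1];
        if (ld->deleted || sel->deleted)
            continue;
        if (ld->op == OP_LOAD_Z && sel->op == OP_SEL_ZERO
            && sel->src == ld->dest && sel->dest == ld->dest
            && sel->pred == ld->pred) {
            sel->deleted = true;
            removed++;
        }
    }
    return removed;
}
```

A fixed window like this only sees directly adjacent instructions in a known order, so it cannot follow the information through longer chains of operations.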


* [Bug target/109499] Unnecessary zeroing in SVE loops
From: rsandifo at gcc dot gnu.org @ 2023-04-13 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109499

--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Is there not enough info to catch this on the RTL level with a peephole?
That works for simple cases like the first loop.  But I think we generally
want the full power of gimple to push the information down.  The second loop
is one example of that; more broadly, there could be a chain of operations
that naturally do the right thing for inactive lanes.
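A small sketch of the property that matters for such chains: if the zeroing load leaves inactive lanes at zero, and the composed lane-wise operations map zero to zero (as `y[i] & 15` does in the second loop), then inactive lanes are still zero at the end and no final zeroing (the MOVPRFX above) is needed. The helper names and the 4-lane width are assumptions for illustration, and the sketch only models unary lane-wise operations whose other operands are constants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed fixed vector length, for illustration only. */
#define LANES 4

typedef int (*lane_op)(int);

static int and15(int v) { return v & 15; }                  /* loop g */
static int shl2(int v) { return (int)((unsigned)v << 2); }   /* maps 0 to 0 */
static int add1(int v) { return v + 1; }                     /* does not */

/* True if the composition of ops[0..n-1] maps 0 to 0, in which case
   lanes that a zeroing load set to zero remain zero afterwards. */
static bool chain_preserves_zero(const lane_op *ops, size_t n)
{
    int v = 0;
    for (size_t i = 0; i < n; i++)
        v = ops[i](v);
    return v == 0;
}

/* Apply the chain to every lane, active or not, as unpredicated
   vector instructions would. */
static void apply_chain(int v[LANES], const lane_op *ops, size_t n)
{
    for (size_t l = 0; l < LANES; l++)
        for (size_t i = 0; i < n; i++)
            v[l] = ops[i](v[l]);
}
```

Checking the composition rather than each step is deliberately more permissive: a sequence such as "add 1, then subtract 1" also preserves zero even though its first step does not.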


* [Bug target/109499] Unnecessary zeroing in SVE loops
From: rguenther at suse dot de @ 2023-04-13 13:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109499

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 13 Apr 2023, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109499
> 
> --- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> > Is there not enough info to catch this on the RTL level with a peephole?
> That works for simple cases like the first loop.  But I think we generally
> want the full power of gimple to push the information down.  The second loop
> is one example of that; more broadly, there could be a chain of operations
> that naturally do the right thing for inactive lanes.

AVX512 masking allows both merge and zero modes, zeroing being the cheaper
(obviously).  I think "zero" is what all targets support, so we could
define GIMPLE that way: inactive lanes become zero.  That also makes it
less of a "partial definition", and "undefined" is probably best
avoided anyway.
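The two AVX-512 masking modes can be sketched lane by lane as follows (hypothetical helper names, 4-lane width assumed). The merge form reads the old destination, so its result is only partly defined by the load itself; the zero form depends only on memory and the mask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed fixed vector length, for illustration only. */
#define LANES 4

/* Merge masking: inactive lanes keep the destination's previous value,
   so the old contents of dst are an extra input. */
static void masked_load_merge(int dst[LANES], const int *mem,
                              const bool mask[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        if (mask[i])
            dst[i] = mem[i];
}

/* Zero masking: inactive lanes become zero; dst is write-only. */
static void masked_load_zero(int dst[LANES], const int *mem,
                             const bool mask[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        dst[i] = mask[i] ? mem[i] : 0;
}
```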


* [Bug target/109499] Unnecessary zeroing in SVE loops
From: rsandifo at gcc dot gnu.org @ 2023-04-13 14:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109499

--- Comment #4 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #3)
> AVX512 masking allows both merge and zero modes, zeroing being the cheaper
> (obviously).  I think "zero" is what all targets support, so we could
> define GIMPLE that way: inactive lanes become zero.  That also makes it
> less of a "partial definition", and "undefined" is probably best
> avoided anyway.
Thanks, sounds good to me.  If direct support for merging turns out
to be useful in future, maybe we could add the value of inactive lanes
as an extra parameter at that point.  That would be quite an invasive
change, but it would just be work.
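One way to picture the extra parameter suggested here is a masked load that takes the inactive-lane values as an explicit operand: zeroing is the special case of an all-zero operand, and merging is passing the old destination. The name `mask_load_else` and its signature are hypothetical and do not describe an existing GCC internal function; the 4-lane width is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed fixed vector length, for illustration only. */
#define LANES 4

/* Hypothetical masked load with an explicit value for inactive lanes.
   else_vals may be an all-zero vector (zeroing) or alias dst (merging);
   each lane reads else_vals[i] before writing dst[i], so aliasing is safe. */
static void mask_load_else(int dst[LANES], const int *mem,
                           const bool mask[LANES], const int else_vals[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        dst[i] = mask[i] ? mem[i] : else_vals[i];
}
```

With an all-zero operand this reduces to the zeroing semantics proposed above, so the default behaviour would not change.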


* [Bug target/109499] Unnecessary zeroing in SVE loops
From: pinskia at gcc dot gnu.org @ 2024-02-22 19:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109499

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-02-22
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

