public inbox for gcc-bugs@sourceware.org
* [Bug target/113860] New: SVE popcount can be used for 16bit, 32bit and 64bit
@ 2024-02-10  2:48 pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2024-02-10  2:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113860

            Bug ID: 113860
           Summary: SVE popcount can be used for 16bit, 32bit and 64bit
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
void f(unsigned long *  __restrict b, unsigned long * __restrict d)
{
    d[0]  = __builtin_popcountll(b[0]);
}
```

Currently with `-march=armv9-a`, GCC produces:
```
        ldr     d31, [x0]
        cnt     v31.8b, v31.8b
        addv    b31, v31.8b
        str     d31, [x1]
```
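The current sequence computes the 64-bit popcount in two steps: `cnt v31.8b` counts the set bits in each of the eight bytes, and `addv b31` adds those eight per-byte counts into a single byte. The following is a scalar C model of that data flow, offered only as a sketch; the `model_*` function names are hypothetical and merely mirror the instruction semantics.

```c
#include <stdint.h>

/* Hypothetical model of CNT Vd.8B: bit count of each of the 8 byte lanes. */
static void model_cnt_8b(uint64_t v, uint8_t lanes[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(v >> (8 * i));
        uint8_t count = 0;
        while (byte != 0) {
            count += byte & 1u;
            byte >>= 1;
        }
        lanes[i] = count;
    }
}

/* Hypothetical model of ADDV Bd, Vn.8B: horizontal sum of the 8 byte lanes.
   The result is at most 64, so it fits in the byte-sized destination. */
static uint8_t model_addv_8b(const uint8_t lanes[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += lanes[i];
    return (uint8_t)sum;
}

/* Models LDR/CNT/ADDV/STR for d[0] = __builtin_popcountll(b[0]). */
static uint64_t model_popcount64_advsimd(uint64_t v)
{
    uint8_t lanes[8];
    model_cnt_8b(v, lanes);
    return model_addv_8b(lanes);
}
```

The horizontal `addv` step exists only because Advanced SIMD `cnt` works on byte elements.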

But I think we could do:
```
        ptrue   p6.b, all
        ldr     d31, [x0]
        cnt     z31.d, p6/m, z31.d
        str     d31, [x1]
```
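A scalar C model of the proposed sequence, as a sketch. `ldr d31` writes the low 64 bits and zeroes the rest of z31. `ptrue p6.b, all` makes every lane active. The merging (`/m`) `cnt` then replaces each active 64-bit lane with its bit count, so lane 0 already holds the result that `str d31` stores and no horizontal reduction is needed. The 256-bit vector length (`MODEL_VL_D_LANES`) and all function names are assumptions for illustration; real SVE implementations may use any multiple of 128 bits up to 2048.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed vector length for illustration only: 256 bits = 4 x 64-bit lanes. */
#define MODEL_VL_D_LANES 4

static uint64_t model_popcount64(uint64_t x)
{
    uint64_t n = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical model of CNT Zd.D, Pg/M, Zn.D with Zd == Zn: active lanes get
   the popcount of their element; inactive lanes keep their old value. */
static void model_sve_cnt_d_merge(uint64_t z[MODEL_VL_D_LANES],
                                  const bool pg[MODEL_VL_D_LANES])
{
    for (int i = 0; i < MODEL_VL_D_LANES; i++)
        if (pg[i])
            z[i] = model_popcount64(z[i]);
}

/* Models PTRUE/LDR/CNT/STR: the count lands directly in lane 0. */
static uint64_t model_popcount64_sve(uint64_t v)
{
    uint64_t z31[MODEL_VL_D_LANES] = { v }; /* LDR d31: upper lanes are zero */
    bool p6[MODEL_VL_D_LANES];
    for (int i = 0; i < MODEL_VL_D_LANES; i++)
        p6[i] = true; /* PTRUE p6.b, ALL: every .d lane is active */
    model_sve_cnt_d_merge(z31, p6);
    return z31[0]; /* STR d31 stores lane 0 */
}
```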

This is better especially inside a loop that is not vectorized, since the `ptrue`
that sets p6 is loop-invariant and could be hoisted out of the loop. Or something
along those lines.
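A loop of the kind described might look like the following sketch. It assumes GCC 14's `#pragma GCC novector` to keep the loop scalar, and `popcount_loop` is a hypothetical name. With the proposed SVE sequence, only `ldr`/`cnt`/`str` would remain in the loop body, with the `ptrue` set up once before it.

```c
#include <stddef.h>

/* Hypothetical example: the vectorizer is told to leave this loop alone,
   so each iteration performs one scalar 64-bit popcount. */
void popcount_loop(const unsigned long *__restrict b,
                   unsigned long *__restrict d, size_t n)
{
#pragma GCC novector
    for (size_t i = 0; i < n; i++)
        d[i] = __builtin_popcountll(b[i]);
}
```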

Likewise for short (.h) and int (.s).
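For reference, the 16-bit and 32-bit counterparts of `f` could look like this; the names `f16` and `f32` are hypothetical. Under the proposal they would use the `.h` and `.s` forms of the SVE `cnt` instruction, mirroring the 64-bit case. That codegen is what the report suggests, not what GCC emits today.

```c
/* 16-bit: b[0] is zero-extended to unsigned int, so the result is the
   bit count of the 16-bit value. */
void f16(unsigned short *__restrict b, unsigned short *__restrict d)
{
    d[0] = __builtin_popcount(b[0]);
}

/* 32-bit: one popcount of a 32-bit unsigned int. */
void f32(unsigned int *__restrict b, unsigned int *__restrict d)
{
    d[0] = __builtin_popcount(b[0]);
}
```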

* [Bug target/113860] SVE popcount can be used for 16bit, 32bit and 64bit
@ 2024-02-10  3:35 pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2024-02-10  3:35 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113860

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The SVE `cnt` instruction can also be used for V4HI/V8HI/V2SI/V4SI (Advanced SIMD
`cnt` only handles byte elements), so that the SLP vectorizer can use it for those
modes.
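As an illustration of the SLP case, four independent popcounts on adjacent 32-bit elements are the kind of straight-line code the SLP vectorizer could group into a single V4SI (4 x 32-bit) operation. This is a sketch; `popcount4` is a hypothetical name, and whether GCC vectorizes it this way depends on the target support this comment asks for.

```c
/* Four independent, adjacent 32-bit popcounts: a candidate for one
   V4SI vector popcount via SLP vectorization. */
void popcount4(unsigned int *__restrict b, unsigned int *__restrict d)
{
    d[0] = __builtin_popcount(b[0]);
    d[1] = __builtin_popcount(b[1]);
    d[2] = __builtin_popcount(b[2]);
    d[3] = __builtin_popcount(b[3]);
}
```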
