public inbox for gcc-bugs@sourceware.org
* [Bug target/99195] New: Optimise away vec_concat of 64-bit AdvancedSIMD operations with zeroes in aarch64
@ 2021-02-22 10:27 ktkachov at gcc dot gnu.org
  2021-03-04 11:55 ` [Bug target/99195] " ktkachov at gcc dot gnu.org
                   ` (20 more replies)
  0 siblings, 21 replies; 22+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2021-02-22 10:27 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99195

            Bug ID: 99195
           Summary: Optimise away vec_concat of 64-bit AdvancedSIMD
                    operations with zeroes in aarch64
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Motivating testcases:
#include <arm_neon.h>

#define ONE(OT,IT,OP,S)                         \
OT                                              \
foo_##OP##_##S (IT a, IT b)                     \
{                                               \
  IT zeros = vcreate_##S (0);                   \
  return vcombine_##S (v##OP##_##S (a, b), zeros);      \
}


#define FUNC(T,IS,OS,OP,S) ONE (T##x##OS##_t, T##x##IS##_t, OP, S)

#define OPTWO(T,IS,OS,S,OP1,OP2)        \
FUNC (T, IS, OS, OP1, S)                \
FUNC (T, IS, OS, OP2, S)

#define OPTHREE(T, IS, OS, S, OP1, OP2, OP3)    \
FUNC (T, IS, OS, OP1, S)        \
OPTWO (T, IS, OS, S, OP2, OP3)

#define OPFOUR(T,IS,OS,S,OP1,OP2,OP3,OP4)       \
FUNC (T, IS, OS, OP1, S)                \
OPTHREE (T, IS, OS, S, OP2, OP3, OP4)

#define OPFIVE(T,IS,OS,S,OP1,OP2,OP3,OP4, OP5)  \
FUNC (T, IS, OS, OP1, S)                \
OPFOUR (T, IS, OS, S, OP2, OP3, OP4, OP5)

#define OPSIX(T,IS,OS,S,OP1,OP2,OP3,OP4,OP5,OP6)        \
FUNC (T, IS, OS, OP1, S)                \
OPFIVE (T, IS, OS, S, OP2, OP3, OP4, OP5, OP6)

OPSIX (int8, 8, 16, s8, add, sub, mul, and, orr, eor)
OPSIX (int16, 4, 8, s16, add, sub, mul, and, orr, eor)
OPSIX (int32, 2, 4, s32, add, sub, mul, and, orr, eor)
OPFIVE (int64, 1, 2, s64, add, sub, and, orr, eor)

OPSIX (uint8, 8, 16, u8, add, sub, mul, and, orr, eor)
OPSIX (uint16, 4, 8, u16, add, sub, mul, and, orr, eor)
OPSIX (uint32, 2, 4, u32, add, sub, mul, and, orr, eor)
OPFIVE (uint64, 1, 2, u64, add, sub, and, orr, eor)

For example, this generates:
foo_add_s8:
        add     v0.8b, v0.8b, v1.8b
        mov     v0.8b, v0.8b
        ret

The 64-bit V8QI ADD instruction implicitly zeroes out the top 64 bits of the
128-bit destination, so the vec_concat with zeroes can be represented easily.
However, we don't have such a pattern for all the AdvancedSIMD operations that
we support. Indeed, adding them by hand would bloat the MD files quite a bit.
Can we come up with a define_subst scheme to auto-generate the patterns to
match things like:
(set (reg:V16QI 93 [ <retval> ])
    (vec_concat:V16QI (plus:V8QI (reg:V8QI 98)
            (reg:V8QI 99))
        (const_vector:V8QI [
                (const_int 0 [0]) repeated x8
            ])))
?
Then we should be able to just generate:
foo_add_s8:
        add     v0.8b, v0.8b, v1.8b
        ret
etc.
The testcase above shows the problem for some of the simple binary ops, but
there are many more instructions that can benefit from this.
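
The redundancy can be illustrated with a small portable C model. This is only a sketch of the register semantics described above, not GCC code and not a use of arm_neon.h: the type vreg128 and the helpers model_add_8b, model_concat_zero and concat_is_redundant are hypothetical names invented for this illustration. It assumes a 128-bit register is viewed as 16 byte lanes and that a 64-bit ADD clears lanes 8-15 of its destination.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit AdvancedSIMD register as 16 byte lanes. */
typedef struct { uint8_t lane[16]; } vreg128;

/* Models "add vD.8b, vN.8b, vM.8b": lanes 0-7 receive the wrapped sums,
   and writing the 64-bit view clears lanes 8-15 of the destination.  */
static vreg128 model_add_8b(vreg128 n, vreg128 m)
{
    vreg128 d;
    memset(&d, 0, sizeof d);
    for (int i = 0; i < 8; i++)
        d.lane[i] = (uint8_t)(n.lane[i] + m.lane[i]);
    return d;
}

/* Models (vec_concat:V16QI lo (const_vector:V8QI 0)):
   keep the low 8 lanes and force the high 8 lanes to zero.  */
static vreg128 model_concat_zero(vreg128 lo)
{
    vreg128 d = lo;
    memset(d.lane + 8, 0, 8);
    return d;
}

/* Returns 1 when the concat with zeroes leaves the ADD result unchanged,
   i.e. when the extra "mov v0.8b, v0.8b" does no useful work.  */
static int concat_is_redundant(vreg128 a, vreg128 b)
{
    vreg128 sum = model_add_8b(a, b);
    vreg128 cat = model_concat_zero(sum);
    return memcmp(&sum, &cat, sizeof sum) == 0;
}
```

In this model concat_is_redundant holds for any inputs, because model_add_8b already clears the high lanes. That is the property a combined add-plus-vec_concat pattern would rely on.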


Thread overview: 22+ messages
2021-02-22 10:27 [Bug target/99195] New: Optimise away vec_concat of 64-bit AdvancedSIMD operations with zeroes in aarch64 ktkachov at gcc dot gnu.org
2021-03-04 11:55 ` [Bug target/99195] " ktkachov at gcc dot gnu.org
2021-03-07  2:05 ` pinskia at gcc dot gnu.org
2023-04-21 17:57 ` cvs-commit at gcc dot gnu.org
2023-04-23 13:41 ` cvs-commit at gcc dot gnu.org
2023-04-25 13:55 ` cvs-commit at gcc dot gnu.org
2023-04-28  8:34 ` cvs-commit at gcc dot gnu.org
2023-05-03 10:16 ` cvs-commit at gcc dot gnu.org
2023-05-03 10:18 ` cvs-commit at gcc dot gnu.org
2023-05-04  8:45 ` cvs-commit at gcc dot gnu.org
2023-05-04  8:45 ` cvs-commit at gcc dot gnu.org
2023-05-10  9:42 ` cvs-commit at gcc dot gnu.org
2023-05-10 10:51 ` cvs-commit at gcc dot gnu.org
2023-05-10 11:02 ` cvs-commit at gcc dot gnu.org
2023-05-15  8:50 ` cvs-commit at gcc dot gnu.org
2023-05-15  8:56 ` cvs-commit at gcc dot gnu.org
2023-05-24 13:53 ` cvs-commit at gcc dot gnu.org
2023-05-25 14:01 ` cvs-commit at gcc dot gnu.org
2023-05-31 16:45 ` cvs-commit at gcc dot gnu.org
2023-05-31 16:46 ` cvs-commit at gcc dot gnu.org
2024-02-27  8:38 ` pinskia at gcc dot gnu.org
2024-04-04  8:21 ` ktkachov at gcc dot gnu.org
