public inbox for gcc-bugs@sourceware.org
* [Bug target/112992] New: Inefficient vector initialization using vec_duplicate/broadcast
@ 2023-12-13  0:18 roger at nextmovesoftware dot com
From: roger at nextmovesoftware dot com @ 2023-12-13  0:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112992

            Bug ID: 112992
           Summary: Inefficient vector initialization using
                    vec_duplicate/broadcast
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: roger at nextmovesoftware dot com
  Target Milestone: ---

The following four functions should in theory all produce the same code:

typedef unsigned long long v4di __attribute((vector_size(32)));
typedef unsigned int v8si __attribute((vector_size(32)));
typedef unsigned short v16hi __attribute((vector_size(32)));
typedef unsigned char v32qi __attribute((vector_size(32)));

#define MASK  0x01010101
#define MASKL 0x0101010101010101ULL
#define MASKS 0x0101

v4di fooq() {
  return (v4di){MASKL,MASKL,MASKL,MASKL};
}

v8si food() {
  return (v8si){MASK,MASK,MASK,MASK,MASK,MASK,MASK,MASK};
}

v16hi foow() {
  return (v16hi){MASKS,MASKS,MASKS,MASKS,MASKS,MASKS,MASKS,MASKS,
                 MASKS,MASKS,MASKS,MASKS,MASKS,MASKS,MASKS,MASKS};
}

v32qi foob() {
  return (v32qi){1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
                 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
}
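
The expectation that all four functions should compile to the same code rests on the four constants having the same 32-byte image (every byte is 0x01). A minimal sketch of that check follows; the helper name same_bit_pattern is hypothetical and not part of the report, and the vectors are kept as locals so the check does not depend on -mavx or on the vector calling convention.

```c
#include <assert.h>
#include <string.h>

typedef unsigned long long v4di __attribute__((vector_size(32)));
typedef unsigned int v8si __attribute__((vector_size(32)));
typedef unsigned short v16hi __attribute__((vector_size(32)));
typedef unsigned char v32qi __attribute__((vector_size(32)));

#define MASK  0x01010101
#define MASKL 0x0101010101010101ULL
#define MASKS 0x0101

/* Hypothetical helper (not from the report): returns 1 if the four
   vector constants have identical 32-byte images, 0 otherwise.  */
int same_bit_pattern(void)
{
  v4di q = {MASKL, MASKL, MASKL, MASKL};
  v8si d = {MASK, MASK, MASK, MASK, MASK, MASK, MASK, MASK};
  v16hi w = {MASKS, MASKS, MASKS, MASKS, MASKS, MASKS, MASKS, MASKS,
             MASKS, MASKS, MASKS, MASKS, MASKS, MASKS, MASKS, MASKS};
  v32qi b = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};

  /* Compare the raw bytes of each vector against the byte vector.  */
  return memcmp(&q, &b, sizeof b) == 0
         && memcmp(&d, &b, sizeof b) == 0
         && memcmp(&w, &b, sizeof b) == 0;
}
```

Calling same_bit_pattern() from a test driver and asserting a nonzero result confirms that only the element type differs between the four initializers, not the value being materialized.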

On x86_64 with -mavx, we currently produce very different implementations:

fooq:
        movabs  rax, 72340172838076673
        push    rbp
        mov     rbp, rsp
        and     rsp, -32
        mov     QWORD PTR [rsp-8], rax
        vbroadcastsd    ymm0, QWORD PTR [rsp-8]
        leave
        ret
food:
        vbroadcastss    ymm0, DWORD PTR .LC2[rip]
        ret
foow:
        vmovdqa ymm0, YMMWORD PTR .LC3[rip]
        ret
foob:
        vmovdqa ymm0, YMMWORD PTR .LC4[rip]
        ret
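
For readers matching the assembly to the source: the decimal immediate in fooq's movabs is MASKL written in base 10. A small sketch that checks this at compile time (the function name is hypothetical and exists only so the result can be inspected at run time):

```c
#include <assert.h>

#define MASKL 0x0101010101010101ULL

/* The immediate loaded by "movabs rax, 72340172838076673" in fooq.  */
#define FOOQ_IMMEDIATE 72340172838076673ULL

/* C11 compile-time check that the two spellings are the same value.  */
_Static_assert(FOOQ_IMMEDIATE == MASKL,
               "movabs immediate should equal MASKL");

/* Hypothetical run-time wrapper around the same comparison.  */
int fooq_immediate_matches_maskl(void)
{
  return FOOQ_IMMEDIATE == MASKL;
}
```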

clang currently produces a vbroadcastss for all four.
I discovered that some of my "day job" code used the "fooq" idiom, which
requires a stack frame, plus both a write to and a read from memory [for a
compile-time constant].

I suspect the fix is to add a define_insn_and_split or two to i386/sse.md, and
perhaps something can be done in expand, but I'm confused as to why LRA/reload
spills the DImode component of V4DI to the stack frame, yet places the SImode
component of V8SI in the constant pool.

This is related (distantly) to PRs 100865 and 106060, but is potentially target
independent, and seems to be going wrong in LRA/reload's REG_EQUIV elimination.
Thoughts?  Apologies if this is a dup.  I'm happy to work up a patch if someone
could advise on where best this should be fixed.  Perhaps RTL's vec_duplicate
could be canonicalized to the most appropriate vector mode?


Thread overview: 12+ messages
2023-12-13  0:18 [Bug target/112992] New: Inefficient vector initialization using vec_duplicate/broadcast roger at nextmovesoftware dot com
2023-12-13  0:22 ` [Bug target/112992] " pinskia at gcc dot gnu.org
2023-12-13  0:28 ` pinskia at gcc dot gnu.org
2023-12-13  1:25 ` liuhongt at gcc dot gnu.org
2023-12-13  1:26 ` liuhongt at gcc dot gnu.org
2023-12-13  2:44 ` liuhongt at gcc dot gnu.org
2023-12-13  2:46 ` liuhongt at gcc dot gnu.org
2023-12-13  7:42 ` liuhongt at gcc dot gnu.org
2023-12-14  8:41 ` cvs-commit at gcc dot gnu.org
2024-01-09  8:33 ` cvs-commit at gcc dot gnu.org
2024-01-14 11:51 ` roger at nextmovesoftware dot com
2024-05-07  6:19 ` cvs-commit at gcc dot gnu.org
