[Bug target/112384] New: a non-constant vec dup should be improved

public inbox for gcc-bugs@sourceware.org

* [Bug target/112384] New: a non-constant vec dup should be improved
@ 2023-11-04 23:09 pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-11-04 23:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112384

            Bug ID: 112384
           Summary: a non-constant vec dup should be improved
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
#define vector __attribute__((vector_size(16)))

vector int f1(vector int t, int i)
{
  i&=3;
  vector int tt = {i, i, i, i};
  vector int r = __builtin_shuffle(t, tt);
  return r;
}

vector int f2(vector int t, int i)
{
  i&=3;
  i = t[i];
  vector int tt = {i, i, i, i};
  return tt;
}
```
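Semantically the two functions compute the same thing. `i &= 3` keeps the index in range, and the single-operand `__builtin_shuffle` selects `t[mask[k]]` for each lane (GCC takes mask elements modulo the element count), so both return `t[i & 3]` broadcast to all four lanes. A minimal portable scalar model of that, using plain arrays instead of GCC vector types; the helper names are hypothetical and this models the semantics only, not GCC's generated code:

```c
#include <assert.h>

/* Scalar model of f1: splat the masked index, then apply a
   single-operand shuffle (GCC takes mask elements modulo 4). */
static void model_f1(const int t[4], int i, int r[4])
{
    int tt[4];
    i &= 3;
    for (int k = 0; k < 4; k++)
        tt[k] = i;                      /* vector int tt = {i, i, i, i}; */
    for (int k = 0; k < 4; k++)
        r[k] = t[(unsigned)tt[k] % 4u]; /* __builtin_shuffle(t, tt)      */
}

/* Scalar model of f2: extract one element, then splat it. */
static void model_f2(const int t[4], int i, int r[4])
{
    i &= 3;
    int e = t[i];                       /* i = t[i];                     */
    for (int k = 0; k < 4; k++)
        r[k] = e;                       /* vector int tt = {i, i, i, i}; */
}

/* Both models must put t[i & 3] in every lane. */
static void check_f1_matches_f2(const int t[4], int i)
{
    int r1[4], r2[4];
    model_f1(t, i, r1);
    model_f2(t, i, r2);
    for (int k = 0; k < 4; k++) {
        assert(r1[k] == r2[k]);
        assert(r1[k] == t[i & 3]);
    }
}
```

This equivalence is what makes it legitimate to lower either function through the other's form.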

Both of these produce suboptimal code.

f1 has:
```
        dup     v31.4s, w0
...
        shl     v31.4s, v31.4s, 2
        tbl     v31.16b, {v31.16b}, v28.16b
        add     v31.16b, v31.16b, v29.16b
```
We could do better by combining the dup and the shl into a scalar shift followed by a single dup.

At the RTL level, combine reports:
Trying 11 -> 12:
   11: r98:V4SI=vec_duplicate(r92:SI)
      REG_DEAD r92:SI
   12: r101:V4SI=r98:V4SI<<const_vector
      REG_DEAD r98:V4SI
Failed to match this instruction:
(set (reg:V4SI 101)
    (ashift:V4SI (vec_duplicate:V4SI (reg/v:SI 92 [ iD.4390 ]))
        (const_vector:V4SI [
                (const_int 2 [0x2]) repeated x4
            ])))

Rewriting it as:
(set (reg:V4SI 101)
 (vec_duplicate:V4SI (ashift:SI (reg/v:SI 92 [ iD.4390 ]) (const_int 2 [0x2])))

would improve things.
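The rewrite is valid because shifting every lane of a splat by the same constant gives the same result as shifting the scalar once and splatting it, which needs only a scalar shift plus one dup. A plain-C lane model of the two RTL forms; the function names are hypothetical, and `uint32_t` is used so the shift is well defined for every input:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Form combine currently sees:
   (ashift:V4SI (vec_duplicate:V4SI x) (const_vector [c c c c])) */
static void dup_then_shift(uint32_t x, unsigned c, uint32_t out[LANES])
{
    for (int k = 0; k < LANES; k++)
        out[k] = x;              /* vec_duplicate                     */
    for (int k = 0; k < LANES; k++)
        out[k] <<= c;            /* per-lane shift by the same amount */
}

/* Proposed form: (vec_duplicate:V4SI (ashift:SI x c)) */
static void shift_then_dup(uint32_t x, unsigned c, uint32_t out[LANES])
{
    uint32_t s = x << c;         /* one scalar shift                  */
    for (int k = 0; k < LANES; k++)
        out[k] = s;              /* vec_duplicate                     */
}

/* The two forms agree for any x and any in-range shift count. */
static void check_shift_dup_commute(uint32_t x, unsigned c)
{
    uint32_t a[LANES], b[LANES];
    assert(c < 32);
    dup_then_shift(x, c, a);
    shift_then_dup(x, c, b);
    for (int k = 0; k < LANES; k++)
        assert(a[k] == b[k]);
}
```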

The first tbl also looks removable.
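One way to see why, assuming (the constants are elided in the listing above) that `v28` copies the low byte of each 32-bit lane into all four bytes of that lane and `v29` holds the byte offsets {0,1,2,3} repeated, as in the usual lowering of a variable 32-bit permute to a byte-table lookup. Every lane then holds the same small value `4 * (i & 3)`, so a scalar shift plus a byte-wise dup (e.g. `dup v31.16b, w0`) gives the same byte mask without the first tbl. A little-endian C model of both mask computations (hypothetical names; the constants are assumptions, not taken from GCC's output):

```c
#include <assert.h>
#include <stdint.h>

/* Current sequence, with ASSUMED v28/v29 contents: dup.4s, shl #2,
   tbl copying each lane's low byte to all 4 of its bytes, add offsets. */
static void mask_current(int i, uint8_t mask[16])
{
    uint32_t lanes[4];
    for (int k = 0; k < 4; k++)
        lanes[k] = (uint32_t)(i & 3) << 2;            /* dup + shl         */
    for (int b = 0; b < 16; b++) {
        uint8_t lo = (uint8_t)(lanes[b / 4] & 0xffu); /* tbl (assumed v28) */
        mask[b] = (uint8_t)(lo + (b % 4));            /* add (assumed v29) */
    }
}

/* Possible shorter sequence: scalar shift, byte-wise dup, add offsets. */
static void mask_without_tbl(int i, uint8_t mask[16])
{
    uint8_t s = (uint8_t)((i & 3) << 2);
    for (int b = 0; b < 16; b++)
        mask[b] = (uint8_t)(s + (b % 4));
}

/* Both masks select bytes 4*(i&3) .. 4*(i&3)+3, i.e. element i & 3,
   for every 32-bit lane of the final tbl. */
static void check_masks(int i)
{
    uint8_t a[16], m[16];
    mask_current(i, a);
    mask_without_tbl(i, m);
    for (int b = 0; b < 16; b++) {
        assert(a[b] == m[b]);
        assert(m[b] == 4 * (i & 3) + b % 4);
    }
}
```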


* [Bug target/112384] a non-constant vec dup should be improved
@ 2023-11-04 23:12 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-11-04 23:12 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112384

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Oh, f2 just goes through memory. It produces:
```
        and     x0, x0, 3
        str     q0, [sp]
        ldr     s0, [sp, x0, lsl 2]

        dup     v0.4s, v0.s[0]
```

Clang (LLVM), by contrast, produces:
```
        mov     x8, sp
        and     w9, w0, #0x3
        str     q0, [sp]
        orr     x8, x8, x9, lsl #2
        ld1r    { v0.4s }, [x8]
```

I don't know which is better; GCC's sequence may well be the faster one on
some microarchitectures.
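Clang's `orr x8, x8, x9, lsl #2` relies on `sp` being 16-byte aligned, as AAPCS64 requires: the byte offset `(i & 3) << 2` is at most 12 and touches only the low four bits, which are zero in the base, so OR and ADD yield the same address. GCC instead folds that offset into the `ldr s0, [sp, x0, lsl 2]` addressing mode. A small C check of the equivalence; the helper names and base values are hypothetical, chosen only to be 16-byte aligned, not real stack addresses:

```c
#include <assert.h>
#include <stdint.h>

/* GCC-style address: base + ((i & 3) << 2), folded into ldr. */
static uint64_t addr_add(uint64_t base, uint64_t i)
{
    return base + ((i & 3) << 2);
}

/* Clang-style address: base | ((i & 3) << 2), then ld1r from it. */
static uint64_t addr_orr(uint64_t base, uint64_t i)
{
    return base | ((i & 3) << 2);
}

/* With the low 4 bits of base clear and an offset <= 12,
   OR and ADD cannot differ (no carries are possible). */
static void check_addr(uint64_t base, uint64_t i)
{
    assert((base & 15u) == 0);
    assert(addr_add(base, i) == addr_orr(base, i));
}
```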


* [Bug target/112384] a non-constant vec dup should be improved
@ 2023-11-06  8:21 ` rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2023-11-06  8:21 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112384

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|aarch64                     |aarch64, x86_64-*-*
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-11-06

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  Note that for f2 the target needs to support .VEC_EXTRACT with a
variable index.

OTOH we fail to transform

  i_4 = VIEW_CONVERT_EXPR<int[4]>(t)[i_2];
  tt_5 = {i_4, i_4, i_4, i_4};

into

  tt_3 = {i_2, i_2, i_2, i_2};
  r_6 = VEC_PERM_EXPR <t_4(D), t_4(D), tt_3>;

but the complication is that 't' isn't in SSA form (which is also why
it goes through memory here).

On x86_64 with SSE4.1 we get

f1:
.LFB0:
        .cfi_startproc
        andl    $3, %edi
        movd    %edi, %xmm2
        pshufd  $0, %xmm2, %xmm1
        pslld   $2, %xmm1
        pshufb  .LC1(%rip), %xmm1
        paddb   .LC2(%rip), %xmm1
        pshufb  %xmm1, %xmm0
        ret

f2:
.LFB1:
        .cfi_startproc
        andl    $3, %edi
        movaps  %xmm0, -24(%rsp)
        movd    -24(%rsp,%rdi,4), %xmm1
        pshufd  $0, %xmm1, %xmm0
        ret

I suspect the memory case is actually faster.  With AVX512VL this
improves to

f1:
.LFB0:
        .cfi_startproc
        andl    $3, %edi
        vmovdqa %xmm0, %xmm1
        vpbroadcastd    %edi, %xmm0
        vpermi2d        %xmm1, %xmm1, %xmm0
        ret

f2:
.LFB1:
        .cfi_startproc
        andl    $3, %edi
        vmovdqa %xmm0, -24(%rsp)
        vpbroadcastd    -24(%rsp,%rdi,4), %xmm0
        ret

AVX2 produces the odd

f1:
.LFB0:
        .cfi_startproc
        andl    $3, %edi
        vinserti128     $1, %xmm0, %ymm0, %ymm0
        vmovd   %edi, %xmm2
        vpbroadcastd    %xmm2, %xmm1
        vinserti128     $1, %xmm1, %ymm1, %ymm1
        vpermd  %ymm0, %ymm1, %ymm0
        vzeroupper
        ret

where something feels wrong; f2 is similar to the AVX512 case.  It's not clear
whether the f1 IL is better in the end.
