public inbox for gcc-bugs@sourceware.org
* [Bug target/101846] New: Improve __builtin_shufflevector emitted code
@ 2021-08-10 13:14 jakub at gcc dot gnu.org
  2021-08-10 14:30 ` [Bug target/101846] " rguenth at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-08-10 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

            Bug ID: 101846
           Summary: Improve __builtin_shufflevector emitted code
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

typedef short v16hi __attribute__((vector_size (32)));
typedef short v32hi __attribute__((vector_size (64)));

v32hi
foo (v16hi x)
{
  return __builtin_shufflevector (x, (v16hi) {},
                                  0, 16, 1, 17, 2, 18, 3, 19,
                                  4, 20, 5, 21, 6, 22, 7, 23,
                                  8, 24, 9, 25, 10, 26, 11, 27,
                                  12, 28, 13, 29, 14, 30, 15, 31);
}

v16hi
bar (v32hi x)
{
  return __builtin_shufflevector (x, x, 0, 2, 4, 6, 8, 10, 12, 14,
                                  16, 18, 20, 22, 24, 26, 28, 30);
}

shows two cases where we should be emitting just
        vpmovzxwd       %ymm0, %zmm0
and
        vpmovdw %zmm0, %ymm0
but we actually emit
        vmovdqa %ymm0, %ymm0
        vpmovzxwd       %ymm0, %zmm0
where the vmovdqa is unnecessary - the permutation doesn't care about the
elements at or above 32 bytes - and
        vmovdqa64       %zmm0, %zmm1
        vmovdqa64       .LC0(%rip), %zmm0
        vpermi2w        %zmm1, %zmm1, %zmm0
The same applies to permutations matching other vpmovzx* or vpmov* instructions.
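
As an illustration of that last sentence, a minimal sketch of the analogous
byte-to-word case (the typedefs and function name below are made up for
illustration and are not part of the report):

typedef char v16qi __attribute__((vector_size (16)));
typedef char v32qi __attribute__((vector_size (32)));

/* Interleaving x with zero bytes is, on little-endian, the v16qi -> v16hi
   zero extension, so ideally this becomes a single vpmovzxbw.  */
v32qi
zext_bw (v16qi x)
{
  return __builtin_shufflevector (x, (v16qi) {},
                                  0, 16, 1, 17, 2, 18, 3, 19,
                                  4, 20, 5, 21, 6, 22, 7, 23,
                                  8, 24, 9, 25, 10, 26, 11, 27,
                                  12, 28, 13, 29, 14, 30, 15, 31);
}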


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
@ 2021-08-10 14:30 ` rguenth at gcc dot gnu.org
  2021-08-11  3:35 ` crazylht at gmail dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-08-10 14:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-08-10
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  I've pondered (in another context) how to represent "paradoxical
subregs" on GIMPLE.  We expand from

v32hi foo (v16hi x)
{
  vector(32) short int _1;
  v32hi _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = {x_2(D), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }};
  _3 = VEC_PERM_EXPR <_1, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 0, 32, 1, 33, 2, 34, 3, 35, 4,
36, 5, 37, 6, 38, 7, 39, 8, 40, 9, 41, 10, 42, 11, 43, 12, 44, 13, 45, 14, 46,
15, 47 }>;
  return _3;

and

v16hi bar (v32hi x)
{
  vector(32) short int _1;
  v16hi _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = VEC_PERM_EXPR <x_2(D), x_2(D), { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,
22, 24, 26, 28, 30, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31 }>;
  _3 = BIT_FIELD_REF <_1, 256, 0>;
  return _3;

I think bar() is reasonable from the GIMPLE side; moving the BIT_FIELD_REF
across the permute would be a 1:1 canonicalization choice (and something only
"profitable" for single-operand permutes).

For foo() I thought of doing

 _1 = BIT_INSERT_EXPR <tem_3(D), x_2(D), 0>;

with tem_3(D) being uninitialized, to represent a paradoxical subreg.
I've tested and discarded the idea of simply doing VIEW_CONVERT_EXPRs
here, but I'm considering it for the case where we need the lowpart
of a vector and the highpart doesn't matter (aka %xmm0 vs %ymm0),
since the current representation of doing a BIT_FIELD_REF doesn't
seem to optimize well (that was in the context of AVX512 mask registers
though).
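
A minimal sketch of that lowpart situation (the typedefs and function name are
made up for illustration, not taken from this report):

typedef short v16hi __attribute__((vector_size (32)));
typedef short v8hi __attribute__((vector_size (16)));

v8hi
lowpart_v8hi (v16hi x)
{
  /* Only the low 128 bits of x matter; ideally this is a no-op on the
     register, since %xmm0 is the lowpart of %ymm0.  */
  return __builtin_shufflevector (x, x, 0, 1, 2, 3, 4, 5, 6, 7);
}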

I suppose the testcases can be optimized on the RTL level as well.


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
  2021-08-10 14:30 ` [Bug target/101846] " rguenth at gcc dot gnu.org
@ 2021-08-11  3:35 ` crazylht at gmail dot com
  2021-08-12  1:53 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2021-08-11  3:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
For foo, the vmovdqa is avx_vec_concatv16si/2, and we can add a
define_insn_and_split to combine avx_vec_concatv16si/2 with
avx512f_zero_extendv16hiv16si2_1; similarly for the other modes covered by
pmovzx{bw,wd,dq}.

For bar, we need to match pmov{wb,dw,qd} in ix86_vectorize_vec_perm_const when
only one operand is used and the selector is a truncation index, just like we
did for pmovzx.

I'll take this.


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
  2021-08-10 14:30 ` [Bug target/101846] " rguenth at gcc dot gnu.org
  2021-08-11  3:35 ` crazylht at gmail dot com
@ 2021-08-12  1:53 ` crazylht at gmail dot com
  2021-08-12  1:55 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2021-08-12  1:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
expand_vec_perm_1 is supposed to generate a single instruction, but it doesn't
take the load of the const_vector into account.

(In reply to Hongtao.liu from comment #2)
> For foo, vmovdqa is avx_vec_concatv16si/2, and we can add
> define_insn_and_split to combine avx_vec_concatv16si/2 and
> avx512f_zero_extendv16hiv16si2_1, similar for other modes in
> pmovzx{bw,wd,dq}.
> 
> For bar, we need to match pmov{wb,dw,qd} in ix86_vectorize_vec_perm_const
> when only one operand is used and selector are truncate index, just like we
> did for pmovzx.
> 
> I'll take this.

For bar, when there's a real use of the upper bits, as in
v32hi
foo_dw_512 (v32hi x)
{
  return __builtin_shufflevector (x, x,
                                  0, 2, 4, 6, 8, 10, 12, 14,
                                  16, 18, 20, 22, 24, 26, 28, 30,
                                  16, 17, 18, 19, 20, 21, 22, 23,
                                  24, 25, 26, 27, 28, 29, 30, 31);
}

the vpmovdw version still seems better:

-       vmovdqa64       %zmm0, %zmm1
-       vmovdqa64       .LC0(%rip), %zmm0
-       vpermi2w        %zmm1, %zmm1, %zmm0
+       vpmovdw %zmm0, %ymm1
+       vinserti64x4    $0x0, %ymm1, %zmm0, %zmm0

The conclusion holds true for other 256/512-bit modes, but not for 128-bit
modes:

-       vpshufb .LC2(%rip), %xmm0, %xmm0
+       vpmovdw %xmm0, %xmm1
+       vmovq   %xmm1, %rax
+       vpinsrq $0, %rax, %xmm0, %xmm0
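
A guess at what a 128-bit testcase of that shape could look like (the source
below is hypothetical, not taken from the report):

typedef short v8hi __attribute__((vector_size (16)));

v8hi
foo_dw_128 (v8hi x)
{
  return __builtin_shufflevector (x, x, 0, 2, 4, 6, 4, 5, 6, 7);
}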


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-08-12  1:53 ` crazylht at gmail dot com
@ 2021-08-12  1:55 ` crazylht at gmail dot com
  2021-08-12  6:03 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2021-08-12  1:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
But when the upper bits are not used, the vpmovdw version seems better:

v4hi
bar_dw_128 (v8hi x)
{
  return __builtin_shufflevector (x, x, 0, 2, 4, 6);// 4, 5, 6, 7);
}

-       vpshufb .LC2(%rip), %xmm0, %xmm0
+       vpmovdw %xmm0, %xmm0


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-08-12  1:55 ` crazylht at gmail dot com
@ 2021-08-12  6:03 ` cvs-commit at gcc dot gnu.org
  2021-08-16  7:30 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-08-12  6:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:95e1eca43d106d821720744ac6ff1f5df41a1e78

commit r12-2869-g95e1eca43d106d821720744ac6ff1f5df41a1e78
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 11 14:00:00 2021 +0800

    Combine avx_vec_concatv16si and avx512f_zero_extendv16hiv16si2_1 to
avx512f_zero_extendv16hiv16si2_2.

    Add define_insn_and_split to combine avx_vec_concatv16si/2 and
    avx512f_zero_extendv16hiv16si2_1 since the latter already zero_extend
    the upper bits, similar for other patterns which are related to
    pmovzx{bw,wd,dq}.

    It will do optimization like

    -       vmovdqa %ymm0, %ymm0    # 7     [c=4 l=6]  avx_vec_concatv16si/2
            vpmovzxwd       %ymm0, %zmm0    # 22    [c=4 l=6] 
avx512f_zero_extendv16hiv16si2
            ret             # 25    [c=0 l=1]  simple_return_internal

    gcc/ChangeLog:

            PR target/101846
            * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_2): New
            post_reload define_insn_and_split.
            (*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
            (*sse4_1_zero_extendv8qiv8hi2_4): Ditto.
            (*avx512f_zero_extendv16hiv16si2_2): Ditto.
            (*avx2_zero_extendv8hiv8si2_2): Ditto.
            (*sse4_1_zero_extendv4hiv4si2_4): Ditto.
            (*avx512f_zero_extendv8siv8di2_2): Ditto.
            (*avx2_zero_extendv4siv4di2_2): Ditto.
            (*sse4_1_zero_extendv2siv2di2_4): Ditto.
            (VI248_256, VI248_512, VI148_512, VI148_256, VI148_128): New
            mode iterator.

    gcc/testsuite/ChangeLog:

            PR target/101846
            * gcc.target/i386/pr101846-1.c: New test.


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-08-12  6:03 ` cvs-commit at gcc dot gnu.org
@ 2021-08-16  7:30 ` cvs-commit at gcc dot gnu.org
  2021-12-15  1:50 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-08-16  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:faf2b6bc527dff31725dde5538ffff1c92688047

commit r12-2919-gfaf2b6bc527dff31725dde5538ffff1c92688047
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Aug 16 11:16:52 2021 +0800

    Optimize __builtin_shuffle_vector.

    1. Support vpermw/vpermb in ix86_expand_vec_one_operand_perm_avx512.
    2. Support 256/128-bits vpermi2b ix86_expand_vec_perm_vpermt2.
    3. Add define_insn_and_split to optimize specific vector permutation to
opmov{dw,wb,qd}.

    gcc/ChangeLog:

            PR target/101846
            * config/i386/i386-expand.c (ix86_expand_vec_perm_vpermt2):
            Support vpermi2b for V32QI/V16QImode.
            (ix86_extract_perm_from_pool_constant): New function.
            (ix86_expand_vec_one_operand_perm_avx512): Support
            vpermw/vpermb under TARGET_AVX512BW/TARGET_AVX512VBMI.
            (expand_vec_perm_1): Adjust comments for upper.
            * config/i386/i386-protos.h (ix86_extract_perm_from_pool_constant):
            New declare.
            * config/i386/predicates.md (permvar_truncate_operand): New
predicate.
            (pshufb_truncv4siv4hi_operand): Ditto.
            (pshufb_truncv8hiv8qi_operand): Ditto.
            * config/i386/sse.md (*avx512bw_permvar_truncv16siv16hi_1):
            New pre_reload define_insn_and_split.
            (*avx512f_permvar_truncv8siv8hi_1): Ditto.
            (*avx512f_vpermvar_truncv8div8si_1): Ditto.
            (*avx512f_permvar_truncv32hiv32qi_1): Ditto.
            (*avx512f_permvar_truncv16hiv16qi_1): Ditto.
            (*avx512f_permvar_truncv4div4si_1): Ditto.
            (*avx512f_pshufb_truncv8hiv8qi_1): Ditto.
            (*avx512f_pshufb_truncv4siv4hi_1): Ditto.
            (*avx512f_pshufd_truncv2div2si_1): Ditto.

    gcc/testsuite/ChangeLog:

            PR target/101846
            * gcc.target/i386/pr101846-2.c: New test.
            * gcc.target/i386/pr101846-3.c: New test.
            * gcc.target/i386/pr101846-4.c: New test.


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-08-16  7:30 ` cvs-commit at gcc dot gnu.org
@ 2021-12-15  1:50 ` pinskia at gcc dot gnu.org
  2021-12-15  1:52 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-15  1:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
With just -mavx512f we produce a bunch of instructions (looking like we went to
scalar mode) while LLVM is able to produce:
foo(short __vector(16)):                           # @foo(short __vector(16))
        .cfi_startproc
# %bb.0:
        vpmovzxwd       ymm1, xmm0              # ymm1 =
xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
        vextracti128    xmm0, ymm0, 1
        vpmovzxwd       ymm0, xmm0              # ymm0 =
xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
        vinserti64x4    zmm0, zmm1, ymm0, 1
        ret


bar(short __vector(32)):                           # @bar(short __vector(32))
        .cfi_startproc
# %bb.0:
        vpmovdw ymm0, zmm0
        ret


For -march=skylake-avx512 we now produce:
foo(short __vector(16)):
        vpmovzxwd       zmm0, ymm0
        ret
bar(short __vector(32)):
        vpmovdw ymm0, zmm0
        ret

So still confirmed for the -mavx512f case.


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-12-15  1:50 ` pinskia at gcc dot gnu.org
@ 2021-12-15  1:52 ` pinskia at gcc dot gnu.org
  2021-12-15  3:25 ` crazylht at gmail dot com
  2021-12-15 11:26 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-15  1:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
foo is effectively just a zero extend.


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-12-15  1:52 ` pinskia at gcc dot gnu.org
@ 2021-12-15  3:25 ` crazylht at gmail dot com
  2021-12-15 11:26 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2021-12-15  3:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Andrew Pinski from comment #7)
> With just -mavx512f we produce a bunch of instructions (looking like we went
> to scalar mode) while LLVM is able to produce:
> foo(short __vector(16)):                           # @foo(short __vector(16))
>         .cfi_startproc
> # %bb.0:
>         vpmovzxwd       ymm1, xmm0              # ymm1 =
> xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],
> zero,xmm0[6],zero,xmm0[7],zero
>         vextracti128    xmm0, ymm0, 1
>         vpmovzxwd       ymm0, xmm0              # ymm0 =
> xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],
> zero,xmm0[6],zero,xmm0[7],zero
>         vinserti64x4    zmm0, zmm1, ymm0, 1
>         ret
> 
> 
The zero_extend from ymm to zmm is supported under avx512bw; LLVM breaks it
into 2 zero extends from xmm to ymm and then packs them back into a zmm.


* [Bug target/101846] Improve __builtin_shufflevector emitted code
  2021-08-10 13:14 [Bug target/101846] New: Improve __builtin_shufflevector emitted code jakub at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-12-15  3:25 ` crazylht at gmail dot com
@ 2021-12-15 11:26 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-12-15 11:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
For bar, the problem is that while vpmovdw is AVX512F, we actually recognize it
only at combine time, as vpermw (with the exact permutation selected) combined
with lowpart extraction.  And vpermw is AVX512BW only.
In order to optimize it, we'd need to implement what LLVM actually has support
for, namely "I don't care" elements in the permutations.
So, instead of what we emit right now in GIMPLE:
  _1 = VEC_PERM_EXPR <x_2(D), x_2(D), { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,
22, 24, 26, 28, 30, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31 }>;
  _3 = BIT_FIELD_REF <_1, 256, 0>;
we'd need to emit
  _1 = VEC_PERM_EXPR <x_2(D), x_2(D), { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,
22, 24, 26, 28, 30, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY,
ANY, ANY, ANY, ANY }>;
(we'd need a special VEC_PERM_EXPR variant for that which would only accept
VECTOR_CSTs and reserve all ones for the "ANY" case in there).
And, the hard part, adjust the target const vec perm code to handle those
efficiently - as a wildcard for any other element of the vector or for
constant 0.  One part is the code which verifies the d->perm[?] values, which
would treat the wildcards as matching anything; but for a successful match we'd
actually need to compute which value is best based on the non-wildcard values
in the permutation.
Another part is the many places where we construct RTL and try to recog it;
we'd need some new RTL which would stand for a CONST_INT_WILDCARD comparing
equal to any int, but we'd also need some way for the pattern, if matched, to
tell us back which number it wants to use.

With that support, we could recognize the { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY,
ANY, ANY, ANY, ANY, ANY } V32HI permutation as matching the vpmovdw instruction
which puts 0s in the upper half of the vector.

The foo case is doable even without this, I think; the question is whether we
should try to split an arbitrary permutation of 64-byte vectors into
permutations of the two halves, merged together afterwards, when the
permutation allows that (the first half of the elements comes from the first
halves of the inputs and the second half of the elements comes from the second
halves of the inputs).
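
As a concrete illustration of that split-into-halves shape, a hand-written
sketch with intrinsics, mirroring the LLVM output in comment #7 and needing
only AVX512F (the function name is made up; this is not a proposed
implementation):

#include <immintrin.h>

/* foo() written by hand as two 128-bit -> 256-bit zero extends whose
   results are merged into a zmm - the shape LLVM emits with just -mavx512f.  */
__m512i
foo_split (__m256i x)
{
  __m256i lo = _mm256_cvtepu16_epi32 (_mm256_castsi256_si128 (x));
  __m256i hi = _mm256_cvtepu16_epi32 (_mm256_extracti128_si256 (x, 1));
  return _mm512_inserti64x4 (_mm512_castsi256_si512 (lo), hi, 1);
}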

