ping^3
Rebase patch on latest trunk.

On Tue, Oct 27, 2020 at 3:51 PM Hongtao Liu wrote:
>
> ping^1
>
> On Tue, Oct 20, 2020 at 3:36 PM Richard Biener wrote:
> >
> > On Tue, Oct 20, 2020 at 4:35 AM Hongtao Liu wrote:
> > >
> > > On Mon, Oct 19, 2020 at 5:55 PM Richard Biener wrote:
> > > >
> > > > On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu wrote:
> > > > >
> > > > > On Mon, Oct 19, 2020 at 5:07 PM Richard Biener wrote:
> > > > > >
> > > > > > On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu wrote:
> > > > > > >
> > > > > > > Hi:
> > > > > > >   It's implemented as below:
> > > > > > > V setg (V v, int idx, T val)
> > > > > > > {
> > > > > > >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> > > > > > >   V valv = (V){val, val, val, val, val, val, val, val};
> > > > > > >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> > > > > > >   v = (v & ~mask) | (valv & mask);
> > > > > > >   return v;
> > > > > > > }
> > > > > > >
> > > > > > > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > Hmm, I guess you're trying to keep the code for !AVX512BW simple
> > > > > > but isn't just splitting the compare into
> > > > > >
> > > > > > clow = {0, 1, 2, 3 ... } == idxv
> > > > > > chigh = {16, 17, 18, ... } == idxv;
> > > > > > cmp = {clow, chigh}
> > > > > >
> > > > >
> > > > > We also don't have 512-bit byte/word blend instructions without
> > > > > TARGET_AVX512BW, so how would we use a 512-bit cmp?
> > > >
> > > > Oh, I see. Guess two back-to-back vpternlog could emulate
> > >
> > > Yes, we can have something like vpternlogd %zmm0, %zmm1, %zmm2, 0xD8,
> > > but since we don't have a 512-bit byte/word broadcast instruction,
> > > it would need 2 broadcasts and 1 vec_concat to get one 512-bit vector.
> > > It wouldn't save many instructions compared to my version (as below).
> > >
> > > ---
> > >         leal    -16(%rsi), %eax
> > >         vmovd   %edi, %xmm2
> > >         vmovdqa .LC0(%rip), %ymm4
> > >         vextracti64x4   $0x1, %zmm0, %ymm3
> > >         vmovd   %eax, %xmm1
> > >         vpbroadcastw    %xmm2, %ymm2
> > >         vpbroadcastw    %xmm1, %ymm1
> > >         vpcmpeqw        %ymm4, %ymm1, %ymm1
> > >         vpblendvb       %ymm1, %ymm2, %ymm3, %ymm3
> > >         vmovd   %esi, %xmm1
> > >         vpbroadcastw    %xmm1, %ymm1
> > >         vpcmpeqw        %ymm4, %ymm1, %ymm1
> > >         vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0
> > >         vinserti64x4    $0x1, %ymm3, %zmm0, %zmm0
> > > ---
> > >
> > > > the blend? Not sure if important - I recall only knl didn't have bw?
> > > >
> > >
> > > Yes, from SKX onwards, all AVX-512 targets support AVX512BW.
> > > And I don't think performance for V32HI/V64QI without AVX512BW is important.
> >
> > True.
> >
> > I have no further comments on the patch then - it still needs i386 maintainer
> > approval though.
> >
> > Thanks,
> > Richard.
> >
> > > > >
> > > > > cut from i386-expand.c:
> > > > > in ix86_expand_sse_movcc
> > > > > 3682     case E_V64QImode:
> > > > > 3683       gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
> > > > > 3684       break;
> > > > > 3685     case E_V32HImode:
> > > > > 3686       gen = gen_avx512bw_blendmv32hi; ---> TARGET_AVX512BW needed
> > > > > 3687       break;
> > > > > 3688     case E_V16SImode:
> > > > > 3689       gen = gen_avx512f_blendmv16si;
> > > > > 3690       break;
> > > > > 3691     case E_V8DImode:
> > > > > 3692       gen = gen_avx512f_blendmv8di;
> > > > > 3693       break;
> > > > > 3694     case E_V8DFmode:
> > > > >
> > > > > > faster, smaller and eventually even easier during expansion?
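For readers following the thread, the compare-and-blend idea behind the "setg" pseudocode can be tried out directly with GCC's generic vector extension. This is a minimal sketch, not the patch itself: the type name v8hi and the helpers vec_set_var and matches_scalar_set are hypothetical. The real ix86_expand_vector_set_var works on RTL during expansion instead of C source.

```c
#include <assert.h>

/* Hypothetical type name: 8 x 16-bit lanes via GCC's vector extension.  */
typedef short v8hi __attribute__ ((vector_size (16)));

/* Insert VAL at lane IDX of V, where IDX need not be a compile-time
   constant.  Mirrors the "setg" pseudocode: broadcast IDX and VAL, compare
   the broadcast index against the lane numbers to get an all-ones mask in
   the selected lane (zeros elsewhere), then do a bitwise blend.  */
v8hi
vec_set_var (v8hi v, int idx, short val)
{
  v8hi idxv = (v8hi) {idx, idx, idx, idx, idx, idx, idx, idx};
  v8hi valv = (v8hi) {val, val, val, val, val, val, val, val};
  v8hi lanes = (v8hi) {0, 1, 2, 3, 4, 5, 6, 7};
  v8hi mask = (lanes == idxv);          /* -1 where lane == idx, else 0 */
  return (v & ~mask) | (valv & mask);   /* keep V outside the mask, VAL inside */
}

/* Hypothetical scalar reference for checking the vector version:
   returns 1 if GOT equals V with lane IDX replaced by VAL, else 0.  */
int
matches_scalar_set (v8hi got, v8hi v, int idx, short val)
{
  for (int i = 0; i < 8; i++)
    if (got[i] != (i == idx ? val : v[i]))
      return 0;
  return 1;
}
```

Because every lane is compared against the broadcast index, an index outside 0..7 matches no lane, so the sketch returns v unchanged. The lane-number constant vector is the piece that the split-compare suggestion above would divide into a low and a high half.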
> > > > > >
> > > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> > > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> > > > > > +                                                 idxv, idx_tmp));
> > > > > >
> > > > > > side effects in gcc_assert are considered bad style; use
> > > > > >
> > > > > >   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
> > > > > >   gcc_assert (ok);
> > > > > >
> > > > > > +  vec[5] = constv;
> > > > > > +  ix86_expand_int_vcond (vec);
> > > > > >
> > > > > > this also returns a bool that you should probably assert is true.
> > > > > >
> > > > >
> > > > > Yes, will change.
> > > > >
> > > > > >
> > > > > > Otherwise thanks for tackling this.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >         PR target/97194
> > > > > > >         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > > >         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > > >         * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > >         true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > > >         * config/i386/sse.md (vec_set): Support both constant
> > > > > > >         and variable index vec_set.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >         * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > >         * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > >         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > >         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > >         * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > >         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > > > >
> > > > > > > --
> > > > > > > BR,
> > > > > > > Hongtao
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > > --
> > > BR,
> > > Hongtao
>
> --
> BR,
> Hongtao

--
BR,
Hongtao
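Richard's point about side effects inside gcc_assert can be illustrated in plain C. The sketch below uses the standard assert macro as a stand-in for gcc_assert. That substitution is an assumption for illustration: standard assert does not evaluate its argument at all when NDEBUG is defined, so a call placed inside it disappears together with the check. The names expand_duplicate, broadcast_checked and expand_calls are hypothetical placeholders for a call such as ix86_expand_vector_init_duplicate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter recording how often the "expander" really ran.  */
int expand_calls = 0;

/* Hypothetical stand-in for a call like ix86_expand_vector_init_duplicate:
   it does real work (the side effect) and reports whether it succeeded.  */
bool
expand_duplicate (int *out, int val)
{
  expand_calls++;
  *out = val;
  return true;
}

/* Discouraged form: if the assertion macro does not evaluate its argument
   (standard assert under NDEBUG), the expansion vanishes with the check:

     assert (expand_duplicate (&dst, val));

   Form suggested in the review: make the call unconditionally, keep the
   result, and assert on the saved value.  */
int
broadcast_checked (int val)
{
  int dst = 0;
  bool ok = expand_duplicate (&dst, val);
  assert (ok);
  (void) ok;   /* avoid an unused-variable warning when NDEBUG is defined */
  return dst;
}
```

With NDEBUG defined, broadcast_checked still performs the call and expand_calls still increments; only the check itself is dropped.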