On Tue, Sep 1, 2020 at 6:11 PM Jakub Jelinek wrote: > > On Tue, Sep 01, 2020 at 05:55:18PM +0800, Hongtao Liu wrote: > > I tried define_split, but there's too many of them(considering usage > > of define_subst for mask). > > Also for new added instructions which support embedded broadcast, > > corresponding define_split needs to be added. > > One pass that could (sometimes) handle it is the rpad pass which is also > added right after the combiner like your pass. > So, couldn't you call replace_constant_pool_with_broadcast from both > your pass body loop and remove_partial_avx_dependency, the latter guarded > with if (TARGET_AVX512F), and change the new pass gate to be false when > the rpad pass gate is true? > Yes, it could save compile time when both this pass and rpad pass are available. Changed. > > +static void > > +replace_constant_pool_with_broadcast (rtx_insn* insn) > > Formatting, * should be after space, not before it. > Changed. > > + first = XVECEXP (constant, 0, 0); > > + /* There could be some rtx like > > + (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1"))) > > + but with "*.LC1" refer to V2DI constant vector. */ > > + if (GET_MODE (constant) != mode) > > + return; > > Is there a reason why don't you want to handle that? just avoid error caused by different element number. > You could do instead: > if (GET_MODE (constant) != mode) > { > constant = simplify_subreg (mode, constant, GET_MODE (constant), 0); > if (constant == NULL_RTX || GET_CODE (constant) != CONST_VECTOR) > return; > } > first = XVECEXP (constant, 0, 0); > Changed. > Other than that LGTM. > > Jakub > -- BR, Hongtao