> I don't think this pattern is correct, because SEL isn't commutative
> in the vector operands.

Indeed. I think I should first invert the PRED operand, or the comparison
operator that produces it.

> I think this should be:
>
>  if (...)
>    to = XEXP (to, 0);
>
> and should be before the REG_P test.  We don't want to treat
> arbitrary duplicates as profitable.

Agreed, that adjustment is more rigorous.

> It's not obvious that vec_duplicate is special enough that we should
> treat it differently from other unary operators.  For example,
> zero_extend and sign_extend don't seem fundamentally more expensive
> than vec_duplicate.

Juzhe and I discussed this offline recently. We also have widening vector
operations that need to be added; these can be handled in RTL by forwarding
instead of adding widening GIMPLE internal functions. We think we can add a
target hook, for example:

  `rtx try_forward (rtx dest, rtx src, rtx use_insn, rtx def_insn)`

If it returns NULL_RTX, the source cannot be forwarded; otherwise, the dest
part of use_insn is replaced with the returned rtx. Letting the backend
decide what can be forwarded has several advantages:

1. Target-specific insns, such as those using unspecs, can also be
   forwarded, and when forwarding, the relevant content can be extracted
   from def_insn instead of using the complete src.
2. By default the hook returns NULL_RTX, which reduces compatibility
   issues.

> It's a while since I looked at this code, but I assume that, even after
> this change, we will still require the new in-loop instruction to be
> no more expensive than the old in-loop instruction.  Is that right?

Yeah. Forwarding vec_duplicate may reduce the use of vector registers, but
it extends the live ranges of scalar registers. If scalar register pressure
is high, this change may end up more expensive. This decision does not feel
easy to make; is there some way to do it?

Best,
Lehua