On 2021/6/12 04:16, Segher Boessenkool wrote:
> On Thu, Jun 10, 2021 at 03:11:08PM +0800, Xionghu Luo wrote:
>> On 2021/6/10 00:24, Segher Boessenkool wrote:
>>>>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>>>>    && !altivec_indexed_or_indirect_operand (operands[0], mode)"
>>>>   [(const_int 0)]
>>>> {
>>>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>>>>   rs6000_emit_le_vsx_permute (operands[0], operands[1], mode);
>>>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>>>>   DONE;
>>>> })
>>>
>>> So it seems like it is only 3 insns in the very unlucky case?  Normally
>>> it will end up as just one simple store?
>>
>> I am afraid there is not "simple store" for *TImode on P8LE*.  There is only
>> stxvd2x that rotates the element(stvx requires memory to be aligned, not
>> suitable pattern), so every vsx_le_perm_store_v1ti must be split to 3
>> instructions for alternative 0, it seems incorrect to force the cost to be 4.
>
> Often it could be done as just two insns though?  If the value stored is
> not used elsewhere?
>
> So we could make the first alternative cost 8 then as well, which will
> also work out for combine, right?
>
> Alternatively we could have what is now the second alternative be the
> first, if that is realistic -- that one already has cost 8 (it is just
> two machine instructions).

Attached is the patch to update the 5 *vsx_le_perm_store_ function costs
from 12 to 8.

-- 
Thanks,
Xionghu