Jeff Law writes: > On 04/29/2015 03:36 PM, Jiong Wang wrote: >> >> Jeff Law writes: >> >>> On 04/27/2015 02:21 PM, Jiong Wang wrote: >>> >>>> Jeff, >>>> >>>> Sorry, I can't understand the meaning of "overlap between t_low and low", >>>> assume "right" in "right value" means the opposite of "left" not >>>> "correct". >>>> >>>> So what you mean is t_low and low share the same pseudo regiser? >>> My concern is sharing the same pseudo or memory location. But thinking >>> more about it, the shifted value has to have range information, so it >>> must have been an SSA_NAME, right? If so, then it can't overlap with >>> the destination, so this is a non-issue. Sorry for the confusion. >> >> Thanks for the light. By looking at related code, looks like even it's >> SSA_NAME, it's still possible to share the same pseudo given the >> destination is in the same SSA map parition after ssa name coleascing? > If they're the same size and have non-overlapping lifetimes, then yes, > they could be the same pseudo. That ought to be easy to check. > Thankfully we don't have to worry about MEMs, which is a harder check. > >> OK. I will rework the patch, and I found there is a function named >> "expand_doubleword_shift" which looks like a more natural place to do >> this optimization, although it's hard to get range info there. I will do >> further explore on this. > Sounds good. > > jeff Jeff, Sorry for the slow response. I found we can't integrate this transformation into "expand_doubleword_shift" as it's at quite deep layer in the call stack, when we reach there, we have lost some high level info. So I am keeping this transformaion still in expr.c, and attachment is the updated version of this patch. The main changes are: * Figuring out whether the shift source is coming from sign extension by checking SSA_NAME_DEF_STMT instead of deducing from tree range info. I fell checking the gimple statement is more reliable and straigtforward. * For the pseudo register overlaping issue, given: RTX target = TREE source << TREE amount I was trying to make sure those two SSA_NAME won't get the same pseudo register by comparing their assigned partition using var_to_partition, but I can't get the associated tree representation for "target" which is RTX. Then I just expand "source" and compare the register number with "target". But I found the simplest way maybe just reorder low_part (target) = low_part (source) << amount high_part (target) = low_part (source) >> amount1 to: high_part (target) = low_part (source) >> amount1 low_part (target) = low_part (source) << amount then, even target and source share the same pseudo register, we can still get what we want, as we are operating on their subreg. * Added checking on the result value of expand_variable_shift, if it's not the same as "target" then call emit_move_insn to do that. How is the re-worked patch looks to you? x86 bootstrap OK, regression OK. aarch64 bootstrap OK. 2015-08-14 Jiong.Wang gcc/ * expr.c (expand_expr_real_2): Check tree format to optimize LSHIFT_EXPR expand. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Likewise. * gcc.target/aarch64/ashlti3_1.c: Likewise. * gcc.target/arm/ashldisi_1.c: Likewise. -- Regards, Jiong