On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw wrote:
> On 01/12/15 03:19, Bin.Cheng wrote:
>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw wrote:
>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>>> The aarch64 problem is that we don't define the addptrm3 pattern, and
>>>>>>>>> we don't have a direct insn pattern describing "x + y << z".
>>>>>>>>> According to the GCC internals manual:
>>>>>>>>>
>>>>>>>>> ‘addptrm3’
>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>> The expanded code is not allowed to clobber the condition code.  It
>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>
>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>> shouldn't need to define this pattern.
>>>>> Hi Richard,
>>>>> I think that rule has a prerequisite: the backend needs to support
>>>>> register-shifted addition in its addm3 pattern.
>>>>
>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>> take a shifted operand and never has.
>>>>
>>>>> Apparently for AArch64,
>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also, it is not really
>>>>> true that we "do not set the condition codes", because both
>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>
>>>> You appear to be confusing named patterns (used by expand) with
>>>> recognizers.  Anyway, we have
>>>>
>>>> (define_insn "*add_<shift>_<mode>"
>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>                               (match_operand:QI 2
>>>>                                 "aarch64_shift_imm_<mode>" "n"))
>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>
>>>> Which is a non-flag-setting add with a shifted operand.
>>>>
>>>>> Either way I think it is another backend issue, so do you approve that
>>>>> I commit this patch now?
>>>>
>>>> Not yet.  I think there's something fundamental amiss here.
>>>>
>>>> BTW, it looks to me as though addptrm3 should have exactly the same
>>>> operand rules as addm3 (the documentation reads "like addm3"), so a
>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>> case then that should be clearly called out in the documentation.
>>>>
>>>> R.
>>>>
>>>
>>> PS.
>>>
>>> I presume you are aware of the canonicalization rules for add?  That is,
>>> for a shift-and-add operation, the shift operand must appear first.  I.e.
>>>
>>> (plus (shift (op, op)), op)
>>>
>>> not
>>>
>>> (plus (op, (shift (op, op))))
>>
>> Hi Richard,
>> Thanks for the comments.  I realized that the not-recognized-insn
>> issue is because the original patch built non-canonical expressions.
>> When reloading an address expression, LRA generates a non-canonical
>> register-scaled insn, which can't be recognized by the aarch64 backend.
>>
>> Here is the updated patch using the canonical form of the pattern; it
>> passes bootstrap and regression testing.  Well, the ivo failure still
>> exists, but it was analyzed in the original message.
>>
>> Is this patch OK?
>>
>> As for Jiong's concern about the additional extension instruction, I
>> think this only applies to atomic load/store instructions.  For
>> general loads and stores, AArch64 supports zext/sext in the register
>> scaling addressing mode, so the additional instruction can be forward
>> propagated into the memory reference.  The problem for atomic
>> load/store is that AArch64 only supports the direct register
>> addressing mode there.  After LRA reloads the address expression out
>> of the memory reference, there is no combine/fwprop optimizer to merge
>> the instructions.
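To make the canonical form mentioned above concrete: the shift-and-add
address has to be built with the shifted operand as the first operand of
the PLUS, roughly as in the sketch below.  This is only an illustration
of the rule, not code from the attached patch, and the helper name is
invented:

  /* Illustration only: construct the canonical shift-and-add address
     (plus (ashift index shift) base).  The non-canonical form
     (plus base (ashift index shift)) is the one the aarch64 patterns
     do not recognize.  */
  static rtx
  make_canonical_scaled_address (machine_mode mode, rtx base, rtx index,
                                 HOST_WIDE_INT shift_amount)
  {
    rtx scaled = gen_rtx_ASHIFT (mode, index, GEN_INT (shift_amount));
    /* Shifted operand first, per the RTL canonicalization rules.  */
    return gen_rtx_PLUS (mode, scaled, base);
  }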
>> The problem is that atomic_store's predicate doesn't match its
>> constraint.  The predicate used for atomic_store is memory_operand,
>> while all other atomic patterns use aarch64_sync_memory_operand.  I
>> think this might be a typo.  With this change, expand will not
>> generate an addressing mode requiring a reload anymore.  I will test
>> another patch fixing this.
>>
>> Thanks,
>> bin
>
> Some comments inline.
>
>>>
>>> R.
>>>
>>> aarch64_legitimize_addr-20151128.txt
>>>
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 3fe2f0f..5b3e3c4 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>       We try to pick as large a range for the offset as possible to
>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>       we limit the range to 4k so that structures with different sized
>>> -     elements are likely to use the same base.  */
>>> +     elements are likely to use the same base.  We need to be careful
>>> +     not split CONST for some forms address expressions, otherwise it
>
> not to split a CONST for some forms of address expression,
>
>>> +     will generate sub-optimal code.  */
>>>
>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>      {
>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>        HOST_WIDE_INT base_offset;
>>>
>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>> +       {
>>> +         rtx op0 = XEXP (XEXP (x, 0), 0);
>>> +         rtx op1 = XEXP (XEXP (x, 0), 1);
>>> +
>>> +         /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>> +            Since the offset is within range supported by addressing
>>> +            mode "reg+offset", we don't split the const and legalize
>>> +            it into below insn and expr sequence:
>>> +              r3 = r1 + r2;
>>> +              "r3 + 0x3ffc".  */
>
> I think this comment would read better as
>
>   /* Address expressions of the form Ra + Rb + CONST.
>
>      If CONST is within the range supported by the addressing
>      mode "reg+offset", do not split CONST and use the
>      sequence
>        Rt = Ra + Rb
>        addr = Rt + CONST.  */
>
>>> +         if (REG_P (op0) && REG_P (op1))
>>> +           {
>>> +             machine_mode addr_mode = GET_MODE (x);
>>> +             rtx base = gen_reg_rtx (addr_mode);
>>> +             rtx addr = plus_constant (addr_mode, base, offset);
>>> +
>>> +             if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>> +               {
>>> +                 emit_insn (gen_adddi3 (base, op0, op1));
>>> +                 return addr;
>>> +               }
>>> +           }
>>> +         /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>> +            Live above, we don't split the const and legalize it into
>>> +            below insn and expr sequence:
>
> Similarly.
>
>>> +              r3 = 0x3ffc;
>>> +              r4 = r1 + r3;
>>> +              "r4 + r2<<2".  */
>
> Why don't we generate
>
>   r3 = r1 + r2 << 2
>   r4 = r3 + 0x3ffc
>
> utilizing the shift-and-add instructions?

All other comments are addressed in the attached new patch.  As for
this question, Wilco also asked about it on an internal channel before.
The main idea is to rely on GIMPLE IVO/SLSR to find CSE opportunities
for the scaled plus sub-expression.  The scaled index is most likely a
loop IV, so I would like to split the constant plus out of the memory
reference so that it can be identified/hoisted as a loop invariant.
This is more important when the base is sfp related.

Thanks,
bin
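P.S. To make the IVO/SLSR point concrete, here is a small made-up C
fragment (it is not from the patch or the test suite).  With the
constant split out of the memory reference, the "base + 0x3ffc" part is
loop invariant and can be hoisted, while the scaled index "i << 2"
stays inside the loop and is covered by the register scaled addressing
mode:

  /* Illustration only.  After splitting, "p = (int *) (base + 0x3ffc)"
     is computed once outside the loop, and the load inside the loop
     can use the "reg + reg, lsl #2" addressing mode for p + (i << 2).  */
  int
  sum (char *base, int n)
  {
    int s = 0;
    int *p = (int *) (base + 0x3ffc);   /* invariant part, hoisted */
    for (int i = 0; i < n; i++)
      s += p[i];                        /* scaled index part */
    return s;
  }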