On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw wrote:
> On 03/12/15 05:26, Bin.Cheng wrote:
>> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>> wrote:
>>> On 01/12/15 03:19, Bin.Cheng wrote:
>>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>>> wrote:
>>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>>>>>> have direct insn pattern describing the "x + y << z".  According to
>>>>>>>>>>> gcc internal:
>>>>>>>>>>>
>>>>>>>>>>> ‘addptrm3’
>>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>>> The expanded code is not allowed to clobber the condition code.  It
>>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>>
>>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>>> shouldn't need to define this pattern.
>>>>>>> Hi Richard,
>>>>>>> I think that rule has a prerequisite that the backend needs to support
>>>>>>> register-shifted addition in the addm3 pattern.
>>>>>>
>>>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>>>> take a shifted operand and never has.
>>>>>>
>>>>>>> Apparently for AArch64,
>>>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>>>>>> "does not set the condition codes" actually, because both
>>>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>>>
>>>>>> You appear to be confusing named patterns (used by expand) with
>>>>>> recognizers.  Anyway, we have
>>>>>>
>>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>>                               (match_operand:QI 2
>>>>>>                                "aarch64_shift_imm_<mode>" "n"))
>>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>>
>>>>>> Which is a non-flag-setting add with a shifted operand.
>>>>>>
>>>>>>> Either way I think it is another backend issue, so do you approve that
>>>>>>> I commit this patch now?
>>>>>>
>>>>>> Not yet.  I think there's something fundamental amiss here.
>>>>>>
>>>>>> BTW, it looks to me as though addptr3 should have exactly the same
>>>>>> operand rules as add3 (documentation reads "like add3"), so a
>>>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>>>> case then that should be clearly called out in the documentation.
>>>>>>
>>>>>> R.
>>>>>>
>>>>>
>>>>> PS.
>>>>>
>>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>>>
>>>>> (plus (shift (op, op)), op)
>>>>>
>>>>> not
>>>>>
>>>>> (plus (op, (shift (op, op)))
>>>>
>>>> Hi Richard,
>>>> Thanks for the comments.  I realized that the not-recognized insn
>>>> issue is because the original patch builds non-canonical expressions.
>>>> When reloading the address expression, LRA generates a non-canonical
>>>> register-scaled insn, which can't be recognized by the aarch64 backend.
>>>>
>>>> Here is the updated patch using the canonical form pattern; it passes
>>>> bootstrap and regression test.  Well, the ivo failure still exists,
>>>> but it is analyzed in the original message.
>>>>
>>>> Is this patch OK?
>>>>
>>>> As for Jiong's concern about the additional extension instruction, I
>>>> think this only stands for atomic load/store instructions.  For
>>>> general load/store, AArch64 supports zext/sext in the register-scaled
>>>> addressing mode, so the additional instruction can be forward propagated
>>>> into the memory reference.  The problem for atomic load/store is that
>>>> AArch64 only supports the direct register addressing mode.  After LRA
>>>> reloads the address expression out of the memory reference, there is no
>>>> combine/fwprop optimizer to merge instructions.  The problem is that
>>>> atomic_store's predicate doesn't match its constraint.  The predicate
>>>> used for atomic_store is memory_operand, while all other atomic patterns
>>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>>> this change, expand will not generate an addressing mode requiring
>>>> reload anymore.  I will test another patch fixing this.
>>>>
>>>> Thanks,
>>>> bin
>>>
>>> Some comments inline.
>>>
>>>>>
>>>>> R.
>>>>>
>>>>> aarch64_legitimize_addr-20151128.txt
>>>>>
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 3fe2f0f..5b3e3c4 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>>       We try to pick as large a range for the offset as possible to
>>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>>       we limit the range to 4k so that structures with different sized
>>>>> -     elements are likely to use the same base.  */
>>>>> +     elements are likely to use the same base.  We need to be careful
>>>>> +     not split CONST for some forms address expressions, otherwise it
>>>
>>> not to split a CONST for some forms of address expression,
>>>
>>>>> +     will generate sub-optimal code.  */
>>>>>
>>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>>      {
>>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>        HOST_WIDE_INT base_offset;
>>>>>
>>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>>> +        {
>>>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>>>> +
>>>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>>> +             Since the offset is within range supported by addressing
>>>>> +             mode "reg+offset", we don't split the const and legalize
>>>>> +             it into below insn and expr sequence:
>>>>> +               r3 = r1 + r2;
>>>>> +               "r3 + 0x3ffc".  */
>>>
>>> I think this comment would read better as
>>>
>>>   /* Address expressions of the form Ra + Rb + CONST.
>>>
>>>      If CONST is within the range supported by the addressing
>>>      mode "reg+offset", do not split CONST and use the
>>>      sequence
>>>        Rt = Ra + Rb
>>>        addr = Rt + CONST.  */
>>>
>>>>> +          if (REG_P (op0) && REG_P (op1))
>>>>> +            {
>>>>> +              machine_mode addr_mode = GET_MODE (x);
>>>>> +              rtx base = gen_reg_rtx (addr_mode);
>>>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>>>> +
>>>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>>> +                {
>>>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>>>> +                  return addr;
>>>>> +                }
>>>>> +            }
>>>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>>> +             Live above, we don't split the const and legalize it into
>>>>> +             below insn and expr sequence:
>>>
>>> Similarly.
>>>>> +               r3 = 0x3ffc;
>>>>> +               r4 = r1 + r3;
>>>>> +               "r4 + r2<<2".  */
>>>
>>> Why don't we generate
>>>
>>>   r3 = r1 + r2 << 2
>>>   r4 = r3 + 0x3ffc
>>>
>>> utilizing the shift-and-add instructions?
>>
>> All other comments are addressed in the attached new patch.
>> As for this question, Wilco also asked it on an internal channel before.
>> The main idea is to depend on GIMPLE IVO/SLSR to find CSE
>> opportunities for the scaled plus sub-expression.  The scaled index is
>> most likely a loop iv, so I would like to split the const plus out of
>> the memory reference so that it can be identified/hoisted as a loop
>> invariant.  This is more important when the base is sfp related.
>>
>
> Ah, yes.  The SFP problem.
>
> Since at least two people have queried this, it's clearly non-obvious
> enough to require explanation in a comment.
>
> OK with that change and a suitable changelog entry.

Given your review comments, I am going to apply the attached patch along
with the ChangeLog entry below.

Thanks,
bin

2015-12-04  Bin Cheng
            Jiong Wang

        * config/aarch64/aarch64.c (aarch64_legitimize_address): legitimize
        address expressions like Ra + Rb + CONST and Ra + Rb<<SCALE + CONST.
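
For reference, here is a rough sketch of what the second case discussed
above ("r1 + r2<<2 + 0x3ffc") might look like inside
aarch64_legitimize_address.  This is only an illustration, not the
committed patch: it assumes the op0, op1, offset and x variables from the
quoted hunk, follows the "r3 = 0x3ffc; r4 = r1 + r3" sequence from the
patch comment, and the exact guards and RTL shape in the real change may
differ.

      /* Sketch only: handle "Ra + Rb<<SCALE + CONST" by splitting CONST
         out of the memory reference so later passes can hoist it, keeping
         the scaled term inside the address.  */
      else if (REG_P (op0)
               && (GET_CODE (op1) == ASHIFT || GET_CODE (op1) == MULT))
        {
          machine_mode addr_mode = GET_MODE (x);
          rtx base = gen_reg_rtx (addr_mode);

          /* r3 = 0x3ffc;  r4 = r1 + r3;  */
          rtx cst = force_reg (addr_mode, gen_int_mode (offset, addr_mode));
          emit_insn (gen_adddi3 (base, op0, cst));

          /* Return "r4 + r2<<2".  The shifted operand is kept first so the
             PLUS stays in the canonical (plus (shift ...) reg) form noted
             earlier in the thread.  */
          return gen_rtx_PLUS (addr_mode, op1, base);
        }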