From: "Bin.Cheng" <amker.cheng@gmail.com>
To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
Cc: James Greenhalgh <james.greenhalgh@arm.com>,
Bin Cheng <bin.cheng@arm.com>,
gcc-patches List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
Date: Fri, 04 Dec 2015 03:18:00 -0000
Message-ID: <CAHFci28yNaR3R-HYq+BrxvU00zrOEQubBrTJf+3bvu7jfuewgQ@mail.gmail.com>
In-Reply-To: <566018C7.4000107@foss.arm.com>
[-- Attachment #1: Type: text/plain, Size: 8588 bytes --]
On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> On 03/12/15 05:26, Bin.Cheng wrote:
>> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>> <Richard.Earnshaw@foss.arm.com> wrote:
>>> On 01/12/15 03:19, Bin.Cheng wrote:
>>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>>> <Richard.Earnshaw@foss.arm.com> wrote:
>>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>>>>> The aarch64 problem is that we don't define an addptr3 pattern,
>>>>>>>>>>> and we don't have a direct insn pattern describing "x + y << z".
>>>>>>>>>>> According to the GCC internals manual:
>>>>>>>>>>>
>>>>>>>>>>> ‘addptrm3’
>>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>>
>>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>>> shouldn't need to define this pattern.
>>>>>>> Hi Richard,
>>>>>>> I think that rule has a prerequisite: the backend needs to support
>>>>>>> register-shifted addition in its addm3 pattern.
>>>>>>
>>>>>> addm3 is a named pattern and its format is well defined. It does not
>>>>>> take a shifted operand and never has.
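>>>>>>
>>>>>> For reference, a minimal sketch of the named pattern's shape (purely
>>>>>> illustrative, not the actual aarch64.md text):
>>>>>>
>>>>>> (define_expand "adddi3"
>>>>>>   [(set (match_operand:DI 0 "register_operand" "")
>>>>>>         (plus:DI (match_operand:DI 1 "register_operand" "")
>>>>>>                  (match_operand:DI 2 "aarch64_plus_operand" "")))]
>>>>>>   ""
>>>>>>   "")
>>>>>>
>>>>>> There is no operand slot for a shift here.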
>>>>>>
>>>>>>> Apparently for AArch64, addm3 only supports "reg+reg" or "reg+imm".
>>>>>>> Also, it is not really true that it "does not set the condition
>>>>>>> codes": both "adds_shift_imm_*" and "adds_mul_imm_*" do set the
>>>>>>> condition flags.
>>>>>>
>>>>>> You appear to be confusing named patterns (used by expand) with
>>>>>> recognizers. Anyway, we have
>>>>>>
>>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>>
>>>>>> which is a non-flag-setting add with a shifted operand.
>>>>>>
>>>>>>> Either way I think it is another backend issue, so is it OK for me
>>>>>>> to commit this patch now?
>>>>>>
>>>>>> Not yet. I think there's something fundamental amiss here.
>>>>>>
>>>>>> BTW, it looks to me as though addptr<m>3 should have exactly the same
>>>>>> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
>>>>>> shifted operand shouldn't be supported there either. If that isn't the
>>>>>> case then that should be clearly called out in the documentation.
>>>>>>
>>>>>> R.
>>>>>>
>>>>>
>>>>> PS.
>>>>>
>>>>> I presume you are aware of the canonicalization rules for add? That
>>>>> is, for a shift-and-add operation, the shift operand must appear
>>>>> first, i.e.
>>>>>
>>>>> (plus (shift (op, op)), op)
>>>>>
>>>>> not
>>>>>
>>>>> (plus (op, (shift (op, op))))
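>>>>>
>>>>> In terms of building the RTL in C, a minimal sketch (ra, rb and the
>>>>> shift amount are hypothetical):
>>>>>
>>>>>   rtx shift = gen_rtx_ASHIFT (DImode, rb, GEN_INT (2));
>>>>>   rtx good  = gen_rtx_PLUS (DImode, shift, ra);  /* canonical */
>>>>>   rtx bad   = gen_rtx_PLUS (DImode, ra, shift);  /* non-canonical */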
>>>>
>>>> Hi Richard,
>>>> Thanks for the comments.  I realized that the not-recognized-insn
>>>> issue arises because the original patch built non-canonical
>>>> expressions: when reloading an address expression, LRA generates a
>>>> non-canonical register-scaled insn, which can't be recognized by the
>>>> aarch64 backend.
>>>>
>>>> Here is the updated patch using the canonical pattern form; it passes
>>>> bootstrap and regression testing.  The IVO failure still exists, but
>>>> it is analyzed in the original message.
>>>>
>>>> Is this patch OK?
>>>>
>>>> As for Jiong's concern about the additional extension instruction, I
>>>> think it only applies to atomic load/store instructions.  For general
>>>> loads/stores, AArch64 supports zext/sext in the register-scaled
>>>> addressing mode, so the additional instruction can be forward
>>>> propagated into the memory reference.  The problem for atomic
>>>> loads/stores is that AArch64 only supports the direct register
>>>> addressing mode, and after LRA reloads the address expression out of
>>>> the memory reference there is no combine/fwprop pass left to merge
>>>> the instructions back.  The root cause is that atomic_store's
>>>> predicate doesn't match its constraint: the predicate used for
>>>> atomic_store<mode> is memory_operand, while all other atomic patterns
>>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>>> that change, expand will no longer generate an addressing mode that
>>>> requires a reload.  I will test another patch fixing this.
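>>>>
>>>> Roughly the change I have in mind, sketched as a diff (the remaining
>>>> operands of the pattern in atomics.md are elided):
>>>>
>>>>  (define_insn "atomic_store<mode>"
>>>> -   [(set (match_operand:ALLI 0 "memory_operand" "=Q")
>>>> +   [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "=Q")
>>>>        ...)]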
>>>>
>>>> Thanks,
>>>> bin
>>>
>>> Some comments inline.
>>>
>>>>>
>>>>> R.
>>>>>
>>>>> aarch64_legitimize_addr-20151128.txt
>>>>>
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 3fe2f0f..5b3e3c4 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>>       We try to pick as large a range for the offset as possible to
>>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>>       we limit the range to 4k so that structures with different sized
>>>>> -     elements are likely to use the same base.  */
>>>>> +     elements are likely to use the same base.  We need to be careful
>>>>> +     not split CONST for some forms address expressions, otherwise it
>>>
>>> not to split a CONST for some forms of address expression,
>>>
>>>>> +     will generate sub-optimal code.  */
>>>>>
>>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>>      {
>>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>        HOST_WIDE_INT base_offset;
>>>>>
>>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>>> +        {
>>>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>>>> +
>>>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>>> +             Since the offset is within range supported by addressing
>>>>> +             mode "reg+offset", we don't split the const and legalize
>>>>> +             it into below insn and expr sequence:
>>>>> +               r3 = r1 + r2;
>>>>> +               "r3 + 0x3ffc".  */
>>>
>>> I think this comment would read better as
>>>
>>> /* Address expressions of the form Ra + Rb + CONST.
>>>
>>>    If CONST is within the range supported by the addressing
>>>    mode "reg+offset", do not split CONST and use the
>>>    sequence
>>>      Rt = Ra + Rb
>>>      addr = Rt + CONST.  */
>>>
>>>>> +          if (REG_P (op0) && REG_P (op1))
>>>>> +            {
>>>>> +              machine_mode addr_mode = GET_MODE (x);
>>>>> +              rtx base = gen_reg_rtx (addr_mode);
>>>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>>>> +
>>>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>>> +                {
>>>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>>>> +                  return addr;
>>>>> +                }
>>>>> +            }
>>>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>>> +             Live above, we don't split the const and legalize it into
>>>>> +             below insn and expr sequence:
>>>
>>> Similarly.
>>>>> +               r3 = 0x3ffc;
>>>>> +               r4 = r1 + r3;
>>>>> +               "r4 + r2<<2".  */
>>>
>>> Why don't we generate
>>>
>>> r3 = r1 + r2 << 2
>>> r4 = r3 + 0x3ffc
>>>
>>> utilizing the shift-and-add instructions?
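>>>
>>> That is (hypothetical registers), a single shift-and-add instruction
>>> for the first statement:
>>>
>>>   add  x3, x1, x2, lsl #2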
>>
>> All other comments are addressed in the attached new patch.
>> As for this question, Wilco also asked it on an internal channel
>> before.  The main idea is to rely on GIMPLE IVO/SLSR to find CSE
>> opportunities for the scaled-plus sub-expression.  The scaled index is
>> most likely a loop iv, so I would like to split the constant plus out
>> of the memory reference so that it can be identified and hoisted as a
>> loop invariant.  This is more important when the base is sfp related.
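>>
>> As a concrete illustration (hypothetical C; Rb holds i below):
>>
>>   /* Address of a[i].f is Ra + Rb<<SCALE + CONST.  */
>>   for (i = 0; i < n; i++)
>>     sum += a[i].f;
>>
>> Splitting CONST gives Rt = Ra + CONST, which is loop invariant and can
>> be hoisted, leaving only "Rt + Rb<<SCALE" inside the loop.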
>>
>
> Ah, yes. The SFP problem.
>
> Since at least two people have queried this, it's clearly non-obvious
> enough to require an explanation in a comment.
>
> OK with that change and a suitable changelog entry.
Given your review comments, I am going to apply the attached patch
along with the ChangeLog entry below.
Thanks,
bin
2015-12-04  Bin Cheng  <bin.cheng@arm.com>
	    Jiong Wang  <jiong.wang@arm.com>

	* config/aarch64/aarch64.c (aarch64_legitimize_address): Legitimize
	address expressions like Ra + Rb + CONST and Ra + Rb<<SCALE + CONST.
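For illustration, with hypothetical registers, the two cases now expand
to sequences like the following (0x3ffc is within the unsigned-offset
range of a 32-bit load, so the first case keeps CONST in the address):

  add  x3, x1, x2            // Rt = Ra + Rb
  ldr  w0, [x3, #0x3ffc]     // addr = Rt + CONST

  mov  x3, #0x3ffc           // Rc = CONST
  add  x4, x1, x3            // Rt = Ra + Rc
  ldr  w0, [x4, x2, lsl #2]  // addr = Rt + Rb<<SCALE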
[-- Attachment #2: aarch64_legitimize_addr-20151204.txt --]
[-- Type: text/plain, Size: 2943 bytes --]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..7cfdda1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4757,13 +4757,75 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
-     elements are likely to use the same base.  */
+     elements are likely to use the same base.  We need to be careful
+     not to split a CONST for some forms of address expression,
+     otherwise it will generate sub-optimal code.  */

   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;

+      if (GET_CODE (XEXP (x, 0)) == PLUS)
+        {
+          rtx op0 = XEXP (XEXP (x, 0), 0);
+          rtx op1 = XEXP (XEXP (x, 0), 1);
+
+          /* Address expressions of the form Ra + Rb + CONST.
+
+             If CONST is within the range supported by the addressing
+             mode "reg+offset", do not split CONST and use the
+             sequence
+               Rt = Ra + Rb;
+               addr = Rt + CONST.  */
+          if (REG_P (op0) && REG_P (op1))
+            {
+              machine_mode addr_mode = GET_MODE (x);
+              rtx base = gen_reg_rtx (addr_mode);
+              rtx addr = plus_constant (addr_mode, base, offset);
+
+              if (aarch64_legitimate_address_hook_p (mode, addr, false))
+                {
+                  emit_insn (gen_adddi3 (base, op0, op1));
+                  return addr;
+                }
+            }
+          /* Address expressions of the form Ra + Rb<<SCALE + CONST.
+
+             If Reg + Rb<<SCALE is a valid address expression, do not
+             split CONST and use the sequence
+               Rc = CONST;
+               Rt = Ra + Rc;
+               addr = Rt + Rb<<SCALE.
+
+             Here we split CONST out of the memory reference because:
+               a) We depend on GIMPLE optimizers to pick up common
+                  sub-expressions involving the scaling operation.
+               b) The index Rb is likely a loop iv; it's better to
+                  split the CONST so that the computation of the new
+                  base Rt is a loop invariant and can be moved out of
+                  the loop.  This is more important when the original
+                  base Ra is sfp related.  */
+          else if (REG_P (op0) || REG_P (op1))
+            {
+              machine_mode addr_mode = GET_MODE (x);
+              rtx base = gen_reg_rtx (addr_mode);
+
+              /* Swap to make sure that the register is in op0.  */
+              if (REG_P (op1))
+                std::swap (op0, op1);
+
+              rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
+
+              if (aarch64_legitimate_address_hook_p (mode, addr, false))
+                {
+                  base = force_operand (plus_constant (addr_mode,
+                                                       op0, offset),
+                                        NULL_RTX);
+                  return gen_rtx_PLUS (addr_mode, op1, base);
+                }
+            }
+        }
+
       /* Does it look like we'll need a load/store-pair operation?  */
       if (GET_MODE_SIZE (mode) > 16
           || mode == TImode)