From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
To: James Greenhalgh <james.greenhalgh@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Marcus Shawcroft <marcus.shawcroft@arm.com>,
Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset
Date: Mon, 23 Nov 2015 15:06:00 -0000
Message-ID: <56532A43.3000500@arm.com>
In-Reply-To: <20151123145800.GB14088@arm.com>
On 23/11/15 14:58, James Greenhalgh wrote:
> On Mon, Nov 23, 2015 at 10:33:01AM +0000, Kyrill Tkachov wrote:
>> On 12/11/15 12:05, James Greenhalgh wrote:
>>> On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote:
>>>> Hi all,
>>>>
>>>> Bootstrapped and tested on aarch64.
>>>>
>>>> Ok for trunk?
>>> Comments in-line.
>>>
>> Here's an updated patch according to your comments.
>> Sorry it took so long to respin it; I had other things to deal with while
>> stage1 was closing.
>>
>> I've indented the sample code sequences and used valid mnemonics.
>> These patterns can only match during combine, so I'd expect them to always
>> split during combine or immediately after, but I don't think that's a documented
>> guarantee so I've gated them on !reload_completed.
>>
>> I've used IN_RANGE in the predicate.md hunk and added scan-assembler checks
>> in the tests.
>>
>> Is this ok?
>>
>> Thanks,
>> Kyrill
>>
>> 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>> * config/aarch64/aarch64.md (*condjump): Rename to...
>> (condjump): ... This.
>> (*compare_condjump<mode>): New define_insn_and_split.
>> (*compare_cstore<mode>_insn): Likewise.
>> (*cstore<mode>_insn): Rename to...
>> (aarch64_cstore<mode>): ... This.
>> * config/aarch64/iterators.md (CMP): Handle ne code.
>> * config/aarch64/predicates.md (aarch64_imm24): New predicate.
>>
>> 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>> * gcc.target/aarch64/cmpimm_branch_1.c: New test.
>> * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.
>>
>> commit bb44feed4e6beaae25d9bdffa45073dc61c65838
>> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>> Date: Mon Sep 21 10:56:47 2015 +0100
>>
>> [AArch64] Improve comparison with complex immediates
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 11f6387..3e57d08 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -372,7 +372,7 @@ (define_expand "mod<mode>3"
>> }
>> )
>>
>> -(define_insn "*condjump"
>> +(define_insn "condjump"
>> [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>> [(match_operand 1 "cc_register" "") (const_int 0)])
>> (label_ref (match_operand 2 "" ""))
>> @@ -397,6 +397,41 @@ (define_insn "*condjump"
>> (const_int 1)))]
>> )
>>
>> +;; For a 24-bit immediate CST we can optimize the compare for equality
>> +;; and branch sequence from:
>> +;; mov x0, #imm1
>> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */
>> +;; cmp x1, x0
>> +;; b<ne,eq> .Label
>> +;; into the shorter:
>> +;; sub x0, x0, #(CST & 0xfff000)
>> +;; subs x0, x0, #(CST & 0x000fff)
> sub x0, x1, #(CST....) ?
>
> The transform doesn't make sense otherwise.
Doh, yes. The source should be x1 of course.
Kyrill
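
For anyone following along, here is a minimal C sketch of the arithmetic behind the corrected sequence. It is only a model, not code from the patch: the function names are made up for illustration, and the Z flag is modelled as a plain comparison with zero. For a 24-bit CST, subtracting (CST & 0xfff000) and then (CST & 0x000fff) from x1 leaves zero exactly when x1 == CST, so the flags set by the subs can feed the b<ne,eq> or cset directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "sub x0, x1, #hi; subs x0, x0, #lo".
   Returns the Z flag that the subs would set.  */
int
sub_subs_sets_z (uint64_t x1, uint64_t cst)
{
  uint64_t hi = cst & 0xfff000;  /* Immediate for the sub (12 bits, lsl 12).  */
  uint64_t lo = cst & 0x000fff;  /* Immediate for the subs (low 12 bits).  */
  uint64_t x0 = x1 - hi;         /* sub  x0, x1, #hi  */
  x0 -= lo;                      /* subs x0, x0, #lo  */
  return x0 == 0;                /* Z is set iff x1 == cst.  */
}

/* Hypothetical mirror of the range check in the aarch64_imm24 predicate.  */
int
fits_imm24 (uint64_t cst)
{
  return cst <= 0xffffff;
}

/* A few spot checks using the constant from the new tests.  */
void
check_examples (void)
{
  assert (fits_imm24 (0x123456));
  assert (sub_subs_sets_z (0x123456, 0x123456));
  assert (!sub_subs_sets_z (0x123457, 0x123456));
  assert (!sub_subs_sets_z (0, 0x123456));
}
```

Unsigned wrap-around does not affect the equality result, since x1 - hi - lo is zero modulo 2^64 only when x1 == hi + lo.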
>
>> +;; b<ne,eq> .Label
>> +(define_insn_and_split "*compare_condjump<mode>"
>> + [(set (pc) (if_then_else (EQL
>> + (match_operand:GPI 0 "register_operand" "r")
>> + (match_operand:GPI 1 "aarch64_imm24" "n"))
>> + (label_ref:P (match_operand 2 "" ""))
>> + (pc)))]
>> + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
>> + && !aarch64_plus_operand (operands[1], <MODE>mode)
>> + && !reload_completed"
>> + "#"
>> + "&& true"
>> + [(const_int 0)]
>> + {
>> + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
>> + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
>> + rtx tmp = gen_reg_rtx (<MODE>mode);
>> + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
>> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
>> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
>> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
>> + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
>> + DONE;
>> + }
>> +)
>> +
>> (define_expand "casesi"
>> [(match_operand:SI 0 "register_operand" "") ; Index
>> (match_operand:SI 1 "const_int_operand" "") ; Lower bound
>> @@ -2901,7 +2936,7 @@ (define_expand "cstore<mode>4"
>> "
>> )
>>
>> -(define_insn "*cstore<mode>_insn"
>> +(define_insn "aarch64_cstore<mode>"
>> [(set (match_operand:ALLI 0 "register_operand" "=r")
>> (match_operator:ALLI 1 "aarch64_comparison_operator"
>> [(match_operand 2 "cc_register" "") (const_int 0)]))]
>> @@ -2910,6 +2945,40 @@ (define_insn "*cstore<mode>_insn"
>> [(set_attr "type" "csel")]
>> )
>>
>> +;; For a 24-bit immediate CST we can optimize the compare for equality
>> +;; and branch sequence from:
>> +;; mov x0, #imm1
>> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */
>> +;; cmp x1, x0
>> +;; cset x2, <ne,eq>
>> +;; into the shorter:
>> +;; sub x0, x0, #(CST & 0xfff000)
>> +;; subs x0, x0, #(CST & 0x000fff)
>> +;; cset x1, <ne, eq>.
> Please fix the register allocation in your shorter sequence, these
> are not equivalent.
>
>> +(define_insn_and_split "*compare_cstore<mode>_insn"
>> + [(set (match_operand:GPI 0 "register_operand" "=r")
>> + (EQL:GPI (match_operand:GPI 1 "register_operand" "r")
>> + (match_operand:GPI 2 "aarch64_imm24" "n")))]
>> + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
>> + && !aarch64_plus_operand (operands[2], <MODE>mode)
>> + && !reload_completed"
>> + "#"
>> + "&& true"
>> + [(const_int 0)]
>> + {
>> + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
>> + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
>> + rtx tmp = gen_reg_rtx (<MODE>mode);
>> + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
>> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
>> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
>> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
>> + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg));
>> + DONE;
>> + }
>> + [(set_attr "type" "csel")]
>> +)
>> +
>> ;; zero_extend version of the above
>> (define_insn "*cstoresi_insn_uxtw"
>> [(set (match_operand:DI 0 "register_operand" "=r")
>> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
>> index c2eb7de..422bc87 100644
>> --- a/gcc/config/aarch64/iterators.md
>> +++ b/gcc/config/aarch64/iterators.md
>> @@ -824,7 +824,8 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
>> (ltu "1") (leu "1") (geu "2") (gtu "2")])
>>
>> (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
>> - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
>> + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU")
>> + (gtu "GTU")])
>>
>> (define_code_attr fix_trunc_optab [(fix "fix_trunc")
>> (unsigned_fix "fixuns_trunc")])
>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>> index e7f76e0..c0c3ff5 100644
>> --- a/gcc/config/aarch64/predicates.md
>> +++ b/gcc/config/aarch64/predicates.md
>> @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3"
>> (and (match_code "const_int")
>> (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
>>
>> +;; An immediate that fits into 24 bits.
>> +(define_predicate "aarch64_imm24"
>> + (and (match_code "const_int")
>> + (match_test "IN_RANGE (UINTVAL (op), 0, 0xffffff)")))
>> +
>> (define_predicate "aarch64_pwr_imm3"
>> (and (match_code "const_int")
>> (match_test "INTVAL (op) != 0
>> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
>> new file mode 100644
>> index 0000000..7ad736b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-save-temps -O2" } */
>> +
>> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */
>> +
>> +void g (void);
>> +void
>> +foo (int x)
>> +{
>> + if (x != 0x123456)
>> + g ();
>> +}
>> +
>> +void
>> +fool (long long x)
>> +{
>> + if (x != 0x123456)
>> + g ();
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
>> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
>> +/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */
>> +/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */
>> +/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */
>> +/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
>> new file mode 100644
>> index 0000000..6a03cc9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-save-temps -O2" } */
>> +
>> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */
>> +
>> +int
>> +foo (int x)
>> +{
>> + return x == 0x123456;
>> +}
>> +
>> +long
>> +fool (long x)
>> +{
>> + return x == 0x123456;
>> +}
>> +
> This test will be broken for ILP32. This should be long long.
>
> OK with those comments fixed.
Thanks, I'll prepare an updated version.
Kyrill
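
To spell out the ILP32 point: under ILP32 long is 32 bits wide, so fool (long) would be compiled with w registers and the scan-assembler checks for x registers could not match. Below is a sketch of how the 64-bit function might look with long long; this is my assumption of the shape of the fix, not the updated test itself, and the dg- directives are omitted.

```c
#include <limits.h>

/* 32-bit case: expected to use w registers under both LP64 and ILP32.  */
int
foo (int x)
{
  return x == 0x123456;
}

/* 64-bit case: long long is at least 64 bits under both ABIs,
   whereas long is only 32 bits under ILP32.  */
long long
fool (long long x)
{
  return x == 0x123456;
}

/* Width of long long in bits, for a quick sanity check.  */
int
long_long_bits (void)
{
  return (int) (sizeof (long long) * CHAR_BIT);
}
```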
> Thanks,
> James
>