From: James Greenhalgh <james.greenhalgh@arm.com>
To: Kyrill Tkachov <kyrylo.tkachov@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Marcus Shawcroft <marcus.shawcroft@arm.com>,
Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset
Date: Thu, 12 Nov 2015 12:05:00 -0000
Message-ID: <20151112120543.GA22716@arm.com>
In-Reply-To: <5638D61C.5060100@arm.com>
On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote:
> Hi all,
>
> Bootstrapped and tested on aarch64.
>
> Ok for trunk?
Comments in-line.
>
> Thanks,
> Kyrill
>
>
> 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64.md (*condjump): Rename to...
> (condjump): ... This.
> (*compare_condjump<mode>): New define_insn_and_split.
> (*compare_cstore<mode>_insn): Likewise.
> (*cstore<mode>_insn): Rename to...
> (aarch64_cstore<mode>): ... This.
> * config/aarch64/iterators.md (CMP): Handle ne code.
> * config/aarch64/predicates.md (aarch64_imm24): New predicate.
>
> 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/cmpimm_branch_1.c: New test.
> * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.
> commit 7df013a391532f39932b80c902e3b4bbd841710f
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date: Mon Sep 21 10:56:47 2015 +0100
>
> [AArch64] Improve comparison with complex immediates
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 126c9c2..1bfc870 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -369,7 +369,7 @@ (define_expand "mod<mode>3"
> }
> )
>
> -(define_insn "*condjump"
> +(define_insn "condjump"
> [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
> [(match_operand 1 "cc_register" "") (const_int 0)])
> (label_ref (match_operand 2 "" ""))
> @@ -394,6 +394,40 @@ (define_insn "*condjump"
> (const_int 1)))]
> )
>
> +;; For a 24-bit immediate CST we can optimize the compare for equality
> +;; and branch sequence from:
> +;; mov x0, #imm1
> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */
> +;; cmp x1, x0
> +;; b<ne,eq> .Label
This would be easier on the eyes if you were to indent the code sequence.
+;; and branch sequence from:
+;;   mov x0, #imm1
+;;   movk x0, #imm2, lsl 16 /* x0 contains CST. */
+;;   cmp x1, x0
+;;   b<ne,eq> .Label
+;; into the shorter:
+;;   sub x0, #(CST & 0xfff000)
> +;; into the shorter:
> +;; sub x0, #(CST & 0xfff000)
> +;; subs x0, #(CST & 0x000fff)
These instructions are not valid (two-operand sub/subs?). Can you write them
out fully in this comment so I can see the data flow?
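To make the data flow concrete, here is a minimal C model of what the split appears to compute. It is a sketch under assumptions, not code from the patch: the function name is hypothetical, and the three-operand register assignment in the comment is only one plausible way of writing the sequence out in full.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the proposed split; not taken from the patch.
   One plausible fully written-out form of the two instructions is:
       sub   x0, x1, #(CST & 0xfff000)
       subs  x0, x0, #(CST & 0x000fff)
   with x1 holding the value under test and x0 a scratch register.
   Assumes CST fits in 24 bits, so the two pieces sum to CST.  */
bool
equals_cst_via_two_subs (uint64_t x, uint64_t cst)
{
  uint64_t hi_imm = cst & 0xfff000;  /* 12-bit immediate, shifted left by 12.  */
  uint64_t lo_imm = cst & 0x000fff;  /* Plain 12-bit immediate.  */
  uint64_t tmp = x - hi_imm;         /* sub: flags untouched.  */
  tmp = tmp - lo_imm;                /* subs: Z is set iff tmp == 0.  */
  return tmp == 0;                   /* b.eq / cset eq would read Z.  */
}
```

Only the zero result carries meaning after the two subtractions, which is consistent with the patterns being restricted to eq/ne.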
> +;; b<ne,eq> .Label
> +(define_insn_and_split "*compare_condjump<mode>"
> + [(set (pc) (if_then_else (EQL
> + (match_operand:GPI 0 "register_operand" "r")
> + (match_operand:GPI 1 "aarch64_imm24" "n"))
> + (label_ref:P (match_operand 2 "" ""))
> + (pc)))]
> + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
> + && !aarch64_plus_operand (operands[1], <MODE>mode)"
> + "#"
> + "&& true"
> + [(const_int 0)]
> + {
> + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
> + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
> + rtx tmp = gen_reg_rtx (<MODE>mode);
Can you guarantee we can always create this pseudo? What if we're a
post-register-allocation split?
> + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
> + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
> + DONE;
> + }
> +)
> +
> (define_expand "casesi"
> [(match_operand:SI 0 "register_operand" "") ; Index
> (match_operand:SI 1 "const_int_operand" "") ; Lower bound
> @@ -2898,7 +2932,7 @@ (define_expand "cstore<mode>4"
> "
> )
>
> -(define_insn "*cstore<mode>_insn"
> +(define_insn "aarch64_cstore<mode>"
> [(set (match_operand:ALLI 0 "register_operand" "=r")
> (match_operator:ALLI 1 "aarch64_comparison_operator"
> [(match_operand 2 "cc_register" "") (const_int 0)]))]
> @@ -2907,6 +2941,39 @@ (define_insn "*cstore<mode>_insn"
> [(set_attr "type" "csel")]
> )
>
> +;; For a 24-bit immediate CST we can optimize the compare for equality
> +;; and branch sequence from:
> +;; mov x0, #imm1
> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */
> +;; cmp x1, x0
> +;; cset x2, <ne,eq>
> +;; into the shorter:
> +;; sub x0, #(CST & 0xfff000)
> +;; subs x0, #(CST & 0x000fff)
> +;; cset x1, <ne, eq>.
Same comments as above regarding formatting and making this a valid set
of instructions.
> +(define_insn_and_split "*compare_cstore<mode>_insn"
> + [(set (match_operand:GPI 0 "register_operand" "=r")
> + (EQL:GPI (match_operand:GPI 1 "register_operand" "r")
> + (match_operand:GPI 2 "aarch64_imm24" "n")))]
> + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
> + && !aarch64_plus_operand (operands[2], <MODE>mode)"
> + "#"
> + "&& true"
> + [(const_int 0)]
> + {
> + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
> + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
> + rtx tmp = gen_reg_rtx (<MODE>mode);
> + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
> + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg));
> + DONE;
> + }
> + [(set_attr "type" "csel")]
> +)
> +
> ;; zero_extend version of the above
> (define_insn "*cstoresi_insn_uxtw"
> [(set (match_operand:DI 0 "register_operand" "=r")
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index c4a1c98..9f63ef2 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -801,7 +801,7 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
> (ltu "1") (leu "1") (geu "2") (gtu "2")])
>
> (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
> - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
> + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")])
>
> (define_code_attr fix_trunc_optab [(fix "fix_trunc")
> (unsigned_fix "fixuns_trunc")])
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index 046f852..1bcbf62 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3"
> (and (match_code "const_int")
> (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
>
> +;; An immediate that fits into 24 bits.
> +(define_predicate "aarch64_imm24"
> + (and (match_code "const_int")
> + (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)")))
> +
IN_RANGE (UINTVAL (op), 0, 0xffffff) ?
We use quite a few different ways to check that an immediate fits in a
particular range in the AArch64 backend; it would be good to pick just one
idiomatic way.
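For what it's worth, the two spellings should accept the same values. The sketch below checks that in plain C. IN_RANGE_SKETCH is a local stand-in modelled on GCC's IN_RANGE macro, defined here only so the example compiles on its own, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in modelled on GCC's IN_RANGE; an assumption made so this
   sketch is self-contained, not the real macro.  */
#define IN_RANGE_SKETCH(value, lower, upper)      \
  ((uint64_t) (value) - (uint64_t) (lower)        \
   <= (uint64_t) (upper) - (uint64_t) (lower))

/* The test as written in the patch: no bits set above bit 23.  */
bool
imm24_by_mask (uint64_t v)
{
  return (v & 0xffffff) == v;
}

/* The range-based spelling suggested in the review.  */
bool
imm24_by_range (uint64_t v)
{
  return IN_RANGE_SKETCH (v, 0, 0xffffff);
}
```

Both treat the value as unsigned, so a negative constant is rejected either way.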
> (define_predicate "aarch64_pwr_imm3"
> (and (match_code "const_int")
> (match_test "INTVAL (op) != 0
> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
> new file mode 100644
> index 0000000..d7a8d5b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-save-temps -O2" } */
> +
> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */
> +
This just tests that we don't emit cmp; it doesn't test anything else.
> +void g (void);
> +void
> +foo (int x)
> +{
> + if (x != 0x123456)
> + g ();
> +}
> +
> +void
> +fool (long long x)
> +{
> + if (x != 0x123456)
> + g ();
> +}
> +
> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
> new file mode 100644
> index 0000000..619c026
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-save-temps -O2" } */
> +
> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */
Likewise, I don't see any checks for sub/subs.
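Something along these lines would also check for the new sequence, and the same idea applies to the branch test. This is a sketch only: the four positive scan-assembler patterns are assumptions about how the output is spelled and have not been run against the patch, while the functions, options, and scan-assembler-not lines are copied from the test as posted.

```c
/* { dg-do compile } */
/* { dg-options "-save-temps -O2" } */

/* Sketch: the positive sub/subs patterns at the end are assumptions and
   may need adjusting to match the assembly actually produced.  */

int
foo (int x)
{
  return x == 0x123456;
}

long
fool (long x)
{
  return x == 0x123456;
}

/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
/* { dg-final { scan-assembler "sub\tw\[0-9\]+" } } */
/* { dg-final { scan-assembler "subs\tw\[0-9\]+" } } */
/* { dg-final { scan-assembler "sub\tx\[0-9\]+" } } */
/* { dg-final { scan-assembler "subs\tx\[0-9\]+" } } */
```

The tab after the mnemonic keeps the "sub" patterns from matching "subs".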
> +
> +int
> +foo (int x)
> +{
> + return x == 0x123456;
> +}
> +
> +long
> +fool (long x)
> +{
> + return x == 0x123456;
> +}
> +
> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
Thanks,
James