Date: Thu, 08 Oct 2015 08:54:00 -0000
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Improve comparison with complex immediates followed by branch/cset
Message-ID: <56162F33.6060103@arm.com>
Hi all,

This patch slightly improves sequences where we want to compare against a complex immediate and branch on the result, or perform a cset on it. This means transforming sequences of mov+movk+cmp+branch into sub+subs+branch. Similarly for cset.

Unfortunately I can't do this by simply matching a (compare (reg) (const_int)) rtx, because the transformation is only valid for equal/not-equal comparisons, not greater-than/less-than ones, while the compare instruction pattern only has the general CC mode. We therefore also need to match the use of the condition code.

I've done this by creating a splitter for the conditional jump where the condition is the comparison between the register and the complex immediate, and splitting it into the sub+subs+condjump sequence. Similarly for the cstore pattern. Thankfully we don't split immediate moves until later in the optimization pipeline, so combine can still try the right patterns.

With this patch, for the example code:

void g(void);
void f8(int x)
{
  if (x != 0x123456)
    g();
}

I get:

f8:
	sub	w0, w0, #1191936
	subs	w0, w0, #1110
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

instead of the previous:

f8:
	mov	w1, 13398
	movk	w1, 0x12, lsl 16
	cmp	w0, w1
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

The condjump case triggered 130 times across all of SPEC2006, which is, admittedly, not much, whereas the cstore case didn't trigger at all. However, the testcase included in the patch demonstrates the kind of code it would trigger on.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-08  Kyrylo Tkachov

    * config/aarch64/aarch64.md (*condjump): Rename to...
    (condjump): ... This.
    (*compare_condjump<mode>): New define_insn_and_split.
    (*compare_cstore<mode>_insn): Likewise.
    (*cstore<mode>_insn): Rename to...
    (cstore<mode>_insn): ... This.
    * config/aarch64/iterators.md (CMP): Handle ne code.
    * config/aarch64/predicates.md (aarch64_imm24): New predicate.

2015-10-08  Kyrylo Tkachov

    * gcc.target/aarch64/cmpimm_branch_1.c: New test.
    * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.

aarch64-cmp-imm.patch:

commit 0c1530fab4c3979fb287f3b960f110e857df79b6
Author: Kyrylo Tkachov
Date:   Mon Sep 21 10:56:47 2015 +0100

    [AArch64] Improve comparison with complex immediates

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 83ea74a..acda64f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -369,7 +369,7 @@ (define_expand "mod<mode>3"
 }
 )
 
-(define_insn "*condjump"
+(define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
 			   (label_ref (match_operand 2 "" ""))
@@ -394,6 +394,42 @@ (define_insn "condjump"
 		      (const_int 1)))]
 )
 
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 // x0 contains CST
+;;	cmp	x1, x0
+;;	b	.Label
+;; into the shorter:
+;;	sub	x0, #(CST & 0xfff000)
+;;	subs	x0, #(CST & 0x000fff)
+;;	b	.Label
+(define_insn_and_split "*compare_condjump<mode>"
+  [(set (pc) (if_then_else (EQL
+			      (match_operand:GPI 0 "register_operand" "r")
+			      (match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:DI (match_operand 2 "" ""))
+			   (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
+   && !aarch64_plus_operand (operands[1], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+    DONE;
+  }
+)
+
+
 (define_expand "casesi"
   [(match_operand:SI 0 "register_operand" "")	; Index
    (match_operand:SI 1 "const_int_operand" "")	; Lower bound
@@ -2894,7 +2930,7 @@ (define_expand "cstore<mode>4"
   "
 )
 
-(define_insn "*cstore<mode>_insn"
+(define_insn "cstore<mode>_insn"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
 	(match_operator:ALLI 1 "aarch64_comparison_operator"
 	 [(match_operand 2 "cc_register" "") (const_int 0)]))]
@@ -2903,6 +2939,39 @@ (define_insn "cstore<mode>_insn"
   [(set_attr "type" "csel")]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and cset sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 // x0 contains CST
+;;	cmp	x1, x0
+;;	cset	x2, <ne,eq>
+;; into the shorter:
+;;	sub	x0, #(CST & 0xfff000)
+;;	subs	x0, #(CST & 0x000fff)
+;;	cset	x1, <ne,eq>.
+(define_insn_and_split "*compare_cstore<mode>_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(EQL:GPI (match_operand:GPI 1 "register_operand" "r")
+		 (match_operand:GPI 2 "aarch64_imm24" "n")))]
+  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
+   && !aarch64_plus_operand (operands[2], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_insn (gen_cstore<mode>_insn (operands[0], cmp_rtx, cc_reg));
+    DONE;
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; zero_extend version of the above
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a1436ac..8b2663b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -798,7 +798,7 @@ (define_code_attr cmp_2   [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
 			   (ltu "1") (leu "1") (geu "2") (gtu "2")])
 
 (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
-		       (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
+		       (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")])
 
 (define_code_attr fix_trunc_optab [(fix "fix_trunc")
 				   (unsigned_fix "fixuns_trunc")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 7b852a4..1b62432 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -138,6 +138,11 @@ (define_predicate "aarch64_imm3"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
 
+;; An immediate that fits into 24 bits.
+(define_predicate "aarch64_imm24"
+  (and (match_code "const_int")
+       (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)")))
+
 (define_predicate "aarch64_pwr_imm3"
   (and (match_code "const_int")
        (match_test "INTVAL (op) != 0
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
new file mode 100644
index 0000000..d7a8d5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+void g (void);
+void
+foo (int x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+void
+fool (long long x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
new file mode 100644
index 0000000..619c026
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+int
+foo (int x)
+{
+  return x == 0x123456;
+}
+
+long
+fool (long x)
+{
+  return x == 0x123456;
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */