Date: Thu, 08 Oct 2015 08:54:00 -0000
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Improve comparison with complex immediates followed by branch/cset
Message-ID: <56162F33.6060103@arm.com>
Hi all,

This patch slightly improves sequences where we want to compare against a complex immediate and branch on the result, or perform a cset on it. This means transforming sequences of mov+movk+cmp+branch into sub+subs+branch. Similarly for cset.

Unfortunately I can't do this by simply matching a (compare (reg) (const_int)) rtx, because the transformation is only valid for equal/not-equal comparisons, not greater-than/less-than ones, while the compare instruction pattern only has the general CC mode. We therefore also need to match the use of the condition code.

I've done this by creating a splitter for the conditional jump where the condition is the comparison between the register and the complex immediate, and splitting it into the sub+subs+condjump sequence. Similarly for the cstore pattern. Thankfully we don't split immediate moves until later in the optimization pipeline, so combine can still try the right patterns.

With this patch, for the example code:

void g(void);
void f8(int x)
{
  if (x != 0x123456)
    g();
}

I get:

f8:
	sub	w0, w0, #1191936
	subs	w0, w0, #1110
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

instead of the previous:

f8:
	mov	w1, 13398
	movk	w1, 0x12, lsl 16
	cmp	w0, w1
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

The condjump case triggered 130 times across all of SPEC2006, which is, admittedly, not much, whereas the cstore case didn't trigger at all. However, the testcase included in the patch demonstrates the kind of code it would trigger on.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-08  Kyrylo Tkachov

    * config/aarch64/aarch64.md (*condjump): Rename to...
    (condjump): ... This.
    (*compare_condjump<mode>): New define_insn_and_split.
    (*compare_cstore<mode>_insn): Likewise.
    (*cstore<mode>_insn): Rename to...
    (cstore<mode>_insn): ... This.
    * config/aarch64/iterators.md (CMP): Handle ne code.
    * config/aarch64/predicates.md (aarch64_imm24): New predicate.

2015-10-08  Kyrylo Tkachov

    * gcc.target/aarch64/cmpimm_branch_1.c: New test.
    * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.

aarch64-cmp-imm.patch:

commit 0c1530fab4c3979fb287f3b960f110e857df79b6
Author: Kyrylo Tkachov
Date:   Mon Sep 21 10:56:47 2015 +0100

    [AArch64] Improve comparison with complex immediates

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 83ea74a..acda64f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -369,7 +369,7 @@ (define_expand "mod<mode>3"
 }
 )
 
-(define_insn "*condjump"
+(define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
 			   (label_ref (match_operand 2 "" ""))
@@ -394,6 +394,42 @@ (define_insn "condjump"
 		      (const_int 1)))]
 )
 
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 // x0 contains CST
+;;	cmp	x1, x0
+;;	b	.Label
+;; into the shorter:
+;;	sub	x0, #(CST & 0xfff000)
+;;	subs	x0, #(CST & 0x000fff)
+;;	b	.Label
+(define_insn_and_split "*compare_condjump<mode>"
+  [(set (pc) (if_then_else (EQL
+			      (match_operand:GPI 0 "register_operand" "r")
+			      (match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:DI (match_operand 2 "" ""))
+			   (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
+   && !aarch64_plus_operand (operands[1], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+    DONE;
+  }
+)
+
+
 (define_expand "casesi"
   [(match_operand:SI 0 "register_operand" "")	; Index
    (match_operand:SI 1 "const_int_operand" "")	; Lower bound
@@ -2894,7 +2930,7 @@ (define_expand "cstore<mode>4"
   "
 )
 
-(define_insn "*cstore<mode>_insn"
+(define_insn "cstore<mode>_insn"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
 	(match_operator:ALLI 1 "aarch64_comparison_operator"
 	 [(match_operand 2 "cc_register" "") (const_int 0)]))]
@@ -2903,6 +2939,39 @@ (define_insn "cstore<mode>_insn"
   [(set_attr "type" "csel")]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and cset sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 // x0 contains CST
+;;	cmp	x1, x0
+;;	cset	x2, <ne,eq>
+;; into the shorter:
+;;	sub	x0, #(CST & 0xfff000)
+;;	subs	x0, #(CST & 0x000fff)
+;;	cset	x1, <ne,eq>.
+(define_insn_and_split "*compare_cstore<mode>_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(EQL:GPI (match_operand:GPI 1 "register_operand" "r")
+		 (match_operand:GPI 2 "aarch64_imm24" "n")))]
+  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
+   && !aarch64_plus_operand (operands[2], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_insn (gen_cstore<mode>_insn (operands[0], cmp_rtx, cc_reg));
+    DONE;
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; zero_extend version of the above
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a1436ac..8b2663b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -798,7 +798,7 @@ (define_code_attr cmp_2   [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
 			   (ltu "1") (leu "1") (geu "2") (gtu "2")])
 
 (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
-		       (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
+		       (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")])
 
 (define_code_attr fix_trunc_optab [(fix "fix_trunc")
 				   (unsigned_fix "fixuns_trunc")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 7b852a4..1b62432 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -138,6 +138,11 @@ (define_predicate "aarch64_imm3"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
 
+;; An immediate that fits into 24 bits.
+(define_predicate "aarch64_imm24"
+  (and (match_code "const_int")
+       (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)")))
+
 (define_predicate "aarch64_pwr_imm3"
   (and (match_code "const_int")
        (match_test "INTVAL (op) != 0
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
new file mode 100644
index 0000000..d7a8d5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+void g (void);
+void
+foo (int x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+void
+fool (long long x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
new file mode 100644
index 0000000..619c026
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+int
+foo (int x)
+{
+  return x == 0x123456;
+}
+
+long
+fool (long x)
+{
+  return x == 0x123456;
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */