* [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset @ 2015-11-03 15:43 Kyrill Tkachov 2015-11-11 10:43 ` Kyrill Tkachov 2015-11-12 12:05 ` James Greenhalgh 0 siblings, 2 replies; 7+ messages in thread From: Kyrill Tkachov @ 2015-11-03 15:43 UTC (permalink / raw) To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh [-- Attachment #1: Type: text/plain, Size: 2368 bytes --]

Hi all,

This patch slightly improves sequences where we want to compare against a complex immediate and branch on the result, or perform a cset on it. This means transforming sequences of mov+movk+cmp+branch into sub+subs+branch. Similarly for cset. Unfortunately I can't do this by simply matching a (compare (reg) (const_int)) rtx, because this transformation is only valid for equal/not-equal comparisons, not greater-than/less-than ones, but the compare instruction pattern only has the general CC mode. We also need to match the use of the condition code.

I've done this by creating a splitter for the conditional jump where the condition is the comparison between the register and the complex immediate, and splitting it into the sub+subs+condjump sequence. Similarly for the cstore pattern. Thankfully we don't split immediate moves until later in the optimization pipeline, so combine can still try the right patterns.

With this patch, for the example code:

void g(void);
void f8(int x)
{
  if (x != 0x123456) g();
}

I get:

f8:
	sub	w0, w0, #1191936
	subs	w0, w0, #1110
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

instead of the previous:

f8:
	mov	w1, 13398
	movk	w1, 0x12, lsl 16
	cmp	w0, w1
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

The condjump case triggered 130 times across all of SPEC2006, which is, admittedly, not much, whereas the cstore case didn't trigger at all. However, the included testcase in the patch demonstrates the kind of code it would trigger on.

Bootstrapped and tested on aarch64.

Ok for trunk?
Thanks, Kyrill 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/aarch64/aarch64.md (*condjump): Rename to... (condjump): ... This. (*compare_condjump<mode>): New define_insn_and_split. (*compare_cstore<mode>_insn): Likewise. (*cstore<mode>_insn): Rename to... (aarch64_cstore<mode>): ... This. * config/aarch64/iterators.md (CMP): Handle ne code. * config/aarch64/predicates.md (aarch64_imm24): New predicate. 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * gcc.target/aarch64/cmpimm_branch_1.c: New test. * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: aarch64-cmp-imm.patch --] [-- Type: text/x-patch; name=aarch64-cmp-imm.patch, Size: 6606 bytes --] commit 7df013a391532f39932b80c902e3b4bbd841710f Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> Date: Mon Sep 21 10:56:47 2015 +0100 [AArch64] Improve comparison with complex immediates diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 126c9c2..1bfc870 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -369,7 +369,7 @@ (define_expand "mod<mode>3" } ) -(define_insn "*condjump" +(define_insn "condjump" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand 1 "cc_register" "") (const_int 0)]) (label_ref (match_operand 2 "" "")) @@ -394,6 +394,40 @@ (define_insn "*condjump" (const_int 1)))] ) +;; For a 24-bit immediate CST we can optimize the compare for equality +;; and branch sequence from: +;; mov x0, #imm1 +;; movk x0, #imm2, lsl 16 /* x0 contains CST. 
*/ +;; cmp x1, x0 +;; b<ne,eq> .Label +;; into the shorter: +;; sub x0, #(CST & 0xfff000) +;; subs x0, #(CST & 0x000fff) +;; b<ne,eq> .Label +(define_insn_and_split "*compare_condjump<mode>" + [(set (pc) (if_then_else (EQL + (match_operand:GPI 0 "register_operand" "r") + (match_operand:GPI 1 "aarch64_imm24" "n")) + (label_ref:P (match_operand 2 "" "")) + (pc)))] + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode) + && !aarch64_plus_operand (operands[1], <MODE>mode)" + "#" + "&& true" + [(const_int 0)] + { + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm))); + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); + DONE; + } +) + (define_expand "casesi" [(match_operand:SI 0 "register_operand" "") ; Index (match_operand:SI 1 "const_int_operand" "") ; Lower bound @@ -2898,7 +2932,7 @@ (define_expand "cstore<mode>4" " ) -(define_insn "*cstore<mode>_insn" +(define_insn "aarch64_cstore<mode>" [(set (match_operand:ALLI 0 "register_operand" "=r") (match_operator:ALLI 1 "aarch64_comparison_operator" [(match_operand 2 "cc_register" "") (const_int 0)]))] @@ -2907,6 +2941,39 @@ (define_insn "*cstore<mode>_insn" [(set_attr "type" "csel")] ) +;; For a 24-bit immediate CST we can optimize the compare for equality +;; and branch sequence from: +;; mov x0, #imm1 +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ +;; cmp x1, x0 +;; cset x2, <ne,eq> +;; into the shorter: +;; sub x0, #(CST & 0xfff000) +;; subs x0, #(CST & 0x000fff) +;; cset x1, <ne, eq>. 
+(define_insn_and_split "*compare_cstore<mode>_insn" + [(set (match_operand:GPI 0 "register_operand" "=r") + (EQL:GPI (match_operand:GPI 1 "register_operand" "r") + (match_operand:GPI 2 "aarch64_imm24" "n")))] + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode) + && !aarch64_plus_operand (operands[2], <MODE>mode)" + "#" + "&& true" + [(const_int 0)] + { + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff; + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000; + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm))); + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg)); + DONE; + } + [(set_attr "type" "csel")] +) + ;; zero_extend version of the above (define_insn "*cstoresi_insn_uxtw" [(set (match_operand:DI 0 "register_operand" "=r") diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index c4a1c98..9f63ef2 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -801,7 +801,7 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2") (ltu "1") (leu "1") (geu "2") (gtu "2")]) (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT") - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")]) + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")]) (define_code_attr fix_trunc_optab [(fix "fix_trunc") (unsigned_fix "fixuns_trunc")]) diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 046f852..1bcbf62 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3" (and (match_code "const_int") (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4"))) +;; An immediate that fits into 24 bits. 
+(define_predicate "aarch64_imm24" + (and (match_code "const_int") + (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)"))) + (define_predicate "aarch64_pwr_imm3" (and (match_code "const_int") (match_test "INTVAL (op) != 0 diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c new file mode 100644 index 0000000..d7a8d5b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-save-temps -O2" } */ + +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ + +void g (void); +void +foo (int x) +{ + if (x != 0x123456) + g (); +} + +void +fool (long long x) +{ + if (x != 0x123456) + g (); +} + +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c new file mode 100644 index 0000000..619c026 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-save-temps -O2" } */ + +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ + +int +foo (int x) +{ + return x == 0x123456; +} + +long +fool (long x) +{ + return x == 0x123456; +} + +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset 2015-11-03 15:43 [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset Kyrill Tkachov @ 2015-11-11 10:43 ` Kyrill Tkachov 2015-11-12 12:05 ` James Greenhalgh 1 sibling, 0 replies; 7+ messages in thread From: Kyrill Tkachov @ 2015-11-11 10:43 UTC (permalink / raw) To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh Ping. https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00233.html Thanks, Kyrill On 03/11/15 15:43, Kyrill Tkachov wrote: > Hi all, > > This patch slightly improves sequences where we want to compare against a complex immediate and branch against the result > or perform a cset on it. > This means transforming sequences of mov+movk+cmp+branch into sub+subs+branch. > Similar for cset. Unfortunately I can't just do this by simply matching a (compare (reg) (const_int)) rtx because > this transformation is only valid for equal/not equal comparisons, not greater than/less than ones but the compare instruction > pattern only has the general CC mode. We need to also match the use of the condition code. > > I've done this by creating a splitter for the conditional jump where the condition is the comparison between the register > and the complex immediate and splitting it into the sub+subs+condjump sequence. Similar for the cstore pattern. > Thankfully we don't split immediate moves until later in the optimization pipeline so combine can still try the right patterns. 
> With this patch for the example code: > void g(void); > void f8(int x) > { > if (x != 0x123456) g(); > } > > I get: > f8: > sub w0, w0, #1191936 > subs w0, w0, #1110 > beq .L1 > b g > .p2align 3 > .L1: > ret > > instead of the previous: > f8: > mov w1, 13398 > movk w1, 0x12, lsl 16 > cmp w0, w1 > beq .L1 > b g > .p2align 3 > .L1: > ret > > > The condjump case triggered 130 times across all of SPEC2006 which is, admittedly, not much > whereas the cstore case didn't trigger at all. However, the included testcase in the patch > demonstrates the kind of code that it would trigger on. > > Bootstrapped and tested on aarch64. > > Ok for trunk? > > Thanks, > Kyrill > > > 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.md (*condjump): Rename to... > (condjump): ... This. > (*compare_condjump<mode>): New define_insn_and_split. > (*compare_cstore<mode>_insn): Likewise. > (*cstore<mode>_insn): Rename to... > (aarch64_cstore<mode>): ... This. > * config/aarch64/iterators.md (CMP): Handle ne code. > * config/aarch64/predicates.md (aarch64_imm24): New predicate. > > 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/cmpimm_branch_1.c: New test. > * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset 2015-11-03 15:43 [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset Kyrill Tkachov 2015-11-11 10:43 ` Kyrill Tkachov @ 2015-11-12 12:05 ` James Greenhalgh 2015-11-23 10:36 ` Kyrill Tkachov 1 sibling, 1 reply; 7+ messages in thread From: James Greenhalgh @ 2015-11-12 12:05 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote: > Hi all, > > Bootstrapped and tested on aarch64. > > Ok for trunk? Comments in-line. > > Thanks, > Kyrill > > > 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.md (*condjump): Rename to... > (condjump): ... This. > (*compare_condjump<mode>): New define_insn_and_split. > (*compare_cstore<mode>_insn): Likewise. > (*cstore<mode>_insn): Rename to... > (aarch64_cstore<mode>): ... This. > * config/aarch64/iterators.md (CMP): Handle ne code. > * config/aarch64/predicates.md (aarch64_imm24): New predicate. > > 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/cmpimm_branch_1.c: New test. > * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. 
> commit 7df013a391532f39932b80c902e3b4bbd841710f > Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> > Date: Mon Sep 21 10:56:47 2015 +0100 > > [AArch64] Improve comparison with complex immediates > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 126c9c2..1bfc870 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -369,7 +369,7 @@ (define_expand "mod<mode>3" > } > ) > > -(define_insn "*condjump" > +(define_insn "condjump" > [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > [(match_operand 1 "cc_register" "") (const_int 0)]) > (label_ref (match_operand 2 "" "")) > @@ -394,6 +394,40 @@ (define_insn "*condjump" > (const_int 1)))] > ) > > +;; For a 24-bit immediate CST we can optimize the compare for equality > +;; and branch sequence from: > +;; mov x0, #imm1 > +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ > +;; cmp x1, x0 > +;; b<ne,eq> .Label This would be easier on the eyes if you were to indent the code sequence. +;; and branch sequence from: +;; mov x0, #imm1 +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ +;; cmp x1, x0 +;; b<ne,eq> .Label +;; into the shorter: +;; sub x0, #(CST & 0xfff000) > +;; into the shorter: > +;; sub x0, #(CST & 0xfff000) > +;; subs x0, #(CST & 0x000fff) These instructions are not valid (2 operand sub/subs?) can you write them out fully for this comment so I can see the data flow? 
> +;; b<ne,eq> .Label > +(define_insn_and_split "*compare_condjump<mode>" > + [(set (pc) (if_then_else (EQL > + (match_operand:GPI 0 "register_operand" "r") > + (match_operand:GPI 1 "aarch64_imm24" "n")) > + (label_ref:P (match_operand 2 "" "")) > + (pc)))] > + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode) > + && !aarch64_plus_operand (operands[1], <MODE>mode)" > + "#" > + "&& true" > + [(const_int 0)] > + { > + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; > + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; > + rtx tmp = gen_reg_rtx (<MODE>mode); Can you guarantee we can always create this pseudo? What if we're a post-register-allocation split? > + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm))); > + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); > + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); > + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); > + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); > + DONE; > + } > +) > + > (define_expand "casesi" > [(match_operand:SI 0 "register_operand" "") ; Index > (match_operand:SI 1 "const_int_operand" "") ; Lower bound > @@ -2898,7 +2932,7 @@ (define_expand "cstore<mode>4" > " > ) > > -(define_insn "*cstore<mode>_insn" > +(define_insn "aarch64_cstore<mode>" > [(set (match_operand:ALLI 0 "register_operand" "=r") > (match_operator:ALLI 1 "aarch64_comparison_operator" > [(match_operand 2 "cc_register" "") (const_int 0)]))] > @@ -2907,6 +2941,39 @@ (define_insn "*cstore<mode>_insn" > [(set_attr "type" "csel")] > ) > > +;; For a 24-bit immediate CST we can optimize the compare for equality > +;; and branch sequence from: > +;; mov x0, #imm1 > +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ > +;; cmp x1, x0 > +;; cset x2, <ne,eq> > +;; into the shorter: > +;; sub x0, #(CST & 0xfff000) > +;; subs x0, #(CST & 0x000fff) > +;; cset x1, <ne, eq>. 
Same comments as above regarding formatting and making this a valid set of instructions. > +(define_insn_and_split "*compare_cstore<mode>_insn" > + [(set (match_operand:GPI 0 "register_operand" "=r") > + (EQL:GPI (match_operand:GPI 1 "register_operand" "r") > + (match_operand:GPI 2 "aarch64_imm24" "n")))] > + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode) > + && !aarch64_plus_operand (operands[2], <MODE>mode)" > + "#" > + "&& true" > + [(const_int 0)] > + { > + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff; > + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000; > + rtx tmp = gen_reg_rtx (<MODE>mode); > + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm))); > + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); > + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); > + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); > + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg)); > + DONE; > + } > + [(set_attr "type" "csel")] > +) > + > ;; zero_extend version of the above > (define_insn "*cstoresi_insn_uxtw" > [(set (match_operand:DI 0 "register_operand" "=r") > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index c4a1c98..9f63ef2 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -801,7 +801,7 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2") > (ltu "1") (leu "1") (geu "2") (gtu "2")]) > > (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT") > - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")]) > + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")]) > > (define_code_attr fix_trunc_optab [(fix "fix_trunc") > (unsigned_fix "fixuns_trunc")]) > diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md > index 046f852..1bcbf62 100644 > --- a/gcc/config/aarch64/predicates.md > +++ b/gcc/config/aarch64/predicates.md > @@ -145,6 +145,11 @@ 
(define_predicate "aarch64_imm3" > (and (match_code "const_int") > (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4"))) > > +;; An immediate that fits into 24 bits. > +(define_predicate "aarch64_imm24" > + (and (match_code "const_int") > + (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)"))) > + IN_RANGE (UINTVAL (op), 0, 0xffffff) ? We use quite a few different ways to check an immediate fits in a particular range in the AArch64 backend, it would be good to pick just one idiomatic way. > (define_predicate "aarch64_pwr_imm3" > (and (match_code "const_int") > (match_test "INTVAL (op) != 0 > diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c > new file mode 100644 > index 0000000..d7a8d5b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-save-temps -O2" } */ > + > +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ > + This just tests that we don't emit cmp, it doesn't test anything else. > +void g (void); > +void > +foo (int x) > +{ > + if (x != 0x123456) > + g (); > +} > + > +void > +fool (long long x) > +{ > + if (x != 0x123456) > + g (); > +} > + > +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ > +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c > new file mode 100644 > index 0000000..619c026 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-options "-save-temps -O2" } */ > + > +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ Likewise, I don't see any checks for sub/subs. 
> + > +int > +foo (int x) > +{ > + return x == 0x123456; > +} > + > +long > +fool (long x) > +{ > + return x == 0x123456; > +} > + > +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ > +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ Thanks, James ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset 2015-11-12 12:05 ` James Greenhalgh @ 2015-11-23 10:36 ` Kyrill Tkachov 2015-11-23 14:59 ` James Greenhalgh 0 siblings, 1 reply; 7+ messages in thread From: Kyrill Tkachov @ 2015-11-23 10:36 UTC (permalink / raw) To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw [-- Attachment #1: Type: text/plain, Size: 10333 bytes --] On 12/11/15 12:05, James Greenhalgh wrote: > On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote: >> Hi all, >> >> Bootstrapped and tested on aarch64. >> >> Ok for trunk? > Comments in-line. > Here's an updated patch according to your comments. Sorry it took so long to respin it, had other things to deal with with stage1 closing... I've indented the sample code sequences and used valid mnemonics. These patterns can only match during combine, so I'd expect them to always split during combine or immediately after, but I don't think that's a documented guarantee so I've gated them on !reload_completed. I've used IN_RANGE in the predicate.md hunk and added scan-assembler checks in the tests. Is this ok? Thanks, Kyrill 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/aarch64/aarch64.md (*condjump): Rename to... (condjump): ... This. (*compare_condjump<mode>): New define_insn_and_split. (*compare_cstore<mode>_insn): Likewise. (*cstore<mode>_insn): Rename to... (cstore<mode>_insn): ... This. * config/aarch64/iterators.md (CMP): Handle ne code. * config/aarch64/predicates.md (aarch64_imm24): New predicate. 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * gcc.target/aarch64/cmpimm_branch_1.c: New test. * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. >> Thanks, >> Kyrill >> >> >> 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * config/aarch64/aarch64.md (*condjump): Rename to... >> (condjump): ... This. >> (*compare_condjump<mode>): New define_insn_and_split. >> (*compare_cstore<mode>_insn): Likewise. 
>> (*cstore<mode>_insn): Rename to... >> (aarch64_cstore<mode>): ... This. >> * config/aarch64/iterators.md (CMP): Handle ne code. >> * config/aarch64/predicates.md (aarch64_imm24): New predicate. >> >> 2015-11-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * gcc.target/aarch64/cmpimm_branch_1.c: New test. >> * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. >> commit 7df013a391532f39932b80c902e3b4bbd841710f >> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> Date: Mon Sep 21 10:56:47 2015 +0100 >> >> [AArch64] Improve comparison with complex immediates >> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >> index 126c9c2..1bfc870 100644 >> --- a/gcc/config/aarch64/aarch64.md >> +++ b/gcc/config/aarch64/aarch64.md >> @@ -369,7 +369,7 @@ (define_expand "mod<mode>3" >> } >> ) >> >> -(define_insn "*condjump" >> +(define_insn "condjump" >> [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" >> [(match_operand 1 "cc_register" "") (const_int 0)]) >> (label_ref (match_operand 2 "" "")) >> @@ -394,6 +394,40 @@ (define_insn "*condjump" >> (const_int 1)))] >> ) >> >> +;; For a 24-bit immediate CST we can optimize the compare for equality >> +;; and branch sequence from: >> +;; mov x0, #imm1 >> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ >> +;; cmp x1, x0 >> +;; b<ne,eq> .Label > This would be easier on the eyes if you were to indent the code sequence. > > +;; and branch sequence from: > +;; mov x0, #imm1 > +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ > +;; cmp x1, x0 > +;; b<ne,eq> .Label > +;; into the shorter: > +;; sub x0, #(CST & 0xfff000) > >> +;; into the shorter: >> +;; sub x0, #(CST & 0xfff000) >> +;; subs x0, #(CST & 0x000fff) > These instructions are not valid (2 operand sub/subs?) can you write them > out fully for this comment so I can see the data flow? 
> >> +;; b<ne,eq> .Label >> +(define_insn_and_split "*compare_condjump<mode>" >> + [(set (pc) (if_then_else (EQL >> + (match_operand:GPI 0 "register_operand" "r") >> + (match_operand:GPI 1 "aarch64_imm24" "n")) >> + (label_ref:P (match_operand 2 "" "")) >> + (pc)))] >> + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode) >> + && !aarch64_plus_operand (operands[1], <MODE>mode)" >> + "#" >> + "&& true" >> + [(const_int 0)] >> + { >> + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; >> + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; >> + rtx tmp = gen_reg_rtx (<MODE>mode); > Can you guarantee we can always create this pseudo? What if we're a > post-register-allocation split? > >> + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm))); >> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); >> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); >> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); >> + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); >> + DONE; >> + } >> +) >> + >> (define_expand "casesi" >> [(match_operand:SI 0 "register_operand" "") ; Index >> (match_operand:SI 1 "const_int_operand" "") ; Lower bound >> @@ -2898,7 +2932,7 @@ (define_expand "cstore<mode>4" >> " >> ) >> >> -(define_insn "*cstore<mode>_insn" >> +(define_insn "aarch64_cstore<mode>" >> [(set (match_operand:ALLI 0 "register_operand" "=r") >> (match_operator:ALLI 1 "aarch64_comparison_operator" >> [(match_operand 2 "cc_register" "") (const_int 0)]))] >> @@ -2907,6 +2941,39 @@ (define_insn "*cstore<mode>_insn" >> [(set_attr "type" "csel")] >> ) >> >> +;; For a 24-bit immediate CST we can optimize the compare for equality >> +;; and branch sequence from: >> +;; mov x0, #imm1 >> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ >> +;; cmp x1, x0 >> +;; cset x2, <ne,eq> >> +;; into the shorter: >> +;; sub x0, #(CST & 0xfff000) >> +;; subs x0, #(CST & 0x000fff) >> +;; cset x1, <ne, eq>. 
> Same comments as above regarding formatting and making this a valid set > of instructions. > >> +(define_insn_and_split "*compare_cstore<mode>_insn" >> + [(set (match_operand:GPI 0 "register_operand" "=r") >> + (EQL:GPI (match_operand:GPI 1 "register_operand" "r") >> + (match_operand:GPI 2 "aarch64_imm24" "n")))] >> + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode) >> + && !aarch64_plus_operand (operands[2], <MODE>mode)" >> + "#" >> + "&& true" >> + [(const_int 0)] >> + { >> + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff; >> + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000; >> + rtx tmp = gen_reg_rtx (<MODE>mode); >> + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm))); >> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); >> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); >> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); >> + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg)); >> + DONE; >> + } >> + [(set_attr "type" "csel")] >> +) >> + >> ;; zero_extend version of the above >> (define_insn "*cstoresi_insn_uxtw" >> [(set (match_operand:DI 0 "register_operand" "=r") >> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md >> index c4a1c98..9f63ef2 100644 >> --- a/gcc/config/aarch64/iterators.md >> +++ b/gcc/config/aarch64/iterators.md >> @@ -801,7 +801,7 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2") >> (ltu "1") (leu "1") (geu "2") (gtu "2")]) >> >> (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT") >> - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")]) >> + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")]) >> >> (define_code_attr fix_trunc_optab [(fix "fix_trunc") >> (unsigned_fix "fixuns_trunc")]) >> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md >> index 046f852..1bcbf62 100644 >> --- a/gcc/config/aarch64/predicates.md >> +++ 
b/gcc/config/aarch64/predicates.md >> @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3" >> (and (match_code "const_int") >> (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4"))) >> >> +;; An immediate that fits into 24 bits. >> +(define_predicate "aarch64_imm24" >> + (and (match_code "const_int") >> + (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)"))) >> + > IN_RANGE (UINTVAL (op), 0, 0xffffff) ? > > We use quite a few different ways to check an immediate fits in a particular > range in the AArch64 backend, it would be good to pick just one idiomatic > way. > >> (define_predicate "aarch64_pwr_imm3" >> (and (match_code "const_int") >> (match_test "INTVAL (op) != 0 >> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c >> new file mode 100644 >> index 0000000..d7a8d5b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c >> @@ -0,0 +1,22 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-save-temps -O2" } */ >> + >> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ >> + > This just tests that we don't emit cmp, it doesn't test anything else. > >> +void g (void); >> +void >> +foo (int x) >> +{ >> + if (x != 0x123456) >> + g (); >> +} >> + >> +void >> +fool (long long x) >> +{ >> + if (x != 0x123456) >> + g (); >> +} >> + >> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ >> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c >> new file mode 100644 >> index 0000000..619c026 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c >> @@ -0,0 +1,19 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-save-temps -O2" } */ >> + >> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ > Likewise, I don't see any checks for sub/subs. 
> >> + >> +int >> +foo (int x) >> +{ >> + return x == 0x123456; >> +} >> + >> +long >> +fool (long x) >> +{ >> + return x == 0x123456; >> +} >> + >> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ >> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ > Thanks, > James > [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: aarch64-cmp-imm.patch --] [-- Type: text/x-patch; name=aarch64-cmp-imm.patch, Size: 7225 bytes --] commit bb44feed4e6beaae25d9bdffa45073dc61c65838 Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> Date: Mon Sep 21 10:56:47 2015 +0100 [AArch64] Improve comparison with complex immediates diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 11f6387..3e57d08 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -372,7 +372,7 @@ (define_expand "mod<mode>3" } ) -(define_insn "*condjump" +(define_insn "condjump" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand 1 "cc_register" "") (const_int 0)]) (label_ref (match_operand 2 "" "")) @@ -397,6 +397,41 @@ (define_insn "*condjump" (const_int 1)))] ) +;; For a 24-bit immediate CST we can optimize the compare for equality +;; and branch sequence from: +;; mov x0, #imm1 +;; movk x0, #imm2, lsl 16 /* x0 contains CST. 
*/ +;; cmp x1, x0 +;; b<ne,eq> .Label +;; into the shorter: +;; sub x0, x0, #(CST & 0xfff000) +;; subs x0, x0, #(CST & 0x000fff) +;; b<ne,eq> .Label +(define_insn_and_split "*compare_condjump<mode>" + [(set (pc) (if_then_else (EQL + (match_operand:GPI 0 "register_operand" "r") + (match_operand:GPI 1 "aarch64_imm24" "n")) + (label_ref:P (match_operand 2 "" "")) + (pc)))] + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode) + && !aarch64_plus_operand (operands[1], <MODE>mode) + && !reload_completed" + "#" + "&& true" + [(const_int 0)] + { + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm))); + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); + DONE; + } +) + (define_expand "casesi" [(match_operand:SI 0 "register_operand" "") ; Index (match_operand:SI 1 "const_int_operand" "") ; Lower bound @@ -2901,7 +2936,7 @@ (define_expand "cstore<mode>4" " ) -(define_insn "*cstore<mode>_insn" +(define_insn "aarch64_cstore<mode>" [(set (match_operand:ALLI 0 "register_operand" "=r") (match_operator:ALLI 1 "aarch64_comparison_operator" [(match_operand 2 "cc_register" "") (const_int 0)]))] @@ -2910,6 +2945,40 @@ (define_insn "*cstore<mode>_insn" [(set_attr "type" "csel")] ) +;; For a 24-bit immediate CST we can optimize the compare for equality +;; and branch sequence from: +;; mov x0, #imm1 +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ +;; cmp x1, x0 +;; cset x2, <ne,eq> +;; into the shorter: +;; sub x0, x0, #(CST & 0xfff000) +;; subs x0, x0, #(CST & 0x000fff) +;; cset x1, <ne, eq>. 
+(define_insn_and_split "*compare_cstore<mode>_insn" + [(set (match_operand:GPI 0 "register_operand" "=r") + (EQL:GPI (match_operand:GPI 1 "register_operand" "r") + (match_operand:GPI 2 "aarch64_imm24" "n")))] + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode) + && !aarch64_plus_operand (operands[2], <MODE>mode) + && !reload_completed" + "#" + "&& true" + [(const_int 0)] + { + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff; + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000; + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm))); + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg)); + DONE; + } + [(set_attr "type" "csel")] +) + ;; zero_extend version of the above (define_insn "*cstoresi_insn_uxtw" [(set (match_operand:DI 0 "register_operand" "=r") diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index c2eb7de..422bc87 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -824,7 +824,8 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2") (ltu "1") (leu "1") (geu "2") (gtu "2")]) (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT") - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")]) + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") + (gtu "GTU")]) (define_code_attr fix_trunc_optab [(fix "fix_trunc") (unsigned_fix "fixuns_trunc")]) diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index e7f76e0..c0c3ff5 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3" (and (match_code "const_int") (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4"))) +;; An immediate that fits into 24 
bits. +(define_predicate "aarch64_imm24" + (and (match_code "const_int") + (match_test "IN_RANGE (UINTVAL (op), 0, 0xffffff)"))) + (define_predicate "aarch64_pwr_imm3" (and (match_code "const_int") (match_test "INTVAL (op) != 0 diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c new file mode 100644 index 0000000..7ad736b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-save-temps -O2" } */ + +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ + +void g (void); +void +foo (int x) +{ + if (x != 0x123456) + g (); +} + +void +fool (long long x) +{ + if (x != 0x123456) + g (); +} + +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */ +/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */ +/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */ +/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c new file mode 100644 index 0000000..6a03cc9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-save-temps -O2" } */ + +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. 
*/ + +int +foo (int x) +{ + return x == 0x123456; +} + +long +fool (long x) +{ + return x == 0x123456; +} + +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */ +/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */ +/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */ +/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset 2015-11-23 10:36 ` Kyrill Tkachov @ 2015-11-23 14:59 ` James Greenhalgh 2015-11-23 15:06 ` Kyrill Tkachov 0 siblings, 1 reply; 7+ messages in thread From: James Greenhalgh @ 2015-11-23 14:59 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw On Mon, Nov 23, 2015 at 10:33:01AM +0000, Kyrill Tkachov wrote: > > On 12/11/15 12:05, James Greenhalgh wrote: > >On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote: > >>Hi all, > >> > >>Bootstrapped and tested on aarch64. > >> > >>Ok for trunk? > >Comments in-line. > > > > Here's an updated patch according to your comments. > Sorry it took so long to respin it, had other things to deal with with > stage1 closing... > > I've indented the sample code sequences and used valid mnemonics. > These patterns can only match during combine, so I'd expect them to always > split during combine or immediately after, but I don't think that's a documented > guarantee so I've gated them on !reload_completed. > > I've used IN_RANGE in the predicate.md hunk and added scan-assembler checks > in the tests. > > Is this ok? > > Thanks, > Kyrill > > 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.md (*condjump): Rename to... > (condjump): ... This. > (*compare_condjump<mode>): New define_insn_and_split. > (*compare_cstore<mode>_insn): Likewise. > (*cstore<mode>_insn): Rename to... > (cstore<mode>_insn): ... This. > * config/aarch64/iterators.md (CMP): Handle ne code. > * config/aarch64/predicates.md (aarch64_imm24): New predicate. > > 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/cmpimm_branch_1.c: New test. > * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. 
> > commit bb44feed4e6beaae25d9bdffa45073dc61c65838 > Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> > Date: Mon Sep 21 10:56:47 2015 +0100 > > [AArch64] Improve comparison with complex immediates > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 11f6387..3e57d08 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -372,7 +372,7 @@ (define_expand "mod<mode>3" > } > ) > > -(define_insn "*condjump" > +(define_insn "condjump" > [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > [(match_operand 1 "cc_register" "") (const_int 0)]) > (label_ref (match_operand 2 "" "")) > @@ -397,6 +397,41 @@ (define_insn "*condjump" > (const_int 1)))] > ) > > +;; For a 24-bit immediate CST we can optimize the compare for equality > +;; and branch sequence from: > +;; mov x0, #imm1 > +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ > +;; cmp x1, x0 > +;; b<ne,eq> .Label > +;; into the shorter: > +;; sub x0, x0, #(CST & 0xfff000) > +;; subs x0, x0, #(CST & 0x000fff) sub x0, x1, #(CST....) ? The transform doesn't make sense otherwise. 
> +;; b<ne,eq> .Label > +(define_insn_and_split "*compare_condjump<mode>" > + [(set (pc) (if_then_else (EQL > + (match_operand:GPI 0 "register_operand" "r") > + (match_operand:GPI 1 "aarch64_imm24" "n")) > + (label_ref:P (match_operand 2 "" "")) > + (pc)))] > + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode) > + && !aarch64_plus_operand (operands[1], <MODE>mode) > + && !reload_completed" > + "#" > + "&& true" > + [(const_int 0)] > + { > + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; > + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; > + rtx tmp = gen_reg_rtx (<MODE>mode); > + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm))); > + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); > + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); > + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); > + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); > + DONE; > + } > +) > + > (define_expand "casesi" > [(match_operand:SI 0 "register_operand" "") ; Index > (match_operand:SI 1 "const_int_operand" "") ; Lower bound > @@ -2901,7 +2936,7 @@ (define_expand "cstore<mode>4" > " > ) > > -(define_insn "*cstore<mode>_insn" > +(define_insn "aarch64_cstore<mode>" > [(set (match_operand:ALLI 0 "register_operand" "=r") > (match_operator:ALLI 1 "aarch64_comparison_operator" > [(match_operand 2 "cc_register" "") (const_int 0)]))] > @@ -2910,6 +2945,40 @@ (define_insn "*cstore<mode>_insn" > [(set_attr "type" "csel")] > ) > > +;; For a 24-bit immediate CST we can optimize the compare for equality > +;; and branch sequence from: > +;; mov x0, #imm1 > +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ > +;; cmp x1, x0 > +;; cset x2, <ne,eq> > +;; into the shorter: > +;; sub x0, x0, #(CST & 0xfff000) > +;; subs x0, x0, #(CST & 0x000fff) > +;; cset x1, <ne, eq>. Please fix the register allocation in your shorter sequence, these are not equivalent. 
> +(define_insn_and_split "*compare_cstore<mode>_insn" > + [(set (match_operand:GPI 0 "register_operand" "=r") > + (EQL:GPI (match_operand:GPI 1 "register_operand" "r") > + (match_operand:GPI 2 "aarch64_imm24" "n")))] > + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode) > + && !aarch64_plus_operand (operands[2], <MODE>mode) > + && !reload_completed" > + "#" > + "&& true" > + [(const_int 0)] > + { > + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff; > + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000; > + rtx tmp = gen_reg_rtx (<MODE>mode); > + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm))); > + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); > + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); > + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); > + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg)); > + DONE; > + } > + [(set_attr "type" "csel")] > +) > + > ;; zero_extend version of the above > (define_insn "*cstoresi_insn_uxtw" > [(set (match_operand:DI 0 "register_operand" "=r") > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index c2eb7de..422bc87 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -824,7 +824,8 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2") > (ltu "1") (leu "1") (geu "2") (gtu "2")]) > > (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT") > - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")]) > + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") > + (gtu "GTU")]) > > (define_code_attr fix_trunc_optab [(fix "fix_trunc") > (unsigned_fix "fixuns_trunc")]) > diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md > index e7f76e0..c0c3ff5 100644 > --- a/gcc/config/aarch64/predicates.md > +++ b/gcc/config/aarch64/predicates.md > @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3" > (and (match_code "const_int") > 
(match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4"))) > > +;; An immediate that fits into 24 bits. > +(define_predicate "aarch64_imm24" > + (and (match_code "const_int") > + (match_test "IN_RANGE (UINTVAL (op), 0, 0xffffff)"))) > + > (define_predicate "aarch64_pwr_imm3" > (and (match_code "const_int") > (match_test "INTVAL (op) != 0 > diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c > new file mode 100644 > index 0000000..7ad736b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-save-temps -O2" } */ > + > +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ > + > +void g (void); > +void > +foo (int x) > +{ > + if (x != 0x123456) > + g (); > +} > + > +void > +fool (long long x) > +{ > + if (x != 0x123456) > + g (); > +} > + > +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ > +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ > +/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */ > +/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */ > +/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */ > +/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c > new file mode 100644 > index 0000000..6a03cc9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-save-temps -O2" } */ > + > +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ > + > +int > +foo (int x) > +{ > + return x == 0x123456; > +} > + > +long > +fool (long x) > +{ > + return x == 0x123456; > +} > + This test will be broken for ILP32. This should be long long. OK with those comments fixed. 
Thanks, James
* Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset 2015-11-23 14:59 ` James Greenhalgh @ 2015-11-23 15:06 ` Kyrill Tkachov 2015-11-24 9:44 ` Kyrill Tkachov 0 siblings, 1 reply; 7+ messages in thread From: Kyrill Tkachov @ 2015-11-23 15:06 UTC (permalink / raw) To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw On 23/11/15 14:58, James Greenhalgh wrote: > On Mon, Nov 23, 2015 at 10:33:01AM +0000, Kyrill Tkachov wrote: >> On 12/11/15 12:05, James Greenhalgh wrote: >>> On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote: >>>> Hi all, >>>> >>>> Bootstrapped and tested on aarch64. >>>> >>>> Ok for trunk? >>> Comments in-line. >>> >> Here's an updated patch according to your comments. >> Sorry it took so long to respin it, had other things to deal with with >> stage1 closing... >> >> I've indented the sample code sequences and used valid mnemonics. >> These patterns can only match during combine, so I'd expect them to always >> split during combine or immediately after, but I don't think that's a documented >> guarantee so I've gated them on !reload_completed. >> >> I've used IN_RANGE in the predicate.md hunk and added scan-assembler checks >> in the tests. >> >> Is this ok? >> >> Thanks, >> Kyrill >> >> 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * config/aarch64/aarch64.md (*condjump): Rename to... >> (condjump): ... This. >> (*compare_condjump<mode>): New define_insn_and_split. >> (*compare_cstore<mode>_insn): Likewise. >> (*cstore<mode>_insn): Rename to... >> (cstore<mode>_insn): ... This. >> * config/aarch64/iterators.md (CMP): Handle ne code. >> * config/aarch64/predicates.md (aarch64_imm24): New predicate. >> >> 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * gcc.target/aarch64/cmpimm_branch_1.c: New test. >> * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. 
>> >> commit bb44feed4e6beaae25d9bdffa45073dc61c65838 >> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> Date: Mon Sep 21 10:56:47 2015 +0100 >> >> [AArch64] Improve comparison with complex immediates >> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >> index 11f6387..3e57d08 100644 >> --- a/gcc/config/aarch64/aarch64.md >> +++ b/gcc/config/aarch64/aarch64.md >> @@ -372,7 +372,7 @@ (define_expand "mod<mode>3" >> } >> ) >> >> -(define_insn "*condjump" >> +(define_insn "condjump" >> [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" >> [(match_operand 1 "cc_register" "") (const_int 0)]) >> (label_ref (match_operand 2 "" "")) >> @@ -397,6 +397,41 @@ (define_insn "*condjump" >> (const_int 1)))] >> ) >> >> +;; For a 24-bit immediate CST we can optimize the compare for equality >> +;; and branch sequence from: >> +;; mov x0, #imm1 >> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ >> +;; cmp x1, x0 >> +;; b<ne,eq> .Label >> +;; into the shorter: >> +;; sub x0, x0, #(CST & 0xfff000) >> +;; subs x0, x0, #(CST & 0x000fff) > sub x0, x1, #(CST....) ? > > The transform doesn't make sense otherwise. Doh, yes. The source should be x1 of course. 
Kyrill > >> +;; b<ne,eq> .Label >> +(define_insn_and_split "*compare_condjump<mode>" >> + [(set (pc) (if_then_else (EQL >> + (match_operand:GPI 0 "register_operand" "r") >> + (match_operand:GPI 1 "aarch64_imm24" "n")) >> + (label_ref:P (match_operand 2 "" "")) >> + (pc)))] >> + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode) >> + && !aarch64_plus_operand (operands[1], <MODE>mode) >> + && !reload_completed" >> + "#" >> + "&& true" >> + [(const_int 0)] >> + { >> + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; >> + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; >> + rtx tmp = gen_reg_rtx (<MODE>mode); >> + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm))); >> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); >> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); >> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); >> + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); >> + DONE; >> + } >> +) >> + >> (define_expand "casesi" >> [(match_operand:SI 0 "register_operand" "") ; Index >> (match_operand:SI 1 "const_int_operand" "") ; Lower bound >> @@ -2901,7 +2936,7 @@ (define_expand "cstore<mode>4" >> " >> ) >> >> -(define_insn "*cstore<mode>_insn" >> +(define_insn "aarch64_cstore<mode>" >> [(set (match_operand:ALLI 0 "register_operand" "=r") >> (match_operator:ALLI 1 "aarch64_comparison_operator" >> [(match_operand 2 "cc_register" "") (const_int 0)]))] >> @@ -2910,6 +2945,40 @@ (define_insn "*cstore<mode>_insn" >> [(set_attr "type" "csel")] >> ) >> >> +;; For a 24-bit immediate CST we can optimize the compare for equality >> +;; and branch sequence from: >> +;; mov x0, #imm1 >> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ >> +;; cmp x1, x0 >> +;; cset x2, <ne,eq> >> +;; into the shorter: >> +;; sub x0, x0, #(CST & 0xfff000) >> +;; subs x0, x0, #(CST & 0x000fff) >> +;; cset x1, <ne, eq>. > Please fix the register allocation in your shorter sequence, these > are not equivalent. 
> >> +(define_insn_and_split "*compare_cstore<mode>_insn" >> + [(set (match_operand:GPI 0 "register_operand" "=r") >> + (EQL:GPI (match_operand:GPI 1 "register_operand" "r") >> + (match_operand:GPI 2 "aarch64_imm24" "n")))] >> + "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode) >> + && !aarch64_plus_operand (operands[2], <MODE>mode) >> + && !reload_completed" >> + "#" >> + "&& true" >> + [(const_int 0)] >> + { >> + HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff; >> + HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000; >> + rtx tmp = gen_reg_rtx (<MODE>mode); >> + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm))); >> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); >> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); >> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); >> + emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg)); >> + DONE; >> + } >> + [(set_attr "type" "csel")] >> +) >> + >> ;; zero_extend version of the above >> (define_insn "*cstoresi_insn_uxtw" >> [(set (match_operand:DI 0 "register_operand" "=r") >> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md >> index c2eb7de..422bc87 100644 >> --- a/gcc/config/aarch64/iterators.md >> +++ b/gcc/config/aarch64/iterators.md >> @@ -824,7 +824,8 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2") >> (ltu "1") (leu "1") (geu "2") (gtu "2")]) >> >> (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT") >> - (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")]) >> + (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") >> + (gtu "GTU")]) >> >> (define_code_attr fix_trunc_optab [(fix "fix_trunc") >> (unsigned_fix "fixuns_trunc")]) >> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md >> index e7f76e0..c0c3ff5 100644 >> --- a/gcc/config/aarch64/predicates.md >> +++ b/gcc/config/aarch64/predicates.md >> @@ -145,6 +145,11 @@ (define_predicate 
"aarch64_imm3" >> (and (match_code "const_int") >> (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4"))) >> >> +;; An immediate that fits into 24 bits. >> +(define_predicate "aarch64_imm24" >> + (and (match_code "const_int") >> + (match_test "IN_RANGE (UINTVAL (op), 0, 0xffffff)"))) >> + >> (define_predicate "aarch64_pwr_imm3" >> (and (match_code "const_int") >> (match_test "INTVAL (op) != 0 >> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c >> new file mode 100644 >> index 0000000..7ad736b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c >> @@ -0,0 +1,26 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-save-temps -O2" } */ >> + >> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. */ >> + >> +void g (void); >> +void >> +foo (int x) >> +{ >> + if (x != 0x123456) >> + g (); >> +} >> + >> +void >> +fool (long long x) >> +{ >> + if (x != 0x123456) >> + g (); >> +} >> + >> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */ >> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */ >> +/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */ >> +/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */ >> +/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */ >> +/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c >> new file mode 100644 >> index 0000000..6a03cc9 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c >> @@ -0,0 +1,23 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-save-temps -O2" } */ >> + >> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp. 
*/ >> + >> +int >> +foo (int x) >> +{ >> + return x == 0x123456; >> +} >> + >> +long >> +fool (long x) >> +{ >> + return x == 0x123456; >> +} >> + > This test will be broken for ILP32. This should be long long. > > OK with those comments fixed. Thanks, I'll prepare an updated version. Kyrill > Thanks, > James >
* Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset 2015-11-23 15:06 ` Kyrill Tkachov @ 2015-11-24 9:44 ` Kyrill Tkachov 0 siblings, 0 replies; 7+ messages in thread From: Kyrill Tkachov @ 2015-11-24 9:44 UTC (permalink / raw) To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw [-- Attachment #1: Type: text/plain, Size: 10649 bytes --] On 23/11/15 15:01, Kyrill Tkachov wrote: > > On 23/11/15 14:58, James Greenhalgh wrote: >> On Mon, Nov 23, 2015 at 10:33:01AM +0000, Kyrill Tkachov wrote: >>> On 12/11/15 12:05, James Greenhalgh wrote: >>>> On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote: >>>>> Hi all, >>>>> >>>>> Bootstrapped and tested on aarch64. >>>>> >>>>> Ok for trunk? >>>> Comments in-line. >>>> >>> Here's an updated patch according to your comments. >>> Sorry it took so long to respin it, had other things to deal with with >>> stage1 closing... >>> >>> I've indented the sample code sequences and used valid mnemonics. >>> These patterns can only match during combine, so I'd expect them to always >>> split during combine or immediately after, but I don't think that's a documented >>> guarantee so I've gated them on !reload_completed. >>> >>> I've used IN_RANGE in the predicate.md hunk and added scan-assembler checks >>> in the tests. >>> >>> Is this ok? >>> >>> Thanks, >>> Kyrill >>> >>> 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >>> >>> * config/aarch64/aarch64.md (*condjump): Rename to... >>> (condjump): ... This. >>> (*compare_condjump<mode>): New define_insn_and_split. >>> (*compare_cstore<mode>_insn): Likewise. >>> (*cstore<mode>_insn): Rename to... >>> (cstore<mode>_insn): ... This. >>> * config/aarch64/iterators.md (CMP): Handle ne code. >>> * config/aarch64/predicates.md (aarch64_imm24): New predicate. >>> >>> 2015-11-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >>> >>> * gcc.target/aarch64/cmpimm_branch_1.c: New test. 
>>> * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. >>> >>> commit bb44feed4e6beaae25d9bdffa45073dc61c65838 >>> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> >>> Date: Mon Sep 21 10:56:47 2015 +0100 >>> >>> [AArch64] Improve comparison with complex immediates >>> >>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >>> index 11f6387..3e57d08 100644 >>> --- a/gcc/config/aarch64/aarch64.md >>> +++ b/gcc/config/aarch64/aarch64.md >>> @@ -372,7 +372,7 @@ (define_expand "mod<mode>3" >>> } >>> ) >>> -(define_insn "*condjump" >>> +(define_insn "condjump" >>> [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" >>> [(match_operand 1 "cc_register" "") (const_int 0)]) >>> (label_ref (match_operand 2 "" "")) >>> @@ -397,6 +397,41 @@ (define_insn "*condjump" >>> (const_int 1)))] >>> ) >>> +;; For a 24-bit immediate CST we can optimize the compare for equality >>> +;; and branch sequence from: >>> +;; mov x0, #imm1 >>> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ >>> +;; cmp x1, x0 >>> +;; b<ne,eq> .Label >>> +;; into the shorter: >>> +;; sub x0, x0, #(CST & 0xfff000) >>> +;; subs x0, x0, #(CST & 0x000fff) >> sub x0, x1, #(CST....) ? >> >> The transform doesn't make sense otherwise. > > Doh, yes. The source should be x1 of course. > Here's what I'll be committing. Thanks, Kyrill 2015-11-24 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/aarch64/aarch64.md (*condjump): Rename to... (condjump): ... This. (*compare_condjump<mode>): New define_insn_and_split. (*compare_cstore<mode>_insn): Likewise. (*cstore<mode>_insn): Rename to... (cstore<mode>_insn): ... This. * config/aarch64/iterators.md (CMP): Handle ne code. * config/aarch64/predicates.md (aarch64_imm24): New predicate. 2015-11-24 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * gcc.target/aarch64/cmpimm_branch_1.c: New test. * gcc.target/aarch64/cmpimm_cset_1.c: Likewise. 
> Kyrill > >> >>> +;; b<ne,eq> .Label >>> +(define_insn_and_split "*compare_condjump<mode>" >>> + [(set (pc) (if_then_else (EQL >>> + (match_operand:GPI 0 "register_operand" "r") >>> + (match_operand:GPI 1 "aarch64_imm24" "n")) >>> + (label_ref:P (match_operand 2 "" "")) >>> + (pc)))] >>> + "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode) >>> + && !aarch64_plus_operand (operands[1], <MODE>mode) >>> + && !reload_completed" >>> + "#" >>> + "&& true" >>> + [(const_int 0)] >>> + { >>> + HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; >>> + HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; >>> + rtx tmp = gen_reg_rtx (<MODE>mode); >>> + emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm))); >>> + emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); >>> + rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); >>> + rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx); >>> + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); >>> + DONE; >>> + } >>> +) >>> + >>> (define_expand "casesi" >>> [(match_operand:SI 0 "register_operand" "") ; Index >>> (match_operand:SI 1 "const_int_operand" "") ; Lower bound >>> @@ -2901,7 +2936,7 @@ (define_expand "cstore<mode>4" >>> " >>> ) >>> -(define_insn "*cstore<mode>_insn" >>> +(define_insn "aarch64_cstore<mode>" >>> [(set (match_operand:ALLI 0 "register_operand" "=r") >>> (match_operator:ALLI 1 "aarch64_comparison_operator" >>> [(match_operand 2 "cc_register" "") (const_int 0)]))] >>> @@ -2910,6 +2945,40 @@ (define_insn "*cstore<mode>_insn" >>> [(set_attr "type" "csel")] >>> ) >>> +;; For a 24-bit immediate CST we can optimize the compare for equality >>> +;; and branch sequence from: >>> +;; mov x0, #imm1 >>> +;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ >>> +;; cmp x1, x0 >>> +;; cset x2, <ne,eq> >>> +;; into the shorter: >>> +;; sub x0, x0, #(CST & 0xfff000) >>> +;; subs x0, x0, #(CST & 0x000fff) >>> +;; cset x1, <ne, eq>. 
>> Please fix the register allocation in your shorter sequence, these
>> are not equivalent.
>>
>>> +(define_insn_and_split "*compare_cstore<mode>_insn"
>>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>>> +	(EQL:GPI (match_operand:GPI 1 "register_operand" "r")
>>> +		 (match_operand:GPI 2 "aarch64_imm24" "n")))]
>>> +  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
>>> +   && !aarch64_plus_operand (operands[2], <MODE>mode)
>>> +   && !reload_completed"
>>> +  "#"
>>> +  "&& true"
>>> +  [(const_int 0)]
>>> +  {
>>> +    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
>>> +    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
>>> +    rtx tmp = gen_reg_rtx (<MODE>mode);
>>> +    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
>>> +    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
>>> +    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
>>> +    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
>>> +    emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg));
>>> +    DONE;
>>> +  }
>>> +  [(set_attr "type" "csel")]
>>> +)
>>> +
>>>  ;; zero_extend version of the above
>>>  (define_insn "*cstoresi_insn_uxtw"
>>>    [(set (match_operand:DI 0 "register_operand" "=r")
>>> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
>>> index c2eb7de..422bc87 100644
>>> --- a/gcc/config/aarch64/iterators.md
>>> +++ b/gcc/config/aarch64/iterators.md
>>> @@ -824,7 +824,8 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
>>>  			  (ltu "1") (leu "1") (geu "2") (gtu "2")])
>>>  (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
>>> -		       (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
>>> +		       (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU")
>>> +		       (gtu "GTU")])
>>>  (define_code_attr fix_trunc_optab [(fix "fix_trunc")
>>> 				   (unsigned_fix "fixuns_trunc")])
>>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>>> index e7f76e0..c0c3ff5 100644
>>> --- a/gcc/config/aarch64/predicates.md
>>> +++ b/gcc/config/aarch64/predicates.md
>>> @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3"
>>>    (and (match_code "const_int")
>>>         (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
>>> +;; An immediate that fits into 24 bits.
>>> +(define_predicate "aarch64_imm24"
>>> +  (and (match_code "const_int")
>>> +       (match_test "IN_RANGE (UINTVAL (op), 0, 0xffffff)")))
>>> +
>>>  (define_predicate "aarch64_pwr_imm3"
>>>    (and (match_code "const_int")
>>>         (match_test "INTVAL (op) != 0
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
>>> new file mode 100644
>>> index 0000000..7ad736b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
>>> @@ -0,0 +1,26 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-save-temps -O2" } */
>>> +
>>> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
>>> +
>>> +void g (void);
>>> +void
>>> +foo (int x)
>>> +{
>>> +  if (x != 0x123456)
>>> +    g ();
>>> +}
>>> +
>>> +void
>>> +fool (long long x)
>>> +{
>>> +  if (x != 0x123456)
>>> +    g ();
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
>>> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
>>> +/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */
>>> +/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */
>>> +/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */
>>> +/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
>>> new file mode 100644
>>> index 0000000..6a03cc9
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
>>> @@ -0,0 +1,23 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-save-temps -O2" } */
>>> +
>>> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
>>> +
>>> +int
>>> +foo (int x)
>>> +{
>>> +  return x == 0x123456;
>>> +}
>>> +
>>> +long
>>> +fool (long x)
>>> +{
>>> +  return x == 0x123456;
>>> +}
>>> +
>> This test will be broken for ILP32.  This should be long long.
>>
>> OK with those comments fixed.
>
> Thanks, I'll prepare an updated version.
>
> Kyrill
>
>> Thanks,
>> James
>>
>

[-- Attachment #2: aarch64-compare-imm.patch --]
[-- Type: text/x-patch; name=aarch64-compare-imm.patch, Size: 7235 bytes --]

commit 30cc3774824ba7fd372111a223ade075ad7c49cc
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Mon Sep 21 10:56:47 2015 +0100

    [AArch64] Improve comparison with complex immediates

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 11f6387..3283cb2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -372,7 +372,7 @@ (define_expand "mod<mode>3"
   }
 )
 
-(define_insn "*condjump"
+(define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
			     [(match_operand 1 "cc_register" "") (const_int 0)])
			    (label_ref (match_operand 2 "" ""))
@@ -397,6 +397,41 @@ (define_insn "*condjump"
		       (const_int 1)))]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; 	mov	x0, #imm1
+;; 	movk	x0, #imm2, lsl 16	/* x0 contains CST.  */
+;; 	cmp	x1, x0
+;; 	b<ne,eq> .Label
+;; into the shorter:
+;; 	sub	x0, x1, #(CST & 0xfff000)
+;; 	subs	x0, x0, #(CST & 0x000fff)
+;; 	b<ne,eq> .Label
+(define_insn_and_split "*compare_condjump<mode>"
+  [(set (pc) (if_then_else (EQL
+			     (match_operand:GPI 0 "register_operand" "r")
+			     (match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:P (match_operand 2 "" ""))
+			   (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
+   && !aarch64_plus_operand (operands[1], <MODE>mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+    DONE;
+  }
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "register_operand" "")	; Index
    (match_operand:SI 1 "const_int_operand" "")	; Lower bound
@@ -2901,7 +2936,7 @@ (define_expand "cstore<mode>4"
   "
 )
 
-(define_insn "*cstore<mode>_insn"
+(define_insn "aarch64_cstore<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
	 (match_operator:ALLI 1 "aarch64_comparison_operator"
	  [(match_operand 2 "cc_register" "") (const_int 0)]))]
@@ -2910,6 +2945,40 @@ (define_insn "*cstore<mode>_insn"
   [(set_attr "type" "csel")]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; 	mov	x0, #imm1
+;; 	movk	x0, #imm2, lsl 16	/* x0 contains CST.  */
+;; 	cmp	x1, x0
+;; 	cset	x2, <ne,eq>
+;; into the shorter:
+;; 	sub	x0, x1, #(CST & 0xfff000)
+;; 	subs	x0, x0, #(CST & 0x000fff)
+;; 	cset	x2, <ne,eq>.
+(define_insn_and_split "*compare_cstore<mode>_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(EQL:GPI (match_operand:GPI 1 "register_operand" "r")
+		 (match_operand:GPI 2 "aarch64_imm24" "n")))]
+  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
+   && !aarch64_plus_operand (operands[2], <MODE>mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg));
+    DONE;
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; zero_extend version of the above
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c2eb7de..422bc87 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -824,7 +824,8 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
 			  (ltu "1") (leu "1") (geu "2") (gtu "2")])
 (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
-		       (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
+		       (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU")
+		       (gtu "GTU")])
 (define_code_attr fix_trunc_optab [(fix "fix_trunc")
				    (unsigned_fix "fixuns_trunc")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index e7f76e0..c0c3ff5 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
 
+;; An immediate that fits into 24 bits.
+(define_predicate "aarch64_imm24"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (UINTVAL (op), 0, 0xffffff)")))
+
 (define_predicate "aarch64_pwr_imm3"
   (and (match_code "const_int")
        (match_test "INTVAL (op) != 0
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
new file mode 100644
index 0000000..7ad736b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+void g (void);
+void
+foo (int x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+void
+fool (long long x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
new file mode 100644
index 0000000..f6fd69f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+int
+foo (int x)
+{
+  return x == 0x123456;
+}
+
+long long
+fool (long long x)
+{
+  return x == 0x123456;
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */
end of thread, other threads: [~2015-11-24 9:42 UTC | newest]

Thread overview: 7+ messages; links below jump to the message on this page:
2015-11-03 15:43 [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset Kyrill Tkachov
2015-11-11 10:43 ` Kyrill Tkachov
2015-11-12 12:05 ` James Greenhalgh
2015-11-23 10:36   ` Kyrill Tkachov
2015-11-23 14:59     ` James Greenhalgh
2015-11-23 15:06       ` Kyrill Tkachov
2015-11-24  9:44         ` Kyrill Tkachov