From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 102145 invoked by alias); 8 Sep 2015 08:33:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 102089 invoked by uid 89); 8 Sep 2015 08:33:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_50,SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 08 Sep 2015 08:33:46 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-29-grNxvAi8TwCQ1tSXLnNPFg-1; Tue, 08 Sep 2015 09:33:42 +0100 Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 8 Sep 2015 09:33:41 +0100 Message-ID: <55EE9D65.3040800@arm.com> Date: Tue, 08 Sep 2015 08:35:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Ramana Radhakrishnan , GCC Patches CC: Richard Earnshaw , Marcus Shawcroft , James Greenhalgh Subject: Re: [PATCH][ARM][3/3] Expand mod by power of 2 References: <55B219AE.6010102@arm.com> <55ED5864.9080607@foss.arm.com> In-Reply-To: <55ED5864.9080607@foss.arm.com> X-MC-Unique: grNxvAi8TwCQ1tSXLnNPFg-1 Content-Type: multipart/mixed; boundary="------------060301090703080707010704" X-IsSubscribed: yes X-SW-Source: 2015-09/txt/msg00475.txt.bz2 This is a multi-part message in MIME format. --------------060301090703080707010704 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Content-length: 8687 On 07/09/15 10:27, Ramana Radhakrishnan wrote: > > On 24/07/15 11:55, Kyrill Tkachov wrote: >> commit d562629e36ba013b8f77956a74139330d191bc30 >> Author: Kyrylo Tkachov >> Date: Fri Jul 17 16:30:01 2015 +0100 >> >> [ARM][3/3] Expand mod by power of 2 >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index e1bc727..6ade07c 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -9556,6 +9556,22 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enu= m rtx_code outer_code, >>=20=20=20 >> case MOD: >> case UMOD: >> + /* MOD by a power of 2 can be expanded as: >> + rsbs r1, r0, #0 >> + and r0, r0, #(n - 1) >> + and r1, r1, #(n - 1) >> + rsbpl r0, r1, #0. */ >> + if (code =3D=3D MOD >> + && CONST_INT_P (XEXP (x, 1)) >> + && exact_log2 (INTVAL (XEXP (x, 1))) > 0 >> + && mode =3D=3D SImode) >> + { >> + *cost +=3D COSTS_N_INSNS (3) >> + + 2 * extra_cost->alu.logical >> + + extra_cost->alu.arith; >> + return true; >> + } >> + >> *cost =3D LIBCALL_COST (2); >> return false; /* All arguments must be in registers. */ >>=20=20=20 >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index f341109..8301648 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -1229,7 +1229,7 @@ (define_peephole2 >> "" >> ) >>=20=20=20 >> -(define_insn "*subsi3_compare0" >> +(define_insn "subsi3_compare0" >> [(set (reg:CC_NOOV CC_REGNUM) >> (compare:CC_NOOV >> (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,r,I") >> @@ -2158,7 +2158,7 @@ (define_expand "andsi3" >> ) >>=20=20=20 >> ; ??? Check split length for Thumb-2 >> -(define_insn_and_split "*arm_andsi3_insn" >> +(define_insn_and_split "arm_andsi3_insn" >> [(set (match_operand:SI 0 "s_register_operand" "=3Dr,l,r,r,r= ") >> (and:SI (match_operand:SI 1 "s_register_operand" "%r,0,r,r,r") >> (match_operand:SI 2 "reg_or_int_operand" "I,l,K,r,?n")))] >> @@ -11105,6 +11105,78 @@ (define_expand "thumb_legacy_rev" >> "" >> ) > This shouldn't be necessary - you are just adding another interface to pr= oduce an and insn. I see what you mean. Ok, I'll drop this. > >>=20=20=20 >> +;; ARM-specific expansion of signed mod by power of 2 >> +;; using conditional negate. >> +;; For r0 % n where n is a power of 2 produce: >> +;; rsbs r1, r0, #0 >> +;; and r0, r0, #(n - 1) >> +;; and r1, r1, #(n - 1) >> +;; rsbpl r0, r1, #0 >> + >> +(define_expand "modsi3" >> + [(match_operand:SI 0 "register_operand" "") >> + (match_operand:SI 1 "register_operand" "") >> + (match_operand:SI 2 "const_int_operand" "")] >> + "TARGET_32BIT" >> + { >> + HOST_WIDE_INT val =3D INTVAL (operands[2]); >> + >> + if (val <=3D 0 >> + || exact_log2 (INTVAL (operands[2])) <=3D 0 >> + || !const_ok_for_arm (INTVAL (operands[2]) - 1)) >> + FAIL; >> + >> + rtx mask =3D GEN_INT (val - 1); >> + >> + /* In the special case of x0 % 2 we can do the even shorter: >> + cmp r0, #0 >> + and r0, r0, #1 >> + rsblt r0, r0, #0. */ >> + >> + if (val =3D=3D 2) >> + { >> + rtx cc_reg =3D gen_rtx_REG (CCmode, CC_REGNUM); >> + rtx cond =3D gen_rtx_LT (SImode, cc_reg, const0_rtx); >> + >> + emit_insn (gen_rtx_SET (cc_reg, >> + gen_rtx_COMPARE (CCmode, operands[1], const0_rtx))); >> + >> + rtx masked =3D gen_reg_rtx (SImode); >> + emit_insn (gen_arm_andsi3_insn (masked, operands[1], mask)); > Use emit_insn (gen_andsi3 (masked, operands[1], mask) instead and likewis= e below. Ok, done that. A side effect of this is that since the andsi3 expander hand= les any reg_or_int we can catch more masks this way. Also, due to the way the a= ndsi3 expander is written, for the mask 255 it will generate a zero_extend instead of an a= nd. This may or may not be optimal on some cores but perhaps we should look at = the andsi3 expander to make it more robust as a separate task. > > >> + emit_move_insn (operands[0], >> + gen_rtx_IF_THEN_ELSE (SImode, cond, >> + gen_rtx_NEG (SImode, >> + masked), >> + masked)); >> + DONE; >> + } >> + >> + rtx neg_op =3D gen_reg_rtx (SImode); >> + rtx_insn *insn =3D emit_insn (gen_subsi3_compare0 (neg_op, const0_r= tx, >> + operands[1])); >> + >> + /* Extract the condition register and mode. */ >> + rtx cmp =3D XVECEXP (PATTERN (insn), 0, 0); >> + rtx cc_reg =3D SET_DEST (cmp); >> + rtx cond =3D gen_rtx_GE (SImode, cc_reg, const0_rtx); >> + >> + emit_insn (gen_arm_andsi3_insn (operands[0], operands[1], mask)); >> + >> + rtx masked_neg =3D gen_reg_rtx (SImode); >> + emit_insn (gen_arm_andsi3_insn (masked_neg, neg_op, mask)); >> + >> + /* We want a conditional negate here, but emitting COND_EXEC rtxes >> + during expand does not always work. Do an IF_THEN_ELSE instead.= */ >> + emit_move_insn (operands[0], >> + gen_rtx_IF_THEN_ELSE (SImode, cond, >> + gen_rtx_NEG (SImode, masked_neg), >> + operands[0])); >> + >> + >> + DONE; >> + } >> +) >> + >> (define_expand "bswapsi2" >> [(set (match_operand:SI 0 "s_register_operand" "=3Dr") >> (bswap:SI (match_operand:SI 1 "s_register_operand" "r")))] >> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.c b/gcc/testsuite/gc= c.target/aarch64/mod_2.c >> new file mode 100644 >> index 0000000..2645c18 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/mod_2.c >> @@ -0,0 +1,7 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ >> + >> +#include "mod_2.x" >> + >> +/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */ >> +/* { dg-final { scan-assembler-times "and\t\[wx\]\[0-9\]*" 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.x b/gcc/testsuite/gc= c.target/aarch64/mod_2.x >> new file mode 100644 >> index 0000000..2b079a4 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/mod_2.x >> @@ -0,0 +1,5 @@ >> +int >> +f (int x) >> +{ >> + return x % 2; >> +} >> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.c b/gcc/testsuite/= gcc.target/aarch64/mod_256.c >> new file mode 100644 >> index 0000000..567332c >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/mod_256.c >> @@ -0,0 +1,6 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ >> + >> +#include "mod_256.x" >> + >> +/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.x b/gcc/testsuite/= gcc.target/aarch64/mod_256.x >> new file mode 100644 >> index 0000000..c1de42c >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/mod_256.x >> @@ -0,0 +1,5 @@ >> +int >> +f (int x) >> +{ >> + return x % 256; >> +} >> diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c b/gcc/testsuite/gcc.ta= rget/arm/mod_2.c >> new file mode 100644 >> index 0000000..93017a1 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/arm/mod_2.c >> @@ -0,0 +1,8 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target arm32 } */ >> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ >> + >> +#include "../aarch64/mod_2.x" >> + >> +/* { dg-final { scan-assembler "rsblt\tr\[0-9\]*" } } */ >> +/* { dg-final { scan-assembler-times "and\tr\[0-9\].*1" 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c b/gcc/testsuite/gcc.= target/arm/mod_256.c >> new file mode 100644 >> index 0000000..92ab05a >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/arm/mod_256.c >> @@ -0,0 +1,8 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target arm32 } */ >> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ >> + >> +#include "../aarch64/mod_256.x" >> + >> +/* { dg-final { scan-assembler "rsbpl\tr\[0-9\]*" } } */ >> +/* { dg-final { scan-assembler "and\tr\[0-9\].*255" } } */ > > OK with those changes if no regressions. Thanks, I'm attaching the updated patch. I'll commit it when the aarch64 one is ok'd as the tests are partly shared = with this patch. Kyrill 2015-09-08 Kyrylo Tkachov * config/arm/arm.md (*subsi3_compare0): Rename to... (subsi3_compare0): ... This. (modsi3): New define_expand. * config/arm/arm.c (arm_new_rtx_costs, MOD case): Handle case when operand is power of 2. 2015-09-08 Kyrylo Tkachov * gcc.target/aarch64/mod_2.x: New file. * gcc.target/aarch64/mod_256.x: Likewise. * gcc.target/arm/mod_2.c: New test. * gcc.target/arm/mod_256.c: Likewise. * gcc.target/aarch64/mod_2.c: Likewise. * gcc.target/aarch64/mod_256.c: Likewise. > Ramana > --------------060301090703080707010704 Content-Type: text/x-patch; name=arm-mod-2.patch Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="arm-mod-2.patch" Content-length: 5965 commit 9a205ad46fe96e35b0dcbdbfcd89e2a2933a7cbd Author: Kyrylo Tkachov Date: Fri Jul 17 16:30:01 2015 +0100 [ARM][3/3] Expand mod by power of 2 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fa4e083..a81114d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9580,6 +9580,24 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum r= tx_code outer_code, return false; /* All arguments must be in registers. */ =20 case MOD: + /* MOD by a power of 2 can be expanded as: + rsbs r1, r0, #0 + and r0, r0, #(n - 1) + and r1, r1, #(n - 1) + rsbpl r0, r1, #0. */ + if (CONST_INT_P (XEXP (x, 1)) + && exact_log2 (INTVAL (XEXP (x, 1))) > 0 + && mode =3D=3D SImode) + { + *cost +=3D COSTS_N_INSNS (3); + + if (speed_p) + *cost +=3D 2 * extra_cost->alu.logical + + extra_cost->alu.arith; + return true; + } + + /* Fall-through. */ case UMOD: *cost =3D LIBCALL_COST (2); return false; /* All arguments must be in registers. */ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index b6c2047..775ca25 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1229,7 +1229,7 @@ (define_peephole2 "" ) =20 -(define_insn "*subsi3_compare0" +(define_insn "subsi3_compare0" [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,r,I") @@ -11142,6 +11142,75 @@ (define_expand "thumb_legacy_rev" "" ) =20 +;; ARM-specific expansion of signed mod by power of 2 +;; using conditional negate. +;; For r0 % n where n is a power of 2 produce: +;; rsbs r1, r0, #0 +;; and r0, r0, #(n - 1) +;; and r1, r1, #(n - 1) +;; rsbpl r0, r1, #0 + +(define_expand "modsi3" + [(match_operand:SI 0 "register_operand" "") + (match_operand:SI 1 "register_operand" "") + (match_operand:SI 2 "const_int_operand" "")] + "TARGET_32BIT" + { + HOST_WIDE_INT val =3D INTVAL (operands[2]); + + if (val <=3D 0 + || exact_log2 (val) <=3D 0) + FAIL; + + rtx mask =3D GEN_INT (val - 1); + + /* In the special case of x0 % 2 we can do the even shorter: + cmp r0, #0 + and r0, r0, #1 + rsblt r0, r0, #0. */ + + if (val =3D=3D 2) + { + rtx cc_reg =3D arm_gen_compare_reg (LT, + operands[1], const0_rtx, NULL_RTX); + rtx cond =3D gen_rtx_LT (SImode, cc_reg, const0_rtx); + rtx masked =3D gen_reg_rtx (SImode); + + emit_insn (gen_andsi3 (masked, operands[1], mask)); + emit_move_insn (operands[0], + gen_rtx_IF_THEN_ELSE (SImode, cond, + gen_rtx_NEG (SImode, + masked), + masked)); + DONE; + } + + rtx neg_op =3D gen_reg_rtx (SImode); + rtx_insn *insn =3D emit_insn (gen_subsi3_compare0 (neg_op, const0_rtx, + operands[1])); + + /* Extract the condition register and mode. */ + rtx cmp =3D XVECEXP (PATTERN (insn), 0, 0); + rtx cc_reg =3D SET_DEST (cmp); + rtx cond =3D gen_rtx_GE (SImode, cc_reg, const0_rtx); + + emit_insn (gen_andsi3 (operands[0], operands[1], mask)); + + rtx masked_neg =3D gen_reg_rtx (SImode); + emit_insn (gen_andsi3 (masked_neg, neg_op, mask)); + + /* We want a conditional negate here, but emitting COND_EXEC rtxes + during expand does not always work. Do an IF_THEN_ELSE instead. */ + emit_move_insn (operands[0], + gen_rtx_IF_THEN_ELSE (SImode, cond, + gen_rtx_NEG (SImode, masked_neg), + operands[0])); + + + DONE; + } +) + (define_expand "bswapsi2" [(set (match_operand:SI 0 "s_register_operand" "=3Dr") (bswap:SI (match_operand:SI 1 "s_register_operand" "r")))] diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.c b/gcc/testsuite/gcc.t= arget/aarch64/mod_2.c new file mode 100644 index 0000000..2645c18 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/mod_2.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ + +#include "mod_2.x" + +/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */ +/* { dg-final { scan-assembler-times "and\t\[wx\]\[0-9\]*" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.x b/gcc/testsuite/gcc.t= arget/aarch64/mod_2.x new file mode 100644 index 0000000..2b079a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/mod_2.x @@ -0,0 +1,5 @@ +int +f (int x) +{ + return x % 2; +} diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.c b/gcc/testsuite/gcc= .target/aarch64/mod_256.c new file mode 100644 index 0000000..567332c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/mod_256.c @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ + +#include "mod_256.x" + +/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.x b/gcc/testsuite/gcc= .target/aarch64/mod_256.x new file mode 100644 index 0000000..c1de42c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/mod_256.x @@ -0,0 +1,5 @@ +int +f (int x) +{ + return x % 256; +} diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c b/gcc/testsuite/gcc.targe= t/arm/mod_2.c new file mode 100644 index 0000000..93017a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mod_2.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ + +#include "../aarch64/mod_2.x" + +/* { dg-final { scan-assembler "rsblt\tr\[0-9\]*" } } */ +/* { dg-final { scan-assembler-times "and\tr\[0-9\].*1" 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c b/gcc/testsuite/gcc.tar= get/arm/mod_256.c new file mode 100644 index 0000000..ccb7f3c --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mod_256.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */ + +#include "../aarch64/mod_256.x" + +/* { dg-final { scan-assembler "rsbpl\tr\[0-9\]*" } } */ + --------------060301090703080707010704--