From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-406850-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 102145 invoked by alias); 8 Sep 2015 08:33:50 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 102089 invoked by uid 89); 8 Sep 2015 08:33:49 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_50,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 08 Sep 2015 08:33:46 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-29-grNxvAi8TwCQ1tSXLnNPFg-1; Tue, 08 Sep 2015 09:33:42 +0100
Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Tue, 8 Sep 2015 09:33:41 +0100
Message-ID: <55EE9D65.3040800@arm.com>
Date: Tue, 08 Sep 2015 08:35:00 -0000
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Ramana Radhakrishnan <ramana.radhakrishnan@foss.arm.com>,  GCC Patches <gcc-patches@gcc.gnu.org>
CC: Richard Earnshaw <Richard.Earnshaw@arm.com>,  Marcus Shawcroft <Marcus.Shawcroft@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>
Subject: Re: [PATCH][ARM][3/3] Expand mod by power of 2
References: <55B219AE.6010102@arm.com> <55ED5864.9080607@foss.arm.com>
In-Reply-To: <55ED5864.9080607@foss.arm.com>
X-MC-Unique: grNxvAi8TwCQ1tSXLnNPFg-1
Content-Type: multipart/mixed; boundary="------------060301090703080707010704"
X-IsSubscribed: yes
X-SW-Source: 2015-09/txt/msg00475.txt.bz2

This is a multi-part message in MIME format.
--------------060301090703080707010704
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-length: 8687


On 07/09/15 10:27, Ramana Radhakrishnan wrote:
>
> On 24/07/15 11:55, Kyrill Tkachov wrote:
>> commit d562629e36ba013b8f77956a74139330d191bc30
>> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>> Date:   Fri Jul 17 16:30:01 2015 +0100
>>
>>      [ARM][3/3] Expand mod by power of 2
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index e1bc727..6ade07c 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -9556,6 +9556,22 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enu=
m rtx_code outer_code,
>>=20=20=20
>>       case MOD:
>>       case UMOD:
>> +      /* MOD by a power of 2 can be expanded as:
>> +	 rsbs    r1, r0, #0
>> +	 and     r0, r0, #(n - 1)
>> +	 and     r1, r1, #(n - 1)
>> +	 rsbpl   r0, r1, #0.  */
>> +      if (code =3D=3D MOD
>> +	  && CONST_INT_P (XEXP (x, 1))
>> +	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
>> +	  && mode =3D=3D SImode)
>> +	{
>> +	  *cost +=3D COSTS_N_INSNS (3)
>> +		   + 2 * extra_cost->alu.logical
>> +		   + extra_cost->alu.arith;
>> +	  return true;
>> +	}
>> +
>>         *cost =3D LIBCALL_COST (2);
>>         return false;	/* All arguments must be in registers.  */
>>=20=20=20
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index f341109..8301648 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -1229,7 +1229,7 @@ (define_peephole2
>>     ""
>>   )
>>=20=20=20
>> -(define_insn "*subsi3_compare0"
>> +(define_insn "subsi3_compare0"
>>     [(set (reg:CC_NOOV CC_REGNUM)
>>   	(compare:CC_NOOV
>>   	 (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,r,I")
>> @@ -2158,7 +2158,7 @@ (define_expand "andsi3"
>>   )
>>=20=20=20
>>   ; ??? Check split length for Thumb-2
>> -(define_insn_and_split "*arm_andsi3_insn"
>> +(define_insn_and_split "arm_andsi3_insn"
>>     [(set (match_operand:SI         0 "s_register_operand" "=3Dr,l,r,r,r=
")
>>   	(and:SI (match_operand:SI 1 "s_register_operand" "%r,0,r,r,r")
>>   		(match_operand:SI 2 "reg_or_int_operand" "I,l,K,r,?n")))]
>> @@ -11105,6 +11105,78 @@ (define_expand "thumb_legacy_rev"
>>     ""
>>   )
> This shouldn't be necessary - you are just adding another interface to pr=
oduce an and insn.

I see what you mean. Ok, I'll drop this.

>
>>=20=20=20
>> +;; ARM-specific expansion of signed mod by power of 2
>> +;; using conditional negate.
>> +;; For r0 % n where n is a power of 2 produce:
>> +;; rsbs    r1, r0, #0
>> +;; and     r0, r0, #(n - 1)
>> +;; and     r1, r1, #(n - 1)
>> +;; rsbpl   r0, r1, #0
>> +
>> +(define_expand "modsi3"
>> +  [(match_operand:SI 0 "register_operand" "")
>> +   (match_operand:SI 1 "register_operand" "")
>> +   (match_operand:SI 2 "const_int_operand" "")]
>> +  "TARGET_32BIT"
>> +  {
>> +    HOST_WIDE_INT val =3D INTVAL (operands[2]);
>> +
>> +    if (val <=3D 0
>> +       || exact_log2 (INTVAL (operands[2])) <=3D 0
>> +       || !const_ok_for_arm (INTVAL (operands[2]) - 1))
>> +      FAIL;
>> +
>> +    rtx mask =3D GEN_INT (val - 1);
>> +
>> +    /* In the special case of x0 % 2 we can do the even shorter:
>> +	cmp     r0, #0
>> +	and     r0, r0, #1
>> +	rsblt   r0, r0, #0.  */
>> +
>> +    if (val =3D=3D 2)
>> +      {
>> +	rtx cc_reg =3D gen_rtx_REG (CCmode, CC_REGNUM);
>> +	rtx cond =3D gen_rtx_LT (SImode, cc_reg, const0_rtx);
>> +
>> +	emit_insn (gen_rtx_SET (cc_reg,
>> +			gen_rtx_COMPARE (CCmode, operands[1], const0_rtx)));
>> +
>> +	rtx masked =3D gen_reg_rtx (SImode);
>> +	emit_insn (gen_arm_andsi3_insn (masked, operands[1], mask));
> Use emit_insn (gen_andsi3 (masked, operands[1], mask) instead and likewis=
e below.

Ok, done that. A side effect of this is that since the andsi3 expander hand=
les
any reg_or_int we can catch more masks this way. Also, due to the way the a=
ndsi3 expander
is written, for the mask 255 it will generate a zero_extend instead of an a=
nd.
This may or may not be optimal on some cores but perhaps we should look at =
the andsi3 expander
to make it more robust as a separate task.

>
>
>> +	emit_move_insn (operands[0],
>> +			gen_rtx_IF_THEN_ELSE (SImode, cond,
>> +					      gen_rtx_NEG (SImode,
>> +							   masked),
>> +					      masked));
>> +	DONE;
>> +      }
>> +
>> +    rtx neg_op =3D gen_reg_rtx (SImode);
>> +    rtx_insn *insn =3D emit_insn (gen_subsi3_compare0 (neg_op, const0_r=
tx,
>> +						      operands[1]));
>> +
>> +    /* Extract the condition register and mode.  */
>> +    rtx cmp =3D XVECEXP (PATTERN (insn), 0, 0);
>> +    rtx cc_reg =3D SET_DEST (cmp);
>> +    rtx cond =3D gen_rtx_GE (SImode, cc_reg, const0_rtx);
>> +
>> +    emit_insn (gen_arm_andsi3_insn (operands[0], operands[1], mask));
>> +
>> +    rtx masked_neg =3D gen_reg_rtx (SImode);
>> +    emit_insn (gen_arm_andsi3_insn (masked_neg, neg_op, mask));
>> +
>> +    /* We want a conditional negate here, but emitting COND_EXEC rtxes
>> +       during expand does not always work.  Do an IF_THEN_ELSE instead.=
  */
>> +    emit_move_insn (operands[0],
>> +		    gen_rtx_IF_THEN_ELSE (SImode, cond,
>> +					  gen_rtx_NEG (SImode, masked_neg),
>> +					  operands[0]));
>> +
>> +
>> +    DONE;
>> +  }
>> +)
>> +
>>   (define_expand "bswapsi2"
>>     [(set (match_operand:SI 0 "s_register_operand" "=3Dr")
>>     	(bswap:SI (match_operand:SI 1 "s_register_operand" "r")))]
>> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.c b/gcc/testsuite/gc=
c.target/aarch64/mod_2.c
>> new file mode 100644
>> index 0000000..2645c18
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/mod_2.c
>> @@ -0,0 +1,7 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
>> +
>> +#include "mod_2.x"
>> +
>> +/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */
>> +/* { dg-final { scan-assembler-times "and\t\[wx\]\[0-9\]*" 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.x b/gcc/testsuite/gc=
c.target/aarch64/mod_2.x
>> new file mode 100644
>> index 0000000..2b079a4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/mod_2.x
>> @@ -0,0 +1,5 @@
>> +int
>> +f (int x)
>> +{
>> +  return x % 2;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.c b/gcc/testsuite/=
gcc.target/aarch64/mod_256.c
>> new file mode 100644
>> index 0000000..567332c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/mod_256.c
>> @@ -0,0 +1,6 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
>> +
>> +#include "mod_256.x"
>> +
>> +/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.x b/gcc/testsuite/=
gcc.target/aarch64/mod_256.x
>> new file mode 100644
>> index 0000000..c1de42c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/mod_256.x
>> @@ -0,0 +1,5 @@
>> +int
>> +f (int x)
>> +{
>> +  return x % 256;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c b/gcc/testsuite/gcc.ta=
rget/arm/mod_2.c
>> new file mode 100644
>> index 0000000..93017a1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/mod_2.c
>> @@ -0,0 +1,8 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm32 } */
>> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
>> +
>> +#include "../aarch64/mod_2.x"
>> +
>> +/* { dg-final { scan-assembler "rsblt\tr\[0-9\]*" } } */
>> +/* { dg-final { scan-assembler-times "and\tr\[0-9\].*1" 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c b/gcc/testsuite/gcc.=
target/arm/mod_256.c
>> new file mode 100644
>> index 0000000..92ab05a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/mod_256.c
>> @@ -0,0 +1,8 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm32 } */
>> +/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
>> +
>> +#include "../aarch64/mod_256.x"
>> +
>> +/* { dg-final { scan-assembler "rsbpl\tr\[0-9\]*" } } */
>> +/* { dg-final { scan-assembler "and\tr\[0-9\].*255" } } */
>
> OK with those changes if no regressions.

Thanks,
I'm attaching the updated patch.
I'll commit it when the aarch64 one is ok'd as the tests are partly shared =
with this patch.

Kyrill

2015-09-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

       * config/arm/arm.md (*subsi3_compare0): Rename to...
       (subsi3_compare0): ... This.
       (modsi3): New define_expand.
       * config/arm/arm.c (arm_new_rtx_costs, MOD case): Handle case
       when operand is power of 2.


2015-09-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

       * gcc.target/aarch64/mod_2.x: New file.
       * gcc.target/aarch64/mod_256.x: Likewise.
       * gcc.target/arm/mod_2.c: New test.
       * gcc.target/arm/mod_256.c: Likewise.
       * gcc.target/aarch64/mod_2.c: Likewise.
       * gcc.target/aarch64/mod_256.c: Likewise.

> Ramana
>


--------------060301090703080707010704
Content-Type: text/x-patch; name=arm-mod-2.patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="arm-mod-2.patch"
Content-length: 5965

commit 9a205ad46fe96e35b0dcbdbfcd89e2a2933a7cbd
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Jul 17 16:30:01 2015 +0100

    [ARM][3/3] Expand mod by power of 2

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fa4e083..a81114d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9580,6 +9580,24 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum r=
tx_code outer_code,
       return false;	/* All arguments must be in registers.  */
=20
     case MOD:
+      /* MOD by a power of 2 can be expanded as:
+	 rsbs    r1, r0, #0
+	 and     r0, r0, #(n - 1)
+	 and     r1, r1, #(n - 1)
+	 rsbpl   r0, r1, #0.  */
+      if (CONST_INT_P (XEXP (x, 1))
+	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+	  && mode =3D=3D SImode)
+	{
+	  *cost +=3D COSTS_N_INSNS (3);
+
+	  if (speed_p)
+	    *cost +=3D 2 * extra_cost->alu.logical
+		     + extra_cost->alu.arith;
+	  return true;
+	}
+
+    /* Fall-through.  */
     case UMOD:
       *cost =3D LIBCALL_COST (2);
       return false;	/* All arguments must be in registers.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b6c2047..775ca25 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1229,7 +1229,7 @@ (define_peephole2
   ""
 )
=20
-(define_insn "*subsi3_compare0"
+(define_insn "subsi3_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
 	(compare:CC_NOOV
 	 (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,r,I")
@@ -11142,6 +11142,75 @@ (define_expand "thumb_legacy_rev"
   ""
 )
=20
+;; ARM-specific expansion of signed mod by power of 2
+;; using conditional negate.
+;; For r0 % n where n is a power of 2 produce:
+;; rsbs    r1, r0, #0
+;; and     r0, r0, #(n - 1)
+;; and     r1, r1, #(n - 1)
+;; rsbpl   r0, r1, #0
+
+(define_expand "modsi3"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "register_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  "TARGET_32BIT"
+  {
+    HOST_WIDE_INT val =3D INTVAL (operands[2]);
+
+    if (val <=3D 0
+       || exact_log2 (val) <=3D 0)
+      FAIL;
+
+    rtx mask =3D GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+	cmp     r0, #0
+	and     r0, r0, #1
+	rsblt   r0, r0, #0.  */
+
+    if (val =3D=3D 2)
+      {
+	rtx cc_reg =3D arm_gen_compare_reg (LT,
+					  operands[1], const0_rtx, NULL_RTX);
+	rtx cond =3D gen_rtx_LT (SImode, cc_reg, const0_rtx);
+	rtx masked =3D gen_reg_rtx (SImode);
+
+	emit_insn (gen_andsi3 (masked, operands[1], mask));
+	emit_move_insn (operands[0],
+			gen_rtx_IF_THEN_ELSE (SImode, cond,
+					      gen_rtx_NEG (SImode,
+							   masked),
+					      masked));
+	DONE;
+      }
+
+    rtx neg_op =3D gen_reg_rtx (SImode);
+    rtx_insn *insn =3D emit_insn (gen_subsi3_compare0 (neg_op, const0_rtx,
+						      operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp =3D XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg =3D SET_DEST (cmp);
+    rtx cond =3D gen_rtx_GE (SImode, cc_reg, const0_rtx);
+
+    emit_insn (gen_andsi3 (operands[0], operands[1], mask));
+
+    rtx masked_neg =3D gen_reg_rtx (SImode);
+    emit_insn (gen_andsi3 (masked_neg, neg_op, mask));
+
+    /* We want a conditional negate here, but emitting COND_EXEC rtxes
+       during expand does not always work.  Do an IF_THEN_ELSE instead.  */
+    emit_move_insn (operands[0],
+		    gen_rtx_IF_THEN_ELSE (SImode, cond,
+					  gen_rtx_NEG (SImode, masked_neg),
+					  operands[0]));
+
+
+    DONE;
+  }
+)
+
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "s_register_operand" "=3Dr")
   	(bswap:SI (match_operand:SI 1 "s_register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.c b/gcc/testsuite/gcc.t=
arget/aarch64/mod_2.c
new file mode 100644
index 0000000..2645c18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
+
+#include "mod_2.x"
+
+/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */
+/* { dg-final { scan-assembler-times "and\t\[wx\]\[0-9\]*" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.x b/gcc/testsuite/gcc.t=
arget/aarch64/mod_2.x
new file mode 100644
index 0000000..2b079a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_2.x
@@ -0,0 +1,5 @@
+int
+f (int x)
+{
+  return x % 2;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.c b/gcc/testsuite/gcc=
.target/aarch64/mod_256.c
new file mode 100644
index 0000000..567332c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_256.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
+
+#include "mod_256.x"
+
+/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.x b/gcc/testsuite/gcc=
.target/aarch64/mod_256.x
new file mode 100644
index 0000000..c1de42c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_256.x
@@ -0,0 +1,5 @@
+int
+f (int x)
+{
+  return x % 256;
+}
diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c b/gcc/testsuite/gcc.targe=
t/arm/mod_2.c
new file mode 100644
index 0000000..93017a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mod_2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
+
+#include "../aarch64/mod_2.x"
+
+/* { dg-final { scan-assembler "rsblt\tr\[0-9\]*" } } */
+/* { dg-final { scan-assembler-times "and\tr\[0-9\].*1" 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c b/gcc/testsuite/gcc.tar=
get/arm/mod_256.c
new file mode 100644
index 0000000..ccb7f3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mod_256.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O2 -mcpu=3Dcortex-a57 -save-temps" } */
+
+#include "../aarch64/mod_256.x"
+
+/* { dg-final { scan-assembler "rsbpl\tr\[0-9\]*" } } */
+

--------------060301090703080707010704--