public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
@ 2015-07-24 10:55 Kyrill Tkachov
  2015-07-31  8:43 ` Kyrill Tkachov
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Kyrill Tkachov @ 2015-07-24 10:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

Hi all,

This patch implements an aarch64-specific expansion of the signed modulo by a power of 2.
The proposed sequence makes use of the conditional negate instruction CSNEG.
For a power of N, x % N can be calculated with:
negs   x1, x0
and    x0, x0, #(N - 1)
and    x1, x1, #(N - 1)
csneg  x0, x0, x1, mi

So, for N == 256 this would be:
negs   x1, x0
and    x0, x0, #255
and    x1, x1, #255
csneg  x0, x0, x1, mi

For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c is:
asr     x1, x0, 63
lsr     x1, x1, 56
add     x0, x0, x1
and     x0, x0, 255
sub     x0, x0, x1

Note that the CSNEG sequence is one instruction shorter and that the two and operations
are independent, compared to the existing sequence where all instructions are dependent
on the preceeding instructions.

For the special case of N == 2 we can do even better:
cmp     x0, xzr
and     x0, x0, 1
csneg   x0, x0, x0, ge

I first tried implementing this in the generic code in expmed.c but that didn't work
out for a few reasons:

* This relies on having a conditional-negate instruction. We could gate it on
HAVE_conditional_move and the combiner is capable of merging the final negate into
the conditional move if a conditional negate is available (like on aarch64) but on
targets without a conditional negate this would end up emitting a separate negate.

* The first negs has to be a negs for the sequence to be a win i.e. having a separate
negate and compare makes the sequence slower than the existing one (at least in my
microbenchmarking) and I couldn't get subsequent passes to combine the negate and combine
into the negs (presumably due to the use of the negated result in one of the ands).
Doing it in the aarch64 backend where I could just call the exact gen_* functions that
I need worked much more cleanly.

The costing logic is updated to reflect this sequence during the intialisation of
expmed.c where it calculates the smod_pow2_cheap metric.

The tests will come in patch 3 of the series which are partly shared with the equivalent
arm implementation.

Bootstrapped and tested on aarch64.
Ok for trunk?

Thanks,
Kyrill

2015-07-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
     (*neg<mode>2_compare0): Rename to...
     (neg<mode>2_compare0): ... This.
     * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
     CSNEG sequence in MOD by power of 2 case.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-mod-2.patch --]
[-- Type: text/x-patch; name=aarch64-mod-2.patch, Size: 3961 bytes --]

commit 428cbef107b559f67006fe84b4522210902b101f
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 15 17:01:13 2015 +0100

    [AArch64][1/3] Expand signed mod by power of 2 using CSNEG

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9d88a60..7bb4a55 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6639,8 +6639,26 @@ cost_plus:
 	  if (VECTOR_MODE_P (mode))
 	    *cost += extra_cost->vect.alu;
 	  else if (GET_MODE_CLASS (mode) == MODE_INT)
-	    *cost += (extra_cost->mult[mode == DImode].add
-		      + extra_cost->mult[mode == DImode].idiv);
+	    {
+	      /* We can expand signed mod by power of 2 using a
+		 NEGS, two parallel ANDs and a CSNEG.  Assume here
+		 that CSNEG is COSTS_N_INSNS (1).  This case should
+		 only ever be reached through the set_smod_pow2_cheap check
+		 in expmed.c.  */
+	      if (code == MOD
+		  && CONST_INT_P (XEXP (x, 1))
+		  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+		  && (mode == SImode || mode == DImode))
+		{
+		  *cost += COSTS_N_INSNS (3)
+			   + 2 * extra_cost->alu.logical
+			   + extra_cost->alu.arith;
+		  return true;
+		}
+
+	      *cost += (extra_cost->mult[mode == DImode].add
+			+ extra_cost->mult[mode == DImode].idiv);
+	    }
 	  else if (mode == DFmode)
 	    *cost += (extra_cost->fp[1].mult
 		      + extra_cost->fp[1].div);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f264534..3aaf407 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
   }
 )
 
+;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.
+;; For x0 % n where n is a power of 2 produce:
+;; negs   x1, x0
+;; and    x0, x0, #(n - 1)
+;; and    x1, x1, #(n - 1)
+;; csneg  x0, x0, x1, mi
+
+(define_expand "mod<mode>3"
+  [(match_operand:GPI 0 "register_operand" "")
+   (match_operand:GPI 1 "register_operand" "")
+   (match_operand:GPI 2 "const_int_operand" "")]
+  ""
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (INTVAL (operands[2])) <= 0
+       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+	cmp     x0, xzr
+	and     x0, x0, 1
+	csneg   x0, x0, x0, ge.  */
+    if (val == 2)
+      {
+	rtx masked = gen_reg_rtx (<MODE>mode);
+	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
+	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
+	DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (<MODE>mode);
+    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
+
+    rtx masked_pos = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
+
+    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
+				       masked_neg, masked_pos));
+    DONE;
+  }
+)
+
 (define_insn "*condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
@@ -2372,7 +2428,7 @@ (define_insn "*ngcsi_uxtw"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*neg<mode>2_compare0"
+(define_insn "neg<mode>2_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand" "r"))
 		       (const_int 0)))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-07-24 10:55 [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG Kyrill Tkachov
@ 2015-07-31  8:43 ` Kyrill Tkachov
  2015-07-31  9:26 ` Andrew Pinski
  2015-08-03 13:01 ` James Greenhalgh
  2 siblings, 0 replies; 11+ messages in thread
From: Kyrill Tkachov @ 2015-07-31  8:43 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

Ping.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02036.html
Thanks,
Kyrill

On 24/07/15 11:55, Kyrill Tkachov wrote:
> Hi all,
>
> This patch implements an aarch64-specific expansion of the signed modulo by a power of 2.
> The proposed sequence makes use of the conditional negate instruction CSNEG.
> For a power of N, x % N can be calculated with:
> negs   x1, x0
> and    x0, x0, #(N - 1)
> and    x1, x1, #(N - 1)
> csneg  x0, x0, x1, mi
>
> So, for N == 256 this would be:
> negs   x1, x0
> and    x0, x0, #255
> and    x1, x1, #255
> csneg  x0, x0, x1, mi
>
> For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c is:
> asr     x1, x0, 63
> lsr     x1, x1, 56
> add     x0, x0, x1
> and     x0, x0, 255
> sub     x0, x0, x1
>
> Note that the CSNEG sequence is one instruction shorter and that the two and operations
> are independent, compared to the existing sequence where all instructions are dependent
> on the preceeding instructions.
>
> For the special case of N == 2 we can do even better:
> cmp     x0, xzr
> and     x0, x0, 1
> csneg   x0, x0, x0, ge
>
> I first tried implementing this in the generic code in expmed.c but that didn't work
> out for a few reasons:
>
> * This relies on having a conditional-negate instruction. We could gate it on
> HAVE_conditional_move and the combiner is capable of merging the final negate into
> the conditional move if a conditional negate is available (like on aarch64) but on
> targets without a conditional negate this would end up emitting a separate negate.
>
> * The first negs has to be a negs for the sequence to be a win i.e. having a separate
> negate and compare makes the sequence slower than the existing one (at least in my
> microbenchmarking) and I couldn't get subsequent passes to combine the negate and combine
> into the negs (presumably due to the use of the negated result in one of the ands).
> Doing it in the aarch64 backend where I could just call the exact gen_* functions that
> I need worked much more cleanly.
>
> The costing logic is updated to reflect this sequence during the intialisation of
> expmed.c where it calculates the smod_pow2_cheap metric.
>
> The tests will come in patch 3 of the series which are partly shared with the equivalent
> arm implementation.
>
> Bootstrapped and tested on aarch64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-07-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
>       (*neg<mode>2_compare0): Rename to...
>       (neg<mode>2_compare0): ... This.
>       * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
>       CSNEG sequence in MOD by power of 2 case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-07-24 10:55 [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG Kyrill Tkachov
  2015-07-31  8:43 ` Kyrill Tkachov
@ 2015-07-31  9:26 ` Andrew Pinski
  2015-08-03 13:01 ` James Greenhalgh
  2 siblings, 0 replies; 11+ messages in thread
From: Andrew Pinski @ 2015-07-31  9:26 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

On Fri, Jul 24, 2015 at 3:55 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi all,
>
> This patch implements an aarch64-specific expansion of the signed modulo by
> a power of 2.
> The proposed sequence makes use of the conditional negate instruction CSNEG.
> For a power of N, x % N can be calculated with:
> negs   x1, x0
> and    x0, x0, #(N - 1)
> and    x1, x1, #(N - 1)
> csneg  x0, x0, x1, mi
>
> So, for N == 256 this would be:
> negs   x1, x0
> and    x0, x0, #255
> and    x1, x1, #255
> csneg  x0, x0, x1, mi
>
> For comparison, the existing sequence emitted by expand_smod_pow2 in
> expmed.c is:
> asr     x1, x0, 63
> lsr     x1, x1, 56
> add     x0, x0, x1
> and     x0, x0, 255
> sub     x0, x0, x1
>
> Note that the CSNEG sequence is one instruction shorter and that the two and
> operations
> are independent, compared to the existing sequence where all instructions
> are dependent
> on the preceeding instructions.

Just FYI.

For ThunderX, this is a size win and a performance win at least in a
microbenchmark.

>
> For the special case of N == 2 we can do even better:
> cmp     x0, xzr
> and     x0, x0, 1
> csneg   x0, x0, x0, ge

This is a size win and a performance win on ThunderX.

>
> I first tried implementing this in the generic code in expmed.c but that
> didn't work
> out for a few reasons:
>
> * This relies on having a conditional-negate instruction. We could gate it
> on
> HAVE_conditional_move and the combiner is capable of merging the final
> negate into
> the conditional move if a conditional negate is available (like on aarch64)
> but on
> targets without a conditional negate this would end up emitting a separate
> negate.
>
> * The first negs has to be a negs for the sequence to be a win i.e. having a
> separate
> negate and compare makes the sequence slower than the existing one (at least
> in my
> microbenchmarking) and I couldn't get subsequent passes to combine the
> negate and combine
> into the negs (presumably due to the use of the negated result in one of the
> ands).
> Doing it in the aarch64 backend where I could just call the exact gen_*
> functions that
> I need worked much more cleanly.

I agree this does make it harder to implement in a target generic way.

Thanks,
Andrew

>
> The costing logic is updated to reflect this sequence during the
> intialisation of
> expmed.c where it calculates the smod_pow2_cheap metric.
>
> The tests will come in patch 3 of the series which are partly shared with
> the equivalent
> arm implementation.
>
> Bootstrapped and tested on aarch64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-07-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
>     (*neg<mode>2_compare0): Rename to...
>     (neg<mode>2_compare0): ... This.
>     * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
>     CSNEG sequence in MOD by power of 2 case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-07-24 10:55 [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG Kyrill Tkachov
  2015-07-31  8:43 ` Kyrill Tkachov
  2015-07-31  9:26 ` Andrew Pinski
@ 2015-08-03 13:01 ` James Greenhalgh
  2015-08-13 12:41   ` Kyrill Tkachov
  2 siblings, 1 reply; 11+ messages in thread
From: James Greenhalgh @ 2015-08-03 13:01 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch implements an aarch64-specific expansion of the signed modulo by a power of 2.
> The proposed sequence makes use of the conditional negate instruction CSNEG.
> For a power of N, x % N can be calculated with:
> negs   x1, x0
> and    x0, x0, #(N - 1)
> and    x1, x1, #(N - 1)
> csneg  x0, x0, x1, mi
> 
> So, for N == 256 this would be:
> negs   x1, x0
> and    x0, x0, #255
> and    x1, x1, #255
> csneg  x0, x0, x1, mi
> 
> For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c is:
> asr     x1, x0, 63
> lsr     x1, x1, 56
> add     x0, x0, x1
> and     x0, x0, 255
> sub     x0, x0, x1
> 
> Note that the CSNEG sequence is one instruction shorter and that the two and operations
> are independent, compared to the existing sequence where all instructions are dependent
> on the preceeding instructions.
> 
> For the special case of N == 2 we can do even better:
> cmp     x0, xzr
> and     x0, x0, 1
> csneg   x0, x0, x0, ge
> 
> I first tried implementing this in the generic code in expmed.c but that didn't work
> out for a few reasons:
> 
> * This relies on having a conditional-negate instruction. We could gate it on
> HAVE_conditional_move and the combiner is capable of merging the final negate into
> the conditional move if a conditional negate is available (like on aarch64) but on
> targets without a conditional negate this would end up emitting a separate negate.
> 
> * The first negs has to be a negs for the sequence to be a win i.e. having a separate
> negate and compare makes the sequence slower than the existing one (at least in my
> microbenchmarking) and I couldn't get subsequent passes to combine the negate and combine
> into the negs (presumably due to the use of the negated result in one of the ands).
> Doing it in the aarch64 backend where I could just call the exact gen_* functions that
> I need worked much more cleanly.
> 
> The costing logic is updated to reflect this sequence during the intialisation of
> expmed.c where it calculates the smod_pow2_cheap metric.
> 
> The tests will come in patch 3 of the series which are partly shared with the equivalent
> arm implementation.
> 
> Bootstrapped and tested on aarch64.
> Ok for trunk?
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9d88a60..7bb4a55 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6639,8 +6639,26 @@ cost_plus:
>  	  if (VECTOR_MODE_P (mode))
>  	    *cost += extra_cost->vect.alu;
>  	  else if (GET_MODE_CLASS (mode) == MODE_INT)
> -	    *cost += (extra_cost->mult[mode == DImode].add
> -		      + extra_cost->mult[mode == DImode].idiv);
> +	    {
> +	      /* We can expand signed mod by power of 2 using a
> +		 NEGS, two parallel ANDs and a CSNEG.  Assume here
> +		 that CSNEG is COSTS_N_INSNS (1).  This case should
> +		 only ever be reached through the set_smod_pow2_cheap check
> +		 in expmed.c.  */
> +	      if (code == MOD
> +		  && CONST_INT_P (XEXP (x, 1))
> +		  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
> +		  && (mode == SImode || mode == DImode))
> +		{
> +		  *cost += COSTS_N_INSNS (3)
> +			   + 2 * extra_cost->alu.logical
> +			   + extra_cost->alu.arith;
> +		  return true;
> +		}
> +
> +	      *cost += (extra_cost->mult[mode == DImode].add
> +			+ extra_cost->mult[mode == DImode].idiv);
> +	    }
>  	  else if (mode == DFmode)
>  	    *cost += (extra_cost->fp[1].mult
>  		      + extra_cost->fp[1].div);

This looks like it calculates the wrong cost for !speed. I think we will
still expand through mod<mode>3 when compiling for size, so we probably
still want to cost the multiple instructions.

Have I misunderstood?

Thanks,
James

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-08-03 13:01 ` James Greenhalgh
@ 2015-08-13 12:41   ` Kyrill Tkachov
  2015-08-20  8:24     ` Kyrill Tkachov
  2015-09-01  9:25     ` James Greenhalgh
  0 siblings, 2 replies; 11+ messages in thread
From: Kyrill Tkachov @ 2015-08-13 12:41 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 4600 bytes --]


On 03/08/15 14:01, James Greenhalgh wrote:
> On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch implements an aarch64-specific expansion of the signed modulo by a power of 2.
>> The proposed sequence makes use of the conditional negate instruction CSNEG.
>> For a power of N, x % N can be calculated with:
>> negs   x1, x0
>> and    x0, x0, #(N - 1)
>> and    x1, x1, #(N - 1)
>> csneg  x0, x0, x1, mi
>>
>> So, for N == 256 this would be:
>> negs   x1, x0
>> and    x0, x0, #255
>> and    x1, x1, #255
>> csneg  x0, x0, x1, mi
>>
>> For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c is:
>> asr     x1, x0, 63
>> lsr     x1, x1, 56
>> add     x0, x0, x1
>> and     x0, x0, 255
>> sub     x0, x0, x1
>>
>> Note that the CSNEG sequence is one instruction shorter and that the two and operations
>> are independent, compared to the existing sequence where all instructions are dependent
>> on the preceeding instructions.
>>
>> For the special case of N == 2 we can do even better:
>> cmp     x0, xzr
>> and     x0, x0, 1
>> csneg   x0, x0, x0, ge
>>
>> I first tried implementing this in the generic code in expmed.c but that didn't work
>> out for a few reasons:
>>
>> * This relies on having a conditional-negate instruction. We could gate it on
>> HAVE_conditional_move and the combiner is capable of merging the final negate into
>> the conditional move if a conditional negate is available (like on aarch64) but on
>> targets without a conditional negate this would end up emitting a separate negate.
>>
>> * The first negs has to be a negs for the sequence to be a win i.e. having a separate
>> negate and compare makes the sequence slower than the existing one (at least in my
>> microbenchmarking) and I couldn't get subsequent passes to combine the negate and combine
>> into the negs (presumably due to the use of the negated result in one of the ands).
>> Doing it in the aarch64 backend where I could just call the exact gen_* functions that
>> I need worked much more cleanly.
>>
>> The costing logic is updated to reflect this sequence during the intialisation of
>> expmed.c where it calculates the smod_pow2_cheap metric.
>>
>> The tests will come in patch 3 of the series which are partly shared with the equivalent
>> arm implementation.
>>
>> Bootstrapped and tested on aarch64.
>> Ok for trunk?
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 9d88a60..7bb4a55 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -6639,8 +6639,26 @@ cost_plus:
>>   	  if (VECTOR_MODE_P (mode))
>>   	    *cost += extra_cost->vect.alu;
>>   	  else if (GET_MODE_CLASS (mode) == MODE_INT)
>> -	    *cost += (extra_cost->mult[mode == DImode].add
>> -		      + extra_cost->mult[mode == DImode].idiv);
>> +	    {
>> +	      /* We can expand signed mod by power of 2 using a
>> +		 NEGS, two parallel ANDs and a CSNEG.  Assume here
>> +		 that CSNEG is COSTS_N_INSNS (1).  This case should
>> +		 only ever be reached through the set_smod_pow2_cheap check
>> +		 in expmed.c.  */
>> +	      if (code == MOD
>> +		  && CONST_INT_P (XEXP (x, 1))
>> +		  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
>> +		  && (mode == SImode || mode == DImode))
>> +		{
>> +		  *cost += COSTS_N_INSNS (3)
>> +			   + 2 * extra_cost->alu.logical
>> +			   + extra_cost->alu.arith;
>> +		  return true;
>> +		}
>> +
>> +	      *cost += (extra_cost->mult[mode == DImode].add
>> +			+ extra_cost->mult[mode == DImode].idiv);
>> +	    }
>>   	  else if (mode == DFmode)
>>   	    *cost += (extra_cost->fp[1].mult
>>   		      + extra_cost->fp[1].div);
> This looks like it calculates the wrong cost for !speed. I think we will
> still expand through mod<mode>3 when compiling for size, so we probably
> still want to cost the multiple instructions.
>
> Have I misunderstood?
You're right, the logic needs a bit of wiggling to do the right
thing. I've moved this case into a separate MOD case and added
a gate on speed for the extra costs addition.

Ok?

Thanks,
Kyrill

2015-08-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
     (*neg<mode>2_compare0): Rename to...
     (neg<mode>2_compare0): ... This.
     * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
     Move check for speed inside the if-then-elses.  Reflect
     CSNEG sequence in MOD by power of 2 case.

>
> Thanks,
> James
>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-mod-2.patch --]
[-- Type: text/x-patch; name=aarch64-mod-2.patch, Size: 3633 bytes --]

commit de67e5fba716ce835f595c4167f57ec4faf61607
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 15 17:01:13 2015 +0100

    [AArch64][1/3] Expand signed mod by power of 2 using CSNEG

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1394ed7..c8bd8d2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6652,6 +6652,25 @@ cost_plus:
       return true;
 
     case MOD:
+    /* We can expand signed mod by power of 2 using a
+       NEGS, two parallel ANDs and a CSNEG.  Assume here
+       that CSNEG is COSTS_N_INSNS (1).  This case should
+       only ever be reached through the set_smod_pow2_cheap check
+       in expmed.c.  */
+      if (CONST_INT_P (XEXP (x, 1))
+	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+	  && (mode == SImode || mode == DImode))
+	{
+	  *cost += COSTS_N_INSNS (3);
+
+	  if (speed)
+	    *cost += 2 * extra_cost->alu.logical
+		     + extra_cost->alu.arith;
+
+	  return true;
+	}
+
+    /* Fall-through.  */
     case UMOD:
       if (speed)
 	{
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7b04c4..a515573 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
   }
 )
 
+;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.
+;; For x0 % n where n is a power of 2 produce:
+;; negs   x1, x0
+;; and    x0, x0, #(n - 1)
+;; and    x1, x1, #(n - 1)
+;; csneg  x0, x0, x1, mi
+
+(define_expand "mod<mode>3"
+  [(match_operand:GPI 0 "register_operand" "")
+   (match_operand:GPI 1 "register_operand" "")
+   (match_operand:GPI 2 "const_int_operand" "")]
+  ""
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (INTVAL (operands[2])) <= 0
+       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+	cmp     x0, xzr
+	and     x0, x0, 1
+	csneg   x0, x0, x0, ge.  */
+    if (val == 2)
+      {
+	rtx masked = gen_reg_rtx (<MODE>mode);
+	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
+	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
+	DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (<MODE>mode);
+    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
+
+    rtx masked_pos = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
+
+    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
+				       masked_neg, masked_pos));
+    DONE;
+  }
+)
+
 (define_insn "*condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
@@ -2392,7 +2448,7 @@ (define_insn "*ngcsi_uxtw"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*neg<mode>2_compare0"
+(define_insn "neg<mode>2_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand" "r"))
 		       (const_int 0)))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-08-13 12:41   ` Kyrill Tkachov
@ 2015-08-20  8:24     ` Kyrill Tkachov
  2015-09-01  8:37       ` Kyrill Tkachov
  2015-09-01  9:25     ` James Greenhalgh
  1 sibling, 1 reply; 11+ messages in thread
From: Kyrill Tkachov @ 2015-08-20  8:24 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00710.html

Thanks,
Kyrill

On 03/08/15 14:01, James Greenhalgh wrote:
> On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch implements an aarch64-specific expansion of the signed modulo
by a power of 2.
>> The proposed sequence makes use of the conditional negate instruction
CSNEG.
>> For a power of N, x % N can be calculated with:
>> negs   x1, x0
>> and    x0, x0, #(N - 1)
>> and    x1, x1, #(N - 1)
>> csneg  x0, x0, x1, mi
>>
>> So, for N == 256 this would be:
>> negs   x1, x0
>> and    x0, x0, #255
>> and    x1, x1, #255
>> csneg  x0, x0, x1, mi
>>
>> For comparison, the existing sequence emitted by expand_smod_pow2 in
expmed.c is:
>> asr     x1, x0, 63
>> lsr     x1, x1, 56
>> add     x0, x0, x1
>> and     x0, x0, 255
>> sub     x0, x0, x1
>>
>> Note that the CSNEG sequence is one instruction shorter and that the two
and operations
>> are independent, compared to the existing sequence where all instructions
are dependent
>> on the preceeding instructions.
>>
>> For the special case of N == 2 we can do even better:
>> cmp     x0, xzr
>> and     x0, x0, 1
>> csneg   x0, x0, x0, ge
>>
>> I first tried implementing this in the generic code in expmed.c but that
didn't work
>> out for a few reasons:
>>
>> * This relies on having a conditional-negate instruction. We could gate
it on
>> HAVE_conditional_move and the combiner is capable of merging the final
negate into
>> the conditional move if a conditional negate is available (like on
aarch64) but on
>> targets without a conditional negate this would end up emitting a
separate negate.
>>
>> * The first negs has to be a negs for the sequence to be a win i.e.
having a separate
>> negate and compare makes the sequence slower than the existing one (at
least in my
>> microbenchmarking) and I couldn't get subsequent passes to combine the
negate and combine
>> into the negs (presumably due to the use of the negated result in one of
the ands).
>> Doing it in the aarch64 backend where I could just call the exact gen_*
functions that
>> I need worked much more cleanly.
>>
>> The costing logic is updated to reflect this sequence during the
intialisation of
>> expmed.c where it calculates the smod_pow2_cheap metric.
>>
>> The tests will come in patch 3 of the series which are partly shared with
the equivalent
>> arm implementation.
>>
>> Bootstrapped and tested on aarch64.
>> Ok for trunk?
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 9d88a60..7bb4a55 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -6639,8 +6639,26 @@ cost_plus:
>>   	  if (VECTOR_MODE_P (mode))
>>   	    *cost += extra_cost->vect.alu;
>>   	  else if (GET_MODE_CLASS (mode) == MODE_INT)
>> -	    *cost += (extra_cost->mult[mode == DImode].add
>> -		      + extra_cost->mult[mode == DImode].idiv);
>> +	    {
>> +	      /* We can expand signed mod by power of 2 using a
>> +		 NEGS, two parallel ANDs and a CSNEG.  Assume here
>> +		 that CSNEG is COSTS_N_INSNS (1).  This case should
>> +		 only ever be reached through the set_smod_pow2_cheap check
>> +		 in expmed.c.  */
>> +	      if (code == MOD
>> +		  && CONST_INT_P (XEXP (x, 1))
>> +		  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
>> +		  && (mode == SImode || mode == DImode))
>> +		{
>> +		  *cost += COSTS_N_INSNS (3)
>> +			   + 2 * extra_cost->alu.logical
>> +			   + extra_cost->alu.arith;
>> +		  return true;
>> +		}
>> +
>> +	      *cost += (extra_cost->mult[mode == DImode].add
>> +			+ extra_cost->mult[mode == DImode].idiv);
>> +	    }
>>   	  else if (mode == DFmode)
>>   	    *cost += (extra_cost->fp[1].mult
>>   		      + extra_cost->fp[1].div);
> This looks like it calculates the wrong cost for !speed. I think we will
> still expand through mod<mode>3 when compiling for size, so we probably
> still want to cost the multiple instructions.
>
> Have I misunderstood?
You're right, the logic needs a bit of wiggling to do the right
thing. I've moved this case into a separate MOD case and added
a gate on speed for the extra costs addition.

Ok?

Thanks,
Kyrill

2015-08-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
     (*neg<mode>2_compare0): Rename to...
     (neg<mode>2_compare0): ... This.
     * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
     Move check for speed inside the if-then-elses.  Reflect
     CSNEG sequence in MOD by power of 2 case.

>
> Thanks,
> James
>


aarch64-mod-2.patch

commit de67e5fba716ce835f595c4167f57ec4faf61607
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 15 17:01:13 2015 +0100

    [AArch64][1/3] Expand signed mod by power of 2 using CSNEG

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1394ed7..c8bd8d2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6652,6 +6652,25 @@ cost_plus:
       return true;
 
     case MOD:
+    /* We can expand signed mod by power of 2 using a
+       NEGS, two parallel ANDs and a CSNEG.  Assume here
+       that CSNEG is COSTS_N_INSNS (1).  This case should
+       only ever be reached through the set_smod_pow2_cheap check
+       in expmed.c.  */
+      if (CONST_INT_P (XEXP (x, 1))
+	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+	  && (mode == SImode || mode == DImode))
+	{
+	  *cost += COSTS_N_INSNS (3);
+
+	  if (speed)
+	    *cost += 2 * extra_cost->alu.logical
+		     + extra_cost->alu.arith;
+
+	  return true;
+	}
+
+    /* Fall-through.  */
     case UMOD:
       if (speed)
 	{
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7b04c4..a515573 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
   }
 )
 
+;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.
+;; For x0 % n where n is a power of 2 produce:
+;; negs   x1, x0
+;; and    x0, x0, #(n - 1)
+;; and    x1, x1, #(n - 1)
+;; csneg  x0, x0, x1, mi
+
+(define_expand "mod<mode>3"
+  [(match_operand:GPI 0 "register_operand" "")
+   (match_operand:GPI 1 "register_operand" "")
+   (match_operand:GPI 2 "const_int_operand" "")]
+  ""
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (INTVAL (operands[2])) <= 0
+       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+	cmp     x0, xzr
+	and     x0, x0, 1
+	csneg   x0, x0, x0, ge.  */
+    if (val == 2)
+      {
+	rtx masked = gen_reg_rtx (<MODE>mode);
+	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
+	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
+	DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (<MODE>mode);
+    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op,
operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
+
+    rtx masked_pos = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
+
+    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
+				       masked_neg, masked_pos));
+    DONE;
+  }
+)
+
 (define_insn "*condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int
0)])
@@ -2392,7 +2448,7 @@ (define_insn "*ngcsi_uxtw"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*neg<mode>2_compare0"
+(define_insn "neg<mode>2_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand"
"r"))
 		       (const_int 0)))



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-08-20  8:24     ` Kyrill Tkachov
@ 2015-09-01  8:37       ` Kyrill Tkachov
  0 siblings, 0 replies; 11+ messages in thread
From: Kyrill Tkachov @ 2015-09-01  8:37 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

Ping.

Thanks,
Kyrill

On 20/08/15 09:24, Kyrill Tkachov wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00710.html
>
> Thanks,
> Kyrill
>
> On 03/08/15 14:01, James Greenhalgh wrote:
>> On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> This patch implements an aarch64-specific expansion of the signed modulo
> by a power of 2.
>>> The proposed sequence makes use of the conditional negate instruction
> CSNEG.
>>> For a power of N, x % N can be calculated with:
>>> negs   x1, x0
>>> and    x0, x0, #(N - 1)
>>> and    x1, x1, #(N - 1)
>>> csneg  x0, x0, x1, mi
>>>
>>> So, for N == 256 this would be:
>>> negs   x1, x0
>>> and    x0, x0, #255
>>> and    x1, x1, #255
>>> csneg  x0, x0, x1, mi
>>>
>>> For comparison, the existing sequence emitted by expand_smod_pow2 in
> expmed.c is:
>>> asr     x1, x0, 63
>>> lsr     x1, x1, 56
>>> add     x0, x0, x1
>>> and     x0, x0, 255
>>> sub     x0, x0, x1
>>>
>>> Note that the CSNEG sequence is one instruction shorter and that the two
> and operations
>>> are independent, compared to the existing sequence where all instructions
> are dependent
>>> on the preceeding instructions.
>>>
>>> For the special case of N == 2 we can do even better:
>>> cmp     x0, xzr
>>> and     x0, x0, 1
>>> csneg   x0, x0, x0, ge
>>>
>>> I first tried implementing this in the generic code in expmed.c but that
> didn't work
>>> out for a few reasons:
>>>
>>> * This relies on having a conditional-negate instruction. We could gate
> it on
>>> HAVE_conditional_move and the combiner is capable of merging the final
> negate into
>>> the conditional move if a conditional negate is available (like on
> aarch64) but on
>>> targets without a conditional negate this would end up emitting a
> separate negate.
>>> * The first negs has to be a negs for the sequence to be a win i.e.
> having a separate
>>> negate and compare makes the sequence slower than the existing one (at
> least in my
>>> microbenchmarking) and I couldn't get subsequent passes to combine the
> negate and combine
>>> into the negs (presumably due to the use of the negated result in one of
> the ands).
>>> Doing it in the aarch64 backend where I could just call the exact gen_*
> functions that
>>> I need worked much more cleanly.
>>>
>>> The costing logic is updated to reflect this sequence during the
> intialisation of
>>> expmed.c where it calculates the smod_pow2_cheap metric.
>>>
>>> The tests will come in patch 3 of the series which are partly shared with
> the equivalent
>>> arm implementation.
>>>
>>> Bootstrapped and tested on aarch64.
>>> Ok for trunk?
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 9d88a60..7bb4a55 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -6639,8 +6639,26 @@ cost_plus:
>>>    	  if (VECTOR_MODE_P (mode))
>>>    	    *cost += extra_cost->vect.alu;
>>>    	  else if (GET_MODE_CLASS (mode) == MODE_INT)
>>> -	    *cost += (extra_cost->mult[mode == DImode].add
>>> -		      + extra_cost->mult[mode == DImode].idiv);
>>> +	    {
>>> +	      /* We can expand signed mod by power of 2 using a
>>> +		 NEGS, two parallel ANDs and a CSNEG.  Assume here
>>> +		 that CSNEG is COSTS_N_INSNS (1).  This case should
>>> +		 only ever be reached through the set_smod_pow2_cheap check
>>> +		 in expmed.c.  */
>>> +	      if (code == MOD
>>> +		  && CONST_INT_P (XEXP (x, 1))
>>> +		  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
>>> +		  && (mode == SImode || mode == DImode))
>>> +		{
>>> +		  *cost += COSTS_N_INSNS (3)
>>> +			   + 2 * extra_cost->alu.logical
>>> +			   + extra_cost->alu.arith;
>>> +		  return true;
>>> +		}
>>> +
>>> +	      *cost += (extra_cost->mult[mode == DImode].add
>>> +			+ extra_cost->mult[mode == DImode].idiv);
>>> +	    }
>>>    	  else if (mode == DFmode)
>>>    	    *cost += (extra_cost->fp[1].mult
>>>    		      + extra_cost->fp[1].div);
>> This looks like it calculates the wrong cost for !speed. I think we will
>> still expand through mod<mode>3 when compiling for size, so we probably
>> still want to cost the multiple instructions.
>>
>> Have I misunderstood?
> You're right, the logic needs a bit of wiggling to do the right
> thing. I've moved this case into a separate MOD case and added
> a gate on speed for the extra costs addition.
>
> Ok?
>
> Thanks,
> Kyrill
>
> 2015-08-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
>       (*neg<mode>2_compare0): Rename to...
>       (neg<mode>2_compare0): ... This.
>       * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
>       Move check for speed inside the if-then-elses.  Reflect
>       CSNEG sequence in MOD by power of 2 case.
>
>> Thanks,
>> James
>>
>
> aarch64-mod-2.patch
>
> commit de67e5fba716ce835f595c4167f57ec4faf61607
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date:   Wed Jul 15 17:01:13 2015 +0100
>
>      [AArch64][1/3] Expand signed mod by power of 2 using CSNEG
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1394ed7..c8bd8d2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6652,6 +6652,25 @@ cost_plus:
>         return true;
>   
>       case MOD:
> +    /* We can expand signed mod by power of 2 using a
> +       NEGS, two parallel ANDs and a CSNEG.  Assume here
> +       that CSNEG is COSTS_N_INSNS (1).  This case should
> +       only ever be reached through the set_smod_pow2_cheap check
> +       in expmed.c.  */
> +      if (CONST_INT_P (XEXP (x, 1))
> +	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
> +	  && (mode == SImode || mode == DImode))
> +	{
> +	  *cost += COSTS_N_INSNS (3);
> +
> +	  if (speed)
> +	    *cost += 2 * extra_cost->alu.logical
> +		     + extra_cost->alu.arith;
> +
> +	  return true;
> +	}
> +
> +    /* Fall-through.  */
>       case UMOD:
>         if (speed)
>   	{
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b7b04c4..a515573 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
>     }
>   )
>   
> +;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.
> +;; For x0 % n where n is a power of 2 produce:
> +;; negs   x1, x0
> +;; and    x0, x0, #(n - 1)
> +;; and    x1, x1, #(n - 1)
> +;; csneg  x0, x0, x1, mi
> +
> +(define_expand "mod<mode>3"
> +  [(match_operand:GPI 0 "register_operand" "")
> +   (match_operand:GPI 1 "register_operand" "")
> +   (match_operand:GPI 2 "const_int_operand" "")]
> +  ""
> +  {
> +    HOST_WIDE_INT val = INTVAL (operands[2]);
> +
> +    if (val <= 0
> +       || exact_log2 (INTVAL (operands[2])) <= 0
> +       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
> +      FAIL;
> +
> +    rtx mask = GEN_INT (val - 1);
> +
> +    /* In the special case of x0 % 2 we can do the even shorter:
> +	cmp     x0, xzr
> +	and     x0, x0, 1
> +	csneg   x0, x0, x0, ge.  */
> +    if (val == 2)
> +      {
> +	rtx masked = gen_reg_rtx (<MODE>mode);
> +	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
> +	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
> +	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
> +	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
> +	DONE;
> +      }
> +
> +    rtx neg_op = gen_reg_rtx (<MODE>mode);
> +    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op,
> operands[1]));
> +
> +    /* Extract the condition register and mode.  */
> +    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
> +    rtx cc_reg = SET_DEST (cmp);
> +    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
> +
> +    rtx masked_pos = gen_reg_rtx (<MODE>mode);
> +    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
> +
> +    rtx masked_neg = gen_reg_rtx (<MODE>mode);
> +    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
> +
> +    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
> +				       masked_neg, masked_pos));
> +    DONE;
> +  }
> +)
> +
>   (define_insn "*condjump"
>     [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>   			    [(match_operand 1 "cc_register" "") (const_int
> 0)])
> @@ -2392,7 +2448,7 @@ (define_insn "*ngcsi_uxtw"
>     [(set_attr "type" "adc_reg")]
>   )
>   
> -(define_insn "*neg<mode>2_compare0"
> +(define_insn "neg<mode>2_compare0"
>     [(set (reg:CC_NZ CC_REGNUM)
>   	(compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand"
> "r"))
>   		       (const_int 0)))
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-08-13 12:41   ` Kyrill Tkachov
  2015-08-20  8:24     ` Kyrill Tkachov
@ 2015-09-01  9:25     ` James Greenhalgh
  2015-09-01 10:40       ` Kyrill Tkachov
  1 sibling, 1 reply; 11+ messages in thread
From: James Greenhalgh @ 2015-09-01  9:25 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Thu, Aug 13, 2015 at 01:36:50PM +0100, Kyrill Tkachov wrote:

Some comments below.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1394ed7..c8bd8d2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6652,6 +6652,25 @@ cost_plus:
>        return true;
>  
>      case MOD:
> +    /* We can expand signed mod by power of 2 using a
> +       NEGS, two parallel ANDs and a CSNEG.  Assume here
> +       that CSNEG is COSTS_N_INSNS (1).  This case should

Why do we want to hardcode this assumption rather than parameterise? Even
if you model this as the cost of an unconditional NEG I think that is
better than hardcoding zero cost.


> +       only ever be reached through the set_smod_pow2_cheap check
> +       in expmed.c.  */
> +      if (CONST_INT_P (XEXP (x, 1))
> +	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
> +	  && (mode == SImode || mode == DImode))
> +	{
> +	  *cost += COSTS_N_INSNS (3);

Can you add am comment to make it clear why this is not off-by-one? By
quick inspection it looks like you have made a typo trying to set the
cost to be 3 instructions rather than 4 - a reader needs the extra
knowledge that we already have a COSTS_N_INSNS(1) as a baseline.

This would be clearer as:

  /* We will expand to four instructions, reset the baseline.  */
  *cost = COSTS_N_INSNS (4);

> +
> +	  if (speed)
> +	    *cost += 2 * extra_cost->alu.logical
> +		     + extra_cost->alu.arith;
> +
> +	  return true;
> +	}
> +
> +    /* Fall-through.  */
>      case UMOD:
>        if (speed)
>  	{
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b7b04c4..a515573 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
>    }
>  )
>  
> +;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.

Seems a strange comment given that we are in aarch64.md :-).

> +;; For x0 % n where n is a power of 2 produce:
> +;; negs   x1, x0
> +;; and    x0, x0, #(n - 1)
> +;; and    x1, x1, #(n - 1)
> +;; csneg  x0, x0, x1, mi
> +
> +(define_expand "mod<mode>3"
> +  [(match_operand:GPI 0 "register_operand" "")
> +   (match_operand:GPI 1 "register_operand" "")
> +   (match_operand:GPI 2 "const_int_operand" "")]
> +  ""
> +  {
> +    HOST_WIDE_INT val = INTVAL (operands[2]);
> +
> +    if (val <= 0
> +       || exact_log2 (INTVAL (operands[2])) <= 0
> +       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
> +      FAIL;
> +
> +    rtx mask = GEN_INT (val - 1);
> +
> +    /* In the special case of x0 % 2 we can do the even shorter:
> +	cmp     x0, xzr
> +	and     x0, x0, 1
> +	csneg   x0, x0, x0, ge.  */
> +    if (val == 2)
> +      {
> +	rtx masked = gen_reg_rtx (<MODE>mode);
> +	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);

Non-obvious why this is correct given the comment above saying we want GE.

> +	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
> +	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
> +	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
> +	DONE;
> +      }
> +
> +    rtx neg_op = gen_reg_rtx (<MODE>mode);
> +    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
> +
> +    /* Extract the condition register and mode.  */
> +    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
> +    rtx cc_reg = SET_DEST (cmp);
> +    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
> +
> +    rtx masked_pos = gen_reg_rtx (<MODE>mode);
> +    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
> +
> +    rtx masked_neg = gen_reg_rtx (<MODE>mode);
> +    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
> +
> +    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
> +				       masked_neg, masked_pos));
> +    DONE;
> +  }
> +)
> +

Thanks,
James

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-09-01  9:25     ` James Greenhalgh
@ 2015-09-01 10:40       ` Kyrill Tkachov
  2015-09-02 13:00         ` Kyrill Tkachov
  0 siblings, 1 reply; 11+ messages in thread
From: Kyrill Tkachov @ 2015-09-01 10:40 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

Hi James,

On 01/09/15 10:25, James Greenhalgh wrote:
> On Thu, Aug 13, 2015 at 01:36:50PM +0100, Kyrill Tkachov wrote:
>
> Some comments below.

Thanks, I'll incorporate them, with one clarification inline.

>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 1394ed7..c8bd8d2 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -6652,6 +6652,25 @@ cost_plus:
>>         return true;
>>   
>>       case MOD:
>> +    /* We can expand signed mod by power of 2 using a
>> +       NEGS, two parallel ANDs and a CSNEG.  Assume here
>> +       that CSNEG is COSTS_N_INSNS (1).  This case should
> Why do we want to hardcode this assumption rather than parameterise? Even
> if you model this as the cost of an unconditional NEG I think that is
> better than hardcoding zero cost.
>
>
>> +       only ever be reached through the set_smod_pow2_cheap check
>> +       in expmed.c.  */
>> +      if (CONST_INT_P (XEXP (x, 1))
>> +	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
>> +	  && (mode == SImode || mode == DImode))
>> +	{
>> +	  *cost += COSTS_N_INSNS (3);
> Can you add am comment to make it clear why this is not off-by-one? By
> quick inspection it looks like you have made a typo trying to set the
> cost to be 3 instructions rather than 4 - a reader needs the extra
> knowledge that we already have a COSTS_N_INSNS(1) as a baseline.
>
> This would be clearer as:
>
>    /* We will expand to four instructions, reset the baseline.  */
>    *cost = COSTS_N_INSNS (4);
>
>> +
>> +	  if (speed)
>> +	    *cost += 2 * extra_cost->alu.logical
>> +		     + extra_cost->alu.arith;
>> +
>> +	  return true;
>> +	}
>> +
>> +    /* Fall-through.  */
>>       case UMOD:
>>         if (speed)
>>   	{
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index b7b04c4..a515573 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
>>     }
>>   )
>>   
>> +;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.
> Seems a strange comment given that we are in aarch64.md :-).
>
>> +;; For x0 % n where n is a power of 2 produce:
>> +;; negs   x1, x0
>> +;; and    x0, x0, #(n - 1)
>> +;; and    x1, x1, #(n - 1)
>> +;; csneg  x0, x0, x1, mi
>> +
>> +(define_expand "mod<mode>3"
>> +  [(match_operand:GPI 0 "register_operand" "")
>> +   (match_operand:GPI 1 "register_operand" "")
>> +   (match_operand:GPI 2 "const_int_operand" "")]
>> +  ""
>> +  {
>> +    HOST_WIDE_INT val = INTVAL (operands[2]);
>> +
>> +    if (val <= 0
>> +       || exact_log2 (INTVAL (operands[2])) <= 0
>> +       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
>> +      FAIL;
>> +
>> +    rtx mask = GEN_INT (val - 1);
>> +
>> +    /* In the special case of x0 % 2 we can do the even shorter:
>> +	cmp     x0, xzr
>> +	and     x0, x0, 1
>> +	csneg   x0, x0, x0, ge.  */
>> +    if (val == 2)
>> +      {
>> +	rtx masked = gen_reg_rtx (<MODE>mode);
>> +	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
> Non-obvious why this is correct given the comment above saying we want GE.

We want to negate if the comparison earlier yielded "less than zero".
Unfortunately, the CSNEG form of that is written as

csneg   x0, x0, x0, ge

which looks counter-intuitive at first glance.
With my other patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00020.html
the generated assembly would be:
cneg x0, x0, lt
which more closely matches the RTL generation in this hunk.
If you think that patch should go in, I'll rewrite the CSNEG form in this comment to CNEG.

Kyrill



>
>> +	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
>> +	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
>> +	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
>> +	DONE;
>> +      }
>> +
>> +    rtx neg_op = gen_reg_rtx (<MODE>mode);
>> +    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
>> +
>> +    /* Extract the condition register and mode.  */
>> +    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
>> +    rtx cc_reg = SET_DEST (cmp);
>> +    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
>> +
>> +    rtx masked_pos = gen_reg_rtx (<MODE>mode);
>> +    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
>> +
>> +    rtx masked_neg = gen_reg_rtx (<MODE>mode);
>> +    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
>> +
>> +    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
>> +				       masked_neg, masked_pos));
>> +    DONE;
>> +  }
>> +)
>> +
> Thanks,
> James
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-09-01 10:40       ` Kyrill Tkachov
@ 2015-09-02 13:00         ` Kyrill Tkachov
  2015-09-08  8:42           ` James Greenhalgh
  0 siblings, 1 reply; 11+ messages in thread
From: Kyrill Tkachov @ 2015-09-02 13:00 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 5416 bytes --]


On 01/09/15 11:40, Kyrill Tkachov wrote:
> Hi James,
>
> On 01/09/15 10:25, James Greenhalgh wrote:
>> On Thu, Aug 13, 2015 at 01:36:50PM +0100, Kyrill Tkachov wrote:
>>
>> Some comments below.
> Thanks, I'll incorporate them, with one clarification inline.

And here's the updated patch.

Thanks,
Kyrill

2015-09-02  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

      * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
      (*neg<mode>2_compare0): Rename to...
      (neg<mode>2_compare0): ... This.
      * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
      Move check for speed inside the if-then-elses.  Reflect
      CSNEG sequence in MOD by power of 2 case.

>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 1394ed7..c8bd8d2 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -6652,6 +6652,25 @@ cost_plus:
>>>          return true;
>>>    
>>>        case MOD:
>>> +    /* We can expand signed mod by power of 2 using a
>>> +       NEGS, two parallel ANDs and a CSNEG.  Assume here
>>> +       that CSNEG is COSTS_N_INSNS (1).  This case should
>> Why do we want to hardcode this assumption rather than parameterise? Even
>> if you model this as the cost of an unconditional NEG I think that is
>> better than hardcoding zero cost.
>>
>>
>>> +       only ever be reached through the set_smod_pow2_cheap check
>>> +       in expmed.c.  */
>>> +      if (CONST_INT_P (XEXP (x, 1))
>>> +	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
>>> +	  && (mode == SImode || mode == DImode))
>>> +	{
>>> +	  *cost += COSTS_N_INSNS (3);
>> Can you add am comment to make it clear why this is not off-by-one? By
>> quick inspection it looks like you have made a typo trying to set the
>> cost to be 3 instructions rather than 4 - a reader needs the extra
>> knowledge that we already have a COSTS_N_INSNS(1) as a baseline.
>>
>> This would be clearer as:
>>
>>     /* We will expand to four instructions, reset the baseline.  */
>>     *cost = COSTS_N_INSNS (4);
>>
>>> +
>>> +	  if (speed)
>>> +	    *cost += 2 * extra_cost->alu.logical
>>> +		     + extra_cost->alu.arith;
>>> +
>>> +	  return true;
>>> +	}
>>> +
>>> +    /* Fall-through.  */
>>>        case UMOD:
>>>          if (speed)
>>>    	{
>>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>>> index b7b04c4..a515573 100644
>>> --- a/gcc/config/aarch64/aarch64.md
>>> +++ b/gcc/config/aarch64/aarch64.md
>>> @@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
>>>      }
>>>    )
>>>    
>>> +;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.
>> Seems a strange comment given that we are in aarch64.md :-).
>>
>>> +;; For x0 % n where n is a power of 2 produce:
>>> +;; negs   x1, x0
>>> +;; and    x0, x0, #(n - 1)
>>> +;; and    x1, x1, #(n - 1)
>>> +;; csneg  x0, x0, x1, mi
>>> +
>>> +(define_expand "mod<mode>3"
>>> +  [(match_operand:GPI 0 "register_operand" "")
>>> +   (match_operand:GPI 1 "register_operand" "")
>>> +   (match_operand:GPI 2 "const_int_operand" "")]
>>> +  ""
>>> +  {
>>> +    HOST_WIDE_INT val = INTVAL (operands[2]);
>>> +
>>> +    if (val <= 0
>>> +       || exact_log2 (INTVAL (operands[2])) <= 0
>>> +       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
>>> +      FAIL;
>>> +
>>> +    rtx mask = GEN_INT (val - 1);
>>> +
>>> +    /* In the special case of x0 % 2 we can do the even shorter:
>>> +	cmp     x0, xzr
>>> +	and     x0, x0, 1
>>> +	csneg   x0, x0, x0, ge.  */
>>> +    if (val == 2)
>>> +      {
>>> +	rtx masked = gen_reg_rtx (<MODE>mode);
>>> +	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
>> Non-obvious why this is correct given the comment above saying we want GE.
> We want to negate if the comparison earlier yielded "less than zero".
> Unfortunately, the CSNEG form of that is written as
>
> csneg   x0, x0, x0, ge
>
> which looks counter-intuitive at first glance.
> With my other patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00020.html
> the generated assembly would be:
> cneg x0, x0, lt
> which more closely matches the RTL generation in this hunk.
> If you think that patch should go in, I'll rewrite the CSNEG form in this comment to CNEG.
>
> Kyrill
>
>
>
>>> +	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
>>> +	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
>>> +	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
>>> +	DONE;
>>> +      }
>>> +
>>> +    rtx neg_op = gen_reg_rtx (<MODE>mode);
>>> +    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
>>> +
>>> +    /* Extract the condition register and mode.  */
>>> +    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
>>> +    rtx cc_reg = SET_DEST (cmp);
>>> +    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
>>> +
>>> +    rtx masked_pos = gen_reg_rtx (<MODE>mode);
>>> +    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
>>> +
>>> +    rtx masked_neg = gen_reg_rtx (<MODE>mode);
>>> +    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
>>> +
>>> +    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
>>> +				       masked_neg, masked_pos));
>>> +    DONE;
>>> +  }
>>> +)
>>> +
>> Thanks,
>> James
>>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-mod-2.patch --]
[-- Type: text/x-patch; name=aarch64-mod-2.patch, Size: 3663 bytes --]

commit 534676af75436d7e11865403bdd231fadc7c19aa
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 15 17:01:13 2015 +0100

    [AArch64][1/3] Expand signed mod by power of 2 using CSNEG

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ce6e116..c97cd95 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6763,6 +6763,25 @@ cost_plus:
       return true;
 
     case MOD:
+    /* We can expand signed mod by power of 2 using a NEGS, two parallel
+       ANDs and a CSNEG.  Assume here that CSNEG is the same as the cost of
+       an unconditional negate.  This case should only ever be reached through
+       the set_smod_pow2_cheap check in expmed.c.  */
+      if (CONST_INT_P (XEXP (x, 1))
+	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+	  && (mode == SImode || mode == DImode))
+	{
+	  /* We expand to 4 instructions.  Reset the baseline.  */
+	  *cost = COSTS_N_INSNS (4);
+
+	  if (speed)
+	    *cost += 2 * extra_cost->alu.logical
+		     + 2 * extra_cost->alu.arith;
+
+	  return true;
+	}
+
+    /* Fall-through.  */
     case UMOD:
       if (speed)
 	{
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2522982..594bef0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -313,6 +313,62 @@ (define_expand "cmp<mode>"
   }
 )
 
+;; Expansion of signed mod by a power of 2 using CSNEG.
+;; For x0 % n where n is a power of 2 produce:
+;; negs   x1, x0
+;; and    x0, x0, #(n - 1)
+;; and    x1, x1, #(n - 1)
+;; csneg  x0, x0, x1, mi
+
+(define_expand "mod<mode>3"
+  [(match_operand:GPI 0 "register_operand" "")
+   (match_operand:GPI 1 "register_operand" "")
+   (match_operand:GPI 2 "const_int_operand" "")]
+  ""
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (val) <= 0
+       || !aarch64_bitmask_imm (val - 1, <MODE>mode))
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+	cmp    x0, xzr
+	and    x0, x0, 1
+	cneg   x0, x0, lt.  */
+    if (val == 2)
+      {
+	rtx masked = gen_reg_rtx (<MODE>mode);
+	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
+	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
+	DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (<MODE>mode);
+    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
+
+    rtx masked_pos = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
+
+    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
+				       masked_neg, masked_pos));
+    DONE;
+  }
+)
+
 (define_insn "*condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
@@ -2457,7 +2513,7 @@ (define_insn "*ngcsi_uxtw"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*neg<mode>2_compare0"
+(define_insn "neg<mode>2_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand" "r"))
 		       (const_int 0)))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
  2015-09-02 13:00         ` Kyrill Tkachov
@ 2015-09-08  8:42           ` James Greenhalgh
  0 siblings, 0 replies; 11+ messages in thread
From: James Greenhalgh @ 2015-09-08  8:42 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Wed, Sep 02, 2015 at 02:00:23PM +0100, Kyrill Tkachov wrote:
> 
> On 01/09/15 11:40, Kyrill Tkachov wrote:
> > Hi James,
> >
> > On 01/09/15 10:25, James Greenhalgh wrote:
> >> On Thu, Aug 13, 2015 at 01:36:50PM +0100, Kyrill Tkachov wrote:
> >>
> >> Some comments below.
> > Thanks, I'll incorporate them, with one clarification inline.
> 
> And here's the updated patch.

OK.

Thanks,
James

> 2015-09-02  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>       * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
>       (*neg<mode>2_compare0): Rename to...
>       (neg<mode>2_compare0): ... This.
>       * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
>       Move check for speed inside the if-then-elses.  Reflect
>       CSNEG sequence in MOD by power of 2 case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-09-08  8:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-24 10:55 [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG Kyrill Tkachov
2015-07-31  8:43 ` Kyrill Tkachov
2015-07-31  9:26 ` Andrew Pinski
2015-08-03 13:01 ` James Greenhalgh
2015-08-13 12:41   ` Kyrill Tkachov
2015-08-20  8:24     ` Kyrill Tkachov
2015-09-01  8:37       ` Kyrill Tkachov
2015-09-01  9:25     ` James Greenhalgh
2015-09-01 10:40       ` Kyrill Tkachov
2015-09-02 13:00         ` Kyrill Tkachov
2015-09-08  8:42           ` James Greenhalgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).