public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH] [Aarch64] Variable shift count truncation issues
@ 2017-05-19 12:25 Wilco Dijkstra
  2017-05-23 14:30 ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Wilco Dijkstra @ 2017-05-19 12:25 UTC (permalink / raw)
  To: richard.sandiford, Michael Collison; +Cc: GCC Patches, nd

Richard Sandiford wrote:

> Insn patterns shouldn't check can_create_pseudo_p, because there's no
> guarantee that the associated split happens before RA.  In this case it
> should be safe to reuse operand 0 after RA if you change it to:

The goal is to create and split this pattern only before register allocation.
It's a transient pattern: combine creates it, and it is split immediately afterwards.

Maybe !reload_completed is clearer, as that is what several other
define_insn_and_split patterns do?
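
For concreteness, here is a minimal, hypothetical sketch (the pattern name and
shape are invented for illustration, not taken from the patch) of where such a
split condition sits in a define_insn_and_split; when the insn condition above
it is non-empty, the split condition is written as "&& !reload_completed":

(define_insn_and_split "*sketch_shift_by_negated_count"
  [(set (match_operand:SI 0 "register_operand" "=r")
	(ashift:SI (match_operand:SI 1 "register_operand" "r")
		   (neg:SI (match_operand:SI 2 "register_operand" "r"))))]
  ""                    ;; insn condition
  "#"                   ;; transient: no assembly output of its own
  "!reload_completed"   ;; split condition: only split before register allocation
  [(const_int 0)]
  {
    /* The split only runs before RA here, so a fresh pseudo is always
       available for the intermediate negated shift count.  */
    rtx tmp = gen_reg_rtx (SImode);
    emit_insn (gen_negsi2 (tmp, operands[2]));
    emit_insn (gen_ashlsi3 (operands[0], operands[1], tmp));
    DONE;
  }
)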

Wilco

* Re: [PATCH] [Aarch64] Variable shift count truncation issues
@ 2017-06-22 10:22 James Greenhalgh
  0 siblings, 0 replies; 15+ messages in thread
From: James Greenhalgh @ 2017-06-22 10:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

Resending for the list, as the last copy got bounced.

Thanks,
James

----- Forwarded message from James Greenhalgh <james.greenhalgh@arm.com> -----

Date: Thu, 22 Jun 2017 11:16:38 +0100
From: James Greenhalgh <james.greenhalgh@arm.com>
To: Michael Collison <Michael.Collison@arm.com>, Wilco Dijkstra <Wilco.Dijkstra@arm.com>, Christophe Lyon <christophe.lyon@linaro.org>, GCC Patches <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>, richard.sandiford@linaro.org
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues
User-Agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jun 21, 2017 at 04:42:00PM +0100, Richard Sandiford wrote:
> Michael Collison <Michael.Collison@arm.com> writes:
> > Updated the patch per Richard's suggestions to allow scheduling of
> > instructions before reload.
> 
> Thanks, this looks good to me FWIW, but obviously I can't approve it.

Thanks for the review, Richard; that gives me good confidence in this patch.

I have a few comments below, which are closer to nitpicking than structural
issues with the patch.

With those fixed, this is OK to commit.

> > 2017-05-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> > 	    Michael Collison <michael.collison@arm.com>

With the work you've done, you can probably place yourself first on the
ChangeLog now ;)

> >
> > 	PR target/70119
> > 	* config/aarch64/aarch64.md (*aarch64_<optab>_reg_<mode>3_mask1):
> > 	New pattern.
> > 	(*aarch64_reg_<mode>3_neg_mask2): New pattern.
> > 	(*aarch64_reg_<mode>3_minus_mask): New pattern.
> > 	(*aarch64_<optab>_reg_di3_mask2): New pattern.
> > 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
> > 	of shift when the shift amount is masked with constant equal to
> > 	the size of the mode.
> > 	* config/aarch64/predicates.md (subreg_lowpart_operator): New
> > 	predicate.
> >
> >
> > 2016-05-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> > 	    Michael Collison <michael.collison@arm.com>
> >
> > 	PR target/70119
> > 	* gcc.target/aarch64/var_shift_mask_1.c: New test.


> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index d89df66..ff5720c 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -3942,6 +3942,97 @@
> >    }
> >  )
> >  
> > +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
> > +;; they truncate the shift/rotate amount by the size of the registers they
> > +;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away

Is this "63" a typo? Should it be 64?

> > +;; such redundant masking instructions.  GCC can do that automatically when
> > +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
> > +;; because some of the SISD shift alternatives don't perform this truncations.
> > +;; So this pattern exists to catch such cases.
> > +
> > +(define_insn "*aarch64_<optab>_reg_<mode>3_mask1"
> > +  [(set (match_operand:GPI 0 "register_operand" "=r")
> > +	(SHIFT:GPI
> > +	  (match_operand:GPI 1 "register_operand" "r")
> > +	  (match_operator 4 "subreg_lowpart_operator"
> > +	   [(and:GPI (match_operand:GPI 2 "register_operand" "r")
> > +		     (match_operand 3 "const_int_operand" "n"))])))]
> > +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"

Spaces around "-"

> > +  "<shift>\t%<w>0, %<w>1, %<w>2"
> > +  [(set_attr "type" "shift_reg")]
> > +)
> > +
> > +(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
> > +  [(set (match_operand:GPI 0 "register_operand" "=&r")
> > +	(SHIFT:GPI
> > +	  (match_operand:GPI 1 "register_operand" "r")
> > +	  (match_operator 4 "subreg_lowpart_operator"
> > +	  [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> > +			   (match_operand 3 "const_int_operand" "n")))])))]
> > +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)"
> > +  "#"
> > +  "&& 1"

I'd prefer "true" to "1"

> > +  [(const_int 0)]
> > +  {
> > +    rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
> > +	       : operands[0]);
> > +    emit_insn (gen_negsi2 (tmp, operands[2]));
> > +
> > +    rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
> > +    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
> > +				     SUBREG_BYTE (operands[4]));
> > +    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp));
> > +    DONE;
> > +  }
> > +)
> > +
> > +(define_insn_and_split "*aarch64_reg_<mode>3_minus_mask"
> > +  [(set (match_operand:GPI 0 "register_operand" "=&r")
> > +	(ashift:GPI
> > +	  (match_operand:GPI 1 "register_operand" "r")
> > +	  (minus:QI (match_operand 2 "const_int_operand" "n")
> > +		    (match_operator 5 "subreg_lowpart_operator"
> > +		    [(and:SI (match_operand:SI 3 "register_operand" "r")
> > +			     (match_operand 4 "const_int_operand" "n"))]))))]
> > +  "((~INTVAL (operands[4]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)
> > +   && INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)"
> > +  "#"
> > +  "&& 1"

Likewise.

Thanks,
James


----- End forwarded message -----

* [PATCH] [Aarch64] Variable shift count truncation issues
@ 2017-05-18 23:17 Michael Collison
  2017-05-19  7:51 ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Collison @ 2017-05-18 23:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1806 bytes --]

This patch improves code generation for shifts whose count is masked by an AND
operation that is redundant given the size of the operation.  AArch64 variable
shifts only use the low 5 bits (W-regs) or 6 bits (X-regs) of the count, so
masking operations that clear the higher bits can be removed.  When the shift
instructions operate on all-register arguments they truncate the shift/rotate
amount by the size of the registers they operate on: 32 for W-regs, 64 for
X-regs.  This allows us to optimize away such redundant masking instructions.

The patch applies only to integer shift instructions; vector shifts are excluded.

unsigned int f1
(unsigned int x, int y)
{
  return x << (y & 31);
}

unsigned long
f3 (unsigned long bit_addr)
{
  unsigned long bitnumb = bit_addr & 63;
  return (1L << bitnumb);
}

With trunk at -O2 we generate:

f1:
	and	w1, w1, 31
	lsl	w0, w0, w1
	ret

f3:
	and	x0, x0, 63
	mov	x1, 1
	lsl	x0, x1, x0
	ret

with the patch we generate:

f1:
	lsl	w0, w0, w1
	ret

f3:
	mov	x1, 1
	lsl	x0, x1, x0
	ret


Okay for trunk?

2017-05-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>

	PR target/70119
	* config/aarch64/aarch64.md (*aarch64_<optab>_reg_<mode>3_mask1):
	New pattern.
	(*aarch64_reg_<mode>3_neg_mask2): New pattern.
	(*aarch64_reg_<mode>3_minus_mask): New pattern.
	(*aarch64_<optab>_reg_di3_mask2): New pattern.
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
	of shift when the shift amount is masked with constant equal to
	the size of the mode.


2017-05-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>

	PR target/70119
	* gcc.target/aarch64/var_shift_mask_1.c: New test.

[-- Attachment #2: pr70119.patch --]
[-- Type: application/octet-stream, Size: 7119 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 714bb79..6ff2e22 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7347,17 +7347,26 @@ cost_plus:
         }
       else
         {
-	  if (speed)
+	  if (VECTOR_MODE_P (mode))
 	    {
-	      if (VECTOR_MODE_P (mode))
-		{
-		  /* Vector shift (register).  */
-		  *cost += extra_cost->vect.alu;
-		}
-	      else
+	      if (speed)
+		/* Vector shift (register).  */
+		*cost += extra_cost->vect.alu;
+	    }
+	  else
+	    {
+	      if (speed)
+		/* LSLV.  */
+		*cost += extra_cost->alu.shift_reg;
+
+	      if (GET_CODE (op1) == AND && REG_P (XEXP (op1, 0))
+		  && CONST_INT_P (XEXP (op1, 1))
+		  && INTVAL (XEXP (op1, 1)) == GET_MODE_BITSIZE (mode) - 1)
 		{
-		  /* LSLV.  */
-		  *cost += extra_cost->alu.shift_reg;
+		  *cost += rtx_cost (op0, mode, (rtx_code) code, 0, speed);
+		  /* We already demanded XEXP (op1, 0) to be REG_P, so
+		     don't recurse into it.  */
+		  return true;
 		}
 	    }
 	  return false;  /* All arguments need to be in registers.  */
@@ -7386,14 +7395,27 @@ cost_plus:
 	}
       else
 	{
-
-	  /* ASR (register) and friends.  */
-	  if (speed)
+	  if (VECTOR_MODE_P (mode))
 	    {
-	      if (VECTOR_MODE_P (mode))
+	      if (speed)
+		/* Vector shift (register).  */
 		*cost += extra_cost->vect.alu;
-	      else
+	    }
+	  else
+	    {
+	      if (speed)
+		/* ASR (register) and friends.  */
 		*cost += extra_cost->alu.shift_reg;
+
+	      if (GET_CODE (op1) == AND && REG_P (XEXP (op1, 0))
+		  && CONST_INT_P (XEXP (op1, 1))
+		  && INTVAL (XEXP (op1, 1)) == GET_MODE_BITSIZE (mode) - 1)
+		{
+		  *cost += rtx_cost (op0, mode, (rtx_code) code, 0, speed);
+		  /* We already demanded XEXP (op1, 0) to be REG_P, so
+		     don't recurse into it.  */
+		  return true;
+		}
 	    }
 	  return false;  /* All arguments need to be in registers.  */
 	}
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5adc5ed..c6ae670 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3999,6 +3999,92 @@
   }
 )
 
+;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
+;; they truncate the shift/rotate amount by the size of the registers they
+;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away
+;; such redundant masking instructions.  GCC can do that automatically when
+;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
+;; because some of the SISD shift alternatives don't perform this truncations.
+;; So this pattern exists to catch such cases.
+
+(define_insn "*aarch64_<optab>_reg_<mode>3_mask1"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(SHIFT:GPI
+	  (match_operand:GPI 1 "register_operand" "r")
+	  (subreg:QI (and:GPI
+		      (match_operand:GPI 2 "register_operand" "r")
+		      (match_operand 3 "const_int_operand" "n")) 0)))]
+  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
+  "<shift>\t%<w>0, %<w>1, %<w>2"
+  [(set_attr "type" "shift_reg")]
+)
+
+
+(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(SHIFT:GPI
+	  (match_operand:GPI 1 "register_operand" "r")
+	  (subreg:QI (neg:SI (and:SI
+			      (match_operand:SI 2 "register_operand" "r")
+			      (match_operand 3 "const_int_operand" "n"))) 0)))]
+  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)
+   && can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    rtx tmp = gen_reg_rtx (SImode);
+
+    emit_insn (gen_negsi2 (tmp, operands[2]));
+    rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
+    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
+    DONE;
+  }
+)
+
+(define_insn_and_split "*aarch64_reg_<mode>3_minus_mask"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ashift:GPI
+	  (match_operand:GPI 1 "register_operand" "r")
+	  (minus:QI (match_operand 2 "const_int_operand" "n")
+		    (subreg:QI (and:SI
+				(match_operand:SI 3 "register_operand" "r")
+				(match_operand 4 "const_int_operand" "n")) 0))))]
+  "((~INTVAL (operands[4]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)
+   && can_create_pseudo_p ()
+   && INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    rtx tmp = gen_reg_rtx (SImode);
+
+    emit_insn (gen_negsi2 (tmp, operands[3]));
+    rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
+    emit_insn (gen_ashl<mode>3 (operands[0], operands[1], tmp2));
+    DONE;
+  }
+)
+
+(define_insn_and_split "*aarch64_<optab>_reg_di3_mask2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(SHIFT:DI
+	  (match_operand:DI 1 "register_operand" "r")
+	  (subreg:QI (and:SI
+		      (match_operand:SI 2 "register_operand" "r")
+		      (match_operand 3 "const_int_operand" "n")) 0)))]
+  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (DImode)-1)) == 0)
+   && can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    rtx tmp = simplify_gen_subreg (QImode, operands[2], SImode, 0);
+    emit_insn (gen_<optab>di3 (operands[0], operands[1], tmp));
+    DONE;
+  }
+)
+
 ;; Logical left shift using SISD or Integer instruction
 (define_insn "*aarch64_ashl_sisd_or_int_<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r,r,w,w")
diff --git a/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c b/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c
new file mode 100644
index 0000000..e2b020e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* The integer variable shift and rotate instructions truncate their
+   shift amounts by the datasize.  Make sure that we don't emit a redundant
+   masking operation.  */
+
+unsigned
+f1 (unsigned x, int y)
+{
+  return x << (y & 31);
+}
+
+unsigned long
+f2 (unsigned long x, int y)
+{
+  return x << (y & 63);
+}
+
+unsigned long
+f3 (unsigned long bit_addr, int y)
+{
+  unsigned long bitnumb = bit_addr & 63;
+  return (1L << bitnumb);
+}
+
+unsigned int
+f4 (unsigned int x, unsigned int y)
+{
+  y &= 31;
+  return x >> y | (x << (32 - y));
+}
+
+unsigned long
+f5 (unsigned long x, unsigned long y)
+{
+  y &= 63;
+  return x >> y | (x << (64 - y));
+}
+
+unsigned long
+f6 (unsigned long x, unsigned long y)
+{
+
+  return (x << (64 - (y & 63)));
+
+}
+
+unsigned long
+f7 (unsigned long x, unsigned long y)
+{
+  return (x << -(y & 63));
+}
+
+/* { dg-final { scan-assembler-times "lsl\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "lsl\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "ror\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "ror\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-not "and\tw\[0-9\]+, w\[0-9\]+, 31" } } */
+/* { dg-final { scan-assembler-not "and\tx\[0-9\]+, x\[0-9\]+, 63" } } */
+/* { dg-final { scan-assembler-not "sub\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
-- 
1.9.1



Thread overview: 15+ messages
2017-05-19 12:25 [PATCH] [Aarch64] Variable shift count truncation issues Wilco Dijkstra
2017-05-23 14:30 ` Richard Sandiford
2017-05-23 23:16   ` Michael Collison
2017-06-15  8:06     ` Michael Collison
2017-06-15 13:39       ` Richard Sandiford
2017-06-21  8:32         ` Michael Collison
2017-06-21 15:42           ` Richard Sandiford
     [not found]             ` <20170622101638.GA30708@arm.com>
2017-06-23  9:28               ` Michael Collison
2017-06-23  9:30                 ` James Greenhalgh
2017-06-30 18:04                 ` Andreas Schwab
2017-06-30 18:54                   ` Michael Collison
  -- strict thread matches above, loose matches on Subject: below --
2017-06-22 10:22 James Greenhalgh
2017-05-18 23:17 Michael Collison
2017-05-19  7:51 ` Richard Sandiford
2017-05-19 11:03   ` Christophe Lyon
