public inbox for gcc-patches@gcc.gnu.org
* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
@ 2019-02-05 21:12 Wilco Dijkstra
  2019-02-06 23:42 ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Wilco Dijkstra @ 2019-02-05 21:12 UTC (permalink / raw)
  To: Steve Ellcey, GCC Patches; +Cc: nd

Hi Steve,

Thanks for looking at this. A few comments on the patch:

+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+				   unsigned HOST_WIDE_INT mask1,
+				   unsigned HOST_WIDE_INT shft_amnt,
+				   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if ((mask1 + mask2) != HOST_WIDE_INT_M1U)
+    return false;
+  if ((mask1 & mask2) != 0)
+    return false;

Why not check mask1 == ~mask2? That's much simpler...

+  /* Verify that the shift amount is less than the mode size.  */
+  if (shft_amnt >= GET_MODE_BITSIZE (mode))
+    return false;

The md pattern already guarantees this (this could be an assert). We need
to check that shift+width <= mode size. Given immediates are limited to
the mode size, the easiest way is to check mask2 is not 0 or M1.

+  /* Verify that the mask being shifted is contiguous and would be in the
+     least significant bits after shifting by creating a mask 't' based on
+     the number of bits set in mask2 and the shift amount for mask2 and
+     comparing that to the actual mask2.  */
+  t = popcount_hwi (mask2);
+  t = (HOST_WIDE_INT_1U << t) - 1;
+  t = t << shft_amnt;
+  if (t != mask2)
+    return false;
+  
+  return true;

This would return true if mask2 == 0 or M1 (I think this can't happen with
current md patterns, but this would emit an illegal bfi).

After special cases you could do something like t = mask2 + (HWI_1U << shift);
return t == (t & -t) to check for a valid bfi.

+  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"

This could emit a width that may be 32 too large in SImode if bit 31 is set
(there is another use of %P in aarch64.md which may have the same issue).

Finally from some quick testing it seems one case is not yet supported:

int f1(int x, int y)
{
  return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
}

This just has a shift, rather than an AND plus a shift.
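
The AND disappears because the shift already clears the low bits, so combine
only sees a plain ashift.  A minimal standalone check, purely for
illustration (not part of the patch):

#include <assert.h>

int main (void)
{
  unsigned x;
  /* In SImode, (x << 28) & 0xf0000000 is the same as x << 28, since the
     shift already zeroes bits 0..27, so no AND survives simplification.  */
  for (x = 0; x < 16; x++)
    assert (((x << 28) & 0xf0000000u) == (x << 28));
  return 0;
}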

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-02-05 21:12 [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64 Wilco Dijkstra
@ 2019-02-06 23:42 ` Steve Ellcey
  2019-02-07 18:13   ` Wilco Dijkstra
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-02-06 23:42 UTC (permalink / raw)
  To: sellcey, gcc-patches, Wilco.Dijkstra; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 3381 bytes --]

On Tue, 2019-02-05 at 21:12 +0000, Wilco Dijkstra wrote:

> +bool
> +aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
> +                                  unsigned HOST_WIDE_INT mask1,
> +                                  unsigned HOST_WIDE_INT shft_amnt,
> +                                  unsigned HOST_WIDE_INT mask2)
> +{
> +  unsigned HOST_WIDE_INT t;
> +
> +  /* Verify that there is no overlap in what bits are set in the two
> masks.  */
> +  if ((mask1 + mask2) != HOST_WIDE_INT_M1U)
> +    return false;
> +  if ((mask1 & mask2) != 0)
> +    return false;
> 
> Why not check mask1 == ~mask2? That's much simpler...

Yes that would be simpler.  I made that change.
> 
> +  /* Verify that the shift amount is less than the mode size.  */
> +  if (shft_amnt >= GET_MODE_BITSIZE (mode))
> +    return false;
> 
> The md pattern already guarantees this (this could be an assert). We need
> to check that  shift+width <= mode size. Given immediates are limited to
> the mode size, the easiest way is to check mask2 is not 0 or M1.

OK, I changed the if statement to a gcc_assert and added a check to
make sure mask2 is not 0 or M1.

> +  /* Verify that the mask being shifted is contiguous and would be in the
> +     least significant bits after shifting by creating a mask 't' based on
> +     the number of bits set in mask2 and the shift amount for mask2 and
> +     comparing that to the actual mask2.  */
> +  t = popcount_hwi (mask2);
> +  t = (HOST_WIDE_INT_1U << t) - 1;
> +  t = t << shft_amnt;
> +  if (t != mask2)
> +    return false;
> +
> +  return true;
> 
> This would return true if mask2 == 0 or M1 (I think this can't happen with
> current md patterns, but this would emit an illegal bfi).
> 
> After special cases you could do something like t = mask2 + (HWI_1U << shift);
> return t == (t & -t) to check for a valid bfi.

I am not sure I follow this logic and my attempts to use this did not
work so I kept my original code.

> +  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
> 
> This could emit a width that may be 32 too large in SImode if bit 31 is set
> (there is another use of %P in aarch64.md which may have the same
> issue).

I am not sure why having bit 31 set would be a problem.  Sign
extension?

> Finally from some quick testing it seems one case is not yet
> supported:
> 
> int f1(int x, int y)
> {
>   return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
> }
> 
> This just has a shift, rather than an AND plus a shift.

I added another instruction to handle this and added a new test to
check for it.

Steve Ellcey
sellcey@marvell.com


Here is the latest version of the patch.


2018-02-06  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
	New prototype.
	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
	New function.
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
	New instruction.
	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.

2018-02-06  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
	to bfi.
	* gcc.target/aarch64/combine_bfi_2.c: New test.


[-- Attachment #2: combine.patch --]
[-- Type: text/x-patch; name="combine.patch", Size: 6335 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..26b5ab47d6f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,42 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+				   unsigned HOST_WIDE_INT mask1,
+				   unsigned HOST_WIDE_INT shft_amnt,
+				   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+    return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+    return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+     least significant bits after shifting by creating a mask 't' based on
+     the number of bits set in mask2 and the shift amount for mask2 and
+     comparing that to the actual mask2.  */
+  t = popcount_hwi (mask2);
+  t = (HOST_WIDE_INT_1U << t) - 1;
+  t = t << shft_amnt;
+  if (t != mask2)
+    return false;
+  
+  return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
    is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..2bbd3f1055c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,76 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi<GPI:mode>5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI
+                           (match_operand:GPI 3 "register_operand" "r")
+                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      UINTVAL(operands[5]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no and of the ashift because
+;; the shift is large enough to remove the need for an AND instruction.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noand"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (ashift:GPI
+                          (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      HOST_WIDE_INT_M1U << UINTVAL (operands[4]) )"
+{
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[4]));
+  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %5";
+}
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
+;; copying the least significant bits of OP3 to OP0.  In this case we do
+;; need two versions of the instruction to handle different checks on the
+;; constant values.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "0")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]), 0,
+				      UINTVAL (operands[2]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")

[-- Attachment #3: combine-test.patch --]
[-- Type: text/x-patch; name="combine-test.patch", Size: 1047 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
index e69de29bb2d..145282d4d55 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int f1(int x, int y)
+{
+  return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
+}
+
+
+int f2(int x, int y)
+{
+  return (((x <<28) & 0xf0000000)) | (y & 0xfffffff);
+}
+
+/* { dg-final { scan-assembler-times {\tbfi\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 109f989a2f0..c3b5fc58800 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -114,4 +114,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
+/* { dg-final { scan-assembler-times "bfxil\\t" 3 } } */
+/* { dg-final { scan-assembler-times "bfi\\t" 15 } } */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-02-06 23:42 ` Steve Ellcey
@ 2019-02-07 18:13   ` Wilco Dijkstra
  2019-02-11 18:46     ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Wilco Dijkstra @ 2019-02-07 18:13 UTC (permalink / raw)
  To: Steve Ellcey, sellcey, gcc-patches; +Cc: nd

Hi Steve,

>> After special cases you could do something like t = mask2 + (HWI_1U << shift);
>> return t == (t & -t) to check for a valid bfi.
>
> I am not sure I follow this logic and my attempts to use this did not
> work so I kept my original code.

It's similar to the initial code in aarch64_bitmask_imm, but rather than adding
the lowest bit to the value to verify it is a mask (val + (val & -val)), we use the
shift instead. If the shift is exactly right, it reaches the first set bit of the mask. 
Adding the low bit to a valid mask always results in zero or a single set bit. 
The standard idiom to check that is t == (t & -t).
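
To make that concrete, here is a minimal standalone sketch of the check on a
few example masks (using unsigned long long as a stand-in for unsigned
HOST_WIDE_INT; illustration only, not the patch code):

#include <assert.h>

typedef unsigned long long uhwi;

static int
valid_bfi_mask (uhwi mask2, uhwi shift)
{
  /* Adding 1 << shift to a contiguous mask whose lowest set bit sits at
     'shift' carries through the whole mask, leaving zero or a single set
     bit; anything else leaves stray bits behind.  */
  uhwi t = mask2 + ((uhwi) 1 << shift);
  return t == (t & -t);
}

int main (void)
{
  assert (valid_bfi_mask (0xf0, 4));   /* contiguous run starting at bit 4 */
  assert (!valid_bfi_mask (0xf0, 3));  /* shift does not reach the mask */
  assert (!valid_bfi_mask (0xb0, 4));  /* mask is not contiguous */
  return 0;
}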

>> +  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
>> 
>> This could emit a width that may be 32 too large in SImode if bit 31 is set
>> (there is another use of %P in aarch64.md which may have the same
>> issue).
>
> I am not sure why having bit 31 set would be a problem.  Sign
> extension?

Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which is obviously wrong.
Your patch avoids this for bfi by explicitly computing the correct value.
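
The underlying issue is that CONST_INTs are stored sign-extended in a
HOST_WIDE_INT, so a popcount-based width also counts the 32 copies of the
sign bit.  A rough standalone illustration (plain C, not GCC internals):

#include <assert.h>

int main (void)
{
  /* SImode constant 0x80000000 as stored in a 64-bit HOST_WIDE_INT.  */
  unsigned long long stored = 0xffffffff80000000ULL;
  /* A width derived from the popcount comes out as 33 rather than 1.  */
  assert (__builtin_popcountll (stored) == 33);
  return 0;
}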

This looks good to me (and creates useful bfi's as expected), but I can't approve.

Wilco

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-02-07 18:13   ` Wilco Dijkstra
@ 2019-02-11 18:46     ` Steve Ellcey
  2019-02-26 16:46       ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-02-11 18:46 UTC (permalink / raw)
  To: sellcey, gcc-patches, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd, Richard.Earnshaw

[-- Attachment #1: Type: text/plain, Size: 2224 bytes --]

On Thu, 2019-02-07 at 18:13 +0000, Wilco Dijkstra wrote:
> 
> Hi Steve,
> 
> > > After special cases you could do something like t = mask2 +
> > > (HWI_1U << shift);
> > > return t == (t & -t) to check for a valid bfi.
> > 
> > I am not sure I follow this logic and my attempts to use this did not
> > work so I kept my original code.
> 
> It's similar to the initial code in aarch64_bitmask_imm, but rather than adding
> the lowest bit to the value to verify it is a mask (val + (val & -val)), we use the
> shift instead. If the shift is exactly right, it reaches the first
> set bit of the mask.
> Adding the low bit to a valid mask always results in zero or a single set bit.
> The standard idiom to check that is t == (t & -t).
> 
> > > +  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
> > > 
> > > This could emit a width that may be 32 too large in SImode if bit 31 is set
> > > (there is another use of %P in aarch64.md which may have the same
> > > issue).
> > 
> > I am not sure why having bit 31 set would be a problem.  Sign
> > extension?
> 
> Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which is obviously wrong.
> Your patch avoids this for bfi by explicitly computing the correct value.
> 
> This looks good to me (and creates useful bfi's as expected), but I
> can't approve.
> 
> Wilco

Thanks for looking this over.  I have updated the mask check to use
your method and retested to make sure it still works.  Can one of the
aarch64 maintainers approve this patch?

2018-02-11  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
	New prototype.
	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
	New function.
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
	New instruction.
	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.

2018-02-11  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
	to bfi.
	* gcc.target/aarch64/combine_bfi_2.c: New test.



[-- Attachment #2: combine.patch --]
[-- Type: text/x-patch; name="combine.patch", Size: 6131 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..a7ef952ad1b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+				   unsigned HOST_WIDE_INT mask1,
+				   unsigned HOST_WIDE_INT shft_amnt,
+				   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+    return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+    return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+     least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
    is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..2bbd3f1055c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,76 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi<GPI:mode>5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI
+                           (match_operand:GPI 3 "register_operand" "r")
+                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      UINTVAL(operands[5]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no and of the ashift because
+;; the shift is large enough to remove the need for an AND instruction.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noand"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (ashift:GPI
+                          (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      HOST_WIDE_INT_M1U << UINTVAL (operands[4]) )"
+{
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[4]));
+  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %5";
+}
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
+;; copying the least significant bits of OP3 to OP0.  In this case we do
+;; need two versions of the instruction to handle different checks on the
+;; constant values.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "0")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]), 0,
+				      UINTVAL (operands[2]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")

[-- Attachment #3: combine-test.patch --]
[-- Type: text/x-patch; name="combine-test.patch", Size: 1047 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
index e69de29bb2d..145282d4d55 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int f1(int x, int y)
+{
+  return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
+}
+
+
+int f2(int x, int y)
+{
+  return (((x <<28) & 0xf0000000)) | (y & 0xfffffff);
+}
+
+/* { dg-final { scan-assembler-times {\tbfi\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 109f989a2f0..c3b5fc58800 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -114,4 +114,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
+/* { dg-final { scan-assembler-times "bfxil\\t" 3 } } */
+/* { dg-final { scan-assembler-times "bfi\\t" 15 } } */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-02-11 18:46     ` Steve Ellcey
@ 2019-02-26 16:46       ` Steve Ellcey
  2019-03-14 22:54         ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-02-26 16:46 UTC (permalink / raw)
  To: sellcey, gcc-patches, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd, Richard.Earnshaw

Ping.

Steve Ellcey
sellcey@marvell.com


On Mon, 2019-02-11 at 10:46 -0800, Steve Ellcey wrote:
> On Thu, 2019-02-07 at 18:13 +0000, Wilco Dijkstra wrote
> > 
> > Hi Steve,
> > 
> > > > After special cases you could do something like t = mask2 +
> > > > (HWI_1U << shift);
> > > > return t == (t & -t) to check for a valid bfi.
> > > 
> > > I am not sure I follow this logic and my attempts to use this did
> > > not
> > > work so I kept my original code.
> > 
> > It's similar to the initial code in aarch64_bitmask_imm, but rather
> > than adding
> > the lowest bit to the value to verify it is a mask (val + (val &
> > -val)), we use the
> > shift instead. If the shift is exactly right, it reaches the first
> > set bit of the mask.
> > Adding the low bit to a valid mask always results in zero or a
> > single set bit.
> > The standard idiom to check that is t == (t & -t).
> > 
> > > > +  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
> > > > 
> > > > This could emit a width that may be 32 too large in SImode if
> > > > bit 31 is set
> > > > (there is another use of %P in aarch64.md which may have the
> > > > same
> > > > issue).
> > > 
> > > I am not sure why having bit 31 set would be a problem.  Sign
> > > extension?
> > 
> > Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which
> > is obviously wrong.
> > Your patch avoids this for bfi by explicitly computing the correct
> > value.
> > 
> > This looks good to me (and creates useful bfi's as expected), but I
> > can't approve.
> > 
> > Wilco
> 
> Thanks for looking this over.  I have updated the mask check to use
> your method and retested to make sure it still works.  Can one of the
> aarch64 maintainers approve this patch?
> 
> 2018-02-11  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64-protos.h
> (aarch64_masks_and_shift_for_bfi_p):
> 	New prototype.
> 	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> 	New function.
> 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
> 	New instruction.
> 	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.
> 
> 2018-02-11  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
> 	to bfi.
> 	* gcc.target/aarch64/combine_bfi_2.c: New test.
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-02-26 16:46       ` Steve Ellcey
@ 2019-03-14 22:54         ` Steve Ellcey
  2019-04-01 17:23           ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-03-14 22:54 UTC (permalink / raw)
  To: sellcey, gcc-patches, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd, Richard.Earnshaw

Double ping.

Steve Ellcey
sellcey@marvell.com

On Tue, 2019-02-26 at 08:44 -0800, Steve Ellcey wrote:
> Ping.
> 
> Steve Ellcey
> sellcey@marvell.com
> 
> 
> On Mon, 2019-02-11 at 10:46 -0800, Steve Ellcey wrote:
> > On Thu, 2019-02-07 at 18:13 +0000, Wilco Dijkstra wrote
> > > 
> > > Hi Steve,
> > > 
> > > > > After special cases you could do something like t = mask2 +
> > > > > (HWI_1U << shift);
> > > > > return t == (t & -t) to check for a valid bfi.
> > > > 
> > > > I am not sure I follow this logic and my attempts to use this
> > > > did
> > > > not
> > > > work so I kept my original code.
> > > 
> > > It's similar to the initial code in aarch64_bitmask_imm, but
> > > rather
> > > than adding
> > > the lowest bit to the value to verify it is a mask (val + (val &
> > > -val)), we use the
> > > shift instead. If the shift is exactly right, it reaches the
> > > first
> > > set bit of the mask.
> > > Adding the low bit to a valid mask always results in zero or a
> > > single set bit.
> > > The standard idiom to check that is t == (t & -t).
> > > 
> > > > > +  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
> > > > > 
> > > > > This could emit a width that may be 32 too large in SImode if
> > > > > bit 31 is set
> > > > > (there is another use of %P in aarch64.md which may have the
> > > > > same
> > > > > issue).
> > > > 
> > > > I am not sure why having bit 31 set would be a problem.  Sign
> > > > extension?
> > > 
> > > Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which
> > > is obviously wrong.
> > > Your patch avoids this for bfi by explicitly computing the correct
> > > value.
> > > 
> > > This looks good to me (and creates useful bfi's as expected), but I
> > > can't approve.
> > > 
> > > Wilco
> > 
> > Thanks for looking this over.  I have updated the mask check to use
> > your method and retested to make sure it still works.  Can one of the
> > aarch64 maintainers approve this patch?
> > 
> > 2018-02-11  Steve Ellcey  <sellcey@marvell.com>
> > 
> > 	PR rtl-optimization/87763
> > 	* config/aarch64/aarch64-protos.h
> > (aarch64_masks_and_shift_for_bfi_p):
> > 	New prototype.
> > 	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> > 	New function.
> > 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
> > 	New instruction.
> > 	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
> > 	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
> > 	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.
> > 
> > 2018-02-11  Steve Ellcey  <sellcey@marvell.com>
> > 
> > 	PR rtl-optimization/87763
> > 	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
> > 	to bfi.
> > 	* gcc.target/aarch64/combine_bfi_2.c: New test.
> > 
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-03-14 22:54         ` Steve Ellcey
@ 2019-04-01 17:23           ` Steve Ellcey
  2019-04-10 10:19             ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-04-01 17:23 UTC (permalink / raw)
  To: sellcey, gcc-patches, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd, Richard.Earnshaw

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

This is a ping**3 for a patch to fix one of the test failures in PR 87763.
It fixes the gcc.target/aarch64/combine_bfi_1.c failure, but not the other
ones.

Could one of the Aarch64 maintainers take a look at it?  This version of
the patch was originally submitted on February 11 after incorporating some
changes suggested by Wilco Dijkstra but I have not gotten any comments
since then despite pinging it.  I updated it to apply cleanly on ToT but
did not make any other changes since the February 11th version.

If we want to encourage people to fix bugs in the run-up to a release, it
would help to get feedback on bugfix patches.

Steve Ellcey
sellcey@marvell.com


2018-04-01  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
	New prototype.
	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
	New function.
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
	New instruction.
	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.


2018-04-01  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
	to bfi.
	* gcc.target/aarch64/combine_bfi_2.c: New test.

[-- Attachment #2: combine.patch --]
[-- Type: text/x-patch; name="combine.patch", Size: 6107 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35..b6c0d0a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b..3017e99 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+				   unsigned HOST_WIDE_INT mask1,
+				   unsigned HOST_WIDE_INT shft_amnt,
+				   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+    return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+    return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+     least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
    is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 70f0418..baa8983 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5490,6 +5490,76 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi<GPI:mode>5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI
+                           (match_operand:GPI 3 "register_operand" "r")
+                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      UINTVAL(operands[5]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no and of the ashift because
+;; the shift is large enough to remove the need for an AND instruction.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noand"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (ashift:GPI
+                          (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      HOST_WIDE_INT_M1U << UINTVAL (operands[4]) )"
+{
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[4]));
+  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %5";
+}
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
+;; copying the least significant bits of OP3 to OP0.  In this case we do
+;; need two versions of the instruction to handle different checks on the
+;; constant values.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "0")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]), 0,
+				      UINTVAL (operands[2]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")

[-- Attachment #3: combine-test.patch --]
[-- Type: text/x-patch; name="combine-test.patch", Size: 1031 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
index e69de29..145282d 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int f1(int x, int y)
+{
+  return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
+}
+
+
+int f2(int x, int y)
+{
+  return (((x <<28) & 0xf0000000)) | (y & 0xfffffff);
+}
+
+/* { dg-final { scan-assembler-times {\tbfi\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 109f989..c3b5fc5 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -114,4 +114,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
+/* { dg-final { scan-assembler-times "bfxil\\t" 3 } } */
+/* { dg-final { scan-assembler-times "bfi\\t" 15 } } */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-04-01 17:23           ` Steve Ellcey
@ 2019-04-10 10:19             ` Richard Earnshaw (lists)
  2019-04-10 20:32               ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Earnshaw (lists) @ 2019-04-10 10:19 UTC (permalink / raw)
  To: Steve Ellcey, sellcey, gcc-patches, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd

On 01/04/2019 18:23, Steve Ellcey wrote:
> This is a ping**3 for a patch to fix one of the test failures in PR 87763.
> It fixes the gcc.target/aarch64/combine_bfi_1.c failure, but not the other
> ones.
> 
> Could one of the Aarch64 maintainers take a look at it?  This version of
> the patch was originally submitted on February 11 after incorporating some
> changes suggested by Wilco Dijkstra but I have not gotten any comments
> since then despite pinging it.  I updated it to apply cleanly on ToT but
> did not make any other changes since the February 11th version.
> 
> If we want to encourage people to fix bugs in the run up to a release it
> would help to get feedback on bugfix patches.
> 
> Steve Ellcey
> sellcey@marvell.com
> 
> 
> 2018-04-01  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
> 	New prototype.
> 	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> 	New function.
> 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
> 	New instruction.
> 	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.
> 
> 
> 2018-04-01  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
> 	to bfi.
> 	* gcc.target/aarch64/combine_bfi_2.c: New test.
> 
> 
> combine.patch
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index b035e35..b6c0d0a 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
>  void aarch64_declare_function_name (FILE *, const char*, tree);
>  bool aarch64_legitimate_pic_operand_p (rtx);
>  bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
> +bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
> +					unsigned HOST_WIDE_INT,
> +					unsigned HOST_WIDE_INT);
>  bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
>  bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
>  opt_machine_mode aarch64_sve_pred_mode (unsigned int);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b38505b..3017e99 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
>  	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
>  }
>  
> +/* Return true if the masks and a shift amount from an RTX of the form
> +   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
> +   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
> +
> +bool
> +aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
> +				   unsigned HOST_WIDE_INT mask1,
> +				   unsigned HOST_WIDE_INT shft_amnt,
> +				   unsigned HOST_WIDE_INT mask2)
> +{
> +  unsigned HOST_WIDE_INT t;
> +
> +  /* Verify that there is no overlap in what bits are set in the two masks.  */
> +  if (mask1 != ~mask2)
> +    return false;
> +
> +  /* Verify that mask2 is not all zeros or ones.  */
> +  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
> +    return false;
> +
> +  /* The shift amount should always be less than the mode size.  */
> +  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
> +
> +  /* Verify that the mask being shifted is contiguous and would be in the
> +     least significant bits after shifting by shft_amnt.  */
> +  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
> +  return (t == (t & -t));
> +}
> +
>  /* Calculate the cost of calculating X, storing it in *COST.  Result
>     is true if the total cost of the operation has now been calculated.  */
>  static bool
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 70f0418..baa8983 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5490,6 +5490,76 @@
>    [(set_attr "type" "bfm")]
>  )
>  
> +;;  Match a bfi instruction where the shift of OP3 means that we are
> +;;  actually copying the least significant bits of OP3 into OP0 by way
> +;;  of the AND masks and the IOR instruction.  A similar instruction
> +;;  with the two parts of the IOR swapped around was never triggered
> +;;  in a bootstrap build and test of GCC so it was not included.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>5_shift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (ashift:GPI
> +                           (match_operand:GPI 3 "register_operand" "r")
> +                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
> +                          (match_operand:GPI 5 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
> +				      UINTVAL (operands[4]),
> +				      UINTVAL(operands[5]))"
> +  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
> +  [(set_attr "type" "bfm")]
> +)

I think we need an 'alt' variant of this with the two AND operations
swapped; there's no canonical order here.

> +
> +;; Like *aarch64_bfi<GPI:mode>5_shift but with no and of the ashift because
> +;; the shift is large enough to remove the need for an AND instruction.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noand"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (ashift:GPI
> +                          (match_operand:GPI 3 "register_operand" "r")
> +                          (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))))]

This might need it, too.

> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
> +				      UINTVAL (operands[4]),
> +				      HOST_WIDE_INT_M1U << UINTVAL (operands[4]) )"
> +{
> +  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[4]));
> +  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %5";
> +}
> +  [(set_attr "type" "bfm")]
> +)
> +
> +;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
> +;; copying the least significant bits of OP3 to OP0.  In this case we do
> +;; need two versions of the instruction to handle different checks on the
> +;; constant values.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
> +                          (match_operand:GPI 4 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
> +				      UINTVAL (operands[4]))"
> +  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
> +  [(set_attr "type" "bfm")]
> +)
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (match_operand:GPI 3 "register_operand" "0")
> +                          (match_operand:GPI 4 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]), 0,
> +				      UINTVAL (operands[2]))"

If you adjust the operand numbering to

+(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))))]

> +  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"

Then this string becomes identical to the one above and it's clear that
the two patterns are really identical (I also seem to remember that at
least historically, it was better to have a tie between adjacent operand
numbers --  but that might be a false memory and certainly might be
outdated in the LRA world).

OK with those changes.

R.


> +  [(set_attr "type" "bfm")]
> +)
> +
>  (define_insn "*extr_insv_lower_reg<mode>"
>    [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
>  			  (match_operand 1 "const_int_operand" "n")
> 
> 
> combine-test.patch
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
> index e69de29..145282d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int f1(int x, int y)
> +{
> +  return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
> +}
> +
> +
> +int f2(int x, int y)
> +{
> +  return (((x <<28) & 0xf0000000)) | (y & 0xfffffff);
> +}
> +
> +/* { dg-final { scan-assembler-times {\tbfi\t} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
> index 109f989..c3b5fc5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
> +++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
> @@ -114,4 +114,5 @@ main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
> +/* { dg-final { scan-assembler-times "bfxil\\t" 3 } } */
> +/* { dg-final { scan-assembler-times "bfi\\t" 15 } } */
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-04-10 10:19             ` Richard Earnshaw (lists)
@ 2019-04-10 20:32               ` Steve Ellcey
  2019-04-11  9:04                 ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-04-10 20:32 UTC (permalink / raw)
  To: sellcey, gcc-patches, Richard.Earnshaw, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

On Wed, 2019-04-10 at 11:10 +0100, Richard Earnshaw (lists) wrote:
> 
> OK with those changes.
> 
> R.

I made the changes you suggested and checked in the patch.  Just to be
complete, here is the final version of the patch that I checked in.

2018-04-10  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
	New prototype.
	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
	New function.
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
	New instruction.
	(*aarch64_bfi<GPI:mode>5_shift_alt): Ditto.
	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
	(*aarch64_bfi<GPI:mode>4_noand_alt): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.

2018-04-10  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
	to bfi.
	* gcc.target/aarch64/combine_bfi_2.c: New test.

[-- Attachment #2: combine.patch --]
[-- Type: text/x-patch; name="combine.patch", Size: 6895 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35..b6c0d0a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 95e5b03..9be7548 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+				   unsigned HOST_WIDE_INT mask1,
+				   unsigned HOST_WIDE_INT shft_amnt,
+				   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+    return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+    return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+     least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
    is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ab8786a..e0df975 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5491,6 +5491,94 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi<GPI:mode>5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI
+                           (match_operand:GPI 3 "register_operand" "r")
+                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      UINTVAL(operands[5]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi<GPI:mode>5_shift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (ashift:GPI
+                           (match_operand:GPI 1 "register_operand" "r")
+                           (match_operand:GPI 2 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 3 "const_int_operand" "n"))
+		 (and:GPI (match_operand:GPI 4 "register_operand" "0")
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[5]),
+				      UINTVAL (operands[2]),
+				      UINTVAL(operands[3]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>1, %2, %P3"
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no and of the ashift because
+;; the shift is large enough to remove the need for an AND instruction.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noand"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (ashift:GPI
+                          (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      HOST_WIDE_INT_M1U << UINTVAL (operands[4]) )"
+{
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[4]));
+  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %5";
+}
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noand_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (ashift:GPI
+                          (match_operand:GPI 1 "register_operand" "r")
+                          (match_operand:GPI 2 "aarch64_simd_shift_imm_<mode>" "n"))
+		 (and:GPI (match_operand:GPI 3 "register_operand" "0")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]),
+				      UINTVAL (operands[2]),
+				      HOST_WIDE_INT_M1U << UINTVAL (operands[2]) )"
+{
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[2]));
+  return "bfi\t%<GPI:w>0, %<GPI:w>1, %2, %5";
+}
+  [(set_attr "type" "bfm")]
+)
+
+;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
+;; copying the least significant bits of OP3 to OP0.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")
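
(Editorial illustration, not part of the submitted patch: the sort of C
source the *aarch64_bfi<GPI:mode>5_shift pattern above is meant to catch.
The function name and constants are invented for the example.)

/* Insert the low 8 bits of x into bits [15:8] of y.  The expression has
   the ((y & MASK1) | ((x << SHIFT) & MASK2)) shape with MASK1 == ~MASK2
   and MASK2 a contiguous run starting at bit SHIFT, so at -O2 it should
   collapse to a single bfi (e.g. "bfi w0, w1, 8, 8").  */
unsigned int
insert_byte (unsigned int x, unsigned int y)
{
  return (y & ~0xff00u) | ((x << 8) & 0xff00u);
}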

[-- Attachment #3: combine-test.patch --]
[-- Type: text/x-patch; name="combine-test.patch", Size: 1031 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
index e69de29..145282d 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int f1(int x, int y)
+{
+  return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
+}
+
+
+int f2(int x, int y)
+{
+  return (((x <<28) & 0xf0000000)) | (y & 0xfffffff);
+}
+
+/* { dg-final { scan-assembler-times {\tbfi\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 109f989..a2fb31c 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -114,4 +114,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
+/* { dg-final { scan-assembler-times "bfxil\\t" 7 } } */
+/* { dg-final { scan-assembler-times "bfi\\t" 11 } } */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-04-10 20:32               ` Steve Ellcey
@ 2019-04-11  9:04                 ` Richard Earnshaw (lists)
  2019-04-11 15:14                   ` [EXT] " Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Earnshaw (lists) @ 2019-04-11  9:04 UTC (permalink / raw)
  To: Steve Ellcey, sellcey, gcc-patches, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd

On 10/04/2019 21:31, Steve Ellcey wrote:
> On Wed, 2019-04-10 at 11:10 +0100, Richard Earnshaw (lists) wrote:
>>
>> OK with those changes.
>>
>> R.
> 
> I made the changes you suggested and checked in the patch.  Just to be
> complete, here is the final version of the patch that I checked in.
> 
> 2018-04-10  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
> 	New prototype.
> 	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> 	New function.
> 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
> 	New instruction.
> 	(*aarch64_bfi<GPI:mode>5_shift_alt): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noand_alt): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
> 

You've removed the ..._noshift_alt variant.  That wasn't my intention,
so perhaps you misunderstood what I was trying to say.

The two versions are both needed, since the register tie is not
orthogonal to the constants used in the masks and canonicalization will
not generate a consistent ordering of the operands.

My point is that you can make the 'alt' version from

+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)

by simply duplicating it and permuting the numbering of the operands in
the pattern part; the remaining parts remain identical, including the
final condition and the insn printing string:

+(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))))]

Operand order permuted

+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"

does not change

+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"

does not change

+  [(set_attr "type" "bfm")]
+)

R.
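
(Editorial aside, in the spirit of the f1/f2 pair in combine_bfi_2.c
shown further up this page: the two functions below differ only in the
order of the IOR arms, and combine does not promise which order will
reach the matcher, which is why both the plain and the _alt noshift
patterns are wanted.  The function names are invented for the example.)

int
keep_high_insert_low (int x, int y)
{
  return (y & 0xffffff00) | (x & 0xff);
}

int
insert_low_keep_high (int x, int y)
{
  return (x & 0xff) | (y & 0xffffff00);
}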

> 2018-04-10  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
> 	to bfi.
> 	* gcc.target/aarch64/combine_bfi_2.c: New test.
> 
> 
> combine.patch
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index b035e35..b6c0d0a 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
>  void aarch64_declare_function_name (FILE *, const char*, tree);
>  bool aarch64_legitimate_pic_operand_p (rtx);
>  bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
> +bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
> +					unsigned HOST_WIDE_INT,
> +					unsigned HOST_WIDE_INT);
>  bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
>  bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
>  opt_machine_mode aarch64_sve_pred_mode (unsigned int);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 95e5b03..9be7548 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
>  	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
>  }
>  
> +/* Return true if the masks and a shift amount from an RTX of the form
> +   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
> +   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
> +
> +bool
> +aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
> +				   unsigned HOST_WIDE_INT mask1,
> +				   unsigned HOST_WIDE_INT shft_amnt,
> +				   unsigned HOST_WIDE_INT mask2)
> +{
> +  unsigned HOST_WIDE_INT t;
> +
> +  /* Verify that there is no overlap in what bits are set in the two masks.  */
> +  if (mask1 != ~mask2)
> +    return false;
> +
> +  /* Verify that mask2 is not all zeros or ones.  */
> +  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
> +    return false;
> +
> +  /* The shift amount should always be less than the mode size.  */
> +  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
> +
> +  /* Verify that the mask being shifted is contiguous and would be in the
> +     least significant bits after shifting by shft_amnt.  */
> +  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
> +  return (t == (t & -t));
> +}
> +
>  /* Calculate the cost of calculating X, storing it in *COST.  Result
>     is true if the total cost of the operation has now been calculated.  */
>  static bool
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index ab8786a..e0df975 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5491,6 +5491,94 @@
>    [(set_attr "type" "bfm")]
>  )
>  
> +;;  Match a bfi instruction where the shift of OP3 means that we are
> +;;  actually copying the least significant bits of OP3 into OP0 by way
> +;;  of the AND masks and the IOR instruction.  A similar instruction
> +;;  with the two parts of the IOR swapped around was never triggered
> +;;  in a bootstrap build and test of GCC so it was not included.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>5_shift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (ashift:GPI
> +                           (match_operand:GPI 3 "register_operand" "r")
> +                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
> +                          (match_operand:GPI 5 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
> +				      UINTVAL (operands[4]),
> +				      UINTVAL(operands[5]))"
> +  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
> +  [(set_attr "type" "bfm")]
> +)
> +
> +(define_insn "*aarch64_bfi<GPI:mode>5_shift_alt"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (ashift:GPI
> +                           (match_operand:GPI 1 "register_operand" "r")
> +                           (match_operand:GPI 2 "aarch64_simd_shift_imm_<mode>" "n"))
> +                          (match_operand:GPI 3 "const_int_operand" "n"))
> +		 (and:GPI (match_operand:GPI 4 "register_operand" "0")
> +                          (match_operand:GPI 5 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[5]),
> +				      UINTVAL (operands[2]),
> +				      UINTVAL(operands[3]))"
> +  "bfi\t%<GPI:w>0, %<GPI:w>1, %2, %P3"
> +  [(set_attr "type" "bfm")]
> +)
> +
> +;; Like *aarch64_bfi<GPI:mode>5_shift but with no and of the ashift because
> +;; the shift is large enough to remove the need for an AND instruction.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noand"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (ashift:GPI
> +                          (match_operand:GPI 3 "register_operand" "r")
> +                          (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
> +				      UINTVAL (operands[4]),
> +				      HOST_WIDE_INT_M1U << UINTVAL (operands[4]) )"
> +{
> +  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[4]));
> +  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %5";
> +}
> +  [(set_attr "type" "bfm")]
> +)
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noand_alt"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (ashift:GPI
> +                          (match_operand:GPI 1 "register_operand" "r")
> +                          (match_operand:GPI 2 "aarch64_simd_shift_imm_<mode>" "n"))
> +		 (and:GPI (match_operand:GPI 3 "register_operand" "0")
> +                          (match_operand:GPI 4 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]),
> +				      UINTVAL (operands[2]),
> +				      HOST_WIDE_INT_M1U << UINTVAL (operands[2]) )"
> +{
> +  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - UINTVAL (operands[2]));
> +  return "bfi\t%<GPI:w>0, %<GPI:w>1, %2, %5";
> +}
> +  [(set_attr "type" "bfm")]
> +)
> +
> +;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
> +;; copying the least significant bits of OP3 to OP0.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
> +                          (match_operand:GPI 4 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
> +				      UINTVAL (operands[4]))"
> +  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
> +  [(set_attr "type" "bfm")]
> +)
> +
>  (define_insn "*extr_insv_lower_reg<mode>"
>    [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
>  			  (match_operand 1 "const_int_operand" "n")
> 
> 
> combine-test.patch
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
> index e69de29..145282d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_2.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int f1(int x, int y)
> +{
> +  return (y & 0xfffffff) | (((x <<28) & 0xf0000000));
> +}
> +
> +
> +int f2(int x, int y)
> +{
> +  return (((x <<28) & 0xf0000000)) | (y & 0xfffffff);
> +}
> +
> +/* { dg-final { scan-assembler-times {\tbfi\t} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
> index 109f989..a2fb31c 100644
> --- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
> +++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
> @@ -114,4 +114,5 @@ main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
> +/* { dg-final { scan-assembler-times "bfxil\\t" 7 } } */
> +/* { dg-final { scan-assembler-times "bfi\\t" 11 } } */
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [EXT] Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-04-11  9:04                 ` Richard Earnshaw (lists)
@ 2019-04-11 15:14                   ` Steve Ellcey
  2019-04-11 15:55                     ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-04-11 15:14 UTC (permalink / raw)
  To: sellcey, gcc-patches, Richard.Earnshaw, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd

On Thu, 2019-04-11 at 09:59 +0100, Richard Earnshaw (lists) wrote:
> 
> > 
> > 2018-04-10  Steve Ellcey  <sellcey@marvell.com>
> > 
> > 	PR rtl-optimization/87763
> > 	* config/aarch64/aarch64-protos.h
> > (aarch64_masks_and_shift_for_bfi_p):
> > 	New prototype.
> > 	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> > 	New function.
> > 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
> > 	New instruction.
> > 	(*aarch64_bfi<GPI:mode>5_shift_alt): Ditto.
> > 	(*aarch64_bfi<GPI:mode>4_noand): Ditto.
> > 	(*aarch64_bfi<GPI:mode>4_noand_alt): Ditto.
> > 	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
> > 
> 
> You've removed the ..._noshift_alt variant.  That wasn't my intention,
> so perhaps you misunderstood what I was trying to say.
>
> The two versions are both needed, since the register tie is not
> orthogonal to the constants used in the masks and canonicalization will
> not generate a consistent ordering of the operands.

I started doing this and then convinced myself (perhaps incorrectly)
that I didn't need the alt version.  Operands 1 and 3 are registers
and Operands 2 and 4 are constants, so the only difference is in the 
call to aarch64_masks_and_shift_for_bfi_p.  Given that argument 2 to
this call is 0 this call should be the equivalent of ((x & MASK1) | (y
& MASK2)) and that should mean that:

aarch64_masks_and_shift_for_bfi_p(X,0,Y) ==
aarch64_masks_and_shift_for_bfi_p(Y,0,X)


Maybe I am wrong about that?  I will do some experiments.  My testing
did not find any cases in the testsuite where not having the _alt
version resulted in a bfi not being generated.

Steve Ellcey
sellcey@marvell.com
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-04-11 15:14                   ` [EXT] " Steve Ellcey
@ 2019-04-11 15:55                     ` Steve Ellcey
  2019-04-11 16:01                       ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Ellcey @ 2019-04-11 15:55 UTC (permalink / raw)
  To: sellcey, gcc-patches, Richard.Earnshaw, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd

On Thu, 2019-04-11 at 14:58 +0000, Steve Ellcey wrote:
> 
> > You've removed the ..._noshift_alt variant.  That wasn't my
> > intention,
> > so perhaps you misunderstood what I was trying to say.
> > 
> > The two versions are both needed, since the register tie is not
> > orthogonal to the constants used in the masks and canonicalization
> > will
> > not generate a consistent ordering of the operands.
> 
> I started doing this and then convinced myself (perhaps incorrectly)
> that I didn't need the alt version.  Operands 1 and 3 are registers
> and Operands 2 and 4 are constants, so the only difference is in the 
> call to aarch64_masks_and_shift_for_bfi_p.  Given that argument 2 to
> this call is 0 this call should be the equivalent of ((x & MASK1) |
> (y
> & MASK2)) and that should mean that:
> 
> aarch64_masks_and_shift_for_bfi_p(X,0,Y) ==
> aarch64_masks_and_shift_for_bfi_p(Y,0,X)
> 
> 
> Maybe I am wrong about that?  I will do some experiments.  My testing
> did not find any cases in the testsuite where not having the _alt
> version resulted in a bfi not being generated.

OK, I think I see where my idea that I didn't need both versions is
wrong.  There is an extra check on the final argument that is not done
on the initial argument.  Here is the patch that I am testing, I think
it is what you have in mind.  It looks weird having arguments 3 and 4
before 1 and 2, I think that is why I had it differently in my original
patch but if you prefer this version, that is fine with me.
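
(Editorial sketch of that extra check, using a helper name that does not
exist in the patch: for the shift-free case the predicate only requires
mask1 == ~mask2, but mask2 additionally has to be a contiguous run
starting at bit 0, so swapping the two mask arguments can flip the
answer.)

/* Rough stand-alone model of aarch64_masks_and_shift_for_bfi_p with
   shft_amnt == 0.  */
int
bfi_noshift_ok (unsigned long long mask1, unsigned long long mask2)
{
  unsigned long long t;

  if (mask1 != ~mask2 || mask2 == 0 || mask2 == ~0ULL)
    return 0;
  t = mask2 + 1;          /* shift amount is 0 */
  return t == (t & -t);   /* power of two <=> contiguous low mask */
}

/* bfi_noshift_ok (0xffffffffffff0000ULL, 0xffffULL) holds, but swapping
   the two arguments makes it fail, so the order in which the masks are
   passed to the predicate matters.  */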

OK to check in once my build/test is finished?



2018-04-11  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>4_noshift):
	New Instruction.


diff --git a/gcc/config/aarch64/aarch64.md
b/gcc/config/aarch64/aarch64.md
index e0df975..eac688a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5565,7 +5565,8 @@
 )
 
 ;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
-;; copying the least significant bits of OP3 to OP0.
+;; copying the least significant bits of OP3 to OP0.  We need two versions
+;; of the instruction to handle different checks on the constant values.
 
 (define_insn "*aarch64_bfi<GPI:mode>4_noshift"
   [(set (match_operand:GPI 0 "register_operand" "=r")
@@ -5579,6 +5580,18 @@
   [(set_attr "type" "bfm")]
 )
 
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 3 "register_operand" "0")
+                          (match_operand:GPI 4 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 1 "register_operand" "r")
+                          (match_operand:GPI 2 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-04-11 15:55                     ` Steve Ellcey
@ 2019-04-11 16:01                       ` Richard Earnshaw (lists)
  2019-04-11 18:50                         ` Steve Ellcey
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Earnshaw (lists) @ 2019-04-11 16:01 UTC (permalink / raw)
  To: Steve Ellcey, sellcey, gcc-patches, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd

On 11/04/2019 16:21, Steve Ellcey wrote:
> On Thu, 2019-04-11 at 14:58 +0000, Steve Ellcey wrote:
>>
>>> You've removed the ..._noshift_alt variant.  That wasn't my
>>> intention,
>>> so perhaps you misunderstood what I was trying to say.
>>>
>>> The two versions are both needed, since the register tie is not
>>> orthogonal to the constants used in the masks and canonicalization
>>> will
>>> not generate a consistent ordering of the operands.
>>
>> I started doing this and then convinced myself (perhaps incorrectly)
>> that I didn't need the alt version.  Operands 1 and 3 are registers
>> and Operands 2 and 4 are constants, so the only difference is in the 
>> call to aarch64_masks_and_shift_for_bfi_p.  Given that argument 2 to
>> this call is 0 this call should be the equivalent of ((x & MASK1) |
>> (y
>> & MASK2)) and that should mean that:
>>
>> aarch64_masks_and_shift_for_bfi_p(X,0,Y) ==
>> aarch64_masks_and_shift_for_bfi_p(Y,0,X)
>>
>>
>> Maybe I am wrong about that?  I will do some experiments.  My testing
>> did not find any cases in the testsuite where not having the _alt
>> version resulted in a bfi not being generated.
> 
> OK, I think I see where my idea that I didn't need both versions is
> wrong.  There is an extra check on the final argument that is not done
> on the initial argument.  Here is the patch that I am testing, I think
> it is what you have in mind.  It looks weird having arguments 3 and 4
> before 1 and 2, I think that is why I had it differently in my original
> patch but if you prefer this version, that is fine with me.
> 
> OK to check in once my build/test is finished?
> 
> 
> 
> 2018-04-11  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>4_noshift):
> 	New Instruction.
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md
> index e0df975..eac688a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5565,7 +5565,8 @@
>  )
>  
>  ;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
> -;; copying the least significant bits of OP3 to OP0.
> +;; copying the least significant bits of OP3 to OP0.  We need two versions
> +;; of the instruction to handle different checks on the constant values.
>  
>  (define_insn "*aarch64_bfi<GPI:mode>4_noshift"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
> @@ -5579,6 +5580,18 @@
>    [(set_attr "type" "bfm")]
>  )
>  
> +(define_insn "*aarch64_bfi<GPI:mode>4_noshift"

Please add _alt at the end, to distinguish from the insn above.

Otherwise OK.

R.

> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 3 "register_operand" "0")
> +                          (match_operand:GPI 4 "const_int_operand" "n"))
> +                 (and:GPI (match_operand:GPI 1 "register_operand" "r")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
> +				      UINTVAL (operands[4]))"
> +  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
> +  [(set_attr "type" "bfm")]
> +)
> +
>  (define_insn "*extr_insv_lower_reg<mode>"
>    [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
>  			  (match_operand 1 "const_int_operand" "n")
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-04-11 16:01                       ` Richard Earnshaw (lists)
@ 2019-04-11 18:50                         ` Steve Ellcey
  0 siblings, 0 replies; 21+ messages in thread
From: Steve Ellcey @ 2019-04-11 18:50 UTC (permalink / raw)
  To: sellcey, gcc-patches, Richard.Earnshaw, Wilco.Dijkstra
  Cc: Marcus.Shawcroft, james.greenhalgh, nd

On Thu, 2019-04-11 at 17:00 +0100, Richard Earnshaw (lists) wrote:
> 
> 
> Please add _alt at the end, to distinguish from the insn above.
> 
> Otherwise OK.

I added _alt.  I also had to move the "0" constraint and the "r"
constraint.  I had "0" with operand 3 instead of operand 1 and
that caused a couple of test failures.  I fixed it and the regressions
went away.  I also had to tweak gcc.target/aarch64/combine_bfxil.c,
some of the bfxil instructions became bfi instructions so I updated
the scan checks.

Steve Ellcey

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch]  PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-01-29  0:22   ` Steve Ellcey
  2019-01-29  0:47     ` Jakub Jelinek
@ 2019-02-04 16:55     ` Steve Ellcey
  1 sibling, 0 replies; 21+ messages in thread
From: Steve Ellcey @ 2019-02-04 16:55 UTC (permalink / raw)
  To: Marcus.Shawcroft, james.greenhalgh, jakub, Richard.Earnshaw,
	richard.sandiford
  Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2341 bytes --]

Ping.  And adding AArch64 maintainers.


On Mon, 2019-01-28 at 16:11 -0800, Steve Ellcey wrote:
> On Sat, 2019-01-26 at 00:00 +0100, Jakub Jelinek wrote:
> > 
> > > +  /* Verify that there is no overlap in what bits are set in the
> > > two masks.  */
> > > +  if ((m1 + m2 + 1) != 0)
> > > +    return false;
> > 
> > Wouldn't that be clearer to test
> >   if (m1 + m2 != HOST_WIDE_INT_1U)
> >     return false;
> > ?
> > You IMHO also should test
> >   if ((m1 & m2) != 0)
> >     return false;
> 
> I think you mean HOST_WIDE_INT_M1U, not HOST_WIDE_INT_1U.  That does
> seem clearer and I made that change as well as adding the second
> test.
> 
> > > 
> > > +  t = popcount_hwi (m2);
> > > +  t = (1 << t) - 1;
> > 
> > This should be (HOST_WIDE_INT_1U << t), otherwise if popcount of m2
> > is >= 32, there will be UB.
> 
> Fixed.
> 
> > As mentioned in rs6000.md, I believe you also need a similar
> > pattern where
> > the two ANDs are swapped, because they have the same priority.
> 
> I fixed the long lines in aarch64.md and I added a second pattern for
> the *aarch64_bfi<GPI:mode>4_noshift pattern, swapping the order of
> the
> IOR operands.  I did not swap the AND operands, I assume the compiler
> would ensure that the register was always before the constant or that
> it would check both orderings.
> 
> I tried adding a second version of *aarch64_bfi<GPI:mode>5_shift as
> well but when I tested it, it never got used during a bootstrap build
> or a GCC test run so I decided it was not needed.
> 
> Tested on aarch64 with a bootstrap build and a GCC testsuite run.
> OK to checkin?
> 
> 
> Steve Ellcey
> sellcey@marvell.com
> 
> 
> 2018-01-28  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64-protos.h
> (aarch64_masks_and_shift_for_bfi_p):
> 	New prototype.
> 	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> 	New function.
> 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
> 	New instruction.
> 	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
> 	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.
> 
> 
> 2018-01-28  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
> 	to bfi.
> 
> 

[-- Attachment #2: combine.patch --]
[-- Type: text/x-patch; name="combine.patch", Size: 5357 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..9a3080b5db8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,41 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+				   unsigned HOST_WIDE_INT mask1,
+				   unsigned HOST_WIDE_INT shft_amnt,
+				   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if ((mask1 + mask2) != HOST_WIDE_INT_M1U)
+    return false;
+  if ((mask1 & mask2) != 0)
+    return false;
+
+  /* Verify that the shift amount is less than the mode size.  */
+  if (shft_amnt >= GET_MODE_BITSIZE (mode))
+    return false;
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+     least significant bits after shifting by creating a mask 't' based on
+     the number of bits set in mask2 and the shift amount for mask2 and
+     comparing that to the actual mask2.  */
+  t = popcount_hwi (mask2);
+  t = (HOST_WIDE_INT_1U << t) - 1;
+  t = t << shft_amnt;
+  if (t != mask2)
+    return false;
+  
+  return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
    is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..6b5339092ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,56 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi<GPI:mode>5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI
+                           (match_operand:GPI 3 "register_operand" "r")
+                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      UINTVAL(operands[5]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+;; Like the above instruction but with no shifting, we are just copying
+;; the least significant bits of OP3 to OP0.  In this case we do need
+;; two versions of the instruction to handle different checks on the
+;; constant values.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "0")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]), 0,
+				      UINTVAL (operands[2]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")

[-- Attachment #3: combine-test.patch --]
[-- Type: text/x-patch; name="combine-test.patch", Size: 485 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 109f989a2f0..c3b5fc58800 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -114,4 +114,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
+/* { dg-final { scan-assembler-times "bfxil\\t" 3 } } */
+/* { dg-final { scan-assembler-times "bfi\\t" 15 } } */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch]  PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-01-29  0:22   ` Steve Ellcey
@ 2019-01-29  0:47     ` Jakub Jelinek
  2019-02-04 16:55     ` Steve Ellcey
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Jelinek @ 2019-01-29  0:47 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: Richard.Earnshaw, gcc-patches

On Tue, Jan 29, 2019 at 12:11:46AM +0000, Steve Ellcey wrote:
> > As mentioned in rs6000.md, I believe you also need a similar pattern where
> > the two ANDs are swapped, because they have the same priority.
> 
> I fixed the long lines in aarch64.md and I added a second pattern for
> the *aarch64_bfi<GPI:mode>4_noshift pattern, swapping the order of the
> IOR operands.  I did not swap the AND operands, I assume the compiler
> would ensure that the register was always before the constant or that
> it would check both orderings.

Yes, you can look at commutative_operand_precedence and
swap_commutative_operands_p.  The issue is just if commutative_operand_precedence
of both operands is equal (and that function doesn't recurse for
suboperands).

> I tried adding a second version of *aarch64_bfi<GPI:mode>5_shift as
> well but when I tested it, it never got used during a bootstrap build
> or a GCC test run so I decided it was not needed.

I'll try tomorrow to see whether I can construct a testcase or not.

In any case, you want an aarch64 maintainer to ack this.

Thanks for working on this.

	Jakub

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch]  PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-01-25 23:22 ` Jakub Jelinek
@ 2019-01-29  0:22   ` Steve Ellcey
  2019-01-29  0:47     ` Jakub Jelinek
  2019-02-04 16:55     ` Steve Ellcey
  0 siblings, 2 replies; 21+ messages in thread
From: Steve Ellcey @ 2019-01-29  0:22 UTC (permalink / raw)
  To: jakub, Richard.Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2081 bytes --]

On Sat, 2019-01-26 at 00:00 +0100, Jakub Jelinek wrote:
> 
> > +  /* Verify that there is no overlap in what bits are set in the two masks.  */
> > +  if ((m1 + m2 + 1) != 0)
> > +    return false;
> 
> Wouldn't that be clearer to test
>   if (m1 + m2 != HOST_WIDE_INT_1U)
>     return false;
> ?
> You IMHO also should test
>   if ((m1 & m2) != 0)
>     return false;

I think you mean HOST_WIDE_INT_M1U, not HOST_WIDE_INT_1U.  That does
seem clearer and I made that change as well as adding the second test.

> > 
> > +  t = popcount_hwi (m2);
> > +  t = (1 << t) - 1;
> 
> > This should be (HOST_WIDE_INT_1U << t), otherwise if popcount of m2
> > is >= 32, there will be UB.

Fixed.

> As mentioned in rs6000.md, I believe you also need a similar pattern where
> the two ANDs are swapped, because they have the same priority.

I fixed the long lines in aarch64.md and I added a second pattern for
the *aarch64_bfi<GPI:mode>4_noshift pattern, swapping the order of the
IOR operands.  I did not swap the AND operands, I assume the compiler
would ensure that the register was always before the constant or that
it would check both orderings.

I tried adding a second version of *aarch64_bfi<GPI:mode>5_shift as
well but when I tested it, it never got used during a bootstrap build
or a GCC test run so I decided it was not needed.

Tested on aarch64 with a bootstrap build and a GCC testsuite run.
OK to checkin?


Steve Ellcey
sellcey@marvell.com


2018-01-28  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
	New prototype.
	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
	New function.
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>5_shift):
	New instruction.
	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
	(*aarch64_bfi<GPI:mode>4_noshift_alt): Ditto.


2018-01-28  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
	to bfi.



[-- Attachment #2: combine.patch --]
[-- Type: text/x-patch; name="combine.patch", Size: 5357 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..9a3080b5db8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,41 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+				   unsigned HOST_WIDE_INT mask1,
+				   unsigned HOST_WIDE_INT shft_amnt,
+				   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if ((mask1 + mask2) != HOST_WIDE_INT_M1U)
+    return false;
+  if ((mask1 & mask2) != 0)
+    return false;
+
+  /* Verify that the shift amount is less than the mode size.  */
+  if (shft_amnt >= GET_MODE_BITSIZE (mode))
+    return false;
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+     least significant bits after shifting by creating a mask 't' based on
+     the number of bits set in mask2 and the shift amount for mask2 and
+     comparing that to the actual mask2.  */
+  t = popcount_hwi (mask2);
+  t = (HOST_WIDE_INT_1U << t) - 1;
+  t = t << shft_amnt;
+  if (t != mask2)
+    return false;
+  
+  return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
    is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..6b5339092ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,56 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi<GPI:mode>5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI
+                           (match_operand:GPI 3 "register_operand" "r")
+                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]),
+				      UINTVAL (operands[4]),
+				      UINTVAL(operands[5]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+;; Like the above instruction but with no shifting, we are just copying
+;; the least significant bits of OP3 to OP0.  In this case we do need
+;; two versions of the instruction to handle different checks on the
+;; constant values.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[2]), 0,
+				      UINTVAL (operands[4]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "0")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_bfi_p (<MODE>mode, UINTVAL (operands[4]), 0,
+				      UINTVAL (operands[2]))"
+  "bfi\t%<GPI:w>0, %<GPI:w>1, 0, %P2"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")

[-- Attachment #3: combine-test.patch --]
[-- Type: text/x-patch; name="combine-test.patch", Size: 485 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 109f989a2f0..c3b5fc58800 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -114,4 +114,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */
+/* { dg-final { scan-assembler-times "bfxil\\t" 3 } } */
+/* { dg-final { scan-assembler-times "bfi\\t" 15 } } */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch]  PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-01-25  0:11 Steve Ellcey
  2019-01-25 10:40 ` Richard Earnshaw (lists)
@ 2019-01-25 23:22 ` Jakub Jelinek
  2019-01-29  0:22   ` Steve Ellcey
  1 sibling, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2019-01-25 23:22 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: gcc-patches

On Thu, Jan 24, 2019 at 11:17:45PM +0000, Steve Ellcey wrote:
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9294,6 +9294,44 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
>  	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
>  }
>  
> +/* Return true if the masks and a shift amount from an RTX of the form
> +   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
> +   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
> +
> +bool
> +aarch64_masks_and_shift_for_aarch64_bfi_p (scalar_int_mode mode, rtx mask1,
> +					   rtx shft_amnt, rtx mask2)
> +{
> +  unsigned HOST_WIDE_INT m1, m2, s, t;
> +
> +  if (!CONST_INT_P (mask1) || !CONST_INT_P (mask2) || !CONST_INT_P (shft_amnt))
> +    return false;
> +
> +  m1 = UINTVAL (mask1);
> +  m2 = UINTVAL (mask2);
> +  s = UINTVAL (shft_amnt);
> +
> +  /* Verify that there is no overlap in what bits are set in the two masks.  */
> +  if ((m1 + m2 + 1) != 0)
> +    return false;

Wouldn't that be clearer to test
  if (m1 + m2 != HOST_WIDE_INT_1U)
    return false;
?
You IMHO also should test
  if ((m1 & m2) != 0)
    return false;

> +
> +  /* Verify that the shift amount is less than the mode size.  */
> +  if (s >= GET_MODE_BITSIZE (mode))
> +    return false;
> +
> +  /* Verify that the mask being shifted is contigious and would be in the
> +     least significant bits after shifting by creating a mask 't' based on
> +     the number of bits set in mask2 and the shift amount for mask2 and
> +     comparing that to the actual mask2.  */
> +  t = popcount_hwi (m2);
> +  t = (1 << t) - 1;

This should be (HOST_WIDE_INT_1U << t), otherwise if popcount of m2
is >= 32, there will be UB.
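
(Editorial illustration of that point, not from the thread: the literal 1
in "(1 << t)" has type int, so once the popcount reaches 32 the shift is
undefined, while a 64-bit constant is fine.)

#include <stdio.h>

int
main (void)
{
  unsigned long long m2 = 0xffffffffffULL;      /* 40 bits set */
  int t = __builtin_popcountll (m2);            /* t == 40 */

  /* unsigned long long bad = (1 << t) - 1;  undefined: 1 is a 32-bit int */
  unsigned long long good = (1ULL << t) - 1;    /* well defined */

  printf ("%d %llx\n", t, good);                /* prints: 40 ffffffffff */
  return 0;
}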

> +  t = t << s;
> +  if (t != m2)
> +    return false;
> +  
> +  return true;
> +}
> +

> +;;  Match a bfi instruction where the shift of OP3 means that we are
> +;;  actually copying the least significant bits of OP3 into OP0 by way
> +;;  of the AND masks and the IOR instruction.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_shift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (ashift:GPI
> +                           (match_operand:GPI 3 "register_operand" "r")
> +                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
> +                          (match_operand:GPI 5 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_aarch64_bfi_p (<MODE>mode, operands[2], operands[4], operands[5])"

Too long lines.

> +{
> +  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5";
> +}
> +  [(set_attr "type" "bfm")]
> +)

As mentioned in rs6000.md, I believe you also need a similar pattern where
the two ANDs are swapped, because they have the same priority.

> +
> +;; Like the above instruction but with no shifting, we are just copying the
> +;; least significant bits of OP3 to OP0.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
> +                          (match_operand:GPI 4 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_aarch64_bfi_p (<MODE>mode, operands[2], const0_rtx, operands[4])"

Too long line.
I guess this one has a similar issue in that you might need two patterns for
AND orderings (though the "0" needs to be on the right argument in each
case).

	Jakub

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-01-25 10:40 ` Richard Earnshaw (lists)
@ 2019-01-25 22:16   ` Steve Ellcey
  0 siblings, 0 replies; 21+ messages in thread
From: Steve Ellcey @ 2019-01-25 22:16 UTC (permalink / raw)
  To: gcc-patches, Richard.Earnshaw

On Fri, 2019-01-25 at 10:32 +0000, Richard Earnshaw (lists) wrote:
> 
> Do we need another variant pattern to handle the case where the
> insertion is into the top of the destination?  In that case the
> immediate mask on the shifted operand is technically redundant as the
> bottom bits are known zero and there are no top bits.

I am not sure about this.  Do you have an example where this might be
needed?
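
(Editorial note: the f1() testcase shown further up this page, and the
*aarch64_bfi<GPI:mode>4_noand patterns in the final version of the patch,
cover exactly this situation.  A minimal sketch, with an invented
function name:)

/* Insert the low 4 bits of x into the top 4 bits of y.  After the shift
   by 28 the low bits of (x << 28) are already zero, so a 0xf0000000 mask
   on the shifted operand would be redundant and combine can drop it,
   leaving an IOR of a masked register and a bare shift.  */
unsigned int
insert_top_nibble (unsigned int x, unsigned int y)
{
  return (y & 0x0fffffffu) | (x << 28);
}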

I updated my patch to address your other comments and have bootstrapped
and tested this on aarch64.  Does this version look good for checkin?
I had to modify the two tests because with my new instructions we
sometimes generate bfi instructions where we used to generate bfxil
instructions.  The only regression this is fixing is combine_bfi_1.c,
combine_bfxil.c was passing before but still needed to be changed in
order to keep passing.

Steve Ellcey
sellcey@marvell.com


2018-01-25  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
	New prototype.
	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
	New function.
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>4_shift):
	New instruction.
	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.


2018-01-25  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* gcc.target/aarch64/combine_bfi_1.c: Change some bfxil checks
	to bfi.
	* gcc.target/aarch64/combine_bfxil.c: Ditto.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64
  2019-01-25  0:11 Steve Ellcey
@ 2019-01-25 10:40 ` Richard Earnshaw (lists)
  2019-01-25 22:16   ` Steve Ellcey
  2019-01-25 23:22 ` Jakub Jelinek
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Earnshaw (lists) @ 2019-01-25 10:40 UTC (permalink / raw)
  To: Steve Ellcey, gcc-patches

On 24/01/2019 23:17, Steve Ellcey wrote:
> Here is my attempt at creating a couple of new instructions to
> generate more bfi instructions on aarch64.  I haven't finished
> testing this but it helps with gcc.target/aarch64/combine_bfi_1.c.
> 
> Before I went any further with it I wanted to see if anyone
> else was working on something like this and if this seems like
> a reasonable approach.
> 
> Steve Ellcey
> sellcey@marvell.com
> 
> 
> 2018-01-24  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64-protos.h
> 	(aarch64_masks_and_shift_for_aarch64_bfi_p): New prototype.
> 	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_aarch64_bfi_p):
> 	New function.
> 	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>4_shift):
> 	New instruction.
> 	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.
> 
> 
> 
> combine.patch
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index b035e35..ec90053 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -429,6 +429,7 @@ bool aarch64_label_mentioned_p (rtx);
>  void aarch64_declare_function_name (FILE *, const char*, tree);
>  bool aarch64_legitimate_pic_operand_p (rtx);
>  bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
> +bool aarch64_masks_and_shift_for_aarch64_bfi_p (scalar_int_mode, rtx, rtx, rtx);
>  bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
>  bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
>  opt_machine_mode aarch64_sve_pred_mode (unsigned int);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5df5a8b..69cc69f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9294,6 +9294,44 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
>  	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
>  }
>  
> +/* Return true if the masks and a shift amount from an RTX of the form
> +   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
> +   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
> +
> +bool
> +aarch64_masks_and_shift_for_aarch64_bfi_p (scalar_int_mode mode, rtx mask1,
> +					   rtx shft_amnt, rtx mask2)
> +{
> +  unsigned HOST_WIDE_INT m1, m2, s, t;
> +
> +  if (!CONST_INT_P (mask1) || !CONST_INT_P (mask2) || !CONST_INT_P (shft_amnt))
> +    return false;

Why not change the callers to pass HWI values directly?  They all have
either immediate-only restrictions in the patterns or pass a constructed
immediate value anyway.

> +
> +  m1 = UINTVAL (mask1);
> +  m2 = UINTVAL (mask2);
> +  s = UINTVAL (shft_amnt);
> +
> +  /* Verify that there is no overlap in what bits are set in the two masks.  */
> +  if ((m1 + m2 + 1) != 0)
> +    return false;
> +
> +  /* Verify that the shift amount is less than the mode size.  */
> +  if (s >= GET_MODE_BITSIZE (mode))
> +    return false;
> +
> +  /* Verify that the mask being shifted is contigious and would be in the

contiguous

> +     least significant bits after shifting by creating a mask 't' based on
> +     the number of bits set in mask2 and the shift amount for mask2 and
> +     comparing that to the actual mask2.  */
> +  t = popcount_hwi (m2);
> +  t = (1 << t) - 1;
> +  t = t << s;
> +  if (t != m2)
> +    return false;
> +  
> +  return true;
> +}
> +
>  /* Calculate the cost of calculating X, storing it in *COST.  Result
>     is true if the total cost of the operation has now been calculated.  */
>  static bool
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b7f6fe0..e1f526b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5476,6 +5476,41 @@
>    [(set_attr "type" "bfm")]
>  )
>  
> +;;  Match a bfi instruction where the shift of OP3 means that we are
> +;;  actually copying the least significant bits of OP3 into OP0 by way
> +;;  of the AND masks and the IOR instruction.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_shift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (ashift:GPI
> +                           (match_operand:GPI 3 "register_operand" "r")
> +                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
> +                          (match_operand:GPI 5 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_aarch64_bfi_p (<MODE>mode, operands[2], operands[4], operands[5])"
> +{
> +  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5";

You don't need C code for this instruction printer, just use a textual
pattern.
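
I.e. the output can just be the template string in place of the braced
block:

  "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5"

with no { return ...; } wrapper needed.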

> +}
> +  [(set_attr "type" "bfm")]
> +)
> +
> +;; Like the above instruction but with no shifting, we are just copying the
> +;; least significant bits of OP3 to OP0.
> +
> +(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand:GPI 2 "const_int_operand" "n"))
> +                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
> +                          (match_operand:GPI 4 "const_int_operand" "n"))))]
> +  "aarch64_masks_and_shift_for_aarch64_bfi_p (<MODE>mode, operands[2], const0_rtx, operands[4])"
> +{
> +  return "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4";
> +}
Likewise.

Do we need another variant pattern to handle the case where the
insertion is into the top of the destination?  In that case the
immediate mask on the shifted operand is technically redundant, since
the bottom bits are known to be zero and there are no bits above the
inserted field left to mask off.
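
For what it's worth, a rough, untested sketch of what such a pattern
might look like (the name, the condition and the width handling here are
all made up for illustration, not taken from an existing pattern):

  (define_insn "*aarch64_bfi<GPI:mode>_shift_noand"
    [(set (match_operand:GPI 0 "register_operand" "=r")
          (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
                            (match_operand:GPI 2 "const_int_operand" "n"))
                   (ashift:GPI
                     (match_operand:GPI 3 "register_operand" "r")
                     (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))))]
    "UINTVAL (operands[2]) == (HOST_WIDE_INT_1U << UINTVAL (operands[4])) - 1"
  {
    /* The inserted field runs from the shift amount up to the top of the
       register, so its width is the mode size minus the shift amount.
       operands[5] is created here purely for printing.  */
    operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode)
                           - UINTVAL (operands[4]));
    return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %5";
  }
    [(set_attr "type" "bfm")]
  )

(the degenerate zero-shift case may still want excluding separately).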


> +  [(set_attr "type" "bfm")]
> +)
> +
>  (define_insn "*extr_insv_lower_reg<mode>"
>    [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
>  			  (match_operand 1 "const_int_operand" "n")
> 

R.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Patch]  PR rtl-optimization/87763 - generate more bfi instructions on aarch64
@ 2019-01-25  0:11 Steve Ellcey
  2019-01-25 10:40 ` Richard Earnshaw (lists)
  2019-01-25 23:22 ` Jakub Jelinek
  0 siblings, 2 replies; 21+ messages in thread
From: Steve Ellcey @ 2019-01-25  0:11 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 785 bytes --]

Here is my attempt at creating a couple of new instructions to
generate more bfi instructions on aarch64.  I haven't finished
testing this but it helps with gcc.target/aarch64/combine_bfi_1.c.
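
For reference, the kind of source this is aimed at looks roughly like
the following (an illustrative example, not the exact testcase):

  unsigned long
  merge (unsigned long dst, unsigned long src)
  {
    /* Copy the low 8 bits of src into bits [11:4] of dst; assuming
       combine forms the expected ior/and/ashift RTL, the new patterns
       should turn this into a single "bfi x0, x1, 4, 8" instead of a
       longer and/orr sequence.  */
    return (dst & ~0xff0UL) | ((src << 4) & 0xff0UL);
  }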

Before I went any further with it I wanted to see if anyone else was
already working on something like this, and whether this seems like a
reasonable approach.

Steve Ellcey
sellcey@marvell.com


2019-01-24  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64-protos.h
	(aarch64_masks_and_shift_for_aarch64_bfi_p): New prototype.
	* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_aarch64_bfi_p):
	New function.
	* config/aarch64/aarch64.md (*aarch64_bfi<GPI:mode>4_shift):
	New instruction.
	(*aarch64_bfi<GPI:mode>4_noshift): Ditto.



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: combine.patch --]
[-- Type: text/x-patch; name="combine.patch", Size: 4407 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35..ec90053 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,7 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_aarch64_bfi_p (scalar_int_mode, rtx, rtx, rtx);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5df5a8b..69cc69f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9294,6 +9294,44 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	     & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *aarch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_aarch64_bfi_p (scalar_int_mode mode, rtx mask1,
+					   rtx shft_amnt, rtx mask2)
+{
+  unsigned HOST_WIDE_INT m1, m2, s, t;
+
+  if (!CONST_INT_P (mask1) || !CONST_INT_P (mask2) || !CONST_INT_P (shft_amnt))
+    return false;
+
+  m1 = UINTVAL (mask1);
+  m2 = UINTVAL (mask2);
+  s = UINTVAL (shft_amnt);
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if ((m1 + m2 + 1) != 0)
+    return false;
+
+  /* Verify that the shift amount is less than the mode size.  */
+  if (s >= GET_MODE_BITSIZE (mode))
+    return false;
+
+  /* Verify that the mask being shifted is contigious and would be in the
+     least significant bits after shifting by creating a mask 't' based on
+     the number of bits set in mask2 and the shift amount for mask2 and
+     comparing that to the actual mask2.  */
+  t = popcount_hwi (m2);
+  t = (HOST_WIDE_INT_1U << t) - 1;
+  t = t << s;
+  if (t != m2)
+    return false;
+  
+  return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
    is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0..e1f526b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,41 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI
+                           (match_operand:GPI 3 "register_operand" "r")
+                           (match_operand:GPI 4 "aarch64_simd_shift_imm_<mode>" "n"))
+                          (match_operand:GPI 5 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_aarch64_bfi_p (<MODE>mode, operands[2], operands[4], operands[5])"
+{
+  return "bfi\t%<GPI:w>0, %<GPI:w>3, %4, %P5";
+}
+  [(set_attr "type" "bfm")]
+)
+
+;; Like the above instruction but with no shifting, we are just copying the
+;; least significant bits of OP3 to OP0.
+
+(define_insn "*aarch64_bfi<GPI:mode>4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (match_operand:GPI 3 "register_operand" "r")
+                          (match_operand:GPI 4 "const_int_operand" "n"))))]
+  "aarch64_masks_and_shift_for_aarch64_bfi_p (<MODE>mode, operands[2], const0_rtx, operands[4])"
+{
+  return "bfi\t%<GPI:w>0, %<GPI:w>3, 0, %P4";
+}
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-04-11 18:09 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-05 21:12 [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64 Wilco Dijkstra
2019-02-06 23:42 ` Steve Ellcey
2019-02-07 18:13   ` Wilco Dijkstra
2019-02-11 18:46     ` Steve Ellcey
2019-02-26 16:46       ` Steve Ellcey
2019-03-14 22:54         ` Steve Ellcey
2019-04-01 17:23           ` Steve Ellcey
2019-04-10 10:19             ` Richard Earnshaw (lists)
2019-04-10 20:32               ` Steve Ellcey
2019-04-11  9:04                 ` Richard Earnshaw (lists)
2019-04-11 15:14                   ` [EXT] " Steve Ellcey
2019-04-11 15:55                     ` Steve Ellcey
2019-04-11 16:01                       ` Richard Earnshaw (lists)
2019-04-11 18:50                         ` Steve Ellcey
  -- strict thread matches above, loose matches on Subject: below --
2019-01-25  0:11 Steve Ellcey
2019-01-25 10:40 ` Richard Earnshaw (lists)
2019-01-25 22:16   ` Steve Ellcey
2019-01-25 23:22 ` Jakub Jelinek
2019-01-29  0:22   ` Steve Ellcey
2019-01-29  0:47     ` Jakub Jelinek
2019-02-04 16:55     ` Steve Ellcey
