public inbox for gcc-patches@gcc.gnu.org
* [PATCH] aarch64: Don't include vec_select high-half in SIMD add cost
@ 2021-07-29  9:22 Jonathan Wright
  2021-08-04 15:52 ` [PATCH V2] " Jonathan Wright
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Wright @ 2021-07-29  9:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 992 bytes --]

Hi,

The Neon add-long/add-widen instructions can select the top or bottom
half of the operand registers. This selection does not change the
cost of the underlying instruction and this should be reflected by
the RTL cost function.
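
As a minimal illustration (not part of the patch), both functions below
express an add-long; the second uses the high halves of its inputs and,
once the costs are right, can be emitted as a single SADDL2 rather than
a DUP of each high half followed by a SADDL:

#include <arm_neon.h>

int32x4_t add_long_low (int16x8_t a, int16x8_t b)
{
  /* Widening add of the low halves: SADDL.  */
  return vaddl_s16 (vget_low_s16 (a), vget_low_s16 (b));
}

int32x4_t add_long_high (int16x8_t a, int16x8_t b)
{
  /* Widening add of the high halves: ideally a single SADDL2.  */
  return vaddl_s16 (vget_high_s16 (a), vget_high_s16 (b));
}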

This patch adds an RTL tree traversal to the Neon add cost function to
match a vec_select of the high half in either operand. This traversal
prevents the cost of the vec_select from being added into the cost of
the add - meaning that these instructions can now be emitted by the
combine pass as they are no longer deemed prohibitively expensive.
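
In RTL terms the combined pattern is roughly of the following shape
(shown schematically for one SADDL2 operand; ADDW2 has the extended
vec_select on only one side):

  (plus:V4SI
    (sign_extend:V4SI
      (vec_select:V4HI (reg:V8HI x)
        (parallel [(const_int 4) (const_int 5)
                   (const_int 6) (const_int 7)])))
    ...)

The cost function now walks through the extend and vec_select wrappers
and costs only the inner register operand.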

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-28  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
	of vec_select high-half from being added into Neon add cost.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vaddX_high_cost.c: New test.

[-- Attachment #2: rb14710.patch --]
[-- Type: application/octet-stream, Size: 2675 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a49672afe785e3517250d324468edacceab5c9d3..61982ccbf03e096c6435fc5e1e15345fb0abe4bd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13026,6 +13026,23 @@ cost_minus:
 	op1 = XEXP (x, 1);
 
 cost_plus:
+	if (VECTOR_MODE_P (mode))
+	  {
+	    /* ADDL2 and ADDW2.  */
+	    unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+	    if (vec_flags & VEC_ADVSIMD)
+	      {
+		/* The select-operand-high-half versions of the add instruction
+		   have the same cost as the regular three vector version -
+		   don't add the costs of the select into the costs of the add.
+		   */
+		if (aarch64_vec_select_high_operand_p (op0))
+		  op0 = XEXP (XEXP (op0, 0), 0);
+		if (aarch64_vec_select_high_operand_p (op1))
+		  op1 = XEXP (XEXP (op1, 0), 0);
+	      }
+	  }
+
 	if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
 	    || GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMM_COMPARE)
 	  {
diff --git a/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
new file mode 100644
index 0000000000000000000000000000000000000000..43f28d597a94d8aceac87ef2240a50cc56c07240
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#include <arm_neon.h>
+
+#define TEST_ADDL(rettype, intype, ts, rs) \
+  rettype test_vaddl_ ## ts (intype a, intype b, intype c) \
+	{ \
+		rettype t0 = vaddl_ ## ts (vget_high_ ## ts (a), \
+					   vget_high_ ## ts (c)); \
+		rettype t1 = vaddl_ ## ts (vget_high_ ## ts (b), \
+					   vget_high_ ## ts (c)); \
+		return vaddq ## _ ## rs (t0, t1); \
+	}
+
+TEST_ADDL (int16x8_t, int8x16_t, s8, s16)
+TEST_ADDL (uint16x8_t, uint8x16_t, u8, u16)
+TEST_ADDL (int32x4_t, int16x8_t, s16, s32)
+TEST_ADDL (uint32x4_t, uint16x8_t, u16, u32)
+TEST_ADDL (int64x2_t, int32x4_t, s32, s64)
+TEST_ADDL (uint64x2_t, uint32x4_t, u32, u64)
+
+#define TEST_ADDW(rettype, intype, intypel, ts, rs) \
+  rettype test_vaddw_ ## ts (intype a, intype b, intypel c) \
+	{ \
+		rettype t0 = vaddw_ ## ts (a, vget_high_ ## ts (c)); \
+		rettype t1 = vaddw_ ## ts (b, vget_high_ ## ts (c)); \
+		return vaddq ## _ ## rs (t0, t1); \
+	}
+
+TEST_ADDW (int16x8_t, int16x8_t, int8x16_t, s8, s16)
+TEST_ADDW (uint16x8_t, uint16x8_t, uint8x16_t, u8, u16)
+TEST_ADDW (int32x4_t, int32x4_t, int16x8_t, s16, s32)
+TEST_ADDW (uint32x4_t, uint32x4_t, uint16x8_t, u16, u32)
+TEST_ADDW (int64x2_t, int64x2_t, int32x4_t, s32, s64)
+TEST_ADDW (uint64x2_t, uint64x2_t, uint32x4_t, u32, u64)
+
+/* { dg-final { scan-assembler-not "dup\\t" } } */


* [PATCH V2] aarch64: Don't include vec_select high-half in SIMD add cost
  2021-07-29  9:22 [PATCH] aarch64: Don't include vec_select high-half in SIMD add cost Jonathan Wright
@ 2021-08-04 15:52 ` Jonathan Wright
  2021-08-05 10:43   ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Wright @ 2021-08-04 15:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1817 bytes --]

Hi,

V2 of this patch uses the same approach as that just implemented
for the multiply high-half cost patch.
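
For context, the helper used below, aarch64_strip_extend_vec_half, was
introduced by that multiply patch. A rough sketch of its intent is shown
here; high_half_lanes_p is a stand-in for the real lane check, not an
actual GCC function:

static rtx
aarch64_strip_extend_vec_half (rtx x)
{
  if (GET_CODE (x) == SIGN_EXTEND || GET_CODE (x) == ZERO_EXTEND)
    {
      rtx inner = XEXP (x, 0);
      /* If the extended value is a vec_select of the high-half lanes of
	 a wider register, return that register instead.  */
      if (GET_CODE (inner) == VEC_SELECT
	  && high_half_lanes_p (GET_MODE (XEXP (inner, 0)), XEXP (inner, 1)))
	return XEXP (inner, 0);
    }
  return x;
}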

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan 

---

gcc/ChangeLog:

2021-07-28  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
	of vec_select high-half from being added into Neon add cost.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vaddX_high_cost.c: New test.

From: Jonathan Wright
Sent: 29 July 2021 10:22
To: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: [PATCH] aarch64: Don't include vec_select high-half in SIMD add cost 
 
Hi,

The Neon add-long/add-widen instructions can select the top or bottom
half of the operand registers. This selection does not change the
cost of the underlying instruction and this should be reflected by
the RTL cost function.

This patch adds an RTL tree traversal to the Neon add cost function to
match a vec_select of the high half in either operand. This traversal
prevents the cost of the vec_select from being added into the cost of
the add - meaning that these instructions can now be emitted by the
combine pass as they are no longer deemed prohibitively expensive.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-28  Jonathan Wright  <jonathan.wright@arm.com>

        * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
        of vec_select high-half from being added into Neon add cost.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vaddX_high_cost.c: New test.

[-- Attachment #2: rb14710.patch --]
[-- Type: application/octet-stream, Size: 2599 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 10a436ad7e6fa6c5de706ee5abbdc6fb3d268076..cc92cc9c208e63f262c22c7fe8e6915825884775 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13161,6 +13161,21 @@ cost_minus:
 	op1 = XEXP (x, 1);
 
 cost_plus:
+	if (VECTOR_MODE_P (mode))
+	  {
+	    /* ADDL2 and ADDW2.  */
+	    unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+	    if (vec_flags & VEC_ADVSIMD)
+	      {
+		/* The select-operand-high-half versions of the add instruction
+		   have the same cost as the regular three vector version -
+		   don't add the costs of the select into the costs of the add.
+		   */
+		op0 = aarch64_strip_extend_vec_half (op0);
+		op1 = aarch64_strip_extend_vec_half (op1);
+	      }
+	  }
+
 	if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
 	    || GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMM_COMPARE)
 	  {
diff --git a/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
new file mode 100644
index 0000000000000000000000000000000000000000..43f28d597a94d8aceac87ef2240a50cc56c07240
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#include <arm_neon.h>
+
+#define TEST_ADDL(rettype, intype, ts, rs) \
+  rettype test_vaddl_ ## ts (intype a, intype b, intype c) \
+	{ \
+		rettype t0 = vaddl_ ## ts (vget_high_ ## ts (a), \
+					   vget_high_ ## ts (c)); \
+		rettype t1 = vaddl_ ## ts (vget_high_ ## ts (b), \
+					   vget_high_ ## ts (c)); \
+		return vaddq ## _ ## rs (t0, t1); \
+	}
+
+TEST_ADDL (int16x8_t, int8x16_t, s8, s16)
+TEST_ADDL (uint16x8_t, uint8x16_t, u8, u16)
+TEST_ADDL (int32x4_t, int16x8_t, s16, s32)
+TEST_ADDL (uint32x4_t, uint16x8_t, u16, u32)
+TEST_ADDL (int64x2_t, int32x4_t, s32, s64)
+TEST_ADDL (uint64x2_t, uint32x4_t, u32, u64)
+
+#define TEST_ADDW(rettype, intype, intypel, ts, rs) \
+  rettype test_vaddw_ ## ts (intype a, intype b, intypel c) \
+	{ \
+		rettype t0 = vaddw_ ## ts (a, vget_high_ ## ts (c)); \
+		rettype t1 = vaddw_ ## ts (b, vget_high_ ## ts (c)); \
+		return vaddq ## _ ## rs (t0, t1); \
+	}
+
+TEST_ADDW (int16x8_t, int16x8_t, int8x16_t, s8, s16)
+TEST_ADDW (uint16x8_t, uint16x8_t, uint8x16_t, u8, u16)
+TEST_ADDW (int32x4_t, int32x4_t, int16x8_t, s16, s32)
+TEST_ADDW (uint32x4_t, uint32x4_t, uint16x8_t, u16, u32)
+TEST_ADDW (int64x2_t, int64x2_t, int32x4_t, s32, s64)
+TEST_ADDW (uint64x2_t, uint64x2_t, uint32x4_t, u32, u64)
+
+/* { dg-final { scan-assembler-not "dup\\t" } } */


* Re: [PATCH V2] aarch64: Don't include vec_select high-half in SIMD add cost
  2021-08-04 15:52 ` [PATCH V2] " Jonathan Wright
@ 2021-08-05 10:43   ` Richard Sandiford
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Sandiford @ 2021-08-05 10:43 UTC (permalink / raw)
  To: Jonathan Wright; +Cc: gcc-patches

Jonathan Wright <Jonathan.Wright@arm.com> writes:
> Hi,
>
> V2 of this patch uses the same approach as that just implemented
> for the multiply high-half cost patch.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-28  Jonathan Wright  <jonathan.wright@arm.com>
>
>         * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
>         of vec_select high-half from being added into Neon add cost.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/vaddX_high_cost.c: New test.

OK, thanks.

Richard

>
> From: Jonathan Wright
> Sent: 29 July 2021 10:22
> To: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH] aarch64: Don't include vec_select high-half in SIMD add cost
>
> Hi,
>
> The Neon add-long/add-widen instructions can select the top or bottom
> half of the operand registers. This selection does not change the
> cost of the underlying instruction and this should be reflected by
> the RTL cost function.
>
> This patch adds an RTL tree traversal to the Neon add cost function to
> match a vec_select of the high half in either operand. This traversal
> prevents the cost of the vec_select from being added into the cost of
> the add - meaning that these instructions can now be emitted by the
> combine pass as they are no longer deemed prohibitively expensive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-28  Jonathan Wright  <jonathan.wright@arm.com>
>
>         * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
>         of vec_select high-half from being added into Neon add cost.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/vaddX_high_cost.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 10a436ad7e6fa6c5de706ee5abbdc6fb3d268076..cc92cc9c208e63f262c22c7fe8e6915825884775 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13161,6 +13161,21 @@ cost_minus:
>  	op1 = XEXP (x, 1);
>  
>  cost_plus:
> +	if (VECTOR_MODE_P (mode))
> +	  {
> +	    /* ADDL2 and ADDW2.  */
> +	    unsigned int vec_flags = aarch64_classify_vector_mode (mode);
> +	    if (vec_flags & VEC_ADVSIMD)
> +	      {
> +		/* The select-operand-high-half versions of the add instruction
> +		   have the same cost as the regular three vector version -
> +		   don't add the costs of the select into the costs of the add.
> +		   */
> +		op0 = aarch64_strip_extend_vec_half (op0);
> +		op1 = aarch64_strip_extend_vec_half (op1);
> +	      }
> +	  }
> +
>  	if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
>  	    || GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMM_COMPARE)
>  	  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..43f28d597a94d8aceac87ef2240a50cc56c07240
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#include <arm_neon.h>
> +
> +#define TEST_ADDL(rettype, intype, ts, rs) \
> +  rettype test_vaddl_ ## ts (intype a, intype b, intype c) \
> +	{ \
> +		rettype t0 = vaddl_ ## ts (vget_high_ ## ts (a), \
> +					   vget_high_ ## ts (c)); \
> +		rettype t1 = vaddl_ ## ts (vget_high_ ## ts (b), \
> +					   vget_high_ ## ts (c)); \
> +		return vaddq ## _ ## rs (t0, t1); \
> +	}
> +
> +TEST_ADDL (int16x8_t, int8x16_t, s8, s16)
> +TEST_ADDL (uint16x8_t, uint8x16_t, u8, u16)
> +TEST_ADDL (int32x4_t, int16x8_t, s16, s32)
> +TEST_ADDL (uint32x4_t, uint16x8_t, u16, u32)
> +TEST_ADDL (int64x2_t, int32x4_t, s32, s64)
> +TEST_ADDL (uint64x2_t, uint32x4_t, u32, u64)
> +
> +#define TEST_ADDW(rettype, intype, intypel, ts, rs) \
> +  rettype test_vaddw_ ## ts (intype a, intype b, intypel c) \
> +	{ \
> +		rettype t0 = vaddw_ ## ts (a, vget_high_ ## ts (c)); \
> +		rettype t1 = vaddw_ ## ts (b, vget_high_ ## ts (c)); \
> +		return vaddq ## _ ## rs (t0, t1); \
> +	}
> +
> +TEST_ADDW (int16x8_t, int16x8_t, int8x16_t, s8, s16)
> +TEST_ADDW (uint16x8_t, uint16x8_t, uint8x16_t, u8, u16)
> +TEST_ADDW (int32x4_t, int32x4_t, int16x8_t, s16, s32)
> +TEST_ADDW (uint32x4_t, uint32x4_t, uint16x8_t, u16, u32)
> +TEST_ADDW (int64x2_t, int64x2_t, int32x4_t, s32, s64)
> +TEST_ADDW (uint64x2_t, uint64x2_t, uint32x4_t, u32, u64)
> +
> +/* { dg-final { scan-assembler-not "dup\\t" } } */


Thread overview: 3+ messages
2021-07-29  9:22 [PATCH] aarch64: Don't include vec_select high-half in SIMD add cost Jonathan Wright
2021-08-04 15:52 ` [PATCH V2] " Jonathan Wright
2021-08-05 10:43   ` Richard Sandiford
