public inbox for gcc-patches@gcc.gnu.org
* [PATCH] aarch64: Don't include vec_select in SIMD multiply cost
@ 2021-07-20 10:46 Jonathan Wright
  2021-07-22 17:16 ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Wright @ 2021-07-20 10:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrylo Tkachov, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]

Hi,

The Neon multiply/multiply-accumulate/multiply-subtract instructions
can take various forms - multiplying two full vector registers of
values, or multiplying every element of one vector by a single
element of another. Regardless of the form used, these instructions
have the same cost, and this should be reflected by the RTL cost
function.

This patch adds an RTL tree traversal to the Neon multiply cost
function to match the vec_select used by the lane-referencing forms
of the instructions mentioned above. The traversal stops the cost of
the vec_select from being added on top of the cost of the multiply,
so the combine pass can now emit these instructions instead of
rejecting them as prohibitively expensive.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Traverse
	RTL tree to prevent vec_select from being added into Neon
	multiply cost.

[-- Attachment #2: rb14675.patch --]
[-- Type: application/octet-stream, Size: 1359 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f5b25a7f7041645921e6ad85714efda73b993492..b368303b0e699229266e6d008e28179c496bf8cd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11985,6 +11985,21 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
 	    op0 = XEXP (op0, 0);
 	  else if (GET_CODE (op1) == VEC_DUPLICATE)
 	    op1 = XEXP (op1, 0);
+	  /* The same argument applies to the VEC_SELECT when using the lane-
+	     referencing forms of the MUL/MLA/MLS instructions. Without the
+	     traversal here, the combine pass deems these patterns too
+	     expensive and subsequently does not emit the lane-referencing
+	     forms of the instructions. In addition, canonical form is for the
+	     VEC_SELECT to be the second argument of the multiply - thus only
+	     op1 is traversed.  */
+	  if (GET_CODE (op1) == VEC_SELECT
+	      && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1)
+	    op1 = XEXP (op1, 0);
+	  else if ((GET_CODE (op1) == ZERO_EXTEND
+		    || GET_CODE (op1) == SIGN_EXTEND)
+		   && GET_CODE (XEXP (op1, 0)) == VEC_SELECT
+		   && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1)
+	    op1 = XEXP (XEXP (op1, 0), 0);
 	}
       cost += rtx_cost (op0, mode, MULT, 0, speed);
       cost += rtx_cost (op1, mode, MULT, 1, speed);


Thread overview: 4+ messages
2021-07-20 10:46 [PATCH] aarch64: Don't include vec_select in SIMD multiply cost Jonathan Wright
2021-07-22 17:16 ` Richard Sandiford
2021-07-28 13:34   ` [PATCH V2] " Jonathan Wright
2021-08-04  8:51     ` Richard Sandiford
