public inbox for gcc-patches@gcc.gnu.org
* [PATCH] AArch64: Adjust costing of by element MUL to be the same as SAME3 MUL.
@ 2020-06-08 14:14 Tamar Christina
  2020-06-08 15:42 ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Tamar Christina @ 2020-06-08 14:14 UTC
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 779 bytes --]

Hi All,

The cost model currently treats multiplication by element as more expensive
than the three-same (SAME3) multiplication.  This means that if the value is on
the SIMD side we add an unneeded DUP.  If the value is on the general-register
side we use the more expensive DUP instead of FMOV.
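
To illustrate the costing issue, here is a minimal sketch in plain C.  It is a
toy model, not GCC code: the `toy_*` names, the expression struct and the unit
costs are all made up for illustration.  It only shows how charging for the
duplicate operand makes the by-element form look more expensive than a plain
vector multiply, and how skipping the duplicate makes the two equal.

```c
#include <stddef.h>

/* Hypothetical stand-ins for RTL codes; not GCC's real definitions. */
enum toy_code { TOY_REG, TOY_VEC_DUPLICATE, TOY_MULT };

struct toy_expr {
    enum toy_code code;
    const struct toy_expr *op0;
    const struct toy_expr *op1;
};

/* Made-up unit costs, not taken from any real tuning table. */
enum { TOY_COST_REG = 0, TOY_COST_DUP = 4, TOY_COST_MUL = 4 };

/* Recursive cost of an expression.  When strip_dup is nonzero, a
   duplicate feeding a multiply is looked through and costs nothing.  */
int toy_cost(const struct toy_expr *x, int strip_dup)
{
    switch (x->code) {
    case TOY_REG:
        return TOY_COST_REG;
    case TOY_VEC_DUPLICATE:
        return TOY_COST_DUP + toy_cost(x->op0, strip_dup);
    case TOY_MULT: {
        const struct toy_expr *a = x->op0;
        const struct toy_expr *b = x->op1;
        if (strip_dup) {
            if (a->code == TOY_VEC_DUPLICATE)
                a = a->op0;
            else if (b->code == TOY_VEC_DUPLICATE)
                b = b->op0;
        }
        return TOY_COST_MUL + toy_cost(a, strip_dup) + toy_cost(b, strip_dup);
    }
    }
    return 0;
}

/* Cost of mult (vec, vec_duplicate (scalar)).  */
int toy_mult_by_dup_cost(int strip_dup)
{
    const struct toy_expr vec = { TOY_REG, NULL, NULL };
    const struct toy_expr scalar = { TOY_REG, NULL, NULL };
    const struct toy_expr dup = { TOY_VEC_DUPLICATE, &scalar, NULL };
    const struct toy_expr mult = { TOY_MULT, &vec, &dup };
    return toy_cost(&mult, strip_dup);
}

/* Cost of mult (vec, vec), the three-same form.  */
int toy_plain_mult_cost(void)
{
    const struct toy_expr a = { TOY_REG, NULL, NULL };
    const struct toy_expr b = { TOY_REG, NULL, NULL };
    const struct toy_expr mult = { TOY_MULT, &a, &b };
    return toy_cost(&mult, 0);
}
```

With these made-up numbers, `toy_mult_by_dup_cost (0)` comes out higher than
`toy_plain_mult_cost ()`, while `toy_mult_by_dup_cost (1)` matches it.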

This patch corrects the costs so that the two multiplies are costed the same,
which allows us to generate

        fmul    v3.4s, v3.4s, v0.s[0]

instead of

        dup     v0.4s, v0.s[0]
        fmul    v3.4s, v3.4s, v0.4s
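
As a sanity check on why the two sequences are interchangeable, the following
plain C sketch emulates a 4-lane float vector.  The `vec4f` type and the
`toy_*` helpers are hypothetical and only model the lane semantics of the
instructions; they are not intrinsics or anything in GCC.

```c
#include <stddef.h>

#define TOY_LANES 4

/* Hypothetical 4-lane float vector, standing in for a .4s register.  */
typedef struct { float lane[TOY_LANES]; } vec4f;

/* Models "dup vd.4s, vn.s[idx]": broadcast one lane to every lane.  */
vec4f toy_dup_lane(vec4f v, size_t idx)
{
    vec4f r;
    for (size_t i = 0; i < TOY_LANES; i++)
        r.lane[i] = v.lane[idx];
    return r;
}

/* Models the three-same "fmul vd.4s, vn.4s, vm.4s".  */
vec4f toy_fmul(vec4f a, vec4f b)
{
    vec4f r;
    for (size_t i = 0; i < TOY_LANES; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Models the by-element "fmul vd.4s, vn.4s, vm.s[idx]".  */
vec4f toy_fmul_by_elem(vec4f a, vec4f b, size_t idx)
{
    vec4f r;
    for (size_t i = 0; i < TOY_LANES; i++)
        r.lane[i] = a.lane[i] * b.lane[idx];
    return r;
}
```

In this model `toy_fmul (a, toy_dup_lane (b, 0))` and
`toy_fmul_by_elem (a, b, 0)` give the same lanes, so the separate DUP buys
nothing.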

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Adjust costs for mul.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/asimd-mull-elem.c: New test.

-- 

[-- Attachment #2: rb13166.patch --]
[-- Type: text/x-diff, Size: 1792 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 97da60762390db81df9cffaf316b909cd1609130..9cc8da338125afa01bc9fb645f4112d2d7ef548c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11279,6 +11279,14 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
   if (VECTOR_MODE_P (mode))
     mode = GET_MODE_INNER (mode);
 
+  /* The by element versions of the instruction have the same costs as the
+     normal 3 vector version.  So don't add the costs of the duplicate into
+     the costs of the multiply.  */
+  if (GET_CODE (op0) == VEC_DUPLICATE)
+    op0 = XEXP (op0, 0);
+  else if (GET_CODE (op1) == VEC_DUPLICATE)
+    op1 = XEXP (op1, 0);
+
   /* Integer multiply/fma.  */
   if (GET_MODE_CLASS (mode) == MODE_INT)
     {
diff --git a/gcc/testsuite/gcc.target/aarch64/asimd-mull-elem.c b/gcc/testsuite/gcc.target/aarch64/asimd-mull-elem.c
new file mode 100644
index 0000000000000000000000000000000000000000..513721cee0c8372781e6daf33bc06e256cab8cb8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asimd-mull-elem.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-options "-Ofast" } */
+
+#include <arm_neon.h>
+
+void s_mult_i (int32_t* restrict res, int32_t* restrict a, int32_t b)
+{
+    for (int x = 0; x < 16; x++)
+      res[x] = a[x] * b;
+}
+
+void s_mult_f (float32_t* restrict res, float32_t* restrict a, float32_t b)
+{
+    for (int x = 0; x < 16; x++)
+      res[x] = a[x] * b;
+}
+
+/* { dg-final { scan-assembler-times {\s+mul\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.s\[0\]} 4 } } */
+/* { dg-final { scan-assembler-times {\s+fmul\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.s\[0\]} 4 } } */



end of thread, other threads:[~2020-06-10  7:32 UTC | newest]

Thread overview: 6+ messages
2020-06-08 14:14 [PATCH] AArch64: Adjust costing of by element MUL to be the same as SAME3 MUL Tamar Christina
2020-06-08 15:42 ` Richard Sandiford
2020-06-09 11:21   ` Tamar Christina
2020-06-09 11:44     ` Richard Sandiford
2020-06-09 12:23       ` Tamar Christina
2020-06-10  7:32         ` Richard Sandiford
