public inbox for gcc-patches@gcc.gnu.org
From: Hao Liu OS <hliu@os.amperecomputing.com>
To: "GCC-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: "richard.sandiford@arm.com" <richard.sandiford@arm.com>
Subject: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Date: Wed, 19 Jul 2023 04:33:48 +0000	[thread overview]
Message-ID: <SJ2PR01MB8635742E07E2076FA2BE0560E139A@SJ2PR01MB8635.prod.exchangelabs.com> (raw)

This only affects the new cost model in the aarch64 backend.  Currently, the
reduction latency of the vector body is too large because it is multiplied by
the statement count.  As the scalar reduction latency is small, the new cost
model may conclude that "scalar code would issue more quickly" and inflate the
vector body cost substantially, causing missed vectorization opportunities.

Tested by bootstrapping on aarch64-linux-gnu.

gcc/ChangeLog:

	PR target/110625
	* config/aarch64/aarch64.cc (count_ops): Remove the '* count'
	for reduction_latency.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr110625.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc               |  5 +--
 gcc/testsuite/gcc.target/aarch64/pr110625.c | 46 +++++++++++++++++++++
 2 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..27afa64b7d5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16788,10 +16788,7 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
     {
       unsigned int base
 	= aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
-
-      /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
-	 that's not yet the case.  */
-      ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+      ops->reduction_latency = MAX (ops->reduction_latency, base);
     }
 
   /* Assume that multiply-adds will become a single operation.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625.c b/gcc/testsuite/gcc.target/aarch64/pr110625.c
new file mode 100644
index 00000000000..0965cac33a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details -fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
+
+/* Do not increase the vector body cost due to the incorrect reduction latency
+    Original vector body cost = 51
+    Scalar issue estimate:
+      ...
+      reduction latency = 2
+      estimated min cycles per iteration = 2.000000
+      estimated cycles per vector iteration (for VF 2) = 4.000000
+    Vector issue estimate:
+      ...
+      reduction latency = 8      <-- Too large
+      estimated min cycles per iteration = 8.000000
+    Increasing body cost to 102 because scalar code would issue more quickly
+      ...
+    missed:  cost model: the vector iteration cost = 102 divided by the scalar iteration cost = 44 is greater or equal to the vectorization factor = 2.
+    missed:  not vectorized: vectorization not profitable.  */
+
+typedef struct
+{
+  unsigned short m1, m2, m3, m4;
+} the_struct_t;
+typedef struct
+{
+  double m1, m2, m3, m4, m5;
+} the_struct2_t;
+
+double
+bar (the_struct2_t *);
+
+double
+foo (double *k, unsigned int n, the_struct_t *the_struct)
+{
+  unsigned int u;
+  the_struct2_t result;
+  for (u = 0; u < n; u++, k--)
+    {
+      result.m1 += (*k) * the_struct[u].m1;
+      result.m2 += (*k) * the_struct[u].m2;
+      result.m3 += (*k) * the_struct[u].m3;
+      result.m4 += (*k) * the_struct[u].m4;
+    }
+  return bar (&result);
+}
-- 
2.34.1


Thread overview: 21+ messages
2023-07-19  4:33 Hao Liu OS [this message]
2023-07-24  1:58 ` Hao Liu OS
2023-07-24 11:10 ` Richard Sandiford
2023-07-25  9:10   ` Hao Liu OS
2023-07-25  9:44     ` Richard Sandiford
2023-07-26  2:01       ` Hao Liu OS
2023-07-26  8:47         ` Richard Biener
2023-07-26  9:14           ` Richard Sandiford
2023-07-26 10:02             ` Richard Biener
2023-07-26 10:12               ` Richard Sandiford
2023-07-26 12:00                 ` Richard Biener
2023-07-26 12:54             ` Hao Liu OS
2023-07-28 10:06               ` Hao Liu OS
2023-07-28 17:35               ` Richard Sandiford
2023-07-31  2:39                 ` Hao Liu OS
2023-07-31  9:11                   ` Richard Sandiford
2023-07-31  9:25                     ` Hao Liu OS
2023-08-01  9:43                     ` Hao Liu OS
2023-08-02  3:45                       ` Hao Liu OS
2023-08-03  9:33                         ` Hao Liu OS
2023-08-03 10:10                         ` Richard Sandiford
