From: Richard Biener <rguenther@suse.de>
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com
Subject: [PATCH][RFC] target/110456 - avoid loop masking with zero distance dependences
Date: Thu, 29 Jun 2023 12:28:12 +0000 (UTC) [thread overview]
Message-ID: <20230629122812.gpfnAgiFPeHAyZ9xqnRe5oPohHJgBd1exzwXQrF9k0k@z> (raw)
With applying loop masking to epilogues on x86_64 AVX512 we see
some significant performance regressions when evaluating SPEC CPU 2017
that are caused by store-to-load forwarding fails across outer
loop iterations when the inner loop does not iterate. Consider
for (j = 0; j < m; ++j)
for (i = 0; i < n; ++i)
a[j*n + i] += b[j*n + i];
with 'n' chosen so that the inner loop vectorized code is fully
executed by the masked epilogue and that masked epilogue
storing O > n elements (with elements >= n masked of course).
Then the masked load performed for the next outer loop iteration
will get a hit in the store queue but it obviously cannot forward
so we have to wait for the store to retire.
That causes a significant hit to performance especially if 'n'
would have made a non-masked epilogue to fully cover 'n' as well
(say n == 4 for a V4DImode epilogue), avoiding the need for
store-forwarding and waiting for the retiring of the store.
The following applies a very simple heuristic, disabling
the use of loop masking when there's a memory reference pair
with dependence distance zero. That resolves the issue
(other problematic dependence distances seem to be less common
at least).
I have applied this heuristic in generic vectorizer code but
restricted it to non-VL vector sizes. There currently isn't
a way for the target to request disabling of masking only,
while we can reject the vectoriztion at costing time that will
not re-consider the same vector mode but without masking.
It seems simply re-costing with masking disabled should be
possible through, we'd just need an indication whether that
should be done? Maybe always when the current vector mode is
of fixed size?
I wonder how SVE vectorized code behaves in these situations?
The affected SPEC CPU 2017 benchmarks were 527.cam4_r and
503.bwaves_r though I think both will need a hardware vector
size covering at least 8 doubles to show the issue. 527.cam4_r
has 4 elements in the inner loop, 503.bwaves_r 5 IIRC.
Bootstrap / regtest running on x86_64-unknown-linux-gnu.
Any comments?
Thanks,
Richard.
PR target/110456
* tree-vectorizer.h (vec_info_shared::has_zero_dep_dist): New.
* tree-vectorizer.cc (vec_info_shared::vec_info_shared):
Initialize has_zero_dep_dist.
* tree-vect-data-refs.cc (vect_analyze_data_ref_dependence):
Remember if we've seen a dependence distance of zero.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
When we've seen a dependence distance of zero and the vector
type has constant size disable the use of partial vectors.
---
gcc/tree-vect-data-refs.cc | 2 ++
gcc/tree-vect-stmts.cc | 10 ++++++++++
gcc/tree-vectorizer.cc | 1 +
gcc/tree-vectorizer.h | 3 +++
4 files changed, 16 insertions(+)
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ebe93832b1e..40cde95c16a 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -470,6 +470,8 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
"dependence distance == 0 between %T and %T\n",
DR_REF (dra), DR_REF (drb));
+ loop_vinfo->shared->has_zero_dep_dist = true;
+
/* When we perform grouped accesses and perform implicit CSE
by detecting equal accesses and doing disambiguation with
runtime alias tests like for
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d642d3c257f..3bcbc000323 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1839,6 +1839,16 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
using_partial_vectors_p = true;
}
+ if (loop_vinfo->shared->has_zero_dep_dist
+ && TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "disabling partial vectors because of possible "
+ "STLF issues\n");
+ LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+ }
+
if (!using_partial_vectors_p)
{
if (dump_enabled_p ())
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index a048e9d8917..74457259b6e 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -478,6 +478,7 @@ vec_info::~vec_info ()
vec_info_shared::vec_info_shared ()
: n_stmts (0),
+ has_zero_dep_dist (false),
datarefs (vNULL),
datarefs_copy (vNULL),
ddrs (vNULL)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a36974c2c0d..7626cda2a73 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -419,6 +419,9 @@ public:
/* The number of scalar stmts. */
unsigned n_stmts;
+ /* Whether there's a dependence with zero distance. */
+ bool has_zero_dep_dist;
+
/* All data references. Freed by free_data_refs, so not an auto_vec. */
vec<data_reference_p> datarefs;
vec<data_reference> datarefs_copy;
--
2.35.3
next reply other threads:[~2023-06-29 12:28 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-29 12:28 Richard Biener [this message]
[not found] <5b45a654-529e-4b2e-8a3e-b81ba94d28a1@AM7EUR03FT055.eop-EUR03.prod.protection.outlook.com>
2023-06-29 12:55 ` Richard Sandiford
2023-06-29 13:16 ` Richard Biener
2023-07-04 13:37 ` Richard Biener
2023-07-04 19:49 ` Richard Sandiford
2023-07-05 13:29 ` Richard Biener
2023-07-05 13:55 ` Richard Sandiford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230629122812.gpfnAgiFPeHAyZ9xqnRe5oPohHJgBd1exzwXQrF9k0k@z \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).