From: Hao Liu OS <hliu@os.amperecomputing.com>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: Richard Biener <richard.guenther@gmail.com>,
"GCC-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Date: Tue, 1 Aug 2023 09:43:19 +0000 [thread overview]
Message-ID: <SJ2PR01MB8635C8766D63D2059AEB19E9E10AA@SJ2PR01MB8635.prod.exchangelabs.com> (raw)
In-Reply-To: <mptila0bg2x.fsf@arm.com>
Hi Richard,
This is a quick fix to the several ICEs. It seems even STMT_VINFO_LIVE_P is true, some reduct stmts still don't have REDUC_DEF. So I change the check to STMT_VINFO_REDUC_DEF.
Is it OK for trunk?
---
Fix the ICEs on empty reduction define. Even STMT_VINFO_LIVE_P is true, some reduct stmts
still don't have definition.
gcc/ChangeLog:
PR target/110625
* config/aarch64/aarch64.cc (aarch64_force_single_cycle): check
STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction
---
gcc/config/aarch64/aarch64.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d76025545..5b8d8fa8e2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
static bool
aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
{
- if (!STMT_VINFO_LIVE_P (stmt_info))
+ if (!STMT_VINFO_REDUC_DEF (stmt_info))
return false;
auto reduc_info = info_for_reduction (vinfo, stmt_info);
--
2.40.0
________________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Monday, July 31, 2023 17:11
To: Hao Liu OS
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Hao Liu OS <hliu@os.amperecomputing.com> writes:
>> Which test case do you see this for? The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests. It is still the existing case causing the previous ICE (i.e. assertion problem): gcc.target/aarch64/sve/cost_model_13.c.
>
> It's not the test case itself failed, but the dump message of vect says the "reduction latency" is 0:
>
> Before the change:
> cost_model_13.c:7:21: note: Original vector body cost = 6
> cost_model_13.c:7:21: note: Scalar issue estimate:
> cost_model_13.c:7:21: note: load operations = 1
> cost_model_13.c:7:21: note: store operations = 0
> cost_model_13.c:7:21: note: general operations = 1
> cost_model_13.c:7:21: note: reduction latency = 1
> cost_model_13.c:7:21: note: estimated min cycles per iteration = 1.000000
> cost_model_13.c:7:21: note: estimated cycles per vector iteration (for VF 8) = 8.000000
> cost_model_13.c:7:21: note: Vector issue estimate:
> cost_model_13.c:7:21: note: load operations = 1
> cost_model_13.c:7:21: note: store operations = 0
> cost_model_13.c:7:21: note: general operations = 1
> cost_model_13.c:7:21: note: reduction latency = 2
> cost_model_13.c:7:21: note: estimated min cycles per iteration = 2.000000
>
> After the change:
> cost_model_13.c:7:21: note: Original vector body cost = 6
> cost_model_13.c:7:21: note: Scalar issue estimate:
> cost_model_13.c:7:21: note: load operations = 1
> cost_model_13.c:7:21: note: store operations = 0
> cost_model_13.c:7:21: note: general operations = 1
> cost_model_13.c:7:21: note: reduction latency = 0 <--- seems not consistent with above result
> cost_model_13.c:7:21: note: estimated min cycles per iteration = 1.000000
> cost_model_13.c:7:21: note: estimated cycles per vector iteration (for VF 8) = 8.000000
> cost_model_13.c:7:21: note: Vector issue estimate:
> cost_model_13.c:7:21: note: load operations = 1
> cost_model_13.c:7:21: note: store operations = 0
> cost_model_13.c:7:21: note: general operations = 1
> cost_model_13.c:7:21: note: reduction latency = 0 <--- seems not consistent with above result
> cost_model_13.c:7:21: note: estimated min cycles per iteration = 1.000000 <--- seems not consistent with above result
>
> BTW. this should be caused by the reduction stmt is not live, which indicates whether this stmts is part of a computation whose result is used outside the loop (tree-vectorized.h:1204):
> <bb 3>:
> # res_18 = PHI <res_15(7), 0(6)>
> # i_20 = PHI <i_16(7), 0(6)>
> _1 = (long unsigned int) i_20;
> _2 = _1 * 2;
> _3 = x_14(D) + _2;
> _4 = *_3;
> _5 = (unsigned short) _4;
> res.0_6 = (unsigned short) res_18;
> _7 = _5 + res.0_6; <-- This is not live, may be caused by the below type cast stmt.
> res_15 = (short int) _7;
> i_16 = i_20 + 1;
> if (n_11(D) > i_16)
> goto <bb 7>;
> else
> goto <bb 4>;
>
> <bb 7>:
> goto <bb 3>;
Ah, I see, thanks. My concern was: if requiring !STMT_VINFO_LIVE_P stmts
can cause "normal" reductions to have a latency of 0, could the same thing
happen for single-cycle reductions? But I suppose the answer is "no".
Introducing a cast like the above would cause reduc_chain_length > 1,
and so:
if (ncopies > 1
&& (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
&& reduc_chain_length == 1
&& loop_vinfo->suggested_unroll_factor == 1)
single_defuse_cycle = true;
wouldn't trigger. Which makes the single-cycle thing a bit hit-and-miss...
So yeah, I agree the patch is safe after all.
Please split the check out into a helper though, to avoid the awkward
formatting:
/* Return true if STMT_INFO is part of a reduction that has the form:
r = r op ...;
r = r op ...;
with the single accumulator being read and written multiple times. */
static bool
aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
{
if (!STMT_VINFO_LIVE_P (stmt_info))
return false;
auto reduc_info = info_for_reduction (vinfo, stmt_info);
return STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
}
OK with that change, thanks.
Richard
next prev parent reply other threads:[~2023-08-01 9:43 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-19 4:33 Hao Liu OS
2023-07-24 1:58 ` Hao Liu OS
2023-07-24 11:10 ` Richard Sandiford
2023-07-25 9:10 ` Hao Liu OS
2023-07-25 9:44 ` Richard Sandiford
2023-07-26 2:01 ` Hao Liu OS
2023-07-26 8:47 ` Richard Biener
2023-07-26 9:14 ` Richard Sandiford
2023-07-26 10:02 ` Richard Biener
2023-07-26 10:12 ` Richard Sandiford
2023-07-26 12:00 ` Richard Biener
2023-07-26 12:54 ` Hao Liu OS
2023-07-28 10:06 ` Hao Liu OS
2023-07-28 17:35 ` Richard Sandiford
2023-07-31 2:39 ` Hao Liu OS
2023-07-31 9:11 ` Richard Sandiford
2023-07-31 9:25 ` Hao Liu OS
2023-08-01 9:43 ` Hao Liu OS [this message]
2023-08-02 3:45 ` Hao Liu OS
2023-08-03 9:33 ` Hao Liu OS
2023-08-03 10:10 ` Richard Sandiford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=SJ2PR01MB8635C8766D63D2059AEB19E9E10AA@SJ2PR01MB8635.prod.exchangelabs.com \
--to=hliu@os.amperecomputing.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).