From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Stamatis Markianos-Wright <stam.markianos-wright@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
Richard Earnshaw <Richard.Earnshaw@arm.com>,
ramana.gcc@gmail.com, "nickc@redhat.com" <nickc@redhat.com>
Subject: Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Date: Fri, 23 Jun 2023 17:25:19 +0100 [thread overview]
Message-ID: <ca23f624-6b10-a8db-dac1-b1185410f4a6@arm.com> (raw)
In-Reply-To: <cada1fc1-dd8f-60b1-ecc2-f89262d458d4@arm.com>
+ /* In order to find out if the loop is of type A or B above look for the
+ loop counter: it will either be incrementing by one per iteration or
+ it will be decrementing by num_of_lanes. We can find the loop counter
+ in the condition at the end of the loop. */
+ rtx_insn *loop_cond = prev_nonnote_nondebug_insn_bb (BB_END (body));
+ gcc_assert (cc_register (XEXP (PATTERN (loop_cond), 0), VOIDmode)
+ && GET_CODE (XEXP (PATTERN (loop_cond), 1)) == COMPARE);
Not sure this should be an assert. If we do encounter a differently
formed loop, we should bail out of DLSTPing for now but we shouldn't ICE.
+ /* The loop latch has to be empty. When compiling all the known MVE
LoLs in
+ user applications, none of those with incrementing counters had
any real
+ insns in the loop latch. As such, this function has only been
tested with
+ an empty latch and may misbehave or ICE if we somehow get here with an
+ increment in the latch, so, for sanity, error out early. */
+ rtx_insn *dec_insn = BB_END (body->loop_father->latch);
+ if (NONDEBUG_INSN_P (dec_insn))
+ gcc_unreachable ();
Similarly here I'd return false rather than gcc_unreachable ();
+ /* Find where both of those are modified in the loop body bb. */
+ rtx condcount_reg_set = PATTERN (DF_REF_INSN (df_bb_regno_only_def_find
+ (body, REGNO (condcount))));
Put = on newline, breaks it down nicer.
+ counter_orig_set = XEXP (PATTERN
+ (DF_REF_INSN
+ (DF_REF_NEXT_REG
+ (DF_REG_DEF_CHAIN
+ (REGNO
+ (XEXP (condcount_reg_set, 0)))))), 1);
This makes me a bit nervous, can we be certain that the PATTERN of the
next insn that sets it is indeed a set. Heck can we even be sure
DF_REG_DEF_CHAIN returns a non-null, I can't imagine why not but maybe
there are some constructs it can't follow-up on? Might just be worth
checking these steps and bailing out.
+ /* When we find the vctp instruction: This may be followed by
+ a zero-extend insn to SImode. If it is, then save the
+ zero-extended REG into vctp_vpr_generated. If there is no
+ zero-extend, then store the raw output of the vctp.
+ For any VPT-predicated instructions we need to ensure that
+ the VPR they use is the same as the one given here and
+ they often consume the output of a subreg of the SImode
+ zero-extended VPR-reg. As a result, comparing against the
+ output of the zero-extend is more likely to succeed.
+ This code also guarantees to us that the vctp comes before
+ any instructions that use the VPR within the loop, for the
+ dlstp/letp transform to succeed. */
Wrong comment indent after first line.
+ rtx_insn *vctp_insn = arm_mve_get_loop_vctp (body);
+ if (!vctp_insn || !arm_mve_loop_valid_for_dlstp (body))
+ return GEN_INT (1);
arm_mve_loop_valid_for_dlstp already calls arm_mve_get_loop_vctp, maybe
have 'arm_mve_loop_valid_for_dlstp' return vctp_insn or null to
determine success or failure, avoids looping through the BB again.
For the same reason I'd also pass vctp_insn down to
'arm_mve_check_df_chain_back_for_implic_predic'.
+ if (GET_CODE (SET_SRC (single_set (next_use1))) == ZERO_EXTEND)
+ {
+ rtx_insn *next_use2 = NULL;
Are we sure single_set can never return 0 here? Maybe worth an extra
check and bail out if it does?
+ /* If the insn pattern requires the use of the VPR value from the
+ vctp as an input parameter. */
s/an an input parameter./as an input parameter for predication./
+ /* None of registers USE-d by the instruction need can be the VPR
+ vctp_vpr_generated. This blocks the optimisation if there any
+ instructions that use the optimised-out VPR value in any way
+ other than as a VPT block predicate. */
Reword this slightly to be less complex:
This instruction USE-s the vctp_vpr_generated other than for
predication, this blocks the transformation as we are not allowed to
optimise the VPR value away.
Will continue reviewing next week :)
On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:
> Hi all,
>
> This is the 2/2 patch that contains the functional changes needed
> for MVE Tail Predicated Low Overhead Loops. See my previous email
> for a general introduction of MVE LOLs.
>
> This support is added through the already existing loop-doloop
> mechanisms that are used for non-MVE dls/le looping.
>
> Mid-end changes are:
>
> 1) Relax the loop-doloop mechanism in the mid-end to allow for
> decrement numbers other that -1 and for `count` to be an
> rtx containing a simple REG (which in this case will contain
> the number of elements to be processed), rather
> than an expression for calculating the number of iterations.
> 2) Added a new df utility function: `df_bb_regno_only_def_find` that
> will return the DEF of a REG only if it is DEF-ed once within the
> basic block.
>
> And many things in the backend to implement the above optimisation:
>
> 3) Implement the `arm_predict_doloop_p` target hook to instruct the
> mid-end about Low Overhead Loops (MVE or not), as well as
> `arm_loop_unroll_adjust` which will prevent unrolling of any loops
> that are valid for becoming MVE Tail_Predicated Low Overhead Loops
> (unrolling can transform a loop in ways that invalidate the dlstp/
> letp tranformation logic and the benefit of the dlstp/letp loop
> would be considerably higher than that of unrolling)
> 4) Appropriate changes to the define_expand of doloop_end, new
> patterns for dlstp and letp, new iterators, unspecs, etc.
> 5) `arm_mve_loop_valid_for_dlstp` and a number of checking functions:
> * `arm_mve_dlstp_check_dec_counter`
> * `arm_mve_dlstp_check_inc_counter`
> * `arm_mve_check_reg_origin_is_num_elems`
> * `arm_mve_check_df_chain_back_for_implic_predic`
> * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
> This all, in smoe way or another, are running checks on the loop
> structure in order to determine if the loop is valid for dlstp/letp
> transformation.
> 6) `arm_attempt_dlstp_transform`: (called from the define_expand of
> doloop_end) this function re-checks for the loop's suitability for
> dlstp/letp transformation and then implements it, if possible.
> 7) Various utility functions:
> *`arm_mve_get_vctp_lanes` to map
> from vctp unspecs to number of lanes, and
> `arm_get_required_vpr_reg`
> to check an insn to see if it requires the VPR or not.
> * `arm_mve_get_loop_vctp`
> * `arm_mve_get_vctp_lanes`
> * `arm_emit_mve_unpredicated_insn_to_seq`
> * `arm_get_required_vpr_reg`
> * `arm_get_required_vpr_reg_param`
> * `arm_get_required_vpr_reg_ret_val`
> * `arm_mve_vec_insn_is_predicated_with_this_predicate`
> * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`
>
> No regressions on arm-none-eabi with various targets and on
> aarch64-none-elf. Thoughts on getting this into trunk?
>
> Thank you,
> Stam Markianos-Wright
>
> gcc/ChangeLog:
>
> * config/arm/arm-protos.h (arm_target_insn_ok_for_lob):
> Rename to...
> (arm_target_bb_ok_for_lob): ...this
> (arm_attempt_dlstp_transform): New.
> * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
> (TARGET_PREDICT_DOLOOP_P): New.
> (arm_block_set_vect):
> (arm_target_insn_ok_for_lob): Rename from
> arm_target_insn_ok_for_lob.
> (arm_target_bb_ok_for_lob): New.
> (arm_mve_get_vctp_lanes): New.
> (arm_get_required_vpr_reg): New.
> (arm_get_required_vpr_reg_param): New.
> (arm_get_required_vpr_reg_ret_val): New.
> (arm_mve_get_loop_vctp): New.
> (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
> (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
> (arm_mve_check_df_chain_back_for_implic_predic): New.
> (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
> (arm_mve_check_reg_origin_is_num_elems): New.
> (arm_mve_dlstp_check_inc_counter): New.
> (arm_mve_dlstp_check_dec_counter): New.
> (arm_mve_loop_valid_for_dlstp): New.
> (arm_predict_doloop_p): New.
> (arm_loop_unroll_adjust): New.
> (arm_emit_mve_unpredicated_insn_to_seq): New.
> (arm_attempt_dlstp_transform): New.
> * config/arm/iterators.md (DLSTP): New.
> (mode1): Add DLSTP mappings.
> * config/arm/mve.md (*predicated_doloop_end_internal): New.
> (dlstp<mode1>_insn): New.
> * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
> * config/arm/unspecs.md: New unspecs.
> * df-core.cc (df_bb_regno_only_def_find): New.
> * df.h (df_bb_regno_only_def_find): New.
> * loop-doloop.cc (doloop_condition_get): Relax conditions.
> (doloop_optimize): Add support for elementwise LoLs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/lob.h: Update framework.
> * gcc.target/arm/lob1.c: Likewise.
> * gcc.target/arm/lob6.c: Likewise.
> * gcc.target/arm/mve/dlstp-compile-asm.c: New test.
> * gcc.target/arm/mve/dlstp-int16x8.c: New test.
> * gcc.target/arm/mve/dlstp-int32x4.c: New test.
> * gcc.target/arm/mve/dlstp-int64x2.c: New test.
> * gcc.target/arm/mve/dlstp-int8x16.c: New test.
> * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.
next prev parent reply other threads:[~2023-06-23 16:25 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-15 11:47 Stamatis Markianos-Wright
2023-06-22 15:54 ` Andre Vieira (lists)
2023-07-05 16:11 ` Stamatis Markianos-Wright
2023-06-23 10:23 ` Andre Vieira (lists)
2023-07-05 16:13 ` Stamatis Markianos-Wright
2023-06-23 16:25 ` Andre Vieira (lists) [this message]
2023-07-05 16:41 ` Stamatis Markianos-Wright
-- strict thread matches above, loose matches on Subject: below --
2023-12-18 11:53 [PATCH 0/2] " Andre Vieira
2023-12-18 11:53 ` [PATCH 2/2] " Andre Vieira
2023-12-20 16:54 ` Andre Vieira (lists)
2023-11-06 11:20 Stamatis Markianos-Wright
2023-08-17 10:31 Stamatis Markianos-Wright
2022-11-11 17:40 Stam Markianos-Wright
2022-11-15 15:51 ` Andre Vieira (lists)
2022-11-28 12:13 ` Stam Markianos-Wright
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ca23f624-6b10-a8db-dac1-b1185410f4a6@arm.com \
--to=andre.simoesdiasvieira@arm.com \
--cc=Kyrylo.Tkachov@arm.com \
--cc=Richard.Earnshaw@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=nickc@redhat.com \
--cc=ramana.gcc@gmail.com \
--cc=stam.markianos-wright@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).