Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Stamatis Markianos-Wright <stam.markianos-wright@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
	ramana.gcc@gmail.com, "nickc@redhat.com" <nickc@redhat.com>
Subject: Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Date: Fri, 23 Jun 2023 17:25:19 +0100	[thread overview]
Message-ID: <ca23f624-6b10-a8db-dac1-b1185410f4a6@arm.com> (raw)
In-Reply-To: <cada1fc1-dd8f-60b1-ecc2-f89262d458d4@arm.com>

+  /* In order to find out if the loop is of type A or B above look for the
+     loop counter: it will either be incrementing by one per iteration or
+     it will be decrementing by num_of_lanes.  We can find the loop counter
+     in the condition at the end of the loop.  */
+  rtx_insn *loop_cond = prev_nonnote_nondebug_insn_bb (BB_END (body));
+  gcc_assert (cc_register (XEXP (PATTERN (loop_cond), 0), VOIDmode)
+	      && GET_CODE (XEXP (PATTERN (loop_cond), 1)) == COMPARE);

Not sure this should be an assert. If we do encounter a differently 
formed loop, we should bail out of DLSTPing for now but we shouldn't ICE.


+  /* The loop latch has to be empty.  When compiling all the known MVE 
LoLs in
+     user applications, none of those with incrementing counters had 
any real
+     insns in the loop latch.  As such, this function has only been 
tested with
+     an empty latch and may misbehave or ICE if we somehow get here with an
+     increment in the latch, so, for sanity, error out early.  */
+  rtx_insn *dec_insn = BB_END (body->loop_father->latch);
+  if (NONDEBUG_INSN_P (dec_insn))
+    gcc_unreachable ();

Similarly here I'd return false rather than gcc_unreachable ();


+  /* Find where both of those are modified in the loop body bb.  */
+  rtx condcount_reg_set = PATTERN (DF_REF_INSN (df_bb_regno_only_def_find
+						 (body, REGNO (condcount))));
Put = on newline, breaks it down nicer.

+	  counter_orig_set = XEXP (PATTERN
+				    (DF_REF_INSN
+				      (DF_REF_NEXT_REG
+					(DF_REG_DEF_CHAIN
+					 (REGNO
+					   (XEXP (condcount_reg_set, 0)))))), 1);

This makes me a bit nervous, can we be certain that the PATTERN of the 
next insn that sets it is indeed a set. Heck can we even be sure 
DF_REG_DEF_CHAIN returns a non-null, I can't imagine why not but maybe 
there are some constructs it can't follow-up on? Might just be worth 
checking these steps and bailing out.



+      /* When we find the vctp instruction: This may be followed by
+      a zero-extend insn to SImode.  If it is, then save the
+      zero-extended REG into vctp_vpr_generated.  If there is no
+      zero-extend, then store the raw output of the vctp.
+      For any VPT-predicated instructions we need to ensure that
+      the VPR they use is the same as the one given here and
+      they often consume the output of a subreg of the SImode
+      zero-extended VPR-reg.  As a result, comparing against the
+      output of the zero-extend is more likely to succeed.
+      This code also guarantees to us that the vctp comes before
+      any instructions that use the VPR within the loop, for the
+      dlstp/letp transform to succeed.  */

Wrong comment indent after first line.

+  rtx_insn *vctp_insn = arm_mve_get_loop_vctp (body);
+  if (!vctp_insn || !arm_mve_loop_valid_for_dlstp (body))
+    return GEN_INT (1);

arm_mve_loop_valid_for_dlstp already calls arm_mve_get_loop_vctp, maybe 
have 'arm_mve_loop_valid_for_dlstp' return vctp_insn or null to 
determine success or failure, avoids looping through the BB again.

For the same reason I'd also pass vctp_insn down to 
'arm_mve_check_df_chain_back_for_implic_predic'.

+	  if (GET_CODE (SET_SRC (single_set (next_use1))) == ZERO_EXTEND)
+	    {
+	      rtx_insn *next_use2 = NULL;

Are we sure single_set can never return 0 here? Maybe worth an extra 
check and bail out if it does?

+       /* If the insn pattern requires the use of the VPR value from the
+	  vctp as an input parameter.  */
s/an an input parameter./as an input parameter for predication./

+	  /* None of registers USE-d by the instruction need can be the VPR
+	     vctp_vpr_generated.  This blocks the optimisation if there any
+	     instructions that use the optimised-out VPR value in any way
+	     other than as a VPT block predicate.  */

Reword this slightly to be less complex:
This instruction USE-s the vctp_vpr_generated other than for 
predication, this blocks the transformation as we are not allowed to 
optimise the VPR value away.

Will continue reviewing next week :)

On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:
>      Hi all,
> 
>      This is the 2/2 patch that contains the functional changes needed
>      for MVE Tail Predicated Low Overhead Loops.  See my previous email
>      for a general introduction of MVE LOLs.
> 
>      This support is added through the already existing loop-doloop
>      mechanisms that are used for non-MVE dls/le looping.
> 
>      Mid-end changes are:
> 
>      1) Relax the loop-doloop mechanism in the mid-end to allow for
>         decrement numbers other that -1 and for `count` to be an
>         rtx containing a simple REG (which in this case will contain
>         the number of elements to be processed), rather
>         than an expression for calculating the number of iterations.
>      2) Added a new df utility function: `df_bb_regno_only_def_find` that
>         will return the DEF of a REG only if it is DEF-ed once within the
>         basic block.
> 
>      And many things in the backend to implement the above optimisation:
> 
>      3)  Implement the `arm_predict_doloop_p` target hook to instruct the
>          mid-end about Low Overhead Loops (MVE or not), as well as
>          `arm_loop_unroll_adjust` which will prevent unrolling of any loops
>          that are valid for becoming MVE Tail_Predicated Low Overhead Loops
>          (unrolling can transform a loop in ways that invalidate the dlstp/
>          letp tranformation logic and the benefit of the dlstp/letp loop
>          would be considerably higher than that of unrolling)
>      4)  Appropriate changes to the define_expand of doloop_end, new
>          patterns for dlstp and letp, new iterators,  unspecs, etc.
>      5) `arm_mve_loop_valid_for_dlstp` and a number of checking functions:
>         * `arm_mve_dlstp_check_dec_counter`
>         * `arm_mve_dlstp_check_inc_counter`
>         * `arm_mve_check_reg_origin_is_num_elems`
>         * `arm_mve_check_df_chain_back_for_implic_predic`
>         * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
>         This all, in smoe way or another, are running checks on the loop
>         structure in order to determine if the loop is valid for dlstp/letp
>         transformation.
>      6) `arm_attempt_dlstp_transform`: (called from the define_expand of
>          doloop_end) this function re-checks for the loop's suitability for
>          dlstp/letp transformation and then implements it, if possible.
>      7) Various utility functions:
>         *`arm_mve_get_vctp_lanes` to map
>         from vctp unspecs to number of lanes, and 
> `arm_get_required_vpr_reg`
>         to check an insn to see if it requires the VPR or not.
>         * `arm_mve_get_loop_vctp`
>         * `arm_mve_get_vctp_lanes`
>         * `arm_emit_mve_unpredicated_insn_to_seq`
>         * `arm_get_required_vpr_reg`
>         * `arm_get_required_vpr_reg_param`
>         * `arm_get_required_vpr_reg_ret_val`
>         * `arm_mve_vec_insn_is_predicated_with_this_predicate`
>         * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`
> 
>      No regressions on arm-none-eabi with various targets and on
>      aarch64-none-elf. Thoughts on getting this into trunk?
> 
>      Thank you,
>      Stam Markianos-Wright
> 
>      gcc/ChangeLog:
> 
>              * config/arm/arm-protos.h (arm_target_insn_ok_for_lob): 
> Rename to...
>              (arm_target_bb_ok_for_lob): ...this
>              (arm_attempt_dlstp_transform): New.
>              * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
>              (TARGET_PREDICT_DOLOOP_P): New.
>              (arm_block_set_vect):
>              (arm_target_insn_ok_for_lob): Rename from 
> arm_target_insn_ok_for_lob.
>              (arm_target_bb_ok_for_lob): New.
>              (arm_mve_get_vctp_lanes): New.
>              (arm_get_required_vpr_reg): New.
>              (arm_get_required_vpr_reg_param): New.
>              (arm_get_required_vpr_reg_ret_val): New.
>              (arm_mve_get_loop_vctp): New.
> (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
>              (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
>              (arm_mve_check_df_chain_back_for_implic_predic): New.
>              (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
>              (arm_mve_check_reg_origin_is_num_elems): New.
>              (arm_mve_dlstp_check_inc_counter): New.
>              (arm_mve_dlstp_check_dec_counter): New.
>              (arm_mve_loop_valid_for_dlstp): New.
>              (arm_predict_doloop_p): New.
>              (arm_loop_unroll_adjust): New.
>              (arm_emit_mve_unpredicated_insn_to_seq): New.
>              (arm_attempt_dlstp_transform): New.
>              * config/arm/iterators.md (DLSTP): New.
>              (mode1): Add DLSTP mappings.
>              * config/arm/mve.md (*predicated_doloop_end_internal): New.
>              (dlstp<mode1>_insn): New.
>              * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
>              * config/arm/unspecs.md: New unspecs.
>              * df-core.cc (df_bb_regno_only_def_find): New.
>              * df.h (df_bb_regno_only_def_find): New.
>              * loop-doloop.cc (doloop_condition_get): Relax conditions.
>              (doloop_optimize): Add support for elementwise LoLs.
> 
>      gcc/testsuite/ChangeLog:
> 
>              * gcc.target/arm/lob.h: Update framework.
>              * gcc.target/arm/lob1.c: Likewise.
>              * gcc.target/arm/lob6.c: Likewise.
>              * gcc.target/arm/mve/dlstp-compile-asm.c: New test.
>              * gcc.target/arm/mve/dlstp-int16x8.c: New test.
>              * gcc.target/arm/mve/dlstp-int32x4.c: New test.
>              * gcc.target/arm/mve/dlstp-int64x2.c: New test.
>              * gcc.target/arm/mve/dlstp-int8x16.c: New test.
>              * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.

next prev parent reply	other threads:[~2023-06-23 16:25 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15 11:47 Stamatis Markianos-Wright
2023-06-22 15:54 ` Andre Vieira (lists)
2023-07-05 16:11   ` Stamatis Markianos-Wright
2023-06-23 10:23 ` Andre Vieira (lists)
2023-07-05 16:13   ` Stamatis Markianos-Wright
2023-06-23 16:25 ` Andre Vieira (lists) [this message]
2023-07-05 16:41   ` Stamatis Markianos-Wright
  -- strict thread matches above, loose matches on Subject: below --
2024-05-23 14:37 [PATCH 0/2] arm,doloop: " Andre Vieira
2024-05-23 14:37 ` [PATCH 2/2] arm: " Andre Vieira
2024-05-23  8:22 [PATCH 0/2] arm,doloop: " Andre Vieira
2024-05-23  8:22 ` [PATCH 2/2] arm: " Andre Vieira
2023-12-18 11:53 [PATCH 0/2] " Andre Vieira
2023-12-18 11:53 ` [PATCH 2/2] " Andre Vieira
2023-12-20 16:54   ` Andre Vieira (lists)
2023-11-06 11:20 Stamatis Markianos-Wright
2023-08-17 10:31 Stamatis Markianos-Wright
2022-11-11 17:40 Stam Markianos-Wright
2022-11-15 15:51 ` Andre Vieira (lists)
2022-11-28 12:13   ` Stam Markianos-Wright

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca23f624-6b10-a8db-dac1-b1185410f4a6@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=Kyrylo.Tkachov@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nickc@redhat.com \
    --cc=ramana.gcc@gmail.com \
    --cc=stam.markianos-wright@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).