Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Stamatis Markianos-Wright <stam.markianos-wright@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
	ramana.gcc@gmail.com, "nickc@redhat.com" <nickc@redhat.com>
Subject: Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Date: Fri, 23 Jun 2023 11:23:26 +0100	[thread overview]
Message-ID: <7dbe3fde-5cdb-cd2d-cda4-16f02385906f@arm.com> (raw)
In-Reply-To: <cada1fc1-dd8f-60b1-ecc2-f89262d458d4@arm.com>

+  if (insn != arm_mve_get_loop_vctp (body))
+    {

probably a good idea to invert the condition here and return false, 
helps reducing the indenting in this function.


+	/* Starting from the current insn, scan backwards through the insn
+	   chain until BB_HEAD: "for each insn in the BB prior to the current".
+	*/

There's a trailing whitespace after insn, but also I'd rewrite this bit. 
The "for each insn in the BB prior to the current" is superfluous and 
even confusing to me. How about:
"Scan backwards from the current INSN through the instruction chain 
until the start of the basic block.  "


  I find 'that previous insn' to be confusing as you don't mention any 
previous insn before. So how about something along the lines of:
'If a previous insn defines a register that INSN uses then return true 
if...'


Do we need to check: 'insn != prev_insn' ? Any reason why you can't 
start the loop with:
'for (rtx_insn *prev_insn = PREV_INSN (insn);'

Now I also found a case where things might go wrong in:
+	    /* Look at all the DEFs of that previous insn: if one of them is on
+	       the same REG as our current insn, then recurse in order to check
+	       that insn's USEs.  If any of these insns return true as
+	       MVE_VPT_UNPREDICATED_INSN_Ps, then the whole chain is affected
+	       by the change in behaviour from being placed in dlstp/letp loop.
+	    */
+	    df_ref prev_insn_defs = NULL;
+	    FOR_EACH_INSN_DEF (prev_insn_defs, prev_insn)
+	      {
+		if (DF_REF_REGNO (insn_uses) == DF_REF_REGNO (prev_insn_defs)
+		    && insn != prev_insn
+		    && body == BLOCK_FOR_INSN (prev_insn)
+		    && !arm_mve_vec_insn_is_predicated_with_this_predicate
+			 (insn, vctp_vpr_generated)
+		    && arm_mve_check_df_chain_back_for_implic_predic
+			 (prev_insn, vctp_vpr_generated))
+		  return true;
+	      }

The body == BLOCK_FOR_INSN (prev_insn) hinted me at it, if a def comes 
from outside of the BB (so outside of the loop's body) then its by 
definition unpredicated by vctp.  I think you want to check that if 
prev_insn defines a register used by insn then return true if prev_insn 
isn't in the same BB or has a chain that is not predicated, i.e.: 
'!arm_mve_vec_insn_is_predicated_with_this_predicate (insn, 
vctp_vpr_generated) && arm_mve_check_df_chain_back_for_implic_predic 
prev_insn, vctp_vpr_generated))' you check body != BLOCK_FOR_INSN 
(prev_insn)'


I also found some other issues, this currently loloops:

uint16_t  test (uint16_t *a, int n)
{
   uint16_t res =0;
   while (n > 0)
     {
       mve_pred16_t p = vctp16q (n);
       uint16x8_t va = vldrhq_u16 (a);
       res = vaddvaq_u16 (res, va);
       res = vaddvaq_p_u16 (res, va, p);
       a += 8;
       n -= 8;
     }
   return res;
}

But it shouldn't, this is because there's a lack of handling of across 
vector instructions. Luckily in MVE all across vector instructions have 
the side-effect that they write to a scalar register, even the vshlcq 
instruction (it writes to a scalar carry output).

Did this lead me to find an ICE with:

uint16x8_t  test (uint16_t *a, int n)
{
   uint16x8_t res = vdupq_n_u16 (0);
   while (n > 0)
     {
       uint16_t carry = 0;
       mve_pred16_t p = vctp16q (n);
       uint16x8_t va = vldrhq_u16 (a);
       res = vshlcq_u16 (va, &carry, 1);
       res = vshlcq_m_u16 (res, &carry, 1 , p);
       a += 8;
       n -= 8;
     }
   return res;
}

This is because:
+	      /* If the USE is outside the loop body bb, or it is inside, but
+		 is an unpredicated store to memory.  */
+	      if (BLOCK_FOR_INSN (insn) != BLOCK_FOR_INSN (next_use_insn)
+		 || (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate
+		     (next_use_insn, vctp_vpr_generated)
+		    && mve_memory_operand
+			(SET_DEST (single_set (next_use_insn)),
+			 GET_MODE (SET_DEST (single_set (next_use_insn))))))
+		return true;

Assumes single_set doesn't return 0.

Let's deal with these issues and I'll continue to review.

On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:
>      Hi all,
> 
>      This is the 2/2 patch that contains the functional changes needed
>      for MVE Tail Predicated Low Overhead Loops.  See my previous email
>      for a general introduction of MVE LOLs.
> 
>      This support is added through the already existing loop-doloop
>      mechanisms that are used for non-MVE dls/le looping.
> 
>      Mid-end changes are:
> 
>      1) Relax the loop-doloop mechanism in the mid-end to allow for
>         decrement numbers other that -1 and for `count` to be an
>         rtx containing a simple REG (which in this case will contain
>         the number of elements to be processed), rather
>         than an expression for calculating the number of iterations.
>      2) Added a new df utility function: `df_bb_regno_only_def_find` that
>         will return the DEF of a REG only if it is DEF-ed once within the
>         basic block.
> 
>      And many things in the backend to implement the above optimisation:
> 
>      3)  Implement the `arm_predict_doloop_p` target hook to instruct the
>          mid-end about Low Overhead Loops (MVE or not), as well as
>          `arm_loop_unroll_adjust` which will prevent unrolling of any loops
>          that are valid for becoming MVE Tail_Predicated Low Overhead Loops
>          (unrolling can transform a loop in ways that invalidate the dlstp/
>          letp tranformation logic and the benefit of the dlstp/letp loop
>          would be considerably higher than that of unrolling)
>      4)  Appropriate changes to the define_expand of doloop_end, new
>          patterns for dlstp and letp, new iterators,  unspecs, etc.
>      5) `arm_mve_loop_valid_for_dlstp` and a number of checking functions:
>         * `arm_mve_dlstp_check_dec_counter`
>         * `arm_mve_dlstp_check_inc_counter`
>         * `arm_mve_check_reg_origin_is_num_elems`
>         * `arm_mve_check_df_chain_back_for_implic_predic`
>         * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
>         This all, in smoe way or another, are running checks on the loop
>         structure in order to determine if the loop is valid for dlstp/letp
>         transformation.
>      6) `arm_attempt_dlstp_transform`: (called from the define_expand of
>          doloop_end) this function re-checks for the loop's suitability for
>          dlstp/letp transformation and then implements it, if possible.
>      7) Various utility functions:
>         *`arm_mve_get_vctp_lanes` to map
>         from vctp unspecs to number of lanes, and 
> `arm_get_required_vpr_reg`
>         to check an insn to see if it requires the VPR or not.
>         * `arm_mve_get_loop_vctp`
>         * `arm_mve_get_vctp_lanes`
>         * `arm_emit_mve_unpredicated_insn_to_seq`
>         * `arm_get_required_vpr_reg`
>         * `arm_get_required_vpr_reg_param`
>         * `arm_get_required_vpr_reg_ret_val`
>         * `arm_mve_vec_insn_is_predicated_with_this_predicate`
>         * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`
> 
>      No regressions on arm-none-eabi with various targets and on
>      aarch64-none-elf. Thoughts on getting this into trunk?
> 
>      Thank you,
>      Stam Markianos-Wright
> 
>      gcc/ChangeLog:
> 
>              * config/arm/arm-protos.h (arm_target_insn_ok_for_lob): 
> Rename to...
>              (arm_target_bb_ok_for_lob): ...this
>              (arm_attempt_dlstp_transform): New.
>              * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
>              (TARGET_PREDICT_DOLOOP_P): New.
>              (arm_block_set_vect):
>              (arm_target_insn_ok_for_lob): Rename from 
> arm_target_insn_ok_for_lob.
>              (arm_target_bb_ok_for_lob): New.
>              (arm_mve_get_vctp_lanes): New.
>              (arm_get_required_vpr_reg): New.
>              (arm_get_required_vpr_reg_param): New.
>              (arm_get_required_vpr_reg_ret_val): New.
>              (arm_mve_get_loop_vctp): New.
> (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
>              (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
>              (arm_mve_check_df_chain_back_for_implic_predic): New.
>              (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
>              (arm_mve_check_reg_origin_is_num_elems): New.
>              (arm_mve_dlstp_check_inc_counter): New.
>              (arm_mve_dlstp_check_dec_counter): New.
>              (arm_mve_loop_valid_for_dlstp): New.
>              (arm_predict_doloop_p): New.
>              (arm_loop_unroll_adjust): New.
>              (arm_emit_mve_unpredicated_insn_to_seq): New.
>              (arm_attempt_dlstp_transform): New.
>              * config/arm/iterators.md (DLSTP): New.
>              (mode1): Add DLSTP mappings.
>              * config/arm/mve.md (*predicated_doloop_end_internal): New.
>              (dlstp<mode1>_insn): New.
>              * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
>              * config/arm/unspecs.md: New unspecs.
>              * df-core.cc (df_bb_regno_only_def_find): New.
>              * df.h (df_bb_regno_only_def_find): New.
>              * loop-doloop.cc (doloop_condition_get): Relax conditions.
>              (doloop_optimize): Add support for elementwise LoLs.
> 
>      gcc/testsuite/ChangeLog:
> 
>              * gcc.target/arm/lob.h: Update framework.
>              * gcc.target/arm/lob1.c: Likewise.
>              * gcc.target/arm/lob6.c: Likewise.
>              * gcc.target/arm/mve/dlstp-compile-asm.c: New test.
>              * gcc.target/arm/mve/dlstp-int16x8.c: New test.
>              * gcc.target/arm/mve/dlstp-int32x4.c: New test.
>              * gcc.target/arm/mve/dlstp-int64x2.c: New test.
>              * gcc.target/arm/mve/dlstp-int8x16.c: New test.
>              * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.

next prev parent reply	other threads:[~2023-06-23 10:23 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15 11:47 Stamatis Markianos-Wright
2023-06-22 15:54 ` Andre Vieira (lists)
2023-07-05 16:11   ` Stamatis Markianos-Wright
2023-06-23 10:23 ` Andre Vieira (lists) [this message]
2023-07-05 16:13   ` Stamatis Markianos-Wright
2023-06-23 16:25 ` Andre Vieira (lists)
2023-07-05 16:41   ` Stamatis Markianos-Wright
  -- strict thread matches above, loose matches on Subject: below --
2023-12-18 11:53 [PATCH 0/2] " Andre Vieira
2023-12-18 11:53 ` [PATCH 2/2] " Andre Vieira
2023-12-20 16:54   ` Andre Vieira (lists)
2023-11-06 11:20 Stamatis Markianos-Wright
2023-08-17 10:31 Stamatis Markianos-Wright
2022-11-11 17:40 Stam Markianos-Wright
2022-11-15 15:51 ` Andre Vieira (lists)
2022-11-28 12:13   ` Stam Markianos-Wright

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7dbe3fde-5cdb-cd2d-cda4-16f02385906f@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=Kyrylo.Tkachov@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nickc@redhat.com \
    --cc=ramana.gcc@gmail.com \
    --cc=stam.markianos-wright@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).