Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Stamatis Markianos-Wright <stam.markianos-wright@arm.com>
To: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
	ramana.gcc@gmail.com, "nickc@redhat.com" <nickc@redhat.com>
Subject: Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Date: Wed, 5 Jul 2023 17:13:32 +0100	[thread overview]
Message-ID: <1b7a0eac-724f-52fe-6bac-ee6c86136525@arm.com> (raw)
In-Reply-To: <7dbe3fde-5cdb-cd2d-cda4-16f02385906f@arm.com>


On 23/06/2023 11:23, Andre Vieira (lists) wrote:
> +  if (insn != arm_mve_get_loop_vctp (body))
> +    {
>
> probably a good idea to invert the condition here and return false, 
> helps reducing the indenting in this function.

Done, thanks


>
>
> +    /* Starting from the current insn, scan backwards through the insn
> +       chain until BB_HEAD: "for each insn in the BB prior to the 
> current".
> +    */
>
> There's a trailing whitespace after insn, but also I'd rewrite this 
> bit. The "for each insn in the BB prior to the current" is superfluous 
> and even confusing to me. How about:
> "Scan backwards from the current INSN through the instruction chain 
> until the start of the basic block.  "
Yes, agreed, it wasn't very clear. Done.
>
>
>  I find 'that previous insn' to be confusing as you don't mention any 
> previous insn before. So how about something along the lines of:
> 'If a previous insn defines a register that INSN uses then return true 
> if...'
Done
>
>
> Do we need to check: 'insn != prev_insn' ? Any reason why you can't 
> start the loop with:
> 'for (rtx_insn *prev_insn = PREV_INSN (insn);'

True! Done.

>
> Now I also found a case where things might go wrong in:
> +        /* Look at all the DEFs of that previous insn: if one of them 
> is on
> +           the same REG as our current insn, then recurse in order to 
> check
> +           that insn's USEs.  If any of these insns return true as
> +           MVE_VPT_UNPREDICATED_INSN_Ps, then the whole chain is 
> affected
> +           by the change in behaviour from being placed in dlstp/letp 
> loop.
> +        */
> +        df_ref prev_insn_defs = NULL;
> +        FOR_EACH_INSN_DEF (prev_insn_defs, prev_insn)
> +          {
> +        if (DF_REF_REGNO (insn_uses) == DF_REF_REGNO (prev_insn_defs)
> +            && insn != prev_insn
> +            && body == BLOCK_FOR_INSN (prev_insn)
> +            && !arm_mve_vec_insn_is_predicated_with_this_predicate
> +             (insn, vctp_vpr_generated)
> +            && arm_mve_check_df_chain_back_for_implic_predic
> +             (prev_insn, vctp_vpr_generated))
> +          return true;
> +          }
>
> The body == BLOCK_FOR_INSN (prev_insn) hinted me at it, if a def comes 
> from outside of the BB (so outside of the loop's body) then its by 
> definition unpredicated by vctp.  I think you want to check that if 
> prev_insn defines a register used by insn then return true if 
> prev_insn isn't in the same BB or has a chain that is not predicated, 
> i.e.: '!arm_mve_vec_insn_is_predicated_with_this_predicate (insn, 
> vctp_vpr_generated) && arm_mve_check_df_chain_back_for_implic_predic 
> prev_insn, vctp_vpr_generated))' you check body != BLOCK_FOR_INSN 
> (prev_insn)'

Yes, you're right, this is vulnerable here. A neater fix to this (I 
think?) is to make the above REGNO_REG_SET_P more generic, so that it 
covers all scalar values and scalar ops, as well.
Then it's a "if this insn in the loop has any input that originates 
outside the bb, then it's unsafe" check and the recursive loop backwards 
is only for the recursive "are any previous insns unsafe"


>
>
> I also found some other issues, this currently loloops:
>
> uint16_t  test (uint16_t *a, int n)
> {
>   uint16_t res =0;
>   while (n > 0)
>     {
>       mve_pred16_t p = vctp16q (n);
>       uint16x8_t va = vldrhq_u16 (a);
>       res = vaddvaq_u16 (res, va);
>       res = vaddvaq_p_u16 (res, va, p);
>       a += 8;
>       n -= 8;
>     }
>   return res;
> }
>
> But it shouldn't, this is because there's a lack of handling of across 
> vector instructions. Luckily in MVE all across vector instructions 
> have the side-effect that they write to a scalar register, even the 
> vshlcq instruction (it writes to a scalar carry output).

Added support for them (you were right, there was some special handling 
needed!)


>
> Did this lead me to find an ICE with:
>
> uint16x8_t  test (uint16_t *a, int n)
> {
>   uint16x8_t res = vdupq_n_u16 (0);
>   while (n > 0)
>     {
>       uint16_t carry = 0;
>       mve_pred16_t p = vctp16q (n);
>       uint16x8_t va = vldrhq_u16 (a);
>       res = vshlcq_u16 (va, &carry, 1);
>       res = vshlcq_m_u16 (res, &carry, 1 , p);
>       a += 8;
>       n -= 8;
>     }
>   return res;
> }
>
> This is because:
> +          /* If the USE is outside the loop body bb, or it is inside, 
> but
> +         is an unpredicated store to memory.  */
> +          if (BLOCK_FOR_INSN (insn) != BLOCK_FOR_INSN (next_use_insn)
> +         || (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate
> +             (next_use_insn, vctp_vpr_generated)
> +            && mve_memory_operand
> +            (SET_DEST (single_set (next_use_insn)),
> +             GET_MODE (SET_DEST (single_set (next_use_insn))))))
> +        return true;
>
> Assumes single_set doesn't return 0.

Thanks! That is indeed correct.

Corrected this by having a utility function to scan insn operands and 
check against mve_memory_operand that supports any number of 
operands/SETs in the insn

>
> Let's deal with these issues and I'll continue to review.
>
> On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:
>>      Hi all,
>>
>>      This is the 2/2 patch that contains the functional changes needed
>>      for MVE Tail Predicated Low Overhead Loops.  See my previous email
>>      for a general introduction of MVE LOLs.
>>
>>      This support is added through the already existing loop-doloop
>>      mechanisms that are used for non-MVE dls/le looping.
>>
>>      Mid-end changes are:
>>
>>      1) Relax the loop-doloop mechanism in the mid-end to allow for
>>         decrement numbers other that -1 and for `count` to be an
>>         rtx containing a simple REG (which in this case will contain
>>         the number of elements to be processed), rather
>>         than an expression for calculating the number of iterations.
>>      2) Added a new df utility function: `df_bb_regno_only_def_find` 
>> that
>>         will return the DEF of a REG only if it is DEF-ed once within 
>> the
>>         basic block.
>>
>>      And many things in the backend to implement the above optimisation:
>>
>>      3)  Implement the `arm_predict_doloop_p` target hook to instruct 
>> the
>>          mid-end about Low Overhead Loops (MVE or not), as well as
>>          `arm_loop_unroll_adjust` which will prevent unrolling of any 
>> loops
>>          that are valid for becoming MVE Tail_Predicated Low Overhead 
>> Loops
>>          (unrolling can transform a loop in ways that invalidate the 
>> dlstp/
>>          letp tranformation logic and the benefit of the dlstp/letp loop
>>          would be considerably higher than that of unrolling)
>>      4)  Appropriate changes to the define_expand of doloop_end, new
>>          patterns for dlstp and letp, new iterators,  unspecs, etc.
>>      5) `arm_mve_loop_valid_for_dlstp` and a number of checking 
>> functions:
>>         * `arm_mve_dlstp_check_dec_counter`
>>         * `arm_mve_dlstp_check_inc_counter`
>>         * `arm_mve_check_reg_origin_is_num_elems`
>>         * `arm_mve_check_df_chain_back_for_implic_predic`
>>         * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
>>         This all, in smoe way or another, are running checks on the loop
>>         structure in order to determine if the loop is valid for 
>> dlstp/letp
>>         transformation.
>>      6) `arm_attempt_dlstp_transform`: (called from the define_expand of
>>          doloop_end) this function re-checks for the loop's 
>> suitability for
>>          dlstp/letp transformation and then implements it, if possible.
>>      7) Various utility functions:
>>         *`arm_mve_get_vctp_lanes` to map
>>         from vctp unspecs to number of lanes, and 
>> `arm_get_required_vpr_reg`
>>         to check an insn to see if it requires the VPR or not.
>>         * `arm_mve_get_loop_vctp`
>>         * `arm_mve_get_vctp_lanes`
>>         * `arm_emit_mve_unpredicated_insn_to_seq`
>>         * `arm_get_required_vpr_reg`
>>         * `arm_get_required_vpr_reg_param`
>>         * `arm_get_required_vpr_reg_ret_val`
>>         * `arm_mve_vec_insn_is_predicated_with_this_predicate`
>>         * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`
>>
>>      No regressions on arm-none-eabi with various targets and on
>>      aarch64-none-elf. Thoughts on getting this into trunk?
>>
>>      Thank you,
>>      Stam Markianos-Wright
>>
>>      gcc/ChangeLog:
>>
>>              * config/arm/arm-protos.h (arm_target_insn_ok_for_lob): 
>> Rename to...
>>              (arm_target_bb_ok_for_lob): ...this
>>              (arm_attempt_dlstp_transform): New.
>>              * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
>>              (TARGET_PREDICT_DOLOOP_P): New.
>>              (arm_block_set_vect):
>>              (arm_target_insn_ok_for_lob): Rename from 
>> arm_target_insn_ok_for_lob.
>>              (arm_target_bb_ok_for_lob): New.
>>              (arm_mve_get_vctp_lanes): New.
>>              (arm_get_required_vpr_reg): New.
>>              (arm_get_required_vpr_reg_param): New.
>>              (arm_get_required_vpr_reg_ret_val): New.
>>              (arm_mve_get_loop_vctp): New.
>> (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
>> (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
>>              (arm_mve_check_df_chain_back_for_implic_predic): New.
>> (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
>>              (arm_mve_check_reg_origin_is_num_elems): New.
>>              (arm_mve_dlstp_check_inc_counter): New.
>>              (arm_mve_dlstp_check_dec_counter): New.
>>              (arm_mve_loop_valid_for_dlstp): New.
>>              (arm_predict_doloop_p): New.
>>              (arm_loop_unroll_adjust): New.
>>              (arm_emit_mve_unpredicated_insn_to_seq): New.
>>              (arm_attempt_dlstp_transform): New.
>>              * config/arm/iterators.md (DLSTP): New.
>>              (mode1): Add DLSTP mappings.
>>              * config/arm/mve.md (*predicated_doloop_end_internal): New.
>>              (dlstp<mode1>_insn): New.
>>              * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
>>              * config/arm/unspecs.md: New unspecs.
>>              * df-core.cc (df_bb_regno_only_def_find): New.
>>              * df.h (df_bb_regno_only_def_find): New.
>>              * loop-doloop.cc (doloop_condition_get): Relax conditions.
>>              (doloop_optimize): Add support for elementwise LoLs.
>>
>>      gcc/testsuite/ChangeLog:
>>
>>              * gcc.target/arm/lob.h: Update framework.
>>              * gcc.target/arm/lob1.c: Likewise.
>>              * gcc.target/arm/lob6.c: Likewise.
>>              * gcc.target/arm/mve/dlstp-compile-asm.c: New test.
>>              * gcc.target/arm/mve/dlstp-int16x8.c: New test.
>>              * gcc.target/arm/mve/dlstp-int32x4.c: New test.
>>              * gcc.target/arm/mve/dlstp-int64x2.c: New test.
>>              * gcc.target/arm/mve/dlstp-int8x16.c: New test.
>>              * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.

next prev parent reply	other threads:[~2023-07-05 16:13 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15 11:47 Stamatis Markianos-Wright
2023-06-22 15:54 ` Andre Vieira (lists)
2023-07-05 16:11   ` Stamatis Markianos-Wright
2023-06-23 10:23 ` Andre Vieira (lists)
2023-07-05 16:13   ` Stamatis Markianos-Wright [this message]
2023-06-23 16:25 ` Andre Vieira (lists)
2023-07-05 16:41   ` Stamatis Markianos-Wright
  -- strict thread matches above, loose matches on Subject: below --
2023-12-18 11:53 [PATCH 0/2] " Andre Vieira
2023-12-18 11:53 ` [PATCH 2/2] " Andre Vieira
2023-12-20 16:54   ` Andre Vieira (lists)
2023-11-06 11:20 Stamatis Markianos-Wright
2023-08-17 10:31 Stamatis Markianos-Wright
2022-11-11 17:40 Stam Markianos-Wright
2022-11-15 15:51 ` Andre Vieira (lists)
2022-11-28 12:13   ` Stam Markianos-Wright

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1b7a0eac-724f-52fe-6bac-ee6c86136525@arm.com \
    --to=stam.markianos-wright@arm.com \
    --cc=Kyrylo.Tkachov@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nickc@redhat.com \
    --cc=ramana.gcc@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).