public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Stam Markianos-Wright <Stam.Markianos-Wright@arm.com>,
	Stamatis Markianos-Wright via Gcc-patches
	<gcc-patches@gcc.gnu.org>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
	Richard Sandiford <Richard.Sandiford@arm.com>,
	Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Date: Mon, 27 Nov 2023 12:47:46 +0000	[thread overview]
Message-ID: <76426214-33e6-40eb-aa9c-94f357945c4b@arm.com> (raw)
In-Reply-To: <6cab7952-9b00-4d85-8fbb-c8058d2142d2@arm.com>

Hi Stam,

Just some comments.

+/* Recursively scan through the DF chain backwards within the basic 
block and
+   determine if any of the USEs of the original insn (or the USEs of 
the insns
s/Recursively scan/Scan/ as you no longer recurse, thanks for that by 
the way :) +   where thy were DEF-ed, etc., recursively) were affected 
by implicit VPT
remove recursively for the same reasons.

+      if (!CONST_INT_P (cond_counter_iv.step) || !CONST_INT_P 
(cond_temp_iv.step))
+	return NULL;
+      /* Look at the steps and swap around the rtx's if needed.  Error 
out if
+	 one of them cannot be identified as constant.  */
+      if (INTVAL (cond_counter_iv.step) != 0 && INTVAL 
(cond_temp_iv.step) != 0)
+	return NULL;

Move the comment above the if before, as the erroring out it talks about 
is there.

+      emit_note_after ((enum insn_note)NOTE_KIND (insn), BB_END (body));
  space after 'insn_note)'

@@ -173,14 +176,14 @@ doloop_condition_get (rtx_insn *doloop_pat)
    if (! REG_P (reg))
      return 0;
  -  /* Check if something = (plus (reg) (const_int -1)).
+  /* Check if something = (plus (reg) (const_int -n)).
       On IA-64, this decrement is wrapped in an if_then_else.  */
    inc_src = SET_SRC (inc);
    if (GET_CODE (inc_src) == IF_THEN_ELSE)
      inc_src = XEXP (inc_src, 1);
    if (GET_CODE (inc_src) != PLUS
        || XEXP (inc_src, 0) != reg
-      || XEXP (inc_src, 1) != constm1_rtx)
+      || !CONST_INT_P (XEXP (inc_src, 1)))

Do we ever check that inc_src is negative? We used to check if it was 
-1, now we only check it's a constnat, but not a negative one, so I 
suspect this needs a:
|| INTVAL (XEXP (inc_src, 1)) >= 0

@@ -492,7 +519,8 @@ doloop_modify (class loop *loop, class niter_desc *desc,
      case GE:
        /* Currently only GE tests against zero are supported.  */
        gcc_assert (XEXP (condition, 1) == const0_rtx);
-
+      /* FALLTHRU */
+    case GTU:
        noloop = constm1_rtx;

I spent a very long time staring at this trying to understand why noloop 
= constm1_rtx for GTU, where I thought it should've been (count & 
(n-1)). For the current use of doloop it doesn't matter because ARM is 
the only target using it and you set desc->noloop_assumptions to 
null_rtx in 'arm_attempt_dlstp_transform' so noloop is never used. 
However, if a different target accepts this GTU pattern then this target 
agnostic code will do the wrong thing.  I suggest we either:
  - set noloop to what we think might be the correct value, which if you 
ask me should be 'count & (XEXP (condition, 1))',
  - or add a gcc_assert (GET_CODE (condition) != GTU); under the if 
(desc->noloop_assumption); part and document why.  I have a slight 
preference for the assert given otherwise we are adding code that we 
can't test.

LGTM otherwise (but I don't have the power to approve this ;)).

Kind regards,
Andre
________________________________________
From: Stamatis Markianos-Wright <stam.markianos-wright@arm.com>
Sent: Thursday, November 16, 2023 11:36 AM
To: Stamatis Markianos-Wright via Gcc-patches; Richard Earnshaw; Richard 
Sandiford; Kyrylo Tkachov
Subject: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low 
Overhead Loops

Pinging back to the top of reviewers' inboxes due to worry about Stage 1
End in a few days :)


See the last email for the latest version of the 2/2 patch. The 1/2
patch is A-Ok from Kyrill's earlier target-backend review.


On 10/11/2023 12:41, Stamatis Markianos-Wright wrote:
>
> On 06/11/2023 17:29, Stamatis Markianos-Wright wrote:
>>
>> On 06/11/2023 11:24, Richard Sandiford wrote:
>>> Stamatis Markianos-Wright <stam.markianos-wright@arm.com> writes:
>>>>> One of the main reasons for reading the arm bits was to try to answer
>>>>> the question: if we switch to a downcounting loop with a GE
>>>>> condition,
>>>>> how do we make sure that the start value is not a large unsigned
>>>>> number that is interpreted as negative by GE?  E.g. if the loop
>>>>> originally counted up in steps of N and used an LTU condition,
>>>>> it could stop at a value in the range [INT_MAX + 1, UINT_MAX].
>>>>> But the loop might never iterate if we start counting down from
>>>>> most values in that range.
>>>>>
>>>>> Does the patch handle that?
>>>> So AFAICT this is actually handled in the generic code in
>>>> `doloop_valid_p`:
>>>>
>>>> This kind of loops fail because of they are "desc->infinite", then no
>>>> loop-doloop conversion is attempted at all (even for standard
>>>> dls/le loops)
>>>>
>>>> Thanks to that check I haven't been able to trigger anything like the
>>>> behaviour you describe, do you think the doloop_valid_p checks are
>>>> robust enough?
>>> The loops I was thinking of are provably not infinite though. E.g.:
>>>
>>>    for (unsigned int i = 0; i < UINT_MAX - 100; ++i)
>>>      ...
>>>
>>> is known to terminate.  And doloop conversion is safe with the normal
>>> count-down-by-1 approach, so I don't think current code would need
>>> to reject it.  I.e. a conversion to:
>>>
>>>    unsigned int i = UINT_MAX - 101;
>>>    do
>>>      ...
>>>    while (--i != ~0U);
>>>
>>> would be safe, but a conversion to:
>>>
>>>    int i = UINT_MAX - 101;
>>>    do
>>>      ...
>>>    while ((i -= step, i > 0));
>>>
>>> wouldn't, because the loop body would only be executed once.
>>>
>>> I'm only going off the name "infinite" though :)  It's possible that
>>> it has more connotations than that.
>>>
>>> Thanks,
>>> Richard
>>
>> Ack, yep, I see what you mean now, and yep, that kind of loop does
>> indeed pass through doloop_valid_p
>>
>> Interestingly , in the v8-M Arm ARM this is done with:
>>
>> ```
>>
>> boolean IsLastLowOverheadLoop(INSTR_EXEC_STATE_Type state)
>> // This does not check whether a loop is currently active.
>> // If the PE were in a loop, would this be the last one?
>> return UInt(state.LoopCount) <= (1 << (4 - LTPSIZE));
>>
>> ```
>>
>> So architecturally the asm we output would be ok (except maybe the
>> "branch too far subs;bgt;lctp" fallback at
>> `predicated_doloop_end_internal` (maybe that should be `bhi`))... But
>> now GE: isn't looking like an accurate representation of this
>> operation in the compiler.
>>
>> I'm wondering if I should try to make
>> `predicated_doloop_end_internal` contain a comparison along the lines
>> of:
>> (gtu: (plus: (LR) (const_int -num_lanes)) (const_int num_lanes_minus_1))
>>
>> I'll give that a try :)
>>
>> The only reason I'd chosen to go with GE earlier, tbh, was because of
>> the existing handling of GE in loop-doloop.cc
>>
>> Let me know if any other ideas come to your mind!
>>
>>
>> Cheers,
>>
>> Stam
>
>
> It looks like I've had success with the below (diff to previous patch),
> trimmed a bit to only the functionally interesting things::
>
>
>
>
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index 368d5138ca1..54dd4ee564b 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -1649,16 +1649,28 @@
>            && (decrement_num = arm_attempt_dlstp_transform (operands[1]))
>            && (INTVAL (decrement_num) != 1))
>          {
> -          insn = emit_insn
> -              (gen_thumb2_addsi3_compare0
> -              (s0, s0, GEN_INT ((-1) * (INTVAL (decrement_num)))));
> -          cmp = XVECEXP (PATTERN (insn), 0, 0);
> -          cc_reg = SET_DEST (cmp);
> -          bcomp = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
>            loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[1]);
> -          emit_jump_insn (gen_rtx_SET (pc_rtx,
> -                       gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
> -                                 loc_ref, pc_rtx)));
> +          switch (INTVAL (decrement_num))
> +        {
> +          case 2:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal2
> +                        (s0, loc_ref));
> +            break;
> +          case 4:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal4
> +                        (s0, loc_ref));
> +            break;
> +          case 8:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal8
> +                        (s0, loc_ref));
> +            break;
> +          case 16:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal16
> +                        (s0, loc_ref));
> +            break;
> +          default:
> +            gcc_unreachable ();
> +        }
>            DONE;
>          }
>      }
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 93905583b18..c083f965fa9 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -6922,23 +6922,24 @@
>  ;; Originally expanded by 'predicated_doloop_end'.
>  ;; In the rare situation where the branch is too far, we do also need to
>  ;; revert FPSCR.LTPSIZE back to 0x100 after the last iteration.
> -(define_insn "*predicated_doloop_end_internal"
> +(define_insn "predicated_doloop_end_internal<letp_num_lanes>"
>    [(set (pc)
>      (if_then_else
> -       (ge (plus:SI (reg:SI LR_REGNUM)
> -            (match_operand:SI 0 "const_int_operand" ""))
> -        (const_int 0))
> -     (label_ref (match_operand 1 "" ""))
> +       (gtu (unspec:SI [(plus:SI (match_operand:SI 0
> "s_register_operand" "=r")
> +                     (const_int <letp_num_lanes_neg>))]
> +        LETP)
> +        (const_int <letp_num_lanes_minus_1>))
> +     (match_operand 1 "" "")
>       (pc)))
> -   (set (reg:SI LR_REGNUM)
> -    (plus:SI (reg:SI LR_REGNUM) (match_dup 0)))
> +   (set (match_dup 0)
> +    (plus:SI (match_dup 0) (const_int <letp_num_lanes_neg>)))
>     (clobber (reg:CC CC_REGNUM))]
>    "TARGET_HAVE_MVE"
>    {
>      if (get_attr_length (insn) == 4)
>        return "letp\t%|lr, %l1";
>      else
> -      return "subs\t%|lr, #%n0\n\tbgt\t%l1\n\tlctp";
> +      return "subs\t%|lr, #<letp_num_lanes>\n\tbhi\t%l1\n\tlctp";
>    }
>    [(set (attr "length")
>      (if_then_else
> @@ -6947,11 +6948,11 @@
>          (const_int 6)))
>     (set_attr "type" "branch")])
>
> -(define_insn "dlstp<mode1>_insn"
> +(define_insn "dlstp<dlstp_elemsize>_insn"
>    [
>      (set (reg:SI LR_REGNUM)
>       (unspec:SI [(match_operand:SI 0 "s_register_operand" "r")]
>        DLSTP))
>    ]
>    "TARGET_HAVE_MVE"
> -  "dlstp.<mode1>\t%|lr, %0")
> +  "dlstp.<dlstp_elemsize>\t%|lr, %0")
>
> diff --git a/gcc/loop-doloop.cc b/gcc/loop-doloop.cc
> index 6a72700a127..47fdef989b4 100644
> --- a/gcc/loop-doloop.cc
> +++ b/gcc/loop-doloop.cc
> @@ -185,6 +185,7 @@ doloop_condition_get (rtx_insn *doloop_pat)
>        || XEXP (inc_src, 0) != reg
>        || !CONST_INT_P (XEXP (inc_src, 1)))
>      return 0;
> +  int dec_num = abs (INTVAL (XEXP (inc_src, 1)));
>
>    /* Check for (set (pc) (if_then_else (condition)
>                                         (label_ref (label))
> @@ -199,21 +200,32 @@ doloop_condition_get (rtx_insn *doloop_pat)
>    /* Extract loop termination condition.  */
>    condition = XEXP (SET_SRC (cmp), 0);
>
> -  /* We expect a GE or NE comparison with 0 or 1.  */
> -  if ((GET_CODE (condition) != GE
> -       && GET_CODE (condition) != NE)
> -      || (XEXP (condition, 1) != const0_rtx
> -          && XEXP (condition, 1) != const1_rtx))
> +  /* We expect a GE or NE comparison with 0 or 1, or a GTU comparison
> with
> +     dec_num - 1.  */
> +  if (!((GET_CODE (condition) == GE
> +     || GET_CODE (condition) == NE)
> +    && (XEXP (condition, 1) == const0_rtx
> +        || XEXP (condition, 1) == const1_rtx ))
> +      &&!(GET_CODE (condition) == GTU
> +      && ((INTVAL (XEXP (condition, 1))) == (dec_num - 1))))
>      return 0;
>
> -  if ((XEXP (condition, 0) == reg)
> +  /* For the ARM special case of having a GTU: re-form the condition
> without
> +     the unspec for the benefit of the middle-end.  */
> +  if (GET_CODE (condition) == GTU)
> +    {
> +      condition = gen_rtx_fmt_ee (GTU, VOIDmode, inc_src, GEN_INT
> (dec_num - 1));
> +      return condition;
> +    }
> +  else if ((XEXP (condition, 0) == reg)
>        /* For the third case:  */
>        || ((cc_reg != NULL_RTX)
>        && (XEXP (condition, 0) == cc_reg)
>        && (reg_orig == reg))
> @@ -245,20 +257,11 @@ doloop_condition_get (rtx_insn *doloop_pat)
>                         (label_ref (label))
>                         (pc))))])
>
> -    So we return the second form instead for the two cases when n == 1.
> -
> -    For n > 1, the final value may be exceeded, so use GE instead of NE.
> +    So we return the second form instead for the two cases.
>       */
> -     if (GET_CODE (pattern) != PARALLEL)
> -       {
> -    if (INTVAL (XEXP (inc_src, 1)) != -1)
> -      condition = gen_rtx_fmt_ee (GE, VOIDmode, inc_src, const0_rtx);
> -    else
> -      condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);;
> -       }
> -
> +    condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);
>      return condition;
> -   }
> +    }
>
>    /* ??? If a machine uses a funny comparison, we could return a
>       canonicalized form here.  */
> @@ -501,7 +504,8 @@ doloop_modify (class loop *loop, class niter_desc
> *desc,
>      case GE:
>        /* Currently only GE tests against zero are supported.  */
>        gcc_assert (XEXP (condition, 1) == const0_rtx);
> -
> +      /* FALLTHRU */
> +    case GTU:
>        noloop = constm1_rtx;
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index a6a7ff507a5..9398702cddd 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -2673,8 +2673,16 @@
>  (define_int_attr mrrc [(VUNSPEC_MRRC "mrrc") (VUNSPEC_MRRC2 "mrrc2")])
>  (define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")])
>
> -(define_int_attr mode1 [(DLSTP8 "8") (DLSTP16 "16") (DLSTP32 "32")
> -            (DLSTP64 "64")])
> +(define_int_attr dlstp_elemsize [(DLSTP8 "8") (DLSTP16 "16") (DLSTP32
> "32")
> +                 (DLSTP64 "64")])
> +
> +(define_int_attr letp_num_lanes [(LETP8 "16") (LETP16 "8") (LETP32 "4")
> +                 (LETP64 "2")])
> +(define_int_attr letp_num_lanes_neg [(LETP8 "-16") (LETP16 "-8")
> (LETP32 "-4")
> +                     (LETP64 "-2")])
> +
> +(define_int_attr letp_num_lanes_minus_1 [(LETP8 "15") (LETP16 "7")
> (LETP32 "3")
> +                     (LETP64 "1")])
>
>  (define_int_attr opsuffix [(UNSPEC_DOT_S "s8")
>                 (UNSPEC_DOT_U "u8")
> @@ -2921,6 +2929,8 @@
>  (define_int_iterator VQSHLUQ_N [VQSHLUQ_N_S])
>  (define_int_iterator DLSTP [DLSTP8 DLSTP16 DLSTP32
>                     DLSTP64])
> +(define_int_iterator LETP [LETP8 LETP16 LETP32
> +               LETP64])
>
>  ;; Define iterators for VCMLA operations
>  (define_int_iterator VCMLA_OP [UNSPEC_VCMLA
>        /* The iteration count does not need incrementing for a GE
> test.  */
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index 12ae4c4f820..2d6f27c14f4 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -587,6 +587,10 @@
>    DLSTP16
>    DLSTP32
>    DLSTP64
> +  LETP8
> +  LETP16
> +  LETP32
> +  LETP64
>    VPNOT
>    VCREATEQ_F
>    VCVTQ_N_TO_F_S
>
>
> I've attached the whole [2/2] patch diff with this change and
> the required comment changes in doloop_condition_get.
> WDYT?
>
>
> Thanks,
>
> Stam
>
>
>>
>>

  reply	other threads:[~2023-11-27 12:47 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-17 10:31 [PATCH " Stamatis Markianos-Wright
2023-09-06 17:19 ` [PING][PATCH " Stamatis Markianos-Wright
2023-09-14 12:10   ` Kyrylo Tkachov
2023-09-28 12:51     ` Andre Vieira (lists)
2023-10-11 11:34       ` Stamatis Markianos-Wright
2023-10-23 10:16         ` Andre Vieira (lists)
2023-10-24 15:11   ` Richard Sandiford
2023-11-06 11:03     ` Stamatis Markianos-Wright
2023-11-06 11:24       ` Richard Sandiford
2023-11-06 17:29         ` Stamatis Markianos-Wright
2023-11-10 12:41           ` Stamatis Markianos-Wright
2023-11-16 11:36             ` Stamatis Markianos-Wright
2023-11-27 12:47               ` Andre Vieira (lists) [this message]
2023-11-30 12:55                 ` Stamatis Markianos-Wright
2023-12-07 18:08                   ` Andre Vieira (lists)
2023-12-09 18:31                   ` Richard Sandiford
2023-12-12 17:56                   ` Richard Earnshaw

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=76426214-33e6-40eb-aa9c-94f357945c4b@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=Kyrylo.Tkachov@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=Richard.Sandiford@arm.com \
    --cc=Stam.Markianos-Wright@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).