From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=lSxa=GB=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 238113858D33
	for <gcc-patches@gcc.gnu.org>; Thu, 19 Oct 2023 19:41:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 238113858D33
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 238113858D33
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1697744493; cv=none;
	b=QntgymK29C9outfiNRfRa+kPQ1Te2ewkqWCOt7QN/jZM4IVowDLhI17m4nd5tLsKuxJugooxBzjwWOb7e5vKL0HJlvxk1HtBEEMzTp8V9ogrHh7tHxxrt8fCE+Rf/Epzm2E2OXhT5kL/h/t6xtDz1ipc7WHMxiDKyfCc263YBaw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1697744493; c=relaxed/simple;
	bh=14En31sSSw6J5e1qaqCNRSZPlOrQRewCjeOK79PuZZg=;
	h=From:To:Subject:Date:Message-ID:MIME-Version; b=hC7EqUecxsMdB9CF2JaDJ0NHPOMQPnU3jvNz2SKExStNwkHcWWId0uk1Dmio6VOdUEmZpCGU87CWjPl7QS/HEJJ7j/YS6pVecs18UjT36B4DFBswilFa/RV1+E4jYzMgV/GEvZe6ANdI5V/4ZI8A1/xQ3Qn7E9eUXLEcKzr5bv8=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B8FD92F4;
	Thu, 19 Oct 2023 12:42:12 -0700 (PDT)
Received: from localhost (unknown [10.32.110.65])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E92143F5A1;
	Thu, 19 Oct 2023 12:41:30 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>
Mail-Followup-To: Manolis Tsamis <manolis.tsamis@vrull.eu>,gcc-patches@gcc.gnu.org,  Philipp Tomsich <philipp.tomsich@vrull.eu>,  Robin Dapp <rdapp@linux.ibm.com>,  Jakub Jelinek <jakub@redhat.com>, richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org,  Philipp Tomsich <philipp.tomsich@vrull.eu>,  Robin Dapp <rdapp@linux.ibm.com>,  Jakub Jelinek <jakub@redhat.com>
Subject: Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
References: <20230830101400.1539313-1-manolis.tsamis@vrull.eu>
	<20230830101400.1539313-2-manolis.tsamis@vrull.eu>
Date: Thu, 19 Oct 2023 20:41:29 +0100
In-Reply-To: <20230830101400.1539313-2-manolis.tsamis@vrull.eu> (Manolis
	Tsamis's message of "Wed, 30 Aug 2023 12:13:57 +0200")
Message-ID: <mptedhqpfh2.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-23.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> This is an extension of what was done in PR106590.
>
> Currently if a sequence generated in noce_convert_multiple_sets clobbers the
> condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
> (sequences that emit the comparison itself). Since this applies only from the
> next iteration it assumes that the sequences generated (in particular seq2)
> doesn't clobber the condition rtx itself before using it in the if_then_else,
> which is only true in specific cases (currently only register/subregister moves
> are allowed).
>
> This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
> the current iteration. This makes it possible to include arithmetic operations
> in noce_convert_multiple_sets.
>
> gcc/ChangeLog:
>
> 	* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
> 	(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
> (no changes since v1)
>
>  gcc/ifcvt.cc | 49 +++++++++++++++++++------------------------------
>  1 file changed, 19 insertions(+), 30 deletions(-)

Sorry for the slow review.  TBH I was hoping someone else would pick
it up, since (a) I'm not very familiar with this code, and (b) I don't
really agree with the way that the current code works.  I'm not sure the
current dependency checking is safe, so I'm nervous about adding even
more cases to it.  And it feels like the different ifcvt techniques are
not now well distinguished, so that they're beginning to overlap and
compete with one another.  None of that is your fault, of course. :)

> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index a0af553b9ff..3273aeca125 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3375,20 +3375,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>    return true;
>  }
>  
> -/* Helper function for noce_convert_multiple_sets_1.  If store to
> -   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
> -
> -static void
> -check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
> -{
> -  rtx *p = (rtx *) p0;
> -  if (p[0] == NULL_RTX)
> -    return;
> -  if (reg_overlap_mentioned_p (dest, p[0])
> -      || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
> -    p[0] = NULL_RTX;
> -}
> -
>  /* This goes through all relevant insns of IF_INFO->then_bb and tries to
>     create conditional moves.  In case a simple move sufficis the insn
>     should be listed in NEED_NO_CMOV.  The rewired-src cases should be
> @@ -3552,9 +3538,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>  	 creating an additional compare for each.  If successful, costing
>  	 is easier and this sequence is usually preferred.  */
>        if (cc_cmp)
> -	seq2 = try_emit_cmove_seq (if_info, temp, cond,
> -				   new_val, old_val, need_cmov,
> -				   &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> +	{
> +	  seq2 = try_emit_cmove_seq (if_info, temp, cond,
> +				     new_val, old_val, need_cmov,
> +				     &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> +
> +	  /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
> +	     clobbered.  We can't safely use the sequence in this case.  */
> +	  if (seq2 && (modified_in_p (cc_cmp, seq2)
> +	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2))))
> +	    seq2 = NULL;

modified_in_p only checks the first instruction in seq2, not the whole
sequence.

I think the unpatched approach is OK in cases where seq2 clobbers
cc_cmp/rev_cc_cmp in or after the last use, since instructions are
defined to operate on a read-all/compute/write-all basis.

Soon after the snippet above, the unpatched code has this loop:

      /* The backend might have created a sequence that uses the
	 condition.  Check this.  */
      rtx_insn *walk = seq2;
      while (walk)
	{
	  rtx set = single_set (walk);

	  if (!set || !SET_SRC (set))

This condition looks odd.  A SET_SRC is never null.  But more importantly,
continuing means "assume the best", and I don't think we should assume
the best for (say) an insn with two parallel sets.

It doesn't look like the series addresses this, but !set seems more
likely to occur if we extend the function to general operations.

	    {
	      walk = NEXT_INSN (walk);
	      continue;
	    }

	  rtx src = SET_SRC (set);

	  if (XEXP (set, 1) && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
	    ; /* We assume that this is the cmove created by the backend that
		 naturally uses the condition.  Therefore we ignore it.  */

XEXP (set, 1) is src.  But here too, I'm a bit nervous about assuming
that any if_then_else is safe to ignore, even currently.  It seems
more dangerous if we extend the function to handle more cases.

	  else
	    {
	      if (reg_mentioned_p (XEXP (cond, 0), src)
		  || reg_mentioned_p (XEXP (cond, 1), src))
		{
		  read_comparison = true;
		  break;
		}
	    }

	  walk = NEXT_INSN (walk);
	}

Maybe a safer way to check whether "insn" is sensitive to the values
in "val" is to use:

  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, val, NONCONST)
    if (OBJECT_P (*iter) && reg_overlap_mentioned_p (*iter, insn))
      return true;
  return false;

which doesn't assume that insn is a single set.

But I'm not sure which cases this code is trying to catch.  Is it trying
to catch cases where seq2 "spontaneously" uses registers that happen to
overlap with cond?  If so, then when does that happen?  And if it does
happen, wouldn't the sequence also have to set the registers first?

If instead the code is trying to detect uses of cond that were fed
to seq2 outside of cond itself, via try_emit_cmove_seq, then wouldn't
it be enough to check old_val and new_val?

Anyway, the point I was trying to get to was: we could use the
FOR_EACH_SUBRTX above to check whether an instruction in seq2
is sensitive to the value of cc_cmp/rev_cc_cmp.  And we can use
modified_in_p, as in your patch, to check whether an instruction
changes the value of cc_cmp/rev_cc_cmp.  We could therefore record
whether we've seen a modification of cc_cmp/rev_cc_cmp, then reject
seq2 if we see a subsequent use.

Thanks,
Richard

> +	}
>  
>        /* The backend might have created a sequence that uses the
>  	 condition.  Check this.  */
> @@ -3609,21 +3603,16 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>  	  return false;
>  	}
>  
> -      if (cc_cmp)
> +      if (cc_cmp && seq == seq1)
>  	{
> -	  /* Check if SEQ can clobber registers mentioned in
> -	     cc_cmp and/or rev_cc_cmp.  If yes, we need to use
> -	     only seq1 from that point on.  */
> -	  rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
> -	  for (walk = seq; walk; walk = NEXT_INSN (walk))
> +	  /* Check if SEQ can clobber registers mentioned in cc_cmp/rev_cc_cmp.
> +	     If yes, we need to use only seq1 from that point on.
> +	     Only check when we use seq1 since we have already tested seq2.  */
> +	  if (modified_in_p (cc_cmp, seq)
> +	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq)))
>  	    {
> -	      note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
> -	      if (cc_cmp_pair[0] == NULL_RTX)
> -		{
> -		  cc_cmp = NULL_RTX;
> -		  rev_cc_cmp = NULL_RTX;
> -		  break;
> -		}
> +	      cc_cmp = NULL_RTX;
> +	      rev_cc_cmp = NULL_RTX;
>  	    }
>  	}