Re: [PR103302] skip multi-word pre-move clobber during lra

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Alexandre Oliva <oliva@adacore.com>
To: Jeff Law <jeffreyalaw@gmail.com>, Vladimir Makarov <vmakarov@redhat.com>
Cc: gcc-patches@gcc.gnu.org, zsojka@seznam.cz, wilson@gcc.gnu.org
Cc: Sebastian Huber <sebastian.huber@embedded-brains.de>
Subject: Re: [PR103302] skip multi-word pre-move clobber during lra
Date: Fri, 18 Feb 2022 20:27:21 -0300	[thread overview]
Message-ID: <orbkz37o2u.fsf@lxoliva.fsfla.org> (raw)
In-Reply-To: <03226f85-0334-1840-94f9-6c618175f974@gmail.com> (Jeff Law's message of "Wed, 15 Dec 2021 09:00:01 -0700")

On Dec 15, 2021, Jeff Law <jeffreyalaw@gmail.com> wrote:

>> * expr.c (emit_move_complex_parts): Skip clobbers during lra.
> OK for the next cycle.

Thanks, but having looked into PR 104121, I withdraw this patch and also
the already-installed patch for PR 103302.  As I found out, LRA does
worse without the clobbers for multi-word moves, not only because the
clobbers shorten live ranges and enable earlier and better allocations,
but also because find_reload_regno_insns implicitly, possibly
unknowingly, relied on the clobbers to avoid the risk of an infinite
loop.

As noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104121#c11 with
the clobber, a multi-word reload, and the insn the reload applies to, we
get 4 insns, so find_reload_regno_insns avoids the loop.  Without the
clobber, a multi-word reload for either input or output makes for n==3,
so we enter the loop and don't ever exit it: we'll find first_insn
(input) or second_insn (output), but then we'll loop forever because we
won't iterate again on {prev,next}_insn, respectively, and the other
iterator won't find the other word reload.  We advance the other till
the end, but that's not enough for us to terminate the loop.

With the proposed patch reversal, we no longer hit the problem with the
v850 testcase in 104121, but I'm concerned we might still get an
infinite loop on ports whose input or output reloads might emit a pair
of insns without a clobber.

I see two ways to robustify it.  One is to find a complete reload
sequence:

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index c1d40ea2a14bd..ff1688917cbce 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1716,9 +1716,18 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 	start_insn = lra_insn_recog_data[uid]->insn;
       n++;
     }
-  /* For reload pseudo we should have at most 3 insns referring for
-     it: input/output reload insns and the original insn.  */
-  if (n > 3)
+  /* For reload pseudo we should have at most 3 (sequences of) insns
+     referring for it: input/output reload insn sequences and the
+     original insn.  Each reload insn sequence may amount to multiple
+     insns, but we expect to find each of them contiguous, one before
+     start_insn, one after it.  We know start_insn is between the
+     sequences because it's the lowest-numbered insn, thus the first
+     we'll have found above.  The reload insns, emitted later, will
+     have been assigned higher insn uids.  If this assumption doesn't
+     hold, and there happen to be intervening reload insns for other
+     pseudos, we may end up returning false after searching to the end
+     in the wrong direction.  */
+  if (n > 1 + 2 * CEIL (lra_reg_info[regno].biggest_mode, UNITS_PER_WORD))
     return false;
   if (n > 1)
     {
@@ -1726,26 +1735,52 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 	     next_insn = NEXT_INSN (start_insn);
 	   n != 1 && (prev_insn != NULL || next_insn != NULL); )
 	{
-	  if (prev_insn != NULL && first_insn == NULL)
+	  if (prev_insn != NULL)
 	    {
 	      if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
 				  INSN_UID (prev_insn)))
 		prev_insn = PREV_INSN (prev_insn);
 	      else
 		{
-		  first_insn = prev_insn;
-		  n--;
+		  /* A reload sequence may have multiple insns, but
+		     they must be contiguous.  */
+		  do
+		    {
+		      first_insn = prev_insn;
+		      n--;
+		      prev_insn = PREV_INSN (prev_insn);
+		    }
+		  while (n > 1 && prev_insn
+			 && bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
+					  INSN_UID (prev_insn)));
+		  /* After finding first_insn, we don't want to search
+		     backward any more, so set prev_insn to NULL so as
+		     to not loop indefinitely.  */
+		  prev_insn = NULL;
 		}
 	    }
-	  if (next_insn != NULL && second_insn == NULL)
+	  else if (next_insn != NULL)
 	    {
 	      if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
 				INSN_UID (next_insn)))
 		next_insn = NEXT_INSN (next_insn);
 	      else
 		{
-		  second_insn = next_insn;
-		  n--;
+		  /* A reload sequence may have multiple insns, but
+		     they must be contiguous.  */
+		  do
+		    {
+		      second_insn = next_insn;
+		      n--;
+		      next_insn = NEXT_INSN (next_insn);
+		    }
+		  while (n > 1 && next_insn
+			 && bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
+					  INSN_UID (next_insn)));
+		  /* After finding second_insn, we don't want to
+		     search forward any more, so set next_insn to NULL
+		     so as to not loop indefinitely.  */
+		  next_insn = NULL;
 		}
 	    }
 	}


the other is to just prevent the infinite loop, that will then return
false because n > 1 after the loop ends:

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index c1d40ea2a14bd..efd5f764a37a5 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1726,7 +1726,7 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 	     next_insn = NEXT_INSN (start_insn);
 	   n != 1 && (prev_insn != NULL || next_insn != NULL); )
 	{
-	  if (prev_insn != NULL && first_insn == NULL)
+	  if (prev_insn != NULL)
 	    {
 	      if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
 				  INSN_UID (prev_insn)))
@@ -1735,9 +1735,10 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 		{
 		  first_insn = prev_insn;
 		  n--;
+		  prev_insn = NULL;
 		}
 	    }
-	  if (next_insn != NULL && second_insn == NULL)
+	  if (next_insn != NULL)
 	    {
 	      if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
 				INSN_UID (next_insn)))
@@ -1746,6 +1747,7 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 		{
 		  second_insn = next_insn;
 		  n--;
+		  next_insn = NULL;
 		}
 	    }
 	}


When it comes to the v850 testcase, one of them just moves the infinite
loop to lra(), as we never get past while(fails_p); the other hits
lra-assigns.cc:lra_assign (bool&)'s

  if (flag_checking
      && (lra_assignment_iter_after_spill
	  > LRA_MAX_ASSIGNMENT_ITERATION_NUMBER))
    internal_error
      ("maximum number of LRA assignment passes is achieved (%d)",
       LRA_MAX_ASSIGNMENT_ITERATION_NUMBER);

which would loop indefinitely too without flag_checking.

Neither solves the v850 problem, only restoring the clobber does,
because then, with shorter live ranges, allocation succeeds for the
reloads, and we don't even try to split -> spill their pseudos.

Would any of these patchlets make sense to pursue regardless?

Ok to revert commit r12-5852-g50e8b0c9bca6cdc57804f860ec5311b641753fbb

I'm going to get back to the drawing board as to pr103302, since the
problem there will likely resurface, possibly also on v850.


diff --git a/gcc/expr.cc b/gcc/expr.cc
index 63a4aa838dec0..b6ed54983fabf 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3929,7 +3929,7 @@ emit_move_multi_word (machine_mode mode, rtx x, rtx y)
      hard regs shouldn't appear here except as return values.
      We never want to emit such a clobber after reload.  */
   if (x != y
-      && ! (lra_in_progress || reload_in_progress || reload_completed)
+      && ! (reload_in_progress || reload_completed)
       && need_clobber != 0)
     emit_clobber (x);
 


-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

next prev parent reply	other threads:[~2022-02-18 23:27 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08  5:37 Alexandre Oliva
2021-12-08 23:12 ` Jeff Law
2021-12-09  2:25   ` Alexandre Oliva
2021-12-09  4:08   ` Alexandre Oliva
2021-12-09  6:03     ` Jeff Law
2021-12-15  8:22       ` Alexandre Oliva
2021-12-15 16:00         ` Jeff Law
2022-02-18 23:27           ` Alexandre Oliva [this message]
2022-02-21  7:13             ` Richard Biener
2022-02-23 22:39               ` Alexandre Oliva
2022-03-01 20:15                 ` Alexandre Oliva
2022-03-02 12:25                   ` Alexandre Oliva
2022-03-02 14:21                     ` Vladimir Makarov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=orbkz37o2u.fsf@lxoliva.fsfla.org \
    --to=oliva@adacore.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=vmakarov@redhat.com \
    --cc=wilson@gcc.gnu.org \
    --cc=zsojka@seznam.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).