public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Hans-Peter Nilsson <hp@axis.com>
Subject: Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
Date: Thu, 20 Aug 2020 09:30:56 +0100	[thread overview]
Message-ID: <mptlfi9hg73.fsf@arm.com> (raw)
In-Reply-To: <202008140527.07E5RrHL029246@ignucius.se.axis.com> (Hans-Peter Nilsson via Gcc-patches's message of "Fri, 14 Aug 2020 07:27:53 +0200")

Anything I once knew about reorg.c has long since faded away, but since
noone else has reviewed it…

Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
reliable during reorg.c?  It was written at a time when passes were
expected to keep the notes up-to-date, but that's not true these days.
My worry is that they might be mostly right, but just stale enough
to be harmful in corner cases.

Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc.  So, I just tested cris-elf.
>
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
>
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
>
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
>
> After this, all known performance regressions compared to cc0
> are fixed.
>
> Ok to commit?
>
> gcc:
> 	PR target/93372
> 	* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> 	TARGET_FLAGS_REGNUM as delay-slot fillers.
>
> gcc/testsuite:
> 	PR target/93372
> 	* gcc.target/cris/pr93372-47.c: New test.
> ---
>  gcc/reorg.c                                | 37 +++++++++++++++++++++-
>  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
>
> diff --git a/gcc/reorg.c b/gcc/reorg.c
> index dfd7494bf..83161caa0 100644
> --- a/gcc/reorg.c
> +++ b/gcc/reorg.c
> @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>    CLEAR_RESOURCE (&needed);
>    CLEAR_RESOURCE (&set);
>  
> +  /* Handle the flags register specially, to be able to accept a
> +     candidate that clobbers it.  See also fill_simple_delay_slots.  */
> +  bool filter_flags
> +    = (slots_to_fill == 1
> +       && targetm.flags_regnum != INVALID_REGNUM
> +       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> +  struct resources fset;
> +  struct resources flags_res;
> +  if (filter_flags)
> +    {
> +      CLEAR_RESOURCE (&fset);
> +      CLEAR_RESOURCE (&flags_res);
> +      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> +    }
> +
>    /* If we do not own this thread, we must stop as soon as we find
>       something that we can't put in a delay slot, since all we can do
>       is branch into THREAD at a later point.  Therefore, labels stop
> @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>        /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>  	 don't separate or copy insns that set and use CC0.  */
>        if (! insn_references_resource_p (trial, &set, true)
> -	  && ! insn_sets_resource_p (trial, &set, true)
> +	  && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
>  	  && ! insn_sets_resource_p (trial, &needed, true)
> +	  /* If we're handling sets to the flags register specially, we
> +	     only allow an insn into a delay-slot, if it either:
> +	     - doesn't set the flags register,
> +	     - the "set" of the flags register isn't used (clobbered),
> +	     - insns between the delay-slot insn and the trial-insn
> +	     as accounted in "set", have not affected the flags register.  */
> +	  && (! filter_flags
> +	      || ! insn_sets_resource_p (trial, &flags_res, true)
> +	      || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> +	      || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
>  	  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
>  			      && (! own_thread || ! sets_cc0_p (pat)))))
>  	  && ! can_throw_internal (trial))
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>        lose = 1;
>        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
>        mark_referenced_resources (trial, &needed, true);
> +      if (filter_flags)
> +	{
> +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> +
> +	  /* Groups of flags-register setters with users should not
> +	     affect opportunities to move flags-register-setting insns
> +	     (clobbers) into the delay-slot.  */
> +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);

If we do this, what stops us from trying to move a flags-register user
ahead of the associated setter, when the user doesn't itself set the
flags register?  Feels like there should be some test involving
insn_references_resource_p (trial, &flags_res, true) in the
(! filter_flags || …) condition above.

Thanks,
Richard

> +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> +	}
>  
>        /* Ensure we don't put insns between the setting of cc and the comparison
>  	 by moving a setting of cc into an earlier delay slot since these insns
> diff --git a/gcc/testsuite/gcc.target/cris/pr93372-47.c b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> new file mode 100644
> index 000000000..8d80ae6b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=v10" } */
> +/* { dg-final { scan-assembler-times {\tnop} 1 } } */
> +
> +/* A somewhat brittle test-case, checking that we have (only) one
> +   unfilled delay-slot in random_bitstring: there might be none or two
> +   or more, and general improvements may lead to unfilled delay-slots.
> +   When the scan-assembler-times directive regresses, re-run
> +   gcc.c-torture/execute/arith-rand-ll.c, check cycle-level
> +   execution-time regressions in random_bitstring and take appropriate
> +   action.  */
> +
> +static long long
> +simple_rand ()
> +{
> +  static unsigned long long seed = 47114711;
> +  unsigned long long this = seed * 1103515245 + 12345;
> +  seed = this;
> +  return this >> 8;
> +}
> +
> +unsigned long long int
> +random_bitstring ()
> +{
> +  unsigned long long int x;
> +  int n_bits;
> +  long long ran;
> +  int tot_bits = 0;
> +
> +  x = 0;
> +  for (;;)
> +    {
> +      ran = simple_rand ();
> +      n_bits = (ran >> 1) % 16;
> +      tot_bits += n_bits;
> +
> +      if (n_bits == 0)
> +	return x;
> +      else
> +	{
> +	  x <<= n_bits;
> +	  if (ran & 1)
> +	    x |= (1 << n_bits) - 1;
> +
> +	  if (tot_bits > 8 * sizeof (long long) + 6)
> +	    return x;
> +	}
> +    }
> +}

  reply	other threads:[~2020-08-20  8:30 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-14  5:27 Hans-Peter Nilsson
2020-08-20  8:30 ` Richard Sandiford [this message]
2020-08-20 17:15   ` Hans-Peter Nilsson
2020-08-20 19:04     ` Richard Sandiford
2020-08-24 18:05 ` Jeff Law
2020-09-11 11:09 ` Eric Botcazou
2020-09-11 11:24   ` Hans-Peter Nilsson
2020-09-11 11:28     ` Hans-Peter Nilsson
2020-09-11 11:54   ` Hans-Peter Nilsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptlfi9hg73.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hp@axis.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).