public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
@ 2020-08-14  5:27 Hans-Peter Nilsson
  2020-08-20  8:30 ` Richard Sandiford
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Hans-Peter Nilsson @ 2020-08-14  5:27 UTC (permalink / raw)
  To: gcc-patches

Originally I thought to bootstrap this patch on MIPS and SPARC
since they're both delayed-branch-slot targets but I
reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
seems only visium and CRIS has this feature set, and I see no
trace of visium in neither newlib nor the simulator next to
glibc.  So, I just tested cris-elf.

This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
fillers using a method similar to that in commit 33c2207d3fda,
where care was taken for fill_simple_delay_slots to allow such
insns when scanning for delay-slot fillers *backwards* (before
the insn).

A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
For cc0 targets, insns don't mention clobbering cc0, so the
clobbers are mentioned in the "resources" only as a special
entity and only for compare-insns and branches, where the cc0
value matters.

In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
the register liveness detection in reorg.c / resource.c treats
that as a blocker (for other insns mentioning it, i.e. most)
when looking for delay-slot-filling candidates.  This means that
when comparing core and performance for a delay-slot cc0 target
before and after the de-cc0 conversion, the inability to fill a
delay slot after conversion manifests as a regression.  This was
one such case, for CRIS, with random_bitstring in
gcc.c-torture/execute/arith-rand-ll.c as well as the target
libgcc division function.

After this, all known performance regressions compared to cc0
are fixed.

Ok to commit?

gcc:
	PR target/93372
	* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
	TARGET_FLAGS_REGNUM as delay-slot fillers.

gcc/testsuite:
	PR target/93372
	* gcc.target/cris/pr93372-47.c: New test.
---
 gcc/reorg.c                                | 37 +++++++++++++++++++++-
 gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c

diff --git a/gcc/reorg.c b/gcc/reorg.c
index dfd7494bf..83161caa0 100644
--- a/gcc/reorg.c
+++ b/gcc/reorg.c
@@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
   CLEAR_RESOURCE (&needed);
   CLEAR_RESOURCE (&set);
 
+  /* Handle the flags register specially, to be able to accept a
+     candidate that clobbers it.  See also fill_simple_delay_slots.  */
+  bool filter_flags
+    = (slots_to_fill == 1
+       && targetm.flags_regnum != INVALID_REGNUM
+       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
+  struct resources fset;
+  struct resources flags_res;
+  if (filter_flags)
+    {
+      CLEAR_RESOURCE (&fset);
+      CLEAR_RESOURCE (&flags_res);
+      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
+    }
+
   /* If we do not own this thread, we must stop as soon as we find
      something that we can't put in a delay slot, since all we can do
      is branch into THREAD at a later point.  Therefore, labels stop
@@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
       /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
 	 don't separate or copy insns that set and use CC0.  */
       if (! insn_references_resource_p (trial, &set, true)
-	  && ! insn_sets_resource_p (trial, &set, true)
+	  && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
 	  && ! insn_sets_resource_p (trial, &needed, true)
+	  /* If we're handling sets to the flags register specially, we
+	     only allow an insn into a delay-slot, if it either:
+	     - doesn't set the flags register,
+	     - the "set" of the flags register isn't used (clobbered),
+	     - insns between the delay-slot insn and the trial-insn
+	     as accounted in "set", have not affected the flags register.  */
+	  && (! filter_flags
+	      || ! insn_sets_resource_p (trial, &flags_res, true)
+	      || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
+	      || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
 	  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
 			      && (! own_thread || ! sets_cc0_p (pat)))))
 	  && ! can_throw_internal (trial))
@@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
       lose = 1;
       mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
       mark_referenced_resources (trial, &needed, true);
+      if (filter_flags)
+	{
+	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
+
+	  /* Groups of flags-register setters with users should not
+	     affect opportunities to move flags-register-setting insns
+	     (clobbers) into the delay-slot.  */
+	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
+	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
+	}
 
       /* Ensure we don't put insns between the setting of cc and the comparison
 	 by moving a setting of cc into an earlier delay slot since these insns
diff --git a/gcc/testsuite/gcc.target/cris/pr93372-47.c b/gcc/testsuite/gcc.target/cris/pr93372-47.c
new file mode 100644
index 000000000..8d80ae6b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/cris/pr93372-47.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=v10" } */
+/* { dg-final { scan-assembler-times {\tnop} 1 } } */
+
+/* A somewhat brittle test-case, checking that we have (only) one
+   unfilled delay-slot in random_bitstring: there might be none or two
+   or more, and general improvements may lead to unfilled delay-slots.
+   When the scan-assembler-times directive regresses, re-run
+   gcc.c-torture/execute/arith-rand-ll.c, check cycle-level
+   execution-time regressions in random_bitstring and take appropriate
+   action.  */
+
+static long long
+simple_rand ()
+{
+  static unsigned long long seed = 47114711;
+  unsigned long long this = seed * 1103515245 + 12345;
+  seed = this;
+  return this >> 8;
+}
+
+unsigned long long int
+random_bitstring ()
+{
+  unsigned long long int x;
+  int n_bits;
+  long long ran;
+  int tot_bits = 0;
+
+  x = 0;
+  for (;;)
+    {
+      ran = simple_rand ();
+      n_bits = (ran >> 1) % 16;
+      tot_bits += n_bits;
+
+      if (n_bits == 0)
+	return x;
+      else
+	{
+	  x <<= n_bits;
+	  if (ran & 1)
+	    x |= (1 << n_bits) - 1;
+
+	  if (tot_bits > 8 * sizeof (long long) + 6)
+	    return x;
+	}
+    }
+}
-- 
2.11.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-08-14  5:27 reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets Hans-Peter Nilsson
@ 2020-08-20  8:30 ` Richard Sandiford
  2020-08-20 17:15   ` Hans-Peter Nilsson
  2020-08-24 18:05 ` Jeff Law
  2020-09-11 11:09 ` Eric Botcazou
  2 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-08-20  8:30 UTC (permalink / raw)
  To: Hans-Peter Nilsson via Gcc-patches; +Cc: Hans-Peter Nilsson

Anything I once knew about reorg.c has long since faded away, but since
noone else has reviewed it…

Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
reliable during reorg.c?  It was written at a time when passes were
expected to keep the notes up-to-date, but that's not true these days.
My worry is that they might be mostly right, but just stale enough
to be harmful in corner cases.

Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc.  So, I just tested cris-elf.
>
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
>
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
>
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
>
> After this, all known performance regressions compared to cc0
> are fixed.
>
> Ok to commit?
>
> gcc:
> 	PR target/93372
> 	* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> 	TARGET_FLAGS_REGNUM as delay-slot fillers.
>
> gcc/testsuite:
> 	PR target/93372
> 	* gcc.target/cris/pr93372-47.c: New test.
> ---
>  gcc/reorg.c                                | 37 +++++++++++++++++++++-
>  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
>
> diff --git a/gcc/reorg.c b/gcc/reorg.c
> index dfd7494bf..83161caa0 100644
> --- a/gcc/reorg.c
> +++ b/gcc/reorg.c
> @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>    CLEAR_RESOURCE (&needed);
>    CLEAR_RESOURCE (&set);
>  
> +  /* Handle the flags register specially, to be able to accept a
> +     candidate that clobbers it.  See also fill_simple_delay_slots.  */
> +  bool filter_flags
> +    = (slots_to_fill == 1
> +       && targetm.flags_regnum != INVALID_REGNUM
> +       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> +  struct resources fset;
> +  struct resources flags_res;
> +  if (filter_flags)
> +    {
> +      CLEAR_RESOURCE (&fset);
> +      CLEAR_RESOURCE (&flags_res);
> +      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> +    }
> +
>    /* If we do not own this thread, we must stop as soon as we find
>       something that we can't put in a delay slot, since all we can do
>       is branch into THREAD at a later point.  Therefore, labels stop
> @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>        /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>  	 don't separate or copy insns that set and use CC0.  */
>        if (! insn_references_resource_p (trial, &set, true)
> -	  && ! insn_sets_resource_p (trial, &set, true)
> +	  && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
>  	  && ! insn_sets_resource_p (trial, &needed, true)
> +	  /* If we're handling sets to the flags register specially, we
> +	     only allow an insn into a delay-slot, if it either:
> +	     - doesn't set the flags register,
> +	     - the "set" of the flags register isn't used (clobbered),
> +	     - insns between the delay-slot insn and the trial-insn
> +	     as accounted in "set", have not affected the flags register.  */
> +	  && (! filter_flags
> +	      || ! insn_sets_resource_p (trial, &flags_res, true)
> +	      || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> +	      || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
>  	  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
>  			      && (! own_thread || ! sets_cc0_p (pat)))))
>  	  && ! can_throw_internal (trial))
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>        lose = 1;
>        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
>        mark_referenced_resources (trial, &needed, true);
> +      if (filter_flags)
> +	{
> +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> +
> +	  /* Groups of flags-register setters with users should not
> +	     affect opportunities to move flags-register-setting insns
> +	     (clobbers) into the delay-slot.  */
> +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);

If we do this, what stops us from trying to move a flags-register user
ahead of the associated setter, when the user doesn't itself set the
flags register?  Feels like there should be some test involving
insn_references_resource_p (trial, &flags_res, true) in the
(! filter_flags || …) condition above.

Thanks,
Richard

> +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> +	}
>  
>        /* Ensure we don't put insns between the setting of cc and the comparison
>  	 by moving a setting of cc into an earlier delay slot since these insns
> diff --git a/gcc/testsuite/gcc.target/cris/pr93372-47.c b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> new file mode 100644
> index 000000000..8d80ae6b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=v10" } */
> +/* { dg-final { scan-assembler-times {\tnop} 1 } } */
> +
> +/* A somewhat brittle test-case, checking that we have (only) one
> +   unfilled delay-slot in random_bitstring: there might be none or two
> +   or more, and general improvements may lead to unfilled delay-slots.
> +   When the scan-assembler-times directive regresses, re-run
> +   gcc.c-torture/execute/arith-rand-ll.c, check cycle-level
> +   execution-time regressions in random_bitstring and take appropriate
> +   action.  */
> +
> +static long long
> +simple_rand ()
> +{
> +  static unsigned long long seed = 47114711;
> +  unsigned long long this = seed * 1103515245 + 12345;
> +  seed = this;
> +  return this >> 8;
> +}
> +
> +unsigned long long int
> +random_bitstring ()
> +{
> +  unsigned long long int x;
> +  int n_bits;
> +  long long ran;
> +  int tot_bits = 0;
> +
> +  x = 0;
> +  for (;;)
> +    {
> +      ran = simple_rand ();
> +      n_bits = (ran >> 1) % 16;
> +      tot_bits += n_bits;
> +
> +      if (n_bits == 0)
> +	return x;
> +      else
> +	{
> +	  x <<= n_bits;
> +	  if (ran & 1)
> +	    x |= (1 << n_bits) - 1;
> +
> +	  if (tot_bits > 8 * sizeof (long long) + 6)
> +	    return x;
> +	}
> +    }
> +}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-08-20  8:30 ` Richard Sandiford
@ 2020-08-20 17:15   ` Hans-Peter Nilsson
  2020-08-20 19:04     ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Hans-Peter Nilsson @ 2020-08-20 17:15 UTC (permalink / raw)
  To: richard.sandiford; +Cc: gcc-patches, ebotcazou

> From: Richard Sandiford <richard.sandiford@arm.com>
> Date: Thu, 20 Aug 2020 10:30:56 +0200

> Anything I once knew about reorg.c has long since faded away, but since
> noone else has reviewed it...

Thanks.  I forgot to add PATCH and/or RFA: in the subject and
forgot to CC Eric, assuming he's interested (I did CC him as a
heads-up that this was coming, that's why I added you now, Eric).

> Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
> reliable during reorg.c?  It was written at a time when passes were
> expected to keep the notes up-to-date, but that's not true these days.
> My worry is that they might be mostly right, but just stale enough
> to be harmful in corner cases.

They are depended upon in reorg.c as absolutely accurate (and
kept updated), and there's IMO enough empirical evidence that
they still are.  In this particular patch, I'm using it in the
same way Eric used it in 33c2207d3fda for
fill_simple_delay_slots (i.e. it would be similarly flawed), but
it's also used all over for the old-style dataflow analysis done
here.  We'd be seeing failing for those "corner cases" all
around delayed-branch targets if that didn't work.  I should be
able to use the existing machinery in this patch.

BTW, I happened to notice that bugs here are also somewhat more
visible than your ordinary wrong-result bug. :)

> Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Originally I thought to bootstrap this patch on MIPS and SPARC
> > since they're both delayed-branch-slot targets but I
> > reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> > seems only visium and CRIS has this feature set, and I see no
> > trace of visium in neither newlib nor the simulator next to
> > glibc.  So, I just tested cris-elf.
> >
> > This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> > fillers using a method similar to that in commit 33c2207d3fda,
> > where care was taken for fill_simple_delay_slots to allow such
> > insns when scanning for delay-slot fillers *backwards* (before
> > the insn).
> >
> > A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> > For cc0 targets, insns don't mention clobbering cc0, so the
> > clobbers are mentioned in the "resources" only as a special
> > entity and only for compare-insns and branches, where the cc0
> > value matters.
> >
> > In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> > the register liveness detection in reorg.c / resource.c treats
> > that as a blocker (for other insns mentioning it, i.e. most)
> > when looking for delay-slot-filling candidates.  This means that
> > when comparing core and performance for a delay-slot cc0 target
> > before and after the de-cc0 conversion, the inability to fill a
> > delay slot after conversion manifests as a regression.  This was
> > one such case, for CRIS, with random_bitstring in
> > gcc.c-torture/execute/arith-rand-ll.c as well as the target
> > libgcc division function.
> >
> > After this, all known performance regressions compared to cc0
> > are fixed.
> >
> > Ok to commit?
> >
> > gcc:
> > 	PR target/93372
> > 	* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> > 	TARGET_FLAGS_REGNUM as delay-slot fillers.
> >
> > gcc/testsuite:
> > 	PR target/93372
> > 	* gcc.target/cris/pr93372-47.c: New test.
> > ---
> >  gcc/reorg.c                                | 37 +++++++++++++++++++++-
> >  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
> >
> > diff --git a/gcc/reorg.c b/gcc/reorg.c
> > index dfd7494bf..83161caa0 100644
> > --- a/gcc/reorg.c
> > +++ b/gcc/reorg.c
> > @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >    CLEAR_RESOURCE (&needed);
> >    CLEAR_RESOURCE (&set);
> >  
> > +  /* Handle the flags register specially, to be able to accept a
> > +     candidate that clobbers it.  See also fill_simple_delay_slots.  */
> > +  bool filter_flags
> > +    = (slots_to_fill == 1
> > +       && targetm.flags_regnum != INVALID_REGNUM
> > +       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> > +  struct resources fset;
> > +  struct resources flags_res;
> > +  if (filter_flags)
> > +    {
> > +      CLEAR_RESOURCE (&fset);
> > +      CLEAR_RESOURCE (&flags_res);
> > +      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> > +    }
> > +
> >    /* If we do not own this thread, we must stop as soon as we find
> >       something that we can't put in a delay slot, since all we can do
> >       is branch into THREAD at a later point.  Therefore, labels stop
> > @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >        /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
> >  	 don't separate or copy insns that set and use CC0.  */
> >        if (! insn_references_resource_p (trial, &set, true)
> > -	  && ! insn_sets_resource_p (trial, &set, true)
> > +	  && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
> >  	  && ! insn_sets_resource_p (trial, &needed, true)
> > +	  /* If we're handling sets to the flags register specially, we
> > +	     only allow an insn into a delay-slot, if it either:
> > +	     - doesn't set the flags register,
> > +	     - the "set" of the flags register isn't used (clobbered),
> > +	     - insns between the delay-slot insn and the trial-insn
> > +	     as accounted in "set", have not affected the flags register.  */
> > +	  && (! filter_flags
> > +	      || ! insn_sets_resource_p (trial, &flags_res, true)
> > +	      || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> > +	      || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
> >  	  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
> >  			      && (! own_thread || ! sets_cc0_p (pat)))))
> >  	  && ! can_throw_internal (trial))
> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >        lose = 1;
> >        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
> >        mark_referenced_resources (trial, &needed, true);
> > +      if (filter_flags)
> > +	{
> > +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> > +
> > +	  /* Groups of flags-register setters with users should not
> > +	     affect opportunities to move flags-register-setting insns
> > +	     (clobbers) into the delay-slot.  */
> > +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> 
> If we do this, what stops us from trying to move a flags-register user
> ahead of the associated setter, when the user doesn't itself set the
> flags register?

First, all sets are still in set (set.regs) including any that
set the flags register between the insn with the delay-slot and
the trial.

(That's why it's separate from fset (fset.regs).  I used the
same naming scheme as in the named commit, but perhaps a better
name would be set_with_flags_filtered or something.)

>  Feels like there should be some test involving
> insn_references_resource_p (trial, &flags_res, true) in the
> (! filter_flags || ...) condition above.

There is: the "! insn_references_resource_p (trial, &set, true)"
(pre-existing in the context above the patch), so we don't move
anything that references anything set (only additional flags
register setters where the result is unused).

Ok now?

> Thanks,
> Richard

Thanks again for the review, much appreciated.

brgds, H-P

> > +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > +	}
> >  
> >        /* Ensure we don't put insns between the setting of cc and the comparison
> >  	 by moving a setting of cc into an earlier delay slot since these insns
> > diff --git a/gcc/testsuite/gcc.target/cris/pr93372-47.c b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> > new file mode 100644
> > index 000000000..8d80ae6b4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> > @@ -0,0 +1,49 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=v10" } */
> > +/* { dg-final { scan-assembler-times {\tnop} 1 } } */
> > +
> > +/* A somewhat brittle test-case, checking that we have (only) one
> > +   unfilled delay-slot in random_bitstring: there might be none or two
> > +   or more, and general improvements may lead to unfilled delay-slots.
> > +   When the scan-assembler-times directive regresses, re-run
> > +   gcc.c-torture/execute/arith-rand-ll.c, check cycle-level
> > +   execution-time regressions in random_bitstring and take appropriate
> > +   action.  */
> > +
> > +static long long
> > +simple_rand ()
> > +{
> > +  static unsigned long long seed = 47114711;
> > +  unsigned long long this = seed * 1103515245 + 12345;
> > +  seed = this;
> > +  return this >> 8;
> > +}
> > +
> > +unsigned long long int
> > +random_bitstring ()
> > +{
> > +  unsigned long long int x;
> > +  int n_bits;
> > +  long long ran;
> > +  int tot_bits = 0;
> > +
> > +  x = 0;
> > +  for (;;)
> > +    {
> > +      ran = simple_rand ();
> > +      n_bits = (ran >> 1) % 16;
> > +      tot_bits += n_bits;
> > +
> > +      if (n_bits == 0)
> > +	return x;
> > +      else
> > +	{
> > +	  x <<= n_bits;
> > +	  if (ran & 1)
> > +	    x |= (1 << n_bits) - 1;
> > +
> > +	  if (tot_bits > 8 * sizeof (long long) + 6)
> > +	    return x;
> > +	}
> > +    }
> > +}
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-08-20 17:15   ` Hans-Peter Nilsson
@ 2020-08-20 19:04     ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2020-08-20 19:04 UTC (permalink / raw)
  To: Hans-Peter Nilsson via Gcc-patches; +Cc: Hans-Peter Nilsson, ebotcazou

Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>> >    CLEAR_RESOURCE (&needed);
>> >    CLEAR_RESOURCE (&set);
>> >  
>> > +  /* Handle the flags register specially, to be able to accept a
>> > +     candidate that clobbers it.  See also fill_simple_delay_slots.  */
>> > +  bool filter_flags
>> > +    = (slots_to_fill == 1
>> > +       && targetm.flags_regnum != INVALID_REGNUM
>> > +       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
>> > +  struct resources fset;
>> > +  struct resources flags_res;
>> > +  if (filter_flags)
>> > +    {
>> > +      CLEAR_RESOURCE (&fset);
>> > +      CLEAR_RESOURCE (&flags_res);
>> > +      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
>> > +    }
>> > +
>> >    /* If we do not own this thread, we must stop as soon as we find
>> >       something that we can't put in a delay slot, since all we can do
>> >       is branch into THREAD at a later point.  Therefore, labels stop
>> > @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>> >        /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>> >  	 don't separate or copy insns that set and use CC0.  */
>> >        if (! insn_references_resource_p (trial, &set, true)
>> > -	  && ! insn_sets_resource_p (trial, &set, true)
>> > +	  && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
>> >  	  && ! insn_sets_resource_p (trial, &needed, true)
>> > +	  /* If we're handling sets to the flags register specially, we
>> > +	     only allow an insn into a delay-slot, if it either:
>> > +	     - doesn't set the flags register,
>> > +	     - the "set" of the flags register isn't used (clobbered),
>> > +	     - insns between the delay-slot insn and the trial-insn
>> > +	     as accounted in "set", have not affected the flags register.  */
>> > +	  && (! filter_flags
>> > +	      || ! insn_sets_resource_p (trial, &flags_res, true)
>> > +	      || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
>> > +	      || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
>> >  	  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
>> >  			      && (! own_thread || ! sets_cc0_p (pat)))))
>> >  	  && ! can_throw_internal (trial))
>> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>> >        lose = 1;
>> >        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
>> >        mark_referenced_resources (trial, &needed, true);
>> > +      if (filter_flags)
>> > +	{
>> > +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
>> > +
>> > +	  /* Groups of flags-register setters with users should not
>> > +	     affect opportunities to move flags-register-setting insns
>> > +	     (clobbers) into the delay-slot.  */
>> > +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
>> 
>> If we do this, what stops us from trying to move a flags-register user
>> ahead of the associated setter, when the user doesn't itself set the
>> flags register?
>
> First, all sets are still in set (set.regs) including any that
> set the flags register between the insn with the delay-slot and
> the trial.
>
> (That's why it's separate from fset (fset.regs).  I used the
> same naming scheme as in the named commit, but perhaps a better
> name would be set_with_flags_filtered or something.)
>
>>  Feels like there should be some test involving
>> insn_references_resource_p (trial, &flags_res, true) in the
>> (! filter_flags || ...) condition above.
>
> There is: the "! insn_references_resource_p (trial, &set, true)"
> (pre-existing in the context above the patch), so we don't move
> anything that references anything set (only additional flags
> register setters where the result is unused).

Ah, right, I think I was getting them the wrong way around.

Looks OK to me, but give Eric 24 hrs to object/comment.
Like you say, he's better qualified to review this, I was just
stepping in because it looked like the patch had fallen through
the cracks.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-08-14  5:27 reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets Hans-Peter Nilsson
  2020-08-20  8:30 ` Richard Sandiford
@ 2020-08-24 18:05 ` Jeff Law
  2020-09-11 11:09 ` Eric Botcazou
  2 siblings, 0 replies; 9+ messages in thread
From: Jeff Law @ 2020-08-24 18:05 UTC (permalink / raw)
  To: Hans-Peter Nilsson, gcc-patches

On Fri, 2020-08-14 at 07:27 +0200, Hans-Peter Nilsson via Gcc-patches wrote:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc.  So, I just tested cris-elf.
[ Sorry, been focused on non-upstream stuff for the last month+, so yea,
  some stuff is falling through the cracks. ]

Ibuild newlib on visium daily.  I also run the gcc testsuite, but using dummy
simulator.  So we're really just testing that the code generator doesn't fault
and the various dg-scan style tests.

http://3.14.90.209:8080/job/visium-elf/


> 
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
> 
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
> 
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
I wouldn't be surprised if H8 would have tripped over this as well.  Converting
it to REG_CC is still in-progress and a failure to fill a delay slot would likely
have been missed in my performance testing to-date (delay slot filling isn't
terribly important on the H8 thankfully).


> 
> After this, all known performance regressions compared to cc0
> are fixed.
> 
> Ok to commit?
> 
> gcc:
> 	PR target/93372
> 	* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> 	TARGET_FLAGS_REGNUM as delay-slot fillers.
> 
> gcc/testsuite:
> 	PR target/93372
> 	* gcc.target/cris/pr93372-47.c: New test.
It looks reasonable to me.

jeff
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-08-14  5:27 reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets Hans-Peter Nilsson
  2020-08-20  8:30 ` Richard Sandiford
  2020-08-24 18:05 ` Jeff Law
@ 2020-09-11 11:09 ` Eric Botcazou
  2020-09-11 11:24   ` Hans-Peter Nilsson
  2020-09-11 11:54   ` Hans-Peter Nilsson
  2 siblings, 2 replies; 9+ messages in thread
From: Eric Botcazou @ 2020-09-11 11:09 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc-patches

> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> condition, lose = 1;
>        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
>        mark_referenced_resources (trial, &needed, true);
> +      if (filter_flags)
> +	{
> +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> +
> +	  /* Groups of flags-register setters with users should not
> +	     affect opportunities to move flags-register-setting insns
> +	     (clobbers) into the delay-slot.  */
> +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> +	}

Don't you need to put the new block before mark_referenced_resources (as I did 
in fill_simple_delay_slots) in case the needed insn reads the flags register?

-- 
Eric Botcazou



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-09-11 11:09 ` Eric Botcazou
@ 2020-09-11 11:24   ` Hans-Peter Nilsson
  2020-09-11 11:28     ` Hans-Peter Nilsson
  2020-09-11 11:54   ` Hans-Peter Nilsson
  1 sibling, 1 reply; 9+ messages in thread
From: Hans-Peter Nilsson @ 2020-09-11 11:24 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

> From: Eric Botcazou <botcazou@adacore.com>
> CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
> Date: Fri, 11 Sep 2020 13:09:48 +0200
> received-spf: None (smtp1.axis.com: no sender authenticity  information
>  available from domain of  postmaster@mail-wr1-f54.google.com) identity=helo;
>   client-ip=209.85.221.54; receiver=smtp1.axis.com;
>   envelope-from="botcazou@adacore.com";
>   x-sender="postmaster@mail-wr1-f54.google.com";
>   x-conformance=sidf_compatible
> Old-Content-Type: multipart/alternative;
> 	boundary="_000_2479570Ksvo8Q1nxZfomalhaut_"
> Content-Type: text/plain; charset=iso-8859-1
> 
> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> > condition, lose = 1;
> >        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
> >        mark_referenced_resources (trial, &needed, true);
> > +      if (filter_flags)
> > +	{
> > +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> > +
> > +	  /* Groups of flags-register setters with users should not
> > +	     affect opportunities to move flags-register-setting insns
> > +	     (clobbers) into the delay-slot.  */
> > +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> > +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > +	}
> 
> Don't you need to put the new block before mark_referenced_resources (as I did 
> in fill_simple_delay_slots) in case the needed insn reads the flags register?

It can't be needed, as the flags register is dead in the insn with the
delay-slot, as part of the filter_flags condition.

brgds, H-P

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-09-11 11:24   ` Hans-Peter Nilsson
@ 2020-09-11 11:28     ` Hans-Peter Nilsson
  0 siblings, 0 replies; 9+ messages in thread
From: Hans-Peter Nilsson @ 2020-09-11 11:28 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: botcazou, gcc-patches

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Fri, 11 Sep 2020 13:24:18 +0200

> > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> > > condition, lose = 1;
> > >        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
> > >        mark_referenced_resources (trial, &needed, true);
> > > +      if (filter_flags)
> > > +	{
> > > +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> > > +
> > > +	  /* Groups of flags-register setters with users should not
> > > +	     affect opportunities to move flags-register-setting insns
> > > +	     (clobbers) into the delay-slot.  */
> > > +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> > > +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > > +	}
> > 
> > Don't you need to put the new block before mark_referenced_resources (as I did 
> > in fill_simple_delay_slots) in case the needed insn reads the flags register?
> 
> It can't be needed, as the flags register is dead in the insn with the
> delay-slot, as part of the filter_flags condition.

Uh, I'll get back with a more thoughtful reply...

brgds, H-P

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
  2020-09-11 11:09 ` Eric Botcazou
  2020-09-11 11:24   ` Hans-Peter Nilsson
@ 2020-09-11 11:54   ` Hans-Peter Nilsson
  1 sibling, 0 replies; 9+ messages in thread
From: Hans-Peter Nilsson @ 2020-09-11 11:54 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

> From: Eric Botcazou <botcazou@adacore.com>
> Date: Fri, 11 Sep 2020 13:09:48 +0200

> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> > condition, lose = 1;
> >        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
> >        mark_referenced_resources (trial, &needed, true);
> > +      if (filter_flags)
> > +	{
> > +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> > +
> > +	  /* Groups of flags-register setters with users should not
> > +	     affect opportunities to move flags-register-setting insns
> > +	     (clobbers) into the delay-slot.  */
> > +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> > +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > +	}
> 
> Don't you need to put the new block before mark_referenced_resources (as I did 
> in fill_simple_delay_slots) in case the needed insn reads the flags register?

I think the simplest answer is that wouldn't be good; it'd leave the
"CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);" without
effect.

Note also fill_simple_delay_slots is moving insns from the before the
delay-slotting insn, while this is from a "tread" (or arm, or leg)
"after".

brgds, H-P

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-09-11 11:54 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14  5:27 reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets Hans-Peter Nilsson
2020-08-20  8:30 ` Richard Sandiford
2020-08-20 17:15   ` Hans-Peter Nilsson
2020-08-20 19:04     ` Richard Sandiford
2020-08-24 18:05 ` Jeff Law
2020-09-11 11:09 ` Eric Botcazou
2020-09-11 11:24   ` Hans-Peter Nilsson
2020-09-11 11:28     ` Hans-Peter Nilsson
2020-09-11 11:54   ` Hans-Peter Nilsson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).