From: Richard Sandiford <richard.sandiford@arm.com>
To: Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Hans-Peter Nilsson <hp@axis.com>
Subject: Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
Date: Thu, 20 Aug 2020 09:30:56 +0100 [thread overview]
Message-ID: <mptlfi9hg73.fsf@arm.com> (raw)
In-Reply-To: <202008140527.07E5RrHL029246@ignucius.se.axis.com> (Hans-Peter Nilsson via Gcc-patches's message of "Fri, 14 Aug 2020 07:27:53 +0200")
Anything I once knew about reorg.c has long since faded away, but since
noone else has reviewed it…
Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
reliable during reorg.c? It was written at a time when passes were
expected to keep the notes up-to-date, but that's not true these days.
My worry is that they might be mostly right, but just stale enough
to be harmful in corner cases.
Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target. It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc. So, I just tested cris-elf.
>
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
>
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
>
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates. This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression. This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
>
> After this, all known performance regressions compared to cc0
> are fixed.
>
> Ok to commit?
>
> gcc:
> PR target/93372
> * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> TARGET_FLAGS_REGNUM as delay-slot fillers.
>
> gcc/testsuite:
> PR target/93372
> * gcc.target/cris/pr93372-47.c: New test.
> ---
> gcc/reorg.c | 37 +++++++++++++++++++++-
> gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
>
> diff --git a/gcc/reorg.c b/gcc/reorg.c
> index dfd7494bf..83161caa0 100644
> --- a/gcc/reorg.c
> +++ b/gcc/reorg.c
> @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> CLEAR_RESOURCE (&needed);
> CLEAR_RESOURCE (&set);
>
> + /* Handle the flags register specially, to be able to accept a
> + candidate that clobbers it. See also fill_simple_delay_slots. */
> + bool filter_flags
> + = (slots_to_fill == 1
> + && targetm.flags_regnum != INVALID_REGNUM
> + && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> + struct resources fset;
> + struct resources flags_res;
> + if (filter_flags)
> + {
> + CLEAR_RESOURCE (&fset);
> + CLEAR_RESOURCE (&flags_res);
> + SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> + }
> +
> /* If we do not own this thread, we must stop as soon as we find
> something that we can't put in a delay slot, since all we can do
> is branch into THREAD at a later point. Therefore, labels stop
> @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> /* If TRIAL conflicts with the insns ahead of it, we lose. Also,
> don't separate or copy insns that set and use CC0. */
> if (! insn_references_resource_p (trial, &set, true)
> - && ! insn_sets_resource_p (trial, &set, true)
> + && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
> && ! insn_sets_resource_p (trial, &needed, true)
> + /* If we're handling sets to the flags register specially, we
> + only allow an insn into a delay-slot, if it either:
> + - doesn't set the flags register,
> + - the "set" of the flags register isn't used (clobbered),
> + - insns between the delay-slot insn and the trial-insn
> + as accounted in "set", have not affected the flags register. */
> + && (! filter_flags
> + || ! insn_sets_resource_p (trial, &flags_res, true)
> + || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> + || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
> && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
> && (! own_thread || ! sets_cc0_p (pat)))))
> && ! can_throw_internal (trial))
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> lose = 1;
> mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
> mark_referenced_resources (trial, &needed, true);
> + if (filter_flags)
> + {
> + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> +
> + /* Groups of flags-register setters with users should not
> + affect opportunities to move flags-register-setting insns
> + (clobbers) into the delay-slot. */
> + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
If we do this, what stops us from trying to move a flags-register user
ahead of the associated setter, when the user doesn't itself set the
flags register? Feels like there should be some test involving
insn_references_resource_p (trial, &flags_res, true) in the
(! filter_flags || …) condition above.
Thanks,
Richard
> + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> + }
>
> /* Ensure we don't put insns between the setting of cc and the comparison
> by moving a setting of cc into an earlier delay slot since these insns
> diff --git a/gcc/testsuite/gcc.target/cris/pr93372-47.c b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> new file mode 100644
> index 000000000..8d80ae6b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=v10" } */
> +/* { dg-final { scan-assembler-times {\tnop} 1 } } */
> +
> +/* A somewhat brittle test-case, checking that we have (only) one
> + unfilled delay-slot in random_bitstring: there might be none or two
> + or more, and general improvements may lead to unfilled delay-slots.
> + When the scan-assembler-times directive regresses, re-run
> + gcc.c-torture/execute/arith-rand-ll.c, check cycle-level
> + execution-time regressions in random_bitstring and take appropriate
> + action. */
> +
> +static long long
> +simple_rand ()
> +{
> + static unsigned long long seed = 47114711;
> + unsigned long long this = seed * 1103515245 + 12345;
> + seed = this;
> + return this >> 8;
> +}
> +
> +unsigned long long int
> +random_bitstring ()
> +{
> + unsigned long long int x;
> + int n_bits;
> + long long ran;
> + int tot_bits = 0;
> +
> + x = 0;
> + for (;;)
> + {
> + ran = simple_rand ();
> + n_bits = (ran >> 1) % 16;
> + tot_bits += n_bits;
> +
> + if (n_bits == 0)
> + return x;
> + else
> + {
> + x <<= n_bits;
> + if (ran & 1)
> + x |= (1 << n_bits) - 1;
> +
> + if (tot_bits > 8 * sizeof (long long) + 6)
> + return x;
> + }
> + }
> +}
next prev parent reply other threads:[~2020-08-20 8:30 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-14 5:27 Hans-Peter Nilsson
2020-08-20 8:30 ` Richard Sandiford [this message]
2020-08-20 17:15 ` Hans-Peter Nilsson
2020-08-20 19:04 ` Richard Sandiford
2020-08-24 18:05 ` Jeff Law
2020-09-11 11:09 ` Eric Botcazou
2020-09-11 11:24 ` Hans-Peter Nilsson
2020-09-11 11:28 ` Hans-Peter Nilsson
2020-09-11 11:54 ` Hans-Peter Nilsson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mptlfi9hg73.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hp@axis.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).