From: Hongtao Liu <crazylht@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>,
Hongtao Liu <crazylht@gmail.com>,
GCC Patches <gcc-patches@gcc.gnu.org>,
"H. J. Lu" <hjl.tools@gmail.com>,
Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH] [i386] Fix _mm256_zeroupper to notify LRA that vzeroupper will kill sse registers. [PR target/82735]
Date: Mon, 17 May 2021 16:44:30 +0800 [thread overview]
Message-ID: <CAMZc-bxb_15p-tLjY-UfgP9EyFmxhXW5mMJFbu02V+QXbimoFQ@mail.gmail.com> (raw)
In-Reply-To: <CAMZc-bweYecznWAmDk5kZrHOgr2zxQqAf=9jVSizF+AFQRqL2A@mail.gmail.com>
On Fri, May 14, 2021 at 10:27 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 7:52 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Jakub Jelinek <jakub@redhat.com> writes:
> > > On Thu, May 13, 2021 at 12:32:26PM +0100, Richard Sandiford wrote:
> > >> Jakub Jelinek <jakub@redhat.com> writes:
> > >> > On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote:
> > >> >> > > Bootstrapped and regtested on X86_64-linux-gnu{-m32,}
> > >> >> > > Ok for trunk?
> > >> >> >
> > >> >> > Some time ago a support for CLOBBER_HIGH RTX was added (and later
> > >> >> > removed for some reason). Perhaps we could resurrect the patch for the
> > >> >> > purpose of ferrying 128bit modes via vzeroupper RTX?
> > >> >>
> > >> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.html
> > >> >
> > >> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html
> > >> > is where it got removed, CCing Richard.
> > >>
> > >> Yeah. Initially clobber_high seemed like the best appraoch for
> > >> handling the tlsdesc thing, but in practice it was too difficult
> > >> to shoe-horn the concept in after the fact, when so much rtl
> > >> infrastructure wasn't prepared to deal with it. The old support
> > >> didn't handle all cases and passes correctly, and handled others
> > >> suboptimally.
> > >>
> > >> I think it would be worth using the same approach as
> > >> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01466.html for
> > >> vzeroupper: represent the instructions as call_insns in which the
> > >> call has a special vzeroupper ABI. I think that's likely to lead
> > >> to better code than clobber_high would (or at least, it did for tlsdesc).
>
> From an implementation perspective, I guess you're meaning we should
> implement TARGET_INSN_CALLEE_ABI and TARGET_FNTYPE_ABI in the i386
> backend.
>
When I implemented the vzeroupper pattern as call_insn and defined
TARGET_INSN_CALLEE_ABI for it, I got several failures. they're related
to 2 parts
1. requires_stack_frame_p return true for vzeroupper which should be false.
2. in subst_stack_regs, vzeroupper shouldn't kill arguments
I've tried a rough patch like below, it works for those failures,
unfortunately, I don't have an arm machine to test, so I want to ask
would the below change break something in the arm backend?
modified gcc/reg-stack.c
@@ -174,6 +174,7 @@
#include "reload.h"
#include "tree-pass.h"
#include "rtl-iter.h"
+#include "function-abi.h"
#ifdef STACK_REGS
@@ -2385,7 +2386,7 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstack)
bool control_flow_insn_deleted = false;
int i;
- if (CALL_P (insn))
+ if (CALL_P (insn) && insn_callee_abi (insn).id () == 0)
{
int top = regstack->top;
modified gcc/shrink-wrap.c
@@ -58,7 +58,12 @@ requires_stack_frame_p (rtx_insn *insn,
HARD_REG_SET prologue_used,
unsigned regno;
if (CALL_P (insn))
- return !SIBLING_CALL_P (insn);
+ {
+ if (insn_callee_abi (insn).id() != 0)
+ return false;
+ else
+ return !SIBLING_CALL_P (insn);
+ }
/* We need a frame to get the unique CFA expected by the unwinder. */
if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
> > >
> > > Perhaps a magic call_insn that is split post-reload into a normal insn
> > > with the sets then?
> >
> > I'd be tempted to treat it is a call_insn throughout. The unspec_volatile
> > means that we can't move the instruction, so converting a call_insn to an
> > insn isn't likely to help from that point of view. The sets are also
> > likely to be handled suboptimally compared to the more accurate register
> > information attached to the call: all code that handles calls has to be
> > prepared to deal with partial clobbers, whereas most code dealing with
> > sets will assume that the set does useful work, and that the rhs of the
> > set is live.
> >
> > Thanks,
> > Richard
> >
>
>
> --
> BR,
> Hongtao
--
BR,
Hongtao
next prev parent reply other threads:[~2021-05-17 8:40 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-13 9:23 Hongtao Liu
2021-05-13 9:40 ` Uros Bizjak
2021-05-13 9:43 ` Uros Bizjak
2021-05-13 9:54 ` Jakub Jelinek
2021-05-13 11:32 ` Richard Sandiford
2021-05-13 11:37 ` Jakub Jelinek
2021-05-13 11:52 ` Richard Sandiford
2021-05-14 2:27 ` Hongtao Liu
2021-05-17 8:44 ` Hongtao Liu [this message]
2021-05-17 9:56 ` Richard Sandiford
2021-05-18 13:12 ` Hongtao Liu
2021-05-18 15:18 ` Richard Sandiford
2021-05-25 6:04 ` Hongtao Liu
2021-05-25 6:30 ` Hongtao Liu
2021-05-27 5:07 ` Hongtao Liu
2021-05-27 7:05 ` Uros Bizjak
2021-06-01 2:24 ` Hongtao Liu
2021-06-03 6:54 ` [PATCH 1/2] CALL_INSN may not be a real function call liuhongt
2021-06-03 6:54 ` [PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI liuhongt
2021-06-04 2:56 ` Hongtao Liu
2021-06-04 6:26 ` Uros Bizjak
2021-06-04 6:34 ` Hongtao Liu
2021-06-07 19:04 ` [PATCH] x86: Don't compile pr82735-[345].c for x32 H.J. Lu
2021-06-04 2:55 ` [PATCH 1/2] CALL_INSN may not be a real function call Hongtao Liu
2021-06-04 7:50 ` Jakub Jelinek
2021-07-05 23:30 ` Segher Boessenkool
2021-07-06 0:03 ` Jeff Law
2021-07-06 1:49 ` Hongtao Liu
2021-07-07 14:55 ` Segher Boessenkool
2021-07-07 17:56 ` Jeff Law
2021-07-06 1:37 ` Hongtao Liu
2021-07-07 2:44 ` Hongtao Liu
2021-07-07 8:15 ` Richard Biener
2021-07-07 14:52 ` Segher Boessenkool
2021-07-07 15:23 ` Hongtao Liu
2021-07-07 23:42 ` Segher Boessenkool
2021-07-08 4:14 ` Hongtao Liu
2021-07-07 15:32 ` Hongtao Liu
2021-07-07 23:54 ` Segher Boessenkool
2021-07-09 7:20 ` Hongtao Liu
2021-07-07 15:52 ` Hongtao Liu
2021-05-27 7:20 ` [PATCH] [i386] Fix _mm256_zeroupper to notify LRA that vzeroupper will kill sse registers. [PR target/82735] Jakub Jelinek
2021-05-27 10:50 ` Richard Sandiford
2021-06-01 2:22 ` Hongtao Liu
2021-06-01 2:25 ` Hongtao Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMZc-bxb_15p-tLjY-UfgP9EyFmxhXW5mMJFbu02V+QXbimoFQ@mail.gmail.com \
--to=crazylht@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hjl.tools@gmail.com \
--cc=jakub@redhat.com \
--cc=richard.sandiford@arm.com \
--cc=ubizjak@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).