From: Uros Bizjak <ubizjak@gmail.com>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [04/32] [x86] Robustify vzeroupper handling across calls
Date: Tue, 08 Oct 2019 18:17:00 -0000
Message-ID: <CAFULd4YWDOCeWw5udoBbvkGPthNZcBEkV4pUDQnZ3iFEP-aA8Q@mail.gmail.com>
In-Reply-To: <CAFULd4ZvEmn5tAYW_Ud--8j-V+908NnEZ8MnPU9BSVREf4GzYA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3486 bytes --]
The following patch uses the correct SSE register class: vzeroupper
operates only on the lower 16 (8 on 32-bit targets) SSE registers.
2019-10-08 Uroš Bizjak <ubizjak@gmail.com>
PR target/91994
* config/i386/i386.c (ix86_avx_u128_mode_needed): Use SSE_REGS
instead of ALL_SSE_REGS to check if function call preserves some
256-bit SSE registers.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Committed to mainline SVN.
Uros.
On Tue, Oct 1, 2019 at 12:14 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
> > > The comment suggests that this code is only needed for Win64 and that
> > > not testing for Win64 is just a simplification. But in practice it was
> > > needed for correctness on GNU/Linux and other targets too, since without
> > > it the RA would be able to keep 256-bit and 512-bit values in SSE
> > > registers across calls that are known not to clobber them.
> > >
> > > This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> > > that some SSE registers are not touched by a call. There are then no
> > > regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> > > for GNU/Linux (not something we should do, was just for testing).
>
> If RA can see that some SSE regs are not touched by the call, then we
> are sure that the called function is part of the current TU. In this
> case, the called function will be compiled using VEX instructions,
> where there is no AVX-SSE transition penalty. So, skipping VZEROUPPER
> is beneficial here.
>
> Uros.
>
> > > If in fact we want -fipa-ra to pretend that all functions clobber
> > > SSE registers above 128 bits, it'd certainly be possible to arrange
> > > that. But IMO that would be an optimisation decision, whereas what
> > > the patch is fixing is a correctness decision. So I think we should
> > > have this check even so.
> >
> > 2019-09-25 Richard Sandiford <richard.sandiford@arm.com>
> >
> > gcc/
> > * config/i386/i386.c: Include function-abi.h.
> > (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
> > if they preserve some 256-bit or 512-bit SSE registers.
> >
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c 2019-09-25 16:47:48.000000000 +0100
> > +++ gcc/config/i386/i386.c 2019-09-25 16:47:49.089962608 +0100
> > @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
> > #include "i386-builtins.h"
> > #include "i386-expand.h"
> > #include "i386-features.h"
> > +#include "function-abi.h"
> >
> > /* This file should be included last. */
> > #include "target-def.h"
> > @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
> > }
> > }
> >
> > + /* If the function is known to preserve some SSE registers,
> > + RA and previous passes can legitimately rely on that for
> > + modes wider than 256 bits. It's only safe to issue a
> > + vzeroupper if all SSE registers are clobbered. */
> > + const function_abi &abi = insn_callee_abi (insn);
> > + if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
> > + abi.mode_clobbers (V4DImode)))
> > + return AVX_U128_ANY;
> > +
> > return AVX_U128_CLEAN;
> > }
> >
[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 605 bytes --]
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 276677)
+++ config/i386/i386.c (working copy)
@@ -13530,7 +13530,7 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
modes wider than 256 bits. It's only safe to issue a
vzeroupper if all SSE registers are clobbered. */
const function_abi &abi = insn_callee_abi (insn);
- if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
+ if (!hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
abi.mode_clobbers (V4DImode)))
return AVX_U128_ANY;
Thread overview: 97+ messages
2019-09-11 19:02 [00/32] Support multiple ABIs in the same translation unit Richard Sandiford
2019-09-11 19:03 ` [01/32] Add function_abi.{h,cc} Richard Sandiford
2019-09-29 20:51 ` Jeff Law
2019-09-30 9:19 ` Richard Sandiford
2019-09-30 21:16 ` Jeff Law
2019-09-11 19:03 ` [02/32] Add a target hook for getting an ABI from a function type Richard Sandiford
2019-09-29 20:52 ` Jeff Law
2019-09-11 19:04 ` [03/32] Add a function for getting the ABI of a call insn target Richard Sandiford
2019-09-25 15:38 ` Richard Sandiford
2019-09-30 15:52 ` Jeff Law
2019-09-30 16:32 ` Richard Sandiford
2019-09-30 16:46 ` Jeff Law
2019-09-11 19:05 ` [05/32] Pass an ABI identifier to hard_regno_call_part_clobbered Richard Sandiford
2019-09-29 20:58 ` Jeff Law
2019-09-11 19:05 ` [04/32] [x86] Robustify vzeroupper handling across calls Richard Sandiford
2019-09-25 15:48 ` Richard Sandiford
2019-09-25 18:11 ` Uros Bizjak
2019-10-01 10:14 ` Uros Bizjak
2019-10-08 18:17 ` Uros Bizjak [this message]
2019-09-11 19:06 ` [06/32] Pass an ABI to choose_hard_reg_mode Richard Sandiford
2019-09-29 21:00 ` Jeff Law
2019-09-11 19:07 ` [07/32] Remove global call sets: caller-save.c Richard Sandiford
2019-09-29 21:01 ` Jeff Law
2019-09-11 19:07 ` [08/32] Remove global call sets: cfgcleanup.c Richard Sandiford
2019-09-29 21:02 ` Jeff Law
2019-09-11 19:08 ` [09/32] Remove global call sets: cfgloopanal.c Richard Sandiford
2019-09-29 21:02 ` Jeff Law
2019-09-11 19:08 ` [10/32] Remove global call sets: combine.c Richard Sandiford
2019-09-12 2:18 ` Segher Boessenkool
2019-09-12 7:52 ` Richard Sandiford
2019-09-20 0:43 ` Segher Boessenkool
2019-09-25 15:52 ` Richard Sandiford
2019-09-25 16:30 ` Segher Boessenkool
2019-09-29 22:32 ` Jeff Law
2019-09-29 22:43 ` Segher Boessenkool
2019-09-11 19:09 ` [11/32] Remove global call sets: cse.c Richard Sandiford
2019-09-25 15:57 ` Richard Sandiford
2019-09-29 21:04 ` Jeff Law
2019-09-30 16:23 ` Richard Sandiford
2019-09-11 19:09 ` [12/32] Remove global call sets: cselib.c Richard Sandiford
2019-09-29 21:05 ` Jeff Law
2019-10-29 9:20 ` Martin Liška
2019-09-11 19:10 ` [14/32] Remove global call sets: DF (entry/exit defs) Richard Sandiford
2019-09-29 21:07 ` Jeff Law
2019-09-11 19:10 ` [13/32] Remove global call sets: DF (EH edges) Richard Sandiford
2019-09-29 21:07 ` Jeff Law
2019-09-11 19:11 ` [16/32] Remove global call sets: function.c Richard Sandiford
2019-09-29 21:10 ` Jeff Law
2019-09-11 19:11 ` [15/32] Remove global call sets: early-remat.c Richard Sandiford
2019-09-29 21:09 ` Jeff Law
2019-09-11 19:11 ` [17/32] Remove global call sets: gcse.c Richard Sandiford
2019-09-25 16:04 ` Richard Sandiford
2019-09-29 21:10 ` Jeff Law
2019-09-11 19:12 ` [18/32] Remove global call sets: haifa-sched.c Richard Sandiford
2019-09-29 21:11 ` Jeff Law
2019-09-11 19:12 ` [19/32] Remove global call sets: IRA Richard Sandiford
2019-09-30 15:16 ` Jeff Law
2019-09-11 19:13 ` [20/32] Remove global call sets: loop-iv.c Richard Sandiford
2019-09-29 21:20 ` Jeff Law
2019-09-11 19:14 ` [22/32] Remove global call sets: postreload.c Richard Sandiford
2019-09-29 21:33 ` Jeff Law
2019-09-11 19:14 ` [23/32] Remove global call sets: postreload-gcse.c Richard Sandiford
2019-09-25 16:08 ` Richard Sandiford
2019-09-29 22:22 ` Jeff Law
2019-09-11 19:14 ` [21/32] Remove global call sets: LRA Richard Sandiford
2019-09-30 15:29 ` Jeff Law
2019-10-04 18:03 ` H.J. Lu
2019-10-04 21:52 ` H.J. Lu
2019-10-05 13:33 ` Richard Sandiford
2019-09-11 19:15 ` [25/32] Remove global call sets: regcprop.c Richard Sandiford
2019-09-29 21:34 ` Jeff Law
2019-09-11 19:15 ` [24/32] Remove global call sets: recog.c Richard Sandiford
2019-09-29 21:33 ` Jeff Law
2019-09-11 19:16 ` [27/32] Remove global call sets: reload.c Richard Sandiford
2019-09-29 22:26 ` Jeff Law
2019-09-11 19:16 ` [26/32] Remove global call sets: regrename.c Richard Sandiford
2019-09-29 22:25 ` Jeff Law
2019-09-11 19:17 ` [29/32] Remove global call sets: sched-deps.c Richard Sandiford
2019-09-29 22:20 ` Jeff Law
2019-10-04 14:32 ` Christophe Lyon
2019-10-04 14:35 ` Richard Sandiford
2019-10-04 14:37 ` Christophe Lyon
2019-10-07 13:29 ` Christophe Lyon
2019-09-11 19:17 ` [28/32] Remove global call sets: rtlanal.c Richard Sandiford
2019-09-29 22:21 ` Jeff Law
2019-09-11 19:18 ` [30/32] Remove global call sets: sel-sched.c Richard Sandiford
2019-09-30 15:08 ` Jeff Law
2019-09-11 19:18 ` [31/32] Remove global call sets: shrink-wrap.c Richard Sandiford
2019-09-29 22:21 ` Jeff Law
2019-09-11 19:19 ` [32/32] Hide regs_invalidated_by_call etc Richard Sandiford
2019-09-29 22:22 ` Jeff Law
2019-09-12 20:42 ` [00/32] Support multiple ABIs in the same translation unit Steven Bosscher
2019-09-26 19:24 ` Dimitar Dimitrov
2019-09-27 8:58 ` Richard Sandiford
2019-10-01 2:09 ` build-failure for cris-elf with "[00/32] Support multiple ABIs in the same translation unit" Hans-Peter Nilsson
2019-10-01 7:51 ` Richard Sandiford
2019-10-01 10:58 ` Hans-Peter Nilsson