public inbox for gcc-patches@gcc.gnu.org
From: Uros Bizjak <ubizjak@gmail.com>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [04/32] [x86] Robustify vzeroupper handling across calls
Date: Tue, 08 Oct 2019 18:17:00 -0000
Message-ID: <CAFULd4YWDOCeWw5udoBbvkGPthNZcBEkV4pUDQnZ3iFEP-aA8Q@mail.gmail.com>
In-Reply-To: <CAFULd4ZvEmn5tAYW_Ud--8j-V+908NnEZ8MnPU9BSVREf4GzYA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3486 bytes --]

The following patch uses the correct SSE register class; vzeroupper
operates only on the lower 16 (8 on 32-bit targets) SSE registers.
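
To illustrate why the register class matters, here is a minimal toy
model of the subset check, not the GCC implementation.  It assumes 32
SSE registers numbered 0..31, with the low 16 standing in for SSE_REGS
and all 32 for ALL_SSE_REGS.  The names reg_set, make_range, subset_p
and mode_needed_after_call are hypothetical; only the subset semantics
are meant to mirror hard_reg_set_subset_p.

```cpp
#include <bitset>
#include <cassert>

// Toy model: one bit per SSE register, xmm0..xmm31.
using reg_set = std::bitset<32>;

// Hypothetical helper: the set of registers first..last (inclusive).
reg_set make_range(unsigned first, unsigned last) {
  reg_set s;
  for (unsigned i = first; i <= last; ++i)
    s.set(i);
  return s;
}

// Assumed stand-ins for reg_class_contents[SSE_REGS] and
// reg_class_contents[ALL_SSE_REGS] on a 64-bit target.
const reg_set sse_regs = make_range(0, 15);
const reg_set all_sse_regs = make_range(0, 31);

// True if every register in x is also in y.
bool subset_p(const reg_set &x, const reg_set &y) {
  return (x & ~y).none();
}

enum u128_mode { U128_CLEAN, U128_ANY };

// CLEAN (vzeroupper allowed) only if the callee clobbers every
// register in the class being checked; otherwise ANY.
u128_mode mode_needed_after_call(const reg_set &checked_class,
                                 const reg_set &clobbered) {
  return subset_p(checked_class, clobbered) ? U128_CLEAN : U128_ANY;
}

void demo() {
  // A callee that clobbers xmm0..xmm15 but preserves xmm16..xmm31.
  const reg_set clobbered = make_range(0, 15);
  // Checking all 32 registers never allows CLEAN for this callee...
  assert(mode_needed_after_call(all_sse_regs, clobbered) == U128_ANY);
  // ...while checking only the registers vzeroupper touches does.
  assert(mode_needed_after_call(sse_regs, clobbered) == U128_CLEAN);
}
```

In this model a callee that preserves only the upper 16 registers no
longer forces AVX_U128_ANY, since vzeroupper would not touch those
registers anyway.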

2019-10-08  Uroš Bizjak  <ubizjak@gmail.com>

    PR target/91994
    * config/i386/i386.c (ix86_avx_u128_mode_needed): Use SSE_REGS
    instead of ALL_SSE_REGS to check if function call preserves some
    256-bit SSE registers.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

On Tue, Oct 1, 2019 at 12:14 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
> > > The comment suggests that this code is only needed for Win64 and that
> > > not testing for Win64 is just a simplification.  But in practice it was
> > > needed for correctness on GNU/Linux and other targets too, since without
> > > it the RA would be able to keep 256-bit and 512-bit values in SSE
> > > registers across calls that are known not to clobber them.
> > >
> > > This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> > > that some SSE registers are not touched by a call.  There are then no
> > > regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> > > for GNU/Linux (not something we should do, was just for testing).
>
> If RA can see that some SSE regs are not touched by the call, then we
> are sure that the called function is part of the current TU. In this
> case, the called function will be compiled using VEX instructions,
> where there is no AVX-SSE transition penalty. So, skipping VZEROUPPER
> is beneficial here.
>
> Uros.
>
> > > If in fact we want -fipa-ra to pretend that all functions clobber
> > > SSE registers above 128 bits, it'd certainly be possible to arrange
> > > that.  But IMO that would be an optimisation decision, whereas what
> > > the patch is fixing is a correctness decision.  So I think we should
> > > have this check even so.
> >
> > 2019-09-25  Richard Sandiford  <richard.sandiford@arm.com>
> >
> > gcc/
> >         * config/i386/i386.c: Include function-abi.h.
> >         (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
> >         if they preserve some 256-bit or 512-bit SSE registers.
> >
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c      2019-09-25 16:47:48.000000000 +0100
> > +++ gcc/config/i386/i386.c      2019-09-25 16:47:49.089962608 +0100
> > @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
> >  #include "i386-builtins.h"
> >  #include "i386-expand.h"
> >  #include "i386-features.h"
> > +#include "function-abi.h"
> >
> >  /* This file should be included last.  */
> >  #include "target-def.h"
> > @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
> >             }
> >         }
> >
> > +      /* If the function is known to preserve some SSE registers,
> > +        RA and previous passes can legitimately rely on that for
> > +        modes wider than 256 bits.  It's only safe to issue a
> > +        vzeroupper if all SSE registers are clobbered.  */
> > +      const function_abi &abi = insn_callee_abi (insn);
> > +      if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
> > +                                 abi.mode_clobbers (V4DImode)))
> > +       return AVX_U128_ANY;
> > +
> >        return AVX_U128_CLEAN;
> >      }
> >

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 605 bytes --]

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 276677)
+++ config/i386/i386.c	(working copy)
@@ -13530,7 +13530,7 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
 	 modes wider than 256 bits.  It's only safe to issue a
 	 vzeroupper if all SSE registers are clobbered.  */
       const function_abi &abi = insn_callee_abi (insn);
-      if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
+      if (!hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
 				  abi.mode_clobbers (V4DImode)))
 	return AVX_U128_ANY;
 
