From: Uros Bizjak
Date: Tue, 04 Feb 2020 10:16:00 -0000
Subject: Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]
To: Jakub Jelinek
Cc: Richard Sandiford, gcc-patches@gcc.gnu.org
In-Reply-To: <20200204093921.GN17695@tucnak>
References: <20200204093921.GN17695@tucnak>

On Tue, Feb 4, 2020 at 10:39 AM Jakub Jelinek wrote:
>
> Hi!
>
> As mentioned in the PR, the CLOBBERs in vzeroupper are added there even for
> registers that aren't ever live in the function before and break the
> prologue/epilogue expansion with ms ABI (normal ABIs are fine, as they
> consider all [xyz]mm registers call clobbered, but the ms ABI considers
> xmm0-15 call used but the bits above low 128 ones call clobbered).
> The following patch fixes it by not adding the clobbers during vzeroupper
> pass (before pro_and_epilogue), but adding them for -fipa-ra purposes only
> during the final output.  Perhaps we could add some CLOBBERs early (say for
> df_regs_ever_live_p regs that aren't live in the live_regs bitmap, or
> depending on the ABI either add all of them immediately, or for ms ABI add
> CLOBBERs for xmm0-xmm5 if they don't have a SET) and add the rest later.
> And the addition could be perhaps done at other spots, e.g. in an
> epilogue_completed guarded splitter.

If it works OK, I'd rather see this functionality implemented as an
epilogue_completed guarded splitter.  In the .md files, there are already
cases where we split at this point, and where it is assumed that allocated
registers won't change anymore.

Also, please don't make the functionality conditional on flag_ipa_ra.
This way, we would always get new patterns in the debug dumps, so in case
something goes wrong, one could at least clearly see the full pattern.
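To make the suggestion concrete, the epilogue_completed guarded splitter
idiom would look something like the untested sketch below.  The pattern
names and the ix86_add_vzeroupper_clobbers helper are made up purely for
illustration; this is not a concrete proposal.

;; Sketch only.  The first pattern matches the full form, where every
;; xmm register is already accounted for, and simply emits the insn.
;; The second matches the slim form emitted by the vzeroupper pass and,
;; once epilogue_completed is true (i.e. register allocation is final),
;; splits into the full form, so it no longer re-matches here.
(define_insn "*avx_vzeroupper_full"
  [(match_parallel 0 "vzeroupper_pattern"
     [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
  "TARGET_AVX && XVECLEN (operands[0], 0) == (TARGET_64BIT ? 16 : 8) + 1"
  "vzeroupper")

(define_insn_and_split "*avx_vzeroupper_slim"
  [(match_parallel 0 "vzeroupper_pattern"
     [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
  "TARGET_AVX && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
  "#"
  "&& epilogue_completed"
  [(match_dup 0)]
  ;; Hypothetical helper that rewrites operands[0] in place, adding a
  ;; CLOBBER for every xmm register the parallel does not yet mention.
  "ix86_add_vzeroupper_clobbers (operands[0]);")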
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Or any of the above mentioned variants?
>
> A separate issue is CALL_USED_REGISTERS for msabi on xmm16-xmm31.

I guess that the Comment #9 patch from the PR should be trivially correct,
but although it looks obvious, I don't want to propose the patch since I
have no means of testing it.

Uros.

> And yet another issue is that the presence of a vzeroupper instruction does
> (and the patch doesn't change) prevent callers from trying to preserve
> something in the xmm0-xmm15 registers if the callee doesn't use them.
> Would be nice to be able to say that the vzeroupper will preserve the low
> 128 bits, so it is ok to hold a 128-bit value across the call, but not
> 256-bit or larger.  Not sure if the RA has a way to express that (and
> IPA-RA certainly ATM doesn't).
>
> 2020-02-04  Jakub Jelinek
>
> 	PR target/92190
> 	* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> 	include sets and not clobbers in the vzeroupper pattern.
> 	* config/i386/sse.md (*avx_vzeroupper): Add the clobbers during
> 	pattern output.
>
> 	* gcc.target/i386/pr92190.c: New test.
>
> --- gcc/config/i386/i386-features.c.jj	2020-02-03 13:17:15.919723150 +0100
> +++ gcc/config/i386/i386-features.c	2020-02-03 18:30:08.274865983 +0100
> @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>  
>        (set (reg:V2DF R) (reg:V2DF R))
>  
> -   which preserves the low 128 bits but clobbers the upper bits.
> -   For a dead register we just use:
> -
> -     (clobber (reg:V2DF R))
> -
> -   which invalidates any previous contents of R and stops R from becoming
> -   live across the vzeroupper in future.  */
> +   which preserves the low 128 bits but clobbers the upper bits.  */
>  
>  static void
>  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
>  {
>    rtx pattern = PATTERN (insn);
>    unsigned int nregs = TARGET_64BIT ? 16 : 8;
> -  rtvec vec = rtvec_alloc (nregs + 1);
> -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  unsigned int npats = nregs;
>    for (unsigned int i = 0; i < nregs; ++i)
>      {
>        unsigned int regno = GET_SSE_REGNO (i);
> +      if (!bitmap_bit_p (live_regs, regno))
> +	npats--;
> +    }
> +  if (npats == 0)
> +    return;
> +  rtvec vec = rtvec_alloc (npats + 1);
> +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
> +    {
> +      unsigned int regno = GET_SSE_REGNO (i);
> +      if (!bitmap_bit_p (live_regs, regno))
> +	continue;
>        rtx reg = gen_rtx_REG (V2DImode, regno);
> -      if (bitmap_bit_p (live_regs, regno))
> -	RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> -      else
> -	RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +      ++j;
> +      RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
>      }
>    XVEC (pattern, 0) = vec;
>    df_insn_rescan (insn);
> --- gcc/config/i386/sse.md.jj	2020-02-03 13:17:15.957722589 +0100
> +++ gcc/config/i386/sse.md	2020-02-03 18:30:08.280865894 +0100
> @@ -19819,7 +19819,34 @@ (define_insn "*avx_vzeroupper"
>    [(match_parallel 0 "vzeroupper_pattern"
>      [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>    "TARGET_AVX"
> -  "vzeroupper"
> +{
> +  if (flag_ipa_ra && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
> +    {
> +      /* For IPA-RA purposes, make it clear the instruction clobbers
> +	 even XMM registers not mentioned explicitly in the pattern.  */
> +      unsigned int nregs = TARGET_64BIT ? 16 : 8;
> +      unsigned int npats = XVECLEN (operands[0], 0);
> +      rtvec vec = rtvec_alloc (nregs + 1);
> +      RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
> +      for (unsigned int i = 0, j = 1; i < nregs; ++i)
> +	{
> +	  unsigned int regno = GET_SSE_REGNO (i);
> +	  if (j < npats
> +	      && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
> +	    {
> +	      RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
> +	      j++;
> +	    }
> +	  else
> +	    {
> +	      rtx reg = gen_rtx_REG (V2DImode, regno);
> +	      RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +	    }
> +	}
> +      XVEC (operands[0], 0) = vec;
> +    }
> +  return "vzeroupper";
> +}
>   [(set_attr "type" "sse")
>    (set_attr "modrm" "0")
>    (set_attr "memory" "none")
> --- gcc/testsuite/gcc.target/i386/pr92190.c.jj	2020-02-03 18:29:30.924415644 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92190.c	2020-02-03 18:29:15.992635389 +0100
> @@ -0,0 +1,19 @@
> +/* PR target/92190 */
> +/* { dg-do compile { target { *-*-linux* && lp64 } } } */
> +/* { dg-options "-mabi=ms -O2 -mavx512f" } */
> +
> +typedef char VC __attribute__((vector_size (16)));
> +typedef int VI __attribute__((vector_size (16 * sizeof 0)));
> +VC a;
> +VI b;
> +void bar (VI);
> +void baz (VC);
> +
> +void
> +foo (void)
> +{
> +  VC k = a;
> +  VI n = b;
> +  bar (n);
> +  baz (k);
> +}
>
> 	Jakub
>