From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 68C9C383801C for ; Mon, 17 May 2021 09:56:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 68C9C383801C Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 043D8113E; Mon, 17 May 2021 02:56:08 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2F6E93F719; Mon, 17 May 2021 02:56:07 -0700 (PDT) From: Richard Sandiford To: Hongtao Liu via Gcc-patches Mail-Followup-To: Hongtao Liu via Gcc-patches , Jakub Jelinek , Uros Bizjak , Hongtao Liu , "H. J. Lu" , richard.sandiford@arm.com Cc: Jakub Jelinek , Uros Bizjak , Hongtao Liu , "H. J. Lu" Subject: Re: [PATCH] [i386] Fix _mm256_zeroupper to notify LRA that vzeroupper will kill sse registers. [PR target/82735] References: <20210513095433.GH1179226@tucnak> <20210513113704.GI1179226@tucnak> Date: Mon, 17 May 2021 10:56:06 +0100 In-Reply-To: (Hongtao Liu via Gcc-patches's message of "Mon, 17 May 2021 16:44:30 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.6 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 May 2021 09:56:10 -0000 Hongtao Liu via Gcc-patches writes: > On Fri, May 14, 2021 at 10:27 AM Hongtao Liu wrote: >> >> On Thu, May 13, 2021 at 7:52 PM Richard Sandiford >> wrote: >> > >> > Jakub Jelinek writes: >> > > On Thu, May 13, 2021 at 12:32:26PM +0100, Richard Sandiford wrote: >> > >> Jakub Jelinek writes: >> > >> > On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote: >> > >> >> > > Bootstrapped and regtested on X86_64-linux-gnu{-m32,} >> > >> >> > > Ok for trunk? >> > >> >> > >> > >> >> > Some time ago a support for CLOBBER_HIGH RTX was added (and la= ter >> > >> >> > removed for some reason). Perhaps we could resurrect the patch= for the >> > >> >> > purpose of ferrying 128bit modes via vzeroupper RTX? >> > >> >> >> > >> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.html >> > >> > >> > >> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html >> > >> > is where it got removed, CCing Richard. >> > >> >> > >> Yeah. Initially clobber_high seemed like the best appraoch for >> > >> handling the tlsdesc thing, but in practice it was too difficult >> > >> to shoe-horn the concept in after the fact, when so much rtl >> > >> infrastructure wasn't prepared to deal with it. The old support >> > >> didn't handle all cases and passes correctly, and handled others >> > >> suboptimally. >> > >> >> > >> I think it would be worth using the same approach as >> > >> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01466.html for >> > >> vzeroupper: represent the instructions as call_insns in which the >> > >> call has a special vzeroupper ABI. I think that's likely to lead >> > >> to better code than clobber_high would (or at least, it did for tls= desc). >> >> From an implementation perspective=EF=BC=8C I guess you're meaning we sh= ould >> implement TARGET_INSN_CALLEE_ABI and TARGET_FNTYPE_ABI in the i386 >> backend. >> > When I implemented the vzeroupper pattern as call_insn and defined > TARGET_INSN_CALLEE_ABI for it, I got several failures. they're related > to 2 parts > > 1. requires_stack_frame_p return true for vzeroupper which should be fals= e. > 2. in subst_stack_regs, vzeroupper shouldn't kill arguments > > I've tried a rough patch like below, it works for those failures, > unfortunately, I don't have an arm machine to test, so I want to ask > would the below change break something in the arm backend? ABI id 0 just means the default ABI. Real calls can use other ABIs besides the default. That said=E2=80=A6 > modified gcc/reg-stack.c > @@ -174,6 +174,7 @@ > #include "reload.h" > #include "tree-pass.h" > #include "rtl-iter.h" > +#include "function-abi.h" > > #ifdef STACK_REGS > > @@ -2385,7 +2386,7 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstac= k) > bool control_flow_insn_deleted =3D false; > int i; > > - if (CALL_P (insn)) > + if (CALL_P (insn) && insn_callee_abi (insn).id () =3D=3D 0) > { > int top =3D regstack->top; =E2=80=A6reg-stack.c is effectively x86-specific code, so checking id 0 here wouldn't affect anything else. It doesn't feel very future-proof though, since x86 could use ABIs other than 0 for real calls in future. AIUI the property that matters here isn't the ABI, but that the target of the call doesn't reference stack registers. That can be true for real calls too, with -fipa-ra. > modified gcc/shrink-wrap.c > @@ -58,7 +58,12 @@ requires_stack_frame_p (rtx_insn *insn, > HARD_REG_SET prologue_used, > unsigned regno; > > if (CALL_P (insn)) > - return !SIBLING_CALL_P (insn); > + { > + if (insn_callee_abi (insn).id() !=3D 0) > + return false; > + else > + return !SIBLING_CALL_P (insn); > + } TBH I'm not sure why off-hand this function needs to treat non-sibling calls specially, rather than rely on normal DF information. Calls have a use of the stack pointer, so we should return true for that reason: /* The stack ptr is used (honorarily) by a CALL insn. */ df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], NULL, bb, insn_info, DF_REF_REG_USE, DF_REF_CALL_STACK_USAGE | flags); I guess this is something we should suppress for fake calls though. It looks like the rtx =E2=80=9Cused=E2=80=9D flag is unused for INSNs, so w= e could use that as a CALL_INSN flag that indicates a fake call. We could just need to make: /* For all other RTXes clear the used flag on the copy. */ RTX_FLAG (copy, used) =3D 0; conditional on !INSN_P. Thanks, Richard