From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua1-x929.google.com (mail-ua1-x929.google.com [IPv6:2607:f8b0:4864:20::929]) by sourceware.org (Postfix) with ESMTPS id 13741385703F for ; Tue, 18 May 2021 13:07:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 13741385703F Received: by mail-ua1-x929.google.com with SMTP id 20so3199253uaf.12 for ; Tue, 18 May 2021 06:07:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:content-transfer-encoding; bh=udvr1ZQURsJ94pRT/qVsIRuPi6rFCj96gOMm/SqSoP0=; b=JQUP4amrjhK+nt2kHUy5N4u+wk1e6dQYJpzn7L1L1x/b64OCW3Al2JnhUak1aOQfG8 r8zDZVgS17YYiGf/ZAhjGarogLWF/XGFQftUh6518t1UJRSri0YJGXcne+yC2lTRJlmN RoxvRyxqXF+8vxImzpSdqQOCwPnt8p7b+1yhuFkslUAPOwVZxxMWBN8COtJx6Ya/gSQj KQSQ1coJgIiQUNIKYcO96LijUjMWaDgaQ8rHJTUH6y3C4Uv+i+TCww12iSiHcC/N1HC/ qmNNhcabew4+sBcP53gc7aRt9r/oQU8Ov1puS16zIvgSBcAHntQdfd8+qMR1C4dZOkhM r1mQ== X-Gm-Message-State: AOAM531+s6LPsEBCv97dKNLjTgnyeDkt60hhGdpARDBuS3OxJHMHlgpG wABulB+VJhRY5DItNSer2lIcLRUPPbmLJPRuobv3Gv7lPFVw8A== X-Google-Smtp-Source: ABdhPJycFaFtc0sQQbSRs6tp5z8oG1RbX8jP/xFVr5henk7rL/nJnoV2/UwmzeFWNG3Yho8DEF5m7QQBKqyZpcDMY+I= X-Received: by 2002:ab0:20d0:: with SMTP id z16mr6023332ual.33.1621343268170; Tue, 18 May 2021 06:07:48 -0700 (PDT) MIME-Version: 1.0 References: <20210513095433.GH1179226@tucnak> <20210513113704.GI1179226@tucnak> In-Reply-To: From: Hongtao Liu Date: Tue, 18 May 2021 21:12:03 +0800 Message-ID: Subject: Re: [PATCH] [i386] Fix _mm256_zeroupper to notify LRA that vzeroupper will kill sse registers. [PR target/82735] To: Hongtao Liu via Gcc-patches , Jakub Jelinek , Uros Bizjak , Hongtao Liu , "H. J. Lu" , Richard Sandiford Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_MANYTO, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 May 2021 13:07:53 -0000 On Mon, May 17, 2021 at 5:56 PM Richard Sandiford wrote: > > Hongtao Liu via Gcc-patches writes: > > On Fri, May 14, 2021 at 10:27 AM Hongtao Liu wrote= : > >> > >> On Thu, May 13, 2021 at 7:52 PM Richard Sandiford > >> wrote: > >> > > >> > Jakub Jelinek writes: > >> > > On Thu, May 13, 2021 at 12:32:26PM +0100, Richard Sandiford wrote: > >> > >> Jakub Jelinek writes: > >> > >> > On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote: > >> > >> >> > > Bootstrapped and regtested on X86_64-linux-gnu{-m32,} > >> > >> >> > > Ok for trunk? > >> > >> >> > > >> > >> >> > Some time ago a support for CLOBBER_HIGH RTX was added (and = later > >> > >> >> > removed for some reason). Perhaps we could resurrect the pat= ch for the > >> > >> >> > purpose of ferrying 128bit modes via vzeroupper RTX? > >> > >> >> > >> > >> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.htm= l > >> > >> > > >> > >> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html > >> > >> > is where it got removed, CCing Richard. > >> > >> > >> > >> Yeah. Initially clobber_high seemed like the best appraoch for > >> > >> handling the tlsdesc thing, but in practice it was too difficult > >> > >> to shoe-horn the concept in after the fact, when so much rtl > >> > >> infrastructure wasn't prepared to deal with it. The old support > >> > >> didn't handle all cases and passes correctly, and handled others > >> > >> suboptimally. > >> > >> > >> > >> I think it would be worth using the same approach as > >> > >> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01466.html f= or > >> > >> vzeroupper: represent the instructions as call_insns in which the > >> > >> call has a special vzeroupper ABI. I think that's likely to lead > >> > >> to better code than clobber_high would (or at least, it did for t= lsdesc). > >> > >> From an implementation perspective=EF=BC=8C I guess you're meaning we = should > >> implement TARGET_INSN_CALLEE_ABI and TARGET_FNTYPE_ABI in the i386 > >> backend. > >> > > When I implemented the vzeroupper pattern as call_insn and defined > > TARGET_INSN_CALLEE_ABI for it, I got several failures. they're related > > to 2 parts > > > > 1. requires_stack_frame_p return true for vzeroupper which should be fa= lse. > > 2. in subst_stack_regs, vzeroupper shouldn't kill arguments > > > > I've tried a rough patch like below, it works for those failures, > > unfortunately, I don't have an arm machine to test, so I want to ask > > would the below change break something in the arm backend? > > ABI id 0 just means the default ABI. Real calls can use other ABIs > besides the default. That said=E2=80=A6 > > > modified gcc/reg-stack.c > > @@ -174,6 +174,7 @@ > > #include "reload.h" > > #include "tree-pass.h" > > #include "rtl-iter.h" > > +#include "function-abi.h" > > > > #ifdef STACK_REGS > > > > @@ -2385,7 +2386,7 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regst= ack) > > bool control_flow_insn_deleted =3D false; > > int i; > > > > - if (CALL_P (insn)) > > + if (CALL_P (insn) && insn_callee_abi (insn).id () =3D=3D 0) > > { > > int top =3D regstack->top; > > =E2=80=A6reg-stack.c is effectively x86-specific code, so checking id 0 h= ere > wouldn't affect anything else. It doesn't feel very future-proof > though, since x86 could use ABIs other than 0 for real calls in future. > > AIUI the property that matters here isn't the ABI, but that the target > of the call doesn't reference stack registers. That can be true for > real calls too, with -fipa-ra. > > > modified gcc/shrink-wrap.c > > @@ -58,7 +58,12 @@ requires_stack_frame_p (rtx_insn *insn, > > HARD_REG_SET prologue_used, > > unsigned regno; > > > > if (CALL_P (insn)) > > - return !SIBLING_CALL_P (insn); > > + { > > + if (insn_callee_abi (insn).id() !=3D 0) > > + return false; > > + else > > + return !SIBLING_CALL_P (insn); > > + } > > TBH I'm not sure why off-hand this function needs to treat non-sibling > calls specially, rather than rely on normal DF information. Calls have > a use of the stack pointer, so we should return true for that reason: > > /* The stack ptr is used (honorarily) by a CALL insn. */ > df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], > NULL, bb, insn_info, DF_REF_REG_USE, > DF_REF_CALL_STACK_USAGE | flags); > > I guess this is something we should suppress for fake calls though. > > It looks like the rtx =E2=80=9Cused=E2=80=9D flag is unused for INSNs, so= we could > use that as a CALL_INSN flag that indicates a fake call. We could just > need to make: > > /* For all other RTXes clear the used flag on the copy. */ > RTX_FLAG (copy, used) =3D 0; > > conditional on !INSN_P. > I got another error in @@ -83,6 +83,9 @@ control_flow_insn_p (const rtx_insn *insn) return true; case CALL_INSN: + /* CALL_INSN use "used" flag to indicate it's a fake call. */ + if (RTX_FLAG (insn, used)) + break; and performance issue in modified gcc/final.c @@ -4498,7 +4498,8 @@ leaf_function_p (void) for (insn =3D get_insns (); insn; insn =3D NEXT_INSN (insn)) { if (CALL_P (insn) - && ! SIBLING_CALL_P (insn)) + && ! SIBLING_CALL_P (insn) + && !RTX_FLAG (insn, used)) return 0; if (NONJUMP_INSN_P (insn) Also i grep CALL_P or CALL_INSN in GCC source codes, there are many places which hold the assumption CALL_P/CALL_INSN is a real call. Considering that vzeroupper is used a lot on the i386 backend, I'm a bit worried that this implementation solution will be a bottomless pit. > Thanks, > Richard --=20 BR, Hongtao