From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua1-x92d.google.com (mail-ua1-x92d.google.com [IPv6:2607:f8b0:4864:20::92d]) by sourceware.org (Postfix) with ESMTPS id D99B63860C3F for ; Mon, 17 May 2021 08:40:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org D99B63860C3F Received: by mail-ua1-x92d.google.com with SMTP id 14so1842538uac.9 for ; Mon, 17 May 2021 01:40:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:content-transfer-encoding; bh=YJEwxpJ/bOT1E4lUojh7KGwFFGlhnp+jIrk1twXtisw=; b=UZNrFzUR55WBp8+FYu9tsVPxLBVX7zv4Zxr88A5tKG/Ja9Cfw1Yh03rkd+5JFhetjD EHEW6eDJsjpAQ04+91JL9JUqUB6MznWvs2If4VNIC23CJz+ifDZ6kJqapKyUj2yGuSZI dxx2ArFAbyuoq6aNPFSgMTyewMq+bUbJe0QtQjP48kw9144UKeodagfe8uA6GlsyaklO sn1N+sHPkm+VMOYLLtKRF16FKNJPH/FnJ9h9d5LIPPUQbjFw17/rEH7HSO+AOgq/tbxQ kFxBJ5QTHqQXSsneHAoy+NCSny/3uSXG1w8/VRI4ejodsLCL9D0vxkm+5t1kZ6gpyecq Znqw== X-Gm-Message-State: AOAM5337C3ap5IH+zpf/jWrBHuTE+9RcEw8f0DwEXsGDssanyt6Lbr5+ LrtLNr64b8Qsz0Bm0UKA1UAQ/lICBwQJRVCGR20= X-Google-Smtp-Source: ABdhPJxIGcLz1PEUupFZkuRraOS1xFU8ayaE2b4dJ8+jpsFKbooQrrePG9UIOChW90kUnywPbX2igH8ig2AsiFUMgcU= X-Received: by 2002:ab0:5481:: with SMTP id p1mr51768440uaa.77.1621240811375; Mon, 17 May 2021 01:40:11 -0700 (PDT) MIME-Version: 1.0 References: <20210513095433.GH1179226@tucnak> <20210513113704.GI1179226@tucnak> In-Reply-To: From: Hongtao Liu Date: Mon, 17 May 2021 16:44:30 +0800 Message-ID: Subject: Re: [PATCH] [i386] Fix _mm256_zeroupper to notify LRA that vzeroupper will kill sse registers. [PR target/82735] To: Jakub Jelinek , Uros Bizjak , Hongtao Liu , GCC Patches , "H. J. Lu" , Richard Sandiford Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_MANYTO, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 May 2021 08:40:15 -0000 On Fri, May 14, 2021 at 10:27 AM Hongtao Liu wrote: > > On Thu, May 13, 2021 at 7:52 PM Richard Sandiford > wrote: > > > > Jakub Jelinek writes: > > > On Thu, May 13, 2021 at 12:32:26PM +0100, Richard Sandiford wrote: > > >> Jakub Jelinek writes: > > >> > On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote: > > >> >> > > Bootstrapped and regtested on X86_64-linux-gnu{-m32,} > > >> >> > > Ok for trunk? > > >> >> > > > >> >> > Some time ago a support for CLOBBER_HIGH RTX was added (and lat= er > > >> >> > removed for some reason). Perhaps we could resurrect the patch = for the > > >> >> > purpose of ferrying 128bit modes via vzeroupper RTX? > > >> >> > > >> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.html > > >> > > > >> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html > > >> > is where it got removed, CCing Richard. > > >> > > >> Yeah. Initially clobber_high seemed like the best appraoch for > > >> handling the tlsdesc thing, but in practice it was too difficult > > >> to shoe-horn the concept in after the fact, when so much rtl > > >> infrastructure wasn't prepared to deal with it. The old support > > >> didn't handle all cases and passes correctly, and handled others > > >> suboptimally. > > >> > > >> I think it would be worth using the same approach as > > >> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01466.html for > > >> vzeroupper: represent the instructions as call_insns in which the > > >> call has a special vzeroupper ABI. I think that's likely to lead > > >> to better code than clobber_high would (or at least, it did for tlsd= esc). > > From an implementation perspective=EF=BC=8C I guess you're meaning we sho= uld > implement TARGET_INSN_CALLEE_ABI and TARGET_FNTYPE_ABI in the i386 > backend. > When I implemented the vzeroupper pattern as call_insn and defined TARGET_INSN_CALLEE_ABI for it, I got several failures. they're related to 2 parts 1. requires_stack_frame_p return true for vzeroupper which should be false. 2. in subst_stack_regs, vzeroupper shouldn't kill arguments I've tried a rough patch like below, it works for those failures, unfortunately, I don't have an arm machine to test, so I want to ask would the below change break something in the arm backend? modified gcc/reg-stack.c @@ -174,6 +174,7 @@ #include "reload.h" #include "tree-pass.h" #include "rtl-iter.h" +#include "function-abi.h" #ifdef STACK_REGS @@ -2385,7 +2386,7 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstack) bool control_flow_insn_deleted =3D false; int i; - if (CALL_P (insn)) + if (CALL_P (insn) && insn_callee_abi (insn).id () =3D=3D 0) { int top =3D regstack->top; modified gcc/shrink-wrap.c @@ -58,7 +58,12 @@ requires_stack_frame_p (rtx_insn *insn, HARD_REG_SET prologue_used, unsigned regno; if (CALL_P (insn)) - return !SIBLING_CALL_P (insn); + { + if (insn_callee_abi (insn).id() !=3D 0) + return false; + else + return !SIBLING_CALL_P (insn); + } /* We need a frame to get the unique CFA expected by the unwinder. */ if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn)) > > > > > > Perhaps a magic call_insn that is split post-reload into a normal ins= n > > > with the sets then? > > > > I'd be tempted to treat it is a call_insn throughout. The unspec_volat= ile > > means that we can't move the instruction, so converting a call_insn to = an > > insn isn't likely to help from that point of view. The sets are also > > likely to be handled suboptimally compared to the more accurate registe= r > > information attached to the call: all code that handles calls has to be > > prepared to deal with partial clobbers, whereas most code dealing with > > sets will assume that the set does useful work, and that the rhs of the > > set is live. > > > > Thanks, > > Richard > > > > > -- > BR, > Hongtao --=20 BR, Hongtao