From: Richard Biener
Date: Tue, 11 Jul 2023 08:42:29 +0200
Subject: Re: [x86-64] RFC: Add nosse abi attribute
To: Alexander Monakov
Cc: Michael Matz, gcc-patches@gcc.gnu.org, Jan Hubicka

On Mon, Jul 10, 2023 at 9:08 PM Alexander Monakov via Gcc-patches wrote:
>
>
> On Mon, 10 Jul 2023, Michael Matz via Gcc-patches wrote:
>
> > Hello,
> >
> > the ELF psABI for x86-64 doesn't have any callee-saved SSE
> > registers (there were actual reasons for that, but those don't
> > matter anymore). This starts to hurt some uses, as it means that
> > as soon as you have a call (say to memmove/memcpy, even if
> > implicit as libcall) in a loop that manipulates floating point
> > or vector data you get saves/restores around those calls.
> >
> > But in reality many functions can be written such that they only need
> > to clobber a subset of the 16 XMM registers (or do the save/restore
> > themselves in the codepaths that need them, hello memcpy again).
> > So we want to introduce a way to specify this, via an ABI attribute
> > that basically says "doesn't clobber the high XMM regs".
>
> I think the main question is why you're going with this (weak) form
> instead of the (strong) form "may only clobber the low XMM regs":
> as Richi noted, surely for libcalls we'd like to know they preserve
> AVX-512 mask registers as well?
>
> (I realize this is partially answered later)
>
> Note this interacts with anything that interposes between the caller
> and the callee, like the Glibc lazy binding stub (which used to
> zero out high halves of 512-bit arguments in ZMM registers).
> Not an immediate problem for the patch, just something to mind perhaps.
>
> > I've opted to do only the obvious: do something special only for
> > xmm8 to xmm15, without a way to specify the clobber set in more detail.
> > I think such half/half split is reasonable, and as I don't want to
> > change the argument passing anyway (whose regs are always clobbered)
> > there isn't that much wiggle room anyway.
> >
> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
>
> But you can't trivially restore if the callee is sibcalling - what
> happens then (a testcase might be nice)?
>
> > Carefully note that this is only possible for
> > the SSE2 registers, as other parts of them would need instructions
> > that are only optional.
>
> What is supposed to happen on 32-bit x86 with -msse -mno-sse2?
>
> > When a function doesn't contain calls to
> > unknown functions we can be a bit more lenient: we can make it so that
> > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > necessary.
>
> What if the source code has a local register variable bound to xmm15,
> i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?
> Probably "don't do that", i.e. disallow that in the documentation?
>
> > If a function contains calls then GCC can't know which
> > parts of the XMM regset are clobbered by that, it may be parts
> > which don't even exist yet (say until avx2048 comes out), so we must
> > restrict ourselves to only save/restore the SSE2 parts and then of course
> > can only claim to not clobber those parts.
>
> Hm, I guess this is kinda the reason a "weak" form is needed. But this
> highlights the difference between the two: the "weak" form will actively
> preserve some state (so it cannot preserve future extensions), while
> the "strong" form may just passively not touch any state, preserving
> any state it doesn't know about.
>
> > To that end I introduce actually two related attributes (for naming
> > see below):
> > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
>
> This is the weak/active form; I'd suggest "preserve_high_sse".
>
> > * noanysseclobber: claims (and ensures) that nothing of any of the
> >   registers overlapping xmm8-15 is clobbered (not even future, as of
> >   yet unknown, parts)
>
> This is the strong/passive form; I'd suggest "only_low_sse".
>
> > Ensuring the first is simple: potentially add saves/restore in xlogue
> > (e.g. when xmm8 is either used explicitly or implicitly by a call).
> > Ensuring the second comes with more: we must also ensure that no
> > functions are called that don't guarantee the same thing (in addition
> > to just removing all xmm8-15 parts altogether from the available
> > registers).
> >
> > See also the added testcases for what I intended to support.
> >
> > I chose to use the new target independent function-abi facility for
> > this. I need some adjustments in generic code:
> > * the "default_abi" is actually more like a "current" abi: it happily
> >   changes its contents according to conditional_register_usage,
> >   and other code assumes that such changes do propagate.
> >   But if that conditional_reg_usage is actually done because the current
> >   function is of a different ABI, then we must not change default_abi.
> > * in insn_callee_abi we do look at a potential fndecl for a call
> >   insn (only set when -fipa-ra), but that doesn't work for calls through
> >   pointers and (as said) is optional. So, also always look at the
> >   called function's type (it's always recorded in the MEM_EXPR for
> >   non-libcalls), before asking the target.
> >   (The function-abi accessors working on trees were already doing that,
> >   it's just the RTL accessor that missed this)
> >
> > Accordingly I also implement some more target hooks for function-abi.
> > With that it's possible to also move the other ABI-influencing code
> > of i386 to function-abi (ms_abi and friends). I have not done so for
> > this patch.
> >
> > Regarding the names of the attributes: gah! I've left them at
> > my mediocre attempts of names in order to hopefully get input on better
> > names :-)
> >
> > I would welcome any comments, about the names, the approach, the attempt
> > at documenting the intricacies of these attributes and anything.
>
> I hope the new attributes are supposed to be usable with function pointers?
> From the code it looks that way, but the documentation doesn't promise that.
>
> > FWIW, this particular patch was regstrapped on x86-64-linux
> > with trunk from a week ago (and sniff-tested on current trunk).
>
> This looks really cool.

The biggest benefit might be from IPA with LTO where we'd carefully place those
attributes at WPA time (at that time tying our hands for later).

For manual use it would be nice to diagnose calls to non-{nosse,noanysse}clobber
functions in such annotated functions - because when we have to conservatively
handle unknown calls that's hardly going to be better than saving exactly the
set of SSE regs that need to be preserved in the ultimate caller we want to
optimize.

I wonder whether the linker could come to the rescue here if we introduce
special aliases with nosse/noanysse clobber ABI that would generate stubs when
entry points with such ABI guarantee are not available (those stubs could also
specify the sub-ISA used and thus "solve" the "future" thing as long as the
dynamic loader(?) can handle it).

I'll note that with AVX512 one of the advantages is that vzero{upper,all}
does not modify xmm16-xmm31, so an alternate ABI where all mask registers
and xmm16-xmm31 are callee saved would not be impacted by AVX2-using code.

What's the plan for using those attributes? As Alex says glibc is using
vzero{upper,all} in the AVX+ specific routines. Any future changes there
and placing attributes would be an ABI break requiring new entry points?

Richard.

> Thanks.
> Alexander
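
For illustration only, the weak/active versus strong/passive distinction
discussed in this thread can be sketched as a toy model in plain C. This is
not GCC code and does not use the proposed attributes: the register file is
simulated in memory, and every name here (vreg, regfile, unknown_callee,
weak_form, strong_form, the two checkers) is hypothetical. The "hi" part of
each register stands in for whatever overlaps xmm8-15 beyond the SSE2 bits
(AVX, AVX-512, or parts that do not exist yet).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NREGS = 16, FIRST_HIGH = 8 };

typedef struct {
    uint64_t lo[2];  /* the 128-bit xmm (SSE2) part */
    uint64_t hi[6];  /* wider bits overlapping the same register */
} vreg;

typedef struct { vreg r[NREGS]; } regfile;

/* A callee with the default psABI: may clobber every vector register. */
void unknown_callee(regfile *rf)
{
    memset(rf, 0xAA, sizeof *rf);
}

/* Weak/active form: save the SSE2 parts of regs 8-15 on entry and
   restore them on exit, so calling unknown code is allowed. */
void weak_form(regfile *rf)
{
    uint64_t saved[NREGS - FIRST_HIGH][2];
    for (int i = FIRST_HIGH; i < NREGS; i++)
        memcpy(saved[i - FIRST_HIGH], rf->r[i].lo, sizeof saved[0]);

    unknown_callee(rf);

    for (int i = FIRST_HIGH; i < NREGS; i++)
        memcpy(rf->r[i].lo, saved[i - FIRST_HIGH], sizeof saved[0]);
}

/* Strong/passive form: only regs 0-7 are used as scratch and no
   unknown function is called, so regs 8-15 are never touched. */
void strong_form(regfile *rf)
{
    for (int i = 0; i < FIRST_HIGH; i++)
        memset(&rf->r[i], 0x55, sizeof rf->r[i]);
}

/* True if the xmm parts of regs 8-15 are identical in a and b. */
int high_xmm_preserved(const regfile *a, const regfile *b)
{
    for (int i = FIRST_HIGH; i < NREGS; i++)
        if (memcmp(a->r[i].lo, b->r[i].lo, sizeof a->r[i].lo) != 0)
            return 0;
    return 1;
}

/* True if regs 8-15 are identical in a and b at full width. */
int high_full_preserved(const regfile *a, const regfile *b)
{
    for (int i = FIRST_HIGH; i < NREGS; i++)
        if (memcmp(&a->r[i], &b->r[i], sizeof a->r[i]) != 0)
            return 0;
    return 1;
}

/* Runs both forms on the same starting state and checks the claims. */
void check_model(void)
{
    regfile before, after;
    memset(&before, 0x11, sizeof before);

    after = before;
    weak_form(&after);
    assert(high_xmm_preserved(&before, &after));   /* SSE2 parts kept */
    assert(!high_full_preserved(&before, &after)); /* wider bits lost */

    after = before;
    strong_form(&after);
    assert(high_full_preserved(&before, &after));  /* nothing touched */
}
```

Calling check_model() from a main() exercises both cases. In this model the
weak form can only vouch for the bits it knows how to save, while the strong
form preserves everything by never writing to the high registers, which is
also why it cannot call a function that lacks the same guarantee.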