From: "H.J. Lu"
Date: Mon, 6 Jun 2022 19:45:40 -0700
Subject: Re: [PATCH v4 2/8] x86: Add COND_VZEROUPPER that can replace vzeroupper if no `ret`
To: Noah Goldstein
Cc: GNU C Library, "Carlos O'Donell"
In-Reply-To: <20220606223726.2082226-2-goldstein.w.n@gmail.com>

On Mon, Jun 6, 2022 at 3:37 PM Noah Goldstein wrote:
>
> The RTM vzeroupper mitigation has no way of replacing an inline
> vzeroupper that is not directly before a return.
>
> Such a replacement is useful when hoisting a vzeroupper to save
> code size. For example:
>
> ```
> L(foo):
>     cmpl %eax, %edx
>     jz L(bar)
>     tzcntl %eax, %eax
>     addq %rdi, %rax
>     VZEROUPPER_RETURN
>
> L(bar):
>     xorl %eax, %eax
>     VZEROUPPER_RETURN
> ```
>
> Can become:
>
> ```
> L(foo):
>     COND_VZEROUPPER
>     cmpl %eax, %edx
>     jz L(bar)
>     tzcntl %eax, %eax
>     addq %rdi, %rax
>     ret
>
> L(bar):
>     xorl %eax, %eax
>     ret
> ```
>
> This code does not change any existing functionality.
>
> There is no difference in the objdump of libc.so before and after
> this patch.
> ---
>  sysdeps/x86_64/multiarch/avx-rtm-vecs.h |  1 +
>  sysdeps/x86_64/sysdep.h                 | 18 ++++++++++++++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/sysdeps/x86_64/multiarch/avx-rtm-vecs.h b/sysdeps/x86_64/multiarch/avx-rtm-vecs.h
> index 3f531dd47f..6ca9f5e6ba 100644
> --- a/sysdeps/x86_64/multiarch/avx-rtm-vecs.h
> +++ b/sysdeps/x86_64/multiarch/avx-rtm-vecs.h
> @@ -20,6 +20,7 @@
>  #ifndef _AVX_RTM_VECS_H
>  #define _AVX_RTM_VECS_H 1
>
> +#define COND_VZEROUPPER COND_VZEROUPPER_XTEST
>  #define ZERO_UPPER_VEC_REGISTERS_RETURN \
>      ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
>
> diff --git a/sysdeps/x86_64/sysdep.h b/sysdeps/x86_64/sysdep.h
> index f14d50786d..4f512d5566 100644
> --- a/sysdeps/x86_64/sysdep.h
> +++ b/sysdeps/x86_64/sysdep.h
> @@ -106,6 +106,24 @@ lose: \
>      vzeroupper; \
>      ret
>
> +/* Can be used to replace vzeroupper that is not directly before a
> +   return. This is useful when hoisting a vzeroupper from multiple
> +   return paths to decrease the total number of vzerouppers and code
> +   size. */
> +#define COND_VZEROUPPER_XTEST \
> +    xtest; \
> +    jz 1f; \
> +    vzeroall; \
> +    jmp 2f; \
> +1: \
> +    vzeroupper; \
> +2:
> +
> +/* In RTM define this as COND_VZEROUPPER_XTEST. */
> +#ifndef COND_VZEROUPPER
> +# define COND_VZEROUPPER vzeroupper
> +#endif
> +
>  /* Zero upper vector registers and return. */
>  #ifndef ZERO_UPPER_VEC_REGISTERS_RETURN
>  # define ZERO_UPPER_VEC_REGISTERS_RETURN \
> --
> 2.34.1
>

LGTM.

Reviewed-by: H.J. Lu

Thanks.

-- 
H.J.