From mboxrd@z Thu Jan 1 00:00:00 1970
From: "H.J. Lu"
Date: Tue, 22 Feb 2022 06:21:50 -0800
Subject: Re: [PATCH v2] x86: Add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO
To: Hongtao Liu
Cc: Uros Bizjak, liuhongt, GCC Patches
References: <20220217042628.133306-1-hjl.tools@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
List-Id: Gcc-patches mailing list

On Mon, Feb 21, 2022 at 6:43 PM Hongtao Liu wrote:
>
> On Tue, Feb 22, 2022 at 2:35 AM H.J. Lu wrote:
> >
> > On Sun, Feb 20, 2022 at 6:01 PM Hongtao Liu wrote:
> > >
> > > On Thu, Feb 17, 2022 at 9:56 PM H.J. Lu wrote:
> > > >
> > > > On Thu, Feb 17, 2022 at 08:51:31AM +0100, Uros Bizjak wrote:
> > > > > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches
> > > > > > wrote:
> > > > > > >
> > > > > > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bride,
> > > > > > > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX
> > > > > > > transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to
> > > > > > > generate vzeroupper instruction after loading all-zero YMM/YMM registers
> > > > > > > and enable it by default.
> > > > > > Shouldn't TARGET_READ_ZERO_YMM_ZMM_NONEED_VZEROUPPER sounds a bit smoother?
> > > > > > Because originally we needed to add vzeroupper to all avx<->sse cases,
> > > > > > now it's a tune to indicate that we don't need to add it in some
> > > > >
> > > > > Perhaps we should go from the other side and use
> > > > > X86_TUNE_OPTIMIZE_AVX_READ for new processors?
> > > > >
> > > >
> > > > Here is the v2 patch to add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO.
> > > >
> > > The patch LGTM in general, but please rebase against
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590541.html
> > > and resend the patch, also wait a couple days in case Uros(and others)
> > > have any comments.
> >
> > I am dropping my patch since it causes the compile-time regression.
> I think only vextractif128 part is reverted, but we still have
> vmovdqu(below) which should also cause penalty?

commit fe79d652c96b53384ddfa43e312cb0010251391b
Author: Richard Biener
Date:   Thu Feb 17 14:40:16 2022 +0100

    target/104581 - compile-time regression in mode-switching

has

diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c b/gcc/testsuite/gcc.target/i386/pr101456-1.c
index 803fc6e0207..7fb3a3f055c 100644
--- a/gcc/testsuite/gcc.target/i386/pr101456-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
@@ -30,4 +30,5 @@ foo3 (void)
   bar ();
 }
 
-/* { dg-final { scan-assembler-not "vzeroupper" } } */
+/* See PR104581 for the XFAIL reason. */
+/* { dg-final { scan-assembler-not "vzeroupper" { xfail *-*-* } } } */

and I checked in:

commit 1931cbad498e625b1e24452dcfffe02539b12224
Author: H.J. Lu
Date:   Fri Feb 18 10:36:53 2022 -0800

    pieces-memset-21.c: Expect vzeroupper for ia32

    Update gcc.target/i386/pieces-memset-21.c to expect vzeroupper for ia32
    caused by

    commit fe79d652c96b53384ddfa43e312cb0010251391b
    Author: Richard Biener
    Date:   Thu Feb 17 14:40:16 2022 +0100

        target/104581 - compile-time regression in mode-switching

            PR target/104581
            * gcc.target/i386/pieces-memset-21.c: Expect vzeroupper for ia32.

I believe that vmovdqu is also covered.

-- 
H.J.