From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg1-x52f.google.com (mail-pg1-x52f.google.com [IPv6:2607:f8b0:4864:20::52f]) by sourceware.org (Postfix) with ESMTPS id 1A5593858C39 for ; Thu, 16 Sep 2021 02:22:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1A5593858C39 Received: by mail-pg1-x52f.google.com with SMTP id f129so4738177pgc.1 for ; Wed, 15 Sep 2021 19:22:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=uaqVsTNzcTdF82S9QiCXh3kSIxzS5cvpjGZmAGfdP/M=; b=HZKGYISOZc8oNadjSNhuJJC67Gox59johqZbxgL0nPqitsOWNinAUTpBxelJhQaxhQ tD587G0k7xYwKSjJwDkFbKTMnBFx/lgv4d6kpbD7454Si0Zaw9qh2n/I7oK05pz9VUxq 8kyXpBFJZq9AWfGqTLIb04wRor85Wdqr7dAhTkbB5+Vy3Urkj5twKysaXVLG2g4yGTYy igphovwe3phG6s6cto2nbofpzy0+8uPpcLidSkOoOOPZEDUOveOL1Hzjzdy+GHs7pv7C GsI9QXSw6+TVzHsRJq3Uu2rLka4ZeGHq/+3wqVy6K+9/VxGtGnYDKMhlJVAv6Xm2VQCJ v0ig== X-Gm-Message-State: AOAM5336cB/Nq8yiSa0x/X/M8V+gKg4hhaifDTCKB3SAKX6HpKOYUpfk gFyXFiBBd9G8Fhs3439aTPfk6L58ykKxL6NuX4w= X-Google-Smtp-Source: ABdhPJwxXtRF4r1PrfxUI7RTwVy+sPMPFhIp6qkG/Ft6DMuK/QMSc5rHoxk8lN239LWW5XmXXcZDVaVEx9pgqpDGIXs= X-Received: by 2002:a63:7702:: with SMTP id s2mr2760831pgc.75.1631758935979; Wed, 15 Sep 2021 19:22:15 -0700 (PDT) MIME-Version: 1.0 References: <20210915080951.10362-1-lili.cui@intel.com> <20210915080951.10362-5-lili.cui@intel.com> In-Reply-To: From: "H.J. Lu" Date: Wed, 15 Sep 2021 19:21:40 -0700 Message-ID: Subject: Re: [PATCH 4/4] [PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY To: "Cui, Lili" Cc: Uros Bizjak , GCC Patches , "Liu, Hongtao" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3030.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2021 02:22:19 -0000 On Wed, Sep 15, 2021 at 4:54 PM Cui, Lili wrote: > > > > > -----Original Message----- > > From: H.J. Lu > > Sent: Wednesday, September 15, 2021 10:14 PM > > To: Cui, Lili > > Cc: Uros Bizjak ; GCC Patches > patches@gcc.gnu.org>; Liu, Hongtao > > Subject: Re: [PATCH 4/4] [PATCH 4/4] x86: Add > > TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY > > > > There is no need to add [PATCH N/4] in the first line of the git commit > > message. "git format-patch" or "git send-email" will add them automatically. > > > Thanks for the reminder, I didn't notice it before. > > > On Wed, Sep 15, 2021 at 1:10 AM wrote: > > > > > > From: "H.J. Lu" > > > > > > 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with > > > TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP > > splitters. > > > 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with > > > TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP > > splitters. > > > 3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY > > and > > > TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling > > avx_partial_xmm_update > > > attribute. Don't convert AVX partial XMM register update if there is > > > no partial SSE register dependency for SSE conversion. > > > > > > gcc/ > > > > > > * config/i386/i386-features.c (remove_partial_avx_dependency): > > > Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY > > and > > > and TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before > > generating > > > vxorps. > > > * config/i386/i386.h > > (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): > > > New. > > > (TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. > > > * config/i386/i386.md (SSE FP to FP splitters): Replace > > > TARGET_SSE_PARTIAL_REG_DEPENDENCY with > > > TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY. > > > (SSE INT to FP splitter): Replace > > TARGET_SSE_PARTIAL_REG_DEPENDENCY > > > with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY. > > > * config/i386/x86-tune.def > > > (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New. > > > (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. > > > > > > gcc/testsuite/ > > > > > > * gcc.target/i386/avx-covert-1.c: New file. > > > * gcc.target/i386/avx-fp-covert-1.c: Likewise. > > > * gcc.target/i386/avx-int-covert-1.c: Likewise. > > > * gcc.target/i386/sse-covert-1.c: Likewise. > > > * gcc.target/i386/sse-fp-covert-1.c: Likewise. > > > * gcc.target/i386/sse-int-covert-1.c: Likewise. > > > --- > > > gcc/config/i386/i386-features.c | 6 ++++-- > > > gcc/config/i386/i386.h | 4 ++++ > > > gcc/config/i386/i386.md | 9 ++++++--- > > > gcc/config/i386/x86-tune.def | 15 +++++++++++++++ > > > gcc/testsuite/gcc.target/i386/avx-covert-1.c | 19 +++++++++++++++++++ > > > .../gcc.target/i386/avx-fp-covert-1.c | 15 +++++++++++++++ > > > .../gcc.target/i386/avx-int-covert-1.c | 14 ++++++++++++++ > > > gcc/testsuite/gcc.target/i386/sse-covert-1.c | 19 +++++++++++++++++++ > > > .../gcc.target/i386/sse-fp-covert-1.c | 15 +++++++++++++++ > > > .../gcc.target/i386/sse-int-covert-1.c | 14 ++++++++++++++ > > > 10 files changed, 125 insertions(+), 5 deletions(-) create mode > > > 100644 gcc/testsuite/gcc.target/i386/avx-covert-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx-int-covert-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/sse-covert-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/sse-int-covert-1.c > > > > > > diff --git a/gcc/config/i386/i386-features.c > > > b/gcc/config/i386/i386-features.c index ae5ea02a002..91bfa06d4bf > > > 100644 > > > --- a/gcc/config/i386/i386-features.c > > > +++ b/gcc/config/i386/i386-features.c > > > @@ -2218,14 +2218,16 @@ remove_partial_avx_dependency (void) > > > machine_mode dest_mode = GET_MODE (dest); > > > machine_mode src_mode; > > > > > > - if (TARGET_USE_VECTOR_FP_CONVERTS) > > > + if (TARGET_USE_VECTOR_FP_CONVERTS > > > + || !TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY) > > > { > > > src_mode = GET_MODE (XEXP (src, 0)); > > > if (src_mode == E_SFmode || src_mode == E_DFmode) > > > continue; > > > } > > > > > > - if (TARGET_USE_VECTOR_CONVERTS) > > > + if (TARGET_USE_VECTOR_CONVERTS > > > + || !TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY) > > > { > > > src_mode = GET_MODE (XEXP (src, 0)); > > > if (src_mode == E_SImode || src_mode == E_DImode) diff > > > --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index > > > e76bb55c080..ec60b89753e 100644 > > > --- a/gcc/config/i386/i386.h > > > +++ b/gcc/config/i386/i386.h > > > @@ -334,6 +334,10 @@ extern unsigned char > > ix86_tune_features[X86_TUNE_LAST]; > > > ix86_tune_features[X86_TUNE_PARTIAL_REG_DEPENDENCY] > > > #define TARGET_SSE_PARTIAL_REG_DEPENDENCY \ > > > ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY] > > > +#define TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY \ > > > + > > > > > +ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDE > > NCY] > > > +#define TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY \ > > > + > > > > > +ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENC > > Y] > > > #define TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ > > > ix86_tune_features[X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL] > > > #define TARGET_SSE_UNALIGNED_STORE_OPTIMAL \ diff --git > > > a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index > > > 13f6f57cdcc..c82a9dc1f67 100644 > > > --- a/gcc/config/i386/i386.md > > > +++ b/gcc/config/i386/i386.md > > > @@ -4535,7 +4535,8 @@ > > > (float_extend:DF > > > (match_operand:SF 1 "nonimmediate_operand")))] > > > "!TARGET_AVX > > > - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed > > > + && TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY > > > + && epilogue_completed > > > && optimize_function_for_speed_p (cfun) > > > && (!REG_P (operands[1]) > > > || (!TARGET_AVX && REGNO (operands[0]) != REGNO > > > (operands[1]))) @@ -4708,7 +4709,8 @@ > > > (float_truncate:SF > > > (match_operand:DF 1 "nonimmediate_operand")))] > > > "!TARGET_AVX > > > - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed > > > + && TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY > > > + && epilogue_completed > > > && optimize_function_for_speed_p (cfun) > > > && (!REG_P (operands[1]) > > > || (!TARGET_AVX && REGNO (operands[0]) != REGNO > > > (operands[1]))) @@ -5243,7 +5245,8 @@ > > > [(set (match_operand:MODEF 0 "sse_reg_operand") > > > (float:MODEF (match_operand:SWI48 1 "nonimmediate_operand")))] > > > "!TARGET_AVX > > > - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed > > > + && TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY > > > + && epilogue_completed > > > && optimize_function_for_speed_p (cfun) > > > && (!EXT_REX_SSE_REG_P (operands[0]) > > > || TARGET_AVX512VL)" > > > diff --git a/gcc/config/i386/x86-tune.def > > > b/gcc/config/i386/x86-tune.def index 088edb6c4ca..58e8ead56b4 100644 > > > --- a/gcc/config/i386/x86-tune.def > > > +++ b/gcc/config/i386/x86-tune.def > > > @@ -64,6 +64,21 @@ DEF_TUNE > > (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency", > > > m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | > > m_AMDFAM10 > > > | m_BDVER | m_ZNVER | m_TREMONT | m_GENERIC) > > > > > > +/* X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY: This knob > > avoids > > > + partial write to the destination in scalar SSE conversion from FP > > > + to FP. */ > > > +DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY, > > > + "sse_partial_reg_fp_converts_dependency", > > > + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | > > m_AMDFAM10 > > > + | m_BDVER | m_ZNVER | m_GENERIC) > > > > I thought we wanted to enable this for Tremont. > > > From the latest test, enabling Tremont here will cause a 2.8% regression to 538.imagic_r. I see. Thanks. > Thanks, > Lili. > > > > + > > > +/* X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY: This knob > > avoids partial > > > + write to the destination in scalar SSE conversion from integer to > > > +FP. */ DEF_TUNE > > (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY, > > > + "sse_partial_reg_converts_dependency", > > > + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | > > m_AMDFAM10 > > > + | m_BDVER | m_ZNVER | m_GENERIC) > > > + > > > /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and > > dependencies > > > are resolved on SSE register parts instead of whole registers, so we may > > > maintain just lower part of scalar values in proper format leaving > > > the > > > > -- > > H.J. -- H.J.