From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by sourceware.org (Postfix) with ESMTPS id 43EBE3857BAA for ; Sun, 29 May 2022 21:44:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 43EBE3857BAA Received: by mail-pf1-x42f.google.com with SMTP id y199so8996532pfb.9 for ; Sun, 29 May 2022 14:44:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=V20SRkhJtZXdNZJiPdY+/4F+TLYjvckPgj0SUz6YVZk=; b=Ew3D7bBqxR9zUKss3AElCQPCD1koacsnMDHs9A9uXbyUeiD9VUSyY0O1jiWj+pfA3/ TikB/OdSKuYXZf9f+mwg2nUJHfCSFUMzyWakW3+KhXo1o7yfQur6FV+/qeklZLYuoJgK gqhK3ckp1fSyhStKhGuQAq4oji0XpEB4jyaZa3vq+iC39SVW2uLlKWLhl5aLi184bsTi 6G7xBUGwTPm+9yaWSF7Ktvqh1zYDu+s5mYLAmTMJnXSH4pQmNIFA7tkxfGVgnsoQMnzM h2MDfMdqotWSnOhE87YymI60QSOuYLqtiz9wfnqRxzrppN0L0RHNbAPYvz+dBbEhylP8 FtvA== X-Gm-Message-State: AOAM531buTbRfH4FvD8RfTDJjDQ8GT0O4X0GNU+lcQjAZ883N9lM23aF +KF/X3/7NFrHl+FhHvnrZlPn1rxP2RPPOlSXumg= X-Google-Smtp-Source: ABdhPJytKE8Lnb0d79+45rDZ4Lo2xHIemp5inP3erUJKJDCq16yWo1Kum74GFq48VcHyV1XJu2mpJVZnmCojnzX7Hl0= X-Received: by 2002:a05:6a00:2d0:b0:518:95f8:d00a with SMTP id b16-20020a056a0002d000b0051895f8d00amr37780466pft.8.1653860666030; Sun, 29 May 2022 14:44:26 -0700 (PDT) MIME-Version: 1.0 References: <20220521030120.1977551-1-hjl.tools@gmail.com> In-Reply-To: From: "H.J. Lu" Date: Sun, 29 May 2022 14:43:50 -0700 Message-ID: Subject: Re: [PATCH v3] DSE: Use the constant store source if possible To: Jeff Law Cc: GCC Patches Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3025.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 May 2022 21:44:29 -0000 On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches wrote: > > > > On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote: > > On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote: > >> "H.J. Lu" writes: > >>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford > >>> wrote: > >>>> "H.J. Lu via Gcc-patches" writes: > >>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: > >>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches > >>>>>> wrote: > >>>>>>> When recording store for RTL dead store elimination, check if the= source > >>>>>>> register is set only once to a constant. If yes, record the cons= tant > >>>>>>> as the store source. It eliminates unrolled zero stores after me= mset 0 > >>>>>>> in a loop where a vector register is used as the zero store sourc= e. > >>>>>>> > >>>>>>> gcc/ > >>>>>>> > >>>>>>> PR rtl-optimization/105638 > >>>>>>> * dse.cc (record_store): Use the constant source if the = source > >>>>>>> register is set only once. > >>>>>>> > >>>>>>> gcc/testsuite/ > >>>>>>> > >>>>>>> PR rtl-optimization/105638 > >>>>>>> * g++.target/i386/pr105638.C: New test. > >>>>>>> --- > >>>>>>> gcc/dse.cc | 19 ++++++++++ > >>>>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++= +++++++ > >>>>>>> 2 files changed, 63 insertions(+) > >>>>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C > >>>>>>> > >>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc > >>>>>>> index 30c11cee034..0433dd3d846 100644 > >>>>>>> --- a/gcc/dse.cc > >>>>>>> +++ b/gcc/dse.cc > >>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info) > >>>>>>> > >>>>>>> if (tem && CONSTANT_P (tem)) > >>>>>>> const_rhs =3D tem; > >>>>>>> + else > >>>>>>> + { > >>>>>>> + /* If RHS is set only once to a constant, set CONST= _RHS > >>>>>>> + to the constant. */ > >>>>>>> + df_ref def =3D DF_REG_DEF_CHAIN (REGNO (rhs)); > >>>>>>> + if (def !=3D nullptr > >>>>>>> + && !DF_REF_IS_ARTIFICIAL (def) > >>>>>>> + && !DF_REF_NEXT_REG (def)) > >>>>>>> + { > >>>>>>> + rtx_insn *def_insn =3D DF_REF_INSN (def); > >>>>>>> + rtx def_body =3D PATTERN (def_insn); > >>>>>>> + if (GET_CODE (def_body) =3D=3D SET) > >>>>>>> + { > >>>>>>> + rtx def_src =3D SET_SRC (def_body); > >>>>>>> + if (CONSTANT_P (def_src)) > >>>>>>> + const_rhs =3D def_src; > >>>>>> doesn't DSE have its own tracking of stored values? Shouldn't we > >>>>> It tracks stored values only within the basic block. When RTL loop > >>>>> invariant motion hoists a constant initialization out of the loop i= nto > >>>>> a separate basic block, the constant store value becomes unknown > >>>>> within the original basic block. > >>>>> > >>>>>> improve _that_ if it is not enough? I also wonder if you need to > >>>>> My patch extends DSE stored value tracking to include the constant = which > >>>>> is set only once in another basic block. > >>>>> > >>>>>> verify the SET isn't partial? > >>>>>> > >>>>> Here is the v2 patch to check that the constant is set by a non-par= tial > >>>>> unconditional load. > >>>>> > >>>>> OK for master? > >>>>> > >>>>> Thanks. > >>>>> > >>>>> H.J. > >>>>> --- > >>>>> RTL DSE tracks redundant constant stores within a basic block. Whe= n RTL > >>>>> loop invariant motion hoists a constant initialization out of the l= oop > >>>>> into a separate basic block, the constant store value becomes unkno= wn > >>>>> within the original basic block. When recording store for RTL DSE,= check > >>>>> if the source register is set only once to a constant by a non-part= ial > >>>>> unconditional load. If yes, record the constant as the constant st= ore > >>>>> source. It eliminates unrolled zero stores after memset 0 in a loo= p > >>>>> where a vector register is used as the zero store source. > >>>>> > >>>>> gcc/ > >>>>> > >>>>> PR rtl-optimization/105638 > >>>>> * dse.cc (record_store): Use the constant source if the sour= ce > >>>>> register is set only once. > >>>>> > >>>>> gcc/testsuite/ > >>>>> > >>>>> PR rtl-optimization/105638 > >>>>> * g++.target/i386/pr105638.C: New test. > >>>>> --- > >>>>> gcc/dse.cc | 22 ++++++++++++ > >>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++= +++++ > >>>>> 2 files changed, 66 insertions(+) > >>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C > >>>>> > >>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc > >>>>> index 30c11cee034..af8e88dac32 100644 > >>>>> --- a/gcc/dse.cc > >>>>> +++ b/gcc/dse.cc > >>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info) > >>>>> > >>>>> if (tem && CONSTANT_P (tem)) > >>>>> const_rhs =3D tem; > >>>>> + else > >>>>> + { > >>>>> + /* If RHS is set only once to a constant, set CONST_RHS > >>>>> + to the constant. */ > >>>>> + df_ref def =3D DF_REG_DEF_CHAIN (REGNO (rhs)); > >>>>> + if (def !=3D nullptr > >>>>> + && !DF_REF_IS_ARTIFICIAL (def) > >>>>> + && !(DF_REF_FLAGS (def) > >>>>> + & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) > >>>>> + && !DF_REF_NEXT_REG (def)) > >>>> Can we introduce a helper for this? There are already similar tests > >>>> in ira and loop-iv, and it seems a bit too complex to have to open-c= ode > >>>> each time. > >>> I can use find_single_def_src in loop-iv.cc: > >>> > >>> /* If REGNO has a single definition, return its known value, otherwis= e return > >>> null. */ > >>> > >>> rtx > >>> find_single_def_src (unsigned int regno) > >> Yeah, reusing that sounds good. Perhaps we should move it into df-cor= e.cc, > >> alongside the df_reg_used group of functions. > >> > >> I think the mode check in your original patch should be kept though, > >> so how about we change the parameter to an rtx reg and use rtx_equal i= n: > >> > >> rtx set =3D single_set (DF_REF_INSN (adef)); > >> if (set =3D=3D NULL || !REG_P (SET_DEST (set)) > >> || REGNO (SET_DEST (set)) !=3D regno) > >> return NULL_RTX; > > Fixed. > > > >> rather than the existing !REG_P and REGNO checks. We should also add: > >> > >>> and do > >>> > >>> /* If RHS is set only once to a constant, set CONST_RHS > >>> to the constant. */ > >>> rtx def_src =3D find_single_def_src (REGNO (rhs)); > >>> if (def_src !=3D nullptr && CONSTANT_P (def_src)) > >>> { > >>> df_ref def =3D DF_REG_DEF_CHAIN (REGNO (rhs)); > >>> if (!(DF_REF_FLAGS (def) > >>> & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) > >> =E2=80=A6this check to the function, since it's needed for correctness= even > >> in the loop-iv.cc use. > > Fixed. > > > >> Thanks, > >> Richard > > Here is the v3 patch. OK for master? > > > > Thanks. > > > > H.J. > > --- > > RTL DSE tracks redundant constant stores within a basic block. When RT= L > > loop invariant motion hoists a constant initialization out of the loop > > into a separate basic block, the constant store value becomes unknown > > within the original basic block. When recording store for RTL DSE, che= ck > > if the source register is set only once to a constant by a non-partial > > unconditional load. If yes, record the constant as the constant store > > source. It eliminates unrolled zero stores after memset 0 in a loop > > where a vector register is used as the zero store source. > > > > Extract find_single_def_src from loop-iv.cc and move it to df-core.cc: > > > > 1. Rename to df_find_single_def_src. > > 2. Change the argument to rtx and use rtx_equal_p. > > 3. Return null for partial or conditional defs. > > > > gcc/ > > > > PR rtl-optimization/105638 > > * df-core.cc (df_find_single_def_sr): Moved and renamed from > > find_single_def_src in loop-iv.cc. Change the argument to rtx > > and use rtx_equal_p. Return null for partial or conditional > > defs. > > * df.h (df_find_single_def_src): New prototype. > > * dse.cc (record_store): Use the constant source if the source > > register is set only once. > > * loop-iv.cc (find_single_def_src): Moved to df-core.cc. > > (replace_single_def_regs): Replace find_single_def_src with > > df_find_single_def_src. > > > > gcc/testsuite/ > > > > PR rtl-optimization/105638 > > * g++.target/i386/pr105638.C: New test. > Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better > here. REG_EQUIV indicates the destination could be replaced with the > source of the note at any point and semantically the code would still be > valid. REG_EQUAL doesn't give us that guarantee. > > To allow REG_EQUAL you have to check that the block with the note > dominates the use. When a use only has one non-conditional def which doesn't dominate the use, isn't its behavior undefined? --=20 H.J.