From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 219BA3815FFD for ; Mon, 30 May 2022 08:28:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 219BA3815FFD Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C7450113E; Mon, 30 May 2022 01:28:52 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3AC703F73D; Mon, 30 May 2022 01:28:52 -0700 (PDT) From: Richard Sandiford To: Jeff Law via Gcc-patches Mail-Followup-To: Jeff Law via Gcc-patches , "H.J. Lu" , Jeff Law , richard.sandiford@arm.com Subject: Re: [PATCH v3] DSE: Use the constant store source if possible References: <20220521030120.1977551-1-hjl.tools@gmail.com> <82a25496-2a00-4876-7a57-df09c983f2fe@gmail.com> Date: Mon, 30 May 2022 09:28:50 +0100 In-Reply-To: <82a25496-2a00-4876-7a57-df09c983f2fe@gmail.com> (Jeff Law via Gcc-patches's message of "Sun, 29 May 2022 16:55:04 -0600") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 May 2022 08:28:55 -0000 Jeff Law via Gcc-patches writes: > On 5/29/2022 3:43 PM, H.J. Lu wrote: >> On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches >> wrote: >>> >>> >>> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote: >>>> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote: >>>>> "H.J. Lu" writes: >>>>>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford >>>>>> wrote: >>>>>>> "H.J. Lu via Gcc-patches" writes: >>>>>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: >>>>>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches >>>>>>>>> wrote: >>>>>>>>>> When recording store for RTL dead store elimination, check if th= e source >>>>>>>>>> register is set only once to a constant. If yes, record the con= stant >>>>>>>>>> as the store source. It eliminates unrolled zero stores after m= emset 0 >>>>>>>>>> in a loop where a vector register is used as the zero store sour= ce. >>>>>>>>>> >>>>>>>>>> gcc/ >>>>>>>>>> >>>>>>>>>> PR rtl-optimization/105638 >>>>>>>>>> * dse.cc (record_store): Use the constant source if th= e source >>>>>>>>>> register is set only once. >>>>>>>>>> >>>>>>>>>> gcc/testsuite/ >>>>>>>>>> >>>>>>>>>> PR rtl-optimization/105638 >>>>>>>>>> * g++.target/i386/pr105638.C: New test. >>>>>>>>>> --- >>>>>>>>>> gcc/dse.cc | 19 ++++++++++ >>>>>>>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++= +++++++++ >>>>>>>>>> 2 files changed, 63 insertions(+) >>>>>>>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C >>>>>>>>>> >>>>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc >>>>>>>>>> index 30c11cee034..0433dd3d846 100644 >>>>>>>>>> --- a/gcc/dse.cc >>>>>>>>>> +++ b/gcc/dse.cc >>>>>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info) >>>>>>>>>> >>>>>>>>>> if (tem && CONSTANT_P (tem)) >>>>>>>>>> const_rhs =3D tem; >>>>>>>>>> + else >>>>>>>>>> + { >>>>>>>>>> + /* If RHS is set only once to a constant, set CONS= T_RHS >>>>>>>>>> + to the constant. */ >>>>>>>>>> + df_ref def =3D DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>>>>>>> + if (def !=3D nullptr >>>>>>>>>> + && !DF_REF_IS_ARTIFICIAL (def) >>>>>>>>>> + && !DF_REF_NEXT_REG (def)) >>>>>>>>>> + { >>>>>>>>>> + rtx_insn *def_insn =3D DF_REF_INSN (def); >>>>>>>>>> + rtx def_body =3D PATTERN (def_insn); >>>>>>>>>> + if (GET_CODE (def_body) =3D=3D SET) >>>>>>>>>> + { >>>>>>>>>> + rtx def_src =3D SET_SRC (def_body); >>>>>>>>>> + if (CONSTANT_P (def_src)) >>>>>>>>>> + const_rhs =3D def_src; >>>>>>>>> doesn't DSE have its own tracking of stored values? Shouldn't we >>>>>>>> It tracks stored values only within the basic block. When RTL loop >>>>>>>> invariant motion hoists a constant initialization out of the loop = into >>>>>>>> a separate basic block, the constant store value becomes unknown >>>>>>>> within the original basic block. >>>>>>>> >>>>>>>>> improve _that_ if it is not enough? I also wonder if you need to >>>>>>>> My patch extends DSE stored value tracking to include the constant= which >>>>>>>> is set only once in another basic block. >>>>>>>> >>>>>>>>> verify the SET isn't partial? >>>>>>>>> >>>>>>>> Here is the v2 patch to check that the constant is set by a non-pa= rtial >>>>>>>> unconditional load. >>>>>>>> >>>>>>>> OK for master? >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> H.J. >>>>>>>> --- >>>>>>>> RTL DSE tracks redundant constant stores within a basic block. Wh= en RTL >>>>>>>> loop invariant motion hoists a constant initialization out of the = loop >>>>>>>> into a separate basic block, the constant store value becomes unkn= own >>>>>>>> within the original basic block. When recording store for RTL DSE= , check >>>>>>>> if the source register is set only once to a constant by a non-par= tial >>>>>>>> unconditional load. If yes, record the constant as the constant s= tore >>>>>>>> source. It eliminates unrolled zero stores after memset 0 in a lo= op >>>>>>>> where a vector register is used as the zero store source. >>>>>>>> >>>>>>>> gcc/ >>>>>>>> >>>>>>>> PR rtl-optimization/105638 >>>>>>>> * dse.cc (record_store): Use the constant source if the so= urce >>>>>>>> register is set only once. >>>>>>>> >>>>>>>> gcc/testsuite/ >>>>>>>> >>>>>>>> PR rtl-optimization/105638 >>>>>>>> * g++.target/i386/pr105638.C: New test. >>>>>>>> --- >>>>>>>> gcc/dse.cc | 22 ++++++++++++ >>>>>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++= +++++++ >>>>>>>> 2 files changed, 66 insertions(+) >>>>>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C >>>>>>>> >>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc >>>>>>>> index 30c11cee034..af8e88dac32 100644 >>>>>>>> --- a/gcc/dse.cc >>>>>>>> +++ b/gcc/dse.cc >>>>>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info) >>>>>>>> >>>>>>>> if (tem && CONSTANT_P (tem)) >>>>>>>> const_rhs =3D tem; >>>>>>>> + else >>>>>>>> + { >>>>>>>> + /* If RHS is set only once to a constant, set CONST_RHS >>>>>>>> + to the constant. */ >>>>>>>> + df_ref def =3D DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>>>>> + if (def !=3D nullptr >>>>>>>> + && !DF_REF_IS_ARTIFICIAL (def) >>>>>>>> + && !(DF_REF_FLAGS (def) >>>>>>>> + & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) >>>>>>>> + && !DF_REF_NEXT_REG (def)) >>>>>>> Can we introduce a helper for this? There are already similar tests >>>>>>> in ira and loop-iv, and it seems a bit too complex to have to open-= code >>>>>>> each time. >>>>>> I can use find_single_def_src in loop-iv.cc: >>>>>> >>>>>> /* If REGNO has a single definition, return its known value, otherwi= se return >>>>>> null. */ >>>>>> >>>>>> rtx >>>>>> find_single_def_src (unsigned int regno) >>>>> Yeah, reusing that sounds good. Perhaps we should move it into df-co= re.cc, >>>>> alongside the df_reg_used group of functions. >>>>> >>>>> I think the mode check in your original patch should be kept though, >>>>> so how about we change the parameter to an rtx reg and use rtx_equal = in: >>>>> >>>>> rtx set =3D single_set (DF_REF_INSN (adef)); >>>>> if (set =3D=3D NULL || !REG_P (SET_DEST (set)) >>>>> || REGNO (SET_DEST (set)) !=3D regno) >>>>> return NULL_RTX; >>>> Fixed. >>>> >>>>> rather than the existing !REG_P and REGNO checks. We should also add: >>>>> >>>>>> and do >>>>>> >>>>>> /* If RHS is set only once to a constant, set CONST_R= HS >>>>>> to the constant. */ >>>>>> rtx def_src =3D find_single_def_src (REGNO (rhs)); >>>>>> if (def_src !=3D nullptr && CONSTANT_P (def_src)) >>>>>> { >>>>>> df_ref def =3D DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>>> if (!(DF_REF_FLAGS (def) >>>>>> & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) >>>>> =E2=80=A6this check to the function, since it's needed for correctnes= s even >>>>> in the loop-iv.cc use. >>>> Fixed. >>>> >>>>> Thanks, >>>>> Richard >>>> Here is the v3 patch. OK for master? >>>> >>>> Thanks. >>>> >>>> H.J. >>>> --- >>>> RTL DSE tracks redundant constant stores within a basic block. When R= TL >>>> loop invariant motion hoists a constant initialization out of the loop >>>> into a separate basic block, the constant store value becomes unknown >>>> within the original basic block. When recording store for RTL DSE, ch= eck >>>> if the source register is set only once to a constant by a non-partial >>>> unconditional load. If yes, record the constant as the constant store >>>> source. It eliminates unrolled zero stores after memset 0 in a loop >>>> where a vector register is used as the zero store source. >>>> >>>> Extract find_single_def_src from loop-iv.cc and move it to df-core.cc: >>>> >>>> 1. Rename to df_find_single_def_src. >>>> 2. Change the argument to rtx and use rtx_equal_p. >>>> 3. Return null for partial or conditional defs. >>>> >>>> gcc/ >>>> >>>> PR rtl-optimization/105638 >>>> * df-core.cc (df_find_single_def_sr): Moved and renamed from >>>> find_single_def_src in loop-iv.cc. Change the argument to rtx >>>> and use rtx_equal_p. Return null for partial or conditional >>>> defs. >>>> * df.h (df_find_single_def_src): New prototype. >>>> * dse.cc (record_store): Use the constant source if the source >>>> register is set only once. >>>> * loop-iv.cc (find_single_def_src): Moved to df-core.cc. >>>> (replace_single_def_regs): Replace find_single_def_src with >>>> df_find_single_def_src. >>>> >>>> gcc/testsuite/ >>>> >>>> PR rtl-optimization/105638 >>>> * g++.target/i386/pr105638.C: New test. >>> Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better >>> here. REG_EQUIV indicates the destination could be replaced with the >>> source of the note at any point and semantically the code would still be >>> valid. REG_EQUAL doesn't give us that guarantee. >>> >>> To allow REG_EQUAL you have to check that the block with the note >>> dominates the use. >> When a use only has one non-conditional def which doesn't dominate >> the use, isn't its behavior undefined? > I think so, even for an irreducible loop.=C2=A0 Even so, it's the safest= =20 > thing to do, particularly if someone tries to extend this code in the=20 > future. If any use of R was upwards exposed, R would have an artificial definition in the entry block in addition to any =E2=80=9Creal=E2=80=9D def= inition. Since we're checking for exactly one definition in total, it should follow that the definition dominates all uses. Thanks, Richard