From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x42a.google.com (mail-pf1-x42a.google.com [IPv6:2607:f8b0:4864:20::42a]) by sourceware.org (Postfix) with ESMTPS id C27CA3857805 for ; Sat, 28 May 2022 18:37:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C27CA3857805 Received: by mail-pf1-x42a.google.com with SMTP id c65so6519500pfb.1 for ; Sat, 28 May 2022 11:37:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=yh/F0C5MyGiKSXSluhX4+L8j6+DrkcMVzSvBP9IhnWY=; b=GgWJt5lKIEGQYCk5AvXoWbC2AX/G0ZuZzLr3S4VllCABlFC95zMz0avFdvA39JjLmk sXRHfbPAU/E7kjkgzvbZGVS/YnM4zN+y3/OWHjKnIWUcrEZFHL9xbqJ3uzb1BIaZY4UC pDyjoS5GYdOKQdL1HhswmRFAogE5dqIqwp1sokGqA1eX6pq3WR9j1lxwPZblM65THNcX abhrbj09M4RcIcusukb0v5MnBboe9M1sz4+ZxD8C98frDmqnPrFgZVlFw5alvFtnUxsM GDmCP9qaGIZuhU21rWiDQN5y3tZhV6byidtwEV7pouGWzeWkQVRxexf+SV+KTUj2WP8U 5F7w== X-Gm-Message-State: AOAM532BNRG15vXOraVqE+XF2pGt0M6AkdnItONsmEfmRdeGTXyKwnBc AVon2eWVgD2DJDv7pKJhaBRv+Zv85+sFsQ== X-Google-Smtp-Source: ABdhPJwjYiuWapFcMrDvyyg9rz8C37x4LJTT4PQOg12znrhz1Xanku3H0Gf+6EvfQhpgIUH2W5KIZw== X-Received: by 2002:a05:6a00:b4d:b0:518:ae48:a2bb with SMTP id p13-20020a056a000b4d00b00518ae48a2bbmr28613440pfo.2.1653763034337; Sat, 28 May 2022 11:37:14 -0700 (PDT) Received: from [172.31.0.204] (c-73-63-24-84.hsd1.ut.comcast.net. [73.63.24.84]) by smtp.gmail.com with ESMTPSA id i12-20020a170902cf0c00b0015e8d4eb28fsm6136333plg.217.2022.05.28.11.37.13 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 28 May 2022 11:37:13 -0700 (PDT) Message-ID: Date: Sat, 28 May 2022 12:37:12 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Subject: Re: [PATCH v3] DSE: Use the constant store source if possible Content-Language: en-US To: gcc-patches@gcc.gnu.org References: <20220521030120.1977551-1-hjl.tools@gmail.com> From: Jeff Law In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 May 2022 18:37:18 -0000 On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote: > On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote: >> "H.J. Lu" writes: >>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford >>> wrote: >>>> "H.J. Lu via Gcc-patches" writes: >>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: >>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches >>>>>> wrote: >>>>>>> When recording store for RTL dead store elimination, check if the source >>>>>>> register is set only once to a constant. If yes, record the constant >>>>>>> as the store source. It eliminates unrolled zero stores after memset 0 >>>>>>> in a loop where a vector register is used as the zero store source. >>>>>>> >>>>>>> gcc/ >>>>>>> >>>>>>> PR rtl-optimization/105638 >>>>>>> * dse.cc (record_store): Use the constant source if the source >>>>>>> register is set only once. >>>>>>> >>>>>>> gcc/testsuite/ >>>>>>> >>>>>>> PR rtl-optimization/105638 >>>>>>> * g++.target/i386/pr105638.C: New test. >>>>>>> --- >>>>>>> gcc/dse.cc | 19 ++++++++++ >>>>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++ >>>>>>> 2 files changed, 63 insertions(+) >>>>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C >>>>>>> >>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc >>>>>>> index 30c11cee034..0433dd3d846 100644 >>>>>>> --- a/gcc/dse.cc >>>>>>> +++ b/gcc/dse.cc >>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info) >>>>>>> >>>>>>> if (tem && CONSTANT_P (tem)) >>>>>>> const_rhs = tem; >>>>>>> + else >>>>>>> + { >>>>>>> + /* If RHS is set only once to a constant, set CONST_RHS >>>>>>> + to the constant. */ >>>>>>> + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>>>> + if (def != nullptr >>>>>>> + && !DF_REF_IS_ARTIFICIAL (def) >>>>>>> + && !DF_REF_NEXT_REG (def)) >>>>>>> + { >>>>>>> + rtx_insn *def_insn = DF_REF_INSN (def); >>>>>>> + rtx def_body = PATTERN (def_insn); >>>>>>> + if (GET_CODE (def_body) == SET) >>>>>>> + { >>>>>>> + rtx def_src = SET_SRC (def_body); >>>>>>> + if (CONSTANT_P (def_src)) >>>>>>> + const_rhs = def_src; >>>>>> doesn't DSE have its own tracking of stored values? Shouldn't we >>>>> It tracks stored values only within the basic block. When RTL loop >>>>> invariant motion hoists a constant initialization out of the loop into >>>>> a separate basic block, the constant store value becomes unknown >>>>> within the original basic block. >>>>> >>>>>> improve _that_ if it is not enough? I also wonder if you need to >>>>> My patch extends DSE stored value tracking to include the constant which >>>>> is set only once in another basic block. >>>>> >>>>>> verify the SET isn't partial? >>>>>> >>>>> Here is the v2 patch to check that the constant is set by a non-partial >>>>> unconditional load. >>>>> >>>>> OK for master? >>>>> >>>>> Thanks. >>>>> >>>>> H.J. >>>>> --- >>>>> RTL DSE tracks redundant constant stores within a basic block. When RTL >>>>> loop invariant motion hoists a constant initialization out of the loop >>>>> into a separate basic block, the constant store value becomes unknown >>>>> within the original basic block. When recording store for RTL DSE, check >>>>> if the source register is set only once to a constant by a non-partial >>>>> unconditional load. If yes, record the constant as the constant store >>>>> source. It eliminates unrolled zero stores after memset 0 in a loop >>>>> where a vector register is used as the zero store source. >>>>> >>>>> gcc/ >>>>> >>>>> PR rtl-optimization/105638 >>>>> * dse.cc (record_store): Use the constant source if the source >>>>> register is set only once. >>>>> >>>>> gcc/testsuite/ >>>>> >>>>> PR rtl-optimization/105638 >>>>> * g++.target/i386/pr105638.C: New test. >>>>> --- >>>>> gcc/dse.cc | 22 ++++++++++++ >>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++ >>>>> 2 files changed, 66 insertions(+) >>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C >>>>> >>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc >>>>> index 30c11cee034..af8e88dac32 100644 >>>>> --- a/gcc/dse.cc >>>>> +++ b/gcc/dse.cc >>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info) >>>>> >>>>> if (tem && CONSTANT_P (tem)) >>>>> const_rhs = tem; >>>>> + else >>>>> + { >>>>> + /* If RHS is set only once to a constant, set CONST_RHS >>>>> + to the constant. */ >>>>> + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>> + if (def != nullptr >>>>> + && !DF_REF_IS_ARTIFICIAL (def) >>>>> + && !(DF_REF_FLAGS (def) >>>>> + & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) >>>>> + && !DF_REF_NEXT_REG (def)) >>>> Can we introduce a helper for this? There are already similar tests >>>> in ira and loop-iv, and it seems a bit too complex to have to open-code >>>> each time. >>> I can use find_single_def_src in loop-iv.cc: >>> >>> /* If REGNO has a single definition, return its known value, otherwise return >>> null. */ >>> >>> rtx >>> find_single_def_src (unsigned int regno) >> Yeah, reusing that sounds good. Perhaps we should move it into df-core.cc, >> alongside the df_reg_used group of functions. >> >> I think the mode check in your original patch should be kept though, >> so how about we change the parameter to an rtx reg and use rtx_equal in: >> >> rtx set = single_set (DF_REF_INSN (adef)); >> if (set == NULL || !REG_P (SET_DEST (set)) >> || REGNO (SET_DEST (set)) != regno) >> return NULL_RTX; > Fixed. > >> rather than the existing !REG_P and REGNO checks. We should also add: >> >>> and do >>> >>> /* If RHS is set only once to a constant, set CONST_RHS >>> to the constant. */ >>> rtx def_src = find_single_def_src (REGNO (rhs)); >>> if (def_src != nullptr && CONSTANT_P (def_src)) >>> { >>> df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); >>> if (!(DF_REF_FLAGS (def) >>> & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) >> …this check to the function, since it's needed for correctness even >> in the loop-iv.cc use. > Fixed. > >> Thanks, >> Richard > Here is the v3 patch. OK for master? > > Thanks. > > H.J. > --- > RTL DSE tracks redundant constant stores within a basic block. When RTL > loop invariant motion hoists a constant initialization out of the loop > into a separate basic block, the constant store value becomes unknown > within the original basic block. When recording store for RTL DSE, check > if the source register is set only once to a constant by a non-partial > unconditional load. If yes, record the constant as the constant store > source. It eliminates unrolled zero stores after memset 0 in a loop > where a vector register is used as the zero store source. > > Extract find_single_def_src from loop-iv.cc and move it to df-core.cc: > > 1. Rename to df_find_single_def_src. > 2. Change the argument to rtx and use rtx_equal_p. > 3. Return null for partial or conditional defs. > > gcc/ > > PR rtl-optimization/105638 > * df-core.cc (df_find_single_def_sr): Moved and renamed from > find_single_def_src in loop-iv.cc. Change the argument to rtx > and use rtx_equal_p. Return null for partial or conditional > defs. > * df.h (df_find_single_def_src): New prototype. > * dse.cc (record_store): Use the constant source if the source > register is set only once. > * loop-iv.cc (find_single_def_src): Moved to df-core.cc. > (replace_single_def_regs): Replace find_single_def_src with > df_find_single_def_src. > > gcc/testsuite/ > > PR rtl-optimization/105638 > * g++.target/i386/pr105638.C: New test. Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better here.  REG_EQUIV indicates the destination could be replaced with the source of the note at any point and semantically the code would still be valid.  REG_EQUAL doesn't give us that guarantee. To allow REG_EQUAL you have to check that the block with the note dominates the use. Jeff