From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by sourceware.org (Postfix) with ESMTPS id 907B73857B95 for ; Sun, 29 May 2022 22:55:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 907B73857B95 Received: by mail-pj1-x102b.google.com with SMTP id w2-20020a17090ac98200b001e0519fe5a8so9083467pjt.4 for ; Sun, 29 May 2022 15:55:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=QDRuUSODPhajNjlqOfsqL/WlU8rG7efnklyOrNs2R+E=; b=JnzNfWbqsvuQpUoZqjSu90rgzAtMtARzmhzDc84+tfAObpLU7kvnqOD01r6+91EkmL 7thmmU50n37NZIaze8rtYR36Tue6BZ+r8wzp9Zjrn2t5vhCFWafd4oDxP03l5XPfrb0p EndX0U9zQBnDm1QZRdcD/tDvC6S4Nq3f9XWby4ya7lwevzi7NQOmpEXNfJ/lJeVSu2HJ TbCuPlijoo5eFvl2JOEfbcIeUjMOvqWbY+CZz1jHHtMHgmpGdNKJaY7es7gsyEHbLTeZ gPJMxa+gNkRnpDFhZvy9zmSOyuk2ZCbvTLcOZrUc6Kz9lfpgKxEw21BdgapXAIkCV/ks NRdg== X-Gm-Message-State: AOAM532qOW3/Iq5+srR7vwHCxlwUfyGerX18+lzG154mxgNbt4lx1fYb yNgYrufn0tctSbr4PAT6Kds= X-Google-Smtp-Source: ABdhPJy9oeZ3KcQECHSf9NZejacBTx6feh9iqzHXeWGKEXPchVtErEykwqP+WCqRd92ZzoCBlvjN7w== X-Received: by 2002:a17:902:e745:b0:163:5074:6c2f with SMTP id p5-20020a170902e74500b0016350746c2fmr25487395plf.140.1653864906218; Sun, 29 May 2022 15:55:06 -0700 (PDT) Received: from [172.31.0.204] (c-73-63-24-84.hsd1.ut.comcast.net. [73.63.24.84]) by smtp.gmail.com with ESMTPSA id y139-20020a626491000000b0051844a64d3dsm7416367pfb.25.2022.05.29.15.55.05 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 29 May 2022 15:55:05 -0700 (PDT) Message-ID: <82a25496-2a00-4876-7a57-df09c983f2fe@gmail.com> Date: Sun, 29 May 2022 16:55:04 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Subject: Re: [PATCH v3] DSE: Use the constant store source if possible Content-Language: en-US To: "H.J. Lu" Cc: GCC Patches References: <20220521030120.1977551-1-hjl.tools@gmail.com> From: Jeff Law In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 May 2022 22:55:09 -0000 On 5/29/2022 3:43 PM, H.J. Lu wrote: > On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches > wrote: >> >> >> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote: >>> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote: >>>> "H.J. Lu" writes: >>>>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford >>>>> wrote: >>>>>> "H.J. Lu via Gcc-patches" writes: >>>>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: >>>>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches >>>>>>>> wrote: >>>>>>>>> When recording store for RTL dead store elimination, check if the source >>>>>>>>> register is set only once to a constant. If yes, record the constant >>>>>>>>> as the store source. It eliminates unrolled zero stores after memset 0 >>>>>>>>> in a loop where a vector register is used as the zero store source. >>>>>>>>> >>>>>>>>> gcc/ >>>>>>>>> >>>>>>>>> PR rtl-optimization/105638 >>>>>>>>> * dse.cc (record_store): Use the constant source if the source >>>>>>>>> register is set only once. >>>>>>>>> >>>>>>>>> gcc/testsuite/ >>>>>>>>> >>>>>>>>> PR rtl-optimization/105638 >>>>>>>>> * g++.target/i386/pr105638.C: New test. >>>>>>>>> --- >>>>>>>>> gcc/dse.cc | 19 ++++++++++ >>>>>>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++ >>>>>>>>> 2 files changed, 63 insertions(+) >>>>>>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C >>>>>>>>> >>>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc >>>>>>>>> index 30c11cee034..0433dd3d846 100644 >>>>>>>>> --- a/gcc/dse.cc >>>>>>>>> +++ b/gcc/dse.cc >>>>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info) >>>>>>>>> >>>>>>>>> if (tem && CONSTANT_P (tem)) >>>>>>>>> const_rhs = tem; >>>>>>>>> + else >>>>>>>>> + { >>>>>>>>> + /* If RHS is set only once to a constant, set CONST_RHS >>>>>>>>> + to the constant. */ >>>>>>>>> + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>>>>>> + if (def != nullptr >>>>>>>>> + && !DF_REF_IS_ARTIFICIAL (def) >>>>>>>>> + && !DF_REF_NEXT_REG (def)) >>>>>>>>> + { >>>>>>>>> + rtx_insn *def_insn = DF_REF_INSN (def); >>>>>>>>> + rtx def_body = PATTERN (def_insn); >>>>>>>>> + if (GET_CODE (def_body) == SET) >>>>>>>>> + { >>>>>>>>> + rtx def_src = SET_SRC (def_body); >>>>>>>>> + if (CONSTANT_P (def_src)) >>>>>>>>> + const_rhs = def_src; >>>>>>>> doesn't DSE have its own tracking of stored values? Shouldn't we >>>>>>> It tracks stored values only within the basic block. When RTL loop >>>>>>> invariant motion hoists a constant initialization out of the loop into >>>>>>> a separate basic block, the constant store value becomes unknown >>>>>>> within the original basic block. >>>>>>> >>>>>>>> improve _that_ if it is not enough? I also wonder if you need to >>>>>>> My patch extends DSE stored value tracking to include the constant which >>>>>>> is set only once in another basic block. >>>>>>> >>>>>>>> verify the SET isn't partial? >>>>>>>> >>>>>>> Here is the v2 patch to check that the constant is set by a non-partial >>>>>>> unconditional load. >>>>>>> >>>>>>> OK for master? >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> H.J. >>>>>>> --- >>>>>>> RTL DSE tracks redundant constant stores within a basic block. When RTL >>>>>>> loop invariant motion hoists a constant initialization out of the loop >>>>>>> into a separate basic block, the constant store value becomes unknown >>>>>>> within the original basic block. When recording store for RTL DSE, check >>>>>>> if the source register is set only once to a constant by a non-partial >>>>>>> unconditional load. If yes, record the constant as the constant store >>>>>>> source. It eliminates unrolled zero stores after memset 0 in a loop >>>>>>> where a vector register is used as the zero store source. >>>>>>> >>>>>>> gcc/ >>>>>>> >>>>>>> PR rtl-optimization/105638 >>>>>>> * dse.cc (record_store): Use the constant source if the source >>>>>>> register is set only once. >>>>>>> >>>>>>> gcc/testsuite/ >>>>>>> >>>>>>> PR rtl-optimization/105638 >>>>>>> * g++.target/i386/pr105638.C: New test. >>>>>>> --- >>>>>>> gcc/dse.cc | 22 ++++++++++++ >>>>>>> gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++ >>>>>>> 2 files changed, 66 insertions(+) >>>>>>> create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C >>>>>>> >>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc >>>>>>> index 30c11cee034..af8e88dac32 100644 >>>>>>> --- a/gcc/dse.cc >>>>>>> +++ b/gcc/dse.cc >>>>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info) >>>>>>> >>>>>>> if (tem && CONSTANT_P (tem)) >>>>>>> const_rhs = tem; >>>>>>> + else >>>>>>> + { >>>>>>> + /* If RHS is set only once to a constant, set CONST_RHS >>>>>>> + to the constant. */ >>>>>>> + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>>>> + if (def != nullptr >>>>>>> + && !DF_REF_IS_ARTIFICIAL (def) >>>>>>> + && !(DF_REF_FLAGS (def) >>>>>>> + & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) >>>>>>> + && !DF_REF_NEXT_REG (def)) >>>>>> Can we introduce a helper for this? There are already similar tests >>>>>> in ira and loop-iv, and it seems a bit too complex to have to open-code >>>>>> each time. >>>>> I can use find_single_def_src in loop-iv.cc: >>>>> >>>>> /* If REGNO has a single definition, return its known value, otherwise return >>>>> null. */ >>>>> >>>>> rtx >>>>> find_single_def_src (unsigned int regno) >>>> Yeah, reusing that sounds good. Perhaps we should move it into df-core.cc, >>>> alongside the df_reg_used group of functions. >>>> >>>> I think the mode check in your original patch should be kept though, >>>> so how about we change the parameter to an rtx reg and use rtx_equal in: >>>> >>>> rtx set = single_set (DF_REF_INSN (adef)); >>>> if (set == NULL || !REG_P (SET_DEST (set)) >>>> || REGNO (SET_DEST (set)) != regno) >>>> return NULL_RTX; >>> Fixed. >>> >>>> rather than the existing !REG_P and REGNO checks. We should also add: >>>> >>>>> and do >>>>> >>>>> /* If RHS is set only once to a constant, set CONST_RHS >>>>> to the constant. */ >>>>> rtx def_src = find_single_def_src (REGNO (rhs)); >>>>> if (def_src != nullptr && CONSTANT_P (def_src)) >>>>> { >>>>> df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); >>>>> if (!(DF_REF_FLAGS (def) >>>>> & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) >>>> …this check to the function, since it's needed for correctness even >>>> in the loop-iv.cc use. >>> Fixed. >>> >>>> Thanks, >>>> Richard >>> Here is the v3 patch. OK for master? >>> >>> Thanks. >>> >>> H.J. >>> --- >>> RTL DSE tracks redundant constant stores within a basic block. When RTL >>> loop invariant motion hoists a constant initialization out of the loop >>> into a separate basic block, the constant store value becomes unknown >>> within the original basic block. When recording store for RTL DSE, check >>> if the source register is set only once to a constant by a non-partial >>> unconditional load. If yes, record the constant as the constant store >>> source. It eliminates unrolled zero stores after memset 0 in a loop >>> where a vector register is used as the zero store source. >>> >>> Extract find_single_def_src from loop-iv.cc and move it to df-core.cc: >>> >>> 1. Rename to df_find_single_def_src. >>> 2. Change the argument to rtx and use rtx_equal_p. >>> 3. Return null for partial or conditional defs. >>> >>> gcc/ >>> >>> PR rtl-optimization/105638 >>> * df-core.cc (df_find_single_def_sr): Moved and renamed from >>> find_single_def_src in loop-iv.cc. Change the argument to rtx >>> and use rtx_equal_p. Return null for partial or conditional >>> defs. >>> * df.h (df_find_single_def_src): New prototype. >>> * dse.cc (record_store): Use the constant source if the source >>> register is set only once. >>> * loop-iv.cc (find_single_def_src): Moved to df-core.cc. >>> (replace_single_def_regs): Replace find_single_def_src with >>> df_find_single_def_src. >>> >>> gcc/testsuite/ >>> >>> PR rtl-optimization/105638 >>> * g++.target/i386/pr105638.C: New test. >> Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better >> here. REG_EQUIV indicates the destination could be replaced with the >> source of the note at any point and semantically the code would still be >> valid. REG_EQUAL doesn't give us that guarantee. >> >> To allow REG_EQUAL you have to check that the block with the note >> dominates the use. > When a use only has one non-conditional def which doesn't dominate > the use, isn't its behavior undefined? I think so, even for an irreducible loop.  Even so, it's the safest thing to do, particularly if someone tries to extend this code in the future. jeff