From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id CE5513852748 for ; Tue, 2 Aug 2022 16:06:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CE5513852748 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 109411477; Tue, 2 Aug 2022 09:06:05 -0700 (PDT) Received: from [10.2.79.48] (unknown [10.2.79.48]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E6B2B3F70D; Tue, 2 Aug 2022 09:06:03 -0700 (PDT) Message-ID: Date: Tue, 2 Aug 2022 17:06:02 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [PATCH v2] cselib: add function to check if SET is redundant [PR106187] Content-Language: en-GB To: Jeff Law , gcc-patches@gcc.gnu.org References: <9e9b7ec5-4c10-115d-8b6d-a7a07bfd72be@gmail.com> <5df3ffd0-9a15-f4ad-325b-1507107a3504@foss.arm.com> From: Richard Earnshaw In-Reply-To: <5df3ffd0-9a15-f4ad-325b-1507107a3504@foss.arm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3489.2 required=5.0 tests=BAYES_00, BODY_8BITS, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, NICE_REPLY_A, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Aug 2022 16:06:07 -0000 On 01/08/2022 11:38, Richard Earnshaw via Gcc-patches wrote: > > > On 30/07/2022 20:57, Jeff Law via Gcc-patches wrote: >> >> >> On 7/29/2022 7:52 AM, Richard Earnshaw via Gcc-patches wrote: >>> A SET operation that writes memory may have the same value as an >>> earlier store but if the alias sets of the new and earlier store do >>> not conflict then the set is not truly redundant.  This can happen, >>> for example, if objects of different types share a stack slot. >>> >>> To fix this we define a new function in cselib that first checks for >>> equality and if that is successful then finds the earlier store in the >>> value history and checks the alias sets. >>> >>> The routine is used in two places elsewhere in the compiler. Firstly >>> in cfgcleanup and secondly in postreload. >>> >>> gcc/ChangeLog: >>>     * alias.h (mems_same_for_tbaa_p): Declare. >>>     * alias.cc (mems_same_for_tbaa_p): New function. >>>     * dse.cc (record_store): Use it instead of open-coding >>>     alias check. >>>     * cselib.h (cselib_redundant_set_p): Declare. >>>     * cselib.cc: Include alias.h >>>     (cselib_redundant_set_p): New function. >>>     * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead >>>     of rtx_equal_for_cselib_p. >>>     * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p. >>>     (reload_cse_noop_set_p): Delete. >> Seems quite reasonable.   The only question I would have would be >> whether or not you considered including the aliasing info into the >> hashing used by cselib.  You'd probably still need the bulk of this >> patch as well since we could presumably still get a hash conflict with >> two stores of the same value to the same location, but with different >> alias sets (it's just much less likely), so perhaps it doesn't really >> buy us anything. > > I thought about this, but if the alias set were included in the hash, > then surely you'd get every alias set in a different value.  Then you'd > miss the cases where the alias sets do conflict even though they are not > the same.  Anyway, the values /are/ the same so in some circumstances > you might want to know that. > >> >> Ideally this would include a testcase.  You might be able to turn that >> non-executawble reduced case into something useful by scanning the >> post-reload dumps. > > I considered this as well, but the testcase I have is far too fragile, I > think.  The existing test only fails on Arm, only fails on 11.2 (not > 11.3 or gcc-12 onwards), relies on two objects with the same value being > in distinct alias sets but still assigned to the same stack slot and for > some copy dance to end up trying to write back the original value to the > same slot but with a non-conflicting set.  And finally, the scheduler > has to then try to move a load past the non-aliasing store. > > To get anywhere close to this I think we'd need something akin to the > gimple reader but for RTL so that we could set up all the conditions for > the failure without the risk of an earlier transform blowing the test away. I wasn't aware of the rtl reader already in the compiler. But it doesn't really get me any closer as it is lacking in so many regards: - It can't handle (const_double:SF ...) - it tries to handle the argument as an int. This is a consequence, I think, of the reader being based on that for reading machine descriptions where FP const_double is simply never encountered. - It doesn't seem to handle anything much more than very basic types, and in particular appears to have no way of ensuring that alias sets match up with the type system. > > I even considered whether we could start with a gimple dump and > bypassing all the tree/gimple transformations, but even that would be > still at the mercy of the stack-slot allocation algorithm. I spent a while trying to get some gimple out of the dumpers in a form that was usable, but that's pretty much a non-starter. To make it work we'd need to add support for gimple clobbers on objects - without that there's no way to get the stack-slot sharing code to work. Furthermore, even feeding fully-optimized gimple directly into expand is such a long way from the postreload pass, that I can't believe the testcase would remain stable for long. And the other major issue is that the original testcase is heavily templated C++ and neither of the parsers gimple or rtl is supported in cc1plus: converting the boilerplate to be C-friendly is probably going to be hard. I can't afford to spend much more time on this, especially given the low-quality test we're going to get out of the end of the process. > >> >> Jeff > > R. R.