From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 73C9E3858C20 for ; Wed, 3 Aug 2022 09:09:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 73C9E3858C20 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AAD7913D5; Wed, 3 Aug 2022 02:09:17 -0700 (PDT) Received: from [10.57.13.163] (unknown [10.57.13.163]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7D7B13F67D; Wed, 3 Aug 2022 02:09:16 -0700 (PDT) Message-ID: Date: Wed, 3 Aug 2022 10:09:14 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [PATCH v2] cselib: add function to check if SET is redundant [PR106187] Content-Language: en-GB To: Jeff Law , gcc-patches@gcc.gnu.org References: <9e9b7ec5-4c10-115d-8b6d-a7a07bfd72be@gmail.com> <5df3ffd0-9a15-f4ad-325b-1507107a3504@foss.arm.com> <7a7a8864-49d9-b0f0-604b-ec5d98466200@gmail.com> From: Richard Earnshaw In-Reply-To: <7a7a8864-49d9-b0f0-604b-ec5d98466200@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3489.1 required=5.0 tests=BAYES_00, BODY_8BITS, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, NICE_REPLY_A, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Aug 2022 09:09:19 -0000 On 03/08/2022 00:36, Jeff Law wrote: > > > On 8/2/2022 10:06 AM, Richard Earnshaw wrote: >> >> >> On 01/08/2022 11:38, Richard Earnshaw via Gcc-patches wrote: >>> >>> >>> On 30/07/2022 20:57, Jeff Law via Gcc-patches wrote: >>>> >>>> >>>> On 7/29/2022 7:52 AM, Richard Earnshaw via Gcc-patches wrote: >>>>> A SET operation that writes memory may have the same value as an >>>>> earlier store but if the alias sets of the new and earlier store do >>>>> not conflict then the set is not truly redundant.  This can happen, >>>>> for example, if objects of different types share a stack slot. >>>>> >>>>> To fix this we define a new function in cselib that first checks for >>>>> equality and if that is successful then finds the earlier store in the >>>>> value history and checks the alias sets. >>>>> >>>>> The routine is used in two places elsewhere in the compiler. Firstly >>>>> in cfgcleanup and secondly in postreload. >>>>> >>>>> gcc/ChangeLog: >>>>>     * alias.h (mems_same_for_tbaa_p): Declare. >>>>>     * alias.cc (mems_same_for_tbaa_p): New function. >>>>>     * dse.cc (record_store): Use it instead of open-coding >>>>>     alias check. >>>>>     * cselib.h (cselib_redundant_set_p): Declare. >>>>>     * cselib.cc: Include alias.h >>>>>     (cselib_redundant_set_p): New function. >>>>>     * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead >>>>>     of rtx_equal_for_cselib_p. >>>>>     * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p. >>>>>     (reload_cse_noop_set_p): Delete. >>>> Seems quite reasonable.   The only question I would have would be >>>> whether or not you considered including the aliasing info into the >>>> hashing used by cselib.  You'd probably still need the bulk of this >>>> patch as well since we could presumably still get a hash conflict >>>> with two stores of the same value to the same location, but with >>>> different alias sets (it's just much less likely), so perhaps it >>>> doesn't really buy us anything. >>> >>> I thought about this, but if the alias set were included in the hash, >>> then surely you'd get every alias set in a different value.  Then >>> you'd miss the cases where the alias sets do conflict even though >>> they are not the same.  Anyway, the values /are/ the same so in some >>> circumstances you might want to know that. >>> >>>> >>>> Ideally this would include a testcase.  You might be able to turn >>>> that non-executawble reduced case into something useful by scanning >>>> the post-reload dumps. >>> >>> I considered this as well, but the testcase I have is far too >>> fragile, I think.  The existing test only fails on Arm, only fails on >>> 11.2 (not 11.3 or gcc-12 onwards), relies on two objects with the >>> same value being in distinct alias sets but still assigned to the >>> same stack slot and for some copy dance to end up trying to write >>> back the original value to the same slot but with a non-conflicting >>> set.  And finally, the scheduler has to then try to move a load past >>> the non-aliasing store. >> >> >>> >>> To get anywhere close to this I think we'd need something akin to the >>> gimple reader but for RTL so that we could set up all the conditions >>> for the failure without the risk of an earlier transform blowing the >>> test away. >> >> I wasn't aware of the rtl reader already in the compiler.  But it >> doesn't really get me any closer as it is lacking in so many regards: >> >> - It can't handle (const_double:SF ...) - it tries to handle the >> argument as an int.  This is a consequence, I think, of the reader >> being based on that for reading machine descriptions where FP >> const_double is simply never encountered. >> >> - It doesn't seem to handle anything much more than very basic types, >> and in particular appears to have no way of ensuring that alias sets >> match up with the type system. >> >>> >>> I even considered whether we could start with a gimple dump and >>> bypassing all the tree/gimple transformations, but even that would be >>> still at the mercy of the stack-slot allocation algorithm. >> >> I spent a while trying to get some gimple out of the dumpers in a form >> that was usable, but that's pretty much a non-starter.  To make it >> work we'd need to add support for gimple clobbers on objects - without >> that there's no way to get the stack-slot sharing code to work. >> Furthermore, even feeding fully-optimized gimple directly into expand >> is such a long way from the postreload pass, that I can't believe the >> testcase would remain stable for long. >> >> And the other major issue is that the original testcase is heavily >> templated C++ and neither of the parsers gimple or rtl is supported in >> cc1plus: converting the boilerplate to be C-friendly is probably going >> to be hard. >> >> I can't afford to spend much more time on this, especially given the >> low-quality test we're going to get out of the end of the process. > Understood.  Let's just go with the patch as-is.  That's normal for > cases where we can't produce a reasonable test. > Thanks, committed to trunk. Will work on backports if it doesn't throw up any issues in the next few days. R. > jeff