From: Richard Sandiford
To: Thomas Schwinge
Cc: jlaw@ventanamicro.com, rdapp.gcc@gmail.com, gcc-patches@gcc.gnu.org, Tom de Vries, Roger Sayle
Subject: Re: nvptx vs.
 [PATCH] Add a late-combine pass [PR106594]
In-Reply-To: <87ed8i2ekt.fsf@euler.schwinge.ddns.net> (Thomas Schwinge's message of "Fri, 28 Jun 2024 00:41:54 +0200")
References: <87r0citjoy.fsf@euler.schwinge.ddns.net> <87r0ci2kt2.fsf@euler.schwinge.ddns.net> <87jzia2ict.fsf@euler.schwinge.ddns.net> <87ed8i2ekt.fsf@euler.schwinge.ddns.net>
Date: Fri, 28 Jun 2024 15:01:30 +0100

Thomas Schwinge writes:
> Hi!
>
> On 2024-06-27T23:20:18+0200, I wrote:
>> On 2024-06-27T22:27:21+0200, I wrote:
>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>> On 2023-10-24T19:49:10+0100, Richard Sandiford wrote:
>>>>> This patch adds a combine pass that runs late in the pipeline.
>>>
>>> [After sending, I realized I replied to a previous thread of this work.]
>>>
>>>> I've been looking a bit through recent nvptx target code generation
>>>> changes for GCC target libraries, and thought I'd also share here my
>>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>>>
>>>> First the unexpected thing:
>>>
>>> So much for "unexpected thing" -- next level of unexpected here...
>>> Appreciated if anyone feels like helping me find my way through this, but
>>> I totally understand if you've got other things to do.
>>
>> OK, I found something already.  (Unexpectedly quickly...)  ;-)
>>
>>>> there are a few cases where we now see unused
>>>> registers get declared
>
>> But in fact, for both cases
>
> Now tested: 's%both%all'.  :-)
>
>> the unexpected difference goes away if after
>> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.
>> That's normally run
>> as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but that's
>> all not active for nvptx target given '!reload_completed', given nvptx is
>> 'targetm.no_register_allocation'.  Maybe we need to enable a few more
>> passes, or is there anything in 'pass_late_combine' to change, so that we
>> don't run into this?  Does it inadvertently mark registers live or
>> something like that?
>
> Basically, is 'pass_late_combine' potentially doing things that depend
> on later clean-up?  (..., or shouldn't it be doing these things in the
> first place?)

It's possible that late-combine could expose dead code, but I imagine
it's a niche case.

I had a look at the nvptx logs from my comparison, and the cases in
which I saw this seemed to be those where late-combine doesn't find
anything to do.  Does that match your examples?  Specifically,
the effect should be the same with -fdbg-cnt=late_combine:0-0

I think what's happening is that:

- combine exposes dead code

- ce2 previously ran df_analyze with DF_LR_RUN_DCE set, and so
  cleared up the dead code

- late-combine instead runs df_analyze without that flag (since
  late-combine itself doesn't really care whether dead code is present)

- if late-combine doesn't do anything, ce2's df_analyze call has
  nothing to do, and skips even the DCE

The easiest fix would be to add:

  df_set_flags (DF_LR_RUN_DCE);

before df_analyze in late-combine.cc, so that it behaves like ce2.
But the arrangement feels wrong.  I would have expected
DF_LR_RUN_DCE to depend on whether df_analyze had been called
since the last DCE pass (whether DF_LR_RUN_DCE or a full DCE).

Thanks,
Richard
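
For illustration only, the suspected sequence above can be written as a
toy C++ model.  This is a sketch under stated assumptions, not GCC code:
ToyDataflow, toy_combine, toy_late_combine, toy_ce2 and
dead_after_pipeline are hypothetical names, the run_dce argument stands
in for DF_LR_RUN_DCE, and the "skip everything when nothing is dirty"
behaviour is the hypothesis from the list above rather than something
checked against the df sources.

```cpp
#include <cassert>

// Toy model only: these names mimic, but are not, GCC's df API.
struct ToyDataflow {
  bool solutions_dirty = false;  // true when liveness must be recomputed
  int dead_insns = 0;            // dead instructions currently in the function

  // Stand-in for df_analyze.  run_dce plays the role of DF_LR_RUN_DCE.
  void analyze(bool run_dce) {
    if (!solutions_dirty)
      return;                    // nothing to recompute, so DCE is skipped too
    if (run_dce)
      dead_insns = 0;            // DCE piggybacks on the liveness update
    solutions_dirty = false;
  }
};

// combine leaves one dead instruction behind and dirties the solution.
void toy_combine(ToyDataflow &df) {
  df.dead_insns += 1;
  df.solutions_dirty = true;
}

// late-combine that finds nothing to do: it only analyzes.
void toy_late_combine(ToyDataflow &df, bool run_dce) {
  df.analyze(run_dce);
}

// ce2 always asks for DCE as part of its analysis.
void toy_ce2(ToyDataflow &df) {
  df.analyze(true);
}

// Returns the number of dead instructions left after the toy pipeline.
int dead_after_pipeline(bool have_late_combine, bool late_combine_runs_dce) {
  ToyDataflow df;
  toy_combine(df);
  if (have_late_combine)
    toy_late_combine(df, late_combine_runs_dce);
  toy_ce2(df);
  return df.dead_insns;
}
```

In this model, dead_after_pipeline (false, false) leaves no dead code,
dead_after_pipeline (true, false) leaves the dead instruction in place
because ce2's analysis has nothing left to recompute, and
dead_after_pipeline (true, true) corresponds to the suggested
df_set_flags fix.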