Message-ID: <87f124f0-8a10-6c3b-6b12-cabf855e2e4b@yahoo.co.jp>
Date: Thu, 4 Aug 2022 21:35:14 +0900
From: Takayuki 'January June' Suwa
Subject: Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
To: Richard Sandiford
References: <7e3fe210-6dbc-fc29-dbb8-b951e89cf7e9@yahoo.co.jp>
Cc: GCC Patches

(Sorry for the repost; the previous copy was missing the Cc.)

Hi!

On 2022/08/04 18:49, Richard Sandiford wrote:
> Takayuki 'January June' Suwa writes:
>> Thanks for your response.
>>
>> On 2022/08/03 16:52, Richard Sandiford wrote:
>>> Takayuki 'January June' Suwa via Gcc-patches writes:
>>>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>>>> data flow consistent, but it also increases register allocation pressure
>>>> and thus often creates many unwanted register-to-register moves that
>>>> cannot be optimized away.
>>>
>>> There are two things here:
>>>
>>> - If emit_move_complex_parts emits a clobber of a hard register,
>>>   then that's probably a bug/misfeature.
>>>   The point of the clobber is
>>>   to indicate that the register has no useful contents.  That's useful
>>>   for wide pseudos that are written to in parts, since it avoids the
>>>   need to track the liveness of each part of the pseudo individually.
>>>   But it shouldn't be necessary for hard registers, since subregs of
>>>   hard registers are simplified to hard registers wherever possible
>>>   (which on most targets is "always").
>>>
>>>   So I think the emit_move_complex_parts clobber should be restricted
>>>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>>   (if only partly) then it would be worth doing as its own patch.
>>>
>>> - I think it'd be worth looking into more detail why a clobber makes
>>>   a difference to register pressure.  A clobber of a pseudo register R
>>>   shouldn't make R conflict with things that are live at the point of
>>>   the clobber.
>>
>> I agree that it is worth looking into.
>> In fact, leaving other ports aside, on the xtensa port RA is terribly
>> bad in code with frequent D[FC]mode pseudos.
>> For example, in __muldc3 in libgcc2, the size of the reserved stack
>> frame almost doubles depending on whether or not this patch is applied.
>
> Yeah, that's a lot.

It is a lot, though "almost double" might be an overstatement :)

BTW, after some quick experimentation, I found that turning on
-fsplit-wide-types-early mostly (but not completely) solves the problem.

The output was certainly not this bad in the past...

>
>>>> It seems just analogous to partial register
>>>> stall which is a famous problem on processors that do register renaming.
>>>>
>>>> In my opinion, when the register to be clobbered is a composite of hard
>>>> ones, we should clobber the individual elements separately, otherwise
>>>> clear the entire register to zero prior to use as the "init-regs" pass
>>>> does (like partial register stall workarounds on x86 CPUs).  Such
>>>> redundant zero constant assignments will be removed later in the
>>>> "cprop_hardreg" pass.
>>>
>>> I don't think we should rely on the zero being optimised away later.
>>>
>>> Emitting the zero also makes it harder for the register allocator
>>> to elide the move.  For example, if we have:
>>>
>>>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>>>
>>> then there is at least a chance that the RA could assign hard registers
>>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>>>
>>>   (set (reg:DI P) (const_int 0))
>>>
>>> beforehand then that becomes impossible, since R0 and R1 would then
>>> conflict with P.
>>
>> Ah, certainly, as you pointed out, for targets where "(reg:DI)"
>> corresponds to one hard register.
>
> I was thinking here about the case where (reg:DI …) corresponds to
> 2 hard registers.  Each subreg move is then a single hard register
> copy, but assigning P to the combination R0:R1 can remove both of
> the subreg moves.
>
>>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>>> a plan to stop doing that, but perhaps I misremember.
>>
>> Sorry, I am not sure about the status of LRA... because the xtensa port
>> is still using reload.
>
> Ah, hadn't realised that.  If you have time to work on it, it would be
> really good to move over to LRA.  There are plans to remove old reload.

Alas, you overestimate me :) I've only been working on GCC development
for a little over a year.
It would be a lie to say I'm not interested in it, but it is too much
for me.

>
> It might be that old reload *does* treat a pseudo clobber as a conflict.
> I can't remember now.  If so, then zeroing the register wouldn't be
> too bad (for old reload only).
>
>> In conclusion, trying to tweak the common code side may have been a bit
>> premature.
>> I'll consider whether I can deal with those issues on the side of the
>> target-specific code.
>
> It's likely to be at least partly a target-independent issue, so tweaking
> the common code makes sense in principle.
>
> Does adding !HARD_REGISTER_P (x) to:
>
>   /* Show the output dies here.  This is necessary for SUBREGs
>      of pseudos since we cannot track their lifetimes correctly;
>      hard regs shouldn't appear here except as return values.  */
>   if (!reload_completed && !reload_in_progress
>       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>     emit_clobber (x);
>
> in emit_move_complex_parts help?  If so, I think we should do at

Probably yes.  A quick test suggests that the above-mentioned modification
makes the ad-hoc fix I posted earlier
(https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596626.html)
a thing of the past.

> least that much.
>
> Thanks,
> Richard
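
For reference, the effect of the extra !HARD_REGISTER_P (x) test discussed
above can be sketched with a small standalone C model.  This is only a toy
under stated assumptions, not the actual change to emit_move_complex_parts:
toy_reg, TOY_FIRST_PSEUDO_REGISTER, toy_hard_register_p and
toy_should_clobber are hypothetical names, and the boolean parameters merely
stand in for REG_P (x), reg_overlap_mentioned_p (x, y) and the two reload
flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC internals; none of this is the real rtl.h API.
   The value 16 is an arbitrary, hypothetical target parameter.  */
#define TOY_FIRST_PSEUDO_REGISTER 16

struct toy_reg
{
  unsigned int regno;
};

/* Models the idea behind HARD_REGISTER_P: hard registers are the ones
   numbered below the first pseudo register.  */
static bool
toy_hard_register_p (const struct toy_reg *x)
{
  assert (x != NULL);
  return x->regno < TOY_FIRST_PSEUDO_REGISTER;
}

/* Models the quoted condition with the extra hard-register test added.
   X_IS_REG stands for REG_P (x) and OVERLAPS_SRC stands for
   reg_overlap_mentioned_p (x, y).  Returns true when a clobber would
   be emitted in this model.  */
static bool
toy_should_clobber (const struct toy_reg *x, bool x_is_reg,
                    bool overlaps_src, bool reload_completed,
                    bool reload_in_progress)
{
  return (!reload_completed && !reload_in_progress
          && x_is_reg && !toy_hard_register_p (x)
          && !overlaps_src);
}
```

In this model, a register numbered 2 counts as a hard register and gets no
clobber, while one numbered 40 counts as a pseudo and still gets one,
provided reload has not started and the source does not overlap the
destination.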