From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 82E823858C00 for ; Thu, 4 Aug 2022 09:49:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 82E823858C00 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D666D11FB; Thu, 4 Aug 2022 02:49:34 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id ADC5B3F67D; Thu, 4 Aug 2022 02:49:33 -0700 (PDT) From: Richard Sandiford To: Takayuki 'January June' Suwa Mail-Followup-To: Takayuki 'January June' Suwa , GCC Patches , richard.sandiford@arm.com Cc: GCC Patches Subject: Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))" References: <7e3fe210-6dbc-fc29-dbb8-b951e89cf7e9@yahoo.co.jp> Date: Thu, 04 Aug 2022 10:49:32 +0100 In-Reply-To: <7e3fe210-6dbc-fc29-dbb8-b951e89cf7e9@yahoo.co.jp> (Takayuki Suwa's message of "Wed, 3 Aug 2022 20:17:23 +0900") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-46.3 required=5.0 tests=BAYES_00, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Aug 2022 09:49:36 -0000 Takayuki 'January June' Suwa writes: > Thanks for your response. > > On 2022/08/03 16:52, Richard Sandiford wrote: >> Takayuki 'January June' Suwa via Gcc-patches w= rites: >>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps >>> data flow consistent, but it also increases register allocation pressure >>> and thus often creates many unwanted register-to-register moves that >>> cannot be optimized away. >>=20 >> There are two things here: >>=20 >> - If emit_move_complex_parts emits a clobber of a hard register, >> then that's probably a bug/misfeature. The point of the clobber is >> to indicate that the register has no useful contents. That's useful >> for wide pseudos that are written to in parts, since it avoids the >> need to track the liveness of each part of the pseudo individually. >> But it shouldn't be necessary for hard registers, since subregs of >> hard registers are simplified to hard registers wherever possible >> (which on most targets is "always"). >>=20 >> So I think the emit_move_complex_parts clobber should be restricted >> to !HARD_REGISTER_P, like the lower-subreg clobber is. If that helps >> (if only partly) then it would be worth doing as its own patch. >>=20 >> - I think it'd be worth looking into more detail why a clobber makes >> a difference to register pressure. A clobber of a pseudo register R >> shouldn't make R conflict with things that are live at the point of >> the clobber. > > I agree with its worth. > In fact, aside from other ports, on the xtensa one, RA in code with frequ= ent D[FC]mode pseudos is terribly bad. > For example, in __muldc3 on libgcc2, the size of the stack frame reserved= will almost double depending on whether or not this patch is applied. Yeah, that's a lot. >>> It seems just analogous to partial register >>> stall which is a famous problem on processors that do register renaming. >>> >>> In my opinion, when the register to be clobbered is a composite of hard >>> ones, we should clobber the individual elements separetely, otherwise >>> clear the entire to zero prior to use as the "init-regs" pass does (like >>> partial register stall workarounds on x86 CPUs). Such redundant zero >>> constant assignments will be removed later in the "cprop_hardreg" pass. >>=20 >> I don't think we should rely on the zero being optimised away later. >>=20 >> Emitting the zero also makes it harder for the register allocator >> to elide the move. For example, if we have: >>=20 >> (set (subreg:SI (reg:DI P) 0) (reg:SI R0)) >> (set (subreg:SI (reg:DI P) 4) (reg:SI R1)) >>=20 >> then there is at least a chance that the RA could assign hard registers >> R0:R1 to P, which would turn the moves into nops. If we emit: >>=20 >> (set (reg:DI P) (const_int 0)) >>=20 >> beforehand then that becomes impossible, since R0 and R1 would then >> conflict with P. > > Ah, surely, as you pointed out for targets where "(reg: DI)" corresponds = to one hard register. I was thinking here about the case where (reg:DI =E2=80=A6) corresponds to 2 hard registers. Each subreg move is then a single hard register copy, but assigning P to the combination R0:R1 can remove both of the subreg moves. >> TBH I'm surprised we still run init_regs for LRA. I thought there was >> a plan to stop doing that, but perhaps I misremember. > > Sorry I am not sure about the status of LRA... because the xtensa port is= still using reload. Ah, hadn't realised that. If you have time to work on it, it would be really good to move over to LRA. There are plans to remove old reload. It might be that old reload *does* treat a pseudo clobber as a conflict. I can't remember now. If so, then zeroing the register wouldn't be too bad (for old reload only). > As conclusion, trying to tweak the common code side may have been a bit p= remature. > I'll consider if I can deal with those issues on the side of the target-s= pecific code. It's likely to be at least partly a target-independent issue, so tweaking the common code makes sense in principle. Does adding !HARD_REGISTER_P (x) to: /* Show the output dies here. This is necessary for SUBREGs of pseudos since we cannot track their lifetimes correctly; hard regs shouldn't appear here except as return values. */ if (!reload_completed && !reload_in_progress && REG_P (x) && !reg_overlap_mentioned_p (x, y)) emit_clobber (x); in emit_move_complex_parts help? If so, I think we should do at least that much. Thanks, Richard