From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114252] Introducing bswapsi reduces code performance
Date: Thu, 07 Mar 2024 10:56:05 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #13 from Richard Biener ---
(In reply to Georg-Johann Lay from comment #12)
> (In reply to Richard Biener from comment #10)
> > I think the target controls the "libcall" ABI that's used for calls to
> > libgcc,
>
> You have a pointer how to do it or an example?  IIRC I looked into it quite a
> while ago, and it didn't allow to specify/adjust call_used_regs[] etc.
>
> > I think the target should implement an inline bswap, possibly via a
> > define_insn_and_split or define_split so the byte ops are only exposed
> > at a desired point; important points being lower_subreg (split-wide-types)
> > and register allocation - possibly lower_subreg should itself know
> > how to handle bswap (though the degenerate AVR case is quite special).
>
> That would result in SUBREGs all over the place.  As Vladimir pointed out in
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5
>
> DFA doesn't handle subregs properly, and register alloc then uses extra
> reloads, bloating the code (not only in PR110093 but also 114243).  Unlikely
> any pass will untangle the mess of four (set (subreg:QI (SI)) (subreg:QI
> (SI)))

Yep.  Which is why I was toying with the thought of having (bswap:SI ..)
handled during reload itself ...

The alternative would be to have SImode hardregs by using consecutive
registers and special constraints.  That reduces RA freedom but it would
allow bswap:SI to be split after reload.  Or not split at all but emitted
directly as a sequence of those eor's - of course then making the assembly
quite big, not taking advantage of the fact that we can probably elide most
reg-reg moves.  So splitting after reload might allow the moves to be
eliminated and avoid the subreg DF problem.

That said, it probably needs (a lot of) experimenting.

What I've tried to communicate with the store-merging patch attempt is that
GIMPLE optimizations do not have enough information to decide whether a
bswap replacement is profitable or not.  Or at least there's no
sophisticated way I can think of that would work for AVR and other targets?
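
For illustration only, the two lowerings discussed above can be sketched in
plain C.  This is not GCC code and not an AVR machine description; the
si_regs type and all function names are hypothetical, and it assumes that
the "eor" sequence means XOR-based swaps of 8-bit registers.  The first
variant is the four byte moves (the four QImode subreg sets), the second
swaps in place with two XOR swaps of three eors each:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an SImode value on an 8-bit target:
   four separate byte "registers", b[0] being the least significant.  */
typedef struct { uint8_t b[4]; } si_regs;

static si_regs from_u32(uint32_t v)
{
    si_regs r;
    for (int i = 0; i < 4; i++)
        r.b[i] = (uint8_t)(v >> (8 * i));
    return r;
}

static uint32_t to_u32(si_regs r)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)r.b[i] << (8 * i);
    return v;
}

/* Variant 1: four byte moves into a fresh destination.  If the register
   allocator can pick the destination bytes freely, these moves are
   candidates for elimination.  */
static si_regs bswap_moves(si_regs src)
{
    si_regs dst;
    dst.b[0] = src.b[3];
    dst.b[1] = src.b[2];
    dst.b[2] = src.b[1];
    dst.b[3] = src.b[0];
    return dst;
}

/* XOR swap of two distinct bytes: three XORs, no scratch register.
   x and y must not alias, otherwise the byte is zeroed.  */
static void xor_swap(uint8_t *x, uint8_t *y)
{
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}

/* Variant 2: in-place swap with a fixed sequence of six XORs.  */
static void bswap_inplace(si_regs *r)
{
    xor_swap(&r->b[0], &r->b[3]);
    xor_swap(&r->b[1], &r->b[2]);
}

/* Both variants must agree with GCC's __builtin_bswap32.  */
static void self_check(uint32_t v)
{
    si_regs r = from_u32(v);
    assert(to_u32(bswap_moves(r)) == __builtin_bswap32(v));
    bswap_inplace(&r);
    assert(to_u32(r) == __builtin_bswap32(v));
}
```

Variant 1 costs nothing when the moves can be elided, whereas variant 2 has
a fixed size regardless of allocation, which is the trade-off described
above.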