Subject: [Bug target/114252] Introducing bswapsi reduces code performance
From: gjl at gcc dot gnu.org
Date: Thu, 07 Mar 2024 10:47:43 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #12 from Georg-Johann Lay ---
(In reply to Richard Biener from comment #10)
> I think the target controls the "libcall" ABI that's used for calls to
> libgcc,

Do you have a pointer on how to do that, or an example? IIRC I looked into it quite a while ago, and it didn't allow specifying or adjusting call_used_regs[] etc.
> I think the target should implement an inline bswap, possibly via a
> define_insn_and_split or define_split so the byte ops are only exposed
> at a desired point; important points being lower_subreg (split-wide-types)
> and register allocation - possibly lower_subreg should itself know
> how to handle bswap (though the degenerate AVR case is quite special).

That would result in SUBREGs all over the place. As Vladimir pointed out in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5
DFA doesn't handle subregs properly, and register allocation then uses extra reloads, bloating the code (not only in PR110093 but also in PR114243). It is unlikely that any pass will untangle the mess of four

(set (subreg:QI (SI))
     (subreg:QI (SI)))

> Yeah. Or comparing to open-coding the bswap without going through the call.
> I don't have a AVR libgcc around, but libgcc2.s has
>
> #ifdef L_bswapsi2
> SItype
> __bswapsi2 (SItype u)
> {
>   return ((((u) & 0xff000000u) >> 24)
>           | (((u) & 0x00ff0000u) >>  8)
>           | (((u) & 0x0000ff00u) <<  8)
>           | (((u) & 0x000000ffu) << 24));
> }
> #endif

The libgcc side is not a problem at all; libgcc/config/avr/lib1funcs.S has:

;; swap two registers with different register number
.macro bswap a, b
    eor \a, \b
    eor \b, \a
    eor \a, \b
.endm

#if defined (L_bswapsi2)
;; swap bytes
;; r25:r22 = bswap32 (r25:r22)
DEFUN __bswapsi2
    bswap r22, r25
    bswap r23, r24
    ret
ENDF __bswapsi2
#endif /* defined (L_bswapsi2) */

#if defined (L_bswapdi2)
;; swap bytes
;; r25:r18 = bswap64 (r25:r18)
DEFUN __bswapdi2
    bswap r18, r25
    bswap r19, r24
    bswap r20, r23
    bswap r21, r22
    ret
ENDF __bswapdi2
#endif /* defined (L_bswapdi2) */

There's currently no handcrafted bswap16, though.
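To make the equivalence between the two library versions concrete, here is a minimal host-side C model (not AVR code). It assumes the avr-gcc convention that an SImode argument lives in r22 (least significant byte) through r25, as the "r25:r22" comment above suggests, and models those registers as a byte array. All function names (bswap32_ref, xor_swap, bswap32_avr_model, bswap16_sketch) are hypothetical and chosen only for illustration. The model shows that two XOR-based byte swaps, as done by the "bswap" macro, give the same result as the shift/mask formula from libgcc2, and why the macro requires two *different* registers: XOR-swapping a register with itself clears it.

```c
#include <assert.h>
#include <stdint.h>

/* Shift/mask formula as quoted from libgcc2's __bswapsi2.  */
uint32_t bswap32_ref (uint32_t u)
{
  return (((u & 0xff000000u) >> 24)
          | ((u & 0x00ff0000u) >>  8)
          | ((u & 0x0000ff00u) <<  8)
          | ((u & 0x000000ffu) << 24));
}

/* Model of the AVR "bswap a, b" macro: three EORs exchange two bytes
   without a scratch register.  The operands must be distinct, since
   a ^= a would zero the value; the assert documents that constraint.  */
void xor_swap (uint8_t *a, uint8_t *b)
{
  assert (a != b);
  *a ^= *b;
  *b ^= *a;
  *a ^= *b;
}

/* Model of __bswapsi2 from lib1funcs.S (assumed register layout:
   r[0] = r22 = LSB ... r[3] = r25 = MSB).  */
uint32_t bswap32_avr_model (uint32_t u)
{
  uint8_t r[4];
  for (int i = 0; i < 4; i++)
    r[i] = (uint8_t) (u >> (8 * i));   /* load bytes into "registers" */

  xor_swap (&r[0], &r[3]);             /* bswap r22, r25 */
  xor_swap (&r[1], &r[2]);             /* bswap r23, r24 */

  uint32_t v = 0;
  for (int i = 0; i < 4; i++)
    v |= (uint32_t) r[i] << (8 * i);   /* reassemble the SImode value */
  return v;
}

/* Hypothetical open-coded 16-bit swap, since there is no handcrafted
   bswap16 in the AVR libgcc.  */
uint16_t bswap16_sketch (uint16_t u)
{
  return (uint16_t) ((u >> 8) | (u << 8));
}
```

For example, both bswap32_ref (0x12345678) and bswap32_avr_model (0x12345678) yield 0x78563412, and bswap16_sketch (0x1234) yields 0x3412. On the actual target the byte array corresponds to hard registers, so the AVR version needs no memory traffic at all; this model only checks the arithmetic, not code size or register allocation.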