From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 7D88B3857BA7; Thu,  7 Mar 2024 10:47:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7D88B3857BA7
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1709808464;
	bh=/rVXc+sWXEsj4Utf+XJmlJLiOnLLI7Tx073PvCEcB6M=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=Q4K0EcvCf8PLXL1pWrxe5QerJWDhftd/pR9b22mN/KSyZwZNr1+MEtuRuYFMWjJda
	 ZgdcKVPQwtFfbBbJod5YoRQkO8OAzdXV/qFnsGE2TeH/NSqFJDuofqx6pxCpFomo+Z
	 I+mw9yOFUSAQ44dlJX0uIEEZpj1OxHREEpqn8leM=
From: "gjl at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114252] Introducing bswapsi reduces code performance
Date: Thu, 07 Mar 2024 10:47:43 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: gjl at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114252-4-KlSJzJFIci@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114252-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114252-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252
--- Comment #12 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #10)
> I think the target controls the "libcall" ABI that's used for calls to
> libgcc,

Do you have a pointer on how to do that, or an example? IIRC I looked into it
quite a while ago, and it didn't allow specifying or adjusting call_used_regs[]
etc.

> I think the target should implement an inline bswap, possibly via a
> define_insn_and_split or define_split so the byte ops are only exposed
> at a desired point;  important points being lower_subreg (split-wide-types)
> and register allocation - possibly lower_subreg should itself know
> how to handle bswap (though the degenerate AVR case is quite special).

That would result in SUBREGs all over the place.  As Vladimir pointed out in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5

DFA doesn't handle subregs properly, and register allocation then uses extra
reloads, bloating the code (not only in PR110093 but also in PR114243).  It is
unlikely that any pass will untangle the mess of four
(set (subreg:QI (SI)) (subreg:QI (SI))).
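
To illustrate what "four byte-wise sets" means at the source level, here is a
minimal C sketch.  The function name bswap32_bytewise is made up for this
example and is not part of GCC or libgcc; it only models a 32-bit byte swap
decomposed into four independent byte moves, which is roughly what a split
into four QImode subreg sets would express.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from GCC or libgcc): a 32-bit byte swap written
   as four independent byte moves.  Each assignment to dst[] corresponds to
   one byte-sized move between parts of two 32-bit values.  */
static uint32_t
bswap32_bytewise (uint32_t x)
{
  uint8_t src[4], dst[4];
  uint32_t result;

  memcpy (src, &x, sizeof src);   /* View the input as four bytes.  */
  dst[0] = src[3];
  dst[1] = src[2];
  dst[2] = src[1];
  dst[3] = src[0];
  memcpy (&result, dst, sizeof result);
  return result;
}
```

Reversing the byte order in memory gives the byte-swapped value independent
of the host's endianness.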



> Yeah.  Or comparing to open-coding the bswap without going through the call.
> I don't have a AVR libgcc around, but libgcc2.s has
>
> #ifdef L_bswapsi2
> SItype
> __bswapsi2 (SItype u)
> {
>   return ((((u) & 0xff000000u) >> 24)
>           | (((u) & 0x00ff0000u) >>  8)
>           | (((u) & 0x0000ff00u) <<  8)
>           | (((u) & 0x000000ffu) << 24));
> }
> #endif

The libgcc side is not a problem at all, libgcc/config/avr/lib1funcs.S has:

;; swap two registers with different register number
.macro bswap a, b
    eor \a, \b
    eor \b, \a
    eor \a, \b
.endm

#if defined (L_bswapsi2)
;; swap bytes
;; r25:r22 = bswap32 (r25:r22)
DEFUN __bswapsi2
    bswap r22, r25
    bswap r23, r24
    ret
ENDF __bswapsi2
#endif /* defined (L_bswapsi2) */

#if defined (L_bswapdi2)
;; swap bytes
;; r25:r18 = bswap64 (r25:r18)
DEFUN __bswapdi2
    bswap r18, r25
    bswap r19, r24
    bswap r20, r23
    bswap r21, r22
    ret
ENDF __bswapdi2
#endif /* defined (L_bswapdi2) */
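
For readers who don't speak AVR assembly, the two routines above can be
modelled in C roughly as follows.  This is only a sketch: the names xor_swap,
model_bswapsi2 and model_bswapdi2 are invented for illustration, and the
array r[] merely stands in for the registers (r[0] is the lowest register of
the operand, i.e. r22 resp. r18).

```c
#include <stdint.h>

/* C model of the "bswap" macro: exchange two bytes with three XORs.
   Like the macro, this requires a and b to be distinct objects;
   with a == b the value would be cleared to zero.  */
static void
xor_swap (uint8_t *a, uint8_t *b)
{
  *a ^= *b;
  *b ^= *a;
  *a ^= *b;
}

/* Hypothetical model of __bswapsi2; r[0]..r[3] stand in for r22..r25.  */
static uint32_t
model_bswapsi2 (uint32_t x)
{
  uint8_t r[4];
  uint32_t y = 0;

  for (int i = 0; i < 4; i++)
    r[i] = (uint8_t) (x >> (8 * i));

  xor_swap (&r[0], &r[3]);   /* bswap r22, r25 */
  xor_swap (&r[1], &r[2]);   /* bswap r23, r24 */

  for (int i = 0; i < 4; i++)
    y |= (uint32_t) r[i] << (8 * i);
  return y;
}

/* Hypothetical model of __bswapdi2; r[0]..r[7] stand in for r18..r25.  */
static uint64_t
model_bswapdi2 (uint64_t x)
{
  uint8_t r[8];
  uint64_t y = 0;

  for (int i = 0; i < 8; i++)
    r[i] = (uint8_t) (x >> (8 * i));

  xor_swap (&r[0], &r[7]);   /* bswap r18, r25 */
  xor_swap (&r[1], &r[6]);   /* bswap r19, r24 */
  xor_swap (&r[2], &r[5]);   /* bswap r20, r23 */
  xor_swap (&r[3], &r[4]);   /* bswap r21, r22 */

  for (int i = 0; i < 8; i++)
    y |= (uint64_t) r[i] << (8 * i);
  return y;
}
```

The XOR trick needs no scratch register, so the routines touch nothing
outside the argument registers.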


There's currently no handcrafted bswap16 though.