From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id BC6333858D35; Thu,  7 Mar 2024 10:56:05 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BC6333858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1709808965;
	bh=hcLwltLqCssRVg/du3IcN+XHNtB9gddjdT7aHIxN9ow=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=tRp+3h4jUSQgKCWmKd7kwmNaRbi6sW4iGWArtAeZivX1M9NXnrrGhqapjndc+XxiX
	 TaSqTAdEfS8rXkhzUFtinH0IqyFqerI3eDPow/JOIneRw853DV295Xbw1bB8J8SKpx
	 eSN1iSjkppwGw+SGvu5raz9RhNjwK3o39zJQgGYo=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114252] Introducing bswapsi reduces code performance
Date: Thu, 07 Mar 2024 10:56:05 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114252-4-hPfISIJjRG@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114252-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114252-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Georg-Johann Lay from comment #12)
> (In reply to Richard Biener from comment #10)
> > I think the target controls the "libcall" ABI that's used for calls to
> > libgcc,
>
> You have a pointer how to do it or an example? IIRC I looked into it quite a
> while ago, and it didn't allow to specify/adjust call_used_regs[] etc.
>
> > I think the target should implement an inline bswap, possibly via a
> > define_insn_and_split or define_split so the byte ops are only exposed
> > at a desired point;  important points being lower_subreg (split-wide-types)
> > and register allocation - possibly lower_subreg should itself know
> > how to handle bswap (though the degenerate AVR case is quite special).
>
> That would result in SUBREGs all over the place.  As Vladimir pointed out in
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5
>
> DFA doesn't handle subregs properly, and register alloc then uses extra
> reloads, bloating the code (not only in PR110093 but also 114243.  Unlikely
> any pass will untangle the mess of four (set (subreg:QI (SI)) (subreg:QI
> (SI)))

Yep.  Which is why I was toying with the idea of having (bswap:SI ..)
handled during reload itself ...

The alternative would be to have SImode hardregs by using consecutive
registers and special constraints.  That reduces RA freedom but it would
allow bswap:SI to be split after reload.  Or it could be not split at all
but emitted directly as a sequence of those eor's - of course that makes
the assembly quite big and doesn't take advantage of the fact that we can
probably elide most reg-reg moves.  So splitting after reload might
allow the moves to be eliminated while avoiding the subreg DF issues.
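
To make the post-reload variant a bit more concrete: if the SImode value
sits in four consecutive 8-bit registers, bswap is just two byte
exchanges, and each exchange can be done with three eor's and no scratch
register.  A rough C model of that follows; si_regs, eor_swap and
bswap_si_in_place are names made up for illustration and are not anything
in the AVR backend.

```c
#include <stdint.h>

/* Hypothetical model: an SImode value living in four consecutive
   8-bit registers, r[0] being the low byte.  */
typedef struct { uint8_t r[4]; } si_regs;

/* Exchange two byte registers with three XORs and no scratch register.
   a and b must point to distinct registers, otherwise both end up 0.  */
static void eor_swap (uint8_t *a, uint8_t *b)
{
  *a ^= *b;   /* eor a, b */
  *b ^= *a;   /* eor b, a */
  *a ^= *b;   /* eor a, b */
}

/* bswap:SI done in place: exchange bytes 0 <-> 3 and 1 <-> 2.  */
static void bswap_si_in_place (si_regs *v)
{
  eor_swap (&v->r[0], &v->r[3]);
  eor_swap (&v->r[1], &v->r[2]);
}
```

That is six eor's when source and destination are the same register
group.  When they differ, the same operation is four plain byte moves,
which is where eliding reg-reg moves would come in.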

That said, it probably needs (a lot of) experimenting.

What I've tried to communicate with the store-merging patch attempt is
that GIMPLE optimizations do not have enough information to decide whether
a bswap replacement is profitable or not.  Or at least there's no
sophisticated way I can think of that would work for AVR and other targets?
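
For illustration only, this is the kind of source-level choice I mean.
The two functions below compute the same value; swap_open_coded and
swap_builtin are names picked for this sketch, and the actual testcase
in the PR may look different.  Whether turning the first form into the
second is a win depends on how the target expands bswap:SI, which is
not visible at the GIMPLE level.

```c
#include <stdint.h>

/* Open-coded byte swap: a shift/mask/or chain of the sort that can be
   recognized as a 32-bit bswap.  */
uint32_t swap_open_coded (uint32_t x)
{
  return ((x & 0x000000ffu) << 24)
       | ((x & 0x0000ff00u) << 8)
       | ((x & 0x00ff0000u) >> 8)
       | ((x & 0xff000000u) >> 24);
}

/* The same operation expressed through the GCC builtin.  */
uint32_t swap_builtin (uint32_t x)
{
  return __builtin_bswap32 (x);
}
```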