From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114252] Introducing bswapsi reduces code performance
Date: Thu, 07 Mar 2024 07:45:22 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #7 from Richard Biener ---
Note I do understand what you are saying, it's just that the middle-end, in detecting and using __builtin_bswap32, does what it does everywhere else: it checks whether the target implements the operation.

The middle-end doesn't try to actually compare costs (it has no idea of the bswapsi costs), and it most definitely doesn't see how AVR is special in having only QImode registers and thus the created SImode load (which the target supports!)
will end up as four registers.

To me a 'bswap' on AVR never makes sense, since whatever is swapped will _always_ be available as a set of byte registers. That's why I question AVR exposing bswapsi to the middle-end, rather than suggesting the middle-end should maybe check whether AVR has any regs of HImode or larger. Note that such a check would break for targets that could eventually do a byteswapped load-multiple into a set of QImode regs (I guess there's no such target in GCC at least), but it's the only heuristic that might work here.

The only case where AVR exposing bswapsi might make sense is users calling __builtin_bswap, but since it always expands as a libcall even that makes no sense.

So my preferred fix would be to remove bswapsi from avr.md? Does it benefit from recognizing bswap done with shifts on an int?
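
For illustration only (this is not the test case from the bug report, and all function names below are made up): a minimal C sketch of the kind of source idioms in question. The first function is a byte swap written with shifts and masks, the second assembles a big-endian value from individual bytes, and the third calls the builtin directly. On a target that provides bswapsi, the middle-end may rewrite the first two in terms of a 32-bit bswap (plus an SImode load for the second). Whether that happens depends on the GCC version, target and options.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: byte swap spelled out with shifts and masks.
   uint32_t is used because plain int is only 16 bits wide on AVR. */
static uint32_t swap_with_shifts(uint32_t x)
{
    return ((x & 0x000000FFu) << 24)
         | ((x & 0x0000FF00u) << 8)
         | ((x & 0x00FF0000u) >> 8)
         | ((x & 0xFF000000u) >> 24);
}

/* Hypothetical example: big-endian 32-bit value assembled byte by byte.
   On a little-endian target this is equivalent to a 32-bit load
   followed by a byte swap. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}

/* Explicit use of the GCC builtin. */
static uint32_t swap_with_builtin(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Checks that the three spellings agree on one sample value;
   returns 1 on success. */
static int self_check(void)
{
    const unsigned char bytes[4] = { 0x12, 0x34, 0x56, 0x78 };
    uint32_t v = 0x12345678u;

    assert(swap_with_shifts(v) == swap_with_builtin(v));
    assert(load_be32(bytes) == v);
    return 1;
}
```

Comparing the assembly for these three functions (for example with -O2 -S) with and without the bswapsi pattern in the backend would be one way to see what, if anything, the pattern buys on AVR.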