From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114252] Introducing bswapsi reduces code performance
Date: Thu, 07 Mar 2024 09:42:07 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #11 from Richard Biener ---
diff --git a/gcc/gimple-ssa-store-merging.cc b/gcc/gimple-ssa-store-merging.cc
index 42b68abf61b..c9d4662656f 100644
--- a/gcc/gimple-ssa-store-merging.cc
+++ b/gcc/gimple-ssa-store-merging.cc
@@ -170,6 +170,7 @@
 #include "optabs-tree.h"
 #include "dbgcnt.h"
 #include "selftest.h"
+#include "regs.h"
 
 /* The maximum size (in bits) of the stores this pass should generate.  */
 #define MAX_STORE_BITSIZE (BITS_PER_WORD)
@@ -1484,7 +1485,8 @@ maybe_optimize_vector_constructor (gimple *cur_stmt)
       break;
     case 32:
       if (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
-          && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing)
+          && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing
+          && have_regs_of_mode[SImode])
         {
           load_type = uint32_type_node;
           fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32);
@@ -1545,7 +1547,8 @@ pass_optimize_bswap::execute (function *fun)
   tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE;
 
   bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
-               && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing);
+               && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing
+               && have_regs_of_mode[SImode]);
   bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64)
                && (optab_handler (bswap_optab, DImode) != CODE_FOR_nothing
                    || (bswap32_p && word_mode == SImode)));

doesn't work.  AVR has regs of SImode.  There doesn't seem to be a way
to query the (maximum?) number of hardregs used for a mode.  Using

  bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
               && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing
               && have_regs_of_mode[SImode]
               && hard_regno_nregs (0, SImode) == 1);

"works" but is surely wrong (whatever hardreg zero corresponds to).

Looking only at word_mode, requiring word_mode size >= SImode size like
with

  bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
               && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing
               && known_ge (GET_MODE_SIZE (word_mode),
                            GET_MODE_SIZE (SImode)));

"works" but would affect many more targets.  Maybe
&& word_mode != QImode is better.

Note that this will cut off _all_ bswap detection.
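To make the difference between the candidate guards easier to see, here is
a minimal sketch in plain C.  It does not use any GCC internals: struct
toy_target, the guard_* functions and the three example targets are all
hypothetical stand-ins, and the field values (for instance that the
AVR-like target needs four hard registers for an SImode value and provides
a bswapsi pattern) are assumptions chosen to mirror the discussion above.

```c
#include <stdbool.h>

/* Toy target description.  Every field is a hypothetical stand-in for
   information GCC derives from the backend.  */
struct toy_target
{
  unsigned word_bytes;   /* size of word_mode in bytes */
  bool has_bswapsi;      /* models optab_handler (bswap_optab, SImode)
                            != CODE_FOR_nothing */
  bool has_si_regs;      /* models have_regs_of_mode[SImode] */
  unsigned si_nregs;     /* hard registers needed for one SImode value */
};

enum { SI_BYTES = 4 };

/* Guard from the patch: optab plus have_regs_of_mode.  */
bool
guard_have_regs (const struct toy_target *t)
{
  return t->has_bswapsi && t->has_si_regs;
}

/* Variant additionally requiring SImode to fit in one hard register.  */
bool
guard_single_reg (const struct toy_target *t)
{
  return guard_have_regs (t) && t->si_nregs == 1;
}

/* Variant requiring word_mode to be at least as wide as SImode.  */
bool
guard_word_ge_si (const struct toy_target *t)
{
  return t->has_bswapsi && t->word_bytes >= SI_BYTES;
}

/* Variant only excluding targets whose word_mode is QImode.  */
bool
guard_word_not_qi (const struct toy_target *t)
{
  return t->has_bswapsi && t->word_bytes != 1;
}

/* Assumed example targets: 8-bit (AVR-like), 16-bit and 32-bit words.  */
const struct toy_target avr_like = { 1, true, true, 4 };
const struct toy_target word16   = { 2, true, true, 2 };
const struct toy_target word32   = { 4, true, true, 1 };
```

In this model the have_regs_of_mode guard still accepts the AVR-like
target, the word-size guard rejects both the 8-bit and the 16-bit target,
and the word_mode != QImode guard rejects only the 8-bit one.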
Thus my question on the profitability of detecting cases like those in
libgcc2.c, which in that case produces

__bswapsi2:
        push r12
        push r13
        push r14
        push r15
        push r16
        push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 6 */
.L__stack_usage = 6
        mov r16,r22
        mov r17,r23
        mov r18,r24
        mov r19,r25
        mov r22,r19
        clr r23
        clr r24
        clr r25
        mov r15,r16
        clr r14
        clr r13
        clr r12
        or r22,r12
        or r23,r13
        or r24,r14
        or r25,r15
        mov r12,r17
        mov r13,r18
        mov r14,r19
        clr r15
        clr r12
        clr r14
        clr r15
        or r22,r12
        or r23,r13
        or r24,r14
        or r25,r15
        mov r19,r18
        mov r18,r17
        mov r17,r16
        clr r16
        clr r16
        clr r17
        clr r19
        or r22,r16
        or r23,r17
        or r24,r18
        or r25,r19
/* epilogue start */
        pop r17
        pop r16
        pop r15
        pop r14
        pop r13
        pop r12
        ret

bswap detection does not try to do any sophisticated evaluation of costs.
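For readers unfamiliar with what the pass looks for, the following is a
sketch of the kind of open-coded shift-and-mask byte swap that bswap
detection can recognize, next to the builtin it would be rewritten to.
The function names are hypothetical, and the open-coded body is written in
the style of the libgcc2.c implementation rather than copied from it.

```c
#include <stdint.h>

/* Open-coded 32-bit byte swap using shifts and masks.  */
uint32_t
swap_bytes_open_coded (uint32_t u)
{
  return ((u & 0xff000000u) >> 24)
         | ((u & 0x00ff0000u) >> 8)
         | ((u & 0x0000ff00u) << 8)
         | ((u & 0x000000ffu) << 24);
}

/* The same operation expressed through the GCC builtin.  */
uint32_t
swap_bytes_builtin (uint32_t u)
{
  return __builtin_bswap32 (u);
}
```

Both functions compute the same value; which form yields better code on a
given target is exactly the cost question raised above.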