From: "gjl at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114252] New: Introducing bswapsi reduces code performance
Date: Wed, 06 Mar 2024 10:01:32 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

            Bug ID: 114252
           Summary: Introducing bswapsi reduces code performance
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57628
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57628&action=edit
GNU-C test case

typedef __UINT8_TYPE__ uint8_t;
typedef __UINT32_TYPE__ uint32_t;

typedef uint8_t __attribute__((vector_size(4))) v4u8_t;

uint32_t func1 (const uint8_t *buf)
{
    v4u8_t v4 = { buf[1], buf[0], buf[3], buf[2] };
    return (uint32_t) v4;
}

Compile the code with

$ avr-gcc code.c -S -Os -dp

With v13 the result is:

func1:
	mov r30,r24	 ;  37	[c=4 l=1]  movqi_insn/0
	mov r31,r25	 ;  38	[c=4 l=1]  movqi_insn/0
	ldd r22,Z+1	 ;  39	[c=4 l=1]  movqi_insn/3
	ld r23,Z	 ;  40	[c=4 l=1]  movqi_insn/3
	ldd r24,Z+3	 ;  41	[c=4 l=1]  movqi_insn/3
	ldd r25,Z+2	 ;  42	[c=4 l=1]  movqi_insn/3
/* epilogue start */
	ret		 ;  45	[c=0 l=1]  return

This is good code: insns 37 and 38 move the address into the pointer register Z, followed by four loads, one for each byte.

When compiled with v14, however, the result is:

func1:
	mov r30,r24	 ;  23	[c=4 l=2]  *movhi/0
	mov r31,r25
	ld r22,Z	 ;  24	[c=16 l=4]  *movsi/2
	ldd r23,Z+1
	ldd r24,Z+2
	ldd r25,Z+3
	rcall __bswapsi2	 ;  25	[c=16 l=1]  *bswapsi2.libgcc
	mov r31,r23	 ;  32	[c=4 l=1]  movqi_insn/0
	mov r23,r25	 ;  33	[c=4 l=1]  movqi_insn/0
	mov r25,r31	 ;  34	[c=4 l=1]  movqi_insn/0
	mov r31,r22	 ;  35	[c=4 l=1]  movqi_insn/0
	mov r22,r24	 ;  36	[c=4 l=1]  movqi_insn/0
	mov r24,r31	 ;  37	[c=4 l=1]  movqi_insn/0
/* epilogue start */
	ret		 ;  40	[c=0 l=1]  return

Here the four bytes are loaded in memory order, reversed by a call to __bswapsi2, and then insns 32 to 37 swap r23 with r25 and r22 with r24 to reach the required byte order, where v13 needed only the four byte loads.

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240303 (experimental) (GCC)
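Judging from the listings, the v14 code expresses the {1,0,3,2} byte shuffle as a 32-bit load, a full byte reverse (__bswapsi2) and a rotate by 16 (the six trailing mov insns swap the two 16-bit halves). The following portable C sketch checks that equivalence on plain integers. The helper names (permute_1032, load_le32, bswap32, rotate16, v14_sequence) are hypothetical, and the sketch models AVR's little-endian byte order with explicit shifts instead of the vector extension.

```c
#include <stdint.h>

/* Hypothetical helper: the value func1 returns on a little-endian
   target such as AVR, written with shifts instead of a vector cast. */
static uint32_t permute_1032 (const uint8_t *buf)
{
    return (uint32_t) buf[1]
         | (uint32_t) buf[0] << 8
         | (uint32_t) buf[3] << 16
         | (uint32_t) buf[2] << 24;
}

/* Little-endian 32-bit load, like the four-byte *movsi load in v14. */
static uint32_t load_le32 (const uint8_t *buf)
{
    return (uint32_t) buf[0]
         | (uint32_t) buf[1] << 8
         | (uint32_t) buf[2] << 16
         | (uint32_t) buf[3] << 24;
}

/* Portable byte reverse; stands in for the libgcc call __bswapsi2. */
static uint32_t bswap32 (uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}

/* Swap the 16-bit halves, which is what insns 32 to 37 do
   (r22 <-> r24 and r23 <-> r25). */
static uint32_t rotate16 (uint32_t x)
{
    return (x << 16) | (x >> 16);
}

/* The v14 sequence: load, byte-reverse, then rotate by 16. */
static uint32_t v14_sequence (const uint8_t *buf)
{
    return rotate16 (bswap32 (load_le32 (buf)));
}
```

For buf = { 0x11, 0x22, 0x33, 0x44 }, both permute_1032 and v14_sequence yield 0x33441122, the value that v13 assembles directly with four byte loads, so the bswap plus rotate detour buys nothing on this target.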