From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 2C077385141F; Mon, 21 Jun 2021 02:30:00 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2C077385141F
From: "luoxhu at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
Date: Mon, 21 Jun 2021 02:29:59 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 8.3.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: luoxhu at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 21 Jun 2021 02:30:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #8 from luoxhu at gcc dot gnu.org ---
(In reply to Jens Seifert from comment #7)
> Regarding vec_revb for vector unsigned int. I agree that
> revb:
> .LFB0:
>         .cfi_startproc
>         vspltish %v1,8
>         vspltisw %v0,-16
>         vrlh %v2,%v2,%v1
>         vrlw %v2,%v2,%v0
>         blr
>
> works. But in this case, I would prefer the vperm approach assuming that the
> loaded constant for the permute vector can be re-used multiple times.
> But please get rid of the xxlnor 32,32,32. That does not make sense after
> loading a constant. Change the constant that need to be loaded.

The xxlnor is an LE-specific requirement (it is not generated when building
with -mbig). For vperm on LE we need to turn the index {0,1,2,3} into
{31,30,29,28}; without that step the result is incorrect:

6|  0x0000000010000630 <+16>: lvx    v0,0,r9
7+> 0x0000000010000634 <+20>: xxlnor vs32,vs32,vs32
8|  0x0000000010000638 <+24>: vperm  v2,v2,v2,v0
9|  0x000000001000063c <+28>: blr

(gdb)
0x0000000010000634 in revb ()
2: /x $vs34.uint128 = 0x42345678323456782234567812345678
5: /x $vs32.uint128 = 0xc0d0e0f08090a0b0405060700010203
(gdb) si
0x0000000010000638 in revb ()
2: /x $vs34.uint128 = 0x42345678323456782234567812345678
5: /x $vs32.uint128 = 0xf3f2f1f0f7f6f5f4fbfaf9f8fffefdfc
(gdb) si
0x000000001000063c in revb ()
2: /x $vs34.uint128 = 0x78563442785634327856342278563412
5: /x $vs32.uint128 = 0xf3f2f1f0f7f6f5f4fbfaf9f8fffefdfc

Quoted from the ISA:

vperm VRT,VRA,VRB,VRC

src.qword[0] ← VSR[VRA+32]
src.qword[1] ← VSR[VRB+32]
do i = 0 to 15
   index ← VSR[VRC+32].byte[i].bit[3:7]
   VSR[VRT+32].byte[i] ← src.byte[index]
end

Let the source vector be the concatenation of the contents of VSR[VRA+32]
followed by the contents of VSR[VRB+32].

For each integer value i from 0 to 15, do the following.
Let index be the value specified by bits 3:7 of byte element i of
VSR[VRC+32].
The contents of byte element index of src are placed into byte element i
of VSR[VRT+32].
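
To make the effect of the complement easier to check by hand, here is a
minimal C sketch that models vperm at the byte level, following the ISA
pseudocode quoted above, and replays the register values from the gdb
session. It is only an illustration, not GCC code: the names vperm_model,
xxlnor_self, V2 and V0_LOADED are hypothetical, and bytes are stored most
significant first, matching gdb's uint128 view. For a 5-bit index x,
(~x & 0x1f) equals 31 - x, which is how {0,1,2,3} becomes {31,30,29,28}.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-level model of vperm per the ISA pseudocode.  Byte 0 is the most
   significant byte of the 128-bit register value.  */
static void vperm_model(uint8_t vrt[16], const uint8_t vra[16],
                        const uint8_t vrb[16], const uint8_t vrc[16])
{
    uint8_t src[32], out[16];

    memcpy(src, vra, 16);          /* src.qword[0] <- VRA */
    memcpy(src + 16, vrb, 16);     /* src.qword[1] <- VRB */
    for (int i = 0; i < 16; i++)
        out[i] = src[vrc[i] & 0x1f];   /* bits 3:7 are the low five bits */
    memcpy(vrt, out, 16);          /* copy last so vrt may alias an input */
}

/* Model of "xxlnor x,x,x": bitwise complement of every byte.  */
static void xxlnor_self(uint8_t v[16])
{
    for (int i = 0; i < 16; i++)
        v[i] = (uint8_t)~v[i];
}

/* Register contents taken from the gdb session (vs34 and vs32 after lvx). */
static const uint8_t V2[16] = {
    0x42, 0x34, 0x56, 0x78, 0x32, 0x34, 0x56, 0x78,
    0x22, 0x34, 0x56, 0x78, 0x12, 0x34, 0x56, 0x78 };
static const uint8_t V0_LOADED[16] = {
    0x0c, 0x0d, 0x0e, 0x0f, 0x08, 0x09, 0x0a, 0x0b,
    0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03 };

/* Returns 1 if lvx + xxlnor + vperm reproduces the value gdb shows.  */
static int revb_with_xxlnor_matches_gdb(void)
{
    static const uint8_t expected[16] = {
        0x78, 0x56, 0x34, 0x42, 0x78, 0x56, 0x34, 0x32,
        0x78, 0x56, 0x34, 0x22, 0x78, 0x56, 0x34, 0x12 };
    uint8_t v0[16], out[16];

    memcpy(v0, V0_LOADED, 16);
    xxlnor_self(v0);
    vperm_model(out, V2, V2, v0);
    return memcmp(out, expected, 16) == 0;
}

/* Returns 1 if skipping xxlnor merely reverses the word order, leaving
   the bytes inside each word unswapped (i.e. not a byte reverse).  */
static int revb_without_xxlnor_swaps_words(void)
{
    static const uint8_t expected[16] = {
        0x12, 0x34, 0x56, 0x78, 0x22, 0x34, 0x56, 0x78,
        0x32, 0x34, 0x56, 0x78, 0x42, 0x34, 0x56, 0x78 };
    uint8_t out[16];

    vperm_model(out, V2, V2, V0_LOADED);
    return memcmp(out, expected, 16) == 0;
}

/* Runs both checks; aborts if the model disagrees with the gdb session. */
static void run_checks(void)
{
    assert(revb_with_xxlnor_matches_gdb());
    assert(revb_without_xxlnor_swaps_words());
}
```

Calling run_checks() from a small main() exercises both cases. Under this
model, feeding the loaded constant to vperm unmodified gives
0x12345678223456783234567842345678, so the words change places but no
bytes are reversed within a word.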