From: "meissner at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110960] TestSatWidenMulPairwiseAdd in the Google Highway test suite fails when compiled with GCC 12 or later with the -mcpu=power9 option
Date: Fri, 29 Mar 2024 18:11:32 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.3.0
X-Bugzilla-Keywords: needs-bisection, wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110960

Michael Meissner changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |meissner at gcc dot gnu.org

--- Comment #12 from Michael Meissner ---
The test case actually shows that GCC was generating incorrect code on power8, and that power9 is doing the right thing.
But the test case was written assuming the previous behavior was correct.

TL;DR: power8 generated STVX instead of STXVD2X. Power9 generates STXV.

To explain the issue, we need to go back in history.

PowerPC processors (and Power before them) were originally designed for big endian environments. The Altivec instruction set had limited vector store and load instructions (STVX and LVX), which ignored the bottom 4 bits of the address. STVX and LVX did the correct byte swapping when the PowerPC was running in little endian mode.

When power7 came out with the VSX instruction set, the vector store and load instructions STXVD2X, STXVW4X, LXVD2X, and LXVW4X were added. These instructions could store and load all 64 VSX registers (32 registers that overlap the floating point registers, and 32 registers that overlap the traditional Altivec registers). However, they only store and load values using big endian element ordering.

After power8 came out, PowerPC Linux systems moved from big endian to little endian. This meant that we had to do an explicit swap after every vector load, and swap the value before every vector store.

We added an optimization to GCC for the special case of storing and loading temporaries on the stack: there we used the Altivec STVX and LVX instructions and eliminated the swap instructions, since we could ensure that all stack temporaries were correctly aligned. We could not use STVX and LVX in general, because these instructions ignore the bottom 4 bits of the address, and because they restrict the vector registers to the 32 VSX registers that overlap the Altivec registers.

When power9 came out, we added new vector store and load instructions (STXV, STXVX, LXV, and LXVX) that do the correct byte swapping on little endian systems.
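The two hazards described above, LVX/STVX silently dropping the low 4 address bits and the power7/power8 VSX stores writing elements in big endian order on a little endian system, can be sketched as a tiny C++ model. This is an illustration only, not GCC or hardware code: every function name is hypothetical, and the model tracks just the effective address of an Altivec access and the order of the two 64-bit doublewords when a vector is written to memory.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, simplified model: it tracks only where an Altivec access
// lands and the order of the two 64-bit doublewords in memory.

// LVX/STVX ignore the low 4 bits of the effective address, so the access
// is silently moved down to the enclosing 16-byte boundary.
constexpr std::uint64_t altivec_effective_address(std::uint64_t ea) {
  return ea & ~std::uint64_t{0xF};
}

// A vector register viewed as two doublewords, indexed in the order a
// little endian program expects them to appear in memory.
using Vec2x64 = std::array<std::uint64_t, 2>;

// Power7/power8 VSX store (e.g. STXVD2X) in little endian mode: the
// doublewords are written in big endian element order, so they land in
// memory exchanged relative to what the program expects.
constexpr Vec2x64 store_be_element_order(const Vec2x64& reg) {
  return Vec2x64{reg[1], reg[0]};
}

// The compensating swap the compiler has to emit before such a store
// (and after the matching load) on little endian power8.
constexpr Vec2x64 swap_doublewords(const Vec2x64& reg) {
  return Vec2x64{reg[1], reg[0]};
}

// Power9 STXV/STXVX: element order already matches little endian memory,
// so no extra swap is needed.
constexpr Vec2x64 store_le_element_order(const Vec2x64& reg) {
  return reg;
}

// Compile-time check: an unaligned address is rounded down to 16 bytes.
static_assert(altivec_effective_address(0x1008) == 0x1000,
              "low 4 address bits are dropped");
```

Because swap_doublewords is its own inverse, a swap followed by store_be_element_order writes the elements in the order the program expects, which is the pairing the power8 code generation relies on.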
GCC now generates these power9 instructions and no longer needs the special code that uses the Altivec STVX and LVX instructions.

In the test case, VerifyVecEqToActual takes two vector arguments, creates two 16-byte arrays, and stores each vector into its array. It uses reinterpret_cast so that each copy is done with a vector store instruction.

However, since the arrays are temporaries on the stack, on power8 this store uses the Altivec STVX instruction, and the value gets byte swapped.
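The store-then-compare pattern in VerifyVecEqToActual can be sketched portably as below. This is not Highway's code: V8I16, StoreToArray and LanesMatchArray are hypothetical names, the vector type uses the GCC/Clang vector_size extension, and memcpy stands in for the reinterpret_cast the real test uses. The sketch only shows the property a correct store has to preserve, lane i of the vector landing in array slot i; the analysis above attributes the power8 misbehavior to the STVX path used for these stack temporaries.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit vector of eight 16-bit lanes, built
// with the GCC/Clang vector extension rather than Highway's own types.
typedef std::int16_t V8I16 __attribute__((vector_size(16)));

constexpr std::size_t kLanes = sizeof(V8I16) / sizeof(std::int16_t);

// Copy the vector into a 16-byte array, as the test helper does for each of
// its two arguments. memcpy is used instead of reinterpret_cast because it
// is well-defined C++; optimizing compilers typically emit a vector store.
void StoreToArray(V8I16 v, std::int16_t (&out)[kLanes]) {
  std::memcpy(out, &v, sizeof v);
}

// The property a correct store must preserve: lane i ends up in slot i.
bool LanesMatchArray(V8I16 v, const std::int16_t (&arr)[kLanes]) {
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (v[i] != arr[i]) return false;
  }
  return true;
}
```

With a correct store, LanesMatchArray returns true on any target; a store that exchanges or reverses elements, as described for the power8 stack-temporary case, would make it return false.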