From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 47F993858CDB; Sat, 18 May 2024 10:06:51 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 47F993858CDB
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1716026811;
	bh=TZinEJJrjadXXuIgwfqy2WbJL40vcMotcAFTnZsodBs=;
	h=From:To:Subject:Date:From;
	b=A4XzzntivnMIHDIktAYSgMDuoXjPF9FxtvCCQjXQYQFqGnce0SfZsvZu/UN9koXC+
	 rJ56DW2Qm8vs5pJJ7zl/zlcb475z2uBfEPUU91Na7Alut+APU69zV3dQgDWPm16Mzn
	 hmnu89lCYiJ7nBqAiFFYQc1E4U+bg0EHY58cWQXA=
From: "slyfox at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/115146] New: [15 Regression] Incorrect 8-byte vectorization: psllw/psraw confusion
Date: Sat, 18 May 2024 10:06:50 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: slyfox at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115146

            Bug ID: 115146
           Summary: [15 Regression] Incorrect 8-byte vectorization:
                    psllw/psraw confusion
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: slyfox at gcc dot gnu.org
  Target Milestone: ---

Initially observed as a test failure in highway-1.0.7 built with the
r15-644-g7422e050f33dd9 compiler:

  HwyReverseTestGroup/HwyReverseTest.TestAllReverseLaneBytes/EMU128 FAILED

Self-contained example:

// $ cat bug.c
typedef unsigned char u8;

__attribute__((noipa))
static void fill_src(u8 * src) {
    src[0] = 0x00;
    src[1] = 0xff;
}

__attribute__((noipa))
static void assert_dst(const u8 * dst) {
    if (dst[0] != 0xff) __builtin_trap();
    if (dst[1] != 0x00) __builtin_trap();
}

int main() {
    u8 src[8] __attribute__((aligned(16))) = { 0 };
    u8 dst[8] __attribute__((aligned(16))) = { 0 };

    // place 0x00 into src[0] and 0xFF into src[1]
    fill_src(src);

    // swap bytes:
    // place 0xFF into dst[0], 0x00 into dst[1]
    for (unsigned long i = 0; i < 8; i += 2) {
        dst[i + 0] = src[i + 1];
        dst[i + 1] = src[i + 0];
    }

    // make sure bytes swapped
    assert_dst(dst);
}

Triggering:

$ gcc bug.c -o a -O1 && ./a
$ gcc bug.c -o a -O2 && ./a
Illegal instruction (core dumped)

The -O1 build passes; the -O2 build hits __builtin_trap() in assert_dst().

A bit of analysis:

Dump of assembler code for function main:
   0x0000000000401030 <+0>:     sub    $0x28,%rsp
   0x0000000000401034 <+4>:     mov    %rsp,%rdi
   0x0000000000401037 <+7>:     movq   $0x0,(%rsp)
   0x000000000040103f <+15>:    movq   $0x0,0x10(%rsp)
   0x0000000000401048 <+24>:    call   0x401170
   0x000000000040104d <+29>:    movq   (%rsp),%xmm0
   0x0000000000401052 <+34>:    lea    0x10(%rsp),%rdi
   0x0000000000401057 <+39>:    movdqa %xmm0,%xmm1
   0x000000000040105b <+43>:    psllw  $0x8,%xmm0
   0x0000000000401060 <+48>:    psraw  $0x8,%xmm1 ; <<<- why arithmetic shift? should be psrlw
   0x0000000000401065 <+53>:    por    %xmm0,%xmm1
   0x0000000000401069 <+57>:    movq   %xmm1,0x10(%rsp)
   0x000000000040106f <+63>:    call   0x401180
   0x0000000000401074 <+68>:    xor    %eax,%eax
   0x0000000000401076 <+70>:    add    $0x28,%rsp
   0x000000000040107a <+74>:    ret
End of assembler dump.

Here `psraw` should have been `psrlw`: the arithmetic shift replicates the
sign bit of each 16-bit lane into the byte that should have been zero-filled.

$ gcc -v
Using built-in specs.
COLLECT_GCC=/<>/gcc-15.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/<>/gcc-15.0.0/libexec/gcc/x86_64-unknown-linux-gnu/15.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../source/configure --prefix=/<>/gcc-15.0.0
--with-gmp-include=/<>/gmp-6.3.0-dev/include
--with-gmp-lib=/<>/gmp-6.3.0/lib
--with-mpfr-include=/<>/mpfr-4.2.1-dev/include
--with-mpfr-lib=/<>/mpfr-4.2.1/lib
--with-mpc=/<>/libmpc-1.3.1
--with-native-system-header-dir=/<>/glibc-2.39-52-dev/include
--with-build-sysroot=/
--with-gxx-include-dir=/<>/gcc-15.0.0/include/c++/15.0.0/
--program-prefix= --enable-lto --disable-libstdcxx-pch
--without-included-gettext --with-system-zlib --enable-checking=release
--enable-static --enable-languages=c,c++ --disable-multilib --enable-plugin
--disable-libcc1 --with-isl=/<>/isl-0.20 --disable-bootstrap
--build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu
--target=x86_64-unknown-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.0 99999999 (experimental) (GCC)
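
For reference, the effect of the wrong shift in the disassembly can be modeled in scalar C on a single 16-bit lane. This is only a sketch: swap_lane_logical, swap_lane_arithmetic and check_swap_models are illustrative names, not anything from GCC, and the model assumes the little-endian lane layout of x86_64, so src[0] = 0x00, src[1] = 0xff is the lane value 0xff00.

```c
#include <assert.h>
#include <stdint.h>

/* Byte swap of one 16-bit lane using a logical right shift,
   modeling the expected psllw $8 / psrlw $8 / por sequence. */
uint16_t swap_lane_logical(uint16_t lane)
{
    uint16_t lo = (uint16_t)(lane << 8);   /* low byte moves up */
    uint16_t hi = (uint16_t)(lane >> 8);   /* high byte moves down, zero-filled */
    return (uint16_t)(lo | hi);
}

/* Same swap with an arithmetic right shift, modeling psraw $8.
   Sign extension is emulated explicitly so the model does not depend on
   implementation-defined right shifts of negative signed values. */
uint16_t swap_lane_arithmetic(uint16_t lane)
{
    uint16_t lo = (uint16_t)(lane << 8);
    uint16_t hi = (uint16_t)(lane >> 8);
    if (lane & 0x8000u)
        hi |= 0xff00u;                     /* replicate the sign bit */
    return (uint16_t)(lo | hi);
}

/* Checks both models against the lane used in bug.c. */
void check_swap_models(void)
{
    /* Logical shift: bytes in memory become { 0xff, 0x00 }, as intended. */
    assert(swap_lane_logical(0xff00u) == 0x00ffu);
    /* Arithmetic shift: bytes in memory become { 0xff, 0xff }. */
    assert(swap_lane_arithmetic(0xff00u) == 0xffffu);
}
```

In the arithmetic model the second byte of the lane ends up 0xff instead of 0x00, which matches the dst[1] check in assert_dst() firing in the -O2 build.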