From: "gnu_bugzilla_gcc at catelyn dot tech"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/111166] New: gcc unnecessarily creates vector operations for packing 32 bit integers into struct (x86_64)
Date: Sat, 26 Aug 2023 18:29:00 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111166

            Bug ID: 111166
           Summary: gcc unnecessarily creates vector operations for packing 32 bit integers into struct (x86_64)
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gnu_bugzilla_gcc at catelyn dot tech
  Target Milestone: ---

Created attachment 55799
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55799&action=edit
preprocessed file that triggers the bug, as requested

GCC version: gcc version 13.2.1 20230801 (GCC)
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror

Command used: gcc -v -save-temps weird_gcc_behaviour.c -o weird_gcc_behaviour.s -S -O3 -mtune=generic -march=x86-64
(the same behaviour is observed with -O2)

The command prints nothing to stdout or stderr and exits with code 0.

When compiling `turn_into_struct`, a simple function that packs four 32-bit unsigned integer arguments into a struct holding four such integers and passes it on to `do_smth_with_4_u32`, GCC at -O2 or -O3 generates a couple of vector instructions (`punpckldq` and `punpcklqdq`) and also spills onto the stack. This does not seem like a good idea to me performance-wise.

When compiled at -Os, GCC instead uses `salq`, `movl` (to ensure the upper 32 bits are cleared) and `orq` to pack the data together, avoiding memory altogether. Intuitively, this seems like a significantly faster implementation to me, since it touches neither SSE registers nor memory.
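The preprocessed test case itself is only in attachment 55799, so the following is a hypothetical reconstruction for illustration, not the reporter's file: the names `struct four_u32`, its fields, the `uint32_t` return types, and the helpers `pack_u32_pair` and `eightbyte` are all assumptions. It relies on the System V x86-64 calling convention, under which a 16-byte struct of four `uint32_t` is passed in two general-purpose registers, each holding two fields. `pack_u32_pair` is the scalar computation the -Os `movl`/`salq`/`orq` sequence performs for one such register, and `eightbyte` reads the struct's memory image so the two can be compared (this comparison assumes a little-endian target such as x86-64).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the struct in the attached test case. */
struct four_u32 {
    uint32_t a, b, c, d;
};

/* Four uint32_t fields, no padding: 16 bytes, i.e. two eightbytes. */
static_assert(sizeof(struct four_u32) == 16, "expected two eightbytes");

/* Hypothetical callee. The original is presumably defined elsewhere;
   noinline keeps this copy out of line, loosely mimicking an external call. */
__attribute__((noinline))
uint32_t do_smth_with_4_u32(struct four_u32 s)
{
    return s.a ^ s.b ^ s.c ^ s.d;
}

/* Packs four arguments into the struct and forwards it, as the report describes. */
uint32_t turn_into_struct(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    struct four_u32 s = { a, b, c, d };
    return do_smth_with_4_u32(s);
}

/* Scalar equivalent of the -Os sequence for one eightbyte: zero-extend the
   low field (movl), shift the high field left by 32 (salq), combine (orq). */
uint64_t pack_u32_pair(uint32_t lo, uint32_t hi)
{
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Returns eightbyte `idx` (0 or 1) of the struct's in-memory image. */
uint64_t eightbyte(struct four_u32 s, int idx)
{
    uint64_t w;
    memcpy(&w, (const unsigned char *)&s + 8 * idx, sizeof w);
    return w;
}
```

On a little-endian target, `eightbyte(s, 0) == pack_u32_pair(s.a, s.b)` and `eightbyte(s, 1) == pack_u32_pair(s.c, s.d)`, which is why two shift/or pairs are enough and no vector shuffle or stack round trip is needed. Whether this reconstruction reproduces exactly the reported -O2/-O3 code is not guaranteed; the attachment remains the authoritative test case.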