From: "ts.tomeksopel at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers.
Date: Fri, 30 Jul 2021 14:04:23 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

            Bug ID: 101693
           Summary: Terrible SIMD register allocation with a tight loop
                    operating on 8 registers.
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ts.tomeksopel at gmail dot com
  Target Milestone: ---

There are a few issues regarding unnecessary register spilling, but this also exhibits a lot of unnecessary juggling between registers.

See https://godbolt.org/z/da76fY1n7 and https://www.reddit.com/r/cpp_questions/comments/oui5tc/simd_what_to_do_when_your_compiler_forgets_how_to/

The gist is that there is a tight loop, executed a constant number of times (~64), that accumulates into 8 ymm registers, and only those 8 registers are used outside the loop. Before the loop they are assigned zeros, and after the loop a horizontal addition is performed. GCC generates suboptimal code, whereas clang gets it right. GCC seems to emit unnecessary movs in the following pattern: mov a -> b, vpdpbusd into b, mov b -> a.

All versions on godbolt >= 8.1 seem to exhibit the issue, including trunk.
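
For readers without the godbolt link at hand, the shape of the loop described above can be sketched roughly as follows. This is not the reporter's code: it is a portable scalar model in which std::array stands in for a ymm register and dpbusd() mimics the per-lane arithmetic of vpdpbusd. The names, the exact iteration count of 64, and the data layout (one shared 32-byte input block per iteration, one 32-byte weight block per accumulator) are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumAccumulators = 8;  // the 8 ymm registers from the report
constexpr std::size_t kLanes = 8;            // 8 x int32 lanes in a 256-bit register
constexpr std::size_t kBytesPerVec = 32;     // 256 bits
constexpr std::size_t kIterations = 64;      // "~64" in the report; exact value assumed

using Vec = std::array<std::int32_t, kLanes>;  // scalar stand-in for one ymm register

// Scalar model of one vpdpbusd: each 32-bit lane accumulates the dot product
// of four unsigned bytes from `a` with four signed bytes from `b`.
// (The real instruction wraps on overflow; this sketch assumes small values.)
inline void dpbusd(Vec& acc, const std::uint8_t* a, const std::int8_t* b) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        std::int32_t sum = 0;
        for (std::size_t k = 0; k < 4; ++k) {
            sum += static_cast<std::int32_t>(a[4 * lane + k]) *
                   static_cast<std::int32_t>(b[4 * lane + k]);
        }
        acc[lane] += sum;
    }
}

// Hypothetical kernel with the structure described in the report:
// zero 8 accumulators, update all of them in a fixed-count loop,
// then reduce each accumulator horizontally.
std::array<std::int32_t, kNumAccumulators>
accumulate(const std::uint8_t* inputs, const std::int8_t* weights) {
    assert(inputs != nullptr && weights != nullptr);

    std::array<Vec, kNumAccumulators> acc{};  // zeros assigned before the loop

    for (std::size_t i = 0; i < kIterations; ++i) {
        const std::uint8_t* in = inputs + i * kBytesPerVec;
        for (std::size_t r = 0; r < kNumAccumulators; ++r) {
            // Ideally each acc[r] stays in the same register for the whole loop.
            dpbusd(acc[r], in, weights + (i * kNumAccumulators + r) * kBytesPerVec);
        }
    }

    // Horizontal addition after the loop: one int32 per accumulator.
    std::array<std::int32_t, kNumAccumulators> out{};
    for (std::size_t r = 0; r < kNumAccumulators; ++r) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            out[r] += acc[r][lane];
        }
    }
    return out;
}
```

In the intrinsics version the inner statement would presumably be an accumulate-in-place update of the form acc[r] = dpbusd(acc[r], ...), which is where the mov a -> b, vpdpbusd into b, mov b -> a pattern shows up in GCC's output. Since only the 8 accumulators are live across iterations, no copies should be needed.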