From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 008F13842438; Fri, 19 Mar 2021 23:30:41 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 008F13842438 From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper() Date: Fri, 19 Mar 2021 23:30:41 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 10.2.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org X-Bugzilla-Target-Milestone: 10.3 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Mar 2021 23:30:42 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D99563 --- Comment #6 from CVS Commits --- The releases/gcc-10 branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:788da80413551fe1a1411c700864640b590dcfc5 commit r10-9486-g788da80413551fe1a1411c700864640b590dcfc5 Author: Jakub Jelinek Date: Tue Mar 16 11:16:15 2021 +0100 i386: Fix up _mm256_vzeroupper() handling [PR99563] My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling when the implicit vzeroupper handling is disabled. The epilogue_completed splitter for vzeroupper now adds clobbers for all registers which don't have explicit sets in the pattern and the sets are added during vzeroupper pass. Before my changes, for explicit user vzeroupper, we just weren't modelling its effects at all, it was just unspec that didn't tell that it clobbers the upper parts of all XMM < %xmm16 registers. But now the splitter will even for those add clobbers and as it has no sets, it will add clobbers for all registers, which means we optimize away anything that lived across that vzeroupper. The vzeroupper pass has two parts, one is the mode switching that compu= tes where to put the implicit vzeroupper calls and puts them there, and then another that uses df to figure out what sets to add to all the vzeroupp= er. The former part should be done only under the conditions we have in the gate, but the latter as this PR shows needs to happen either if we perf= orm the implicit vzeroupper additions, or if there are (or could be) any explicit vzeroupper instructions. As that function does df_analyze and walks the whole IL, I think it would be too expensive to run it always whenever TARGET_AVX, so this patch remembers if we've expanded at least one __builtin_ia32_vzeroupper in the function and runs that part of the vzeroupper pass both when the old condition is true or when this new flag is set. 2021-03-16 Jakub Jelinek PR target/99563 * config/i386/i386.h (struct machine_function): Add has_explicit_vzeroupper bitfield. * config/i386/i386-expand.c (ix86_expand_builtin): Set cfun->machine->has_explicit_vzeroupper when expanding IX86_BUILTIN_VZEROUPPER. * config/i386/i386-features.c (rest_of_handle_insert_vzeroupper= ): Do the mode switching only when TARGET_VZEROUPPER, expensive optimizations turned on and not optimizing for size. (pass_insert_vzeroupper::gate): Enable even when cfun->machine->has_explicit_vzeroupper is set. * gcc.target/i386/avx-pr99563.c: New test. (cherry picked from commit 82085eb3d44833bd1557fdd932c4738d987f559d)=