From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99912] Unnecessary / inefficient spilling of AVX2 ymm registers
Date: Tue, 06 Apr 2021 08:25:24 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*

--- Comment #3 from Richard Biener ---
Which function contains the loop kernel? I see some lambdas in Z4c_RHS, compiled as out-of-line functions, that look like they could be the actual kernels. In apply_upwind_diss I see cases without any stack usage.
I'm looking at compiles with -O2 -march=skylake.

Note that with C++ it is easy for some abstraction to survive optimization, so stack accesses can be misinterpreted as spilling when they are really aggregates that were not eliminated. For example, in one of the lambdas I see

  _61489 = __builtin_ia32_maskloadpd256 (_104487, _61513);
  D.545024[1].elts.car = _61489;
  ...
  MEM[(struct vect *)&D.544982].elts._M_elems[1] = MEM[(const struct simd &)&D.545024 + 32];
  ...
  MEM[(struct mat3 *)&vars + 992B] = MEM[(const struct mat3 &)&D.544982];

and D.544982 is later indexed with a variable index in some code using MIN/MAX and FMA (instead of 'vars' being used there).

Looking at the output of -fdump-tree-optimized sometimes points at such problems.

That said, the code is large, so please point at some source lines within the important kernel(s) (in the preprocessed source) and state the compile options used.
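To illustrate the pattern described above, here is a minimal, hypothetical sketch; it is not the reporter's code. The types simd and vect only mirror the names seen in the dump, and plain double stands in for the AVX vector type. The first function copies into a local aggregate and then indexes that copy with a run-time index; the second reads the original object directly. Whether GCC actually keeps the temporary in memory depends on the version and flags, so compare the two with something like g++ -O2 -march=skylake -fdump-tree-optimized rather than relying on this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the simd/vect wrappers seen in the dump.
struct simd { double car; };
struct vect { std::array<simd, 3> elts; };

// Copies into a local aggregate (like D.544982), then indexes the copy
// with a run-time index. A variable index can keep scalar replacement of
// aggregates from splitting 'tmp' into registers, so 'tmp' may stay on the
// stack: memory traffic that looks like spilling but is not.
double max_component_copy(const vect& v, std::size_t n) {
    assert(n >= 1 && n <= v.elts.size());
    vect tmp = v;  // aggregate copy
    double m = tmp.elts[0].car;
    for (std::size_t i = 1; i < n; ++i)
        m = std::max(m, tmp.elts[i].car);  // variable index into the copy
    return m;
}

// Same computation, reading the original object instead of a copy.
double max_component_direct(const vect& v, std::size_t n) {
    assert(n >= 1 && n <= v.elts.size());
    double m = v.elts[0].car;
    for (std::size_t i = 1; i < n; ++i)
        m = std::max(m, v.elts[i].car);
    return m;
}
```

Both functions return the same value for the same input (e.g. for elements 1.0, 3.0, 2.0 and n = 3 both yield 3.0); only the generated code may differ, which is what the tree dump helps to reveal.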