From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/108410] x264 averaging loop not optimized well for avx512
Date: Mon, 16 Jan 2023 08:07:37 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
   Last reconfirmed|                            |2023-01-16
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener ---
One issue is that we perform at most one epilogue loop
vectorization, so with AVX512 we vectorize the epilogue with AVX2, but that
epilogue's own epilogue remains unvectorized.  With AVX512 we'd want to use
a fully masked AVX512 epilogue instead.  I started working on fully masked
vectorization support for AVX512 but got distracted.

Another option would be to use SSE vectorization for the epilogue (note that
for SSE we vectorize the epilogue with 64-bit half-SSE vectors!), which would
mean giving the target (some) control over the mode used for vectorizing the
epilogue.  That is, in vect_analyze_loop change

      /* For epilogues start the analysis from the first mode.  The motivation
         behind starting from the beginning comes from cases where the
         VECTOR_MODES array may contain length-agnostic and length-specific
         modes.  Their ordering is not guaranteed, so we could end up picking
         a mode for the main loop that is after the epilogue's optimal mode.  */
      vector_modes[0] = autodetected_vector_mode;

to go through a target hook (possibly first producing a set of "candidate
modes" and letting the target prune it).  This might be an "easy" fix for
the AVX512 issue with low-trip-count loops.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations