From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id BD4CD3858CDB; Mon, 16 Jan 2023 08:07:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BD4CD3858CDB
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1673856458;
	bh=/4h4J2HLZxO+Rhsh12rn+cyym+x8yalBVblgLn1AIm0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=txhF3cdWP7VTcY+o0x4CINSmV8cuetGi9wxNLSOiZ+dq9A7/rHogbp4BPUsNU+CGH
	 qBAkneTOZH2nt2lRQIgTTKMLxYTGV2nkmh83Cz1KNxu6M0N5mPFaGFdgGUkNSetq8V
	 JjCa/MPjvIn0lUO5kJFQPT2hyWmRQu8dz1MpZQpM=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/108410] x264 averaging loop not optimized well for
 avx512
Date: Mon, 16 Jan 2023 08:07:37 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: blocked cf_reconfirmed_on cf_gcctarget keywords cc
 everconfirmed bug_status
Message-ID: <bug-108410-4-jo2RTcRzuE@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108410-4@http.gcc.gnu.org/bugzilla/>
References: <bug-108410-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
   Last reconfirmed|                            |2023-01-16
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
One issue is that we perform at most one epilogue loop vectorization, so with
AVX512 we vectorize the epilogue with AVX2 but the epilogue of that AVX2 loop
remains unvectorized.  With AVX512 we'd want to use a fully masked epilogue using
unvectorized.  With AVX512 we'd want to use a fully masked epilogue using
AVX512 instead.
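
To make the cost concrete, here is a rough sketch in plain C of the loop
structure described above.  It is not vectorizer output: the rounding
byte average is assumed from the bug title, the lane counts (64 for
AVX512, 32 for AVX2) assume 8-bit elements, and the names avg_bytes and
iteration_split are made up for illustration.  The inner loops stand in
for single vector operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lane counts for 8-bit elements: 64 per AVX512 vector,
   32 per AVX2 vector.  */
enum { MAIN_VF = 64, EPILOGUE_VF = 32 };

/* Hypothetical bookkeeping: how many iterations each stage handled.  */
struct iteration_split
{
  size_t main_vector;     /* covered by the main vector loop */
  size_t epilogue_vector; /* covered by the vectorized epilogue */
  size_t scalar;          /* left to the unvectorized epilogue */
};

static struct iteration_split
avg_bytes (uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
  struct iteration_split split = { 0, 0, 0 };
  size_t i = 0;

  /* Main loop: whole chunks of MAIN_VF elements.  */
  for (; i + MAIN_VF <= n; i += MAIN_VF)
    {
      for (size_t j = 0; j < MAIN_VF; j++)
        dst[i + j] = (uint8_t) ((a[i + j] + b[i + j] + 1) >> 1);
      split.main_vector += MAIN_VF;
    }

  /* The one vectorized epilogue: runs at most once, since fewer than
     MAIN_VF elements remain.  */
  for (; i + EPILOGUE_VF <= n; i += EPILOGUE_VF)
    {
      for (size_t j = 0; j < EPILOGUE_VF; j++)
        dst[i + j] = (uint8_t) ((a[i + j] + b[i + j] + 1) >> 1);
      split.epilogue_vector += EPILOGUE_VF;
    }

  /* Epilogue of the epilogue: stays scalar, up to EPILOGUE_VF - 1
     iterations.  */
  for (; i < n; i++)
    {
      dst[i] = (uint8_t) ((a[i] + b[i] + 1) >> 1);
      split.scalar++;
    }

  return split;
}
```

With n = 31 nothing is vectorized at all in this model, which is the
low-trip case that hurts.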

I started working on fully masked vectorization support for AVX512 but
got distracted.
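
For illustration only, a fully masked epilogue could be modelled in
portable C as below.  This is a simulation, not the unfinished
vectorizer work and not AVX512 intrinsics: a 64-bit integer plays the
role of a mask register, and the function names and the byte-average
operation are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lane count: 64 8-bit elements per AVX512 vector.  */
enum { MASKED_VF = 64 };

/* Mask with the low ACTIVE bits set.  A shift by 64 would be
   undefined, so the all-lanes case is handled separately.  */
static uint64_t
low_lanes_mask (size_t active)
{
  return active >= MASKED_VF ? ~UINT64_C (0)
                             : (UINT64_C (1) << active) - 1;
}

/* One simulated vector step: only lanes whose mask bit is set are
   loaded, computed and stored.  */
static void
avg_step_masked (uint8_t *dst, const uint8_t *a, const uint8_t *b,
                 uint64_t mask)
{
  for (unsigned lane = 0; lane < MASKED_VF; lane++)
    if ((mask >> lane) & 1)
      dst[lane] = (uint8_t) ((a[lane] + b[lane] + 1) >> 1);
}

static void
avg_bytes_masked_epilogue (uint8_t *dst, const uint8_t *a,
                           const uint8_t *b, size_t n)
{
  size_t i = 0;

  /* Main loop: full vectors, all lanes active.  */
  for (; i + MASKED_VF <= n; i += MASKED_VF)
    avg_step_masked (dst + i, a + i, b + i, ~UINT64_C (0));

  /* Epilogue: a single masked step covers the 1 to 63 remaining
     elements, so no scalar loop is left over.  */
  if (i < n)
    avg_step_masked (dst + i, a + i, b + i, low_lanes_mask (n - i));
}
```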

Another option would be to use SSE vectorization for the epilogue
(note for SSE we vectorize the epilogue with 64-bit half-SSE vectors!),
which would mean giving the target (some) control over the mode used
for vectorizing the epilogue.  That is, in vect_analyze_loop, change

  /* For epilogues start the analysis from the first mode.  The motivation
     behind starting from the beginning comes from cases where the VECTOR_MODES
     array may contain length-agnostic and length-specific modes.  Their
     ordering is not guaranteed, so we could end up picking a mode for the main
     loop that is after the epilogue's optimal mode.  */
  vector_modes[0] = autodetected_vector_mode;

to go through a target hook (possibly first produce a "candidate mode" set
and allow the target to prune that).  This might be an "easy" fix for the
AVX512 issue for low-trip loops.
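
As a rough picture of the "candidate set plus pruning" idea, consider
the standalone C sketch below.  Nothing in it is GCC's actual interface:
struct candidate_mode, prune_epilogue_modes and the size-window policy
are all hypothetical, and a mode is reduced to a name and a size in
bytes.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vector mode.  */
struct candidate_mode
{
  const char *name;
  unsigned size_bytes;
};

/* Hypothetical target hook: compact CANDIDATES in place, keeping only
   the modes the target is willing to try for the epilogue, and return
   how many remain.  The policy shown (a plain size window) is an
   assumption for illustration.  */
static size_t
prune_epilogue_modes (struct candidate_mode *candidates, size_t count,
                      unsigned min_size_bytes, unsigned max_size_bytes)
{
  size_t kept = 0;

  for (size_t i = 0; i < count; i++)
    if (candidates[i].size_bytes >= min_size_bytes
        && candidates[i].size_bytes <= max_size_bytes)
      candidates[kept++] = candidates[i];

  return kept;
}
```

A target wanting SSE for the epilogue of an AVX512 main loop would, in
this model, pass a window of 16 to 16 bytes over a candidate set such as
V64QI, V32QI, V16QI, V8QI and be left with V16QI only.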


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations