From: "jschoen4 at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
Date: Wed, 24 Nov 2021 20:38:59 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #4 from John S ---
I can confirm from my side that it does appear to be the memmove inline expansion and not the auto-vectorizer. It also occurs with __builtin_memset and __builtin_memcpy.

For some context, this issue would prevent the use of GCC in my production environment. It will certainly affect other use cases outside of my own as well.
For example, it becomes impossible to use "-mno-vzeroupper -mavx -mprefer-vector-width=128" together with the _mm256_xxx and _mm256_zeroupper() intrinsics to manage the ymm state (clear or not) by hand, since the compiler is now able to insert ymm usage almost anywhere via the memmove inlining.

Up until now, the preferred width has always behaved such that automatically generated vector code does not exceed it. Only explicit use of the _mm256_/_mm512_ intrinsics or of the vector types, i.e. `__m256 var; __m512 var;`, would result in wider register usage. I believe Clang and ICC behave this way as well, and there are dependencies on this behavior.

The same applies with AVX-512 enabled, where ZMM usage under -mprefer-vector-width=128/256 can make the downclocking issues even more pronounced.
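
For anyone who wants to check this locally, a minimal sketch along the following lines might help. It is not the reproducer from this bug: the struct and function names are hypothetical, and whether ymm registers actually show up depends on the GCC version and tuning. The idea is to compile it with something like `gcc -O2 -mavx -mprefer-vector-width=128 -mno-vzeroupper -S payload.c` and search the generated assembly for `%ymm`.

```c
/* Hypothetical sketch, not taken from the bug report: fixed-size
 * memmove/memset calls that are candidates for inline expansion. */
#include <assert.h>
#include <string.h>

/* 32 bytes: one ymm register wide, or two xmm registers wide. */
struct payload {
    unsigned char bytes[32];
};

static_assert(sizeof(struct payload) == 32,
              "payload is expected to be exactly 32 bytes");

/* Fixed-size copy; the compiler may expand this inline. */
void copy_payload(struct payload *dst, const struct payload *src)
{
    memmove(dst, src, sizeof *dst);
}

/* Fixed-size fill; the compiler may expand this inline as well. */
void clear_payload(struct payload *p)
{
    memset(p, 0, sizeof *p);
}

/* Returns 1 when both payloads hold identical bytes, 0 otherwise. */
int payload_equal(const struct payload *a, const struct payload *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

With a preferred width of 128 one would expect at most xmm moves in `copy_payload` and `clear_payload`; any `%ymm` operand there would be the behavior described in this comment.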