From: "already5chosen at yahoo dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/105617] [12/13 Regression] Slp is maybe too aggressive in some/many cases
Date: Mon, 16 May 2022 14:08:29 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617

--- Comment #6 from Michael_S ---
(In reply to Michael_S from comment #5)
>
> Even scalar-to-scalar or vector-to-vector moves that are hoisted at renamer
> does not have a zero cost, because quite often renamer itself constitutes
> the narrowest performance bottleneck. But those moves... I don't think that
> they are hoisted by renamer.

I took a look at several Intel and AMD Optimization Reference Manuals and instruction tables. None of the existing x86 microarchitectures, old or new, eliminates scalar-to-SIMD moves at the renamer.
This is fairly obvious for new microarchitectures (Bulldozer or later for AMD, Sandy Bridge or later for Intel), because on them scalar and SIMD registers live in separate physical register files. Older microarchitectures may have kept both in a common physical storage area, but they simply were not smart enough to eliminate the moves.

So these moves have non-zero latency; on some cores, including some of the newest, the latency is even higher than one clock. Throughput also tends to be rather low, most typically one scalar-to-SIMD move per clock. For comparison, scalar-to-scalar and SIMD-to-SIMD moves can be executed (or eliminated at the renamer) at rates of 2, 3, or even 4 per clock.
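As a rough, hypothetical illustration of what those rates imply, the Python sketch below computes a throughput-only lower bound on the cycles needed to issue a given number of moves. The rates are the approximate figures quoted above, not measurements of any specific core; the function names and the 8-move example are invented for illustration, and latency and port pressure are deliberately ignored.

```python
# Approximate throughput figures from the comment above (moves per clock).
# These are rough typical values, not data for any particular core.
SCALAR_TO_SIMD_PER_CLOCK = 1        # "most typically, one scalar-to-SIMD move per clock"
SAME_FILE_PER_CLOCK = (2, 3, 4)     # scalar-to-scalar / SIMD-to-SIMD rates


def min_cycles(n_moves: int, moves_per_clock: int) -> int:
    """Throughput-only lower bound on cycles needed to issue n_moves.

    Ignores latency, port conflicts and the rest of the loop body,
    so real code can only be slower than this bound.
    """
    if n_moves < 0 or moves_per_clock <= 0:
        raise ValueError("n_moves must be >= 0 and moves_per_clock > 0")
    # Integer ceiling division: a partially used clock still costs a clock.
    return (n_moves + moves_per_clock - 1) // moves_per_clock


def compare(n_moves: int) -> dict:
    """Lower bounds for n_moves crossing register files vs. staying in one."""
    result = {"scalar_to_simd": min_cycles(n_moves, SCALAR_TO_SIMD_PER_CLOCK)}
    for rate in SAME_FILE_PER_CLOCK:
        result[f"same_file_{rate}_per_clock"] = min_cycles(n_moves, rate)
    return result


if __name__ == "__main__":
    # Hypothetical example: eight scalar values have to be moved.
    for name, cycles in compare(8).items():
        print(f"{name}: >= {cycles} cycles")
```

Running it prints `scalar_to_simd: >= 8 cycles`, followed by 4, 3 and 2 cycles for the 2-, 3- and 4-per-clock same-file rates: the same eight values cost up to four times as many issue cycles when each one has to cross from the integer to the SIMD register file at one move per clock. This only illustrates the ratio between the rates; it does not by itself decide whether a given SLP transformation pays off.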