From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id F24D23858D28; Mon, 7 Aug 2023 09:10:56 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F24D23858D28
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1691399456;
	bh=TVwxnZ5xpmYezWeGbraqxJIfd8PXEpl1J9G/nDtogNU=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=Mjs4Zq1ZAX1zIJtWNiw6eWJoNyomtYVLk27yGlA+Y7nkPALMOa/p3BRTKUJ+IlY3U
	 EvggE6ASA71zLMHRzUQKqKNEmmJ1Vh0phwfdr+cYNA7bnIKqGodgbC7W9ePmaMkq9M
	 bMHquCvuTjM20Ym/iqJ4bYfCmkVop8I6oV6uAgxQ=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/49955] Fails to do partial basic-block SLP
Date: Mon, 07 Aug 2023 09:10:50 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.7.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_status assigned_to
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener ---
The loop in comment#1 isn't vectorized because we do not have interleaving
support for a group size of 5:

t.f:18:17: missed:   the size of the group of accesses is not a power of 2 or not equal to 3
t.f:18:17: missed:   not falling back to elementwise accesses
t.f:19:72: missed:   not vectorized: relevant stmt not supported: t1_83 = (*q_82(D))[_21];
t.f:18:17: missed:  bad operation or unsupported loop bound.

we don't try to SLP this because there's just a single lane reduction.  There's
not really a loop vectorization opportunity and as comment#3 says there's
at most a BB reduction opportunity.  We try to analyze that now:

  _58 = powmult_9 + powmult_107;
  t7_108 = _58 + powmult_88;
  t7_109 = __builtin_sqrt (t7_108);
  M.7_110 = MAX_EXPR ;

and

t.f:28:72: note:   Starting SLP discovery for
t.f:28:72: note:     powmult_88 = _106 * _106;
t.f:28:72: note:     powmult_9 = _101 * _101;
t.f:28:72: note:     powmult_107 = _96 * _96;
t.f:28:72: note:   starting SLP discovery for node 0x50ef8a0
t.f:28:72: note:   Build SLP for powmult_88 = _106 * _106;
t.f:28:72: note:   get vectype for scalar type (group size 3): real(kind=8)
t.f:28:72: note:   vectype: vector(2) real(kind=8)
t.f:28:72: note:   nunits = 2
t.f:28:72: missed:   Build SLP failed: unrolling required in basic block SLP

we do not yet have code to limit a BB reduction vectorization to a subset
of lanes (in this case it's uniform so choosing any power-of-two elements
would work but ideally we'd let SLP discovery figure out the "best" lane
combination to vectorize - there's more missing support for BB reduction
vectorization).
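
For readers following the lane arithmetic, here is a minimal Python sketch of the two conditions discussed above. It is a model for illustration only, not GCC code: the function names (`interleaving_group_supported`, `split_reduction_lanes`, `norm_partial`) are hypothetical, and the rule "take the largest power-of-two prefix of the lanes" is an assumption standing in for whatever lane choice SLP discovery would eventually make. The first helper mirrors the condition quoted in the "missed" note (group size is a power of 2 or equal to 3); the rest models the three-lane sum-of-squares reduction from the dump with two lanes handled together and one left scalar.

```python
import math


def interleaving_group_supported(group_size):
    """Condition quoted in the 'missed' note: power of 2, or exactly 3."""
    is_pow2 = group_size > 0 and (group_size & (group_size - 1)) == 0
    return is_pow2 or group_size == 3


def split_reduction_lanes(lanes):
    """Split lanes into a 'vector' part and a scalar remainder.

    Assumption for illustration: the vector part is the largest
    power-of-two prefix of the lanes (2 out of 3 in the dump above).
    """
    n = len(lanes)
    n_vec = 1 << (n.bit_length() - 1) if n else 0
    return lanes[:n_vec], lanes[n_vec:]


def norm_partial(values):
    """sqrt of the sum of squares, with the lanes split as above."""
    vec_lanes, scalar_lanes = split_reduction_lanes(values)
    # Models a single vector multiply over the chosen lanes.
    vec_sq = [v * v for v in vec_lanes]
    # Models the horizontal add that reduces the vector to a scalar.
    acc = 0.0
    for s in vec_sq:
        acc += s
    # Remaining lanes stay scalar and are added to the reduction.
    for v in scalar_lanes:
        acc += v * v
    return math.sqrt(acc)
```

With these definitions, `interleaving_group_supported(5)` is `False`, matching the group size of 5 in the loop case, and `norm_partial([3.0, 4.0, 12.0])` returns `13.0`, the same value as the fully scalar computation.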