From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106346] [11/12/13 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94
Date: Wed, 27 Jul 2022 07:27:33 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Potential regression on     |[11/12/13 Regression]
                   |vectorization of left shift |Potential regression on
                   |with constants since        |vectorization of left shift
                   |r11-5160-g9fc9573f9a5e94    |with constants since
                   |                            |r11-5160-g9fc9573f9a5e94
           Priority|P3                          |P2
                 CC|                            |rguenth at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
   Target Milestone|---                         |11.5
             Status|NEW                         |ASSIGNED

--- Comment #4 from Tamar Christina ---
I believe the problem is actually g:27842e2a1eb26a7eae80b8efd98fb8c8bd74a68e

We added an optab for the widening left shift pattern there; however, the operation requires a uniform shift constant to work. See https://godbolt.org/z/4hqKc69Ke

The existing pattern that deals with this is vect_recog_widen_shift_pattern, which is a scalar pattern. During build_slp it validates that the constants are the same, and when they're not, it aborts SLP. This is why we lose vectorization. Eventually we hit V4HI, for which we have no widening shift optab, and it vectorizes using that low VF.

This example shows a number of things wrong:

1. The generic costing seems off. This sequence shouldn't have been generated: as a vector sequence it's less efficient than the scalar sequence. Using -mcpu=neoverse-n1 or any other costing structure correctly gives only scalar code.

2. vect_recog_widen_shift_pattern is implemented in the wrong place. It predates the existence of the SLP pattern matcher. Because of the uniformity requirement, it's better to use the SLP pattern matcher, where we have access to all the constants to decide whether the pattern is a match or not. That way we don't abort SLP. Are you OK with this as a fix, Richi?

3. The epilogue costing seems off. This example https://godbolt.org/z/YoPcWv6Td ends up generating an exceptionally high epilogue cost, and so the vectorizer thinks vectorization at the higher VF is not profitable.
    *src1_18(D) 1 times vec_to_scalar costs 2 in epilogue
    MEM[(uint16_t *)src1_18(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
    MEM[(uint16_t *)src1_18(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
    MEM[(uint16_t *)src1_18(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
    MEM[(uint16_t *)src1_18(D) + 8B] 1 times vec_to_scalar costs 2 in epilogue
    MEM[(uint16_t *)src1_18(D) + 10B] 1 times vec_to_scalar costs 2 in epilogue
    MEM[(uint16_t *)src1_18(D) + 12B] 1 times vec_to_scalar costs 2 in epilogue
    MEM[(uint16_t *)src1_18(D) + 14B] 1 times vec_to_scalar costs 2 in epilogue
    /app/example.c:16:12: note: Cost model analysis for part in loop 0:
      Vector cost: 23
      Scalar cost: 17

For some reason it thinks it needs a scalar epilogue? Using -fno-vect-cost-model gives the desired codegen.
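The uniform-constant requirement behind this regression can be illustrated with a small C sketch. It is not GCC code and does not model the vectorizer. `shift_group_is_uniform` and `widen_shl_lanes` are hypothetical names, and the arrays stand in for the lanes of one SLP group of `uint8_t` values widened to `uint16_t`. The sketch only shows the decision the comment argues belongs in the SLP pattern matcher: use the widening shift when every lane has the same constant (a single-immediate instruction such as AArch64 USHLL can only express that case), and otherwise fall back to a plain widen plus per-lane shift rather than giving up on the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: true when every lane of the
   group uses the same shift constant, the only case a widening shift
   with a single immediate can express. */
bool shift_group_is_uniform(const unsigned amounts[], size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (amounts[i] != amounts[0])
            return false;
    return true;
}

/* Hypothetical scalar model of the two lowerings for a group that widens
   uint8_t lanes to uint16_t and shifts them left. Both paths compute the
   same values. Returns true if the single widening-shift form was usable,
   false if the group fell back to widen-then-shift per lane. Either way
   the group is still handled instead of being rejected. */
bool widen_shl_lanes(const uint8_t src[], const unsigned amounts[],
                     uint16_t dst[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(amounts[i] < 16); /* shift must fit the 16-bit result */

    if (shift_group_is_uniform(amounts, n)) {
        /* One widening shift by a single constant for all lanes. */
        for (size_t i = 0; i < n; i++)
            dst[i] = (uint16_t)((unsigned)src[i] << amounts[0]);
        return true;
    }

    /* Non-uniform constants: widen first, then shift each lane by its
       own amount. */
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = src[i];
        dst[i] = (uint16_t)((unsigned)wide << amounts[i]);
    }
    return false;
}
```

For example, lanes {1, 2, 3, 255} shifted by {4, 4, 4, 4} take the widening-shift path, while shift amounts {1, 2, 3, 4} take the fallback path, and both give correct results. Making this choice where all lanes' constants are visible is what avoids aborting SLP.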
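A quick tally of the cost dump, assuming (not verified here) that the reported vector cost of 23 already includes the eight epilogue `vec_to_scalar` entries, one per `uint16_t` lane at offsets 0B through 14B:

```latex
\begin{aligned}
C_{\text{epilogue}} &= 8 \times 2 = 16 \\
C_{\text{vector}} - C_{\text{epilogue}} &= 23 - 16 = 7 \\
7 &< C_{\text{scalar}} = 17
\end{aligned}
```

Under that assumption, the lane extractions alone account for most of the vector cost, and without them the vector code would be cheaper than the scalar code. That is consistent with -fno-vect-cost-model producing the desired codegen.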