From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 255A83857033; Wed, 27 Jul 2022 07:27:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 255A83857033
From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106346] [11/12/13 Regression] Potential regression on
 vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94
Date: Wed, 27 Jul 2022 07:27:33 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: tnfchris at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.5
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: short_desc priority cc assigned_to
 target_milestone bug_status
Message-ID: <bug-106346-4-T7mYlRSBOE@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-106346-4@http.gcc.gnu.org/bugzilla/>
References: <bug-106346-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Jul 2022 07:27:34 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Potential regression on     |[11/12/13 Regression]
                   |vectorization of left shift |Potential regression on
                   |with constants since        |vectorization of left shift
                   |r11-5160-g9fc9573f9a5e94    |with constants since
                   |                            |r11-5160-g9fc9573f9a5e94
           Priority|P3                          |P2
                 CC|                            |rguenth at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |tnfchris at gcc dot gnu.org
   Target Milestone|---                         |11.5
             Status|NEW                         |ASSIGNED
--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
I believe the problem is actually g:27842e2a1eb26a7eae80b8efd98fb8c8bd74a68e

We added an optab for the widening left shift pattern there; however, the
operation requires a uniform shift constant to work. See
https://godbolt.org/z/4hqKc69Ke
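
To illustrate the uniform-constant requirement, here is a minimal sketch.  It
is not the testcase behind the link; the function names and shift amounts are
made up for illustration.  The first function shifts every lane by the same
constant, the second uses a different constant per lane, which is the shape a
single widening-shift operation cannot cover.

```c
#include <stdint.h>

/* Hypothetical example: every lane is widened from 16 to 32 bits and
   shifted by the same constant, so the shift amount is uniform.  */
void widen_shift_uniform(uint32_t *restrict dst, const uint16_t *restrict src)
{
    dst[0] = (uint32_t)src[0] << 8;
    dst[1] = (uint32_t)src[1] << 8;
    dst[2] = (uint32_t)src[2] << 8;
    dst[3] = (uint32_t)src[3] << 8;
}

/* Hypothetical example: same widening, but each lane uses a different
   constant, so there is no single shift amount for the whole group.  */
void widen_shift_mixed(uint32_t *restrict dst, const uint16_t *restrict src)
{
    dst[0] = (uint32_t)src[0] << 1;
    dst[1] = (uint32_t)src[1] << 2;
    dst[2] = (uint32_t)src[2] << 3;
    dst[3] = (uint32_t)src[3] << 4;
}
```

Both functions compute well-defined results in scalar code; the difference
only matters for how the group of statements can be vectorized.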

The existing pattern that deals with this is vect_recog_widen_shift_pattern,
which is a scalar pattern.  During build_slp the constants are validated to be
the same, and when they are not, SLP is aborted.  This is why we lose
vectorization.  Eventually we hit V4HI, for which we have no widening shift
optab, and it vectorizes using that low VF.

The example linked above shows a number of things wrong:

1. The generic costing seems off: this sequence shouldn't have been generated,
as the vector sequence is less efficient than the scalar sequence. Using
-mcpu=neoverse-n1 or any other costing structure correctly gives only scalar
code.

2. vect_recog_widen_shift_pattern is implemented in the wrong place.  It
predates the existence of the SLP pattern matcher. Because of the uniformity
requirement it's better to use the SLP pattern matcher, where we have access to
all the constants to decide whether the pattern is a match or not.  That way we
don't abort SLP. Are you ok with this as a fix, Richi?

3. The epilogue costing seems off.

This example https://godbolt.org/z/YoPcWv6Td ends up with an exceptionally
high epilogue cost, so vectorization at the higher VF is considered not
profitable.

*src1_18(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 8B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 10B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 12B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 14B] 1 times vec_to_scalar costs 2 in epilogue
/app/example.c:16:12: note: Cost model analysis for part in loop 0:
  Vector cost: 23
  Scalar cost: 17

For some reason it thinks it needs a scalar epilogue?  Using
-fno-vect-cost-model gives the desired codegen.
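
Point 2 above relies on seeing all the shift constants of a group at once
before deciding whether the widening-shift pattern matches.  A minimal sketch
of such a uniformity check follows.  It is plain C, not GCC code;
shifts_are_uniform and its parameters are hypothetical names used only for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: returns true when every lane in
   a candidate group shifts by the same constant.  An empty group is
   treated as trivially uniform.  */
bool shifts_are_uniform(const unsigned *amounts, size_t n_lanes)
{
    for (size_t i = 1; i < n_lanes; i++)
        if (amounts[i] != amounts[0])
            return false;
    return true;
}
```

With a check of this shape, a non-uniform group would simply not match the
pattern and could be handled as ordinary shifts, instead of causing SLP to be
abandoned for the whole group.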