From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <segher@kernel.crashing.org>
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
 by sourceware.org (Postfix) with ESMTP id 42CE93858C39
 for <gcc-help@gcc.gnu.org>; Wed,  4 Aug 2021 21:19:20 +0000 (GMT)
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
 by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 174LIFDP032704;
 Wed, 4 Aug 2021 16:18:15 -0500
Received: (from segher@localhost)
 by gate.crashing.org (8.14.1/8.14.1/Submit) id 174LICPb032701;
 Wed, 4 Aug 2021 16:18:12 -0500
X-Authentication-Warning: gate.crashing.org: segher set sender to
 segher@kernel.crashing.org using -f
Date: Wed, 4 Aug 2021 16:18:12 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Richard Biener <rguenther@suse.de>, Hongtao Liu <crazylht@gmail.com>,
 Jan Hubicka <hubicka@ucw.cz>, gcc-help <gcc-help@gcc.gnu.org>,
 bin.cheng@linux.alibaba.com, 172060045@hdu.edu.cn,
 richard.sandiford@arm.com
Subject: Re: Why vectorization didn't turn on by -O2
Message-ID: <20210804211812.GK1583@gate.crashing.org>
References: <nycvar.YFH.7.76.2105101018060.9200@zhemvz.fhfr.qr>
 <20210510092440.GY10366@gate.crashing.org>
 <20210517160309.GA27888@kam.mff.cuni.cz> <mpt7djxw5n6.fsf@arm.com>
 <CAMZc-bx-ieOvN7phQrWpjZbZEemNEe08LNaM-7As9C4wny9n3A@mail.gmail.com>
 <mptbl6daab0.fsf@arm.com>
 <nycvar.YFH.7.76.2108041028300.11781@zhemvz.fhfr.qr>
 <mptbl6d8tir.fsf@arm.com> <20210804095643.GC1583@gate.crashing.org>
 <mpttuk57blu.fsf@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <mpttuk57blu.fsf@arm.com>
User-Agent: Mutt/1.4.2.3i

On Wed, Aug 04, 2021 at 11:22:53AM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote:
> >> Richard Biener <rguenther@suse.de> writes:
> >> > Alternatively only enable loop vectorization at -O2 (the above checks
> >> > flag_tree_slp_vectorize as well).  At least the cost model kind
> >> > does not have any influence on BB vectorization, that is, we get the
> >> > same pros and cons as we do for -O3.
> >> 
> >> Yeah, but a lot of the loop vector cost model choice is about controlling
> >> code size growth and avoiding excessive runtime versioning tests.
> >
> > Both of those depend a lot on the target, and target-specific conditions
> > as well (which CPU model is selected for example).  Can we factor that
> > in somehow?  Maybe we need some target hook that returns the expected
> > percentage code growth for vectorising a given loop, for example, and
> > -O2 vs. -O3 then selects what percentage is acceptable.
> >
> >> BB SLP
> >> should be a win on both code size and performance (barring significant
> >> target costing issues).
> >
> > Yeah -- but this could use a similar hook as well (just a straightline
> > piece of code instead of a loop).
> 
> I think anything like that should be driven by motivating use cases.
> It's not something that we can easily decide in the abstract.
> 
> The results so far with using very-cheap at -O2 have been promising,
> so I don't think new hooks should block that becoming the default.

Right, but it wouldn't hurt to stop for a moment and consider whether we
are on the right path forward.  It is crystal clear that to make good
decisions about what and how to vectorise you need to take *some* target
characteristics into account, and that will have to happen sooner rather
than later.

This was all in reply to

> >> Yeah, but a lot of the loop vector cost model choice is about controlling
> >> code size growth and avoiding excessive runtime versioning tests.

It was not meant to hold up these patches :-)
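The target-hook idea from earlier in the thread could look roughly like the C sketch below. The hook would estimate the percentage code growth from vectorising a loop, and -O2 and -O3 would pick different acceptable limits. The sketch is purely illustrative. `vect_expected_growth_pct`, `vectorise_p`, `struct loop_info`, the two CPU models, and the 10% / 50% limits are all hypothetical names and numbers invented for this example. They are not existing GCC interfaces or tuned values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU models; not GCC's real -mcpu/-mtune values.  */
enum cpu_model { CPU_SMALL_CORE, CPU_BIG_CORE };

/* Hypothetical size summary of a loop the vectoriser is considering.  */
struct loop_info
{
  int scalar_insns;     /* size of the original scalar loop body */
  int vector_insns;     /* size of the vectorised loop body */
  int epilogue_insns;   /* scalar epilogue for the leftover iterations */
  int versioning_insns; /* runtime alias/alignment checks */
};

/* Stand-in for the proposed target hook: expected code growth, in percent,
   from vectorising LOOP when tuning for CPU.  */
static int
vect_expected_growth_pct (const struct loop_info *loop, enum cpu_model cpu)
{
  assert (loop->scalar_insns > 0);  /* avoid dividing by zero */
  int vectorised = loop->vector_insns + loop->epilogue_insns
                   + loop->versioning_insns;
  int growth = (vectorised - loop->scalar_insns) * 100 / loop->scalar_insns;
  /* Invented CPU-dependent adjustment: pretend that on the small core the
     extra code costs 50% more (say, because of a smaller I-cache).  */
  if (cpu == CPU_SMALL_CORE)
    growth += growth / 2;
  return growth;
}

/* Target-independent side: the -O level only decides how much growth is
   acceptable.  The 10% and 50% limits are arbitrary example values.  */
static bool
vectorise_p (const struct loop_info *loop, enum cpu_model cpu, int opt_level)
{
  int limit = opt_level >= 3 ? 50 : 10;
  return vect_expected_growth_pct (loop, cpu) <= limit;
}
```

Consider a 20-insn scalar body that turns into 12 vector insns, an 8-insn epilogue, and 2 insns of versioning checks. The estimate is 10% growth on the big core, which is accepted at both levels. On the small core it is 15%, which is accepted only at -O3. The split keeps the estimate with the target, while the optimisation level only sets the threshold.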

> >> PR100089 was an exception because we ended up keeping unvectorised
> >> scalar code that would never have existed otherwise.  BB SLP proper
> >> shouldn't have that problem.
> >
> > It also is a tiny piece of code.  There will always be tiny examples
> > that are much worse (or much better) than average.
> 
> Yeah, what makes PR100089 important isn't IMO the test itself, but the
> underlying problem that the PR exposed.  Enabling this “BB SLP in loop
> vectorisation” code can lead to the generation of scalar COND_EXPRs even
> though we know that ifcvt doesn't have a proper cost model for deciding
> whether scalar COND_EXPRs are a win.
> 
> Introducing scalar COND_EXPRs at -O3 is arguably an acceptable risk
> (although still dubious), but I think it's something we need to avoid
> for -O2, even if that means losing the optimisation.

Yeah -- -O2 should almost always do the right thing, while -O3 is allowed
to do bad things more often; it just has to be better "on average".
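The if-conversion concern above can be illustrated with a small loop whose body contains a branch. If-conversion rewrites such a body so that both arms are computed unconditionally and a select picks the result. In GIMPLE, that select is a COND_EXPR. `pick_select` below is a hand-written C approximation of that shape, not actual GCC output, and both function names are hypothetical. The sketch uses unsigned arithmetic so that evaluating both arms cannot introduce signed-overflow undefined behaviour. It makes no claim about whether a given GCC version really emits scalar COND_EXPRs for this loop at -O2 or -O3. The `-fdump-tree-ifcvt` and `-fdump-tree-vect-details` dumps are one way to check that.

```c
#include <stddef.h>

/* Original form: a branch inside the loop body.  */
void
pick_branchy (unsigned *restrict out, const unsigned *restrict a,
              const unsigned *restrict b, size_t n, unsigned k)
{
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] > k)
        out[i] = a[i] * 3u;
      else
        out[i] = b[i] + k;
    }
}

/* If-converted shape (hand-written approximation): both arms are computed
   for every element and a select picks one, like a COND_EXPR
   "a[i] > k ? t : e".  Assumes A and B both hold N elements, so loading
   b[i] unconditionally is safe; unsigned arithmetic wraps, so computing
   the arm the branchy version would have skipped is always defined.  */
void
pick_select (unsigned *restrict out, const unsigned *restrict a,
             const unsigned *restrict b, size_t n, unsigned k)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned t = a[i] * 3u;   /* "then" arm, now unconditional */
      unsigned e = b[i] + k;    /* "else" arm, now unconditional */
      out[i] = a[i] > k ? t : e;
    }
}
```

If the loop is vectorised, the select typically maps to a vector compare plus a blend. If the code stays scalar, the wasted work on the unselected arm and the select itself may or may not beat a well-predicted branch. That is the decision the thread notes ifcvt has no proper cost model for.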


Segher