From: Segher Boessenkool
To: Richard Biener, Hongtao Liu, Jan Hubicka, gcc-help, bin.cheng@linux.alibaba.com, 172060045@hdu.edu.cn, richard.sandiford@arm.com
Subject: Re: Why vectorization didn't turn on by -O2
Date: Wed, 4 Aug 2021 16:18:12 -0500
Message-ID: <20210804211812.GK1583@gate.crashing.org>

On Wed, Aug 04,
2021 at 11:22:53AM +0100, Richard Sandiford wrote:
> Segher Boessenkool writes:
> > On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote:
> >> Richard Biener writes:
> >> > Alternatively only enable loop vectorization at -O2 (the above checks
> >> > flag_tree_slp_vectorize as well). At least the cost model kind
> >> > does not have any influence on BB vectorization, that is, we get the
> >> > same pros and cons as we do for -O3.
> >>
> >> Yeah, but a lot of the loop vector cost model choice is about controlling
> >> code size growth and avoiding excessive runtime versioning tests.
> >
> > Both of those depend a lot on the target, and target-specific conditions
> > as well (which CPU model is selected for example). Can we factor that
> > in somehow? Maybe we need some target hook that returns the expected
> > percentage code growth for vectorising a given loop, for example, and
> > -O2 vs. -O3 then selects what percentage is acceptable.
> >
> >> BB SLP
> >> should be a win on both code size and performance (barring significant
> >> target costing issues).
> >
> > Yeah -- but this could use a similar hook as well (just a straightline
> > piece of code instead of a loop).
>
> I think anything like that should be driven by motivating use cases.
> It's not something that we can easily decide in the abstract.
>
> The results so far with using very-cheap at -O2 have been promising,
> so I don't think new hooks should block that becoming the default.

Right, but it wouldn't hurt to think a sec if we are on the right path
forward. It is crystal clear that to make good decisions about what
and how to vectorise you need to take *some* target characteristics into
account, and that will have to happen sooner rather than later.

This was all in reply to

> >> Yeah, but a lot of the loop vector cost model choice is about controlling
> >> code size growth and avoiding excessive runtime versioning tests.

It was not meant to hold up these patches :-)

> >> PR100089 was an exception because we ended up keeping unvectorised
> >> scalar code that would never have existed otherwise. BB SLP proper
> >> shouldn't have that problem.
> >
> > It also is a tiny piece of code. There will always be tiny examples
> > that are much worse (or much better) than average.
>
> Yeah, what makes PR100089 important isn't IMO the test itself, but the
> underlying problem that the PR exposed. Enabling this “BB SLP in loop
> vectorisation” code can lead to the generation of scalar COND_EXPRs even
> though we know that ifcvt doesn't have a proper cost model for deciding
> whether scalar COND_EXPRs are a win.
>
> Introducing scalar COND_EXPRs at -O3 is arguably an acceptable risk
> (although still dubious), but I think it's something we need to avoid
> for -O2, even if that means losing the optimisation.

Yeah -- -O2 should almost always do the right thing, while -O3 can do
bad things more often, it just has to be better "on average".


Segher
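
For readers following the scalar COND_EXPR point: a minimal sketch, written
out by hand in C, of what if-conversion does to a small loop. This is not the
PR100089 testcase. The names clamp_branch and clamp_select are made up for
illustration, and the second function only approximates at source level what
the if-converted loop looks like inside the compiler.

```c
#include <stddef.h>

/* Branchy form, as a programmer might write it: one conditional
   branch per element. */
void clamp_branch(const int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            b[i] = a[i];
        else
            b[i] = 0;
    }
}

/* Hand-written equivalent of the if-converted loop: the branch is
   replaced by a conditional expression (roughly what a scalar
   COND_EXPR expresses), so the loop body is straight-line code. */
void clamp_select(const int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] > 0 ? a[i] : 0;
}
```

Both functions compute the same result. The select form is what makes the
loop body vectorisable, but if the loop ends up staying scalar, whether the
select beats the branch depends on the target (branch prediction versus the
cost of a conditional move). That is the costing which, per the thread, ifcvt
does not model properly. To see what a particular GCC does with either form,
something like `gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap
-fopt-info-vec -c file.c` should report which loops were vectorised
(very-cheap needs GCC 11 or later).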