Date: Fri, 20 Nov 2020 12:09:45 -0600
From: Segher Boessenkool
To: Jan Hubicka
Cc: Jeff Law, wschmidt@linux.ibm.com, gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: Re: [PATCH] Check calls before loop unrolling
Message-ID: <20201120180945.GC2672@gate.crashing.org>
References: <20200820043445.2216872-1-guojiufu@linux.ibm.com>
 <20201119194206.GX2672@gate.crashing.org>
 <83caaae1-4348-97ee-be9c-5e3c2083729c@redhat.com>
 <20201119200111.GY2672@gate.crashing.org>
 <20201119235620.GB2672@gate.crashing.org>
 <20201120152247.GA97803@kam.mff.cuni.cz>
In-Reply-To: <20201120152247.GA97803@kam.mff.cuni.cz>

Hi!

On Fri, Nov 20, 2020 at 04:22:47PM +0100, Jan Hubicka wrote:
> As you know I have spent quite some time on inliner heuristics, but even
> after all these years I have no clear idea how the requirements differ
> from x86-64 to ppc, arm and s390.  Clearly, compared to x86_64, prologues
> may get more expensive on ppc/arm because of more registers (so we should
> inline less to cold code) and function calls are more expensive (so we
> should inline more to hot code).  We do have PRs for that in the
> testsuite, most of which I have looked through.

I made -fshrink-wrap-separate to make prologues less expensive for stuff
that is only used on the cold paths.  This matters a lot -- and much
more could be done there, but that requires changing the generated code,
not just reordering it, so it is harder to do.

Prologues (and epilogues) are only expensive if, in a hot function, they
are needed only for cold code.

> Problem is that each of us has a different methodology - different
> benchmarks to look at

This is often a good thing as well: it increases our total coverage.
But if not everyone sees all results, that also hurts :-/

> and different opinions on what is good for O2 and
> O3.

Yeah.  The documentation for -O3 merely says "Optimize yet more.", but
that is no guidance at all: why would a user ever use -O2 then?

I always understood it as "-O2 is always faster than -O1, but -O3 is not
always faster than -O2".  Aka "-O2 is always a good choice, and -O3 is
an even better choice for *some* code, but that needs testing per case".

At least in that understanding, and also to battle inflation in general,
we probably should move some things from -O3 to -O2.

> From a long-term maintenance POV I am worried about changing a lot of
> --param defaults in different backends

Me too.
But changing a few key ones is just too important for
performance :-/

> simply because the meaning of
> those values keeps changing (as early opts improve; we get better at
> tracking optimizations during IPA passes; and our focus shifts from C
> with sane inlines to basic C++ to heavily templatized C++ with many
> broken inline hints to heavy C++ with LTO).

I don't like it if targets start to differ too much (in what generic
passes effectively do), no matter what.  It's just not maintainable.

> For this reason I tend to prefer not to tweak in target-specific ways
> unless there is very clear evidence to do so, just because I think I
> will not be able to maintain code quality testing in the future.

Yes, completely agreed.  But that exception is important :-)

> It would be very interesting to set up testing that could let us compare
> basic arches side by side with different defaults.  Our LNT testing does
> a good job for x86-64, but we have basically zero coverage publicly
> available on other targets, and it is very hard to get inliner-relevant
> benchmarks (where SPEC is not the best choice) done in a comparable way
> on multiple arches.

We cannot help with that on the cfarm, unless we get dedicated hardware
for such benchmarking (and I am not holding my breath for that; getting
good coverage at all is hard enough).  So you probably need to get such
support for every arch separately, elsewhere :-/


Segher