From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 20 Nov 2020 12:09:45 -0600
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: Jeff Law <law@redhat.com>, wschmidt@linux.ibm.com, gcc-patches@gcc.gnu.org,
 dje.gcc@gmail.com
Subject: Re: [PATCH] Check calls before loop unrolling
Message-ID: <20201120180945.GC2672@gate.crashing.org>
References: <20200820043445.2216872-1-guojiufu@linux.ibm.com>
 <h48r1rm2ms9.fsf@genoa.aus.stglabs.ibm.com>
 <b3afa3f6-19cd-4076-e42a-82ad5d967b34@redhat.com>
 <20201119194206.GX2672@gate.crashing.org>
 <83caaae1-4348-97ee-be9c-5e3c2083729c@redhat.com>
 <20201119200111.GY2672@gate.crashing.org>
 <d8723b5b-d1a5-1ac1-aa5e-ef7ef0424ba0@redhat.com>
 <20201119235620.GB2672@gate.crashing.org>
 <20201120152247.GA97803@kam.mff.cuni.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20201120152247.GA97803@kam.mff.cuni.cz>

Hi!

On Fri, Nov 20, 2020 at 04:22:47PM +0100, Jan Hubicka wrote:
> As you know I spent quite some time on inliner heuristics but even after
> the years I have no clear idea how the requirements differ from x86-64
> to ppc, arm and s390.  Clearly compared to x86-64 prologues may get more
> expensive on ppc/arm because of more registers (so we should inline less
> to cold code) and function calls are more expensive (so we should inline
> more to hot code). We do have PRs for that in the testsuite, most of
> which I have looked through.

I made -fshrink-wrap-separate to make prologues less expensive for stuff
that is only used on the cold paths.  This matters a lot -- and much
more could be done there, but that requires changing the generated code,
not just reordering it, so it is harder to do.

Prologues (and epilogues) are only expensive if they are only needed for
cold code, in a hot function.
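
To make that case concrete, here is a minimal sketch (not taken from any
real code; the names clamp_add and log_overflow are made up for
illustration).  The hot path needs no saved registers at all, and only
the unlikely branch makes a call:

```c
#include <stdio.h>

/* Hypothetical cold helper; noinline keeps the call a real call. */
__attribute__((noinline, cold))
static void log_overflow(long sum, long limit)
{
    fprintf(stderr, "overflow: %ld > %ld\n", sum, limit);
}

/* Hot function: the common path is an add and a compare and needs no
   saved registers.  Only the unlikely branch makes a call, so only that
   branch needs the return address (and 'limit', which is live across
   the call) to be preserved. */
long clamp_add(long a, long b, long limit)
{
    long sum = a + b;
    if (__builtin_expect(sum > limit, 0)) {
        log_overflow(sum, limit);
        return limit;
    }
    return sum;
}
```

On a target that implements separate shrink-wrapping, comparing the
output of "gcc -O2 -S" with and without -fno-shrink-wrap-separate should
show whether the saves were moved onto the cold branch.  What you
actually get depends on the target and the GCC version.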

> Problem is that each of us has a different methodology - different
> benchmarks to look at

This is often a good thing as well: it increases our total coverage.
But if not everyone sees all results, that also hurts :-/

> and different opinions on what is good for O2 and
> O3.

Yeah.  The documentation for -O3 merely says "Optimize yet more.", but
that is no guidance at all: why would a user ever use -O2 then?

I always understood it as "-O2 is always faster than -O1, but -O3 is not
always faster than -O2".  Aka "-O2 is always a good choice, and -O3 is
an even better choice for *some* code, but that needs testing per case".

At least in that understanding, and also to battle inflation in general,
we probably should move some things from -O3 to -O2.

> From a long-term maintenance POV I am worried about changing a lot of
> --param defaults in different backends

Me too.  But changing a few key ones is just too important for
performance :-/

> simply because the meaning of
> those values keeps changing (as early opts improve; we get better at
> tracking optimizations during IPA passes; and our focus shifts from C
> with sane inlines to basic C++ to heavily templatized C++ with many broken
> inline hints to heavy C++ with lto).

I don't like it if targets start to differ too much (in what generic passes
effectively do), no matter what.  It's just not maintainable.

> For this reason I tend to prefer not to tweak in target-specific ways
> unless there is very clear evidence to do so, just because I think I will
> not be able to maintain code quality testing in future.

Yes, completely agreed.  But that exception is important :-)

> It would be very interesting to set up testing that could let us compare
> basic arches side by side with different defaults. Our LNT testing does
> a good job for x86-64 but we have basically zero coverage publicly
> available on other targets and it is very hard to get inliner-relevant
> benchmarks (where SPEC is not the best choice) done in a comparable way on
> multiple arches.

We cannot help with that on the cfarm, unless we get dedicated hardware
for such benchmarking (and I am not holding my breath for that, getting
good coverage at all is hard enough).  So you probably need to get such
support for every arch separately, elsewhere :-/


Segher