From: Richard Biener
Date: Mon, 23 Nov 2020 09:42:43 +0100
Subject: Re: [PATCH] Check calls before loop unrolling
To: Segher Boessenkool
Cc: Jan Hubicka, GCC Patches, Bill Schmidt, David Edelsohn
In-Reply-To: <20201120180945.GC2672@gate.crashing.org>
References: <20200820043445.2216872-1-guojiufu@linux.ibm.com>
 <20201119194206.GX2672@gate.crashing.org>
 <83caaae1-4348-97ee-be9c-5e3c2083729c@redhat.com>
 <20201119200111.GY2672@gate.crashing.org>
 <20201119235620.GB2672@gate.crashing.org>
 <20201120152247.GA97803@kam.mff.cuni.cz>
 <20201120180945.GC2672@gate.crashing.org>
List-Id: Gcc-patches mailing list

On Fri, Nov 20, 2020 at 7:11 PM Segher Boessenkool wrote:
>
> Hi!
>
> On Fri, Nov 20, 2020 at 04:22:47PM +0100, Jan Hubicka wrote:
> > As you know, I spend quite some time on inliner heuristics, but even
> > after all these years I have no clear idea how the requirements differ
> > from x86-64 to ppc, arm and s390. Clearly, compared to x86-64, prologues
> > may get more expensive on ppc/arm because of more registers (so we
> > should inline less into cold code), and function calls are more
> > expensive (so we should inline more into hot code). We do have PRs for
> > that in the testsuite, most of which I have looked through.
>
> I made -fshrink-wrap-separate to make prologues less expensive for stuff
> that is only used on the cold paths. This matters a lot -- and much
> more could be done there, but that requires changing the generated code,
> not just reordering it, so it is harder to do.
>
> Prologues (and epilogues) are only expensive if they are only needed for
> cold code, in a hot function.
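
A minimal C sketch (with made-up names) of the kind of function where this
pays off: the hot loop needs no stack frame or saved registers at all, only
the cold call does, so shrink-wrapping, and -fshrink-wrap-separate for the
individual prologue components, can keep that cost off the hot path.

/* Hypothetical example, for illustration only.  */
extern int slow_lookup (int key);     /* cold, assumed external helper */

int
cached_lookup (const int *cache, int n, int key)
{
  /* Hot path: no calls and few registers live, so no prologue is
     needed for it.  */
  for (int i = 0; i < n; i++)
    if (cache[i] == key)
      return i;

  /* Cold path: the call needs the return address (and anything live
     across it) saved, which is the only reason this function would set
     up a frame at all.  */
  int v = slow_lookup (key);
  return v >= 0 ? v : -1;
}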
>
> > The problem is that each of us has a different methodology - different
> > benchmarks to look at
>
> This is often a good thing as well; it increases our total coverage.
> But if not everyone sees all the results, that also hurts :-/
>
> > and different opinions on what is good for O2 and
> > O3.
>
> Yeah. The documentation for -O3 merely says "Optimize yet more.", but
> that is no guidance at all: why would a user ever use -O2 then?
>
> I always understood it as "-O2 is always faster than -O1, but -O3 is not
> always faster than -O2". Aka "-O2 is always a good choice, and -O3 is
> an even better choice for *some* code, but that needs testing per case".

So basically -O2 is supposed to be well balanced in compile time, code
size, performance and debuggability (if there is such a thing with
optimized code...). -O1 is what you should use for machine-generated
code; we kind-of promise to have no quadratic or worse algorithms in
compile time/memory use there, so you can throw a multi-gigabyte source
function at GCC and it should not blow up. And -O1 still optimizes.
-Os is for when you want small code size at all cost (more compile time,
less performance). -O3 is for when you want performance at all cost
(compile time, code size and the ability to debug).

So I'd always use -O2, unless doing a compute workload where I'd choose
-O3 (maybe selectively, for the relevant TUs; a small sketch of one way
to do that is at the end of this mail). Then there are profile feedback
and LTO; enable them if you can (I'd avoid them for code you need -O1
for). They really help GCC make an appropriate decision about which code
to optimize more or less, in line with the balanced profile that -O2
aims for.

> With at least that understanding, and also to battle inflation in general,
> we probably should move some things from -O3 to -O2.
>
> > From a long-term maintenance POV, I am worried about changing a lot of
> > --param defaults in different backends
>
> Me too. But changing a few key ones is just too important for
> performance :-/
>
> > simply because the meaning of
> > those values keeps changing (as early opts improve; we get better at
> > tracking optimizations during IPA passes; and our focus shifts from C
> > with sane inlines, to basic C++, to heavily templatized C++ with many
> > broken inline hints, to heavy C++ with LTO).
>
> I don't like it if targets start to differ too much (in what the generic
> passes effectively do), no matter what. It's just not maintainable.
>
> > For this reason I tend to prefer not to tweak things in target-specific
> > ways unless there is very clear evidence to do so, simply because I
> > think I will not be able to maintain code quality testing in the future.
>
> Yes, completely agreed. But that exception is important :-)
>
> > It would be very interesting to set up testing that could let us compare
> > the basic arches side by side with different defaults. Our LNT testing
> > does a good job for x86-64, but we have basically zero coverage publicly
> > available on other targets, and it is very hard to get inliner-relevant
> > benchmarks (where SPEC is not the best choice) done in a comparable way
> > on multiple arches.
>
> We cannot help with that on the cfarm unless we get dedicated hardware
> for such benchmarking (and I am not holding my breath for that; getting
> good coverage at all is hard enough). So you probably need to get such
> support for every arch separately, elsewhere :-/
>
>
> Segher
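
As a small sketch of the "selective -O3" idea above, one way to opt just a
hot kernel into -O3-style optimization while the rest of the build stays at
-O2 is GCC's optimize function attribute (the function below is made up,
and the GCC documentation recommends the attribute mainly for debugging, so
building the relevant TUs with -O3 remains the more robust route):

/* Compile the file with -O2; only this kernel gets the -O3 pipeline.  */
__attribute__ ((optimize ("O3")))
void
saxpy (float *restrict y, const float *restrict x, float a, int n)
{
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}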