From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=ouLs=DS=kam.mff.cuni.cz=hubicka@sourceware.org>
Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16])
	by sourceware.org (Postfix) with ESMTPS id 0DF023858D28;
	Tue,  1 Aug 2023 08:44:04 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0DF023858D28
Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=ucw.cz
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kam.mff.cuni.cz
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)
	id 16CA32828C9; Tue,  1 Aug 2023 10:44:02 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucw.cz; s=gen1;
	t=1690879442;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=6WpALreZ/SuZFZ39aD5KjUU3Ldlu2nAPjkww96PYIgQ=;
	b=WLivz3utfaSVGWxZKcN2+D4LoQXH32XzPMMm6Nal2USclQRR9sn3pNRKm0qmCEdAn3zLt9
	JtHib2Ot8U7wlik5SJZXo6PwpOZhpSrGZrwX7LzvWsnHHnpebsCF0Bzjm1tnZECrePwf8S
	Yi7xbVIrZf6Wb03So/qo05jM68TeeO8=
Date: Tue, 1 Aug 2023 10:44:02 +0200
From: Jan Hubicka <hubicka@ucw.cz>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>, gcc@gcc.gnu.org,
	gcc-bugs@gcc.gnu.org, Ning Jia <ning.jia@huawei.com>,
	Li Yu <marvin.tms@huawei.com>, Wang Nan <wangnan0@huawei.com>,
	Hui Wang <hw.huiwang@huawei.com>
Subject: Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance
 drop relative to O2
Message-ID: <ZMjF0jYKFMb2DVPP@kam.mff.cuni.cz>
References: <20230731125535.wpgchdsjegx2yg4h@M910t>
 <CAFiYyc2TtaSZtcmY0XjYzxMubNMSuuqvTHS_OzKTLc_M8Mg=Aw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAFiYyc2TtaSZtcmY0XjYzxMubNMSuuqvTHS_OzKTLc_M8Mg=Aw@mail.gmail.com>
X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,JMQ_SPF_NEUTRAL,KAM_NUMSUBJECT,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc.gcc.gnu.org>

> > If I comment it out as above patch, then O3/PGO can get 16% and 12% performance
> > improvement compared to O2 on x86.
> >
> >                         O2              O3              PGO
> > cycles                  2,497,674,824   2,104,993,224   2,199,753,593
> > instructions            10,457,508,646  9,723,056,131   10,457,216,225
> > branches                2,303,029,380   2,250,522,323   2,302,994,942
> > branch-misses           0.00%           0.01%           0.01%
> >
> > The main difference in the compilation output about code around the miss-prediction
> > branch is:
> >   o In O2: predicated instruction (cmov here) is selected to eliminate above
> >     branch. cmov is true better than branch here.
> >   o In O3/PGO: bitout() is inlined into encode_file(), and branch instruction
> >     is selected. But this branch is obviously *unpredictable* and the compiler
> >     doesn't know it. This why O3/PGO are are so bad for this program.
> >
> > Gcc doesn't support __builtin_unpredictable() which has been introduced by llvm.
> > Then I tried to see if __builtin_expect_with_probability(e,x, 0.5) can serve the
> > same purpose. The result is negative.
> 
> But does it appear to be predictable with your profiling data?

Also one thing is that __builtin_expect and
__builtin_expect_with_probability only affect the static branch
prediction algorithm, so with profile feedback they are ignored on every
branch executed at least once during the train run.
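
For illustration, a minimal sketch of how the two builtins are written in
source.  The function names and the 0.5 value are made up for this
example.  The annotations only feed the static predictor, so once profile
feedback has counts for these branches the hints no longer matter.

```c
#include <stddef.h>

/* Hypothetical helper: count negative entries in an array.
   The hint says "this condition is true with probability 0.5".  */
size_t
count_negative (const int *v, size_t n)
{
  size_t cnt = 0;
  for (size_t i = 0; i < n; i++)
    if (__builtin_expect_with_probability (v[i] < 0, 1, 0.5))
      cnt++;
  return cnt;
}

/* Plain __builtin_expect: hint that the zero-divisor path is unlikely.  */
int
checked_div (int a, int b)
{
  if (__builtin_expect (b == 0, 0))
    return 0;
  return a / b;
}
```

Neither builtin changes what the functions compute; they only bias the
compiler's guess about which way the branch goes.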

Setting probability 0.5 is really not exactly the same as a hint that the
branch will be mispredicted, since modern CPUs handle regularly
behaving branches well (such as a branch firing on every even iteration
of a loop).
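
To make the distinction concrete, here is a small sketch with two
conditions that are both true about half of the time.  The function names
and the xorshift32 generator are assumptions picked for the example, they
are not taken from the benchmark discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Condition true on every even iteration: 50% taken, but a perfectly
   regular pattern that is easy to predict.  */
size_t
count_alternating (size_t n)
{
  size_t taken = 0;
  for (size_t i = 0; i < n; i++)
    if ((i & 1) == 0)
      taken++;
  return taken;
}

/* Condition on one bit of a xorshift32 pseudo-random stream: also about
   50% taken, but with no short repeating pattern.  */
size_t
count_random (size_t n, uint32_t seed)
{
  uint32_t x = seed ? seed : 1;	/* xorshift state must be nonzero */
  size_t taken = 0;
  for (size_t i = 0; i < n; i++)
    {
      x ^= x << 13;
      x ^= x >> 17;
      x ^= x << 5;
      if (x & 0x10000)
	taken++;
    }
  return taken;
}
```

A profile would record roughly the same probability for both conditions,
so the probability alone cannot tell them apart.  Note that an optimizing
compiler is free to if-convert or vectorize either loop, so this shows
the shape of the profile rather than the generated code.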

So I think having the builtin is not a bad idea.  I was thinking about
whether it makes sense to represent it within the profile_probability
type and I am not convinced, since "unpredictable probability" sounds
conceptually odd and we would need to keep the flag intact over all
probability updates we do.  For things like loop exits we recompute
probabilities from frequencies after unrolling/vectorization and other
things, and we would need to invent a new API to propagate the flag from
the previous probability (which is not even part of the computation
right now).
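
The propagation problem can be sketched with a toy type.  Everything
below (struct toy_prob and both functions) is hypothetical and is not
GCC's profile_probability class; it only shows why a recomputation from
counts drops the flag unless the old probability becomes an extra input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a probability carrying an extra flag.  */
struct toy_prob
{
  double value;		/* probability in [0, 1] */
  bool unpredictable;	/* hypothetical "will be mispredicted" flag */
};

/* Recompute a probability purely from counts.  The old probability is
   not an input, so the flag cannot survive.  */
struct toy_prob
toy_from_counts (uint64_t taken, uint64_t total)
{
  struct toy_prob p = { 0.0, false };
  if (total)
    p.value = (double) taken / (double) total;
  return p;
}

/* What a flag-preserving variant would need: the previous probability
   passed in alongside the counts.  */
struct toy_prob
toy_from_counts_keep_flag (uint64_t taken, uint64_t total,
			   struct toy_prob prev)
{
  struct toy_prob p = toy_from_counts (taken, total);
  p.unpredictable = prev.unpredictable;
  return p;
}
```

Every place that rebuilds a probability from counts would have to switch
to the second form, which is the API cost mentioned above.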

So I guess the challenge is how to pass this info down through the
optimization pipeline, since we would need to annotate gimple
conds/switches and carry the annotation down to the RTL level.  On
gimple we have flags and on the RTL level notes, so there is space for
it, but we would need to maintain the info through CFG changes.

Auto-FDO may be an interesting way to detect such branches.

Honza
> 
> > I think we could come to a conclusion that there must be something can improve in
> > Gcc's heuristic strategy about Predicated Instructions and branches, at least
> > for O3 and PGO.
> >
> > And can we add __builtin_unpredictable() support for Gcc? As usually it's hard
> > for the compiler to detect unpredictable branches.
> >
> > --
> > Cheers,
> > Changbin Du