Date: Tue, 1 Aug 2023 10:44:02 +0200
From: Jan Hubicka
To: Richard Biener
Cc: Changbin Du, gcc@gcc.gnu.org, gcc-bugs@gcc.gnu.org, Ning Jia, Li Yu, Wang Nan, Hui Wang
Subject: Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2
References: <20230731125535.wpgchdsjegx2yg4h@M910t>

> > If I comment it out as above patch, then O3/PGO can get 16% and 12% performance
> > improvement compared to O2 on x86.
> >
> >                 O2              O3              PGO
> > cycles          2,497,674,824   2,104,993,224   2,199,753,593
> > instructions    10,457,508,646  9,723,056,131   10,457,216,225
> > branches        2,303,029,380   2,250,522,323   2,302,994,942
> > branch-misses   0.00%           0.01%           0.01%
> >
> > The main difference in the compilation output about code around the misprediction
> > branch is:
> >   o In O2: predicated instruction (cmov here) is selected to eliminate above
> >     branch. cmov is truly better than branch here.
> >   o In O3/PGO: bitout() is inlined into encode_file(), and branch instruction
> >     is selected. But this branch is obviously *unpredictable* and the compiler
> >     doesn't know it. This is why O3/PGO are so bad for this program.
> >
> > Gcc doesn't support __builtin_unpredictable() which has been introduced by llvm.
> > Then I tried to see if __builtin_expect_with_probability(e,x, 0.5) can serve the
> > same purpose. The result is negative.
>
> But does it appear to be predictable with your profiling data?

Also, __builtin_expect and __builtin_expect_with_probability only affect
the static branch prediction algorithm, so with profile feedback they are
ignored on every branch executed at least once during the train run.

Setting probability 0.5 is not the same as hinting that the branch will be
mispredicted, since modern CPUs handle regularly behaving branches well
(such as a branch firing on every even iteration of a loop).

So I think having the builtin is not a bad idea. I was wondering whether it
makes sense to represent it within the profile_probability type, and I am
not convinced, since "unpredictable probability" sounds conceptually odd and
we would need to keep the flag intact over all probability updates we do.
For things like loop exits we recompute probabilities from frequencies
after unrolling/vectorization and other transformations, and we would need
to invent a new API to propagate the flag from the previous probability
(which is not even part of the computation right now).

So I guess the challenge is how to pass this info down through the
optimization pipeline, since we would need to annotate gimple
conds/switches and carry the annotation down to the RTL level. On gimple we
have flags and on the RTL level notes, so there is space for it, but we
would need to maintain the info through CFG changes.

Auto-FDO may be an interesting way to detect such branches.

Honza
>
> > I think we could come to a conclusion that there must be something that can be
> > improved in Gcc's heuristic strategy about Predicated Instructions and branches,
> > at least for O3 and PGO.
> >
> > And can we add __builtin_unpredictable() support for Gcc? As usually it's hard
> > for the compiler to detect unpredictable branches.
> >
> > --
> > Cheers,
> > Changbin Du
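
To illustrate the remark above that a probability of 0.5 does not by itself
mean "unpredictable", here is a minimal sketch in C. It is only a toy model:
the predictor (one 2-bit saturating counter selected by the previous
outcome) is an assumption made for illustration and does not describe any
real CPU, and all function names are hypothetical. Both outcome streams are
taken about half of the time, yet the alternating one is predicted almost
perfectly while the pseudo-random one is not.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy predictor (illustrative assumption, not a real CPU design):
   one 2-bit saturating counter per value of the previous outcome. */
static size_t count_mispredicts(const unsigned char *taken, size_t n)
{
    unsigned char counter[2] = {1, 1}; /* 0..1: predict not taken, 2..3: predict taken */
    unsigned last = 0;                 /* previous outcome, used as 1-bit history */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned predicted = counter[last] >= 2;
        unsigned actual = taken[i] != 0;

        if (predicted != actual)
            misses++;

        /* Train the counter that was selected by the history bit. */
        if (actual && counter[last] < 3)
            counter[last]++;
        else if (!actual && counter[last] > 0)
            counter[last]--;

        last = actual;
    }
    return misses;
}

/* Number of iterations on which the branch is taken. */
static size_t count_taken(const unsigned char *taken, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += taken[i] != 0;
    return count;
}

/* Branch taken on every even iteration: probability exactly 0.5 for even n. */
static void fill_alternating(unsigned char *taken, size_t n)
{
    for (size_t i = 0; i < n; i++)
        taken[i] = (i % 2 == 0);
}

/* Pseudo-random outcomes from a xorshift32 generator (arbitrary seed):
   probability close to 0.5, but with no regular pattern. */
static void fill_random(unsigned char *taken, size_t n)
{
    uint32_t x = 2463534242u;
    for (size_t i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        taken[i] = (unsigned char)((x >> 16) & 1);
    }
}
```

Filling a buffer with fill_alternating() and passing it to
count_mispredicts() should give only a warm-up miss or so, while the
fill_random() stream should miss on a large fraction of iterations. A static
probability alone cannot tell these two cases apart, which is the
information a separate "unpredictable" hint would have to carry.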