From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <hubicka@kam.mff.cuni.cz>
Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16])
 by sourceware.org (Postfix) with ESMTPS id 05AC33858D28;
 Thu, 16 Dec 2021 11:18:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 05AC33858D28
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)
 id 7E213281EAA; Thu, 16 Dec 2021 12:18:18 +0100 (CET)
Date: Thu, 16 Dec 2021 12:18:18 +0100
From: Jan Hubicka <hubicka@kam.mff.cuni.cz>
To: Xionghu Luo <luoxhu@linux.ibm.com>
Cc: wschmidt@linux.ibm.com, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org,
 linkw@gcc.gnu.org, segher@kernel.crashing.org
Subject: Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
Message-ID: <20211216111818.GE4516@kam.mff.cuni.cz>
References: <20211208055416.1415283-1-luoxhu@linux.ibm.com>
 <20211208055416.1415283-3-luoxhu@linux.ibm.com>
 <20211213092548.GA91590@kam.mff.cuni.cz>
 <5a057da8-677c-b5e9-48b3-2cb434e68505@linux.ibm.com>
 <8af589d9-13c1-2ff8-08d3-7caf98fc037a@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <8af589d9-13c1-2ff8-08d3-7caf98fc037a@linux.ibm.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Spam-Status: No, score=-5.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Dec 2021 11:18:23 -0000

> > 
> > 
> > ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> > HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> > noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> > Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> > loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> > __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> > loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> > extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> > guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> > negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> > loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> > const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> > indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> > polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
> > recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> > goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> > null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> > continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
> > loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
> > opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
> > loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
> > loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
> > pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> > call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
> > opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
> > guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
> > 
> > HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> > no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
> > first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
> > DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
> > combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
> > 
> > Loop count: 37870
> >   avg. # of iter: 8444.77
> >   median # of iter: 7.00
> >   avg. (1% cutoff) # of iter: 174.68
> >   avg. (5% cutoff) # of iter: 55.14
> >   avg. (10% cutoff) # of iter: 35.21
> >   avg. (20% cutoff) # of iter: 26.23
> >   avg. (30% cutoff) # of iter: 21.70
> 
> This is the output data collected without the patch, as can be seen, no difference on "extra loop exit".
> But this issue should be fixed.
> 
> 
> ./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/
> 
> benchspec
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
> loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
> loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
> opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
> guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%
> 
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
> first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
> DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
> combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%

Thank you.  So it seems that the problem does not trigger in Spec but I
was also wondering if our current predict.def values are anywhere near
to reality.

THe table reads as follows:  
 - BRANCHES is number of branches the heuristics hit on (so extra loop
   exit has 80 and therefore we do not have that good statistics on it)
 - HITRATE is the probability that the prediction goes given direction
   during the train run.
   after / is the value which would be reached by perfect predictor
   (which predict branch to the direction that dominates during train)
   Extra loop exit is 81% out of 89% so it is pretty close to optimum
 - COVERAGE is how many times the predicted branch was executed

In general the idea is that for most heuristics (wihch can not determine
exact value like loop iteraitons) HITRATE values can be put to
predict.def so the Dempster-Shafer formula (DS theory) combines the
hypothesis sort of realistically (it assumes that all the predictors are
staistically independent which they are not).

We have HITRATE 67 for extra loop exit which is bit off what we do have
in the measured data, but I think our predict.def is still based on
spec2006 numbers.

So the patch is OK.  Perhaps we could experiment with updating
predict.def (It does develop even when run across same benchmark suite
since early optimizations change - this stage1 I think the threading
work definitly affects the situation substantially)

Honza
> 
> Loop count: 38058
>   avg. # of iter: 8403.32
>   median # of iter: 7.00
>   avg. (1% cutoff) # of iter: 173.72
>   avg. (5% cutoff) # of iter: 54.90
>   avg. (10% cutoff) # of iter: 35.20
>   avg. (20% cutoff) # of iter: 26.35
>   avg. (30% cutoff) # of iter: 21.87
> 
> 
> -- 
> Thanks,
> Xionghu