From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <luoxhu@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com
 [148.163.158.5])
 by sourceware.org (Postfix) with ESMTPS id 9829D3858C3A;
 Wed, 15 Dec 2021 06:40:23 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9829D3858C3A
Received: from pps.filterd (m0098414.ppops.net [127.0.0.1])
 by mx0b-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 1BF4PEC0010280; 
 Wed, 15 Dec 2021 06:40:22 GMT
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0b-001b2d01.pphosted.com with ESMTP id 3cy2t75jcn-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 15 Dec 2021 06:40:22 +0000
Received: from m0098414.ppops.net (m0098414.ppops.net [127.0.0.1])
 by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 1BF6eL2P029547;
 Wed, 15 Dec 2021 06:40:21 GMT
Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com
 [169.51.49.98])
 by mx0b-001b2d01.pphosted.com with ESMTP id 3cy2t75jbw-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 15 Dec 2021 06:40:21 +0000
Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1])
 by ppma03ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1BF6MYfx019700;
 Wed, 15 Dec 2021 06:40:20 GMT
Received: from b06cxnps4076.portsmouth.uk.ibm.com
 (d06relay13.portsmouth.uk.ibm.com [9.149.109.198])
 by ppma03ams.nl.ibm.com with ESMTP id 3cy7jqsp5h-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 15 Dec 2021 06:40:19 +0000
Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com
 [9.149.105.62])
 by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 1BF6eHEv32506214
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Wed, 15 Dec 2021 06:40:17 GMT
Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 83C7CAE058;
 Wed, 15 Dec 2021 06:40:17 +0000 (GMT)
Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 70464AE053;
 Wed, 15 Dec 2021 06:40:15 +0000 (GMT)
Received: from [9.200.154.17] (unknown [9.200.154.17])
 by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTPS;
 Wed, 15 Dec 2021 06:40:15 +0000 (GMT)
Message-ID: <8af589d9-13c1-2ff8-08d3-7caf98fc037a@linux.ibm.com>
Date: Wed, 15 Dec 2021 14:40:12 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
 Gecko/20100101 Thunderbird/91.4.0
Subject: Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
Content-Language: en-US
To: Jan Hubicka <hubicka@kam.mff.cuni.cz>
Cc: wschmidt@linux.ibm.com, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org,
 linkw@gcc.gnu.org, segher@kernel.crashing.org
References: <20211208055416.1415283-1-luoxhu@linux.ibm.com>
 <20211208055416.1415283-3-luoxhu@linux.ibm.com>
 <20211213092548.GA91590@kam.mff.cuni.cz>
 <5a057da8-677c-b5e9-48b3-2cb434e68505@linux.ibm.com>
From: Xionghu Luo <luoxhu@linux.ibm.com>
In-Reply-To: <5a057da8-677c-b5e9-48b3-2cb434e68505@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: OuexXQPhCsdAzh6DG6qj982ULSvf2wPn
X-Proofpoint-ORIG-GUID: 4KSMEZbxF5Hd5iEELOXm3zgl9xcOSLY4
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.11.62.513
 definitions=2021-12-15_06,2021-12-14_01,2021-12-02_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 impostorscore=0
 suspectscore=0 clxscore=1015 lowpriorityscore=0 mlxlogscore=999
 phishscore=0 spamscore=0 priorityscore=1501 malwarescore=0 mlxscore=0
 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2110150000 definitions=main-2112150036
X-Spam-Status: No, score=-7.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Dec 2021 06:40:25 -0000


On 2021/12/14 17:27, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/13 17:25, Jan Hubicka wrote:
>>> r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
>>> profile-estimate when predict_extra_loop_exits, outer loop's exit edge
>>> is marked as inner loop's extra loop exit and set with incorrect
>>> prediction, then a hot inner loop will become cold loop finally through
>>> optimizations, this patch add loop check when searching extra exit edges
>>> to avoid unexpected predict_edge from predict_paths_for_bb.
>>>
>>> Regression tested on P8LE, OK for master?
>>>
>>> gcc/ChangeLog:
>>>
>>> 	PR middle-end/103270
>>> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
>>> 	(predict_loops): Call with loop argument.
>>
>> With changes to branch predictors it is useful to re-test their
>> effectivity on spec and see if their hitrates are still mathcing
>> reality.  You can do it by buiding spec with -fprofile-generate, train
>> it and then build with -fprofile-use -fdump-tree-ipa-profile-details
>> and use contrib/analyze_brprob.py that will collect info on how they
>> work.
>>
>> This patch looks good to me, but it would be nice to have things reality
>> checked (and since we did not do the stats for some time, there may be
>> surprises) so if you could run the specs and post results of
>> analyze_brprob, it would be great.  I will also try to get to that soon,
>> but currently I am bit swamped by other problems I noticed on clang
>> builds.
>>
>> Thanks a lot for working on profile fixes - I am trying now to get
>> things into shape.  With Martin we added basic testing infrastructure
>> for keeping track of profile updates and I am trying to see how it works
>> in practice now.  Hopefully it will make it easier to judge on profile
>> updating patches. I would welcome list of patches I should look at.
>>
>> I will write separate mail on this.
>> Honza
> 
> 
> With the patch, the analyze_brprob.py outputs below data with PGO build,
> there is no verification code in the script, so how to check whether it
> is correct?  Run it again without the patch and compare "extra loop exit"
> field?
> 
> 
> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
> 
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
> 
> Loop count: 37870
>   avg. # of iter: 8444.77
>   median # of iter: 7.00
>   avg. (1% cutoff) # of iter: 174.68
>   avg. (5% cutoff) # of iter: 55.14
>   avg. (10% cutoff) # of iter: 35.21
>   avg. (20% cutoff) # of iter: 26.23
>   avg. (30% cutoff) # of iter: 21.70

This is the output data collected without the patch, as can be seen, no difference on "extra loop exit".
But this issue should be fixed.


./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/

benchspec
HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
__builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%

HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%

Loop count: 38058
  avg. # of iter: 8403.32
  median # of iter: 7.00
  avg. (1% cutoff) # of iter: 173.72
  avg. (5% cutoff) # of iter: 54.90
  avg. (10% cutoff) # of iter: 35.20
  avg. (20% cutoff) # of iter: 26.35
  avg. (30% cutoff) # of iter: 21.87


-- 
Thanks,
Xionghu