From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <luoxhu@linux.ibm.com>
Message-ID: <e4b5cb93-413b-eaca-151e-fa1c668d9c81@linux.ibm.com>
Date: Tue, 21 Dec 2021 11:56:37 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
 Gecko/20100101 Thunderbird/91.4.0
Subject: Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
Content-Language: en-US
To: Jan Hubicka <hubicka@kam.mff.cuni.cz>
Cc: wschmidt@linux.ibm.com, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org,
 linkw@gcc.gnu.org, segher@kernel.crashing.org
References: <20211208055416.1415283-1-luoxhu@linux.ibm.com>
 <20211208055416.1415283-3-luoxhu@linux.ibm.com>
 <20211213092548.GA91590@kam.mff.cuni.cz>
 <5a057da8-677c-b5e9-48b3-2cb434e68505@linux.ibm.com>
 <8af589d9-13c1-2ff8-08d3-7caf98fc037a@linux.ibm.com>
 <20211216111818.GE4516@kam.mff.cuni.cz>
From: Xionghu Luo <luoxhu@linux.ibm.com>
In-Reply-To: <20211216111818.GE4516@kam.mff.cuni.cz>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Dec 2021 03:56:51 -0000


On 2021/12/16 19:18, Jan Hubicka wrote:
>>>
>>>
>>> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
>>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
>>> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
>>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
>>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
>>> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
>>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
>>> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
>>> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
>>>
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
>>> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
>>> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
>>> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
>>>
>>> Loop count: 37870
>>>   avg. # of iter: 8444.77
>>>   median # of iter: 7.00
>>>   avg. (1% cutoff) # of iter: 174.68
>>>   avg. (5% cutoff) # of iter: 55.14
>>>   avg. (10% cutoff) # of iter: 35.21
>>>   avg. (20% cutoff) # of iter: 26.23
>>>   avg. (30% cutoff) # of iter: 21.70
>>
>> This is the output data collected without the patch; as can be seen, there
>> is no difference on "extra loop exit".  But this issue should be fixed.
>>
>>
>> ./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/
>>
>> benchspec
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
>> loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
>> loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
>> opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
>> guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%
>>
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
>> first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
>> DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
>> combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%
> 
> Thank you.  So it seems that the problem does not trigger in SPEC, but I
> was also wondering whether our current predict.def values are anywhere
> near reality.
> 
> The table reads as follows:
>  - BRANCHES is the number of branches the heuristic hit on (so extra loop
>    exit has 80 and therefore we do not have very good statistics on it)
>  - HITRATE is the probability that the branch goes in the predicted
>    direction during the train run.
>    The value after / is what a perfect predictor would reach
>    (one which predicts each branch in the direction that dominates
>    during train).
>    Extra loop exit is 81% out of 89%, so it is pretty close to optimum
>  - COVERAGE is how many times the predicted branch was executed
> 
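
To make the two HITRATE columns concrete, here is a minimal sketch of how
I read them.  The per-branch counts are made up for illustration and the
helper name is hypothetical; this is not the code in analyze_brprob.py,
only the arithmetic the description above implies.

```python
def hitrates(branches):
    """branches: list of (hits, executions) per branch, where `hits` is how
    often the branch went in the predicted direction during the train run."""
    total = sum(execs for _, execs in branches)
    hits = sum(h for h, _ in branches)
    # A perfect predictor picks, for each branch, whichever direction
    # dominated during training, so it scores max(hits, misses) per branch.
    perfect = sum(max(h, execs - h) for h, execs in branches)
    return 100.0 * hits / total, 100.0 * perfect / total

# Made-up counts for three branches (assumption, not SPEC data).
sample = [(90, 100), (20, 100), (700, 800)]
actual, perfect = hitrates(sample)
print(f"{actual:.2f}% / {perfect:.2f}%")

# "Extra loop exit" from the table: 81.49% out of a possible 89.21%.
ratio = 81.49 / 89.21
print(f"fraction of the optimum reached: {ratio:.3f}")
```

The second branch in the sample is mispredicted most of the time, which is
what opens the gap between the two numbers.
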
> In general the idea is that for most heuristics (which cannot determine
> an exact value, unlike loop iterations) HITRATE values can be put into
> predict.def so the Dempster-Shafer formula (DS theory) combines the
> hypotheses sort of realistically (it assumes that all the predictors are
> statistically independent, which they are not).
> 
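
For readers following along, a small sketch of the two-outcome
Dempster-Shafer combination described above.  It uses plain floats and a
hypothetical function name; GCC itself works in fixed-point probabilities
and may differ in detail.  The 0.67 is the predict.def value mentioned in
this thread, while the 0.85 for a second predictor is purely an assumed
number.

```python
def ds_combine(probabilities):
    """Combine independent 'branch taken' probabilities with Dempster's rule
    for two hypotheses (taken / not taken)."""
    combined = 0.5  # neutral starting point: no evidence either way
    for p in probabilities:
        d = combined * p + (1.0 - combined) * (1.0 - p)
        if d == 0.0:
            # Fully contradictory evidence; fall back to "no information".
            combined = 0.5
        else:
            combined = combined * p / d
    return combined

single = ds_combine([0.67])        # one predictor is returned unchanged
both = ds_combine([0.67, 0.85])    # two agreeing predictors reinforce
print(f"{single:.4f} {both:.4f}")
```

Two predictors that agree push the result above either input, which is
only justified if they really are independent, hence the caveat above.
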
> We have HITRATE 67 for extra loop exit, which is a bit off from what we
> have in the measured data, but I think our predict.def is still based on
> spec2006 numbers.
> 
> So the patch is OK.  Perhaps we could experiment with updating
> predict.def (the values do change even when run across the same benchmark
> suite, since early optimizations change; for this stage1 I think the
> threading work definitely affects the situation substantially).

Thanks, committed to r12-6085.

> 
> Honza
>>
>> Loop count: 38058
>>   avg. # of iter: 8403.32
>>   median # of iter: 7.00
>>   avg. (1% cutoff) # of iter: 173.72
>>   avg. (5% cutoff) # of iter: 54.90
>>   avg. (10% cutoff) # of iter: 35.20
>>   avg. (20% cutoff) # of iter: 26.35
>>   avg. (30% cutoff) # of iter: 21.87
>>
>>
>> -- 
>> Thanks,
>> Xionghu

-- 
Thanks,
Xionghu