Subject: Re: [PATCH] New hook adjust_iv_update_pos
From: Xionghu Luo
To: Richard Biener
Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, linkw@gcc.gnu.org, David Edelsohn
Date: Fri, 25 Jun 2021 17:41:30 +0800
References: <20210625083101.2828805-1-luoxhu@linux.ibm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------C3B9E0B6843A91195869285F"
Content-Language: en-US

--------------C3B9E0B6843A91195869285F
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2021/6/25 16:54, Richard Biener wrote:
> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
> wrote:
>>
>> From: Xiong Hu Luo
>>
>> adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
>> on Power. For example, it generates mismatched address offset after
>> adjust iv update statement position:
>>
>>   [local count: 70988443]:
>>   _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>>   ivtmp.30_415 = ivtmp.30_414 + 1;
>>   _34 = ref_180 + 18446744073709551615;
>>   _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
>>   if (_84 == _86)
>>     goto ; [94.50%]
>>   else
>>     goto ; [5.50%]
>>
>> Disable it will produce:
>>
>>   [local count: 70988443]:
>>   _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>>   _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
>>   ivtmp.30_415 = ivtmp.30_414 + 1;
>>   if (_84 == _86)
>>     goto ; [94.50%]
>>   else
>>     goto ; [5.50%]
>>
>> Then later pass loop unroll could benefit from same address offset
>> with different base address and reduces register dependency.
>> This patch could improve performance by 10% for typical case on Power,
>> no performance change observed for X86 or Aarch64 due to small loops
>> not unrolled on these platforms. Any comments?
>
> The case you quote is special in that if we hoisted the IV update before
> the other MEM _also_ used in the condition it would be fine again.

Thanks. I tried hoisting the IV update statement before the first MEM
(Fix 2). It shows even worse performance because the loop is no longer
unrolled: two more "base-1" computations are generated in GIMPLE, so
loop->ninsns becomes 11 and the loop no longer qualifies as a small loop
to unroll. Changing the threshold from 10 to 12 in
rs6000_loop_unroll_adjust makes it unroll 2 times as well, but then the
performance is the SAME as with the IV update statement in the *MIDDLE*
(trunk).
From the ASM, we can see that the index register %r4 is used by both
unrolled iterations, which may be a bottleneck for hiding instruction
latency. So it seems reasonable that performance would be better if the
IV update statement is kept *LAST* (Fix 1).

(Fix 2):
  [local count: 70988443]:
  ivtmp.30_415 = ivtmp.30_414 + 1;
  _34 = ip_229 + 18446744073709551615;
  _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
  _33 = ref_180 + 18446744073709551615;
  _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
  if (_84 == _86)
    goto ; [94.50%]
  else
    goto ; [5.50%]

.L67:
	lbzx %r12,%r24,%r4
	lbzx %r25,%r7,%r4
	cmpw %cr0,%r12,%r25
	bne %cr0,.L11
	mr %r26,%r4
	addi %r4,%r4,1
	lbzx %r12,%r24,%r4
	lbzx %r25,%r7,%r4
	mr %r6,%r26
	cmpw %cr0,%r12,%r25
	bne %cr0,.L11
	mr %r26,%r4
.L12:
	cmpdi %cr0,%r10,1
	addi %r4,%r26,1
	mr %r6,%r26
	addi %r10,%r10,-1
	bne %cr0,.L67

>
> Now, adjust_iv_update_pos doesn't seem to check that the
> condition actually uses the IV use stmt def, so it likely applies to
> too many cases.
>
> Unfortunately the introducing rev didn't come with a testcase,
> but still I think fixing up adjust_iv_update_pos is better than
> introducing a way to short-cut it per target decision.
>
> One "fix" might be to add a check that either the condition
> lhs or rhs is the def of the IV use and the other operand
> is invariant. Or if it's of similar structure hoist across the
> other iv-use as well. Not that I understand the argument
> about the overlapping life-range.
>
> You also don't provide a complete testcase ...
>

Attached is the test code; I will also add it to the patch in a future
version.
The issue comes from a very small hot loop:

	do {
	    len++;
	} while (len < maxlen && ip[len] == ref[len]);

--
Thanks,
Xionghu

--------------C3B9E0B6843A91195869285F
Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0";
 name="test_i2_4_6.c"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="test_i2_4_6.c"

I2luY2x1ZGUgPHN0ZGlvLmg+CiNpbmNsdWRlIDxzdGRsaWIuaD4KI2luY2x1ZGUgPHN0ZGlu dC5oPgoKIyBkZWZpbmUgSExPRyAxNgojZGVmaW5lICAgICAgICBNQVhfTElUICAgICAgICAo MSA8PCAgNSkKdHlwZWRlZiBjb25zdCB1aW50OF90ICpMWkZfSFNMT1Q7CnR5cGVkZWYgTFpG X0hTTE9UIExaRl9TVEFURVsxIDw8IChITE9HKV07CgppbnQgY29tcHV0ZV9vbl9ieXRlcyh1 aW50OF90ICosIGludCwgdWludDhfdCAqLCBpbnQgKTsKCmludCBtYWluKGludCBhcmdjLCBj aGFyICoqYXJndikKewoJLy9EZWNsYXJhdGlvbnMKCUZJTEUgKmZwdHI7CglpbnQgbGVuX2xp bWl0OwoJdWludDhfdCAqaW5wdXRidWYsKm91dHB1dGJ1ZjsKCgl1aW50OF90ICppcCwqb3A7 CglpbnQgbGVuLCBpOwoJbG9uZyBpbnQgc3VtPTA7CgoJLy9yYW5kb21uZXNzCglpZiAoYXJn dlsxXSAhPSBOVUxMICkKCQlsZW49YXRvaShhcmd2WzFdKTsKCgkvL3JlYWQKCWZwdHIgPSBm b3Blbihhcmd2WzJdLCJyYiIpOwkKCglpZihmcHRyID09IE5VTEwpCgl7CSAgCgkgICAgICAg cHJpbnRmKCJFcnJvciEiKTsgICAKCSAgICAgICBleGl0KDEpOyAgICAgICAgICAgICAKCX0K Cglmc2VlayggZnB0ciAsIDBMICwgU0VFS19FTkQpOwoJbGVuX2xpbWl0ID0gZnRlbGwoIGZw dHIgKTsKCXJld2luZCggZnB0ciApOwoKCS8qIGFsbG9jYXRlIG1lbW9yeSBmb3IgZW50aXJl IGlucHV0IGNvbnRlbnQgKi8KCWlucHV0YnVmID0gY2FsbG9jKCAxLCBsZW5fbGltaXQrMSAp OwoJaWYoICFpbnB1dGJ1ZiApIGZjbG9zZShmcHRyKSxmcHV0cygibWVtb3J5IGFsbG9jIGZh aWxzIixzdGRlcnIpLGV4aXQoMSk7CgkKCS8qIGFsbG9jYXRlIG1lbW9yeSBmb3IgZW50aXJl IG91dHB1dCAqLwoJb3V0cHV0YnVmID0gY2FsbG9jKCAxLCBsZW5fbGltaXQrMSApOwoJaWYo ICFpbnB1dGJ1ZiApIGZjbG9zZShmcHRyKSxmcHV0cygibWVtb3J5IGFsbG9jIGZhaWxzIixz dGRlcnIpLGV4aXQoMSk7CgoJLyogY29weSB0aGUgZmlsZSBpbnRvIHRoZSBidWZmZXIgKi8K CWlmKCAxIT1mcmVhZCggaW5wdXRidWYgLCBsZW5fbGltaXQsIDEgLCBmcHRyKSApCgkgIGZj bG9zZShmcHRyKSxmcmVlKGlucHV0YnVmKSxmcHV0cygiZW50aXJlIHJlYWQgZmFpbHMiLHN0 ZGVyciksZXhpdCgxKTsKCgkvL2NvbXBhcmUKCWlwPWlucHV0YnVmOwoJb3A9b3V0cHV0YnVm
OwoKCWZvciAoaT0wOyBpIDwgbGVuX2xpbWl0OyBpPWkrKGxlbi80KSkKCQlzdW0rPWNvbXB1 dGVfb25fYnl0ZXMoaXAraSxsZW4sb3AsbGVuLTQpOwoJZmNsb3NlKGZwdHIpOwoJZnJlZShp bnB1dGJ1Zik7CglyZXR1cm4gc3VtOwp9CgoKLy9fX2F0dHJpYnV0ZV9fKChub2lubGluZSkp ICBpbnQgY29tcHV0ZV9vbl9ieXRlcyh1aW50OF90ICppcCwgdWludDhfdCAqcmVmLCBpbnQg bGVuX2xpbWl0LCBpbnQgdGVtcDIgKQogaW50IGNvbXB1dGVfb25fYnl0ZXModWludDhfdCAq aW5fZGF0YSwgaW50IGluX2xlbiwgdWludDhfdCAqb3V0X2RhdGEsIGludCBvdXRfbGVuKQp7 CglMWkZfU1RBVEUgaHRhYjsKCgl1aW50OF90ICppcCA9IGluX2RhdGE7Cgl1aW50OF90ICpv cCA9IG91dF9kYXRhOwoJdWludDhfdCAqaW5fZW5kID0gaXAgKyBpbl9sZW47Cgl1aW50OF90 ICpvdXRfZW5kID0gb3AgKyBvdXRfbGVuOwoJdWludDhfdCAqcmVmOwoKCXVuc2lnbmVkIGxv bmcgb2ZmOwoJdW5zaWduZWQgaW50IGh2YWw7CglpbnQgbGl0OwoKCWlmICghaW5fbGVuIHx8 ICFvdXRfbGVuKQoJICAgIHJldHVybiAwOwoKCWxpdCA9IDA7IG9wKys7CglodmFsID0gKCgo aXBbMF0pIDw8IDgpIHwgaXBbMV0pOwoKCXdoaWxlKCBpcCA8IGluX2VuZCAtIDIgKQoJewoJ CXVpbnQ4X3QgKmhzbG90OwoKCQlodmFsID0gKCgoaHZhbCkgPDwgOCkgfCBpcFsyXSk7CgkJ aHNsb3QgPSBodGFiICsgKCgoIGh2YWwgPj4gKDMqOCAtIDE2KSkgLSBodmFsKjUpICYgKCgx IDw8ICgxNikpIC0gMSkpOwoKCQlyZWYgPSAqaHNsb3QgKyBpbl9kYXRhOyAKCQkqaHNsb3Qg PSBpcCAtIGluX2RhdGE7CgkJCgkJaWYgKDEgJiYgKG9mZiA9IGlwIC0gcmVmIC0gMSkgPCAo MSA8PCAxMykgJiYgcmVmID4gaW5fZGF0YSAmJiByZWZbMl0gPT0gaXBbMl0gJiYgKChyZWZb MV0gPDwgOCkgfCByZWZbMF0pID09ICgoaXBbMV0gPDwgOCkgfCBpcFswXSkgKQoJCXsKCQkJ dW5zaWduZWQgaW50IGxlbiA9IDI7CgkJCXVuc2lnbmVkIGludCBtYXhsZW4gPSBpbl9lbmQg LSBpcCAtIGxlbjsKCSAgICAgICAJCW1heGxlbiA9IG1heGxlbiA+ICgoMSA8PCA4KSArICgx IDw8IDMpKSA/ICgoMSA8PCA4KSArICgxIDw8IDMpKSA6IG1heGxlbjsKCgkJCWlmICgob3Ag KyAzICsgMSA+PSBvdXRfZW5kKSAhPSAwKQoJCQkJaWYgKG9wIC0gIWxpdCArIDMgKyAxID49 IG91dF9lbmQpCgkJCQkJcmV0dXJuIDA7CgoJCQlvcCBbLSBsaXQgLSAxXSA9IGxpdCAtIDE7 CgkJCW9wIC09ICFsaXQ7CgoJCQlmb3IgKDs7KQoJCQl7CgkJCQlpZiAoIG1heGxlbiA+IDE2 ICkKCSAgICAgICAgICAgICAgIAkJeyAKCSAgICAgICAgICAgICAgIAkJCWxlbisrOyBpZiAo cmVmIFtsZW5dICE9IGlwIFtsZW5dKSBicmVhazsKCSAgICAgICAgICAgICAgIAkJCWxlbisr OyBpZiAocmVmIFtsZW5dICE9IGlwIFtsZW5dKSBicmVhazsKCSAgICAgICAgICAgICAgIAkJ 
CWxlbisrOyBpZiAocmVmIFtsZW5dICE9IGlwIFtsZW5dKSBicmVhazsKCSAgICAgICAgICAg ICAgIAkJCWxlbisrOyBpZiAocmVmIFtsZW5dICE9IGlwIFtsZW5dKSBicmVhazsKCgkgICAg ICAgICAgICAgICAJCQlsZW4rKzsgaWYgKHJlZiBbbGVuXSAhPSBpcCBbbGVuXSkgYnJlYWs7 CgkgICAgICAgICAgICAgICAJCQlsZW4rKzsgaWYgKHJlZiBbbGVuXSAhPSBpcCBbbGVuXSkg YnJlYWs7CgkgICAgICAgICAgICAgICAJCQlsZW4rKzsgaWYgKHJlZiBbbGVuXSAhPSBpcCBb bGVuXSkgYnJlYWs7CgkgICAgICAgICAgICAgICAJCQlsZW4rKzsgaWYgKHJlZiBbbGVuXSAh PSBpcCBbbGVuXSkgYnJlYWs7CgoJICAgICAgICAgICAgICAgCQkJbGVuKys7IGlmIChyZWYg W2xlbl0gIT0gaXAgW2xlbl0pIGJyZWFrOwoJICAgICAgICAgICAgICAgCQkJbGVuKys7IGlm IChyZWYgW2xlbl0gIT0gaXAgW2xlbl0pIGJyZWFrOwoJICAgICAgICAgICAgICAgCQkJbGVu Kys7IGlmIChyZWYgW2xlbl0gIT0gaXAgW2xlbl0pIGJyZWFrOwoJICAgICAgICAgICAgICAg CQkJbGVuKys7IGlmIChyZWYgW2xlbl0gIT0gaXAgW2xlbl0pIGJyZWFrOwoKCSAgICAgICAg ICAgICAgIAkJCWxlbisrOyBpZiAocmVmIFtsZW5dICE9IGlwIFtsZW5dKSBicmVhazsKCSAg ICAgICAgICAgICAgIAkJCWxlbisrOyBpZiAocmVmIFtsZW5dICE9IGlwIFtsZW5dKSBicmVh azsKCSAgICAgICAgICAgICAgIAkJCWxlbisrOyBpZiAocmVmIFtsZW5dICE9IGlwIFtsZW5d KSBicmVhazsKCSAgICAgICAgICAgICAgIAkJCWxlbisrOyBpZiAocmVmIFtsZW5dICE9IGlw IFtsZW5dKSBicmVhazsKCSAgICAgICAgICAgICAJCX0KCgkgICAgICAgCQkJZG8gewoJICAg ICAgICAgICAgICAgCSAgCQlsZW4rKzsKCgkgICAgICAgCSAgICAJCX13aGlsZShsZW4gPCBt YXhsZW4gJiYgaXBbbGVuXSA9PSByZWZbbGVuXSk7CgoJCQkJYnJlYWs7CgkJCX0KCgkJCWxl biAtPSAyOwogICAgICAgICAgCQlpcCsrOwoKCSAgICAgICAgICAgIAlpZiAobGVuIDwgNykK CQkgICAgICAgICB7IAoJCQkJKm9wKysgPSAob2ZmID4+IDgpICsgKGxlbiA8PCA1KTsKCQkJ fQoJCSAgICAgICAgZWxzZQoJCQl7IAoJCQkJKm9wKysgPSAob2ZmID4+IDgpICsgKCA3IDw8 IDUpOwoJCQkJKm9wKysgPSBsZW4gLSA3OwoJCQl9CgkJICAgICAgICAqb3ArKyA9IG9mZjsK CQkJbGl0ID0gMDsgb3ArKzsKCQkJaXAgKz0gbGVuICsgMTsKCgkJCWlmIChpcCA+PSBpbl9l bmQgLSAyKSAKCQkJCWJyZWFrOwoKCQkJLS1pcDsKCQkJLS1pcDsKCgkJCWh2YWwgPSAoKChp cFswXSkgPDwgOCkgfCBpcFsxXSk7CgkJCWh2YWwgPSAoKChodmFsKSA8PCA4KSB8IGlwWzJd KTsKCQkJaHRhYlsoKCggaHZhbCA+PiAoMyo4IC0gMTYpKSAtIGh2YWwqNSkgJiAoKDEgPDwg KDE2KSkgLSAxKSldID0gaXAgLSBpbl9kYXRhOwoJCQlpcCsrOwoKCQkJaHZhbCA9ICgoKGh2 
YWwpIDw8IDgpIHwgaXBbMl0pOwoJCQlodGFiWygoKCBodmFsID4+ICgzKjggLSAxNikpIC0g aHZhbCo1KSAmICgoMSA8PCAoMTYpKSAtIDEpKV0gPSBpcCAtIGluX2RhdGE7CgkJCWlwKys7 CgoJCX0KCSAgICAgIGVsc2UgIHsKCQkgICAgICAgIGlmKG9wID49IG91dF9lbmQpCgkJCQly ZXR1cm4gMDsKCgkJCWxpdCsrOyAqb3ArKyA9ICppcCsrOwoKCQkJaWYgKGxpdCA9PSAoMSA8 PCA1KSkgCgkJCXsgCgkJCQlvcCBbLSBsaXQgLSAxXSA9IGxpdCAtIDE7CgkJCQlsaXQgPSAw OyBvcCsrOwoJCQl9CgkJfQoJICB9CglpZiAob3AgKyAzID4gb3V0X2VuZCkgLyogYXQgbW9z dCAzIGJ5dGVzIGNhbiBiZSBtaXNzaW5nIGhlcmUgKi8KCQkgICAgcmV0dXJuIDA7CgoJd2hp bGUgKGlwIDwgaW5fZW5kKQoJeyAKCQlsaXQrKzsgKm9wKysgPSAqaXArKzsKCQlpZiAobGl0 ID09IE1BWF9MSVQpCgkJeyAKCQkJb3AgWy0gbGl0IC0gMV0gPSBsaXQgLSAxOyAvKiBzdG9w IHJ1biAqLwoJCQlsaXQgPSAwOyBvcCsrOyAvKiBzdGFydCBydW4gKi8KCQl9Cgl9CgoJb3Ag Wy0gbGl0IC0gMV0gPSBsaXQgLSAxOyAvKiBlbmQgcnVuICovCglvcCAtPSAhbGl0OyAvKiB1 bmRvIHJ1biBpZiBsZW5ndGggaXMgemVybyAqLwoKCXJldHVybiBvcCAtIG91dF9kYXRhOwoK fQo=
--------------C3B9E0B6843A91195869285F--