From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 7D5DA3858401
 for ; Tue, 7 Dec 2021 11:27:23 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7D5DA3858401
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0A87B11FB;
 Tue, 7 Dec 2021 03:27:23 -0800 (PST)
Received: from [10.57.3.27] (unknown [10.57.3.27])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6F6F03F73B;
 Tue, 7 Dec 2021 03:27:22 -0800 (PST)
Content-Type: multipart/mixed; boundary="------------dqa7UYM6fnmmHKlsG1CvCH9i"
Message-ID:
Date: Tue, 7 Dec 2021 11:27:22 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.3.2
Subject: [vect] Re-analyze all modes for epilogues
Content-Language: en-US
To: Richard Biener
Cc: "gcc-patches@gcc.gnu.org" , richard.sandiford@arm.com
References: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com>
 <4272814n-8538-p793-157q-5n6q16r48n51@fhfr.qr>
 <623fbfd9-b97c-8c6e-0348-07d6c4496592@arm.com>
 <5c887c48-7f7e-c02b-2998-7a7c41b11af8@arm.com>
 <33cb143e-bb2e-e214-cd5f-66fd2d1bd20b@arm.com>
 <5op15ns-4sq8-2sn3-41qs-49q44417sp6@fhfr.qr>
 <99qs2o2p-pn87-n164-q8n9-9p814r6n75r1@fhfr.qr>
 <475fae98-9541-5dca-2e60-eaff172ff787@arm.com>
 <8p72o15s-5894-4or0-409r-oo4p74o238r1@fhfr.qr>
 <21e3500d-6cf5-ed46-6f95-1f554c5dbc50@arm.com>
 <5477e0cb-6dc9-e828-7c20-a99de3c6840c@arm.com>
From: "Andre Vieira (lists)"
In-Reply-To:
X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, BODY_8BITS, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 07 Dec 2021 11:27:25 -0000

This is a multi-part message in MIME format.
--------------dqa7UYM6fnmmHKlsG1CvCH9i
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi,

I've split this particular part off, since it's not only relevant to
unrolling.  The new test shows how this is useful for existing
(non-unrolling) cases.

I also had to fix the costing function: the main_vf / epilogue_vf
calculations for old and new didn't take into account that main_vf could
be lower than the epilogue VF, nor that the two are not necessarily
multiples of each other.  So using CEIL here is the correct approach.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

OK for trunk?

gcc/ChangeLog:

        * tree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for
        epilogue costing.
        (vect_analyze_loop): Re-analyze all modes for epilogues.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/masked_epilogue.c: New test.

On 30/11/2021 13:56, Richard Biener wrote:
> On Tue, 30 Nov 2021, Andre Vieira (lists) wrote:
>
>> On 25/11/2021 12:46, Richard Biener wrote:
>>> Oops, my fault, yes, it does.  I would suggest to refactor things so
>>> that the mode_i = first_loop_i case is there only once.  I also wonder
>>> if all the argument about starting at 0 doesn't apply to the
>>> not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well?  So
>>> what's the reason to differ here?  So in the end I'd just change
>>> the existing
>>>
>>>   if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
>>>     {
>>>
>>> to
>>>
>>>   if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)
>>>       || first_loop_vinfo->suggested_unroll_factor > 1)
>>>     {
>>>
>>> and maybe revisit this when we have an actual testcase showing that
>>> doing sth else has a positive effect?
>>>
>>> Thanks,
>>> Richard.
>> So I had a quick chat with Richard Sandiford and he is suggesting resetting
>> mode_i to 0 for all cases.
>>
>> He pointed out that for some tunings the SVE mode might come after the NEON
>> mode, which means that even for not-unrolled loop_vinfos we could end up with
>> a suboptimal choice of mode for the epilogue. I.e. it could be that we pick
>> V16QI for main vectorization, but that's VNx16QI + 1 in the array, so we'd not
>> try VNx16QI for the epilogue.
>>
>> This would simplify the mode selecting cases, by simply restarting with
>> mode_i = 0 in all epilogue cases. Is that something you'd be OK with?
> Works for me with an updated comment. Even better with showing a
> testcase exercising such tuning.
>
> Richard.

--------------dqa7UYM6fnmmHKlsG1CvCH9i
Content-Type: text/plain; charset=UTF-8; name="epilogue_modes.patch"
Content-Disposition: attachment; filename="epilogue_modes.patch"
Content-Transfer-Encoding: base64

ZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L21hc2tlZF9l cGlsb2d1ZS5jIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvbWFza2VkX2Vw aWxvZ3VlLmMKbmV3IGZpbGUgbW9kZSAxMDA2NDQKaW5kZXggMDAwMDAwMDAwMDAwMDAwMDAw MDAwMDAwMDAwMDAwMDAwMDAwMDAwMC4uMjg2YTdiZTIzNmYzMzdmZWU0YzQ2NTBmNDJkYTcy MDAwODU1YzVlNgotLS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdl dC9hYXJjaDY0L21hc2tlZF9lcGlsb2d1ZS5jCkBAIC0wLDAgKzEsMTAgQEAKKy8qIHsgZGct ZG8gY29tcGlsZSB9ICovCisvKiB7IGRnLW9wdGlvbnMgIi1PMiAtZnRyZWUtdmVjdG9yaXpl IC1mZHVtcC10cmVlLXZlY3QtZGV0YWlscyAtbWFyY2g9YXJtdjgtYStzdmUgLW1zdmUtdmVj dG9yLWJpdHM9c2NhbGFibGUiIH0gKi8KKwordm9pZCBmKHVuc2lnbmVkIGNoYXIgeVtyZXN0 cmljdF0sCisgICAgICAgdW5zaWduZWQgY2hhciB4W3Jlc3RyaWN0XSwgaW50IG4pIHsKKyAg Zm9yIChpbnQgaSA9IDA7IGkgPCBuOyArK2kpCisgICAgeVtpXSA9ICh5W2ldICsgeFtpXSAr IDEpID4+IDE7Cit9CisKKy8qIHsgZGctZmluYWwgeyBzY2FuLXRyZWUtZHVtcCB7TE9PUCBF UElMT0dVRSBWRUNUT1JJWkVEIFwoTU9ERT1WTnh9ICJ2ZWN0IiB9IH0gKi8KZGlmZiAtLWdp dCBhL2djYy90cmVlLXZlY3QtbG9vcC5jIGIvZ2NjL3RyZWUtdmVjdC1sb29wLmMKaW5kZXgg 
YTI4YmI2MzIxZDc2YjgyMjJiYzhjZmRhZGUxNTFjYTliNGRjYTQwNi4uMTdiMDkwMTcwZDRh NWRhZDIyMDk3YTcyN2JjMjVhNjNlMjMwZTI3OCAxMDA2NDQKLS0tIGEvZ2NjL3RyZWUtdmVj dC1sb29wLmMKKysrIGIvZ2NjL3RyZWUtdmVjdC1sb29wLmMKQEAgLTI4MjQsMTEgKzI4MjQs MTMgQEAgdmVjdF9iZXR0ZXJfbG9vcF92aW5mb19wIChsb29wX3ZlY19pbmZvIG5ld19sb29w X3ZpbmZvLAogCXsKIAkgIHVuc2lnbmVkIEhPU1RfV0lERV9JTlQgbWFpbl92Zl9tYXgKIAkg ICAgPSBlc3RpbWF0ZWRfcG9seV92YWx1ZSAobWFpbl9wb2x5X3ZmLCBQT0xZX1ZBTFVFX01B WCk7CisJICB1bnNpZ25lZCBIT1NUX1dJREVfSU5UIG9sZF92Zl9tYXgKKwkgICAgPSBlc3Rp bWF0ZWRfcG9seV92YWx1ZSAob2xkX3ZmLCBQT0xZX1ZBTFVFX01BWCk7CisJICB1bnNpZ25l ZCBIT1NUX1dJREVfSU5UIG5ld192Zl9tYXgKKwkgICAgPSBlc3RpbWF0ZWRfcG9seV92YWx1 ZSAobmV3X3ZmLCBQT0xZX1ZBTFVFX01BWCk7CiAKLQkgIG9sZF9mYWN0b3IgPSBtYWluX3Zm X21heCAvIGVzdGltYXRlZF9wb2x5X3ZhbHVlIChvbGRfdmYsCi0JCQkJCQkJICAgUE9MWV9W QUxVRV9NQVgpOwotCSAgbmV3X2ZhY3RvciA9IG1haW5fdmZfbWF4IC8gZXN0aW1hdGVkX3Bv bHlfdmFsdWUgKG5ld192ZiwKLQkJCQkJCQkgICBQT0xZX1ZBTFVFX01BWCk7CisJICBvbGRf ZmFjdG9yID0gQ0VJTCAobWFpbl92Zl9tYXgsIG9sZF92Zl9tYXgpOworCSAgbmV3X2ZhY3Rv ciA9IENFSUwgKG1haW5fdmZfbWF4LCBuZXdfdmZfbWF4KTsKIAogCSAgLyogSWYgdGhlIGxv b3AgaXMgbm90IHVzaW5nIHBhcnRpYWwgdmVjdG9ycyB0aGVuIGl0IHdpbGwgaXRlcmF0ZSBv bmUKIAkgICAgIHRpbWUgbGVzcyB0aGFuIG9uZSB0aGF0IGRvZXMuICBJdCBpcyBzYWZlIHRv IHN1YnRyYWN0IG9uZSBoZXJlLApAQCAtMzA2OSw4ICszMDcxLDYgQEAgdmVjdF9hbmFseXpl X2xvb3AgKGNsYXNzIGxvb3AgKmxvb3AsIHZlY19pbmZvX3NoYXJlZCAqc2hhcmVkKQogICBt YWNoaW5lX21vZGUgYXV0b2RldGVjdGVkX3ZlY3Rvcl9tb2RlID0gVk9JRG1vZGU7CiAgIG9w dF9sb29wX3ZlY19pbmZvIGZpcnN0X2xvb3BfdmluZm8gPSBvcHRfbG9vcF92ZWNfaW5mbzo6 c3VjY2VzcyAoTlVMTCk7CiAgIHVuc2lnbmVkIGludCBtb2RlX2kgPSAwOwotICB1bnNpZ25l ZCBpbnQgZmlyc3RfbG9vcF9pID0gMDsKLSAgdW5zaWduZWQgaW50IGZpcnN0X2xvb3BfbmV4 dF9pID0gMDsKICAgdW5zaWduZWQgSE9TVF9XSURFX0lOVCBzaW1kbGVuID0gbG9vcC0+c2lt ZGxlbjsKIAogICAvKiBGaXJzdCBkZXRlcm1pbmUgdGhlIG1haW4gbG9vcCB2ZWN0b3JpemF0 aW9uIG1vZGUsIGVpdGhlciB0aGUgZmlyc3QKQEAgLTMwNzksNyArMzA3OSw2IEBAIHZlY3Rf YW5hbHl6ZV9sb29wIChjbGFzcyBsb29wICpsb29wLCB2ZWNfaW5mb19zaGFyZWQgKnNoYXJl 
ZCkKICAgICAgbG93ZXN0IGNvc3QgaWYgcGlja19sb3dlc3RfY29zdF9wLiAgKi8KICAgd2hp bGUgKDEpCiAgICAgewotICAgICAgdW5zaWduZWQgaW50IGxvb3BfdmluZm9faSA9IG1vZGVf aTsKICAgICAgIGJvb2wgZmF0YWw7CiAgICAgICBvcHRfbG9vcF92ZWNfaW5mbyBsb29wX3Zp bmZvCiAJPSB2ZWN0X2FuYWx5emVfbG9vcF8xIChsb29wLCBzaGFyZWQsICZsb29wX2Zvcm1f aW5mbywKQEAgLTMxMDgsMTEgKzMxMDcsNyBAQCB2ZWN0X2FuYWx5emVfbG9vcCAoY2xhc3Mg bG9vcCAqbG9vcCwgdmVjX2luZm9fc2hhcmVkICpzaGFyZWQpCiAJICAgICAgZmlyc3RfbG9v cF92aW5mbyA9IG9wdF9sb29wX3ZlY19pbmZvOjpzdWNjZXNzIChOVUxMKTsKIAkgICAgfQog CSAgaWYgKGZpcnN0X2xvb3BfdmluZm8gPT0gTlVMTCkKLQkgICAgewotCSAgICAgIGZpcnN0 X2xvb3BfdmluZm8gPSBsb29wX3ZpbmZvOwotCSAgICAgIGZpcnN0X2xvb3BfaSA9IGxvb3Bf dmluZm9faTsKLQkgICAgICBmaXJzdF9sb29wX25leHRfaSA9IG1vZGVfaTsKLQkgICAgfQor CSAgICBmaXJzdF9sb29wX3ZpbmZvID0gbG9vcF92aW5mbzsKIAkgIGVsc2UKIAkgICAgewog CSAgICAgIGRlbGV0ZSBsb29wX3ZpbmZvOwpAQCAtMzE1OCwyNiArMzE1MywxOCBAQCB2ZWN0 X2FuYWx5emVfbG9vcCAoY2xhc3MgbG9vcCAqbG9vcCwgdmVjX2luZm9fc2hhcmVkICpzaGFy ZWQpCiAgIC8qIE5vdyBhbmFseXplIGZpcnN0X2xvb3BfdmluZm8gZm9yIGVwaWxvZ3VlIHZl Y3Rvcml6YXRpb24uICAqLwogICBwb2x5X3VpbnQ2NCBsb3dlc3RfdGggPSBMT09QX1ZJTkZP X1ZFUlNJT05JTkdfVEhSRVNIT0xEIChmaXJzdF9sb29wX3ZpbmZvKTsKIAotICAvKiBIYW5k bGUgdGhlIGNhc2UgdGhhdCB0aGUgb3JpZ2luYWwgbG9vcCBjYW4gdXNlIHBhcnRpYWwKLSAg ICAgdmVjdG9yaXphdGlvbiwgYnV0IHdhbnQgdG8gb25seSBhZG9wdCBpdCBmb3IgdGhlIGVw aWxvZ3VlLgotICAgICBUaGUgcmV0cnkgc2hvdWxkIGJlIGluIHRoZSBzYW1lIG1vZGUgYXMg b3JpZ2luYWwuICAqLwotICBpZiAoTE9PUF9WSU5GT19FUElMX1VTSU5HX1BBUlRJQUxfVkVD VE9SU19QIChmaXJzdF9sb29wX3ZpbmZvKSkKLSAgICB7Ci0gICAgICBnY2NfYXNzZXJ0IChM T09QX1ZJTkZPX0NBTl9VU0VfUEFSVElBTF9WRUNUT1JTX1AgKGZpcnN0X2xvb3BfdmluZm8p Ci0JCSAgJiYgIUxPT1BfVklORk9fVVNJTkdfUEFSVElBTF9WRUNUT1JTX1AgKGZpcnN0X2xv b3BfdmluZm8pKTsKLSAgICAgIGlmIChkdW1wX2VuYWJsZWRfcCAoKSkKLQlkdW1wX3ByaW50 Zl9sb2MgKE1TR19OT1RFLCB2ZWN0X2xvY2F0aW9uLAotCQkJICIqKioqKiBSZS10cnlpbmcg YW5hbHlzaXMgd2l0aCBzYW1lIHZlY3RvciBtb2RlIgotCQkJICIgJXMgZm9yIGVwaWxvZ3Vl IHdpdGggcGFydGlhbCB2ZWN0b3JzLlxuIiwKLQkJCSBHRVRfTU9ERV9OQU1FIChmaXJzdF9s 
b29wX3ZpbmZvLT52ZWN0b3JfbW9kZSkpOwotICAgICAgbW9kZV9pID0gZmlyc3RfbG9vcF9p OwotICAgIH0KLSAgZWxzZQotICAgIHsKLSAgICAgIG1vZGVfaSA9IGZpcnN0X2xvb3BfbmV4 dF9pOwotICAgICAgaWYgKG1vZGVfaSA9PSB2ZWN0b3JfbW9kZXMubGVuZ3RoICgpKQotCXJl dHVybiBmaXJzdF9sb29wX3ZpbmZvOwotICAgIH0KKyAgLyogRm9yIGVwaWxvZ3VlcyBzdGFy dCB0aGUgYW5hbHlzaXMgZnJvbSB0aGUgZmlyc3QgbW9kZS4gIFRoZSBtb3RpdmF0aW9uCisg ICAgIGJlaGluZCBzdGFydGluZyBmcm9tIHRoZSBiZWdpbm5pbmcgY29tZXMgZnJvbSBjYXNl cyB3aGVyZSB0aGUgVkVDVE9SX01PREVTCisgICAgIGFycmF5IG1heSBjb250YWluIGxlbmd0 aCBhZ25vc3RpYyBhbmQgbGVuZ3RoIGZpeGVkIG1vZGVzLiAgVGhlaXIgb3JkZXJpbmcKKyAg ICAgaXMgbm90IGd1YXJhbnRlZWQsIHNvIHdlIGNvdWxkIGVuZCB1cCBwaWNraW5nIGEgbW9k ZSBmb3IgdGhlIG1haW4gbG9vcAorICAgICB0aGF0IGlzIGFmdGVyIHRoZSBlcGlsb2d1ZSdz IG9wdGltYWwgbW9kZS4gICovCisgIG1vZGVfaSA9IDA7CisKKyAgaWYgKGR1bXBfZW5hYmxl ZF9wICgpKQorICAgIGR1bXBfcHJpbnRmX2xvYyAoTVNHX05PVEUsIHZlY3RfbG9jYXRpb24s CisJCSAgICAgIioqKioqIFJlLXRyeWluZyBhbmFseXNpcyB3aXRoIGZpcnN0IHZlY3RvciBt b2RlIgorCQkgICAgICIgJXMgZm9yIGVwaWxvZ3VlLlxuIiwKKwkJICAgICBHRVRfTU9ERV9O QU1FICh2ZWN0b3JfbW9kZXNbbW9kZV9pXSkpOwogCiAgIC8qID8/PyAgSWYgZmlyc3RfbG9v cF92aW5mbyB3YXMgdXNpbmcgVk9JRG1vZGUgdGhlbiB3ZSBwcm9iYWJseQogICAgICB3YW50 IHRvIGluc3RlYWQgc2VhcmNoIGZvciB0aGUgY29ycmVzcG9uZGluZyBtb2RlIGluIHZlY3Rv cl9tb2Rlc1tdLiAgKi8K --------------dqa7UYM6fnmmHKlsG1CvCH9i--
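
The costing fix mentioned in the message (rounding the main_vf / epilogue_vf factor up rather than truncating it) can be illustrated with a small standalone sketch. This is not GCC code: the helper names floor_div and ceil_div are made up for illustration, and ceil_div only mirrors the usual round-up idiom behind the CEIL macro, ((x) + (y) - 1) / (y).

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of GCC.  */

/* Truncating division, which is what a plain main_vf / epilogue_vf
   computes.  The result is 0 whenever main_vf < epilogue_vf.  */
unsigned long
floor_div (unsigned long main_vf, unsigned long epilogue_vf)
{
  assert (epilogue_vf != 0);
  return main_vf / epilogue_vf;
}

/* Round-up division, the same idiom as CEIL (x, y).  The result is at
   least 1 for any non-zero main_vf, and a partial final iteration is
   counted as a whole one.  */
unsigned long
ceil_div (unsigned long main_vf, unsigned long epilogue_vf)
{
  assert (epilogue_vf != 0);
  return (main_vf + epilogue_vf - 1) / epilogue_vf;
}
```

With arbitrary sample values, floor_div (16, 32) is 0 while ceil_div (16, 32) is 1, which covers the case where main_vf is the lower of the two. For values that are not multiples of each other, floor_div (16, 12) is 1 while ceil_div (16, 12) is 2.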
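
The mode-ordering argument quoted above (V16QI chosen for the main loop while VNx16QI sits earlier in the array) can be sketched the same way. This is a toy model under stated assumptions, not the vectorizer's logic: the mode list is a plain array of names, first_mode_with_prefix is a hypothetical helper, and "suitable for the epilogue" is reduced to a name-prefix match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only; not part of GCC.
   Return the index of the first entry at or after START whose name
   begins with PREFIX, or -1 if there is none.  */
int
first_mode_with_prefix (const char *const *modes, size_t n_modes,
                        size_t start, const char *prefix)
{
  assert (modes != NULL && prefix != NULL);
  size_t len = strlen (prefix);
  for (size_t i = start; i < n_modes; ++i)
    if (strncmp (modes[i], prefix, len) == 0)
      return (int) i;
  return -1;
}
```

Take the assumed list { "VNx16QI", "V16QI", "V8QI" } with the main loop using index 1 (V16QI). A search for "VNx" that starts after the main loop's mode (start = 2) finds nothing, whereas a search restarted at 0 finds VNx16QI at index 0. That is the situation the mode_i = 0 reset is meant to handle.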