From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id E386D3858C56
	for ; Tue, 26 Apr 2022 14:34:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E386D3858C56
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7826523A;
	Tue, 26 Apr 2022 07:34:18 -0700 (PDT)
Received: from [10.57.10.193] (unknown [10.57.10.193])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E46633F73B;
	Tue, 26 Apr 2022 07:34:17 -0700 (PDT)
Content-Type: multipart/mixed; boundary="------------VqTJAgdUKN7uwftTzqI4hr8z"
Message-ID: <8462f41b-895f-9aca-499e-7713ec161673@arm.com>
Date: Tue, 26 Apr 2022 15:34:12 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1
Content-Language: en-US
From: "Andre Vieira (lists)"
Subject: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment
To: "gcc-patches@gcc.gnu.org"
Cc: Richard Sandiford, Richard Biener
X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 26 Apr 2022 14:34:20 -0000

This is a multi-part message in MIME format.
--------------VqTJAgdUKN7uwftTzqI4hr8z
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi,

This patch disables epilogue vectorization when we are peeling for alignment in the prologue and we can't guarantee the main vectorized loop is entered.
This is to prevent executing vectorized code with an unaligned access if the target has indicated it wants to peel for alignment. We take this conservative approach because we currently do not distinguish between peeling for alignment for correctness and peeling for alignment for performance.

Better codegen would skip straight to the scalar epilogue when the main loop isn't entered and alignment peeling is required. However, that would require a more aggressive change to the codebase, which we chose to avoid at this point of development. We can revisit this option during stage 1 if we choose to.

Bootstrapped on aarch64-none-linux and regression tested on aarch64-none-elf.

gcc/ChangeLog:

    PR tree-optimization/105219
    * tree-vect-loop.cc (vect_epilogue_when_peeling_for_alignment): New function.
    (vect_analyze_loop): Use vect_epilogue_when_peeling_for_alignment to determine
    whether to vectorize epilogue.
    * testsuite/gcc.target/aarch64/pr105219.c: New.
    * testsuite/gcc.target/aarch64/pr105219-2.c: New.
    * testsuite/gcc.target/aarch64/pr105219-3.c: New.
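
For illustration only, the condition described in this mail can be modelled roughly as below. This is a minimal sketch using plain integers, not the code in the attached patch. The names epilogue_allowed_p and max_ul and all parameters are hypothetical. Treating a negative peel count as "amount not known at compile time" and an unknown iteration count as "main loop not guaranteed" are assumptions of the sketch.

```c
#include <stdbool.h>

/* Hypothetical helper: the larger of two unsigned values.  */
static unsigned long
max_ul (unsigned long a, unsigned long b)
{
  return a > b ? a : b;
}

/* Simplified model of the decision: return true if a vectorized
   epilogue may be kept.

   PEEL is the number of scalar iterations peeled in the prologue for
   alignment: 0 means no peeling, and a negative value means (in this
   sketch) that the amount is not known at compile time.
   NITERS_KNOWN says whether NITERS is a compile-time constant.
   VF, COST_THRESHOLD and VERSIONING_THRESHOLD stand in for the
   iteration counts needed before the main vector loop is entered.  */
static bool
epilogue_allowed_p (int peel, bool niters_known, unsigned long niters,
		    unsigned long vf, unsigned long cost_threshold,
		    unsigned long versioning_threshold)
{
  /* No alignment peeling: nothing to protect against.  */
  if (peel == 0)
    return true;

  /* Assumption of this sketch: if either the peel amount or the
     iteration count is unknown, entry to the main loop cannot be
     guaranteed, so be conservative.  */
  if (peel < 0 || !niters_known)
    return false;

  /* Iterations consumed by the prologue plus the minimum needed to
     enter the main vector loop.  */
  unsigned long needed
    = max_ul (vf, max_ul (cost_threshold, versioning_threshold));
  needed += (unsigned long) peel;

  /* Keep the epilogue only if the loop certainly runs longer.  */
  return niters > needed;
}
```

With one peeled iteration and a vectorization factor of 4 (both thresholds zero), this sketch keeps the vectorized epilogue only when the known iteration count is at least 6.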
--------------VqTJAgdUKN7uwftTzqI4hr8z Content-Type: text/plain; charset=UTF-8; name="pr105219.patch" Content-Disposition: attachment; filename="pr105219.patch" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ByMTA1MjE5 LTIuYyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ByMTA1MjE5LTIuYwpu ZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw MDAwMDAwMDAwMDAwLi5jOTdkMWRjMTAwMTgxYjc3YWYwNzY2ZTA4NDA3ZTFlMzUyZjYwNGZl Ci0tLSAvZGV2L251bGwKKysrIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQv cHIxMDUyMTktMi5jCkBAIC0wLDAgKzEsMjkgQEAKKy8qIHsgZGctZG8gcnVuIH0gKi8KKy8q IHsgZGctb3B0aW9ucyAiLU8zIC1tYXJjaD1hcm12OC4yLWEgLW10dW5lPXRodW5kZXJ4IC1m bm8tdmVjdC1jb3N0LW1vZGVsIiB9ICovCisvKiB7IGRnLXNraXAtaWYgImluY29tcGF0aWJs ZSBvcHRpb25zIiB7ICotKi0qIH0geyAiLW1hcmNoPSoiIH0geyAiLW1hcmNoPWFybXY4LjIt YSIgfSB9ICovCisvKiB7IGRnLXNraXAtaWYgImluY29tcGF0aWJsZSBvcHRpb25zIiB7ICot Ki0qIH0geyAiLW10dW5lPSoiIH0geyAiLW10dW5lPXRodW5kZXJ4IiB9IH0gKi8KKy8qIHsg ZGctc2tpcC1pZiAiaW5jb21wYXRpYmxlIG9wdGlvbnMiIHsgKi0qLSogfSB7ICItbWNwdT0q IiB9IH0gKi8KKy8qIFBSIDEwNTIxOS4gICovCitpbnQgZGF0YVsxMjhdOworCit2b2lkIF9f YXR0cmlidXRlKChub2lwYSkpCitmb28gKGludCAqZGF0YSwgaW50IG4pCit7CisgIGZvciAo aW50IGkgPSAwOyBpIDwgbjsgKytpKQorICAgIGRhdGFbaV0gPSBpOworfQorCitpbnQgbWFp bigpCit7CisgIGZvciAoaW50IHN0YXJ0ID0gMDsgc3RhcnQgPCAxNjsgKytzdGFydCkKKyAg ICBmb3IgKGludCBuID0gMTsgbiA8IDMqMTY7ICsrbikKKyAgICAgIHsKKyAgICAgICAgX19i dWlsdGluX21lbXNldCAoZGF0YSwgMCwgc2l6ZW9mIChkYXRhKSk7CisgICAgICAgIGZvbyAo JmRhdGFbc3RhcnRdLCBuKTsKKyAgICAgICAgZm9yIChpbnQgaiA9IDA7IGogPCBuOyArK2op CisgICAgICAgICAgaWYgKGRhdGFbc3RhcnQgKyBqXSAhPSBqKQorICAgICAgICAgICAgX19i dWlsdGluX2Fib3J0ICgpOworICAgICAgfQorICByZXR1cm4gMDsKK30KKwpkaWZmIC0tZ2l0 IGEvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvcHIxMDUyMTktMy5jIGIvZ2Nj L3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvcHIxMDUyMTktMy5jCm5ldyBmaWxlIG1v ZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw 
MDAuLjQ0NDM1MmZjMDUxYjc4NzM2OWY2ZjFiZTYyMzZkMWZmMGZjMmQzOTIKLS0tIC9kZXYv bnVsbAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC9wcjEwNTIxOS0z LmMKQEAgLTAsMCArMSwxNSBAQAorLyogeyBkZy1kbyBjb21waWxlIH0gKi8KKy8qIHsgZGct c2tpcC1pZiAiaW5jb21wYXRpYmxlIG9wdGlvbnMiIHsgKi0qLSogfSB7ICItbWFyY2g9KiIg fSB7ICItbWFyY2g9YXJtdjguMi1hIiB9IH0gKi8KKy8qIHsgZGctc2tpcC1pZiAiaW5jb21w YXRpYmxlIG9wdGlvbnMiIHsgKi0qLSogfSB7ICItbXR1bmU9KiIgfSB7ICItbXR1bmU9dGh1 bmRlcngiIH0gfSAqLworLyogeyBkZy1za2lwLWlmICJpbmNvbXBhdGlibGUgb3B0aW9ucyIg eyAqLSotKiB9IHsgIi1tY3B1PSoiIH0gfSAqLworLyogeyBkZy1vcHRpb25zICItTzMgLW1h cmNoPWFybXY4LjItYSAtbXR1bmU9dGh1bmRlcnggLWZuby12ZWN0LWNvc3QtbW9kZWwgLWZk dW1wLXRyZWUtdmVjdC1hbGwiIH0gKi8KKy8qIFBSIDEwNTIxOS4gICovCitpbnQgZGF0YVsx MjhdOworCit2b2lkIGZvbyAodm9pZCkKK3sKKyAgZm9yIChpbnQgaSA9IDA7IGkgPCA5OyAr K2kpCisgICAgZGF0YVtpICsgMV0gPSBpOworfQorCisvKiB7IGRnLWZpbmFsIHsgc2Nhbi10 cmVlLWR1bXAgIkVQSUxPR1VFIFZFQ1RPUklaRUQiICJ2ZWN0IiB9IH0gKi8KZGlmZiAtLWdp dCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ByMTA1MjE5LmMgYi9nY2Mv dGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC9wcjEwNTIxOS5jCm5ldyBmaWxlIG1vZGUg MTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAu LmJiZGVmYjU0OWY2YTRlODAzODUyZjY5ZDIwY2UxZWY5MTUyYTUyNmMKLS0tIC9kZXYvbnVs bAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC9wcjEwNTIxOS5jCkBA IC0wLDAgKzEsMjggQEAKKy8qIHsgZGctZG8gcnVuIHsgdGFyZ2V0IGFhcmNoNjRfc3ZlMTI4 X2h3IH0gfSAqLworLyogeyBkZy1za2lwLWlmICJpbmNvbXBhdGlibGUgb3B0aW9ucyIgeyAq LSotKiB9IHsgIi1tYXJjaD0qIiB9IHsgIi1tYXJjaD1hcm12OC4yLWErc3ZlIiB9IH0gKi8K Ky8qIHsgZGctc2tpcC1pZiAiaW5jb21wYXRpYmxlIG9wdGlvbnMiIHsgKi0qLSogfSB7ICIt bXR1bmU9KiIgfSB7ICItbXR1bmU9dGh1bmRlcngiIH0gfSAqLworLyogeyBkZy1za2lwLWlm ICJpbmNvbXBhdGlibGUgb3B0aW9ucyIgeyAqLSotKiB9IHsgIi1tY3B1PSoiIH0gfSAqLwor LyogeyBkZy1za2lwLWlmICJpbmNvbXBhdGlibGUgb3B0aW9ucyIgeyAqLSotKiB9IHsgIi1t c3ZlLXZlY3Rvci1iaXRzPSoiIH0geyAiLW1zdmUtdmVjdG9yLWJpdHM9MTI4IiB9IH0gKi8K Ky8qIHsgZGctb3B0aW9ucyAiLU8zIC1tYXJjaD1hcm12OC4yLWErc3ZlIC1tc3ZlLXZlY3Rv 
ci1iaXRzPTEyOCAtbXR1bmU9dGh1bmRlcngiIH0gKi8KKy8qIFBSIDEwNTIxOS4gICovCitp bnQgYTsKK2NoYXIgYls2MF07CitzaG9ydCBjWzE4XTsKK3Nob3J0IGRbNF1bMTldOworbG9u ZyBsb25nIGY7Cit2b2lkIGUoaW50IGcsIGludCBoLCBzaG9ydCBrW11bMTldKSB7CisgIGZv ciAoc2lnbmVkIGkgPSAwOyBpIDwgMzsgaSArPSAyKQorICAgIGZvciAoc2lnbmVkIGogPSAx OyBqIDwgaCArIDE0OyBqKyspIHsKKyAgICAgIGJbaSAqIDE0ICsgal0gPSAxOworICAgICAg Y1tpICsgal0gPSBrWzJdW2pdOworICAgICAgYSA9IGcgPyBrW2ldW2pdIDogMDsKKyAgICB9 Cit9CitpbnQgbWFpbigpIHsKKyAgZSg5LCAxLCBkKTsKKyAgZm9yIChsb25nIGwgPSAwOyBs IDwgNjsgKytsKQorICAgIGZvciAobG9uZyBtID0gMDsgbSA8IDQ7ICsrbSkKKyAgICAgIGYg Xj0gYltsICsgbSAqIDRdOworICBpZiAoZikKKyAgICBfX2J1aWx0aW5fYWJvcnQgKCk7Cit9 CmRpZmYgLS1naXQgYS9nY2MvdHJlZS12ZWN0LWxvb3AuY2MgYi9nY2MvdHJlZS12ZWN0LWxv b3AuY2MKaW5kZXggZDdiYzM0NjM2YmQ1MmIyZjY3Y2RlY2QzZGMxNmZjZmY2ODRkYmEwNy4u YTIzZTYxODFkZWM4MTI2YmNiNjkxZWE5NDc0MDk1YmY2NTQ4Mzg2MyAxMDA2NDQKLS0tIGEv Z2NjL3RyZWUtdmVjdC1sb29wLmNjCisrKyBiL2djYy90cmVlLXZlY3QtbG9vcC5jYwpAQCAt Mjk0Miw2ICsyOTQyLDM4IEBAIHZlY3RfYW5hbHl6ZV9sb29wXzEgKGNsYXNzIGxvb3AgKmxv b3AsIHZlY19pbmZvX3NoYXJlZCAqc2hhcmVkLAogICByZXR1cm4gb3B0X2xvb3BfdmVjX2lu Zm86OnN1Y2Nlc3MgKGxvb3BfdmluZm8pOwogfQogCisvKiBGdW5jdGlvbiB2ZWN0X2VwaWxv Z3VlX3doZW5fcGVlbGluZ19mb3JfYWxpZ25tZW50CisKKyAgIFBSIDEwNTIxOTogSWYgd2Ug YXJlIHBlZWxpbmcgZm9yIGFsaWdubWVudCBpbiB0aGUgcHJvbG9ndWUgdGhlbiB3ZSBkbyBu b3QKKyAgIHZlY3Rvcml6ZSB0aGUgZXBpbG9ndWUgdW5sZXNzIHdlIGFyZSBjZXJ0YWluIHdl IHdpbGwgZW50ZXIgdGhlIG1haW4KKyAgIHZlY3Rvcml6ZWQgbG9vcC4gIFRoaXMgaXMgdG8g cHJldmVudCBlbnRlcmluZyB0aGUgdmVjdG9yaXplZCBlcGlsb2d1ZSBpbgorICAgY2FzZSB0 aGVyZSBhcmVuJ3QgZW5vdWdoIGl0ZXJhdGlvbnMgdG8gZW50ZXIgdGhlIG1haW4gbG9vcC4K KyovCisKK3N0YXRpYyBib29sCit2ZWN0X2VwaWxvZ3VlX3doZW5fcGVlbGluZ19mb3JfYWxp Z25tZW50IChsb29wX3ZlY19pbmZvIGxvb3BfdmluZm8pCit7CisgIGlmICh2ZWN0X3VzZV9s b29wX21hc2tfZm9yX2FsaWdubWVudF9wIChsb29wX3ZpbmZvKSkKKyAgICByZXR1cm4gdHJ1 ZTsKKworICBpbnQgcHJvbG9ndWVfcGVlbGluZyA9IExPT1BfVklORk9fUEVFTElOR19GT1Jf QUxJR05NRU5UIChsb29wX3ZpbmZvKTsKKyAgaWYgKHByb2xvZ3VlX3BlZWxpbmcgPiAwICYm 
IExPT1BfVklORk9fTklURVJTX0tOT1dOX1AgKGxvb3BfdmluZm8pKQorICAgIHsKKyAgICAg IHBvbHlfdWludDY0IG5pdGVyc19mb3JfbWFpbgorCT0gdXBwZXJfYm91bmQgKExPT1BfVklO Rk9fVkVDVF9GQUNUT1IgKGxvb3BfdmluZm8pLAorCQkgICAgICAgTE9PUF9WSU5GT19DT1NU X01PREVMX1RIUkVTSE9MRCAobG9vcF92aW5mbykpOworICAgICAgbml0ZXJzX2Zvcl9tYWlu CisJPSB1cHBlcl9ib3VuZCAobml0ZXJzX2Zvcl9tYWluLAorCQkgICAgICAgTE9PUF9WSU5G T19WRVJTSU9OSU5HX1RIUkVTSE9MRCAobG9vcF92aW5mbykpOworICAgICAgbml0ZXJzX2Zv cl9tYWluICs9IHByb2xvZ3VlX3BlZWxpbmc7CisgICAgICBpZiAobWF5YmVfbGUgKExPT1Bf VklORk9fSU5UX05JVEVSUyAobG9vcF92aW5mbyksIG5pdGVyc19mb3JfbWFpbikpCisJcmV0 dXJuIGZhbHNlOworICAgIH0KKyAgZWxzZSBpZiAocHJvbG9ndWVfcGVlbGluZyA8IDApCisg ICAgcmV0dXJuIGZhbHNlOworICByZXR1cm4gdHJ1ZTsKK30KKwogLyogRnVuY3Rpb24gdmVj dF9hbmFseXplX2xvb3AuCiAKICAgIEFwcGx5IGEgc2V0IG9mIGFuYWx5c2VzIG9uIExPT1As IGFuZCBjcmVhdGUgYSBsb29wX3ZlY19pbmZvIHN0cnVjdApAQCAtMzE1MSw3ICszMTgzLDgg QEAgdmVjdF9hbmFseXplX2xvb3AgKGNsYXNzIGxvb3AgKmxvb3AsIHZlY19pbmZvX3NoYXJl ZCAqc2hhcmVkKQogCQl9CiAJICAgIH0KIAkgIC8qIEZvciBub3cgb25seSBhbGxvdyBvbmUg ZXBpbG9ndWUgbG9vcC4gICovCi0JICBpZiAoZmlyc3RfbG9vcF92aW5mby0+ZXBpbG9ndWVf dmluZm9zLmlzX2VtcHR5ICgpKQorCSAgaWYgKGZpcnN0X2xvb3BfdmluZm8tPmVwaWxvZ3Vl X3ZpbmZvcy5pc19lbXB0eSAoKQorCSAgICAgICYmIHZlY3RfZXBpbG9ndWVfd2hlbl9wZWVs aW5nX2Zvcl9hbGlnbm1lbnQgKGZpcnN0X2xvb3BfdmluZm8pKQogCSAgICB7CiAJICAgICAg Zmlyc3RfbG9vcF92aW5mby0+ZXBpbG9ndWVfdmluZm9zLnNhZmVfcHVzaCAobG9vcF92aW5m byk7CiAJICAgICAgcG9seV91aW50NjQgdGggPSBMT09QX1ZJTkZPX1ZFUlNJT05JTkdfVEhS RVNIT0xEIChsb29wX3ZpbmZvKTsK --------------VqTJAgdUKN7uwftTzqI4hr8z--