From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <87180de9-d0d4-b92f-405f-100aca3d5cf8@codesourcery.com>
Date: Wed, 28 Sep 2022 16:05:38 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.3.0
Content-Language: en-GB
From: Andrew Stubbs
Subject: [PATCH] vect: while_ult for integer mask
To: "gcc-patches@gcc.gnu.org"
Content-Type: multipart/mixed; boundary="------------2XjG4aqn2VoBPC7y00rPa35L"

--------------2XjG4aqn2VoBPC7y00rPa35L
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit

This patch is a
prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywhere yet).

The problem is that, unlike AArch64, I'm not using different mask modes for different-sized vectors, so all loops end up using the while_ultsidi pattern, regardless of vector length. In theory I could use SImode for V32, HImode for V16, etc., but there's no mode to fit V4 or V2, so something else is needed. Moving to using vector masks in the backend is not a natural fit for GCN, and would be a huge task in any case.

This patch adds an additional length operand so that we can distinguish the different uses in the back end and don't end up with more lanes enabled than there ought to be.

I've made the extra operand conditional on the mode so that I don't have to modify the AArch64 backend; that uses the while_ family of operators in a lot of places and uses iterators, so it would end up touching a lot of code just to add an inactive operand, plus I don't have a way to test it properly. I've confirmed that AArch64 builds and expands while_ult correctly in a simple example.

OK for mainline?
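
For illustration only, the intended effect of the length operand on an integer (DImode-style) mask can be sketched in plain C. This is not code from the patch: `while_ult_mask` and its parameter names are hypothetical, and the 64-bit addition sidesteps the wraparound that a real scalar mode would have.

```c
#include <stdint.h>

/* Hypothetical model of while_ult for a 64-bit integer lane mask.
   start and limit play the role of operands 1 and 2; nlanes plays the
   role of the new length operand (operand 3).  Not part of GCC.  */
static uint64_t
while_ult_mask (uint32_t start, uint32_t limit, unsigned nlanes)
{
  uint64_t mask = 0;

  /* Lane i is enabled while start + i < limit, and only for lanes
     below nlanes; every lane after the first failure stays clear.  */
  for (unsigned i = 0; i < nlanes && i < 64; i++)
    {
      /* Assumption: widen to 64 bits so the sum cannot wrap.  */
      if ((uint64_t) start + i >= limit)
        break;
      mask |= (uint64_t) 1 << i;
    }

  return mask;
}
```

With a large trip count, `while_ult_mask (0, 100, 4)` enables only the low four lanes, whereas a length of 64 would enable all of them; that difference is what the extra operand lets the back end see.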
Thanks Andrew --------------2XjG4aqn2VoBPC7y00rPa35L Content-Type: text/plain; charset="UTF-8"; name="220928-while_ult-length.patch" Content-Disposition: attachment; filename="220928-while_ult-length.patch" Content-Transfer-Encoding: base64 dmVjdDogd2hpbGVfdWx0IGZvciBpbnRlZ2VyIG1hc2tzCgpBZGQgYSB2ZWN0b3IgbGVuZ3Ro IHBhcmFtZXRlciBuZWVkZWQgYnkgYW1kZ2NuIHdpdGhvdXQgYnJlYWtpbmcgYWFyY2g2NC4K CkFsbCBhbWRnY24gdmVjdG9yIG1hc2tzIGFyZSBESW1vZGUsIHJlZ2FyZGxlc3Mgb2YgdmVj dG9yIGxlbmd0aCwgc28gd2UgY2FuJ3QKdGVsbCB3aGF0IGxlbmd0aCBpcyBpbXBsaWVkIHNp bXBseSBmcm9tIHRoZSBvcGVyYXRvciBtb2RlLiAgKEV2ZW4gaWYgd2UgdXNlZApkaWZmZXJl bnQgaW50ZWdlciBtb2RlcyB0aGVyZSdzIG5vIG1vZGUgc21hbGwgZW5vdWdoIHRvIGRpZmZl cmVuY2lhdGUgYSAyIG9yCjQgbGFuZSBtYXNrKS4gIFdpdGhvdXQga25vd2luZyB0aGUgaW50 ZW5kZWQgbGVuZ3RoIHdlIGVuZCB1cCB1c2luZyBhIG1hc2sgd2l0aAp0b28gbWFueSBsYW5l cyBlbmFibGVkLCB3aGljaCBsZWFkcyB0byB1bmRlZmluZWQgYmVoYXZpb3VyLi4KClRoZSBl eHRyYSBvcGVyYW5kIGlzIG5vdCBhZGRlZCBmb3IgdmVjdG9yIG1hc2sgdHlwZXMgc28gQUFy Y2g2NCBkb2VzIG5vdCBuZWVkCnRvIGJlIGFkanVzdGVkLgoKZ2NjL0NoYW5nZUxvZzoKCgkq IGNvbmZpZy9nY24vZ2NuLXZhbHUubWQgKHdoaWxlX3VsdHNpZGkpOiBMaW1pdCBtYXNrIGxl bmd0aCB1c2luZwoJb3BlcmFuZCAzLgoJKiBkb2MvbWQudGV4aSAod2hpbGVfdWx0KTogRG9j dW1lbnQgbmV3IG9wZXJhbmQgMyB1c2FnZS4KCSogaW50ZXJuYWwtZm4uY2MgKGV4cGFuZF93 aGlsZV9vcHRhYl9mbik6IFNldCBvcGVyYW5kIDMgd2hlbiBsaHNfdHlwZQoJbWFwcyB0byBh IG5vbi12ZWN0b3IgbW9kZS4KCmRpZmYgLS1naXQgYS9nY2MvY29uZmlnL2djbi9nY24tdmFs dS5tZCBiL2djYy9jb25maWcvZ2NuL2djbi12YWx1Lm1kCmluZGV4IDNiZmRmODIxM2ZjLi5k ZWM4MWU4NjNmNyAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZpZy9nY24vZ2NuLXZhbHUubWQKKysr IGIvZ2NjL2NvbmZpZy9nY24vZ2NuLXZhbHUubWQKQEAgLTMwNTIsNyArMzA1Miw4IEBAIChk ZWZpbmVfZXhwYW5kICJ2Y29uZHU8Vl9BTEw6bW9kZT48Vl9JTlQ6bW9kZT5fZXhlYyIKIChk ZWZpbmVfZXhwYW5kICJ3aGlsZV91bHRzaWRpIgogICBbKG1hdGNoX29wZXJhbmQ6REkgMCAi cmVnaXN0ZXJfb3BlcmFuZCIpCiAgICAobWF0Y2hfb3BlcmFuZDpTSSAxICIiKQotICAgKG1h dGNoX29wZXJhbmQ6U0kgMiAiIildCisgICAobWF0Y2hfb3BlcmFuZDpTSSAyICIiKQorICAg KG1hdGNoX29wZXJhbmQ6U0kgMyAiIildCiAgICIiCiAgIHsKICAgICBpZiAoR0VUX0NPREUg 
KG9wZXJhbmRzWzFdKSAhPSBDT05TVF9JTlQKQEAgLTMwNzcsNiArMzA3OCwxMSBAQCAoZGVm aW5lX2V4cGFuZCAid2hpbGVfdWx0c2lkaSIKIAkJCSAgICAgIDogfigodW5zaWduZWQgSE9T VF9XSURFX0lOVCktMSA8PCBkaWZmKSk7CiAJZW1pdF9tb3ZlX2luc24gKG9wZXJhbmRzWzBd LCBnZW5fcnR4X0NPTlNUX0lOVCAoVk9JRG1vZGUsIG1hc2spKTsKICAgICAgIH0KKyAgICBp ZiAoSU5UVkFMIChvcGVyYW5kc1szXSkgPCA2NCkKKyAgICAgIGVtaXRfaW5zbiAoZ2VuX2Fu ZGRpMyAob3BlcmFuZHNbMF0sIG9wZXJhbmRzWzBdLAorCQkJICAgICBnZW5fcnR4X0NPTlNU X0lOVCAoVk9JRG1vZGUsCisJCQkJCQl+KCh1bnNpZ25lZCBIT1NUX1dJREVfSU5UKS0xCisJ CQkJCQkgIDw8IElOVFZBTCAob3BlcmFuZHNbM10pKSkpKTsKICAgICBET05FOwogICB9KQog CmRpZmYgLS1naXQgYS9nY2MvZG9jL21kLnRleGkgYi9nY2MvZG9jL21kLnRleGkKaW5kZXgg ZDQ2OTYzZjQ2OGMuLmQ4ZTJhNWE4M2Y0IDEwMDY0NAotLS0gYS9nY2MvZG9jL21kLnRleGkK KysrIGIvZ2NjL2RvYy9tZC50ZXhpCkBAIC00OTUwLDkgKzQ5NTAsMTAgQEAgVGhpcyBwYXR0 ZXJuIGlzIG5vdCBhbGxvd2VkIHRvIEBjb2Rle0ZBSUx9LgogQGNpbmRleCBAY29kZXt3aGls ZV91bHRAdmFye219QHZhcntufX0gaW5zdHJ1Y3Rpb24gcGF0dGVybgogQGl0ZW0gQGNvZGV7 d2hpbGVfdWx0QHZhcnttfUB2YXJ7bn19CiBTZXQgb3BlcmFuZCAwIHRvIGEgbWFzayB0aGF0 IGlzIHRydWUgd2hpbGUgaW5jcmVtZW50aW5nIG9wZXJhbmQgMQotZ2l2ZXMgYSB2YWx1ZSB0 aGF0IGlzIGxlc3MgdGhhbiBvcGVyYW5kIDIuICBPcGVyYW5kIDAgaGFzIG1vZGUgQHZhcntu fQotYW5kIG9wZXJhbmRzIDEgYW5kIDIgYXJlIHNjYWxhciBpbnRlZ2VycyBvZiBtb2RlIEB2 YXJ7bX0uCi1UaGUgb3BlcmF0aW9uIGlzIGVxdWl2YWxlbnQgdG86CitnaXZlcyBhIHZhbHVl IHRoYXQgaXMgbGVzcyB0aGFuIG9wZXJhbmQgMiwgZm9yIGEgdmVjdG9yIGxlbmd0aCB1cCB0 byBvcGVyYW5kIDMuCitPcGVyYW5kIDAgaGFzIG1vZGUgQHZhcntufSBhbmQgb3BlcmFuZHMg MSB0byAzIGFyZSBzY2FsYXIgaW50ZWdlcnMgb2YgbW9kZQorQHZhcnttfS4gIE9wZXJhbmQg MyBzaG91bGQgYmUgb21pdHRlZCB3aGVuIEB2YXJ7bn0gaXMgYSB2ZWN0b3IgbW9kZS4gIFRo ZQorb3BlcmF0aW9uIGZvciB2ZWN0b3IgbW9kZXMgaXMgZXF1aXZhbGVudCB0bzoKIAogQHNt YWxsZXhhbXBsZQogb3BlcmFuZDBbMF0gPSBvcGVyYW5kMSA8IG9wZXJhbmQyOwpAQCAtNDk2 MCw2ICs0OTYxLDE0IEBAIGZvciAoaSA9IDE7IGkgPCBHRVRfTU9ERV9OVU5JVFMgKEB2YXJ7 bn0pOyBpKyspCiAgIG9wZXJhbmQwW2ldID0gb3BlcmFuZDBbaSAtIDFdICYmIChvcGVyYW5k MSArIGkgPCBvcGVyYW5kMik7CiBAZW5kIHNtYWxsZXhhbXBsZQogCitBbmQgZm9yIG5vbi12 
ZWN0b3IgbW9kZXMgdGhlIG9wZXJhdGlvbiBpcyBlcXVpdmFsZW50IHRvOgorCitAc21hbGxl eGFtcGxlCitvcGVyYW5kMFswXSA9IG9wZXJhbmQxIDwgb3BlcmFuZDI7Citmb3IgKGkgPSAx OyBpIDwgb3BlcmFuZDM7IGkrKykKKyAgb3BlcmFuZDBbaV0gPSBvcGVyYW5kMFtpIC0gMV0g JiYgKG9wZXJhbmQxICsgaSA8IG9wZXJhbmQyKTsKK0BlbmQgc21hbGxleGFtcGxlCisKIEBj aW5kZXggQGNvZGV7Y2hlY2tfcmF3X3B0cnNAdmFye219fSBpbnN0cnVjdGlvbiBwYXR0ZXJu CiBAaXRlbSBAc2FtcHtjaGVja19yYXdfcHRyc0B2YXJ7bX19CiBDaGVjayB3aGV0aGVyLCBn aXZlbiB0d28gcG9pbnRlcnMgQHZhcnthfSBhbmQgQHZhcntifSBhbmQgYSBsZW5ndGggQHZh cntsZW59LApkaWZmIC0tZ2l0IGEvZ2NjL2ludGVybmFsLWZuLmNjIGIvZ2NjL2ludGVybmFs LWZuLmNjCmluZGV4IDY1MWQ5OWVhZWI5Li5jMzA2MjQwYzJhYyAxMDA2NDQKLS0tIGEvZ2Nj L2ludGVybmFsLWZuLmNjCisrKyBiL2djYy9pbnRlcm5hbC1mbi5jYwpAQCAtMzY2NCw3ICsz NjY0LDcgQEAgZXhwYW5kX2RpcmVjdF9vcHRhYl9mbiAoaW50ZXJuYWxfZm4gZm4sIGdjYWxs ICpzdG10LCBkaXJlY3Rfb3B0YWIgb3B0YWIsCiBzdGF0aWMgdm9pZAogZXhwYW5kX3doaWxl X29wdGFiX2ZuIChpbnRlcm5hbF9mbiwgZ2NhbGwgKnN0bXQsIGNvbnZlcnRfb3B0YWIgb3B0 YWIpCiB7Ci0gIGV4cGFuZF9vcGVyYW5kIG9wc1szXTsKKyAgZXhwYW5kX29wZXJhbmQgb3Bz WzRdOwogICB0cmVlIHJoc190eXBlWzJdOwogCiAgIHRyZWUgbGhzID0gZ2ltcGxlX2NhbGxf bGhzIChzdG10KTsKQEAgLTM2ODAsMTAgKzM2ODAsMjQgQEAgZXhwYW5kX3doaWxlX29wdGFi X2ZuIChpbnRlcm5hbF9mbiwgZ2NhbGwgKnN0bXQsIGNvbnZlcnRfb3B0YWIgb3B0YWIpCiAg ICAgICBjcmVhdGVfaW5wdXRfb3BlcmFuZCAoJm9wc1tpICsgMV0sIHJoc19ydHgsIFRZUEVf TU9ERSAocmhzX3R5cGVbaV0pKTsKICAgICB9CiAKKyAgaW50IG9wY250OworICBpZiAoIVZF Q1RPUl9NT0RFX1AgKFRZUEVfTU9ERSAobGhzX3R5cGUpKSkKKyAgICB7CisgICAgICAvKiBX aGVuIHRoZSBtYXNrIGlzIGFuIGludGVnZXIgbW9kZSB0aGUgZXhhY3QgdmVjdG9yIGxlbmd0 aCBtYXkgbm90CisJIGJlIGNsZWFyIHRvIHRoZSBiYWNrZW5kLCBzbyB3ZSBwYXNzIGl0IGlu IG9wZXJhbmRbM10uCisgICAgICAgICBVc2UgdGhlIHZlY3RvciBpbiBhcmcyIGZvciB0aGUg bW9zdCByZWxpYWJsZSBpbnRlbmRlZCBzaXplLiAgKi8KKyAgICAgIHRyZWUgdHlwZSA9IFRS RUVfVFlQRSAoZ2ltcGxlX2NhbGxfYXJnIChzdG10LCAyKSk7CisgICAgICBjcmVhdGVfaW50 ZWdlcl9vcGVyYW5kICgmb3BzWzNdLCBUWVBFX1ZFQ1RPUl9TVUJQQVJUUyAodHlwZSkpOwor ICAgICAgb3BjbnQgPSA0OworICAgIH0KKyAgZWxzZQorICAgIC8qIFRoZSBtYXNrIGhhcyBh 
IHZlY3RvciB0eXBlIHNvIHRoZSBsZW5ndGggb3BlcmFuZCBpcyB1bm5lY2Vzc2FyeS4gICov CisgICAgb3BjbnQgPSAzOworCiAgIGluc25fY29kZSBpY29kZSA9IGNvbnZlcnRfb3B0YWJf aGFuZGxlciAob3B0YWIsIFRZUEVfTU9ERSAocmhzX3R5cGVbMF0pLAogCQkJCQkgICBUWVBF X01PREUgKGxoc190eXBlKSk7CiAKLSAgZXhwYW5kX2luc24gKGljb2RlLCAzLCBvcHMpOwor ICBleHBhbmRfaW5zbiAoaWNvZGUsIG9wY250LCBvcHMpOwogICBpZiAoIXJ0eF9lcXVhbF9w IChsaHNfcnR4LCBvcHNbMF0udmFsdWUpKQogICAgIGVtaXRfbW92ZV9pbnNuIChsaHNfcnR4 LCBvcHNbMF0udmFsdWUpOwogfQo= --------------2XjG4aqn2VoBPC7y00rPa35L--