From mboxrd@z Thu Jan 1 00:00:00 1970
From: Prathamesh Kulkarni
Date: Wed, 18 Jan 2023 16:17:31 +0530
Subject: Re: [aarch64] Use wzr/xzr for assigning vector element to 0
To: Prathamesh Kulkarni, gcc Patches, richard.sandiford@arm.com
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="000000000000437c3a05f2878d66"

--000000000000437c3a05f2878d66
Content-Type: text/plain; charset="UTF-8"

On Tue, 17 Jan 2023 at 18:29, Richard Sandiford wrote:
>
> Prathamesh Kulkarni writes:
> > Hi Richard,
> > For the following (contrived) test:
> >
> > int32x4_t foo(int32x4_t v)
> > {
> >   v[3] = 0;
> >   return v;
> > }
> >
> > -O2 code-gen:
> > foo:
> >         fmov    s1, wzr
> >         ins     v0.s[3], v1.s[0]
> >         ret
> >
> > I suppose we can instead emit the following code-gen ?
> > foo:
> >         ins     v0.s[3], wzr
> >         ret
> >
> > combine produces:
> > Failed to match this instruction:
> > (set (reg:V4SI 95 [ v ])
> >     (vec_merge:V4SI (const_vector:V4SI [
> >                 (const_int 0 [0]) repeated x4
> >             ])
> >         (reg:V4SI 97)
> >         (const_int 8 [0x8])))
> >
> > So, I wrote the following pattern to match the above insn:
> > (define_insn "aarch64_simd_vec_set_zero<mode>"
> >   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> >         (vec_merge:VALL_F16
> >             (match_operand:VALL_F16 1 "const_dup0_operand" "w")
> >             (match_operand:VALL_F16 3 "register_operand" "0")
> >             (match_operand:SI 2 "immediate_operand" "i")))]
> >   "TARGET_SIMD"
> >   {
> >     int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
> >     operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
> >     return "ins\\t%0.<Vetype>[%p2], wzr";
> >   }
> > )
> >
> > which now matches the above insn produced by combine.
> > However, in reload dump, it creates a new insn for assigning
> > register to (const_vector (const_int 0)),
> > which results in:
> > (insn 19 8 13 2 (set (reg:V4SI 33 v1 [99])
> >         (const_vector:V4SI [
> >                 (const_int 0 [0]) repeated x4
> >             ])) "wzr-test.c":8:1 1269 {*aarch64_simd_movv4si}
> >      (nil))
> > (insn 13 19 14 2 (set (reg/i:V4SI 32 v0)
> >         (vec_merge:V4SI (reg:V4SI 33 v1 [99])
> >             (reg:V4SI 32 v0 [97])
> >             (const_int 8 [0x8]))) "wzr-test.c":8:1 1808
> > {aarch64_simd_vec_set_zerov4si}
> >      (nil))
> >
> > and eventually the code-gen:
> > foo:
> >         movi    v1.4s, 0
> >         ins     v0.s[3], wzr
> >         ret
> >
> > To get rid of redundant assignment of 0 to v1, I tried to split the
> > above pattern
> > as in the attached patch. This works to emit code-gen:
> > foo:
> >         ins     v0.s[3], wzr
> >         ret
> >
> > However, I am not sure if this is the right approach. Could you suggest,
> > if it'd be possible to get rid of UNSPEC_SETZERO in the patch ?
>
> The problem is with the "w" constraint on operand 1, which tells LRA
> to force the zero into an FPR.  It should work if you remove the
> constraint.

Ah indeed, sorry about that, changing the constraint works.
Does the attached patch look OK after bootstrap+test?
Since we're in stage-4, would it be OK to commit now, or should I
queue it for stage-1?

Thanks,
Prathamesh
>
> Also, I think you'll need to use <vwcore>zr for the zero, so that
> it uses xzr for 64-bit elements.
>
> I think this and the existing patterns ought to test
> exact_log2 (INTVAL (operands[2])) >= 0 in the insn condition,
> since there's no guarantee that RTL optimisations won't form
> vec_merges that have other masks.
>
> Thanks,
> Richard

--000000000000437c3a05f2878d66
Content-Type: text/plain; charset="US-ASCII"; name="gnu-811-5.txt"
Content-Disposition: attachment; filename="gnu-811-5.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_ld1j71fr0

W2FhcmNoNjRdIFVzZSB3enIveHpyIGZvciBhc3NpZ25pbmcgMCB0byB2ZWN0b3IgZWxlbWVudC4K
CmdjYy9DaGFuZ2VMb2c6CgkqIGNvbmZpZy9hYWFyY2g2NC9hYXJjaDY0LXNpbWQubWQgKGFhcmNo
NjRfc2ltZF92ZWNfc2V0X3plcm88bW9kZT4pOgoJTmV3IHBhdHRlcm4uCgkqIGNvbmZpZy9hYXJj
aDY0L3ByZWRpY2F0ZXMubWQgKGNvbnN0X2R1cDBfb3BlcmFuZCk6IE5ldy4KCmRpZmYgLS1naXQg
YS9nY2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC1zaW1kLm1kIGIvZ2NjL2NvbmZpZy9hYXJjaDY0
L2FhcmNoNjQtc2ltZC5tZAppbmRleCAxMDQwODhmNjdkMi4uOGU1NGVlNGU4ODYgMTAwNjQ0Ci0t
LSBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LXNpbWQubWQKKysrIGIvZ2NjL2NvbmZpZy9h
YXJjaDY0L2FhcmNoNjQtc2ltZC5tZApAQCAtMTA4Myw2ICsxMDgzLDIwIEBACiAgIFsoc2V0X2F0
dHIgInR5cGUiICJuZW9uX2luczxxPiwgbmVvbl9mcm9tX2dwPHE+LCBuZW9uX2xvYWQxX29uZV9s
YW5lPHE+IildCiApCiAKKyhkZWZpbmVfaW5zbiAiYWFyY2g2NF9zaW1kX3ZlY19zZXRfemVybzxt
b2RlPiIKKyAgWyhzZXQgKG1hdGNoX29wZXJhbmQ6VkFMTF9GMTYgMCAicmVnaXN0ZXJfb3BlcmFu
ZCIgIj13IikKKwkodmVjX21lcmdlOlZBTExfRjE2CisJICAgIChtYXRjaF9vcGVyYW5kOlZBTExf
RjE2IDEgImNvbnN0X2R1cDBfb3BlcmFuZCIgImkiKQorCSAgICAobWF0Y2hfb3BlcmFuZDpWQUxM
X0YxNiAzICJyZWdpc3Rlcl9vcGVyYW5kIiAiMCIpCisJICAgIChtYXRjaF9vcGVyYW5kOlNJIDIg
ImltbWVkaWF0ZV9vcGVyYW5kIiAiaSIpKSldCisgICJUQVJHRVRfU0lNRCAmJiBleGFjdF9sb2cy
IChJTlRWQUwgKG9wZXJhbmRzWzJdKSkgPj0gMCIKKyAgeworICAgIGludCBlbHQgPSBFTkRJQU5f
TEFORV9OICg8bnVuaXRzPiwgZXhhY3RfbG9nMiAoSU5UVkFMIChvcGVyYW5kc1syXSkpKTsKKyAg
ICBvcGVyYW5kc1syXSA9IEdFTl9JTlQgKChIT1NUX1dJREVfSU5UKSAxIDw8IGVsdCk7CisgICAg
cmV0dXJuICJpbnNcXHQlMC48VmV0eXBlPlslcDJdLCA8dndjb3JlPnpyIjsKKyAgfQorKQorCiAo
ZGVmaW5lX2luc24gIkBhYXJjaDY0X3NpbWRfdmVjX2NvcHlfbGFuZTxtb2RlPiIKICAgWyhzZXQg
KG1hdGNoX29wZXJhbmQ6VkFMTF9GMTYgMCAicmVnaXN0ZXJfb3BlcmFuZCIgIj13IikKIAkodmVj
X21lcmdlOlZBTExfRjE2CmRpZmYgLS1naXQgYS9nY2MvY29uZmlnL2FhcmNoNjQvcHJlZGljYXRl
cy5tZCBiL2djYy9jb25maWcvYWFyY2g2NC9wcmVkaWNhdGVzLm1kCmluZGV4IGZmN2Y3M2QzZjMw
Li45MDFmYTFiZDdmOSAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZpZy9hYXJjaDY0L3ByZWRpY2F0ZXMu
bWQKKysrIGIvZ2NjL2NvbmZpZy9hYXJjaDY0L3ByZWRpY2F0ZXMubWQKQEAgLTQ5LDYgKzQ5LDEz
IEBACiAgIHJldHVybiBDT05TVF9JTlRfUCAob3ApICYmIElOX1JBTkdFIChJTlRWQUwgKG9wKSwg
MSwgMyk7CiB9KQogCisoZGVmaW5lX3ByZWRpY2F0ZSAiY29uc3RfZHVwMF9vcGVyYW5kIgorICAo
bWF0Y2hfY29kZSAiY29uc3RfdmVjdG9yIikKK3sKKyAgb3AgPSB1bndyYXBfY29uc3RfdmVjX2R1
cGxpY2F0ZSAob3ApOworICByZXR1cm4gQ09OU1RfSU5UX1AgKG9wKSAmJiBydHhfZXF1YWxfcCAo
b3AsIGNvbnN0MF9ydHgpOworfSkKKwogKGRlZmluZV9wcmVkaWNhdGUgInN1YnJlZ19sb3dwYXJ0
X29wZXJhdG9yIgogICAoaW9yIChtYXRjaF9jb2RlICJ0cnVuY2F0ZSIpCiAgICAgICAgKGFuZCAo
bWF0Y2hfY29kZSAic3VicmVnIikK
--000000000000437c3a05f2878d66--
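
For reference, the mask-to-lane arithmetic discussed in this thread can be
sketched in plain C. This is only an illustration under stated assumptions:
mask_to_lane and endian_lane are hypothetical stand-ins for GCC's exact_log2
and ENDIAN_LANE_N, not the real implementations, and the big-endian flip
(nunits - 1 - lane) is an assumption about how lanes are renumbered.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's exact_log2: return the index of the
   single set bit in MASK, or -1 if MASK is zero or has several bits set.  */
static int
mask_to_lane (unsigned long long mask)
{
  if (mask == 0 || (mask & (mask - 1)) != 0)
    return -1;

  int lane = 0;
  while ((mask >>= 1) != 0)
    lane++;
  return lane;
}

/* Hypothetical stand-in for ENDIAN_LANE_N.  Assumption: on big-endian
   targets the architectural lane is counted from the opposite end.  */
static int
endian_lane (int nunits, int lane, int big_endian)
{
  assert (lane >= 0 && lane < nunits);
  return big_endian ? nunits - 1 - lane : lane;
}
```

With the vec_merge mask 8 from the combine dump, mask_to_lane (8) gives
lane 3, which matches v0.s[3] in the expected code-gen. A mask with more
than one bit set, such as 6, gives -1, which is the case the suggested
exact_log2 (...) >= 0 insn condition is meant to reject.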