From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr1-x435.google.com (mail-wr1-x435.google.com [IPv6:2a00:1450:4864:20::435])
 by sourceware.org (Postfix) with ESMTPS id 86DD03858D39
 for ; Mon, 3 Apr 2023 16:34:03 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 86DD03858D39
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
Received: by mail-wr1-x435.google.com with SMTP id q19so26932701wrc.5
 for ; Mon, 03 Apr 2023 09:34:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=linaro.org; s=google; t=1680539642;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc:subject:date:message-id:reply-to;
 bh=C+8+2o1GpwYcvokYSzC5e4zV3zTuGMJqopEMfGn7uxM=;
 b=bRz+saO9YpAhNQ5oMGxQsv/1QDtVG97wh34D6aKLxbsUJOfcxqnUmbwIDRgOHd2kjp
 XFh4HnaWzvBUB7KhWMvaldj0JFaJ2nrBxrkZAfRrEi0LPmC7IfQzIIKVjaruFKK16W6z
 OFt6RUJ1GA1OH8gJ0O7vRaDBwgkhaaEp+WzyzHpuPAf99r7RoVpuXVnoAdQfNPuLjemd
 QTGIbo9OObQna+hmLE1zyokFyFti+gmvt2r8QzCobPmpzFaMduuC5mA8OrQwLJNIxMfH
 pllL11G7D5d9aCAU7blbB9zoBdjeb64TaIuOD5Y2oKJ7ypXJs40LDj7U+HgzFNB/KDfL
 tEKg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112; t=1680539642;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=C+8+2o1GpwYcvokYSzC5e4zV3zTuGMJqopEMfGn7uxM=;
 b=7ZMk66yGVnZyxyI5CsLBZXe6LckQP9loTtjnq/ZDa4BRE7uyAR/qjiLO99izBImY7X
 /AWNEUz8ckcmbFhrQxKZjVz0G63kNcx1akUoA/948sy6k1UZlHw4GuRYP35giIML0B0p
 YTlOh/pmaw6OK/442CS6qei/Nh3rl84j0fnkBwdUjI4VjMHQMo+lGL9ktMYRYC8+B4R8
 XL0Cqw5NttD/PLukga6O2CLkgfASwnqhw6btAo3ccJkoKVZCGk859zVU10pHpUWwe/ne
 3dYnGDRvP9w91/2TQB64A3V0RGdGNc+NbIQqWroOlutUadw1Y7HttLNw1+3p+0uh7emH
 SrPA==
X-Gm-Message-State: AAQBX9feYOLybxx1nwCxZ8EUrSIcual5dCFKSUt4bJm1UPLpvTXYAYnp
 SkfdCuLHFzR/NWzbuDhEggMyDxzAWk1fiR8hNq/Lnw==
X-Google-Smtp-Source: AKy350YTqALyaCYsvTZfMxM5MiEaCmpDsTII+5d8/lG5gTPj+mkxNmAWJaTPAwQ5X5K56L8M/CsLktwsvRacLYFahlw=
X-Received: by 2002:a5d:5152:0:b0:2c7:17b8:5759 with SMTP id u18-20020a5d5152000000b002c717b85759mr6162671wrt.3.1680539642152;
 Mon, 03 Apr 2023 09:34:02 -0700 (PDT)
MIME-Version: 1.0
References:
In-Reply-To:
From: Prathamesh Kulkarni
Date: Mon, 3 Apr 2023 22:03:24 +0530
Message-ID:
Subject: Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector
To: Richard Biener
Cc: Richard Sandiford , gcc Patches
Content-Type: multipart/mixed; boundary="00000000000067003705f87120bf"
X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP
 autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

--00000000000067003705f87120bf
Content-Type: text/plain; charset="UTF-8"

On Mon, 13 Mar 2023 at 13:03, Richard Biener wrote:
>
> On Fri, 10 Mar 2023, Richard Sandiford wrote:
>
> > Sorry for the slow reply.
> >
> > Prathamesh Kulkarni writes:
> > > Unfortunately it regresses code-gen for the following case:
> > >
> > > svint32_t f(int32x4_t x)
> > > {
> > >   return svdupq_s32 (x[0], x[1], x[2], x[3]);
> > > }
> > >
> > > -O2 code-gen with trunk:
> > > f:
> > >         dup z0.q, z0.q[0]
> > >         ret
> > >
> > > -O2 code-gen with patch:
> > > f:
> > >         dup s1, v0.s[1]
> > >         mov v2.8b, v0.8b
> > >         ins v1.s[1], v0.s[3]
> > >         ins v2.s[1], v0.s[2]
> > >         zip1 v0.4s, v2.4s, v1.4s
> > >         dup z0.q, z0.q[0]
> > >         ret
> > >
> > > IIUC, svdupq_impl::expand uses aarch64_expand_vector_init
> > > to initialize the "base 128-bit vector" and then use dupq to replicate it.
> > >
> > > Without patch, aarch64_expand_vector_init generates fallback code, and then
> > > combine optimizes a sequence of vec_merge/vec_select pairs into an assignment:
> > >
> > > (insn 7 3 8 2 (set (reg:SI 99)
> > >         (vec_select:SI (reg/v:V4SI 97 [ x ])
> > >             (parallel [
> > >                     (const_int 1 [0x1])
> > >                 ]))) "bar.c":6:10 2592 {aarch64_get_lanev4si}
> > >      (nil))
> > >
> > > (insn 13 9 15 2 (set (reg:V4SI 102)
> > >         (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 99))
> > >             (reg/v:V4SI 97 [ x ])
> > >             (const_int 2 [0x2]))) "bar.c":6:10 1794 {aarch64_simd_vec_setv4si}
> > >      (expr_list:REG_DEAD (reg:SI 99)
> > >         (expr_list:REG_DEAD (reg/v:V4SI 97 [ x ])
> > >             (nil))))
> > >
> > > into:
> > > Trying 7 -> 13:
> > >     7: r99:SI=vec_select(r97:V4SI,parallel)
> > >    13: r102:V4SI=vec_merge(vec_duplicate(r99:SI),r97:V4SI,0x2)
> > >       REG_DEAD r99:SI
> > >       REG_DEAD r97:V4SI
> > > Successfully matched this instruction:
> > > (set (reg:V4SI 102)
> > >     (reg/v:V4SI 97 [ x ]))
> > >
> > > which eventually results into:
> > > (note 2 25 3 2 NOTE_INSN_DELETED)
> > > (note 3 2 7 2 NOTE_INSN_FUNCTION_BEG)
> > > (note 7 3 8 2 NOTE_INSN_DELETED)
> > > (note 8 7 9 2 NOTE_INSN_DELETED)
> > > (note 9 8 13 2 NOTE_INSN_DELETED)
> > > (note 13 9 15 2 NOTE_INSN_DELETED)
> > > (note 15 13 17 2 NOTE_INSN_DELETED)
> > > (note 17 15 18 2 NOTE_INSN_DELETED)
> > > (note 18 17 22 2 NOTE_INSN_DELETED)
> > > (insn 22 18 23 2 (parallel [
> > >             (set (reg/i:VNx4SI 32 v0)
> > >                 (vec_duplicate:VNx4SI (reg:V4SI 108)))
> > >             (clobber (scratch:VNx16BI))
> > >         ]) "bar.c":7:1 5202 {aarch64_vec_duplicate_vqvnx4si_le}
> > >      (expr_list:REG_DEAD (reg:V4SI 108)
> > >         (nil)))
> > > (insn 23 22 0 2 (use (reg/i:VNx4SI 32 v0)) "bar.c":7:1 -1
> > >      (nil))
> > >
> > > I was wondering if we should add the above special case, of assigning
> > > target = vec in aarch64_expand_vector_init, if initializer is {
> > > vec[0], vec[1], ... } ?
> >
> > I'm not sure it will be easy to detect that.
> > Won't the inputs to
> > aarch64_expand_vector_init just be plain registers? It's not a
> > good idea in general to search for definitions of registers
> > during expansion.
> >
> > It would be nice to fix this by lowering svdupq into:
> >
> > (a) a constructor for a 128-bit vector
> > (b) a duplication of the 128-bit vector to fill an SVE vector
> >
> > But I'm not sure what the best way of doing (b) would be.
> > In RTL we can use vec_duplicate, but I don't think gimple
> > has an equivalent construct. Maybe Richi has some ideas.
>
> On GIMPLE it would be
>
>   _1 = { a, ... }; // (a)
>   _2 = { _1, ... }; // (b)
>
> but I'm not sure if (b), a VL CTOR of fixed len(?) sub-vectors is
> possible? But at least a CTOR of vectors is what we use to
> concat vectors.
>
> With the recent relaxing of VEC_PERM inputs it's also possible to
> express (b) with a VEC_PERM:
>
>   _2 = VEC_PERM <_1, _1, { 0, 1, 2, 3, 0, 1, 2, 3, ... }>
>
> but again I'm not sure if that repeating 0, 1, 2, 3 is expressible
> for VL vectors (maybe we'd allow "wrapping" here, I'm not sure).
>
Hi,
Thanks for the suggestions, and sorry for the late response in turn.
The attached patch tries to fix the issue by explicitly constructing a CTOR
from svdupq's arguments and then using a VEC_PERM_EXPR to replicate the
base vector. The VEC_PERM_EXPR uses a VL mask with encoded elements
{0, 1, ... nargs-1}, npatterns == nargs, and nelts_per_pattern == 1.

So for example, for the above case,

svint32_t f_32(int32x4_t x)
{
  return svdupq_s32 (x[0], x[1], x[2], x[3]);
}

forwprop1 lowers it to:

  svint32_t _6;
  vector(4) int _8;
  :
  _1 = BIT_FIELD_REF ;
  _2 = BIT_FIELD_REF ;
  _3 = BIT_FIELD_REF ;
  _4 = BIT_FIELD_REF ;
  _8 = {_1, _2, _3, _4};
  _6 = VEC_PERM_EXPR <_8, _8, { 0, 1, 2, 3, ... }>;
  return _6;

which is then eventually optimized to:

  svint32_t _6;
  [local count: 1073741824]:
  _6 = VEC_PERM_EXPR ;
  return _6;

code-gen:
f_32:
        dup z0.q, z0.q[0]
        ret

Does it look OK?

Thanks,
Prathamesh
> Richard.
>
> > We're planning to implement the ACLE's Neon-SVE bridge:
> > https://github.com/ARM-software/acle/blob/main/main/acle.md#neon-sve-bridge
> > and so we'll need (b) to implement the svdup_neonq functions.
> >
> > Thanks,
> > Richard
> >
>
> --
> Richard Biener
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)

--00000000000067003705f87120bf
Content-Type: text/plain; charset="US-ASCII"; name="gnu-829-1.txt"
Content-Disposition: attachment; filename="gnu-829-1.txt"
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_lg11nvic0

W1NWRV0gRm9sZCBzdmxkMXJxIHRvIFZFQ19QRVJNX0VYUFIgaWYgZWxlbWVudHMgYXJlIG5vdCBj b25zdGFudC4KCmdjYy9DaGFuZ2VMb2c6CgkqIGNvbmZpZy9hYXJjaDY0L2FhcmNoNjQtc3ZlLWJ1 aWx0aW5zLWJhc2UuY2MKCShzdmR1cHFfaW1wbDo6Zm9sZF9ub25jb25zdF9kdXBxKTogTmV3IG1l dGhvZC4KCShzdmR1cHFfaW1wbDo6Zm9sZCk6IENhbGwgZm9sZF9ub25jb25zdF9kdXBxLgoKZ2Nj L3Rlc3RzdWl0ZS9DaGFuZ2VMb2c6CgkqIGdjYy50YXJnZXQvYWFyY2g2NC9zdmUvYWNsZS9nZW5l cmFsL2R1cHFfMTEuYzogTmV3IHRlc3QuCgpkaWZmIC0tZ2l0IGEvZ2NjL2NvbmZpZy9hYXJjaDY0 L2FhcmNoNjQtc3ZlLWJ1aWx0aW5zLWJhc2UuY2MgYi9nY2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2 NC1zdmUtYnVpbHRpbnMtYmFzZS5jYwppbmRleCBjZDljYWNlM2M5Yi4uM2RlNzkwNjA2MTkgMTAw NjQ0Ci0tLSBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LXN2ZS1idWlsdGlucy1iYXNlLmNj CisrKyBiL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LXN2ZS1idWlsdGlucy1iYXNlLmNjCkBA IC04MTcsNiArODE3LDYyIEBAIHB1YmxpYzoKIAogY2xhc3Mgc3ZkdXBxX2ltcGwgOiBwdWJsaWMg cXVpZXQ8ZnVuY3Rpb25fYmFzZT4KIHsKK3ByaXZhdGU6CisgIGdpbXBsZSAqCisgIGZvbGRfbm9u Y29uc3RfZHVwcSAoZ2ltcGxlX2ZvbGRlciAmZiwgdW5zaWduZWQgZmFjdG9yKSBjb25zdAorICB7 CisgICAgLyogTG93ZXIgbGhzID0gc3ZkdXBxIChhcmcwLCBhcmcxLCAuLi4sIGFyZ059IGludG86 CisgICAgICAgdG1wID0ge2FyZzAsIGFyZzEsIC4uLiwgYXJnPE4tMT59CisgICAgICAgbGhzID0g VkVDX1BFUk1fRVhQUiAodG1wLCB0bXAsIHswLCAxLCAyLCBOLTEsIC4uLn0pICAqLworCisgICAg LyogVE9ETzogUmV2aXNpdCB0byBoYW5kbGUgZmFjdG9yIGJ5IHBhZGRpbmcgemVyb3MuICAqLwor
ICAgIGlmIChmYWN0b3IgPiAxKQorICAgICAgcmV0dXJuIE5VTEw7CisKKyAgICBpZiAoQllURVNf QklHX0VORElBTikKKyAgICAgIHJldHVybiBOVUxMOworCisgICAgdHJlZSBsaHMgPSBnaW1wbGVf Y2FsbF9saHMgKGYuY2FsbCk7CisgICAgaWYgKFRSRUVfQ09ERSAobGhzKSAhPSBTU0FfTkFNRSkK KyAgICAgIHJldHVybiBOVUxMOworCisgICAgdHJlZSBsaHNfdHlwZSA9IFRSRUVfVFlQRSAobGhz KTsKKyAgICB0cmVlIGVsdF90eXBlID0gVFJFRV9UWVBFIChsaHNfdHlwZSk7CisgICAgc2NhbGFy X21vZGUgZWx0X21vZGUgPSBHRVRfTU9ERV9JTk5FUiAoVFlQRV9NT0RFIChlbHRfdHlwZSkpOwor ICAgIG1hY2hpbmVfbW9kZSB2cV9tb2RlID0gYWFyY2g2NF92cV9tb2RlIChlbHRfbW9kZSkucmVx dWlyZSAoKTsKKyAgICB0cmVlIHZxX3R5cGUgPSBidWlsZF92ZWN0b3JfdHlwZV9mb3JfbW9kZSAo ZWx0X3R5cGUsIHZxX21vZGUpOworCisgICAgdW5zaWduZWQgbmFyZ3MgPSBnaW1wbGVfY2FsbF9u dW1fYXJncyAoZi5jYWxsKTsKKyAgICB2ZWM8Y29uc3RydWN0b3JfZWx0LCB2YV9nYz4gKnY7Cisg ICAgdmVjX2FsbG9jICh2LCBuYXJncyk7CisgICAgZm9yICh1bnNpZ25lZCBpID0gMDsgaSA8IG5h cmdzOyBpKyspCisgICAgICBDT05TVFJVQ1RPUl9BUFBFTkRfRUxUICh2LCBOVUxMX1RSRUUsIGdp bXBsZV9jYWxsX2FyZyAoZi5jYWxsLCBpKSk7CisgICAgdHJlZSB2ZWMgPSBidWlsZF9jb25zdHJ1 Y3RvciAodnFfdHlwZSwgdik7CisKKyAgICB0cmVlIGFjY2Vzc190eXBlCisgICAgICA9IGJ1aWxk X2FsaWduZWRfdHlwZSAodnFfdHlwZSwgVFlQRV9BTElHTiAoZWx0X3R5cGUpKTsKKyAgICB0cmVl IHRtcCA9IG1ha2Vfc3NhX25hbWVfZm4gKGNmdW4sIGFjY2Vzc190eXBlLCAwKTsKKyAgICBnaW1w bGUgKmcgPSBnaW1wbGVfYnVpbGRfYXNzaWduICh0bXAsIHZlYyk7CisKKyAgICBnaW1wbGVfc2Vx IHN0bXRzID0gTlVMTDsKKyAgICBnaW1wbGVfc2VxX2FkZF9zdG10X3dpdGhvdXRfdXBkYXRlICgm c3RtdHMsIGcpOworCisgICAgaW50IHNvdXJjZV9uZWx0cyA9IFRZUEVfVkVDVE9SX1NVQlBBUlRT IChhY2Nlc3NfdHlwZSkudG9fY29uc3RhbnQgKCk7CisgICAgcG9seV91aW50NjQgbGhzX2xlbiA9 IFRZUEVfVkVDVE9SX1NVQlBBUlRTIChsaHNfdHlwZSk7CisgICAgdmVjX3Blcm1fYnVpbGRlciBz ZWwgKGxoc19sZW4sIHNvdXJjZV9uZWx0cywgMSk7CisgICAgZm9yIChpbnQgaSA9IDA7IGkgPCBz b3VyY2VfbmVsdHM7IGkrKykKKyAgICAgIHNlbC5xdWlja19wdXNoIChpKTsKKworICAgIHZlY19w ZXJtX2luZGljZXMgaW5kaWNlcyAoc2VsLCAxLCBzb3VyY2VfbmVsdHMpOworICAgIHRyZWUgbWFz a190eXBlID0gYnVpbGRfdmVjdG9yX3R5cGUgKHNzaXpldHlwZSwgbGhzX2xlbik7CisgICAgdHJl 
ZSBtYXNrID0gdmVjX3Blcm1faW5kaWNlc190b190cmVlIChtYXNrX3R5cGUsIGluZGljZXMpOwor CisgICAgZ2ltcGxlICpnMiA9IGdpbXBsZV9idWlsZF9hc3NpZ24gKGxocywgVkVDX1BFUk1fRVhQ UiwgdG1wLCB0bXAsIG1hc2spOworICAgIGdpbXBsZV9zZXFfYWRkX3N0bXRfd2l0aG91dF91cGRh dGUgKCZzdG10cywgZzIpOworICAgIGdzaV9yZXBsYWNlX3dpdGhfc2VxIChmLmdzaSwgc3RtdHMs IGZhbHNlKTsKKyAgICByZXR1cm4gZzI7CisgIH0KKwogcHVibGljOgogICBnaW1wbGUgKgogICBm b2xkIChnaW1wbGVfZm9sZGVyICZmKSBjb25zdCBvdmVycmlkZQpAQCAtODMyLDcgKzg4OCw3IEBA IHB1YmxpYzoKICAgICAgIHsKIAl0cmVlIGVsdCA9IGdpbXBsZV9jYWxsX2FyZyAoZi5jYWxsLCBp KTsKIAlpZiAoIUNPTlNUQU5UX0NMQVNTX1AgKGVsdCkpCi0JICByZXR1cm4gTlVMTDsKKwkgIHJl dHVybiBmb2xkX25vbmNvbnN0X2R1cHEgKGYsIGZhY3Rvcik7CiAJYnVpbGRlci5xdWlja19wdXNo IChlbHQpOwogCWZvciAodW5zaWduZWQgaW50IGogPSAxOyBqIDwgZmFjdG9yOyArK2opCiAJICBi dWlsZGVyLnF1aWNrX3B1c2ggKGJ1aWxkX3plcm9fY3N0IChUUkVFX1RZUEUgKHZlY190eXBlKSkp OwpkaWZmIC0tZ2l0IGEvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvc3ZlL2FjbGUv Z2VuZXJhbC9kdXBxXzExLmMgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC9zdmUv YWNsZS9nZW5lcmFsL2R1cHFfMTEuYwpuZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAw MDAwMC4uZjE5ZjhkZWIxZTUKLS0tIC9kZXYvbnVsbAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy50 YXJnZXQvYWFyY2g2NC9zdmUvYWNsZS9nZW5lcmFsL2R1cHFfMTEuYwpAQCAtMCwwICsxLDMxIEBA CisvKiB7IGRnLWRvIGNvbXBpbGUgfSAqLworLyogeyBkZy1vcHRpb25zICItTzMgLWZkdW1wLXRy ZWUtb3B0aW1pemVkIiB9ICovCisKKyNpbmNsdWRlIDxhcm1fc3ZlLmg+CisjaW5jbHVkZSA8YXJt X25lb24uaD4KKworc3ZpbnQ4X3QgZl9zOChpbnQ4eDE2X3QgeCkKK3sKKyAgcmV0dXJuIHN2ZHVw cV9zOCAoeFswXSwgeFsxXSwgeFsyXSwgeFszXSwgeFs0XSwgeFs1XSwgeFs2XSwgeFs3XSwKKwkJ ICAgIHhbOF0sIHhbOV0sIHhbMTBdLCB4WzExXSwgeFsxMl0sIHhbMTNdLCB4WzE0XSwgeFsxNV0p OworfQorCitzdmludDE2X3QgZl9zMTYoaW50MTZ4OF90IHgpCit7CisgIHJldHVybiBzdmR1cHFf czE2ICh4WzBdLCB4WzFdLCB4WzJdLCB4WzNdLCB4WzRdLCB4WzVdLCB4WzZdLCB4WzddKTsKK30K Kworc3ZpbnQzMl90IGZfczMyKGludDMyeDRfdCB4KQoreworICByZXR1cm4gc3ZkdXBxX3MzMiAo eFswXSwgeFsxXSwgeFsyXSwgeFszXSk7Cit9CisKK3N2aW50NjRfdCBmX3M2NChpbnQ2NHgyX3Qg 
eCkKK3sKKyAgcmV0dXJuIHN2ZHVwcV9zNjQgKHhbMF0sIHhbMV0pOworfQorCisvKiB7IGRnLWZp bmFsIHsgc2Nhbi10cmVlLWR1bXAgIlZFQ19QRVJNX0VYUFIiICJvcHRpbWl6ZWQiIH0gfSAqLwor LyogeyBkZy1maW5hbCB7IHNjYW4tdHJlZS1kdW1wLW5vdCAic3ZkdXBxIiAib3B0aW1pemVkIiB9 IH0gKi8KKworLyogeyBkZy1maW5hbCB7IHNjYW4tYXNzZW1ibGVyLXRpbWVzIHtcdGR1cFx0elsw LTldK1wucSwgelswLTldK1wucVxbMFxdXG59IDQgfSB9ICovCg== --00000000000067003705f87120bf--
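
The selector encoding described in the reply above (encoded elements {0, 1, ... nargs-1}, npatterns == nargs, nelts_per_pattern == 1) can be modelled outside GCC with a few lines of plain C++. This is only a sketch: `expand_selector` and `vec_perm` are hypothetical helper names, not GCC functions, and `std::vector` stands in for both the 128-bit base vector and the variable-length result. It assumes that with one element per pattern each pattern simply repeats its single value, so element i of the full selector is encoded[i % npatterns].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a selector stored as `encoded.size()` interleaved
// patterns with one element per pattern.  Each pattern repeats its single
// value, so element i of the full-length selector is encoded[i % npatterns].
std::vector<std::size_t>
expand_selector (const std::vector<std::size_t> &encoded, std::size_t full_len)
{
  assert (!encoded.empty ());
  std::vector<std::size_t> sel (full_len);
  for (std::size_t i = 0; i < full_len; ++i)
    sel[i] = encoded[i % encoded.size ()];
  return sel;
}

// Hypothetical model of VEC_PERM_EXPR <a, b, sel>: index k picks a[k] when
// k < a.size (), otherwise b[k - a.size ()].
std::vector<int>
vec_perm (const std::vector<int> &a, const std::vector<int> &b,
          const std::vector<std::size_t> &sel)
{
  std::vector<int> out;
  out.reserve (sel.size ());
  for (std::size_t k : sel)
    {
      assert (k < a.size () + b.size ());
      out.push_back (k < a.size () ? a[k] : b[k - a.size ()]);
    }
  return out;
}
```

Calling `vec_perm (base, base, expand_selector ({0, 1, 2, 3}, n))` with a four-element `base` and any multiple-of-four `n` repeats `base` across the result, which is the replication that the `dup z0.q, z0.q[0]` code-gen in the reply performs on the 128-bit base vector.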