From: Prathamesh Kulkarni
Date: Tue, 2 May 2023 15:52:29 +0530
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
To: Prathamesh Kulkarni, gcc Patches, richard.sandiford@arm.com
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="000000000000447c7405fab3532a"

--000000000000447c7405fab3532a
Content-Type: text/plain; charset="UTF-8"

On Tue, 2 May 2023 at 14:56, Richard Sandiford wrote:
>
> Prathamesh Kulkarni writes:
> > On Tue, 25 Apr 2023 at 16:29, Richard Sandiford
> > wrote:
> >>
> >> Prathamesh Kulkarni writes:
> >> > Hi Richard,
> >> > While digging thru aarch64_expand_vector_init, I noticed it gives
> >> > priority to loading a constant first:
> >> >   /* Initialise a vector which is part-variable.  We want to first try
> >> >      to build those lanes which are constant in the most efficient way we
> >> >      can.  */
> >> >
> >> > which results in suboptimal code-gen for the following case:
> >> > int16x8_t f_s16(int16_t x)
> >> > {
> >> >   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> >> > }
> >> >
> >> > code-gen trunk:
> >> > f_s16:
> >> >         movi    v0.8h, 0x1
> >> >         ins     v0.h[0], w0
> >> >         ins     v0.h[1], w0
> >> >         ins     v0.h[2], w0
> >> >         ins     v0.h[3], w0
> >> >         ins     v0.h[4], w0
> >> >         ins     v0.h[5], w0
> >> >         ins     v0.h[6], w0
> >> >         ret
> >> >
> >> > The attached patch tweaks the following condition:
> >> > if (n_var == n_elts && n_elts <= 16)
> >> >   {
> >> >     ...
> >> >   }
> >> >
> >> > to pass if maxv >= 80% of n_elts, with 80% being an
> >> > arbitrary "high enough" threshold. The intent is to dup
> >> > the most repeating variable if its repetition
> >> > is "high enough" and insert constants which should be "better" than
> >> > loading constant first and inserting variables like in the above case.
> >>
> >> I'm not too keen on the 80%.  Like you say, it seems a bit arbitrary.
> >>
> >> The case above can also be handled by relaxing n_var == n_elts to
> >> n_var >= n_elts - 1, so that if there's just one constant element,
> >> we look for duplicated variable elements.  If there are none
> >> (maxv == 1), but there is a constant element, we can duplicate
> >> the constant element into a register.
> >>
> >> The case when there's more than one constant element needs more thought
> >> (and testcases :-)).  E.g. after a certain point, it would probably be
> >> better to load the variable and constant parts separately and blend them
> >> using TBL.  It also matters whether the constants are equal or not.
> >>
> >> There are also cases that could be handled using EXT.
> >>
> >> Plus, if we're inserting many variable elements that are already
> >> in GPRs, we can probably do better by coalescing them into bigger
> >> GPR values and inserting them as wider elements.
> >>
> >> Because of things like that, I think we should stick to the
> >> single-constant case for now.
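
The relaxed gate proposed above (n_var >= n_elts - 1) and the search for the most repeated variable can be modelled outside the compiler. What follows is only a rough sketch in plain C, not GCC code: `struct lane`, `most_repeated_variable` and `dup_path_applies` are hypothetical names invented for illustration, and lanes are described by a flag plus an integer id rather than by RTL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one vector lane: a constant, or a variable
   identified by a small integer id.  Not a GCC data structure.  */
struct lane
{
  bool is_const;
  int id; /* Variable id, or the constant's value when is_const is set.  */
};

/* Return the largest number of lanes that share one variable (the thread
   calls this maxv) and store the index of that variable's earliest lane
   in *first.  Constant lanes are skipped.  */
static int
most_repeated_variable (const struct lane *v, int n_elts, int *first)
{
  int maxv = 0;
  *first = -1;
  for (int i = 0; i < n_elts; i++)
    {
      if (v[i].is_const)
        continue;
      int count = 0;
      for (int j = 0; j < n_elts; j++)
        if (!v[j].is_const && v[j].id == v[i].id)
          count++;
      /* Strict comparison keeps the earliest lane on ties.  */
      if (count > maxv)
        {
          maxv = count;
          *first = i;
        }
    }
  return maxv;
}

/* The relaxed gate from the thread: every lane is variable, or all but
   one, and the vector has at most 16 lanes.  */
static bool
dup_path_applies (const struct lane *v, int n_elts)
{
  assert (n_elts > 0);
  int n_var = 0;
  for (int i = 0; i < n_elts; i++)
    n_var += !v[i].is_const;
  return n_var >= n_elts - 1 && n_elts <= 16;
}
```

For the f_s16 case, { x, x, x, x, x, x, x, 1 }, this model accepts the vector and reports seven repeats of x starting at lane 0, which corresponds to one dup of x followed by a single insert of the constant. With two constant lanes the gate rejects the vector, matching the suggestion to stick to the single-constant case.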
> > Hi Richard,
> > Thanks for the suggestions. The attached patch only handles the single
> > constant case.
> > Bootstrap+test in progress on aarch64-linux-gnu.
> > Does it look OK ?
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Richard
> >
> > [aarch64] Improve code-gen for vector initialization with single constant element.
> >
> > gcc/ChangeLog:
> >       * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
> >       if (n_var == n_elts && n_elts <= 16) to allow a single constant,
> >       and if maxv == 1, use constant element for duplicating into register.
> >
> > gcc/testsuite/ChangeLog:
> >       * gcc.target/aarch64/vec-init-single-const.c: New test.
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 2b0de7ca038..f46750133a6 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >       and matches[X][1] with the count of duplicate elements (if X is the
> >       earliest element which has duplicates).  */
> >
> > -  if (n_var == n_elts && n_elts <= 16)
> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
> >      {
> >        int matches[16][2] = {0};
> >        for (int i = 0; i < n_elts; i++)
> > @@ -22227,6 +22227,18 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >               vector register.  For big-endian we want that position to hold
> >               the last element of VALS.  */
> >            maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
> > +
> > +          /* If we have a single constant element, use that for duplicating
> > +             instead.  */
> > +          if (n_var == n_elts - 1)
> > +            for (int i = 0; i < n_elts; i++)
> > +              if (CONST_INT_P (XVECEXP (vals, 0, i))
> > +                  || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
> > +                {
> > +                  maxelement = i;
> > +                  break;
> > +                }
> > +
> >            rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> >            aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>
> We don't want to force the constant into a register though.
OK right, sorry.
With the attached patch, for the following test-case:
int64x2_t f_s64(int64_t x)
{
  return (int64x2_t) { x, 1 };
}

it loads the constant from memory (same code-gen as without the patch).
f_s64:
        adrp    x1, .LC0
        ldr     q0, [x1, #:lo12:.LC0]
        ins     v0.d[0], x0
        ret

Does the patch look OK?

Thanks,
Prathamesh
>
> >         }
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
> > new file mode 100644
> > index 00000000000..517f47b13ec
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
> > @@ -0,0 +1,66 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +#include <arm_neon.h>
> > +
> > +/*
> > +** f_s8:
> > +**	...
> > +**	dup	v[0-9]+\.16b, w[0-9]+
> > +**	movi	v[0-9]+\.8b, 0x1
> > +**	ins	v[0-9]+\.b\[15\], v[0-9]+\.b\[0\]
> > +**	...
> > +**	ret
> > +*/
> > +
> > +int8x16_t f_s8(int8_t x)
> > +{
> > +  return (int8x16_t) { x, x, x, x, x, x, x, x,
> > +                       x, x, x, x, x, x, x, 1 };
> > +}
> > +
> > +/*
> > +** f_s16:
> > +**	...
> > +**	dup	v[0-9]+\.8h, w[0-9]+
> > +**	movi	v[0-9]+\.4h, 0x1
> > +**	ins	v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
> > +**	...
> > +**	ret
> > +*/
> > +
> > +int16x8_t f_s16(int16_t x)
> > +{
> > +  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> > +}
> > +
> > +/*
> > +** f_s32:
> > +**	...
> > +**	movi	v[0-9]\.2s, 0x1
> > +**	dup	v[0-9]\.4s, w[0-9]+
> > +**	ins	v[0-9]+\.s\[3\], v[0-9]+\.s\[0\]
> > +**	...
> > +**	ret
> > +*/
> > +
> > +int32x4_t f_s32(int32_t x)
> > +{
> > +  return (int32x4_t) { x, x, x, 1 };
> > +}
> > +
> > +/*
> > +** f_s64:
> > +**	...
> > +**	fmov	d[0-9]+, x[0-9]+
> > +**	mov	x[0-9]+, 1
> > +**	ins	v[0-9]+\.d\[1\], x[0-9]+
> > +**	...
> > +**	ret
> > +*/
> > +
> > +int64x2_t f_s64(int64_t x)
> > +{
> > +  return (int64x2_t) { x, 1 };
> > +}

--000000000000447c7405fab3532a
Content-Type: text/plain; charset="US-ASCII"; name="gnu-780-4.txt"
Content-Disposition: attachment; filename="gnu-780-4.txt"

[aarch64] Improve code-gen for vector initialization with single constant element.

gcc/ChangeLog:
	* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
	if (n_var == n_elts && n_elts <= 16) to allow a single constant,
	and if maxv == 1, use constant element for duplicating into register.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/vec-init-single-const.c: New test.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2b0de7ca038..97309ddec4f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      and matches[X][1] with the count of duplicate elements (if X is the
      earliest element which has duplicates).  */
 
-  if (n_var == n_elts && n_elts <= 16)
+  if ((n_var >= n_elts - 1) && n_elts <= 16)
     {
       int matches[16][2] = {0};
       for (int i = 0; i < n_elts; i++)
@@ -22227,8 +22227,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	     vector register.  For big-endian we want that position to hold
 	     the last element of VALS.  */
 	  maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
-	  rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+
+	  /* If we have a single constant element, use that for duplicating
+	     instead.  */
+	  if (n_var == n_elts - 1)
+	    for (int i = 0; i < n_elts; i++)
+	      if (CONST_INT_P (XVECEXP (vals, 0, i))
+		  || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
+		{
+		  maxelement = i;
+		  break;
+		}
+
+	  rtx maxval = XVECEXP (vals, 0, maxelement);
+	  if (!(CONST_INT_P (maxval) || CONST_DOUBLE_P (maxval)))
+	    {
+	      rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	      aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+	    }
+	  else
+	    aarch64_emit_move (target, gen_vec_duplicate (mode, maxval));
 	}
       else
 	{
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
new file mode 100644
index 00000000000..682fd43439a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_neon.h>
+
+/*
+** f_s8:
+**	...
+**	dup	v[0-9]+\.16b, w[0-9]+
+**	movi	v[0-9]+\.8b, 0x1
+**	ins	v[0-9]+\.b\[15\], v[0-9]+\.b\[0\]
+**	...
+**	ret
+*/
+
+int8x16_t f_s8(int8_t x)
+{
+  return (int8x16_t) { x, x, x, x, x, x, x, x,
+                       x, x, x, x, x, x, x, 1 };
+}
+
+/*
+** f_s16:
+**	...
+**	dup	v[0-9]+\.8h, w[0-9]+
+**	movi	v[0-9]+\.4h, 0x1
+**	ins	v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
+**	...
+**	ret
+*/
+
+int16x8_t f_s16(int16_t x)
+{
+  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
+}
+
+/*
+** f_s32:
+**	...
+**	movi	v[0-9]\.2s, 0x1
+**	dup	v[0-9]\.4s, w[0-9]+
+**	ins	v[0-9]+\.s\[3\], v[0-9]+\.s\[0\]
+**	...
+**	ret
+*/
+
+int32x4_t f_s32(int32_t x)
+{
+  return (int32x4_t) { x, x, x, 1 };
+}
+
+/*
+** f_s64:
+**	...
+**	adrp	x[0-9]+, .LC[0-9]+
+**	ldr	q[0-9]+, \[x[0-9]+, #:lo12:.LC[0-9]+\]
+**	ins	v[0-9]+\.d\[0\], x[0-9]+
+**	...
+**	ret
+*/
+
+int64x2_t f_s64(int64_t x)
+{
+  return (int64x2_t) { x, 1 };
+}

--000000000000447c7405fab3532a--
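
The attached hunk for the maxv == 1 case does two things: it picks the single constant lane as the base element when exactly one lane is constant, and it avoids forcing that constant into a register. A small stand-alone model of that decision may help when reading it. This is only a sketch under assumptions: `choose_base`, `struct base_choice` and the `base_kind` enumerators are hypothetical names, lanes are reduced to an is-constant flag, and the real code works on RTL and emits moves rather than returning a description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two outcomes in the attached hunk.  */
enum base_kind
{
  BASE_VARIABLE_LANE0, /* Move one variable into lane zero of the vector.  */
  BASE_CONSTANT_DUP    /* Broadcast the constant without a scalar register.  */
};

struct base_choice
{
  int element;         /* Index into the initialiser, maxelement in the patch.  */
  enum base_kind kind;
};

/* Model of the base-element choice when no variable repeats (maxv == 1).
   is_const[i] says whether lane i of the initialiser is a constant.  */
static struct base_choice
choose_base (const bool *is_const, int n_elts, bool big_endian)
{
  assert (n_elts > 0 && n_elts <= 16);

  int n_var = 0;
  for (int i = 0; i < n_elts; i++)
    n_var += !is_const[i];

  struct base_choice c;
  /* Default: the element that must end up in lane zero of the register.  */
  c.element = big_endian ? n_elts - 1 : 0;

  /* With exactly one constant lane, use that lane as the base instead.  */
  if (n_var == n_elts - 1)
    for (int i = 0; i < n_elts; i++)
      if (is_const[i])
        {
          c.element = i;
          break;
        }

  c.kind = is_const[c.element] ? BASE_CONSTANT_DUP : BASE_VARIABLE_LANE0;
  return c;
}
```

For { x, 1 } on little-endian the model selects element 1 and the constant-broadcast outcome, which is the path that leads to the constant-pool load shown for f_s64 above. For an all-variable vector it falls back to element 0, or to the last element on big-endian.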