From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by sourceware.org (Postfix) with ESMTPS id E63483858D28 for ; Fri, 19 May 2023 10:56:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E63483858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x430.google.com with SMTP id ffacd0b85a97d-30957dd7640so249444f8f.3 for ; Fri, 19 May 2023 03:56:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684493806; x=1687085806; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=Dctp9dmiKH+WxYeJ60XoUEjoUz/jyIpnA8EjX1H9NFA=; b=RT0QSL5UzZGN1RRdi0MziuL9JbKETI5yjgQ+0JSRPVeI0smU5eKSY4FX0s8p7GeEWT zyxLs6Y2G87nyIdXH8Gk0R/0AWsKIW7MemSRF5vVgPS2qPYxorPGHdHIK7WmdAm/kGS9 A+ZOigv/buiOQqNfA2owDiksT+ZTs53O/OzMh3eY90wYoseJFiBmYeF1Er3TOm8gWpMt pqijzFkvgLJykq1Z89zqJ2iqsP5MBCIwkVuwExtYOcQ5XLENDd/neHsgZJxd18B1nJrn 6OsSFrBJhIn3e7kB+34NkLiIN7PBM2430Ih/1GnzDQjZsecVmMnDCR4J2LY0jEOrK1yr VDrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684493806; x=1687085806; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Dctp9dmiKH+WxYeJ60XoUEjoUz/jyIpnA8EjX1H9NFA=; b=Zb8jZXYLECecab9LDWAKQeNc0bTwX6A2jfYE/hBaFHVNaYvZE9T5Ld//Sm7hnZhO37 /HB7arMed25XyDLwshLJYDxwjmZuvbgpB2upEfPfrNEb+nI083xa7NQH0dLE3FRdmcto 8NWoxMAPaZokYKpy9Mt51Hj0gl37QVjSmbJwWyvTYqfYTUe95KD0c4XfEwOU0ot+WK3Q OKTZohfiA7QSUqx/gzmmhy2GBqi1bEKpjh1s+wur587BhEc5Xl1TyJPJqp0oApkOUc7U zJyCUUIbWMRFlJek8hqhWtYtS5dHndPgvvcsrUfWtPn3o9FLnEdKrdtmh1Ql2p/1wOUM dZ7w== X-Gm-Message-State: AC+VfDwLc42uxW09NhR3UunITW9nDrHf2IJ6jL3/k8ldXra0tLE1ylTu rFLwp8LDm5R/FFKUi9JtSNqRKq+2pEsgnsf4DF2gXw== X-Google-Smtp-Source: ACHHUZ44Wcygc+vtTjl2OpU+Zxta2ScqYK+z8aEajwhsK+mC3/DX+95wfYhtjzzOdd/XLMVGLIP+2Rlwdxf3Lyx5bOA= X-Received: by 2002:adf:ea06:0:b0:306:44d0:2bcd with SMTP id q6-20020adfea06000000b0030644d02bcdmr1556250wrm.9.1684493806407; Fri, 19 May 2023 03:56:46 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Fri, 19 May 2023 16:26:09 +0530 Message-ID: Subject: Re: [aarch64] Code-gen for vector initialization involving constants To: Prathamesh Kulkarni , gcc Patches , richard.sandiford@arm.com Content-Type: multipart/mixed; boundary="000000000000f541af05fc09c6cd" X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --000000000000f541af05fc09c6cd Content-Type: text/plain; charset="UTF-8" On Thu, 18 May 2023 at 22:04, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Thu, 18 May 2023 at 13:37, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > On Tue, 16 May 2023 at 00:29, Richard Sandiford > >> > wrote: > >> >> > >> >> Prathamesh Kulkarni writes: > >> >> > Hi Richard, > >> >> > After committing the interleave+zip1 patch for vector initialization, > >> >> > it seems to regress the s32 case for this patch: > >> >> > > >> >> > int32x4_t f_s32(int32_t x) > >> >> > { > >> >> > return (int32x4_t) { x, x, x, 1 }; > >> >> > } > >> >> > > >> >> > code-gen: > >> >> > f_s32: > >> >> > movi v30.2s, 0x1 > >> >> > fmov s31, w0 > >> >> > dup v0.2s, v31.s[0] > >> >> > ins v30.s[0], v31.s[0] > >> >> > zip1 v0.4s, v0.4s, v30.4s > >> >> > ret > >> >> > > >> >> > instead of expected code-gen: > >> >> > f_s32: > >> >> > movi v31.2s, 0x1 > >> >> > dup v0.4s, w0 > >> >> > ins v0.s[3], v31.s[0] > >> >> > ret > >> >> > > >> >> > Cost for fallback sequence: 16 > >> >> > Cost for interleave and zip sequence: 12 > >> >> > > >> >> > For the above case, the cost for interleave+zip1 sequence is computed as: > >> >> > halves[0]: > >> >> > (set (reg:V2SI 96) > >> >> > (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))) > >> >> > cost = 8 > >> >> > > >> >> > halves[1]: > >> >> > (set (reg:V2SI 97) > >> >> > (const_vector:V2SI [ > >> >> > (const_int 1 [0x1]) repeated x2 > >> >> > ])) > >> >> > (set (reg:V2SI 97) > >> >> > (vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ])) > >> >> > (reg:V2SI 97) > >> >> > (const_int 1 [0x1]))) > >> >> > cost = 8 > >> >> > > >> >> > followed by: > >> >> > (set (reg:V4SI 95) > >> >> > (unspec:V4SI [ > >> >> > (subreg:V4SI (reg:V2SI 96) 0) > >> >> > (subreg:V4SI (reg:V2SI 97) 0) > >> >> > ] UNSPEC_ZIP1)) > >> >> > cost = 4 > >> >> > > >> >> > So the total cost becomes > >> >> > max(costs[0], costs[1]) + zip1_insn_cost > >> >> > = max(8, 8) + 4 > >> >> > = 12 > >> >> > > >> >> > While the fallback rtl sequence is: > >> >> > (set (reg:V4SI 95) > >> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ]))) > >> >> > cost = 8 > >> >> > (set (reg:SI 98) > >> >> > (const_int 1 [0x1])) > >> >> > cost = 4 > >> >> > (set (reg:V4SI 95) > >> >> > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98)) > >> >> > (reg:V4SI 95) > >> >> > (const_int 8 [0x8]))) > >> >> > cost = 4 > >> >> > > >> >> > So total cost = 8 + 4 + 4 = 16, and we choose the interleave+zip1 sequence. > >> >> > > >> >> > I think the issue is probably that for the interleave+zip1 sequence we take > >> >> > max(costs[0], costs[1]) to reflect that both halves are interleaved, > >> >> > but for the fallback seq we use seq_cost, which assumes serial execution > >> >> > of insns in the sequence. > >> >> > For above fallback sequence, > >> >> > set (reg:V4SI 95) > >> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ]))) > >> >> > and > >> >> > (set (reg:SI 98) > >> >> > (const_int 1 [0x1])) > >> >> > could be executed in parallel, which would make it's cost max(8, 4) + 4 = 12. > >> >> > >> >> Agreed. > >> >> > >> >> A good-enough substitute for this might be to ignore scalar moves > >> >> (for both alternatives) when costing for speed. > >> > Thanks for the suggestions. Just wondering for aarch64, if there's an easy > >> > way we can check if insn is a scalar move, similar to riscv's scalar_move_insn_p > >> > that checks if get_attr_type(insn) is TYPE_VIMOVXV or TYPE_VFMOVFV ? > >> > >> It should be enough to check that the pattern is a SET: > >> > >> (a) whose SET_DEST has a scalar mode and > >> (b) whose SET_SRC an aarch64_mov_operand > > Hi Richard, > > Thanks for the suggestions, the attached patch calls seq_cost to compute > > cost for sequence and then subtracts cost of each scalar move insn from it. > > Does that look OK ? > > The patch is under bootstrap+test on aarch64-linux-gnu. > > Yeah, the patch looks reasonable (some comments below). The testing > for this kind of patch is more than a formality though, so it would > be good to wait to see if the tests pass. > > > [...] > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > > index 29dbacfa917..7efd896d364 100644 > > --- a/gcc/config/aarch64/aarch64.cc > > +++ b/gcc/config/aarch64/aarch64.cc > > @@ -22332,6 +22332,32 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p) > > return gen_rtx_PARALLEL (new_mode, vec); > > } > > > > +/* Return true if INSN is a scalar move. */ > > + > > +static bool > > +scalar_move_insn_p (rtx_insn *insn) > > +{ > > + rtx set = single_set (insn); > > + if (!set) > > + return false; > > + rtx src = SET_SRC (set); > > + rtx dest = SET_DEST (set); > > + return is_a(GET_MODE (dest)) && aarch64_mov_operand_p (src, GET_MODE (src)); > > Long line. > > > +} > > + > > +/* Ignore cost for scalar moves from cost of sequence. This function is called > > + for calculating sequence costs in aarch64_expand_vector_init. */ > > + > > +static unsigned > > +seq_cost_ignore_scalar_moves (rtx_insn *seq, bool speed) > > Maybe more readable as "ignoring" rather than "ignore". > > > +{ > > + unsigned cost = seq_cost (seq, speed); > > + for (; seq; seq = NEXT_INSN (seq)) > > + if (scalar_move_insn_p (seq)) > > + cost -= insn_cost (seq, speed); > > + return cost; > > +} > > + > > seq_cost uses set_rtx_cost rather than insn_cost for single sets. > > To avoid that kind of inconsistency, I think it'd better to duplicate and > adjust seq_cost. Then scalar_move_insn_p can be passed the single set. > > > /* Expand a vector initialization sequence, such that TARGET is > > initialized to contain VALS. */ > > > > @@ -22367,7 +22393,7 @@ aarch64_expand_vector_init (rtx target, rtx vals) > > halves[i] = gen_rtx_SUBREG (mode, tmp_reg, 0); > > rtx_insn *rec_seq = get_insns (); > > end_sequence (); > > - costs[i] = seq_cost (rec_seq, !optimize_size); > > + costs[i] = seq_cost_ignore_scalar_moves (rec_seq, !optimize_size); > > emit_insn (rec_seq); > > } > > > > @@ -22384,7 +22410,7 @@ aarch64_expand_vector_init (rtx target, rtx vals) > > start_sequence (); > > aarch64_expand_vector_init_fallback (target, vals); > > rtx_insn *fallback_seq = get_insns (); > > - unsigned fallback_seq_cost = seq_cost (fallback_seq, !optimize_size); > > + unsigned fallback_seq_cost = seq_cost_ignore_scalar_moves (fallback_seq, !optimize_size); > > Long line. Hi Richard, Thanks for the suggestions. Does the attached patch look OK ? Boostrap+test in progress on aarch64-linux-gnu. Thanks, Prathamesh > > Thanks, > Richard > > > end_sequence (); > > > > emit_insn (seq_total_cost < fallback_seq_cost ? seq : fallback_seq); --000000000000f541af05fc09c6cd Content-Type: text/plain; charset="US-ASCII"; name="ignore-scalar-move-2.txt" Content-Disposition: attachment; filename="ignore-scalar-move-2.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lhufz67l0 W2FhcmNoNjRdIElnbm9yZSBjb3N0IG9mIHNjYWxhciBtb3ZlcyBmb3Igc2VxIGluIHZlY3RvciBp bml0aWFsaXphdGlvbi4KCmdjYy9DaGFuZ2VMb2c6CgkqIGNvbmZpZy9hYXJjaDY0L2FhcmNoNjQu Y2MgKHNjYWxhcl9tb3ZlX2luc25fcCk6IE5ldyBmdW5jdGlvbi4gCgkoc2VxX2Nvc3RfaWdub3Jp bmdfc2NhbGFyX21vdmVzKTogTGlrZXdpc2UuCgkoYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQp OiBDYWxsIHNlcV9jb3N0X2lnbm9yaW5nX3NjYWxhcl9tb3Zlcy4KCmRpZmYgLS1naXQgYS9nY2Mv Y29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYyBiL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNj CmluZGV4IDI5ZGJhY2ZhOTE3Li5lNjExYTdjY2EyNSAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZpZy9h YXJjaDY0L2FhcmNoNjQuY2MKKysrIGIvZ2NjL2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuY2MKQEAg LTIyMzMyLDYgKzIyMzMyLDQzIEBAIGFhcmNoNjRfdW56aXBfdmVjdG9yX2luaXQgKG1hY2hpbmVf bW9kZSBtb2RlLCBydHggdmFscywgYm9vbCBldmVuX3ApCiAgIHJldHVybiBnZW5fcnR4X1BBUkFM TEVMIChuZXdfbW9kZSwgdmVjKTsKIH0KIAorLyogUmV0dXJuIHRydWUgaWYgSU5TTiBpcyBhIHNj YWxhciBtb3ZlLiAgKi8KKworc3RhdGljIGJvb2wKK3NjYWxhcl9tb3ZlX2luc25fcCAoY29uc3Qg cnR4X2luc24gKmluc24pCit7CisgIHJ0eCBzZXQgPSBzaW5nbGVfc2V0IChpbnNuKTsKKyAgaWYg KCFzZXQpCisgICAgcmV0dXJuIGZhbHNlOworICBydHggc3JjID0gU0VUX1NSQyAoc2V0KTsKKyAg cnR4IGRlc3QgPSBTRVRfREVTVCAoc2V0KTsKKyAgcmV0dXJuIGlzX2E8c2NhbGFyX21vZGU+KEdF VF9NT0RFIChkZXN0KSkKKwkgJiYgYWFyY2g2NF9tb3Zfb3BlcmFuZF9wIChzcmMsIEdFVF9NT0RF IChzcmMpKTsKK30KKworLyogU2ltaWxhciB0byBzZXFfY29zdCwgYnV0IGlnbm9yZSBjb3N0IGZv ciBzY2FsYXIgbW92ZXMuIFRoaXMgZnVuY3Rpb24KKyAgIGlzIGNhbGxlZCBmcm9tIGFhcmNoNjRf ZXhwYW5kX3ZlY3Rvcl9pbml0LiAgKi8KKworc3RhdGljIHVuc2lnbmVkCitzZXFfY29zdF9pZ25v cmluZ19zY2FsYXJfbW92ZXMgKGNvbnN0IHJ0eF9pbnNuICpzZXEsIGJvb2wgc3BlZWQpCit7Cisg IHVuc2lnbmVkIGNvc3QgPSAwOworICBydHggc2V0OworCisgIGZvciAoOyBzZXE7IHNlcSA9IE5F WFRfSU5TTiAoc2VxKSkKKyAgICBpZiAoTk9OREVCVUdfSU5TTl9QIChzZXEpCisJJiYgIXNjYWxh cl9tb3ZlX2luc25fcCAoc2VxKSkKKyAgICAgIHsKKwlpbnQgdGhpc19jb3N0ID0gaW5zbl9jb3N0 IChDT05TVF9DQVNUX1JUWF9JTlNOIChzZXEpLCBzcGVlZCk7CisJaWYgKHRoaXNfY29zdCA+IDAp CisJICBjb3N0ICs9IHRoaXNfY29zdDsKKwllbHNlCisJICBjb3N0Kys7CisgICAgICB9CisKKyAg cmV0dXJuIGNvc3Q7Cit9CisKIC8qIEV4cGFuZCBhIHZlY3RvciBpbml0aWFsaXphdGlvbiBzZXF1 ZW5jZSwgc3VjaCB0aGF0IFRBUkdFVCBpcwogICAgaW5pdGlhbGl6ZWQgdG8gY29udGFpbiBWQUxT LiAgKi8KIApAQCAtMjIzNjcsNyArMjI0MDQsNyBAQCBhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5p dCAocnR4IHRhcmdldCwgcnR4IHZhbHMpCiAgICAgICBoYWx2ZXNbaV0gPSBnZW5fcnR4X1NVQlJF RyAobW9kZSwgdG1wX3JlZywgMCk7CiAgICAgICBydHhfaW5zbiAqcmVjX3NlcSA9IGdldF9pbnNu cyAoKTsKICAgICAgIGVuZF9zZXF1ZW5jZSAoKTsKLSAgICAgIGNvc3RzW2ldID0gc2VxX2Nvc3Qg KHJlY19zZXEsICFvcHRpbWl6ZV9zaXplKTsKKyAgICAgIGNvc3RzW2ldID0gc2VxX2Nvc3RfaWdu b3Jpbmdfc2NhbGFyX21vdmVzIChyZWNfc2VxLCAhb3B0aW1pemVfc2l6ZSk7CiAgICAgICBlbWl0 X2luc24gKHJlY19zZXEpOwogICAgIH0KIApAQCAtMjIzODQsNyArMjI0MjEsOCBAQCBhYXJjaDY0 X2V4cGFuZF92ZWN0b3JfaW5pdCAocnR4IHRhcmdldCwgcnR4IHZhbHMpCiAgIHN0YXJ0X3NlcXVl bmNlICgpOwogICBhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdF9mYWxsYmFjayAodGFyZ2V0LCB2 YWxzKTsKICAgcnR4X2luc24gKmZhbGxiYWNrX3NlcSA9IGdldF9pbnNucyAoKTsKLSAgdW5zaWdu ZWQgZmFsbGJhY2tfc2VxX2Nvc3QgPSBzZXFfY29zdCAoZmFsbGJhY2tfc2VxLCAhb3B0aW1pemVf c2l6ZSk7CisgIHVuc2lnbmVkIGZhbGxiYWNrX3NlcV9jb3N0CisgICAgPSBzZXFfY29zdF9pZ25v cmluZ19zY2FsYXJfbW92ZXMgKGZhbGxiYWNrX3NlcSwgIW9wdGltaXplX3NpemUpOwogICBlbmRf c2VxdWVuY2UgKCk7CiAKICAgZW1pdF9pbnNuIChzZXFfdG90YWxfY29zdCA8IGZhbGxiYWNrX3Nl cV9jb3N0ID8gc2VxIDogZmFsbGJhY2tfc2VxKTsK --000000000000f541af05fc09c6cd--