From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by sourceware.org (Postfix) with ESMTPS id D922A3858D28 for ; Wed, 24 May 2023 19:51:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D922A3858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x42a.google.com with SMTP id ffacd0b85a97d-30aa1eb95a0so1218464f8f.1 for ; Wed, 24 May 2023 12:51:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684957861; x=1687549861; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=6Ganu0+sOF/57x4iqm8zRlm/2E3300YD/55YwfCkg4w=; b=EfPL7AUjGL3e1rR8WLyV3PL2Bcq/8OCgmWRW2DtCQ6G7fq2kpSFCAa08nfhH+KHbYT TOgKHWhZFd2KiB13d2Kvdbi0Lqc8SGA3CvfJJs0eB+aHt7hgGg7NzJ9jDz1/Ewk19FXY 8V62mYmkr+3ircWCYtrCREODbbKAnxZ0HfmG7MyQPYfVFIDJqHy7IhBDDGcL1LiVu7se 1w/tLy5bv/p8yrp37Un1vqyIYepk0nGnls75UGwuYX2+lnHcCTj6em+0hyW24MHesENh HYjjM2Gf9ApdjuE7a7HMbgZQBEErxFisZRfP2cDu2h3V+VEiq12her6pbMwqrfP0G4TD d4jA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684957861; x=1687549861; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=6Ganu0+sOF/57x4iqm8zRlm/2E3300YD/55YwfCkg4w=; b=Tjntsamb0y9NpRC1x5/c4zSTxqn8uzriups0MvG46N2UnvrI7VJQFQ8YmSn5RNjvyf 2b8KMA3XzUtBQJQzmWhbrcSNmb1A4O1KLKsmFNaBEZmU+Gzm9s2Q6qVlXpJlcaHeO7Je Rpc7U1C3hR0ojoV+VTyY2xiNdY6DOhcqqI1uVMZaBF3/sxISc8XolbYNgc/XendZpXLR U7+Eh+/Tc1Y7RGk8RXCsKoYI63FVxJuzTuEcqsfTt1IABfUr+xc6St43+Tsh6jErQ++O g6eJp++kSXI6PP1FY40rhkgv+qFh05Cy5Siwac350lBpbDcxiB1wRWb3s1G4Zm/emgAc 6jLw== X-Gm-Message-State: AC+VfDxSlVYqn8SB650tLr944//pIXf/TlF2oS0IaMXdEaOIyyuj/SqR yKllMbott4h0qxuC8/aXgkOx6YfBhAlOoLivx1kRPg== X-Google-Smtp-Source: ACHHUZ4kYMmkATxekCjSnkB0fqrZY2/HMCkuW6WVE8kMYnUfx9ibh/g1h1Vd85RXGJdpvUkAlU8/T2t4AC68vlY/4OY= X-Received: by 2002:a5d:564e:0:b0:307:7e68:3a47 with SMTP id j14-20020a5d564e000000b003077e683a47mr589295wrw.37.1684957861517; Wed, 24 May 2023 12:51:01 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Thu, 25 May 2023 01:20:24 +0530 Message-ID: Subject: Re: [aarch64] Code-gen for vector initialization involving constants To: Prathamesh Kulkarni , gcc Patches , richard.sandiford@arm.com Content-Type: multipart/mixed; boundary="000000000000cc4db405fc75d242" X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_STOCKGEN,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --000000000000cc4db405fc75d242 Content-Type: text/plain; charset="UTF-8" On Wed, 24 May 2023 at 15:40, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Mon, 22 May 2023 at 14:18, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > Hi Richard, > >> > Thanks for the suggestions. Does the attached patch look OK ? > >> > Boostrap+test in progress on aarch64-linux-gnu. > >> > >> Like I say, please wait for the tests to complete before sending an RFA. > >> It saves a review cycle if the tests don't in fact pass. > > Right, sorry, will post patches after completion of testing henceforth. > >> > >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > >> > index 29dbacfa917..e611a7cca25 100644 > >> > --- a/gcc/config/aarch64/aarch64.cc > >> > +++ b/gcc/config/aarch64/aarch64.cc > >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p) > >> > return gen_rtx_PARALLEL (new_mode, vec); > >> > } > >> > > >> > +/* Return true if INSN is a scalar move. */ > >> > + > >> > +static bool > >> > +scalar_move_insn_p (const rtx_insn *insn) > >> > +{ > >> > + rtx set = single_set (insn); > >> > + if (!set) > >> > + return false; > >> > + rtx src = SET_SRC (set); > >> > + rtx dest = SET_DEST (set); > >> > + return is_a(GET_MODE (dest)) > >> > + && aarch64_mov_operand_p (src, GET_MODE (src)); > >> > >> Formatting: > >> > >> return (is_a(GET_MODE (dest)) > >> && aarch64_mov_operand_p (src, GET_MODE (src))); > >> > >> OK with that change if the tests pass, thanks. > > Unfortunately, the patch regressed vec-init-21.c: > > > > int8x16_t f_s8(int8_t x, int8_t y) > > { > > return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6, > > 7, 8, 9, 10, 11, 12, 13, 14 }; > > } > > > > -O3 code-gen trunk: > > f_s8: > > adrp x2, .LC0 > > ldr q0, [x2, #:lo12:.LC0] > > ins v0.b[0], w0 > > ins v0.b[1], w1 > > ret > > > > -O3 code-gen patch: > > f_s8: > > adrp x2, .LC0 > > ldr d31, [x2, #:lo12:.LC0] > > adrp x2, .LC1 > > ldr d0, [x2, #:lo12:.LC1] > > ins v31.b[0], w0 > > ins v0.b[0], w1 > > zip1 v0.16b, v31.16b, v0.16b > > ret > > > > With trunk, it chooses the fallback sequence because both fallback > > and zip1 sequence had cost = 20, however with patch applied, > > we end up with zip1 sequence cost = 24 and fallback sequence > > cost = 28. > > > > This happens because of using insn_cost instead of > > set_rtx_cost for the following expression: > > (set (reg:QI 100) > > (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)) > > set_rtx_cost returns 0 for above expression but insn_cost returns 4. > > Yeah, was wondering why you'd dropped the set_rtx_cost thing, > but decided not to question it since using insn_cost seemed > reasonable if it worked. [reposting because my reply got blocked for moderator approval] The attached patch uses set_rtx_cost for single_set and insn_cost otherwise for non debug insns similar to seq_cost. > > > This expression template appears twice in fallback sequence, which raises > > the cost to 28 from 20, while it appears once in each half of zip1 sequence, > > which raises the cost to 24 from 20, and so it now prefers zip1 sequence > > instead. > > > > I assumed this expression would be ignored because it looks like a scalar move, > > but that doesn't seem to be the case ? > > aarch64_classify_symbolic_expression returns > > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0) > > and thus aarch64_mov_operand_p returns false. > > Ah, I guess it should be aarch64_mov_operand instead. Confusing that > they're so different... Thanks, using aarch64_mov_operand worked. > > > Another issue with the zip1 sequence above is using same register x2 > > for loading another half of constant in: > > adrp x2, .LC1 > > > > I guess this will create an output dependency from adrp x2, .LC0 -> > > adrp x2, .LC1 > > and anti-dependency from ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1 > > essentially forcing almost the entire sequence (except ins > > instructions) to execute sequentially ? > > I'd expect modern cores to handle that via renaming. Ah right, thanks for the clarification. For some reason, it seems git diff is not formatting the patch correctly :/ Or perhaps I am doing something wrongly. For eg, it shows: + return is_a(GET_MODE (dest)) + && aarch64_mov_operand (src, GET_MODE (src)); but after applying the patch, it's formatted correctly with "&&aarch64..." right below is_a, both on column 10. Similarly, for following hunk in seq_cost_ignoring_scalar_moves: + if (NONDEBUG_INSN_P (seq) + && !scalar_move_insn_p (seq)) After applying patch, "&&" is below N, and not '('. Both N and "&&" are on col 9. And for the following just below: + { + if (rtx set = single_set (seq)) diff shows only one space difference between '{' and the following if, but after applying the patch it's formatted correctly, with two spaces after the curly brace. This is the entire file aarch64.cc with patch applied: https://people.linaro.org/~prathamesh.kulkarni/aarch64.cc Does the formatting look OK for scalar_move_insn_p and seq_cost_ignoring_scalar_moves in above aarch64.cc ? Patch passes bootstrap+test on aarch64-linux-gnu with no regressions. OK to commit ? Thanks, Prathamesh > > Thanks, > Richard --000000000000cc4db405fc75d242 Content-Type: text/plain; charset="US-ASCII"; name="ignore-scalar-move-3.txt" Content-Disposition: attachment; filename="ignore-scalar-move-3.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_li24am7i0 W2FhcmNoNjRdIElnbm9yZSBjb3N0IG9mIHNjYWxhciBtb3ZlcyBmb3Igc2VxIGluIHZlY3RvciBp bml0aWFsaXphdGlvbi4KCmdjYy9DaGFuZ2VMb2c6CgkqIGNvbmZpZy9hYXJjaDY0L2FhcmNoNjQu Y2MgKHNjYWxhcl9tb3ZlX2luc25fcCk6IE5ldyBmdW5jdGlvbi4gCgkoc2VxX2Nvc3RfaWdub3Jp bmdfc2NhbGFyX21vdmVzKTogTGlrZXdpc2UuCgkoYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQp OiBDYWxsIHNlcV9jb3N0X2lnbm9yaW5nX3NjYWxhcl9tb3Zlcy4KCmRpZmYgLS1naXQgYS9nY2Mv Y29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYyBiL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNj CmluZGV4IGQ2ZmM5NDAxNWZhLi41OThmMmY4NjQxNyAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZpZy9h YXJjaDY0L2FhcmNoNjQuY2MKKysrIGIvZ2NjL2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuY2MKQEAg LTIyMzMyLDYgKzIyMzMyLDQ3IEBAIGFhcmNoNjRfdW56aXBfdmVjdG9yX2luaXQgKG1hY2hpbmVf bW9kZSBtb2RlLCBydHggdmFscywgYm9vbCBldmVuX3ApCiAgIHJldHVybiBnZW5fcnR4X1BBUkFM TEVMIChuZXdfbW9kZSwgdmVjKTsKIH0KIAorLyogUmV0dXJuIHRydWUgaWYgSU5TTiBpcyBhIHNj YWxhciBtb3ZlLiAgKi8KKworc3RhdGljIGJvb2wKK3NjYWxhcl9tb3ZlX2luc25fcCAoY29uc3Qg cnR4X2luc24gKmluc24pCit7CisgIHJ0eCBzZXQgPSBzaW5nbGVfc2V0IChpbnNuKTsKKyAgaWYg KCFzZXQpCisgICAgcmV0dXJuIGZhbHNlOworICBydHggc3JjID0gU0VUX1NSQyAoc2V0KTsKKyAg cnR4IGRlc3QgPSBTRVRfREVTVCAoc2V0KTsKKyAgcmV0dXJuIGlzX2E8c2NhbGFyX21vZGU+KEdF VF9NT0RFIChkZXN0KSkKKwkgJiYgYWFyY2g2NF9tb3Zfb3BlcmFuZCAoc3JjLCBHRVRfTU9ERSAo c3JjKSk7Cit9CisKKy8qIFNpbWlsYXIgdG8gc2VxX2Nvc3QsIGJ1dCBpZ25vcmUgY29zdCBmb3Ig c2NhbGFyIG1vdmVzLiAgVGhpcyBmdW5jdGlvbgorICAgaXMgY2FsbGVkIGZyb20gYWFyY2g2NF9l eHBhbmRfdmVjdG9yX2luaXQuICAqLworCitzdGF0aWMgdW5zaWduZWQKK3NlcV9jb3N0X2lnbm9y aW5nX3NjYWxhcl9tb3ZlcyAoY29uc3QgcnR4X2luc24gKnNlcSwgYm9vbCBzcGVlZCkKK3sKKyAg dW5zaWduZWQgY29zdCA9IDA7CisKKyAgZm9yICg7IHNlcTsgc2VxID0gTkVYVF9JTlNOIChzZXEp KQorICAgIGlmIChOT05ERUJVR19JTlNOX1AgKHNlcSkKKwkmJiAhc2NhbGFyX21vdmVfaW5zbl9w IChzZXEpKQorICAgICAgeworCWlmIChydHggc2V0ID0gc2luZ2xlX3NldCAoc2VxKSkKKwkgIGNv c3QgKz0gc2V0X3J0eF9jb3N0IChzZXQsIHNwZWVkKTsKKwllbHNlCisJICB7CisJICAgIGludCB0 aGlzX2Nvc3QgPSBpbnNuX2Nvc3QgKENPTlNUX0NBU1RfUlRYX0lOU04gKHNlcSksIHNwZWVkKTsK KwkgICAgaWYgKHRoaXNfY29zdCA+IDApCisJICAgICAgY29zdCArPSB0aGlzX2Nvc3Q7CisJICAg IGVsc2UKKwkgICAgICBjb3N0Kys7CisJICB9CisgICAgICB9CisKKyAgcmV0dXJuIGNvc3Q7Cit9 CisKIC8qIEV4cGFuZCBhIHZlY3RvciBpbml0aWFsaXphdGlvbiBzZXF1ZW5jZSwgc3VjaCB0aGF0 IFRBUkdFVCBpcwogICAgaW5pdGlhbGl6ZWQgdG8gY29udGFpbiBWQUxTLiAgKi8KIApAQCAtMjIz NjcsNyArMjI0MDgsNyBAQCBhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdCAocnR4IHRhcmdldCwg cnR4IHZhbHMpCiAgICAgICBoYWx2ZXNbaV0gPSBnZW5fcnR4X1NVQlJFRyAobW9kZSwgdG1wX3Jl ZywgMCk7CiAgICAgICBydHhfaW5zbiAqcmVjX3NlcSA9IGdldF9pbnNucyAoKTsKICAgICAgIGVu ZF9zZXF1ZW5jZSAoKTsKLSAgICAgIGNvc3RzW2ldID0gc2VxX2Nvc3QgKHJlY19zZXEsICFvcHRp bWl6ZV9zaXplKTsKKyAgICAgIGNvc3RzW2ldID0gc2VxX2Nvc3RfaWdub3Jpbmdfc2NhbGFyX21v dmVzIChyZWNfc2VxLCAhb3B0aW1pemVfc2l6ZSk7CiAgICAgICBlbWl0X2luc24gKHJlY19zZXEp OwogICAgIH0KIApAQCAtMjIzODQsNyArMjI0MjUsOCBAQCBhYXJjaDY0X2V4cGFuZF92ZWN0b3Jf aW5pdCAocnR4IHRhcmdldCwgcnR4IHZhbHMpCiAgIHN0YXJ0X3NlcXVlbmNlICgpOwogICBhYXJj aDY0X2V4cGFuZF92ZWN0b3JfaW5pdF9mYWxsYmFjayAodGFyZ2V0LCB2YWxzKTsKICAgcnR4X2lu c24gKmZhbGxiYWNrX3NlcSA9IGdldF9pbnNucyAoKTsKLSAgdW5zaWduZWQgZmFsbGJhY2tfc2Vx X2Nvc3QgPSBzZXFfY29zdCAoZmFsbGJhY2tfc2VxLCAhb3B0aW1pemVfc2l6ZSk7CisgIHVuc2ln bmVkIGZhbGxiYWNrX3NlcV9jb3N0CisgICAgPSBzZXFfY29zdF9pZ25vcmluZ19zY2FsYXJfbW92 ZXMgKGZhbGxiYWNrX3NlcSwgIW9wdGltaXplX3NpemUpOwogICBlbmRfc2VxdWVuY2UgKCk7CiAK ICAgZW1pdF9pbnNuIChzZXFfdG90YWxfY29zdCA8IGZhbGxiYWNrX3NlcV9jb3N0ID8gc2VxIDog ZmFsbGJhY2tfc2VxKTsK --000000000000cc4db405fc75d242--