From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by sourceware.org (Postfix) with ESMTPS id 2485A3858D37 for ; Thu, 25 May 2023 06:48:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2485A3858D37 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x429.google.com with SMTP id ffacd0b85a97d-3093a6311dcso1677743f8f.1 for ; Wed, 24 May 2023 23:48:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684997301; x=1687589301; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=W1WfN22orPnV5aQdk+jJUR6Rak29a20xSo0Nrbddsqg=; b=d/myEy1I5V+rnOkh+/1n27AyMpIEvvuMqxKDvjDRGBco523ingZ3LCDFWk6QgqOwxF WN2E7HADVOJMxpJc3TUGr72VT4jWeGgjNlz2sR2eUSNIrgBHBT0Y8YXUE0yUkM1u4Q1c 54r+P+3Aicfajyu7UIZSqB0rPf82dD18058hKfM7fLrPciMEZz1qoHp/MqolEiVjY6sS Y/NMSXlw87SxONT+FfpT3R2QrkFYM8k3SjVIHpdPLhDLfdODdwSgt5ixVx+gufmujt1c qQK76rbGt6vAI5X2J70VFoCk0LKVZ4RNkhdF3jihILrFGfqtC7yZusuzrLC+zJ6j96jk gLww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684997301; x=1687589301; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W1WfN22orPnV5aQdk+jJUR6Rak29a20xSo0Nrbddsqg=; b=kU74jcWypBAgZPan+kms656lv5XYbxD2t9Bsxbamd6Yc2u70ewa55HoF9l8HfnL1AP Mr6D7QFtQW3JNL4wninzCYuk36vV9aFrA4ioqvGHPeFD/0ueyamgLEV3KHHaB1tJPL0R Bqnu+01Db5mHdSQBjmO27BFxRmBmWTPat/EaM3RY1orxkVlD580W3y7BQbb6dCe47/mg g0ujZgZD+dx1xp9b9wT/zzMW9IJ/LbfAIOlpCFkBnljjMJDhpJ+Pxiz0qMImuhsNTF/x JFNcK43GqkzxOZ42bzZfuUsZlKg6YcAQ7PreTnVAKxeITe/0QfQJ3XK6oj8AuB9Tn0bp UMVA== X-Gm-Message-State: AC+VfDzh9Rx7TgLyk9RN9fWWOdMwmTqA37roNDwdyZl7+Hxmajm1qTy0 wk9OM5lgHrQOAyqnTOT5lppVUgEZx74BA9dy+i/s2A== X-Google-Smtp-Source: ACHHUZ5WlJXlZ+51uSmdLlco/0IDmu2mJqm1up7ZfO4dpBrNoQYn1/0uwQi+jP9S7EvK3k45yPox5uTf7/rLfvUBXwg= X-Received: by 2002:a05:6000:1192:b0:304:a40c:43c6 with SMTP id g18-20020a056000119200b00304a40c43c6mr1668569wrx.11.1684997301320; Wed, 24 May 2023 23:48:21 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Thu, 25 May 2023 12:17:45 +0530 Message-ID: Subject: Re: [aarch64] Code-gen for vector initialization involving constants To: Prathamesh Kulkarni , gcc Patches , richard.sandiford@arm.com Content-Type: multipart/mixed; boundary="00000000000097e81c05fc7f01d3" X-Spam-Status: No, score=-9.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_STOCKGEN,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --00000000000097e81c05fc7f01d3 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, 25 May 2023 at 01:28, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Wed, 24 May 2023 at 15:40, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > On Mon, 22 May 2023 at 14:18, Richard Sandiford > >> > wrote: > >> >> > >> >> Prathamesh Kulkarni writes: > >> >> > Hi Richard, > >> >> > Thanks for the suggestions. Does the attached patch look OK ? > >> >> > Boostrap+test in progress on aarch64-linux-gnu. > >> >> > >> >> Like I say, please wait for the tests to complete before sending an= RFA. > >> >> It saves a review cycle if the tests don't in fact pass. > >> > Right, sorry, will post patches after completion of testing hencefor= th. > >> >> > >> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/a= arch64.cc > >> >> > index 29dbacfa917..e611a7cca25 100644 > >> >> > --- a/gcc/config/aarch64/aarch64.cc > >> >> > +++ b/gcc/config/aarch64/aarch64.cc > >> >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode = mode, rtx vals, bool even_p) > >> >> > return gen_rtx_PARALLEL (new_mode, vec); > >> >> > } > >> >> > > >> >> > +/* Return true if INSN is a scalar move. */ > >> >> > + > >> >> > +static bool > >> >> > +scalar_move_insn_p (const rtx_insn *insn) > >> >> > +{ > >> >> > + rtx set =3D single_set (insn); > >> >> > + if (!set) > >> >> > + return false; > >> >> > + rtx src =3D SET_SRC (set); > >> >> > + rtx dest =3D SET_DEST (set); > >> >> > + return is_a(GET_MODE (dest)) > >> >> > + && aarch64_mov_operand_p (src, GET_MODE (src)); > >> >> > >> >> Formatting: > >> >> > >> >> return (is_a(GET_MODE (dest)) > >> >> && aarch64_mov_operand_p (src, GET_MODE (src))); > >> >> > >> >> OK with that change if the tests pass, thanks. > >> > Unfortunately, the patch regressed vec-init-21.c: > >> > > >> > int8x16_t f_s8(int8_t x, int8_t y) > >> > { > >> > return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6, > >> > 7, 8, 9, 10, 11, 12, 13, 14 }; > >> > } > >> > > >> > -O3 code-gen trunk: > >> > f_s8: > >> > adrp x2, .LC0 > >> > ldr q0, [x2, #:lo12:.LC0] > >> > ins v0.b[0], w0 > >> > ins v0.b[1], w1 > >> > ret > >> > > >> > -O3 code-gen patch: > >> > f_s8: > >> > adrp x2, .LC0 > >> > ldr d31, [x2, #:lo12:.LC0] > >> > adrp x2, .LC1 > >> > ldr d0, [x2, #:lo12:.LC1] > >> > ins v31.b[0], w0 > >> > ins v0.b[0], w1 > >> > zip1 v0.16b, v31.16b, v0.16b > >> > ret > >> > > >> > With trunk, it chooses the fallback sequence because both fallback > >> > and zip1 sequence had cost =3D 20, however with patch applied, > >> > we end up with zip1 sequence cost =3D 24 and fallback sequence > >> > cost =3D 28. > >> > > >> > This happens because of using insn_cost instead of > >> > set_rtx_cost for the following expression: > >> > (set (reg:QI 100) > >> > (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)) > >> > set_rtx_cost returns 0 for above expression but insn_cost returns 4. > >> > >> Yeah, was wondering why you'd dropped the set_rtx_cost thing, > >> but decided not to question it since using insn_cost seemed > >> reasonable if it worked. > > The attached patch uses set_rtx_cost for single_set and insn_cost > > otherwise for non debug insns similar to seq_cost. > > FWIW, I think with the aarch64_mov_operand fix, the old way of using > insn_cost for everything would have worked too. But either way is fine. > > >> > This expression template appears twice in fallback sequence, which r= aises > >> > the cost to 28 from 20, while it appears once in each half of zip1 s= equence, > >> > which raises the cost to 24 from 20, and so it now prefers zip1 sequ= ence > >> > instead. > >> > > >> > I assumed this expression would be ignored because it looks like a s= calar move, > >> > but that doesn't seem to be the case ? > >> > aarch64_classify_symbolic_expression returns > >> > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0) > >> > and thus aarch64_mov_operand_p returns false. > >> > >> Ah, I guess it should be aarch64_mov_operand instead. Confusing that > >> they're so different... > > Thanks, using aarch64_mov_operand worked. > >> > >> > Another issue with the zip1 sequence above is using same register x2 > >> > for loading another half of constant in: > >> > adrp x2, .LC1 > >> > > >> > I guess this will create an output dependency from adrp x2, .LC0 -> > >> > adrp x2, .LC1 > >> > and anti-dependency from ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC= 1 > >> > essentially forcing almost the entire sequence (except ins > >> > instructions) to execute sequentially ? > >> > >> I'd expect modern cores to handle that via renaming. > > Ah right, thanks for the clarification. > > > > For some reason, it seems git diff is not formatting the patch correctl= y :/ > > Or perhaps I am doing something wrongly. > > No, I think it's fine. It's just tabs vs. spaces. A leading > "+" followed by a tab is still only indented 8 columns, whereas > "+" followed by 6 spaces is indented 7 columns. So indentation > can look a bit weird in the diff. > > I was accounting for that though. :) > > > For eg, it shows: > > + return is_a(GET_MODE (dest)) > > + && aarch64_mov_operand (src, GET_MODE (src)); > > but after applying the patch, it's formatted correctly with "&& > > aarch64..." right below is_a, both on column 10. > > Yeah, the indentation itself was OK. But there's an =E2=80=9Cemacs rule= =E2=80=9D > that says that parens should be used when splitting an expression > over multiple lines like this. So: > > ------- > Formatting: > > return (is_a(GET_MODE (dest)) > && aarch64_mov_operand_p (src, GET_MODE (src))); > ------- > > was about adding the parens. > > > + for (; seq; seq =3D NEXT_INSN (seq)) > > + if (NONDEBUG_INSN_P (seq) > > + && !scalar_move_insn_p (seq)) > > + { > > + if (rtx set =3D single_set (seq)) > > + cost +=3D set_rtx_cost (set, speed); > > + else > > + { > > + int this_cost =3D insn_cost (CONST_CAST_RTX_INSN (seq), speed= ); > > + if (this_cost > 0) > > + cost +=3D this_cost; > > + else > > + cost++; > > + } > > + } > > I think it'd be better to do the single_set first, and pass the set > to scalar_move_insn_p. I.e. > > for (; seq; seq =3D NEXT_INSN (seq)) > if (NONDEBUG_INSN_P (seq)) > { > if (rtx set =3D single_set (seq)) > { > if (!scalar_move_insn_p (set)) > cost +=3D set_rtx_cost (set, speed); > } > else > { > int this_cost =3D insn_cost (CONST_CAST_RTX_INSN (seq), speed= ); > if (this_cost > 0) > cost +=3D this_cost; > else > cost++; > } > } > > Then scalar_move_insn_p can just be the last three statements, > adjusted as follows: > > rtx src =3D SET_SRC (set); > rtx dest =3D SET_DEST (set); > return (is_a (GET_MODE (dest)) > && aarch64_mov_operand (src, GET_MODE (dest))); > > Note the space after ">", and that the mode passed to aarch64_mov_operand > is the destination mode (since the source mode might be VOIDmode). > > OK with that change, thanks. Hi Richard, Thanks for the suggestions, and sorry for being a bit daft in my previous replies. Does the attached patch look OK ? Bootstrapped+tested on aarch64-linux-gnu. Thanks, Prathamesh > > Thanks, > Richard --00000000000097e81c05fc7f01d3 Content-Type: text/plain; charset="US-ASCII"; name="ignore-scalar-move-4.txt" Content-Disposition: attachment; filename="ignore-scalar-move-4.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_li2rg3xp0 W2FhcmNoNjRdIElnbm9yZSBjb3N0IG9mIHNjYWxhciBtb3ZlcyBmb3Igc2VxIGluIHZlY3RvciBp bml0aWFsaXphdGlvbi4KCmdjYy9DaGFuZ2VMb2c6CgkqIGNvbmZpZy9hYXJjaDY0L2FhcmNoNjQu Y2MgKHNjYWxhcl9tb3ZlX2luc25fcCk6IE5ldyBmdW5jdGlvbi4gCgkoc2VxX2Nvc3RfaWdub3Jp bmdfc2NhbGFyX21vdmVzKTogTGlrZXdpc2UuCgkoYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQp OiBDYWxsIHNlcV9jb3N0X2lnbm9yaW5nX3NjYWxhcl9tb3Zlcy4KCmRpZmYgLS1naXQgYS9nY2Mv Y29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYyBiL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNj CmluZGV4IGQ2ZmM5NDAxNWZhLi5kYjdjYTRjMjhjMyAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZpZy9h YXJjaDY0L2FhcmNoNjQuY2MKKysrIGIvZ2NjL2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuY2MKQEAg LTIyMzMyLDYgKzIyMzMyLDQ2IEBAIGFhcmNoNjRfdW56aXBfdmVjdG9yX2luaXQgKG1hY2hpbmVf bW9kZSBtb2RlLCBydHggdmFscywgYm9vbCBldmVuX3ApCiAgIHJldHVybiBnZW5fcnR4X1BBUkFM TEVMIChuZXdfbW9kZSwgdmVjKTsKIH0KIAorLyogUmV0dXJuIHRydWUgaWYgSU5TTiBpcyBhIHNj YWxhciBtb3ZlLiAgKi8KKworc3RhdGljIGJvb2wKK3NjYWxhcl9tb3ZlX2luc25fcCAocnR4IHNl dCkKK3sKKyAgcnR4IHNyYyA9IFNFVF9TUkMgKHNldCk7CisgIHJ0eCBkZXN0ID0gU0VUX0RFU1Qg KHNldCk7CisgIHJldHVybiAoaXNfYTxzY2FsYXJfbW9kZT4gKEdFVF9NT0RFIChkZXN0KSkKKwkg ICYmIGFhcmNoNjRfbW92X29wZXJhbmQgKHNyYywgR0VUX01PREUgKGRlc3QpKSk7Cit9CisKKy8q IFNpbWlsYXIgdG8gc2VxX2Nvc3QsIGJ1dCBpZ25vcmUgY29zdCBmb3Igc2NhbGFyIG1vdmVzLiAg VGhpcyBmdW5jdGlvbgorICAgaXMgY2FsbGVkIGZyb20gYWFyY2g2NF9leHBhbmRfdmVjdG9yX2lu aXQuICAqLworCitzdGF0aWMgdW5zaWduZWQKK3NlcV9jb3N0X2lnbm9yaW5nX3NjYWxhcl9tb3Zl cyAoY29uc3QgcnR4X2luc24gKnNlcSwgYm9vbCBzcGVlZCkKK3sKKyAgdW5zaWduZWQgY29zdCA9 IDA7CisKKyAgZm9yICg7IHNlcTsgc2VxID0gTkVYVF9JTlNOIChzZXEpKQorICAgIGlmIChOT05E RUJVR19JTlNOX1AgKHNlcSkpCisgICAgICB7CisJaWYgKHJ0eCBzZXQgPSBzaW5nbGVfc2V0IChz ZXEpKQorCSAgeworCSAgICBpZiAoIXNjYWxhcl9tb3ZlX2luc25fcCAoc2V0KSkKKwkgICAgICBj b3N0ICs9IHNldF9ydHhfY29zdCAoc2V0LCBzcGVlZCk7CisJICB9CisJZWxzZQorCSAgeworCSAg ICBpbnQgdGhpc19jb3N0ID0gaW5zbl9jb3N0IChDT05TVF9DQVNUX1JUWF9JTlNOIChzZXEpLCBz cGVlZCk7CisJICAgIGlmICh0aGlzX2Nvc3QgPiAwKQorCSAgICAgIGNvc3QgKz0gdGhpc19jb3N0 OworCSAgICBlbHNlCisJICAgICAgY29zdCsrOworCSAgfQorICAgICAgfQorCisgIHJldHVybiBj b3N0OworfQorCiAvKiBFeHBhbmQgYSB2ZWN0b3IgaW5pdGlhbGl6YXRpb24gc2VxdWVuY2UsIHN1 Y2ggdGhhdCBUQVJHRVQgaXMKICAgIGluaXRpYWxpemVkIHRvIGNvbnRhaW4gVkFMUy4gICovCiAK QEAgLTIyMzY3LDcgKzIyNDA3LDcgQEAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQgKHJ0eCB0 YXJnZXQsIHJ0eCB2YWxzKQogICAgICAgaGFsdmVzW2ldID0gZ2VuX3J0eF9TVUJSRUcgKG1vZGUs IHRtcF9yZWcsIDApOwogICAgICAgcnR4X2luc24gKnJlY19zZXEgPSBnZXRfaW5zbnMgKCk7CiAg ICAgICBlbmRfc2VxdWVuY2UgKCk7Ci0gICAgICBjb3N0c1tpXSA9IHNlcV9jb3N0IChyZWNfc2Vx LCAhb3B0aW1pemVfc2l6ZSk7CisgICAgICBjb3N0c1tpXSA9IHNlcV9jb3N0X2lnbm9yaW5nX3Nj YWxhcl9tb3ZlcyAocmVjX3NlcSwgIW9wdGltaXplX3NpemUpOwogICAgICAgZW1pdF9pbnNuIChy ZWNfc2VxKTsKICAgICB9CiAKQEAgLTIyMzg0LDcgKzIyNDI0LDggQEAgYWFyY2g2NF9leHBhbmRf dmVjdG9yX2luaXQgKHJ0eCB0YXJnZXQsIHJ0eCB2YWxzKQogICBzdGFydF9zZXF1ZW5jZSAoKTsK ICAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXRfZmFsbGJhY2sgKHRhcmdldCwgdmFscyk7CiAg IHJ0eF9pbnNuICpmYWxsYmFja19zZXEgPSBnZXRfaW5zbnMgKCk7Ci0gIHVuc2lnbmVkIGZhbGxi YWNrX3NlcV9jb3N0ID0gc2VxX2Nvc3QgKGZhbGxiYWNrX3NlcSwgIW9wdGltaXplX3NpemUpOwor ICB1bnNpZ25lZCBmYWxsYmFja19zZXFfY29zdAorICAgID0gc2VxX2Nvc3RfaWdub3Jpbmdfc2Nh bGFyX21vdmVzIChmYWxsYmFja19zZXEsICFvcHRpbWl6ZV9zaXplKTsKICAgZW5kX3NlcXVlbmNl ICgpOwogCiAgIGVtaXRfaW5zbiAoc2VxX3RvdGFsX2Nvc3QgPCBmYWxsYmFja19zZXFfY29zdCA/ IHNlcSA6IGZhbGxiYWNrX3NlcSk7Cg== --00000000000097e81c05fc7f01d3--