From mboxrd@z Thu Jan 1 00:00:00 1970
From: Prathamesh Kulkarni
Date: Tue, 2 May 2023 11:11:17 +0530
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
To: Prathamesh Kulkarni, gcc Patches, richard.sandiford@arm.com
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0000000000008ce65105faaf6569"

--0000000000008ce65105faaf6569
Content-Type: text/plain; charset="UTF-8"

On Tue, 25 Apr 2023 at 16:29, Richard Sandiford wrote:
>
> Prathamesh Kulkarni writes:
> > Hi Richard,
> > While digging thru aarch64_expand_vector_init, I noticed it gives
> > priority to loading a constant first:
> >  /* Initialise a vector which is part-variable.  We want to first try
> >     to build those lanes which are constant in the most efficient way we
> >     can.  */
> >
> > which results in suboptimal code-gen for following case:
> > int16x8_t f_s16(int16_t x)
> > {
> >   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> > }
> >
> > code-gen trunk:
> > f_s16:
> >         movi    v0.8h, 0x1
> >         ins     v0.h[0], w0
> >         ins     v0.h[1], w0
> >         ins     v0.h[2], w0
> >         ins     v0.h[3], w0
> >         ins     v0.h[4], w0
> >         ins     v0.h[5], w0
> >         ins     v0.h[6], w0
> >         ret
> >
> > The attached patch tweaks the following condition:
> > if (n_var == n_elts && n_elts <= 16)
> >   {
> >     ...
> >   }
> >
> > to pass if maxv >= 80% of n_elts, with 80% being an
> > arbitrary "high enough" threshold. The intent is to dup
> > the most repeating variable if its repetition
> > is "high enough" and insert constants which should be "better" than
> > loading constant first and inserting variables like in the above case.
>
> I'm not too keen on the 80%.  Like you say, it seems a bit arbitrary.
>
> The case above can also be handled by relaxing n_var == n_elts to
> n_var >= n_elts - 1, so that if there's just one constant element,
> we look for duplicated variable elements.  If there are none
> (maxv == 1), but there is a constant element, we can duplicate
> the constant element into a register.
>
> The case when there's more than one constant element needs more thought
> (and testcases :-)).  E.g. after a certain point, it would probably be
> better to load the variable and constant parts separately and blend them
> using TBL.  It also matters whether the constants are equal or not.
>
> There are also cases that could be handled using EXT.
>
> Plus, if we're inserting many variable elements that are already
> in GPRs, we can probably do better by coalescing them into bigger
> GPR values and inserting them as wider elements.
>
> Because of things like that, I think we should stick to the
> single-constant case for now.

Hi Richard,
Thanks for the suggestions. The attached patch only handles the single
constant case.
Bootstrap+test in progress on aarch64-linux-gnu.
Does it look OK?
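
For reference, here is a rough standalone C model of the element selection
described above. It is only a sketch and not the code in
aarch64_expand_vector_init: struct lane, lane_equal and choose_dup_lane are
made-up names for illustration, lanes are compared by a plain id instead of
rtx_equal_p, and little-endian lane order is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one lane of the initializer: either a
   compile-time constant or a variable identified by an arbitrary id.  */
struct lane {
  bool is_const;
  int id;   /* variable id, or the constant's value when is_const */
};

/* Simplified replacement for rtx_equal_p in this model.  */
static bool lane_equal(const struct lane *a, const struct lane *b)
{
  return a->is_const == b->is_const && a->id == b->id;
}

/* Return the index of the lane to broadcast (dup) first, or -1 when
   there is more than one constant lane and this path does not apply.
   Assumes little-endian lane order.  */
int choose_dup_lane(const struct lane *lanes, int n_elts)
{
  assert(n_elts > 0 && n_elts <= 16);

  int n_var = 0;
  for (int i = 0; i < n_elts; i++)
    if (!lanes[i].is_const)
      n_var++;

  /* Relaxed condition: allow at most one constant lane.  */
  if (n_var < n_elts - 1)
    return -1;

  /* count[j] is the number of lanes equal to lane j, recorded at the
     earliest lane holding that value.  */
  int count[16] = {0};
  for (int i = 0; i < n_elts; i++)
    for (int j = 0; j <= i; j++)
      if (lane_equal(&lanes[i], &lanes[j])) {
        count[j]++;
        break;
      }

  int maxelement = 0, maxv = 0;
  for (int i = 0; i < n_elts; i++)
    if (count[i] > maxv) {
      maxelement = i;
      maxv = count[i];
    }

  /* No repeated lane: prefer broadcasting the single constant.  */
  if (maxv == 1 && n_var == n_elts - 1)
    for (int i = 0; i < n_elts; i++)
      if (lanes[i].is_const) {
        maxelement = i;
        break;
      }

  return maxelement;
}
```

In this model { x, x, x, x, x, x, x, 1 } picks lane 0 (dup x, then insert
the constant), while { a, b, c, 1 } picks lane 3 (dup the constant, then
insert the variables).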
Thanks, Prathamesh > > Thanks, > Richard --0000000000008ce65105faaf6569 Content-Type: text/plain; charset="US-ASCII"; name="gnu-780-3.txt" Content-Disposition: attachment; filename="gnu-780-3.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lh5ubsc60 W2FhcmNoNjRdIEltcHJvdmUgY29kZS1nZW4gZm9yIHZlY3RvciBpbml0aWFsaXphdGlvbiB3aXRo IHNpbmdsZSBjb25zdGFudCBlbGVtZW50LgoKZ2NjL0NoYW5nZUxvZzoKCSogY29uZmlnL2FhcmNo NjQvYWFyYzY0LmNjIChhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdCk6IFR3ZWFrIGNvbmRpdGlv bgoJaWYgKG5fdmFyID09IG5fZWx0cyAmJiBuX2VsdHMgPD0gMTYpIHRvIGFsbG93IGEgc2luZ2xl IGNvbnN0YW50LAoJYW5kIGlmIG1heHYgPT0gMSwgdXNlIGNvbnN0YW50IGVsZW1lbnQgZm9yIGR1 cGxpY2F0aW5nIGludG8gcmVnaXN0ZXIuCgpnY2MvdGVzdHN1aXRlL0NoYW5nZUxvZzoKCSogZ2Nj LnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LXNpbmdsZS1jb25zdC5jOiBOZXcgdGVzdC4KCmRpZmYg LS1naXQgYS9nY2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYyBiL2djYy9jb25maWcvYWFyY2g2 NC9hYXJjaDY0LmNjCmluZGV4IDJiMGRlN2NhMDM4Li5mNDY3NTAxMzNhNiAxMDA2NDQKLS0tIGEv Z2NjL2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuY2MKKysrIGIvZ2NjL2NvbmZpZy9hYXJjaDY0L2Fh cmNoNjQuY2MKQEAgLTIyMTY3LDcgKzIyMTY3LDcgQEAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2lu aXQgKHJ0eCB0YXJnZXQsIHJ0eCB2YWxzKQogICAgICBhbmQgbWF0Y2hlc1tYXVsxXSB3aXRoIHRo ZSBjb3VudCBvZiBkdXBsaWNhdGUgZWxlbWVudHMgKGlmIFggaXMgdGhlCiAgICAgIGVhcmxpZXN0 IGVsZW1lbnQgd2hpY2ggaGFzIGR1cGxpY2F0ZXMpLiAgKi8KIAotICBpZiAobl92YXIgPT0gbl9l bHRzICYmIG5fZWx0cyA8PSAxNikKKyAgaWYgKChuX3ZhciA+PSBuX2VsdHMgLSAxKSAmJiBuX2Vs dHMgPD0gMTYpCiAgICAgewogICAgICAgaW50IG1hdGNoZXNbMTZdWzJdID0gezB9OwogICAgICAg Zm9yIChpbnQgaSA9IDA7IGkgPCBuX2VsdHM7IGkrKykKQEAgLTIyMjI3LDYgKzIyMjI3LDE4IEBA IGFhcmNoNjRfZXhwYW5kX3ZlY3Rvcl9pbml0IChydHggdGFyZ2V0LCBydHggdmFscykKIAkgICAg IHZlY3RvciByZWdpc3Rlci4gIEZvciBiaWctZW5kaWFuIHdlIHdhbnQgdGhhdCBwb3NpdGlvbiB0 byBob2xkCiAJICAgICB0aGUgbGFzdCBlbGVtZW50IG9mIFZBTFMuICAqLwogCSAgbWF4ZWxlbWVu dCA9IEJZVEVTX0JJR19FTkRJQU4gPyBuX2VsdHMgLSAxIDogMDsKKworCSAgLyogSWYgd2UgaGF2 ZSBhIHNpbmdsZSBjb25zdGFudCBlbGVtZW50LCB1c2UgdGhhdCBmb3IgZHVwbGljYXRpbmcKKwkg 
ICAgIGluc3RlYWQuICAqLworCSAgaWYgKG5fdmFyID09IG5fZWx0cyAtIDEpCisJICAgIGZvciAo aW50IGkgPSAwOyBpIDwgbl9lbHRzOyBpKyspCisJICAgICAgaWYgKENPTlNUX0lOVF9QIChYVkVD RVhQICh2YWxzLCAwLCBpKSkKKwkJICB8fCBDT05TVF9ET1VCTEVfUCAoWFZFQ0VYUCAodmFscywg MCwgaSkpKQorCQl7CisJCSAgbWF4ZWxlbWVudCA9IGk7CisJCSAgYnJlYWs7CisJCX0KKwogCSAg cnR4IHggPSBmb3JjZV9yZWcgKGlubmVyX21vZGUsIFhWRUNFWFAgKHZhbHMsIDAsIG1heGVsZW1l bnQpKTsKIAkgIGFhcmNoNjRfZW1pdF9tb3ZlICh0YXJnZXQsIGxvd3BhcnRfc3VicmVnIChtb2Rl LCB4LCBpbm5lcl9tb2RlKSk7CiAJfQpkaWZmIC0tZ2l0IGEvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFy Z2V0L2FhcmNoNjQvdmVjLWluaXQtc2luZ2xlLWNvbnN0LmMgYi9nY2MvdGVzdHN1aXRlL2djYy50 YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC1zaW5nbGUtY29uc3QuYwpuZXcgZmlsZSBtb2RlIDEwMDY0 NAppbmRleCAwMDAwMDAwMDAwMC4uNTE3ZjQ3YjEzZWMKLS0tIC9kZXYvbnVsbAorKysgYi9nY2Mv dGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC1zaW5nbGUtY29uc3QuYwpAQCAt MCwwICsxLDY2IEBACisvKiB7IGRnLWRvIGNvbXBpbGUgfSAqLworLyogeyBkZy1vcHRpb25zICIt TzIiIH0gKi8KKy8qIHsgZGctZmluYWwgeyBjaGVjay1mdW5jdGlvbi1ib2RpZXMgIioqIiAiIiAi IiB9IH0gKi8KKworI2luY2x1ZGUgPGFybV9uZW9uLmg+CisKKy8qCisqKiBmX3M4OgorKioJLi4u CisqKglkdXAJdlswLTldK1wuMTZiLCB3WzAtOV0rCisqKgltb3ZpCXZbMC05XStcLjhiLCAweDEK KyoqCWlucwl2WzAtOV0rXC5iXFsxNVxdLCB2WzAtOV0rXC5iXFswXF0KKyoqCS4uLgorKioJcmV0 CisqLworCitpbnQ4eDE2X3QgZl9zOChpbnQ4X3QgeCkKK3sKKyAgcmV0dXJuIChpbnQ4eDE2X3Qp IHsgeCwgeCwgeCwgeCwgeCwgeCwgeCwgeCwKKyAgICAgICAgICAgICAgICAgICAgICAgeCwgeCwg eCwgeCwgeCwgeCwgeCwgMSB9OworfQorCisvKgorKiogZl9zMTY6CisqKgkuLi4KKyoqCWR1cAl2 WzAtOV0rXC44aCwgd1swLTldKworKioJbW92aQl2WzAtOV0rXC40aCwgMHgxCisqKglpbnMJdlsw LTldK1wuaFxbN1xdLCB2WzAtOV0rXC5oXFswXF0KKyoqCS4uLgorKioJcmV0CisqLworCitpbnQx Nng4X3QgZl9zMTYoaW50MTZfdCB4KQoreworICByZXR1cm4gKGludDE2eDhfdCkgeyB4LCB4LCB4 LCB4LCB4LCB4LCB4LCAxIH07Cit9CisKKy8qCisqKiBmX3MzMjoKKyoqCS4uLgorKioJbW92aQl2 WzAtOV1cLjJzLCAweDEKKyoqCWR1cAl2WzAtOV1cLjRzLCB3WzAtOV0rCisqKglpbnMJdlswLTld K1wuc1xbM1xdLCB2WzAtOV0rXC5zXFswXF0KKyoqCS4uLgorKioJcmV0CisqLworCitpbnQzMng0 
X3QgZl9zMzIoaW50MzJfdCB4KQoreworICByZXR1cm4gKGludDMyeDRfdCkgeyB4LCB4LCB4LCAx IH07Cit9CisKKy8qCisqKiBmX3M2NDoKKyoqCS4uLgorKioJZm1vdglkWzAtOV0rLCB4WzAtOV0r CisqKgltb3YJeFswLTldKywgMQorKioJaW5zCXZbMC05XStcLmRcWzFcXSwgeFswLTldKworKioJ Li4uCisqKglyZXQKKyovCisKK2ludDY0eDJfdCBmX3M2NChpbnQ2NF90IHgpCit7CisgIHJldHVy biAoaW50NjR4Ml90KSB7IHgsIDEgfTsKK30K --0000000000008ce65105faaf6569--