From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Prathamesh Kulkarni
Date: Fri, 3 Feb 2023 12:46:33 +0530
Subject: [aarch64] Code-gen for vector initialization involving constants
To: gcc Patches, Richard Sandiford
Content-Type: multipart/mixed; boundary="0000000000004d2de805f3c678b2"

--0000000000004d2de805f3c678b2
Content-Type: text/plain; charset="UTF-8"

Hi Richard,
While digging through aarch64_expand_vector_init, I noticed it gives
priority to loading a constant first:

  /* Initialise a vector which is part-variable.  We want to first try
     to build those lanes which are constant in the most efficient way we
     can.  */

which results in suboptimal code-gen for the following case:

int16x8_t f_s16(int16_t x)
{
  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
}

code-gen trunk:
f_s16:
        movi    v0.8h, 0x1
        ins     v0.h[0], w0
        ins     v0.h[1], w0
        ins     v0.h[2], w0
        ins     v0.h[3], w0
        ins     v0.h[4], w0
        ins     v0.h[5], w0
        ins     v0.h[6], w0
        ret

The attached patch tweaks the following condition:

if (n_var == n_elts && n_elts <= 16)
  {
    ...
  }

to also pass if maxv (the repeat count of the most common element) is
>= 80% of n_elts, with 80% being an arbitrary "high enough" threshold.
The intent is to dup the most repeated variable if its repetition count
is "high enough" and then insert the constants, which should be "better"
than loading the constant first and inserting the variables, as in the
above case.
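To make the tweaked condition concrete, here is a minimal standalone sketch in C of the counting step and the threshold test. It is not the patch itself: `struct elt`, `elt_equal` and `prefer_dup` are hypothetical names made up for illustration, and vector elements are modelled as plain structs rather than rtx values. The names `matches`, `maxelement`, `maxv`, `n_var` and `n_elts` follow the ones used above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one vector element: either a constant or a
   variable, identified by `id`.  The real code works on rtx values.  */
struct elt
{
  bool is_const;
  int id;
};

/* Hypothetical equality test, standing in for rtx_equal_p.  */
static bool
elt_equal (struct elt a, struct elt b)
{
  return a.is_const == b.is_const && a.id == b.id;
}

/* Return true if the "dup the most common element, then insert the
   rest" sequence would be chosen under the tweaked condition.  */
bool
prefer_dup (const struct elt *vals, int n_elts)
{
  assert (n_elts > 0 && n_elts <= 16);

  /* matches[i][0]: index of the earliest element equal to element i.
     matches[j][1]: number of elements equal to element j, counted only
     on the earliest occurrence.  */
  int matches[16][2] = {{0}};
  int n_var = 0;

  for (int i = 0; i < n_elts; i++)
    {
      if (!vals[i].is_const)
        n_var++;
      for (int j = 0; j <= i; j++)
        if (elt_equal (vals[i], vals[j]))
          {
            matches[i][0] = j;
            matches[j][1]++;
            break;
          }
    }

  /* Find the most frequently repeated element.  */
  int maxelement = 0;
  int maxv = 0;
  for (int i = 0; i < n_elts; i++)
    if (matches[i][1] > maxv)
      {
        maxelement = i;
        maxv = matches[i][1];
      }

  /* All-variable vectors always take the dup path; otherwise the most
     common element must be a variable repeated in at least 80% of the
     lanes (the arbitrary threshold discussed above).  */
  return n_var == n_elts
         || (maxv >= (int) (0.8 * n_elts) && !vals[maxelement].is_const);
}
```

Under these assumptions, { x, x, x, x, x, x, x, 1 } has maxv = 7 against a threshold of (int) (0.8 * 8) = 6, so the dup path is taken, while { x, x, x, 1, 1, 1, 1, 1 } keeps the constants-first sequence because its most common element is a constant.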
Alternatively, I suppose we can remove the threshold and, for constants,
generate both sequences and check which one is more efficient?

code-gen with patch:
f_s16:
        dup     v0.8h, w0
        movi    v1.4h, 0x1
        ins     v0.h[7], v1.h[0]
        ret

The patch is lightly tested to verify that the vec[t]-init-*.c tests
pass, with bootstrap+test in progress.
Does this look OK?

Thanks,
Prathamesh

--0000000000004d2de805f3c678b2
Content-Type: text/plain; charset="US-ASCII"; name="gnu-780-2.txt"
Content-Disposition: attachment; filename="gnu-780-2.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_ldo5dedl0

ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNjIGIvZ2NjL2NvbmZpZy9h YXJjaDY0L2FhcmNoNjQuY2MKaW5kZXggYWNjMGNmZTVmOTQuLmRmMzM1MDljNmU0IDEwMDY0NAot LS0gYS9nY2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYworKysgYi9nY2MvY29uZmlnL2FhcmNo NjQvYWFyY2g2NC5jYwpAQCAtMjIwNzksMzAgKzIyMDc5LDM2IEBAIGFhcmNoNjRfZXhwYW5kX3Zl Y3Rvcl9pbml0IChydHggdGFyZ2V0LCBydHggdmFscykKICAgICAgYW5kIG1hdGNoZXNbWF1bMV0g d2l0aCB0aGUgY291bnQgb2YgZHVwbGljYXRlIGVsZW1lbnRzIChpZiBYIGlzIHRoZQogICAgICBl YXJsaWVzdCBlbGVtZW50IHdoaWNoIGhhcyBkdXBsaWNhdGVzKS4gICovCiAKLSAgaWYgKG5fdmFy ID09IG5fZWx0cyAmJiBuX2VsdHMgPD0gMTYpCisgIGludCBtYXRjaGVzWzE2XVsyXSA9IHswfTsK KyAgZm9yIChpbnQgaSA9IDA7IGkgPCBuX2VsdHM7IGkrKykKICAgICB7Ci0gICAgICBpbnQgbWF0 Y2hlc1sxNl1bMl0gPSB7MH07Ci0gICAgICBmb3IgKGludCBpID0gMDsgaSA8IG5fZWx0czsgaSsr KQorICAgICAgZm9yIChpbnQgaiA9IDA7IGogPD0gaTsgaisrKQogCXsKLQkgIGZvciAoaW50IGog PSAwOyBqIDw9IGk7IGorKykKKwkgIGlmIChydHhfZXF1YWxfcCAoWFZFQ0VYUCAodmFscywgMCwg aSksIFhWRUNFWFAgKHZhbHMsIDAsIGopKSkKIAkgICAgewotCSAgICAgIGlmIChydHhfZXF1YWxf cCAoWFZFQ0VYUCAodmFscywgMCwgaSksIFhWRUNFWFAgKHZhbHMsIDAsIGopKSkKLQkJewotCQkg IG1hdGNoZXNbaV1bMF0gPSBqOwotCQkgIG1hdGNoZXNbal1bMV0rKzsKLQkJICBicmVhazsKLQkJ fQorCSAgICAgIG1hdGNoZXNbaV1bMF0gPSBqOworCSAgICAgIG1hdGNoZXNbal1bMV0rKzsKKwkg ICAgICBicmVhazsKIAkgICAgfQogCX0KLSAgICAgIGludCBtYXhlbGVtZW50ID0gMDsKLSAgICAg IGludCBtYXh2ID0gMDsKLSAgICAgIGZvciAoaW50IGkgPSAwOyBpIDwgbl9lbHRzOyBpKyspCi0J
aWYgKG1hdGNoZXNbaV1bMV0gPiBtYXh2KQotCSAgewotCSAgICBtYXhlbGVtZW50ID0gaTsKLQkg ICAgbWF4diA9IG1hdGNoZXNbaV1bMV07Ci0JICB9CisgICAgfQogCisgIGludCBtYXhlbGVtZW50 ID0gMDsKKyAgaW50IG1heHYgPSAwOworICBmb3IgKGludCBpID0gMDsgaSA8IG5fZWx0czsgaSsr KQorICAgIGlmIChtYXRjaGVzW2ldWzFdID4gbWF4dikKKyAgICAgIHsKKwltYXhlbGVtZW50ID0g aTsKKwltYXh2ID0gbWF0Y2hlc1tpXVsxXTsKKyAgICAgIH0KKworICBydHggbWF4X2VsZW0gPSBY VkVDRVhQICh2YWxzLCAwLCBtYXhlbGVtZW50KTsgCisgIGlmIChuX2VsdHMgPD0gMTYKKyAgICAg ICYmICgobl92YXIgPT0gbl9lbHRzKQorCSAgIHx8IChtYXh2ID49IChpbnQpKDAuOCAqIG5fZWx0 cykKKwkgICAgICAgJiYgIUNPTlNUX0lOVF9QIChtYXhfZWxlbSkKKwkgICAgICAgJiYgIUNPTlNU X0RPVUJMRV9QIChtYXhfZWxlbSkpKSkKKyAgICB7CiAgICAgICAvKiBDcmVhdGUgYSBkdXBsaWNh dGUgb2YgdGhlIG1vc3QgY29tbW9uIGVsZW1lbnQsIHVubGVzcyBhbGwgZWxlbWVudHMKIAkgYXJl IGVxdWFsbHkgdXNlbGVzcyB0byB1cywgaW4gd2hpY2ggY2FzZSBqdXN0IGltbWVkaWF0ZWx5IHNl dCB0aGUKIAkgdmVjdG9yIHJlZ2lzdGVyIHVzaW5nIHRoZSBmaXJzdCBlbGVtZW50LiAgKi8KZGlm ZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTE4LmMg Yi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC0xOC5jCm5ldyBmaWxl IG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi5lMjBiODEzNTU5ZQotLS0gL2Rldi9udWxs CisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTE4LmMKQEAg LTAsMCArMSw1MyBAQAorLyogeyBkZy1kbyBjb21waWxlIH0gKi8KKy8qIHsgZGctb3B0aW9ucyAi LU8zIiB9ICovCisvKiB7IGRnLWZpbmFsIHsgY2hlY2stZnVuY3Rpb24tYm9kaWVzICIqKiIgIiIg IiIgfSB9ICovCisKKyNpbmNsdWRlIDxhcm1fbmVvbi5oPgorCisvKgorKiogZjFfczE2OgorKioJ Li4uCisqKglkdXAJdlswLTldK1wuOGgsIHdbMC05XSsKKyoqCW1vdmkJdlswLTldK1wuNGgsIDB4 MQorKioJaW5zCXZbMC05XStcLmhcWzdcXSwgdlswLTldK1wuaFxbMFxdCisqKgkuLi4KKyoqCXJl dAorKi8KKworaW50MTZ4OF90IGYxX3MxNihpbnQxNl90IHgpCit7CisgIHJldHVybiAoaW50MTZ4 OF90KSB7eCwgeCwgeCwgeCwgeCwgeCwgeCwgMX07Cit9CisKKy8qCisqKiBmMl9zMTY6CisqKgku Li4KKyoqCWR1cAl2WzAtOV0rXC44aCwgd1swLTldKworKioJbW92aQl2WzAtOV0rXC40aCwgMHgx CisqKgltb3ZpCXZbMC05XStcLjRoLCAweDIKKyoqCWlucwl2WzAtOV0rXC5oXFs2XF0sIHZbMC05 
XStcLmhcWzBcXQorKioJaW5zCXZbMC05XStcLmhcWzdcXSwgdlswLTldK1wuaFxbMFxdCisqKgku Li4KKyoqCXJldAorKi8KKworaW50MTZ4OF90IGYyX3MxNihpbnQxNl90IHgpCit7CisgIHJldHVy biAoaW50MTZ4OF90KSB7IHgsIHgsIHgsIHgsIHgsIHgsIDEsIDIgfTsKK30KKworLyoKKyoqIGYz X3MxNjoKKyoqCS4uLgorKioJbW92aQl2WzAtOV0rXC44aCwgMHgxCisqKglpbnMJdlswLTldK1wu aFxbMFxdLCB3MAorKioJaW5zCXZbMC05XStcLmhcWzFcXSwgdzAKKyoqCWlucwl2WzAtOV0rXC5o XFsyXF0sIHcwCisqKgkuLi4KKyoqCXJldAorKi8KKworaW50MTZ4OF90IGYzX3MxNihpbnQxNl90 IHgpCit7CisgIHJldHVybiAoaW50MTZ4OF90KSB7eCwgeCwgeCwgMSwgMSwgMSwgMSwgMX07Cit9 Cg== --0000000000004d2de805f3c678b2--