From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Prathamesh Kulkarni
Date: Tue, 29 Nov 2022 20:09:07 +0530
Message-ID:
Subject: [aarch64] Use dup and zip1 for interleaving elements in initializing vector
To: gcc Patches, Richard Sandiford
Content-Type: multipart/mixed; boundary="00000000000077778605ee9cf55c"

--00000000000077778605ee9cf55c
Content-Type: text/plain; charset="UTF-8"

Hi,
For the following test case:

int16x8_t foo(int16_t x, int16_t y)
{
  return (int16x8_t) { x, y, x, y, x, y, x, y };
}

Code gen at -O3:
foo:
        dup     v0.8h, w0
        ins     v0.h[1], w1
        ins     v0.h[3], w1
        ins     v0.h[5], w1
        ins     v0.h[7], w1
        ret

For 16 elements, this results in 8 ins instructions, which is probably
not optimal. I think the above code gen is equivalent to the following?

        dup     v0.8h, w0
        dup     v1.8h, w1
        zip1    v0.8h, v0.8h, v1.8h

I have attached a patch that does this when the number of elements is >= 8,
which should possibly be better than the current code gen.

The patch passes bootstrap+test on aarch64-linux-gnu.
Does the patch look OK?
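To illustrate why the two sequences should give the same vector, here is a
rough scalar model of dup and zip1 in plain C. It is only a sketch: the helper
names (model_dup, model_zip1, interleave_matches, self_check) are made up for
this example and are not part of GCC or arm_neon.h, and it assumes 8 lanes of
int16_t as in the test case above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8

/* Hypothetical scalar model of "dup vD.8h, wN": broadcast one value
   to every lane.  */
static void model_dup(int16_t out[LANES], int16_t value)
{
    for (int i = 0; i < LANES; i++)
        out[i] = value;
}

/* Hypothetical scalar model of "zip1 vD.8h, vA.8h, vB.8h": interleave
   the low halves of A and B.  A temporary is used so that OUT may
   alias one of the inputs, as v0 does in the sequence above.  */
static void model_zip1(int16_t out[LANES], const int16_t a[LANES],
                       const int16_t b[LANES])
{
    int16_t tmp[LANES];
    for (int i = 0; i < LANES / 2; i++) {
        tmp[2 * i] = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(out, tmp, sizeof tmp);
}

/* Returns 1 when dup + dup + zip1 yields { x, y, x, y, x, y, x, y }.  */
int interleave_matches(int16_t x, int16_t y)
{
    int16_t v0[LANES], v1[LANES];
    const int16_t want[LANES] = { x, y, x, y, x, y, x, y };

    model_dup(v0, x);
    model_dup(v1, y);
    model_zip1(v0, v0, v1);
    return memcmp(v0, want, sizeof want) == 0;
}

/* Small self-check over a couple of input pairs.  */
void self_check(void)
{
    assert(interleave_matches(1, 2));
    assert(interleave_matches(-5, 5));
}
```

Since both inputs to zip1 are splats, only their low halves matter, so the
result does not depend on which half the zip instruction reads.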
Thanks, Prathamesh --00000000000077778605ee9cf55c Content-Type: text/plain; charset="US-ASCII"; name="gnu-781-3.txt" Content-Disposition: attachment; filename="gnu-781-3.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lb1y2l610 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNjIGIvZ2NjL2NvbmZpZy9h YXJjaDY0L2FhcmNoNjQuY2MKaW5kZXggYzkxZGY2ZjUwMDYuLmU1ZGVhNzBlMzYzIDEwMDY0NAot LS0gYS9nY2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYworKysgYi9nY2MvY29uZmlnL2FhcmNo NjQvYWFyY2g2NC5jYwpAQCAtMjIwMjgsNiArMjIwMjgsMzkgQEAgYWFyY2g2NF9leHBhbmRfdmVj dG9yX2luaXQgKHJ0eCB0YXJnZXQsIHJ0eCB2YWxzKQogICAgICAgcmV0dXJuOwogICAgIH0KIAor ICAvKiBDaGVjayBmb3IgaW50ZXJsZWF2aW5nIGNhc2UuCisgICAgIEZvciBlZyBpZiBpbml0aWFs aXplciBpcyAoaW50MTZ4OF90KSB7eCwgeSwgeCwgeSwgeCwgeSwgeCwgeX0uCisgICAgIEdlbmVy YXRlIGZvbGxvd2luZyBjb2RlOgorICAgICBkdXAgdjAuaCwgeAorICAgICBkdXAgdjEuaCwgeQor ICAgICB6aXAxIHYwLmgsIHYwLmgsIHYxLmgKKyAgICAgZm9yICJsYXJnZSBlbm91Z2giIGluaXRp YWxpemVyLiAgKi8KKworICBpZiAobl9lbHRzID49IDgpCisgICAgeworICAgICAgaW50IGk7Cisg ICAgICBmb3IgKGkgPSAyOyBpIDwgbl9lbHRzOyBpKyspCisJaWYgKCFydHhfZXF1YWxfcCAoWFZF Q0VYUCAodmFscywgMCwgaSksIFhWRUNFWFAgKHZhbHMsIDAsIGkgJSAyKSkpCisJICBicmVhazsK KworICAgICAgaWYgKGkgPT0gbl9lbHRzKQorCXsKKwkgIG1hY2hpbmVfbW9kZSBtb2RlID0gR0VU X01PREUgKHRhcmdldCk7CisJICBydHggZGVzdFsyXTsKKworCSAgZm9yIChpbnQgaSA9IDA7IGkg PCAyOyBpKyspCisJICAgIHsKKwkgICAgICBydHggeCA9IGNvcHlfdG9fbW9kZV9yZWcgKEdFVF9N T0RFX0lOTkVSIChtb2RlKSwgWFZFQ0VYUCAodmFscywgMCwgaSkpOworCSAgICAgIGRlc3RbaV0g PSBnZW5fcmVnX3J0eCAobW9kZSk7CisJICAgICAgYWFyY2g2NF9lbWl0X21vdmUgKGRlc3RbaV0s IGdlbl92ZWNfZHVwbGljYXRlIChtb2RlLCB4KSk7CisJICAgIH0KKworCSAgcnR2ZWMgdiA9IGdl bl9ydHZlYyAoMiwgZGVzdFswXSwgZGVzdFsxXSk7CisJICBlbWl0X3NldF9pbnNuICh0YXJnZXQs IGdlbl9ydHhfVU5TUEVDIChtb2RlLCB2LCBVTlNQRUNfWklQMSkpOworCSAgcmV0dXJuOworCX0K KyAgICB9CisKICAgZW51bSBpbnNuX2NvZGUgaWNvZGUgPSBvcHRhYl9oYW5kbGVyICh2ZWNfc2V0 X29wdGFiLCBtb2RlKTsKICAgZ2NjX2Fzc2VydCAoaWNvZGUgIT0gQ09ERV9GT1Jfbm90aGluZyk7 
CiAKZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L2ludGVybGVh dmUtaW5pdC0xLmMgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC9pbnRlcmxlYXZl LWluaXQtMS5jCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi5lZTc3NTA0 ODU4OQotLS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0 L2ludGVybGVhdmUtaW5pdC0xLmMKQEAgLTAsMCArMSwzNyBAQAorLyogeyBkZy1kbyBjb21waWxl IH0gKi8KKy8qIHsgZGctb3B0aW9ucyAiLU8zIiB9ICovCisvKiB7IGRnLWZpbmFsIHsgY2hlY2st ZnVuY3Rpb24tYm9kaWVzICIqKiIgIiIgIiIgfSB9ICovCisKKyNpbmNsdWRlIDxhcm1fbmVvbi5o PgorCisvKgorKiogZm9vOgorKioJLi4uCisqKglkdXAJdlswLTldK1wuOGgsIHdbMC05XSsKKyoq CWR1cAl2WzAtOV0rXC44aCwgd1swLTldKworKioJemlwMQl2WzAtOV0rXC44aCwgdlswLTldK1wu OGgsIHZbMC05XStcLjhoCisqKgkuLi4KKyoqCXJldAorKi8KKworaW50MTZ4OF90IGZvbyhpbnQx Nl90IHgsIGludCB5KQoreworICBpbnQxNng4X3QgdiA9IChpbnQxNng4X3QpIHt4LCB5LCB4LCB5 LCB4LCB5LCB4LCB5fTsgCisgIHJldHVybiB2OworfQorCisvKgorKiogZm9vMjoKKyoqCS4uLgor KioJZHVwCXZbMC05XStcLjhoLCB3WzAtOV0rCisqKgltb3ZpCXZbMC05XStcLjhoLCAweDEKKyoq CXppcDEJdlswLTldK1wuOGgsIHZbMC05XStcLjhoLCB2WzAtOV0rXC44aAorKioJLi4uCisqKgly ZXQKKyovCisKK2ludDE2eDhfdCBmb28yKGludDE2X3QgeCkgCit7CisgIGludDE2eDhfdCB2ID0g KGludDE2eDhfdCkge3gsIDEsIHgsIDEsIHgsIDEsIHgsIDF9OyAKKyAgcmV0dXJuIHY7Cit9Cg== --00000000000077778605ee9cf55c--