From: Prathamesh Kulkarni
Date: Tue, 29 Nov 2022 22:36:18 +0530
Subject: Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector
To: Andrew Pinski
Cc: gcc Patches, Richard Sandiford

On Tue, 29 Nov 2022 at 20:43, Andrew Pinski wrote:
>
> On Tue, Nov 29, 2022 at 6:40 AM Prathamesh Kulkarni via Gcc-patches
> wrote:
> >
> > Hi,
> > For the following test-case:
> >
> > int16x8_t foo(int16_t x, int16_t y)
> > {
> >   return (int16x8_t) { x, y, x, y, x, y, x, y };
> > }
>
> (Not to block this patch)
> Seems like this trick can be done even with less than perfect initializer too:
> e.g.
> int16x8_t foo(int16_t x, int16_t y)
> {
>   return (int16x8_t) { x, y, x, y, x, y, x, 0 };
> }
>
> Which should generate something like:
> dup v0.8h, w0
> dup v1.8h, w1
> zip1 v0.8h, v0.8h, v1.8h
> ins v0.h[7], wzr

Hi Andrew,
Nice catch, thanks for the suggestions!
More generally, code-gen with constants involved seems to be sub-optimal.
For example:

int16x8_t foo(int16_t x)
{
  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
}

results in:
foo:
        movi    v0.8h, 0x1
        ins     v0.h[0], w0
        ins     v0.h[1], w0
        ins     v0.h[2], w0
        ins     v0.h[3], w0
        ins     v0.h[4], w0
        ins     v0.h[5], w0
        ins     v0.h[6], w0
        ret

which I suppose could instead be the following ?
foo:
        dup     v0.8h, w0
        mov     w1, 0x1
        ins     v0.h[7], w1
        ret

I will try to address this in follow up patch.
Thanks,
Prathamesh
>
> Thanks,
> Andrew Pinski
>
> >
> > Code gen at -O3:
> > foo:
> >         dup     v0.8h, w0
> >         ins     v0.h[1], w1
> >         ins     v0.h[3], w1
> >         ins     v0.h[5], w1
> >         ins     v0.h[7], w1
> >         ret
> >
> > For 16 elements, it results in 8 ins instructions which might not be
> > optimal perhaps.
> > I guess, the above code-gen would be equivalent to the following ?
> > dup v0.8h, w0
> > dup v1.8h, w1
> > zip1 v0.8h, v0.8h, v1.8h
> >
> > I have attached patch to do the same, if number of elements >= 8,
> > which should be possibly better compared to current code-gen ?
> > Patch passes bootstrap+test on aarch64-linux-gnu.
> > Does the patch look OK ?
> >
> > Thanks,
> > Prathamesh
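
For reference, the equivalences discussed in this thread can be checked with a small scalar model of the lane semantics of dup, zip1 and ins on eight 16-bit lanes. This is only a sketch: the type v8h and the helpers model_dup, model_zip1, model_ins, interleave_xy, interleave_xy_last_zero and splat_with_last are hypothetical names made up for illustration. It models lane movement in portable C and is neither the attached patch nor the way GCC expands vector initializers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8

/* Hypothetical scalar model of a 128-bit register viewed as .8h. */
typedef struct { int16_t lane[LANES]; } v8h;

/* Models "dup Vd.8h, Wn": broadcast one scalar to every lane. */
static v8h model_dup(int16_t x)
{
    v8h r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = x;
    return r;
}

/* Models "zip1 Vd.8h, Vn.8h, Vm.8h": interleave the low halves of n and m. */
static v8h model_zip1(v8h n, v8h m)
{
    v8h r;
    for (int i = 0; i < LANES / 2; i++) {
        r.lane[2 * i] = n.lane[i];
        r.lane[2 * i + 1] = m.lane[i];
    }
    return r;
}

/* Models "ins Vd.h[idx], Wn": overwrite a single lane, keep the rest. */
static v8h model_ins(v8h v, int idx, int16_t x)
{
    assert(idx >= 0 && idx < LANES);
    v.lane[idx] = x;
    return v;
}

/* { x, y, x, y, x, y, x, y } as dup + dup + zip1. */
static v8h interleave_xy(int16_t x, int16_t y)
{
    return model_zip1(model_dup(x), model_dup(y));
}

/* { x, y, x, y, x, y, x, 0 } as dup + dup + zip1 + ins of zero into lane 7. */
static v8h interleave_xy_last_zero(int16_t x, int16_t y)
{
    return model_ins(interleave_xy(x, y), 7, 0);
}

/* { x, x, x, x, x, x, x, c } as dup + ins of the constant into lane 7. */
static v8h splat_with_last(int16_t x, int16_t c)
{
    return model_ins(model_dup(x), 7, c);
}
```

In this model, interleave_xy(x, y) yields the same lanes as the initializer { x, y, x, y, x, y, x, y }, and splat_with_last(x, 1) the same lanes as { x, x, x, x, x, x, x, 1 }. It says nothing about instruction cost, which is what the patch is actually trading off.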