From: Prathamesh Kulkarni
Date: Mon, 3 Apr 2023 23:42:26 +0530
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
To: gcc Patches, Richard Sandiford

On Mon, 13 Feb 2023 at 11:58, Prathamesh Kulkarni wrote:
>
> On Fri, 3 Feb 2023 at 12:46, Prathamesh Kulkarni wrote:
> >
> > Hi Richard,
> > While digging thru aarch64_expand_vector_init, I noticed it gives
> > priority to loading a constant first:
> >   /* Initialise a vector which is part-variable.  We want to first try
> >      to build those lanes which are constant in the most efficient way we
> >      can.  */
> >
> > which results in suboptimal code-gen for following case:
> > int16x8_t f_s16(int16_t x)
> > {
> >   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> > }
> >
> > code-gen trunk:
> > f_s16:
> >         movi    v0.8h, 0x1
> >         ins     v0.h[0], w0
> >         ins     v0.h[1], w0
> >         ins     v0.h[2], w0
> >         ins     v0.h[3], w0
> >         ins     v0.h[4], w0
> >         ins     v0.h[5], w0
> >         ins     v0.h[6], w0
> >         ret
> >
> > The attached patch tweaks the following condition:
> > if (n_var == n_elts && n_elts <= 16)
> >   {
> >     ...
> >   }
> >
> > to pass if maxv >= 80% of n_elts, with 80% being an
> > arbitrary "high enough" threshold.
> > The intent is to dup
> > the most repeating variable if its repetition
> > is "high enough" and insert constants, which should be "better" than
> > loading the constant first and inserting variables like in the above case.
> >
> > Alternatively, I suppose we can remove the threshold and for constants,
> > generate both sequences and check which one is more
> > efficient ?
> >
> > code-gen with patch:
> > f_s16:
> >         dup     v0.8h, w0
> >         movi    v1.4h, 0x1
> >         ins     v0.h[7], v1.h[0]
> >         ret
> >
> > The patch is lightly tested to verify that vec[t]-init-*.c tests pass
> > with bootstrap+test
> > in progress.
> > Does this look OK ?
> Hi Richard,
> ping https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html
Hi Richard,
ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
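
For readers following the thread, the threshold test described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the attached patch: struct lane, most_repeated_var_count and prefer_dup_first are hypothetical names that do not exist in GCC, and the sketch assumes maxv counts the most frequently repeated variable operand. The 80% figure and the n_elts <= 16 bound are taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lane descriptor, not a GCC type.  var_id >= 0 names a
   variable operand; var_id < 0 marks a lane holding a constant.  */
struct lane
{
  int var_id;
};

/* Count the occurrences of the most frequently repeated variable
   (assumed here to be what the message calls maxv).  The quadratic
   scan is acceptable because n_elts is at most 16.  */
static size_t
most_repeated_var_count (const struct lane *lanes, size_t n_elts)
{
  assert (lanes != NULL || n_elts == 0);

  size_t maxv = 0;
  for (size_t i = 0; i < n_elts; i++)
    {
      if (lanes[i].var_id < 0)
        continue;               /* Constants are not dup candidates.  */

      size_t count = 0;
      for (size_t j = 0; j < n_elts; j++)
        if (lanes[j].var_id == lanes[i].var_id)
          count++;

      if (count > maxv)
        maxv = count;
    }
  return maxv;
}

/* Return true when the dominant variable fills at least 80% of the
   lanes, written as maxv * 5 >= n_elts * 4 to stay in integers.  */
static bool
prefer_dup_first (const struct lane *lanes, size_t n_elts)
{
  if (n_elts == 0 || n_elts > 16)
    return false;

  size_t maxv = most_repeated_var_count (lanes, n_elts);
  return maxv * 5 >= n_elts * 4;
}
```

For the f_s16 case, seven of the eight lanes hold x, so the test passes (35 >= 32) and the dup-first sequence would be chosen. A vector split evenly between one variable and constants would fail it (20 < 32) and keep the existing constant-first path.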