From: Richard Sandiford
To: Prathamesh Kulkarni
Cc: gcc Patches
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
Date: Tue, 25 Apr 2023 11:59:21 +0100
In-Reply-To: Prathamesh Kulkarni's message of "Fri, 3 Feb 2023 12:46:33 +0530"

Prathamesh Kulkarni writes:
> Hi Richard,
> While digging through aarch64_expand_vector_init, I noticed it gives
> priority to loading a constant first:
>   /* Initialise a vector which is part-variable.  We want to first try
>      to build those lanes which are constant in the most efficient way we
>      can.  */
>
> which results in suboptimal code-gen for the following case:
> int16x8_t f_s16(int16_t x)
> {
>   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> }
>
> code-gen trunk:
> f_s16:
>         movi    v0.8h, 0x1
>         ins     v0.h[0], w0
>         ins     v0.h[1], w0
>         ins     v0.h[2], w0
>         ins     v0.h[3], w0
>         ins     v0.h[4], w0
>         ins     v0.h[5], w0
>         ins     v0.h[6], w0
>         ret
>
> The attached patch tweaks the following condition:
>   if (n_var == n_elts && n_elts <= 16)
>     {
>       ...
>     }
>
> to pass if maxv >= 80% of n_elts, with 80% being an
> arbitrary "high enough" threshold. The intent is to dup
> the most repeating variable if its repetition
> is "high enough" and insert the constants, which should be "better" than
> loading the constant first and inserting variables like in the above case.

I'm not too keen on the 80%.  Like you say, it seems a bit arbitrary.

The case above can also be handled by relaxing n_var == n_elts to
n_var >= n_elts - 1, so that if there's just one constant element,
we look for duplicated variable elements.  If there are none
(maxv == 1), but there is a constant element, we can duplicate
the constant element into a register.

The case when there's more than one constant element needs more thought
(and testcases :-)).  E.g. after a certain point, it would probably
be better to load the variable and constant parts separately and blend
them using TBL.  It also matters whether the constants are equal or not.

There are also cases that could be handled using EXT.

Plus, if we're inserting many variable elements that are already
in GPRs, we can probably do better by coalescing them into bigger
GPR values and inserting them as wider elements.

Because of things like that, I think we should stick to the
single-constant case for now.

Thanks,
Richard