From: Richard Sandiford
To: Jonathan Wright via Gcc-patches
Subject: Re: [PATCH 4/20] aarch64: Use RTL builtins for [su]paddl[q] intrinsics
Date: Wed, 28 Apr 2021 15:30:29 +0100

Jonathan Wright via Gcc-patches writes:
> Hi,
>
> As subject, this patch rewrites the [su]paddl[q] Neon intrinsics to use
> RTL builtins rather than inline assembly code, allowing for better
> scheduling and optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

OK, thanks.  For the record…

> __extension__ extern __inline uint64x1_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vpaddl_u32 (uint32x2_t __a)
> {
> -  uint64x1_t __result;
> -  __asm__ ("uaddlp %0.1d,%1.2s"
> -           : "=w"(__result)
> -           : "w"(__a)
> -           : /* No clobbers */);
> -  return __result;
> +  return (uint64x1_t) __builtin_aarch64_uaddlpv2si_uu (__a);
> }

…I wasn't sure for this whether it would be better to use
(uint64x1_t) {…} instead of a scalar-to-vector conversion, since that
seems to be the more common style in the rest of arm_neon.h.  But there
are already instances of this kind of conversion too, and if anything it
should be more efficient than creating a distinct vector object.

Richard
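
To illustrate the two styles compared above, here is a minimal sketch
that builds without arm_neon.h.  It is not code from the patch: the
types u32x2 and u64x1 are stand-ins made with the GCC/Clang generic
vector extension, and model_uaddlp, paddl_cast and paddl_braces are
hypothetical names.  model_uaddlp is only a scalar model of the
widening pairwise add and does not call the real builtin.

```c
#include <stdint.h>

/* Stand-ins for uint32x2_t and uint64x1_t (assumption: generic vector
   extension types, so the sketch compiles on any GCC/Clang target).  */
typedef uint32_t u32x2 __attribute__ ((vector_size (8)));
typedef uint64_t u64x1 __attribute__ ((vector_size (8)));

/* Hypothetical scalar model of UADDLP on two 32-bit lanes:
   widen each lane to 64 bits, then add the pair.  */
static inline uint64_t
model_uaddlp (u32x2 a)
{
  return (uint64_t) a[0] + (uint64_t) a[1];
}

/* Style 1: cast the 64-bit scalar directly to the one-element vector
   type (same size, so the bits are reinterpreted).  */
static inline u64x1
paddl_cast (u32x2 a)
{
  return (u64x1) model_uaddlp (a);
}

/* Style 2: build the vector with a brace initializer.  */
static inline u64x1
paddl_braces (u32x2 a)
{
  return (u64x1) { model_uaddlp (a) };
}
```

Both functions return the same value.  The difference is stylistic:
the cast reinterprets the scalar in place, whereas the braces spell out
the construction of a new one-element vector.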