From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Sandiford
To: Andrew Carlotti via Gcc-patches
Subject: Re: [PATCH] aarch64: Lower vcombine to GIMPLE
Date: Fri, 10 Jun 2022 08:00:18 +0100
In-Reply-To: (Andrew Carlotti via Gcc-patches's message of "Tue, 7 Jun 2022 17:23:56 +0000")
List-Id: Gcc-patches mailing list

Andrew Carlotti via Gcc-patches writes:
> Hi all,
>
> This lowers vcombine intrinsics to a GIMPLE vector constructor, which
> enables better optimisation during GIMPLE passes.
>
> Bootstrapped and tested on aarch64-none-linux-gnu, and tested for
> aarch64_be-none-linux-gnu via cross-compilation.
>
>
> gcc/
>
> 	* config/aarch64/aarch64-builtins.c
> 	(aarch64_general_gimple_fold_builtin): Add combine.
>
> gcc/testsuite/
>
> 	* gcc.target/aarch64/advsimd-intrinsics/combine.c:
> 	New test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 5217dbdb2ac78bba0a669d22af6d769d1fe91a3d..9d52fb8c5a48c9b743defb340a85fb20a1c8f014 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2827,6 +2827,18 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
>        gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
>        break;
>
> +    BUILTIN_VDC (BINOP, combine, 0, AUTO_FP)
> +    BUILTIN_VD_I (BINOPU, combine, 0, NONE)
> +    BUILTIN_VDC_P (BINOPP, combine, 0, NONE)
> +      {
> +	if (BYTES_BIG_ENDIAN)
> +	  std::swap(args[0], args[1]);

We probably shouldn't do this swap in-place, since args refers directly
to the gimple statement.

> +	tree ret_type = TREE_TYPE (gimple_call_lhs (stmt));
> +	tree ctor = build_constructor_va (ret_type, 2, NULL_TREE, args[0], NULL_TREE, args[1]);

Minor formatting nit: lines should be under 80 chars.

Looks good otherwise, thanks, and sorry for the slow review.

Richard

> +	new_stmt = gimple_build_assign (gimple_call_lhs (stmt), ctor);
> +      }
> +      break;
> +
>       /*lower store and load neon builtins to gimple.  */
>       BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
>       BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/combine.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/combine.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..d08faf7a4a160a1e83428ed9b270731bbf7b8c8a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/combine.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +** foo:
> +**	umov	w0, v1\.s\[1\]
> +**	ret
> +*/
> +
> +int32_t foo (int32x2_t a, int32x2_t b)
> +{
> +  int32x4_t c = vcombine_s32(a, b);
> +  return vgetq_lane_s32(c, 3);
> +}
> +