From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Subject: Re: [Neon intrinsics] Literal vector construction through vcombine is poor
To: Michael Collison, GCC Patches
Cc: nd
From: "Richard Earnshaw (lists)"
Date: Mon, 19 Jun 2017 13:37:00 -0000

On 16/06/17 22:08, Michael Collison wrote:
> This patch improves code generation for literal vector construction by expanding and exposing the pattern to rtl optimization earlier.
> The current implementation delays splitting the pattern until after reload which results in poor code generation for the following code:
>
>
> #include "arm_neon.h"
>
> int16x8_t
> foo ()
> {
>   return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8));
> }
>
> Trunk generates:
>
> foo:
>         movi    v1.2s, 0
>         movi    v0.4h, 0x8
>         dup     d2, v1.d[0]
>         ins     v2.d[1], v0.d[0]
>         orr     v0.16b, v2.16b, v2.16b
>         ret
>
> With the patch we now generate:
>
> foo:
>         movi    v1.4h, 0x8
>         movi    v0.4s, 0
>         ins     v0.d[1], v1.d[0]
>         ret
>
> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk.
>
> 2017-06-15  Michael Collison
>
>         * config/aarch64/aarch64-simd.md(aarch64_combine_internal):
>         Convert from define_insn_and_split into define_expand
>         * config/aarch64/aarch64.c(aarch64_split_simd_combine):
>         Allow register and subreg operands.
>

Your changelog entry is confusing.  You've deleted the
aarch64_combine_internal pattern entirely, having merged some of its
functionality directly into its caller (aarch64_combine).

So I think it should read:

        * config/aarch64/aarch64-simd.md (aarch64_combine): Directly call
        aarch64_split_simd_combine.
        (aarch64_combine_internal): Delete pattern.
        * ...

Note also there should be a space between the file name and the open
bracket for the first function name.

Why don't you need the big-endian code path any more?

R.
>
> pr7057.patch
>
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index c462164..4a253a9 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2807,27 +2807,11 @@
>        op1 = operands[1];
>        op2 = operands[2];
>      }
> -  emit_insn (gen_aarch64_combine_internal (operands[0], op1, op2));
> -  DONE;
> -}
> -)
>
> -(define_insn_and_split "aarch64_combine_internal"
> -  [(set (match_operand: 0 "register_operand" "=&w")
> -        (vec_concat: (match_operand:VDC 1 "register_operand" "w")
> -                     (match_operand:VDC 2 "register_operand" "w")))]
> -  "TARGET_SIMD"
> -  "#"
> -  "&& reload_completed"
> -  [(const_int 0)]
> -{
> -  if (BYTES_BIG_ENDIAN)
> -    aarch64_split_simd_combine (operands[0], operands[2], operands[1]);
> -  else
> -    aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
> +  aarch64_split_simd_combine (operands[0], op1, op2);
> +
>    DONE;
>  }
> -[(set_attr "type" "multiple")]
>  )
>
>  (define_expand "aarch64_simd_combine"
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 2e385c4..46bd78b 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1650,7 +1650,8 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
>
>    gcc_assert (VECTOR_MODE_P (dst_mode));
>
> -  if (REG_P (dst) && REG_P (src1) && REG_P (src2))
> +  if (register_operand (dst, dst_mode) && register_operand (src1, src_mode)
> +      && register_operand (src2, src_mode))
>      {
>        rtx (*gen) (rtx, rtx, rtx);
>
>
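
For anyone checking the two quoted instruction sequences by hand, the result that foo () has to produce can be modelled in portable C without arm_neon.h.  This is only a sketch: the model_* types and functions below are hypothetical stand-ins invented for illustration, not the real intrinsics and not anything in GCC, and lane[0] is assumed to be the lowest-numbered architectural lane.  It says nothing about how GCC numbers lanes internally on big-endian, so it does not settle the big-endian question raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-ins for int16x4_t and int16x8_t.
   These are NOT the arm_neon.h types.  */
typedef struct { int16_t lane[4]; } model_s16x4;
typedef struct { int16_t lane[8]; } model_s16x8;

/* Model of vdup_n_s16: every lane holds the same value.  */
static model_s16x4
model_vdup_n_s16 (int16_t value)
{
  model_s16x4 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = value;
  return r;
}

/* Model of vcombine_s16: LOW fills lanes 0-3, HIGH fills lanes 4-7.  */
static model_s16x8
model_vcombine_s16 (model_s16x4 low, model_s16x4 high)
{
  model_s16x8 r;
  memcpy (&r.lane[0], low.lane, sizeof low.lane);
  memcpy (&r.lane[4], high.lane, sizeof high.lane);
  return r;
}

/* The function from the quoted test case, written against the model.  */
static model_s16x8
model_foo (void)
{
  return model_vcombine_s16 (model_vdup_n_s16 (0), model_vdup_n_s16 (8));
}

/* Model of the three-instruction sequence quoted for the patched
   compiler.  */
static model_s16x8
model_patched_sequence (void)
{
  model_s16x8 v0;
  model_s16x4 v1_low = model_vdup_n_s16 (8);   /* movi v1.4h, 0x8 */
  memset (&v0, 0, sizeof v0);                  /* movi v0.4s, 0 */
  /* ins v0.d[1], v1.d[0]: copy the low 64 bits of v1 into the
     high 64 bits of v0.  */
  memcpy (&v0.lane[4], v1_low.lane, sizeof v1_low.lane);
  return v0;
}

/* Returns 1 when both vectors hold the same eight lanes.  */
static int
model_equal (model_s16x8 a, model_s16x8 b)
{
  return memcmp (a.lane, b.lane, sizeof a.lane) == 0;
}

/* Asserts that the modelled sequence matches the modelled intrinsics.  */
static void
model_self_check (void)
{
  assert (model_equal (model_foo (), model_patched_sequence ()));
}
```

Calling model_self_check () from a main function is enough to exercise the model.  The lanes it checks are four zeros followed by four eights, which is what either quoted sequence should leave in v0.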