From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-472108-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 103021 invoked by alias); 26 Jan 2018 14:00:04 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 102806 invoked by uid 89); 26 Jan 2018 14:00:03 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-10.9 required=5.0 tests=BAYES_00,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 26 Jan 2018 14:00:00 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DF6A91529;	Fri, 26 Jan 2018 05:59:58 -0800 (PST)
Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DF23F3F53D;	Fri, 26 Jan 2018 05:59:57 -0800 (PST)
Message-ID: <5A6B345C.9040105@foss.arm.com>
Date: Fri, 26 Jan 2018 14:22:00 -0000
From: Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,  james.greenhalgh@arm.com, marcus.shawcroft@arm.com,  richard.sandiford@linaro.org
Subject: Re: [AArch64] Fix sve/extract_[12].c for big-endian SVE
References: <87d11wpx0r.fsf@linaro.org>
In-Reply-To: <87d11wpx0r.fsf@linaro.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2018-01/txt/msg02179.txt.bz2

Hi Richard,

On 26/01/18 13:31, Richard Sandiford wrote:
> sve/extract_[12].c were relying on the target-independent optimisation
> that removes a redundant vec_select, so that we don't end up with
> things like:
>
>     dup v0.4s, v0.4s[0]
>     ...use s0...
>
> But that optimisation rightly doesn't trigger for big-endian targets,
> because GCC expects lane 0 to be in the high part of the register
> rather than the low part.
>
> SVE breaks this assumption -- see the comment at the head of
> aarch64-sve.md for details -- so the optimisation is valid for
> both endiannesses.  Long term, we probably need some kind of target
> hook to make GCC aware of this.
>
> But there's another problem with the current extract pattern: it doesn't
> tell the register allocator how cheap an extraction of lane 0 is with
> tied registers.  It seems better to split the lane 0 case out into
> its own pattern and use tied operands for the FPR<-SIMD case,
> so that using different registers has the cost of an extra reload.
> I think we want this for both endiannesses, regardless of the hook
> described above.
>
> Also, the gen_lowpart in this pattern fails for aarch64_be due to
> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
> instead.  We're only creating this rtl in order to print it, so there's
> no need for anything fancier.
>
> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2018-01-26  Richard Sandiford <richard.sandiford@linaro.org>
>
> gcc/
>         * config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
>         pattern.
>         (*vec_extract<mode><Vel>_v128): Require a nonzero lane number.
>         Use gen_rtx_REG rather than gen_lowpart.
>
> Index: gcc/config/aarch64/aarch64-sve.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-sve.md   2018-01-13 18:01:51.232735405 +0000
> +++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +0000
> @@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"
>    }
>  )
>
> +;; Extract element zero.  This is a special case because we want to force
> +;; the registers to be the same for the second alternative, and then
> +;; split the instruction into nothing after RA.
> +(define_insn_and_split "*vec_extract<mode><Vel>_0"
> +  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
> +       (vec_select:<VEL>
> +         (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
> +         (parallel [(const_int 0)])))]
> +  "TARGET_SVE"
> +  {
> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
> +    switch (which_alternative)
> +      {
> +       case 0:
> +         return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
> +       case 1:
> +         return "#";
> +       case 2:
> +         return "st1\\t{%1.<Vetype>}[0], %0";
> +       default:
> +         gcc_unreachable ();
> +      }
> +  }
> +  "&& reload_completed
> +   && REG_P (operands[1])
> +   && REGNO (operands[0]) == REGNO (operands[1])"

Is it guaranteed that operands[0] will be a REG rtx here?
The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow
a MEM, which would cause an error in an RTL-checking build.
If I'm not mistaken GCC may attempt the split even when matching alternative 2.

Kyrill

> +  [(const_int 0)]
> +  {
> +    emit_note (NOTE_INSN_DELETED);
> +    DONE;
> +  }
> +  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
> +)
> +
>  ;; Extract an element from the Advanced SIMD portion of the register.
>  ;; We don't just reuse the aarch64-simd.md pattern because we don't
> -;; want any chnage in lane number on big-endian targets.
> +;; want any change in lane number on big-endian targets.
>  (define_insn "*vec_extract<mode><Vel>_v128"
>    [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
>          (vec_select:<VEL>
>            (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
>            (parallel [(match_operand:SI 2 "const_int_operand")])))]
>    "TARGET_SVE
> -   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
> +   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 1, 15)"
>    {
> -    operands[1] = gen_lowpart (<V128>mode, operands[1]);
> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
>      switch (which_alternative)
>        {
>          case 0: