From: Richard Sandiford
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com
Subject: Re: [PATCH]AArch64 relax constraints on FP16 insn PR108172
Date: Wed, 21 Dec 2022 10:31:00 +0000
In-Reply-To: (Tamar Christina's message of "Tue, 20 Dec 2022 21:32:19 +0000")

Tamar Christina writes:
> Hi All,
>
> The move, load, store and dup instructions (basically all the
> non-arithmetic FP16 instructions) use baseline Armv8-A HF support,
> and so do not require the Armv8.2-A extensions.  This patch relaxes
> the constraints on those instructions.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR target/108172
> 	* config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): Relax
> 	TARGET_SIMD_F16INST to TARGET_SIMD.
> 	* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Re-order.
> 	* config/aarch64/iterators.md (VMOVE): Drop TARGET_SIMD_F16INST.
>
> gcc/testsuite/ChangeLog:
>
> 	PR target/108172
> 	* gcc.target/aarch64/pr108172.c: New test.

OK, thanks.

I think we need better tests for this area though.  The VMOVE uses
include aarch64_dup_lane, which uses <Vtype>, which has no definition
for V2HF, so we get:

    return "dup\t%0., %1.h[%2]";

The original patch defined Vtype to "2h", but using that here would
have generated an invalid instruction.

We also have unexpanded <Vtype>s in the pairwise operations:

    "faddp\t%h0, %1.<Vtype>",
    "fmaxp\t%h0, %1.<Vtype>",
    "fminp\t%h0, %1.<Vtype>",
    "fmaxnmp\t%h0, %1.<Vtype>",
    "fminnmp\t%h0, %1.<Vtype>",

Would it be easy (using a combination of this patch and a follow-on
patch) to wind the V2HF support back to a state that makes sense on its
own, without the postponed pairwise support?  Or would it be simpler to
revert 2cba118e538ba0b7582af7f9fb5ba2dfbb772f8e for GCC 13 and revisit
this in GCC 14, alongside the original motivating use case?
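For the record, here is an untested sketch of the kind of extra test I
mean (the typedef matches pr108172.c; the function name, the -O2
assumption and the use of __builtin_shufflevector are just for
illustration):

    typedef _Float16 v2hf __attribute__ ((vector_size (4)));

    /* Broadcast lane 1 to both lanes.  At -O2 this should end up as a
       vec_duplicate of a vec_select and so go through the
       aarch64_dup_lane pattern, where the missing <Vtype> would show
       up in the output template.  */
    v2hf
    v2hf_dup_lane (v2hf a)
    {
      return __builtin_shufflevector (a, a, 1, 1);
    }

It would probably need to be a dg-do assemble test (rather than a
compile-only test at -O0, like pr108172.c) for the assembler to reject
the incomplete "dup" arrangement.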
Richard

> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index a62b6deaf4a57e570074d7d894e6fac13779f6fb..8a9f655d547285ec7bdc173086308d7d44a8d482 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -164,7 +164,7 @@ (define_insn "*aarch64_simd_movv2hf"
>  		"=w, m,  m,  w, ?r, ?w, ?r, w, w")
>  	(match_operand:V2HF 1 "general_operand"
>  		"m,  Dz, w,  w,  w,  r,  r,  Dz, Dn"))]
> -  "TARGET_SIMD_F16INST
> +  "TARGET_SIMD
>     && (register_operand (operands[0], V2HFmode)
>         || aarch64_simd_reg_or_zero (operands[1], V2HFmode))"
>    "@
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 73515c174fa4fe7830527e7eabd91c4648130ff4..d1c0476321b79bc6aded350d24ea5d556c796519 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3616,6 +3616,8 @@ aarch64_classify_vector_mode (machine_mode mode)
>      case E_V4x2DFmode:
>        return TARGET_FLOAT ? VEC_ADVSIMD | VEC_STRUCT : 0;
>
> +    /* 32-bit Advanced SIMD vectors.  */
> +    case E_V2HFmode:
>      /* 64-bit Advanced SIMD vectors.  */
>      case E_V8QImode:
>      case E_V4HImode:
> @@ -3634,7 +3636,6 @@ aarch64_classify_vector_mode (machine_mode mode)
>      case E_V8BFmode:
>      case E_V4SFmode:
>      case E_V2DFmode:
> -    case E_V2HFmode:
>        return TARGET_FLOAT ? VEC_ADVSIMD : 0;
>
>      default:
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index a521dbde1ec42c0c442a9ca3dd52c9727d116399..70742520984d30158e62a38c92abea82b2dac059 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -204,8 +204,7 @@ (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
>  ;; All Advanced SIMD modes suitable for moving, loading, and storing
>  ;; including V2HF
>  (define_mode_iterator VMOVE [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
> -			     V4HF V8HF V4BF V8BF V2SF V4SF V2DF
> -			     (V2HF "TARGET_SIMD_F16INST")])
> +			     V2HF V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
>
>
>  ;; The VALL_F16 modes except the 128-bit 2-element ones.
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr108172.c b/gcc/testsuite/gcc.target/aarch64/pr108172.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b29054fdb1d6e602755bc93089f1edec4eb53b8e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr108172.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O0" } */
> +
> +typedef _Float16 v4hf __attribute__ ((vector_size (8)));
> +typedef _Float16 v2hf __attribute__ ((vector_size (4)));
> +
> +v4hf
> +v4hf_abi_1 (v4hf a)
> +{
> +  return a;
> +}
> +
> +v4hf
> +v4hf_abi_3 (v4hf a, v4hf b, v4hf c)
> +{
> +  return c;
> +}
> +
> +v4hf
> +v4hf_abi_4 (v4hf a, v4hf b, v4hf c, v4hf d)
> +{
> +  return d;
> +}
> +
> +v2hf
> +v2hf_test (v2hf a, v2hf b, v2hf c, v2hf d)
> +{
> +  return b;
> +}
> +