From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tnfchris@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1984)
	id CB38A3858C00; Wed, 28 Jun 2023 13:33:33 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CB38A3858C00
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1687959213;
	bh=WVMWrxMVydMvC66wmHPVDvKay9WWzr55HX9l/AmbkxQ=;
	h=From:To:Subject:Date:From;
	b=MVBTV7Fmf2Liqpik4F1k2frYzh65Lhi2/AcyhitwXWivnjOIMDfNf1eqnWreFkt7i
	 LWmHuFajQRZKgprWQNAKajDieR3ryp1ZTTkvWWca/23XrWnfrjknnmU088aKQ7Gln/
	 BlCKk7eCgbEeA7gS6owf09nNgH5r0BwvG9y+5tYA=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Tamar Christina <tnfchris@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/tnfchris/heads/gcc-14-early-break)] Add MVE cbranch
 implementation
X-Act-Checkin: gcc
X-Git-Author: Tamar Christina <tamar.christina@arm.com>
X-Git-Refname: refs/users/tnfchris/heads/gcc-14-early-break
X-Git-Oldrev: c5214656ac4043f8611ec80985682ecf0c8697d4
X-Git-Newrev: 9b341dcdc98efd309dff8da28544fac225dafb2f
Message-Id: <20230628133333.CB38A3858C00@sourceware.org>
Date: Wed, 28 Jun 2023 13:33:33 +0000 (GMT)
List-Id: <gcc-cvs.sourceware.org>

https://gcc.gnu.org/g:9b341dcdc98efd309dff8da28544fac225dafb2f

commit 9b341dcdc98efd309dff8da28544fac225dafb2f
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Wed Jun 28 14:28:32 2023 +0100

    Add MVE cbranch implementation
    
    This adds an implementation of the conditional branch (cbranch) optab for
    MVE.
    
    Unfortunately, MVE supports only a limited set of operations on VPT.P0: it
    has neither comparisons between predicates nor a logical OR on P0.
    
    For that reason we can only support cbranch against 0: comparing a predicate
    with 0 needs no actual comparison, we only have to check whether any bit is
    set in P0.
    
    Because P0 can only be compared with 0, the cost of that comparison is
    reduced so that the compiler does not treat the constant as too expensive
    and force the 0 into a register.  For the cbranch implementation to be safe
    it must see the constant 0 vector.
    
    The lack of a logical OR on P0 cannot really be worked around.  This means
    MVE can't support cases where the sizes of the operands in the comparison
    don't match, i.e. when one operand has been unpacked.
    
    For example:
    
    void f1 ()
    {
      for (int i = 0; i < N; i++)
        {
          b[i] += a[i];
          if (a[i] > 0)
            break;
        }
    }
    
    For 128-bit vectors we generate:
    
            vcmp.s32        gt, q3, q1
            vmrs    r3, p0  @ movhi
            cbnz    r3, .L2
    
    MVE does not have 64-bit vector comparisons, so 64-bit vectors are also not
    supported.
    
    Bootstrapped on arm-none-linux-gnueabihf and regtested with
    -march=armv8.1-m.main+mve -mfpu=auto with no issues.
    
    gcc/ChangeLog:
    
            * config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
            compares.
            * config/arm/mve.md (cbranch<mode>4): New.
    
    gcc/testsuite/ChangeLog:
    
            * lib/target-supports.exp (vect_early_break): Add MVE.
            * gcc.target/arm/mve/vect-early-break-cbranch.c: New test.
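
The "check that any bit is set within P0" reasoning in the commit message can
be sketched in plain C.  This is only an illustrative model, not code from the
patch: the function names are hypothetical, and it assumes P0 is a 16-bit
predicate with one bit per byte lane, so each 32-bit lane owns four bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "vcmp.s32 gt" on a 128-bit vector.  Assumption: the
   predicate P0 has one bit per byte lane, so 32-bit lane N sets bits
   4*N .. 4*N+3 when its comparison is true.  */
uint16_t
model_vcmp_gt_s32 (const int32_t a[4], const int32_t b[4])
{
  uint16_t p0 = 0;
  for (int lane = 0; lane < 4; lane++)
    if (a[lane] > b[lane])
      p0 |= (uint16_t) (0xFu << (lane * 4));
  return p0;
}

/* Hypothetical model of the expanded cbranch: copy P0 into a scalar register
   (the "vmrs" step) and branch when it is non-zero (the "cbnz" step).  No
   predicate-to-predicate comparison is involved.  */
int
model_branch_taken (uint16_t p0)
{
  uint32_t r3 = p0;
  return r3 != 0;
}

/* Small self-check: the branch is taken only if some lane compared true.  */
void
model_self_check (void)
{
  const int32_t zero[4] = { 0, 0, 0, 0 };
  const int32_t none[4] = { -1, 0, -5, 0 };
  const int32_t one[4] = { -1, 0, 7, 0 };

  assert (model_vcmp_gt_s32 (none, zero) == 0x0000);
  assert (!model_branch_taken (model_vcmp_gt_s32 (none, zero)));
  assert (model_vcmp_gt_s32 (one, zero) == 0x0F00);
  assert (model_branch_taken (model_vcmp_gt_s32 (one, zero)));
}
```

Calling model_self_check () from a test harness exercises both the "no lane
active" and the "one lane active" case.  The model treats the predicate as an
ordinary integer, which is the same view the expander takes when it goes
through SImode.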

Diff:
---
 gcc/config/arm/arm.cc                 |  9 +++++++++
 gcc/config/arm/mve.md                 | 15 +++++++++++++++
 gcc/testsuite/lib/target-supports.exp |  2 ++
 3 files changed, 26 insertions(+)
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 38f0839de1c..15e65c15cb3 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	   || TARGET_HAVE_MVE)
 	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
 	*cost = COSTS_N_INSNS (1);
+      else if (TARGET_HAVE_MVE
+	       && outer_code == COMPARE
+	       && VALID_MVE_PRED_MODE (mode))
+	/* MVE allows very limited instructions on VPT.P0, however comparisons
+	   to 0 do not require us to materialize this constant or require a
+	   predicate comparison as we can go through SImode.  For that reason
+	   allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
+	   registers as we can't compare two predicates.  */
+	*cost = COSTS_N_INSNS (1);
       else
 	*cost = COSTS_N_INSNS (4);
       return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 74909ce47e1..95d40770ecc 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@
   DONE;
 })
 
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:MVE_7 1 "register_operand")
+	        (match_operand:MVE_7 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret<mode>"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8f58671e6cf..1eef764542a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3785,6 +3785,8 @@ proc check_effective_target_vect_early_break { } {
       expr {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_neon_ok]
+	|| ([check_effective_target_arm_v8_1m_mve_fp_ok]
+	     && [check_effective_target_arm_little_endian])
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of