From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd, Richard Earnshaw, Marcus Shawcroft,
	Kyrylo Tkachov
Subject: Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
Date: Thu, 14 Dec 2023 19:34:14 +0000
In-Reply-To: (Tamar Christina's message of "Thu, 14 Dec 2023 18:40:49 +0000")

Tamar Christina writes:
>> I see you've changed it from:
>>
>> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
>> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
>> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
>>
>> to:
>>
>> +  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
>> +				  operands[3]));
>>
>> Was that to fix a specific problem?  The original looked OK to me
>> for that part (it was the vector comparison that I was asking about).
>>
>
> No, it was to be more consistent with the Arm and MVE patch.
>
> Note that I may update the tests to disable scheduling.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> OK for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index c6f2d5828373f2a5272b9d1227bfe34365f9fd09..309ec9535294d6e9cdc530f71d9fe38bb916c966 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3911,6 +3911,45 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
>    DONE;
>  })
>  
> +;; Patterns comparing two vectors and conditionally jump
> +
> +(define_expand "cbranch<mode>4"
> +  [(set (pc)
> +	(if_then_else
> +	  (match_operator 0 "aarch64_equality_operator"
> +	    [(match_operand:VDQ_I 1 "register_operand")
> +	     (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> +	  (label_ref (match_operand 3 ""))
> +	  (pc)))]
> +  "TARGET_SIMD"
> +{
> +  auto code = GET_CODE (operands[0]);
> +  rtx tmp = operands[1];
> +
> +  /* If comparing against a non-zero vector we have to do a comparison first

...an EOR first (or XOR)

OK with that change, thanks.

Richard

> +     so we can have a != 0 comparison with the result.  */
> +  if (operands[2] != CONST0_RTX (<MODE>mode))
> +    {
> +      tmp = gen_reg_rtx (<MODE>mode);
> +      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
> +    }
> +
> +  /* For 64-bit vectors we need no reductions.  */
> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
> +    {
> +      /* Always reduce using a V4SI.  */
> +      rtx reduc = gen_lowpart (V4SImode, tmp);
> +      rtx res = gen_reg_rtx (V4SImode);
> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
> +    }
> +
> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> +  DONE;
> +})
> +
>  ;; Patterns comparing two vectors to produce a mask.
>  
>  (define_expand "vec_cmp<mode><mode>"
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> @@ -0,0 +1,124 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +
> +/*
> +** f1:
> +**	...
> +**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] > 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] >= 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] == 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] != 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f5:
> +**	...
> +**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f5 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] < 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f6:
> +**	...
> +**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f6 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] <= 0)
> +	break;
> +    }
> +}
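
For readers skimming the patch, here is a minimal hand-written sketch of the
idiom the cbranch<mode>4 expander produces for 128-bit vectors, written with
arm_neon.h intrinsics.  It is not compiler output and is not part of the
patch; the function name and the choice of uint32x4_t are just for
illustration.

#include <arm_neon.h>
#include <stdint.h>

/* Returns nonzero if any lane of A differs from the corresponding lane of B,
   mirroring the EOR + UMAXP + FMOV + CBNZ sequence checked by the tests.  */
static inline int
any_lane_differs (uint32x4_t a, uint32x4_t b)
{
  /* EOR: lanes that differ become non-zero.  */
  uint32x4_t diff = veorq_u32 (a, b);
  /* UMAXP Vd.4S, Vn.4S, Vm.4S with Vn == Vm folds all four lanes into the
     low 64 bits of the result.  */
  uint32x4_t folded = vpmaxq_u32 (diff, diff);
  /* FMOV Xn, Dm: read the low 64 bits into a general register; the final
     compare against zero then compiles to the CBNZ seen in the tests.  */
  uint64_t bits = vgetq_lane_u64 (vreinterpretq_u64_u32 (folded), 0);
  return bits != 0;
}

For 64-bit vectors the whole vector already fits in a D register, which is
why the expander skips the UMAXP reduction in that case and moves the vector
to a general register directly.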