From: Richard Sandiford
To: Tamar Christina
Cc: "gcc-patches@gcc.gnu.org", nd, Richard Earnshaw, Marcus Shawcroft, "ktkachov@gcc.gnu.org"
Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns
Date: Thu, 30 May 2024 21:12:24 +0100
In-Reply-To: (Tamar Christina's message of "Tue, 28 May 2024 09:38:42 +0000")

Tamar Christina writes:
> [...]
> @@ -6651,8 +6661,10 @@ (define_insn "and3"
>      (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
>                    (match_operand:PRED_ALL 2 "register_operand")))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2   ]
> -     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
> +  {@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa; yes                 ] and\t%0.b, %1/z, %2.b, %2.b
> +     [ ?Upa    , 0  , Upa; yes                 ] ^
> +     [ Upa     , Upa, Upa; no                  ] ^

I think this ought to be:

> +  {@ [ cons: =0, 1   , 2   ; attrs: pred_clobber ]
> +     [ &Upa    , Upa , Upa ; yes                 ] and\t%0.b, %1/z, %2.b, %2.b
> +     [ ?Upa    , 0Upa, 0Upa; yes                 ] ^
> +     [ Upa     , Upa , Upa ; no                  ] ^

so that operand 2 can be tied to operand 0 in the worst case.  Similarly:

>    }
>  )
>
> @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred__z"
>          (match_operand:PRED_ALL 3 "register_operand"))
>        (match_operand:PRED_ALL 1 "register_operand")))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3   ]
> -     [ Upa     , Upa, Upa, Upa ] \t%0.b, %1/z, %2.b, %3.b
> +  {@ [ cons: =0, 1  , 2  , 3  ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, Upa; yes                 ] \t%0.b, %1/z, %2.b, %3.b
> +     [ ?Upa    , 0  , Upa, Upa; yes                 ] ^
> +     [ Upa     , Upa, Upa, Upa; no                  ] ^
>    }
>  )

this would be:

  {@ [ cons: =0, 1   , 2   , 3   ; attrs: pred_clobber ]
     [ &Upa    , Upa , Upa , Upa ; yes                 ] \t%0.b, %1/z, %2.b, %3.b
     [ ?Upa    , 0Upa, 0Upa, 0Upa; yes                 ] ^
     [ Upa     , Upa , Upa , Upa ; no                  ] ^
  }

Same idea for the rest.
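As a rough illustration of why the tied alternative wants "0Upa" on every input, here is a toy Python model that checks one register assignment against one alternative.  It is only a sketch under stated assumptions: it is not how GCC's register allocator works, the helper names (input_ok, alternative_ok) are made up for this example, it understands only "&", "0" and "Upa", and it ignores the "?" cost hint and the pred_clobber attribute.

```python
# Toy model of checking a register assignment against one constraint
# alternative. NOT GCC's register allocator: the helper names are
# hypothetical and only "&", "0" and "Upa" are modelled.


def input_ok(constraint, reg, out_reg, earlyclobber):
    """Check one input operand against its constraint string."""
    # "0": the input may live in the same register as operand 0.
    if "0" in constraint and reg == out_reg:
        return True
    # "Upa": any predicate register, except that an earlyclobber output
    # ("&") must not overlap an input that is not tied to it.
    if "Upa" in constraint:
        return reg != out_reg or not earlyclobber
    return False


def alternative_ok(alternative, out_reg, in_regs):
    """Return True if every input operand satisfies the alternative."""
    out_constraint, in_constraints = alternative
    earlyclobber = "&" in out_constraint
    return all(
        input_ok(con, reg, out_reg, earlyclobber)
        for con, reg in zip(in_constraints, in_regs)
    )


# The pred_clobber == yes alternatives of the three-input pattern.
EARLYCLOBBER = ("&Upa", ["Upa", "Upa", "Upa"])
POSTED_TIED = ("?Upa", ["0", "Upa", "Upa"])          # as posted in the patch
SUGGESTED_TIED = ("?Upa", ["0Upa", "0Upa", "0Upa"])  # as suggested above

# Result goes to a fresh register (p3), distinct from all inputs.
fresh = ("p3", ["p0", "p1", "p2"])
# Result must end up in p1, which also holds operand 2.
overlap = ("p1", ["p0", "p1", "p2"])

print(alternative_ok(EARLYCLOBBER, *fresh))
print(alternative_ok(EARLYCLOBBER, *overlap))
print(alternative_ok(POSTED_TIED, *overlap))
print(alternative_ok(SUGGESTED_TIED, *overlap))
```

In this model the earlyclobber alternative accepts the fresh-register case and rejects the overlapping one.  The posted tied alternative also rejects the overlapping case, because it insists that operand 1 is the tied operand, while the "0Upa" form accepts it since any input may share the output register.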
I tried this on:

----------------------------------------------------------------------
#include <arm_sve.h>

void use (svbool_t, svbool_t, svbool_t);

void
f1 (svbool_t p0, svbool_t p1, svbool_t p2, int n, svbool_t *ptr)
{
  while (n--)
    p2 = svand_z (p0, p1, p2);
  *ptr = p2;
}

void
f2 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  *ptr = svand_z (p0, p1, p2);
}

void
f3 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (svand_z (p0, p1, p2), p1, p2);
}

void
f4 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, svand_z (p0, p1, p2), p2);
}

void
f5 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, p1, svand_z (p0, p1, p2));
}
----------------------------------------------------------------------

and it seemed to produce the right output:

----------------------------------------------------------------------
f1:
        cbz     w0, .L2
        sub     w0, w0, #1
        .p2align 5,,15
.L3:
        and     p2.b, p0/z, p1.b, p2.b
        sub     w0, w0, #1
        cmn     w0, #1
        bne     .L3
.L2:
        str     p2, [x1]
        ret
f2:
        and     p3.b, p0/z, p1.b, p2.b
        str     p3, [x0]
        ret
f3:
        and     p0.b, p0/z, p1.b, p2.b
        b       use
f4:
        and     p1.b, p0/z, p1.b, p2.b
        b       use
f5:
        and     p2.b, p0/z, p1.b, p2.b
        b       use
----------------------------------------------------------------------

(with that coming directly from RA, rather than being cleaned up later)

> [...]
> @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
>         (match_dup 3)]
>        UNSPEC_BRKN))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3 ]
> -     [ Upa     , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b
> +  {@ [ cons: =0, 1  , 2  , 3; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, 0; yes                 ] brkns\t%0.b, %1/z, %2.b, %0.b
> +     [ ?Upa    , 0  , Upa, 0; yes                 ] ^
> +     [ Upa     , Upa, Upa, 0; no                  ] ^
>    }
>    "&& (operands[4] != CONST0_RTX (VNx16BImode)
>      || operands[5] != CONST0_RTX (VNx16BImode))"

Probably best to leave this out.  All alternatives require operand 3
to match operand 0.  So operands 1 and 2 will only match operand 0
if they're the same as operand 3.
In that case it'd be better to allow the sharing rather than force
the same value to be stored in two registers.

That is, if op1 != op3 && op2 != op3 then we get what we want
naturally, regardless of tuning.

The same thing would apply to the BRKN instances of :

> @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk"
>         (match_operand:VNx16BI 3 "register_operand")]
>        SVE_BRK_BINARY))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3 ]
> -     [ Upa     , Upa, Upa,   ] brk\t%0.b, %1/z, %2.b, %.b
> +  {@ [ cons: =0, 1  , 2  , 3 ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa,   ; yes                 ] brk\t%0.b, %1/z, %2.b, %.b
> +     [ ?Upa    , 0  , Upa,   ; yes                 ] ^
> +     [ Upa     , Upa, Upa,   ; no                  ] ^
>    }
>  )

but I think we should keep this factoring/abstraction and just add
the extra alternatives regardless.  I.e.:

  {@ [ cons: =0, 1   , 2   , 3 ; attrs: pred_clobber ]
     [ &Upa    , Upa , Upa ,   ; yes                 ] brk\t%0.b, %1/z, %2.b, %.b
     [ ?Upa    , 0Upa, 0Upa, 0 ; yes                 ] ^
     [ Upa     , Upa , Upa ,   ; no                  ] ^

(even though this gives "00", which is valid but redundant).

OK with those changes, thanks.

Richard