From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 5B32C3857C4A for ; Thu, 7 Mar 2024 16:54:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5B32C3857C4A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5B32C3857C4A Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709830470; cv=none; b=X7QuvAYwDl6zlfDVglPZX2Qy2OVjNMloWaGGJ8TLRctQn6m4YSZykK334tMYK9hA6X333kNUpxU7nqHZT+bxqmIRXJZO0NJKtwMORleRBEKgj+vms7Daaeosm+3oJSdT3mXba3k+UiZpY6WaNhqFCSsBxO+zGnyWHscWoW804Bo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709830470; c=relaxed/simple; bh=zIaFgQgvwtuLy6eX54eh9e8z6VoUxx9Uq8XuKC7LpX8=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=i8zb4dtjeEpp4t9J1TxpYGhyxmM65SCCQmtHKI+sZR9LqutbzkOhjyFRUfmWsxeiOt+8bWpuGLdrWRkLYminncHJAmR11H+lXY58J7ednhDzMMffxwwdaXKfFtQyq3XgV4iodyDuDZadOOT5AO27i1RMSMu+yQCVhsm8z4l6tMc= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F205C1FB; Thu, 7 Mar 2024 08:55:04 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 912A63F762; Thu, 7 Mar 2024 08:54:27 -0800 (PST) From: Richard Sandiford To: Andrew Pinski Mail-Followup-To: Andrew Pinski ,, richard.sandiford@arm.com Cc: Subject: Re: [PATCH] aarch64: Fix costing of manual bfi instructions References: <20240223162513.3632569-1-quic_apinski@quicinc.com> Date: Thu, 07 Mar 2024 16:54:26 +0000 In-Reply-To: <20240223162513.3632569-1-quic_apinski@quicinc.com> (Andrew Pinski's message of "Fri, 23 Feb 2024 08:25:13 -0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-20.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Andrew Pinski writes: > This fixes the cost model for BFI instructions which don't > use directly zero_extract on the LHS. > aarch64_bfi_rtx_p does the heavy lifting by matching of > the patterns. > > Note this alone does not fix PR 107270, it is a step in the right > direction. There we get z zero_extend for the non-shifted part > which we don't currently match. > > Built and tested on aarch64-linux-gnu with no regressions. > > gcc/ChangeLog: > > * config/aarch64/aarch64.cc (aarch64_bfi_rtx_p): New function. > (aarch64_rtx_costs): For IOR, try calling aarch64_bfi_rtx_p. > > Signed-off-by: Andrew Pinski > --- > gcc/config/aarch64/aarch64.cc | 94 +++++++++++++++++++++++++++++++++++ > 1 file changed, 94 insertions(+) > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 3d8341c17fe..dc5c5c23cb3 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -13776,6 +13776,90 @@ aarch64_extr_rtx_p (rtx x, rtx *res_op0, rtx *res_op1) > return false; > } > > +/* Return true iff X is an rtx that will match an bfi instruction > + i.e. as described in the *aarch64_bfi5 family of patterns. > + OP0 and OP1 will be set to the operands of the insert involved > + on success and will be NULL_RTX otherwise. */ > + > +static bool > +aarch64_bfi_rtx_p (rtx x, rtx *res_op0, rtx *res_op1) I think it'd be slightly neater to pass an XEXP index in as well, and use it... > +{ > + rtx op0, op1; > + scalar_int_mode mode; > + > + *res_op0 = NULL_RTX; > + *res_op1 = NULL_RTX; > + if (!is_a (GET_MODE (x), &mode)) > + return false; > + > + if (GET_CODE (x) != IOR) > + return false; > + > + unsigned HOST_WIDE_INT mask1; > + unsigned HOST_WIDE_INT shft_amnt; > + unsigned HOST_WIDE_INT mask2; > + rtx shiftop; > + > + rtx iop0 = XEXP (x, 0); > + rtx iop1 = XEXP (x, 1); ...here as opno and 1 - opno. That way we don't need to... > + > + if (GET_CODE (iop0) == AND > + && CONST_INT_P (XEXP (iop0, 1)) > + && GET_CODE (XEXP (iop0, 0)) != ASHIFT) > + { > + op0 = XEXP (iop0, 0); > + mask1 = UINTVAL (XEXP (iop0, 1)); > + shiftop = iop1; > + } > + else if (GET_CODE (iop1) == AND > + && CONST_INT_P (XEXP (iop1, 1)) > + && GET_CODE (XEXP (iop1, 0)) != ASHIFT) > + { > + op0 = XEXP (iop1, 0); > + mask1 = UINTVAL (XEXP (iop1, 1)); > + shiftop = iop0; > + } > + else > + return false; ...handle this both ways, and don't need to exclude ASHIFT. Maybe some variation on "insert_op" would be better than "shiftop", since the operand might not include a shift. Looks generally good to me otherwise FWIW, but obviously GCC 15 material. Thanks, Richard > + > + /* Shifted with no mask. */ > + if (GET_CODE (shiftop) == ASHIFT > + && CONST_INT_P (XEXP (shiftop, 1))) > + { > + shft_amnt = UINTVAL (XEXP (shiftop, 1)); > + mask2 = HOST_WIDE_INT_M1U << shft_amnt; > + op1 = XEXP (shiftop, 0); > + } > + else if (GET_CODE (shiftop) == AND > + && CONST_INT_P (XEXP (shiftop, 1))) > + { > + mask2 = UINTVAL (XEXP (shiftop, 1)); > + if (GET_CODE (XEXP (shiftop, 0)) == ASHIFT > + && CONST_INT_P (XEXP (XEXP (shiftop, 0), 1))) > + { > + op1 = XEXP (XEXP (shiftop, 0), 0); > + shft_amnt = UINTVAL (XEXP (XEXP (shiftop, 0), 1)); > + } > + else > + { > + op1 = XEXP (shiftop, 0); > + shft_amnt = 0; > + } > + } > + else > + return false; > + > + if (shft_amnt >= GET_MODE_BITSIZE (mode)) > + return false; > + > + if (!aarch64_masks_and_shift_for_bfi_p (mode, mask1, shft_amnt, mask2)) > + return false; > + > + *res_op0 = op0; > + *res_op1 = op1; > + return true; > +} > + > /* Calculate the cost of calculating (if_then_else (OP0) (OP1) (OP2)), > storing it in *COST. Result is true if the total cost of the operation > has now been calculated. */ > @@ -14662,6 +14746,16 @@ cost_plus: > return true; > } > > + if (aarch64_bfi_rtx_p (x, &op0, &op1)) > + { > + *cost += rtx_cost (op0, mode, IOR, 0, speed); > + *cost += rtx_cost (op0, mode, IOR, 1, speed); > + if (speed) > + *cost += extra_cost->alu.bfi; > + > + return true; > + } > + > if (aarch64_extr_rtx_p (x, &op0, &op1)) > { > *cost += rtx_cost (op0, mode, IOR, 0, speed);