From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id D06BA3858D1E for ; Wed, 31 Jan 2024 13:45:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D06BA3858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D06BA3858D1E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706708710; cv=none; b=GWqz/EQf0y8dto0b4yi6mj/2G1iJbuzXClRGLrnB9KMMsbHdceGtXND3cWkEU8n/4DStrmJDE+sv5S4bi2c/xWieRBgVpwSbwODKPPuTBuhg52Fo7epwreBNK14yvu0dMOWgJcLhqfmq1Qw49okJTZcuoIni+kEV8b+BIVmgx2c= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706708710; c=relaxed/simple; bh=juuWJDXQXNwZSoqiPY5qVSPeiJitKXHpWE3Le2H5SE8=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=tqwqKgSrNy3EFi7461c1KDFEY+vOv35d3GjuAcxg0UtayZ3oBu04I3BSN26DfhQELMNsARfCkIiSQClm92G5JFdATZlPRWPeHdQkT9FcWEMaL7Im0Pog49Ekx51GHyCrPFRoYq7I0kUpFJcMORLfjk8UhpXMQ9Tg8Rt4V3XA0mE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9E9F1DA7; Wed, 31 Jan 2024 05:45:51 -0800 (PST) Received: from [10.2.78.54] (e120077-lin.cambridge.arm.com [10.2.78.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A12103F738; Wed, 31 Jan 2024 05:45:07 -0800 (PST) Message-ID: <2dc807f6-09cd-4139-939e-ac7c9207f86f@arm.com> Date: Wed, 31 Jan 2024 13:45:06 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops Content-Language: en-GB To: Andre Simoes Dias Vieira , "gcc-patches@gcc.gnu.org" Cc: Kyrylo Tkachov , Stam Markianos-Wright References: <20240119144013.29042-1-andre.simoesdiasvieira@arm.com> <20240119144013.29042-3-andre.simoesdiasvieira@arm.com> From: "Richard Earnshaw (lists)" In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3491.8 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 30/01/2024 14:09, Andre Simoes Dias Vieira wrote: > Hi Richard, > > Thanks for the reviews, I'm making these changes but just a heads up. > > When hardcoding LR_REGNUM like this we need to change the way we compare the register in doloop_condition_get. This function currently compares the rtx nodes by address, which I think happens to be fine before we assign hard registers, as I suspect we always share the rtx node for the same pseudo, but when assigning registers it seems like we create copies, so things like: > `XEXP (inc_src, 0) == reg` will fail for > inc_src: (plus (reg LR) (const_int -n)' > reg: (reg LR) > > Instead I will substitute the operand '==' with calls to 'rtx_equal_p (op1, op2, NULL)'. Yes, that's fine. R. > > Sound good? > > Kind regards, > Andre > > ________________________________________ > From: Richard Earnshaw (lists) > Sent: Tuesday, January 30, 2024 11:36 AM > To: Andre Simoes Dias Vieira; gcc-patches@gcc.gnu.org > Cc: Kyrylo Tkachov; Stam Markianos-Wright > Subject: Re: [PATCH v3 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops > > On 19/01/2024 14:40, Andre Vieira wrote: >> >> Respin after comments from Kyrill and rebase. I also removed an if-then-else >> construct in arm_mve_check_reg_origin_is_num_elems similar to the other functions >> Kyrill pointed out. >> >> After an earlier comment from Richard Sandiford I also added comments to the >> two tail predication patterns added to explain the need for the unspecs. > > [missing ChangeLog] > > I'm just going to focus on loop-doloop.c in this reply, I'll respond to the other bits in a follow-up. > > 2) (set (reg) (plus (reg) (const_int -1)) > - (set (pc) (if_then_else (reg != 0) > - (label_ref (label)) > - (pc))). > + (set (pc) (if_then_else (reg != 0) > + (label_ref (label)) > + (pc))). > > Some targets (ARM) do the comparison before the branch, as in the > following form: > > - 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0))) > - (set (reg) (plus (reg) (const_int -1)))]) > - (set (pc) (if_then_else (cc == NE) > ... > > > This comment is becoming confusing. Really the text leading up to 3)... should be inside 3. Something like: > > 3) Some targets (ARM) do the comparison before the branch, as in the > following form: > > (parallel [(set (cc) (compare (plus (reg) (const_int -1)) 0)) > (set (reg) (plus (reg) (const_int -1)))]) > (set (pc) (if_then_else (cc == NE) > (label_ref (label)) > (pc)))]) > > > The same issue on the comment structure also applies to the new point 4... > > + The ARM target also supports a special case of a counter that decrements > + by `n` and terminating in a GTU condition. In that case, the compare and > + branch are all part of one insn, containing an UNSPEC: > + > + 4) (parallel [ > + (set (pc) > + (if_then_else (gtu (unspec:SI [(plus:SI (reg:SI 14 lr) > + (const_int -n))]) > + (const_int n-1])) > + (label_ref) > + (pc))) > + (set (reg:SI 14 lr) > + (plus:SI (reg:SI 14 lr) > + (const_int -n))) > + */ > > I think this needs a bit more clarification. Specifically that this construct supports a predicated vectorized do loop. Also, the placement of the unspec inside the comparison is ugnly and unnecessary. It should be sufficient to have the unspec inside a USE expression, which the mid-end can then ignore entirely. So > > (parallel > [(set (pc) (if_then_else (gtu (plus (reg) (const_int -n)) > (const_int n-1)) > (label_ref) (pc))) > (set (reg) (plus (reg) (const_int -n))) > (additional clobbers and uses)]) > > For Arm, we then add a (use (unspec [(const_int 0)] N)) that is specific to this pattern to stop anything else from matching it. > > Note that we don't need to mention that the register is 'LR' or the modes, those are specific to a particular backend, not the generic pattern we want to match. > > + || !CONST_INT_P (XEXP (inc_src, 1)) > + || INTVAL (XEXP (inc_src, 1)) >= 0) > return 0; > + int dec_num = abs (INTVAL (XEXP (inc_src, 1))); > > We can just use '-INTVAL(...)' here, we've verified just above that the constant is negative. > > - if ((XEXP (condition, 0) == reg) > + /* For the ARM special case of having a GTU: re-form the condition without > + the unspec for the benefit of the middle-end. */ > + if (GET_CODE (condition) == GTU) > + { > + condition = gen_rtx_fmt_ee (GTU, VOIDmode, inc_src, > + GEN_INT (dec_num - 1)); > + return condition; > + } > > If you make the change I mentioned above, this re-forming isn't needed any more, so the arm-specific comment goes away > > - { > + { > if (GET_CODE (pattern) != PARALLEL) > /* For the second form we expect: > > You've fixed the indentation of the brace (good), but the body of the braced expression needs re-indenting as well. > > R. >