From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 130993 invoked by alias); 1 Sep 2015 10:40:52 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 130981 invoked by uid 89); 1 Sep 2015 10:40:51 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 01 Sep 2015 10:40:49 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-37-zlMBQK8dS_2-pjHWTNLltQ-1; Tue, 01 Sep 2015 11:40:44 +0100 Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 1 Sep 2015 11:40:44 +0100 Message-ID: <55E580AC.2060200@arm.com> Date: Tue, 01 Sep 2015 10:40:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: James Greenhalgh CC: GCC Patches , Marcus Shawcroft , Richard Earnshaw Subject: Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG References: <55B219A5.8060307@arm.com> <20150803130104.GA13835@arm.com> <55CC8F62.4080604@arm.com> <20150901092501.GA28861@arm.com> In-Reply-To: <20150901092501.GA28861@arm.com> X-MC-Unique: zlMBQK8dS_2-pjHWTNLltQ-1 Content-Type: text/plain; charset=WINDOWS-1252; format=flowed Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-09/txt/msg00026.txt.bz2 Hi James, On 01/09/15 10:25, James Greenhalgh wrote: > On Thu, Aug 13, 2015 at 01:36:50PM +0100, Kyrill Tkachov wrote: > > Some comments below. Thanks, I'll incorporate them, with one clarification inline. > >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 1394ed7..c8bd8d2 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -6652,6 +6652,25 @@ cost_plus: >> return true; >>=20=20=20 >> case MOD: >> + /* We can expand signed mod by power of 2 using a >> + NEGS, two parallel ANDs and a CSNEG. Assume here >> + that CSNEG is COSTS_N_INSNS (1). This case should > Why do we want to hardcode this assumption rather than parameterise? Even > if you model this as the cost of an unconditional NEG I think that is > better than hardcoding zero cost. > > >> + only ever be reached through the set_smod_pow2_cheap check >> + in expmed.c. */ >> + if (CONST_INT_P (XEXP (x, 1)) >> + && exact_log2 (INTVAL (XEXP (x, 1))) > 0 >> + && (mode =3D=3D SImode || mode =3D=3D DImode)) >> + { >> + *cost +=3D COSTS_N_INSNS (3); > Can you add am comment to make it clear why this is not off-by-one? By > quick inspection it looks like you have made a typo trying to set the > cost to be 3 instructions rather than 4 - a reader needs the extra > knowledge that we already have a COSTS_N_INSNS(1) as a baseline. > > This would be clearer as: > > /* We will expand to four instructions, reset the baseline. */ > *cost =3D COSTS_N_INSNS (4); > >> + >> + if (speed) >> + *cost +=3D 2 * extra_cost->alu.logical >> + + extra_cost->alu.arith; >> + >> + return true; >> + } >> + >> + /* Fall-through. */ >> case UMOD: >> if (speed) >> { >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.= md >> index b7b04c4..a515573 100644 >> --- a/gcc/config/aarch64/aarch64.md >> +++ b/gcc/config/aarch64/aarch64.md >> @@ -302,6 +302,62 @@ (define_expand "cmp" >> } >> ) >>=20=20=20 >> +;; AArch64-specific expansion of signed mod by power of 2 using CSNEG. > Seems a strange comment given that we are in aarch64.md :-). > >> +;; For x0 % n where n is a power of 2 produce: >> +;; negs x1, x0 >> +;; and x0, x0, #(n - 1) >> +;; and x1, x1, #(n - 1) >> +;; csneg x0, x0, x1, mi >> + >> +(define_expand "mod3" >> + [(match_operand:GPI 0 "register_operand" "") >> + (match_operand:GPI 1 "register_operand" "") >> + (match_operand:GPI 2 "const_int_operand" "")] >> + "" >> + { >> + HOST_WIDE_INT val =3D INTVAL (operands[2]); >> + >> + if (val <=3D 0 >> + || exact_log2 (INTVAL (operands[2])) <=3D 0 >> + || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, mode)) >> + FAIL; >> + >> + rtx mask =3D GEN_INT (val - 1); >> + >> + /* In the special case of x0 % 2 we can do the even shorter: >> + cmp x0, xzr >> + and x0, x0, 1 >> + csneg x0, x0, x0, ge. */ >> + if (val =3D=3D 2) >> + { >> + rtx masked =3D gen_reg_rtx (mode); >> + rtx ccreg =3D aarch64_gen_compare_reg (LT, operands[1], const0_rtx); > Non-obvious why this is correct given the comment above saying we want GE. We want to negate if the comparison earlier yielded "less than zero". Unfortunately, the CSNEG form of that is written as csneg x0, x0, x0, ge which looks counter-intuitive at first glance. With my other patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-09/ms= g00020.html the generated assembly would be: cneg x0, x0, lt which more closely matches the RTL generation in this hunk. If you think that patch should go in, I'll rewrite the CSNEG form in this c= omment to CNEG. Kyrill > >> + emit_insn (gen_and3 (masked, operands[1], mask)); >> + rtx x =3D gen_rtx_LT (VOIDmode, ccreg, const0_rtx); >> + emit_insn (gen_csneg3_insn (operands[0], x, masked, masked)); >> + DONE; >> + } >> + >> + rtx neg_op =3D gen_reg_rtx (mode); >> + rtx_insn *insn =3D emit_insn (gen_neg2_compare0 (neg_op, oper= ands[1])); >> + >> + /* Extract the condition register and mode. */ >> + rtx cmp =3D XVECEXP (PATTERN (insn), 0, 0); >> + rtx cc_reg =3D SET_DEST (cmp); >> + rtx cond =3D gen_rtx_GE (VOIDmode, cc_reg, const0_rtx); >> + >> + rtx masked_pos =3D gen_reg_rtx (mode); >> + emit_insn (gen_and3 (masked_pos, operands[1], mask)); >> + >> + rtx masked_neg =3D gen_reg_rtx (mode); >> + emit_insn (gen_and3 (masked_neg, neg_op, mask)); >> + >> + emit_insn (gen_csneg3_insn (operands[0], cond, >> + masked_neg, masked_pos)); >> + DONE; >> + } >> +) >> + > Thanks, > James >