From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <55B219A5.8060307@arm.com>
Date: Fri, 24 Jul 2015 10:55:00 -0000
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------010008080400030506080408"

This is a multi-part message in MIME format.
--------------010008080400030506080408
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi all,

This patch implements an aarch64-specific expansion of the signed modulo by
a power of 2.
The proposed sequence makes use of the conditional negate instruction CSNEG.
For N a power of 2, x % N can be calculated with:

  negs   x1, x0
  and    x0, x0, #(N - 1)
  and    x1, x1, #(N - 1)
  csneg  x0, x0, x1, mi

So, for N == 256 this would be:

  negs   x1, x0
  and    x0, x0, #255
  and    x1, x1, #255
  csneg  x0, x0, x1, mi

For comparison, the existing sequence emitted by expand_smod_pow2 in
expmed.c is:

  asr    x1, x0, 63
  lsr    x1, x1, 56
  add    x0, x0, x1
  and    x0, x0, 255
  sub    x0, x0, x1

Note that the CSNEG sequence is one instruction shorter and that its two AND
operations are independent of each other, whereas in the existing sequence
every instruction depends on the preceding one.

For the special case of N == 2 we can do even better:

  cmp    x0, xzr
  and    x0, x0, 1
  csneg  x0, x0, x0, ge

I first tried implementing this in the generic code in expmed.c but that
didn't work out for a few reasons:

* This relies on having a conditional-negate instruction. We could gate it
  on HAVE_conditional_move, and the combiner is capable of merging the final
  negate into the conditional move if a conditional negate is available
  (like on aarch64), but on targets without a conditional negate this would
  end up emitting a separate negate.

* The first negs has to be a negs for the sequence to be a win, i.e. having
  a separate negate and compare makes the sequence slower than the existing
  one (at least in my microbenchmarking), and I couldn't get subsequent
  passes to combine the negate and compare into the negs (presumably due to
  the use of the negated result in one of the ands).

Doing it in the aarch64 backend, where I could just call the exact gen_*
functions that I need, worked much more cleanly.

The costing logic is updated to reflect this sequence during the
initialisation of expmed.c, where it calculates the smod_pow2_cheap metric.

The tests will come in patch 3 of the series; they are partly shared with
the equivalent arm implementation.

Bootstrapped and tested on aarch64.
Ok for trunk?

Thanks,
Kyrill

2015-07-24  Kyrylo Tkachov

    * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
    (*neg<mode>2_compare0): Rename to...
    (neg<mode>2_compare0): ... This.
    * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
    CSNEG sequence in MOD by power of 2 case.

--------------010008080400030506080408
Content-Type: text/x-patch; name=aarch64-mod-2.patch
Content-Disposition: attachment; filename="aarch64-mod-2.patch"

commit 428cbef107b559f67006fe84b4522210902b101f
Author: Kyrylo Tkachov
Date:   Wed Jul 15 17:01:13 2015 +0100

    [AArch64][1/3] Expand signed mod by power of 2 using CSNEG

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9d88a60..7bb4a55 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6639,8 +6639,26 @@ cost_plus:
       if (VECTOR_MODE_P (mode))
         *cost += extra_cost->vect.alu;
       else if (GET_MODE_CLASS (mode) == MODE_INT)
-        *cost += (extra_cost->mult[mode == DImode].add
-                  + extra_cost->mult[mode == DImode].idiv);
+        {
+          /* We can expand signed mod by power of 2 using a
+             NEGS, two parallel ANDs and a CSNEG.  Assume here
+             that CSNEG is COSTS_N_INSNS (1).  This case should
+             only ever be reached through the set_smod_pow2_cheap check
+             in expmed.c.  */
+          if (code == MOD
+              && CONST_INT_P (XEXP (x, 1))
+              && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+              && (mode == SImode || mode == DImode))
+            {
+              *cost += COSTS_N_INSNS (3)
+                       + 2 * extra_cost->alu.logical
+                       + extra_cost->alu.arith;
+              return true;
+            }
+
+          *cost += (extra_cost->mult[mode == DImode].add
+                    + extra_cost->mult[mode == DImode].idiv);
+        }
       else if (mode == DFmode)
         *cost += (extra_cost->fp[1].mult
                   + extra_cost->fp[1].div);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f264534..3aaf407 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
   }
 )
 
+;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.
+;; For x0 % n where n is a power of 2 produce:
+;; negs   x1, x0
+;; and    x0, x0, #(n - 1)
+;; and    x1, x1, #(n - 1)
+;; csneg  x0, x0, x1, mi
+
+(define_expand "mod<mode>3"
+  [(match_operand:GPI 0 "register_operand" "")
+   (match_operand:GPI 1 "register_operand" "")
+   (match_operand:GPI 2 "const_int_operand" "")]
+  ""
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (INTVAL (operands[2])) <= 0
+       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+        cmp    x0, xzr
+        and    x0, x0, 1
+        csneg  x0, x0, x0, ge.  */
+    if (val == 2)
+      {
+        rtx masked = gen_reg_rtx (<MODE>mode);
+        rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+        emit_insn (gen_and<mode>3 (masked, operands[1], mask));
+        rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+        emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
+        DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (<MODE>mode);
+    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
+
+    rtx masked_pos = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
+
+    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
+                                      masked_neg, masked_pos));
+    DONE;
+  }
+)
+
 (define_insn "*condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
                             [(match_operand 1 "cc_register" "") (const_int 0)])
@@ -2372,7 +2428,7 @@ (define_insn "*ngcsi_uxtw"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*neg<mode>2_compare0"
+(define_insn "neg<mode>2_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
         (compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand" "r"))
                        (const_int 0)))
--------------010008080400030506080408--
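
For readers who want to check the arithmetic behind the NEGS/AND/AND/CSNEG
sequence, the following is a minimal C sketch of what the instructions
compute. It is not part of the patch. The function names
(smod_pow2_csneg, smod2_csneg, count_mismatches) are hypothetical and
chosen only for this illustration, and the sketch models the 64-bit (DImode)
case under the assumption that n is a positive power of 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model (an assumption, not GCC code) of:
     negs   x1, x0
     and    x0, x0, #(n - 1)
     and    x1, x1, #(n - 1)
     csneg  x0, x0, x1, mi
   n must be a positive power of 2.  */
static int64_t
smod_pow2_csneg (int64_t x, int64_t n)
{
  assert (n > 0 && (n & (n - 1)) == 0);

  uint64_t mask = (uint64_t) n - 1;
  uint64_t neg = 0 - (uint64_t) x;      /* negs: wraps for INT64_MIN.  */
  int mi = (int) (neg >> 63);           /* N flag of the negated value.  */
  uint64_t masked_pos = (uint64_t) x & mask;
  uint64_t masked_neg = neg & mask;

  /* csneg: keep x0 when "mi" holds, otherwise take the negation of x1.
     Both masked values are below 2^62, so the casts cannot overflow.  */
  return mi ? (int64_t) masked_pos : -(int64_t) masked_neg;
}

/* Illustrative model of the shorter n == 2 sequence:
     cmp    x0, xzr
     and    x0, x0, 1
     csneg  x0, x0, x0, ge  */
static int64_t
smod2_csneg (int64_t x)
{
  int64_t masked = (int64_t) ((uint64_t) x & 1u);
  return x >= 0 ? masked : -masked;
}

/* Compare both models against C's % operator (which truncates toward
   zero) and return the number of disagreements.  */
static int
count_mismatches (void)
{
  static const int64_t samples[] = {
    0, 1, -1, 2, -2, 5, -5, 255, -255, 256, -256, 257, -257,
    INT64_MAX, INT64_MIN, INT64_MIN + 1
  };
  int bad = 0;

  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int64_t x = samples[i];

      if (smod2_csneg (x) != x % 2)
        bad++;

      for (int k = 1; k <= 62; k++)
        {
          int64_t n = (int64_t) 1 << k;
          if (smod_pow2_csneg (x, n) != x % n)
            bad++;
        }
    }
  return bad;
}
```

Calling count_mismatches () from a small test driver is enough to exercise
both models. The flag is taken from the sign bit of the negated value
because the "mi" condition tests only the N flag, so the INT64_MIN case
(where the negation overflows) still selects the masked original, which is 0.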