From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 80056 invoked by alias); 3 Aug 2015 13:01:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 80043 invoked by uid 89); 3 Aug 2015 13:01:14 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 X-HELO: cam-smtp0.cambridge.arm.com Received: from fw-tnat.cambridge.arm.com (HELO cam-smtp0.cambridge.arm.com) (217.140.96.140) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Mon, 03 Aug 2015 13:01:08 +0000 Received: from arm.com (e106375-lin.cambridge.arm.com [10.2.207.23]) by cam-smtp0.cambridge.arm.com (8.13.8/8.13.8) with ESMTP id t73D14uh016898; Mon, 3 Aug 2015 14:01:04 +0100 Date: Mon, 03 Aug 2015 13:01:00 -0000 From: James Greenhalgh To: Kyrill Tkachov Cc: GCC Patches , Marcus Shawcroft , Richard Earnshaw Subject: Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG Message-ID: <20150803130104.GA13835@arm.com> References: <55B219A5.8060307@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55B219A5.8060307@arm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes X-SW-Source: 2015-08/txt/msg00060.txt.bz2 On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote: > Hi all, > > This patch implements an aarch64-specific expansion of the signed modulo by a power of 2. > The proposed sequence makes use of the conditional negate instruction CSNEG. > For a power of N, x % N can be calculated with: > negs x1, x0 > and x0, x0, #(N - 1) > and x1, x1, #(N - 1) > csneg x0, x0, x1, mi > > So, for N == 256 this would be: > negs x1, x0 > and x0, x0, #255 > and x1, x1, #255 > csneg x0, x0, x1, mi > > For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c is: > asr x1, x0, 63 > lsr x1, x1, 56 > add x0, x0, x1 > and x0, x0, 255 > sub x0, x0, x1 > > Note that the CSNEG sequence is one instruction shorter and that the two and operations > are independent, compared to the existing sequence where all instructions are dependent > on the preceeding instructions. > > For the special case of N == 2 we can do even better: > cmp x0, xzr > and x0, x0, 1 > csneg x0, x0, x0, ge > > I first tried implementing this in the generic code in expmed.c but that didn't work > out for a few reasons: > > * This relies on having a conditional-negate instruction. We could gate it on > HAVE_conditional_move and the combiner is capable of merging the final negate into > the conditional move if a conditional negate is available (like on aarch64) but on > targets without a conditional negate this would end up emitting a separate negate. > > * The first negs has to be a negs for the sequence to be a win i.e. having a separate > negate and compare makes the sequence slower than the existing one (at least in my > microbenchmarking) and I couldn't get subsequent passes to combine the negate and combine > into the negs (presumably due to the use of the negated result in one of the ands). > Doing it in the aarch64 backend where I could just call the exact gen_* functions that > I need worked much more cleanly. > > The costing logic is updated to reflect this sequence during the intialisation of > expmed.c where it calculates the smod_pow2_cheap metric. > > The tests will come in patch 3 of the series which are partly shared with the equivalent > arm implementation. > > Bootstrapped and tested on aarch64. > Ok for trunk? > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 9d88a60..7bb4a55 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -6639,8 +6639,26 @@ cost_plus: > if (VECTOR_MODE_P (mode)) > *cost += extra_cost->vect.alu; > else if (GET_MODE_CLASS (mode) == MODE_INT) > - *cost += (extra_cost->mult[mode == DImode].add > - + extra_cost->mult[mode == DImode].idiv); > + { > + /* We can expand signed mod by power of 2 using a > + NEGS, two parallel ANDs and a CSNEG. Assume here > + that CSNEG is COSTS_N_INSNS (1). This case should > + only ever be reached through the set_smod_pow2_cheap check > + in expmed.c. */ > + if (code == MOD > + && CONST_INT_P (XEXP (x, 1)) > + && exact_log2 (INTVAL (XEXP (x, 1))) > 0 > + && (mode == SImode || mode == DImode)) > + { > + *cost += COSTS_N_INSNS (3) > + + 2 * extra_cost->alu.logical > + + extra_cost->alu.arith; > + return true; > + } > + > + *cost += (extra_cost->mult[mode == DImode].add > + + extra_cost->mult[mode == DImode].idiv); > + } > else if (mode == DFmode) > *cost += (extra_cost->fp[1].mult > + extra_cost->fp[1].div); This looks like it calculates the wrong cost for !speed. I think we will still expand through mod3 when compiling for size, so we probably still want to cost the multiple instructions. Have I misunderstood? Thanks, James