Date: Fri, 31 Jul 2015 09:26:00 -0000
Subject: Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
From: Andrew Pinski
To: Kyrill Tkachov
Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
In-Reply-To: <55B219A5.8060307@arm.com>
References: <55B219A5.8060307@arm.com>

On Fri, Jul 24, 2015 at 3:55 AM, Kyrill Tkachov wrote:
> Hi all,
>
> This patch implements an aarch64-specific expansion of the signed modulo by
> a power of 2.
> The proposed sequence makes use of the conditional negate instruction CSNEG.
> For N a power of 2, x % N can be calculated with:
>   negs  x1, x0
>   and   x0, x0, #(N - 1)
>   and   x1, x1, #(N - 1)
>   csneg x0, x0, x1, mi
>
> So, for N == 256 this would be:
>   negs  x1, x0
>   and   x0, x0, #255
>   and   x1, x1, #255
>   csneg x0, x0, x1, mi
>
> For comparison, the existing sequence emitted by expand_smod_pow2 in
> expmed.c is:
>   asr   x1, x0, 63
>   lsr   x1, x1, 56
>   add   x0, x0, x1
>   and   x0, x0, 255
>   sub   x0, x0, x1
>
> Note that the CSNEG sequence is one instruction shorter and that the two
> and operations are independent, compared to the existing sequence where
> all instructions are dependent on the preceding instructions.

Just FYI: on ThunderX this is a size win and a performance win, at least in
a microbenchmark.

>
> For the special case of N == 2 we can do even better:
>   cmp   x0, xzr
>   and   x0, x0, 1
>   csneg x0, x0, x0, ge

This is a size win and a performance win on ThunderX.

>
> I first tried implementing this in the generic code in expmed.c but that
> didn't work out for a few reasons:
>
> * This relies on having a conditional-negate instruction. We could gate it
> on HAVE_conditional_move and the combiner is capable of merging the final
> negate into the conditional move if a conditional negate is available
> (like on aarch64) but on targets without a conditional negate this would
> end up emitting a separate negate.
>
> * The first negs has to be a negs for the sequence to be a win, i.e. having
> a separate negate and compare makes the sequence slower than the existing
> one (at least in my microbenchmarking) and I couldn't get subsequent passes
> to combine the negate and compare into the negs (presumably due to the use
> of the negated result in one of the ands).
> Doing it in the aarch64 backend where I could just call the exact gen_*
> functions that I need worked much more cleanly.

I agree this does make it harder to implement in a target generic way.
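
For anyone who wants to sanity-check the arithmetic, here is a minimal C
sketch of what the two CSNEG sequences quoted above compute. It is only a
model under stated assumptions: the function names (smod_pow2_csneg,
smod2_csneg, count_mismatches) are hypothetical and not part of the patch,
and the flag behaviour of negs/csneg is written out by hand rather than
taken from the expansion code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the general sequence:
     negs  x1, x0
     and   x0, x0, #(N - 1)
     and   x1, x1, #(N - 1)
     csneg x0, x0, x1, mi
   n is assumed to be a power of 2, n >= 2.  */
static int64_t smod_pow2_csneg(int64_t x, int64_t n)
{
    assert(n >= 2 && (n & (n - 1)) == 0);

    uint64_t mask = (uint64_t)n - 1;
    uint64_t ux = (uint64_t)x;
    uint64_t neg = 0 - ux;            /* negs: x1 = -x0, wraps for INT64_MIN */
    int mi = (neg >> 63) != 0;        /* N flag: the negated value is negative */
    uint64_t lo = ux & mask;          /* and x0, x0, #(N - 1) */
    uint64_t neg_lo = neg & mask;     /* and x1, x1, #(N - 1) */

    /* csneg x0, x0, x1, mi: take x0 if mi holds, otherwise -x1.  */
    return mi ? (int64_t)lo : -(int64_t)neg_lo;
}

/* Hypothetical model of the N == 2 special case:
     cmp   x0, xzr
     and   x0, x0, 1
     csneg x0, x0, x0, ge  */
static int64_t smod2_csneg(int64_t x)
{
    int64_t bit = (int64_t)((uint64_t)x & 1);
    return x >= 0 ? bit : -bit;
}

/* Compare both models against C's truncating % operator and return the
   number of disagreements found.  */
static int count_mismatches(void)
{
    static const int64_t divisors[] = { 2, 4, 256, 1048576 };
    static const int64_t extremes[] = { INT64_MIN, INT64_MIN + 1, INT64_MAX };
    int bad = 0;

    for (unsigned d = 0; d < sizeof divisors / sizeof divisors[0]; d++) {
        int64_t n = divisors[d];
        for (int64_t x = -2048; x <= 2048; x++)
            bad += smod_pow2_csneg(x, n) != x % n;
        for (unsigned e = 0; e < sizeof extremes / sizeof extremes[0]; e++)
            bad += smod_pow2_csneg(extremes[e], n) != extremes[e] % n;
    }
    for (int64_t x = -2048; x <= 2048; x++)
        bad += smod2_csneg(x) != x % 2;
    return bad;
}
```

The negation is done on uint64_t so that the INT64_MIN case wraps the same
way negs does, instead of relying on signed overflow in C.
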

Thanks,
Andrew

>
> The costing logic is updated to reflect this sequence during the
> initialisation of expmed.c where it calculates the smod_pow2_cheap metric.
>
> The tests will come in patch 3 of the series which are partly shared with
> the equivalent arm implementation.
>
> Bootstrapped and tested on aarch64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-07-24 Kyrylo Tkachov
>
> * config/aarch64/aarch64.md (mod3): New define_expand.
> (*neg2_compare0): Rename to...
> (neg2_compare0): ... This.
> * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
> CSNEG sequence in MOD by power of 2 case.