Message-ID: <55BB2F80.7050605@arm.com>
Date: Fri, 31 Jul 2015 08:43:00 -0000
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
References: <55B219A5.8060307@arm.com>
In-Reply-To: <55B219A5.8060307@arm.com>

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02036.html

Thanks,
Kyrill

On 24/07/15 11:55, Kyrill Tkachov wrote:
> Hi all,
>
> This patch implements an aarch64-specific expansion of the signed modulo by a power of 2.
> The proposed sequence makes use of the conditional negate instruction CSNEG.
> For N a power of 2, x % N can be calculated with:
>     negs x1, x0
>     and x0, x0, #(N - 1)
>     and x1, x1, #(N - 1)
>     csneg x0, x0, x1, mi
>
> So, for N == 256 this would be:
>     negs x1, x0
>     and x0, x0, #255
>     and x1, x1, #255
>     csneg x0, x0, x1, mi
>
> For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c is:
>     asr x1, x0, 63
>     lsr x1, x1, 56
>     add x0, x0, x1
>     and x0, x0, 255
>     sub x0, x0, x1
>
> Note that the CSNEG sequence is one instruction shorter and that the two and operations
> are independent, compared to the existing sequence where all instructions are dependent
> on the preceding instructions.
>
> For the special case of N == 2 we can do even better:
>     cmp x0, xzr
>     and x0, x0, 1
>     csneg x0, x0, x0, ge
>
> I first tried implementing this in the generic code in expmed.c but that didn't work
> out for a few reasons:
>
> * This relies on having a conditional-negate instruction. We could gate it on
> HAVE_conditional_move and the combiner is capable of merging the final negate into
> the conditional move if a conditional negate is available (like on aarch64) but on
> targets without a conditional negate this would end up emitting a separate negate.
>
> * The first negs has to be a negs for the sequence to be a win i.e. having a separate
> negate and compare makes the sequence slower than the existing one (at least in my
> microbenchmarking) and I couldn't get subsequent passes to combine the negate and compare
> into the negs (presumably due to the use of the negated result in one of the ands).
> Doing it in the aarch64 backend where I could just call the exact gen_* functions that
> I need worked much more cleanly.
>
> The costing logic is updated to reflect this sequence during the initialisation of
> expmed.c where it calculates the smod_pow2_cheap metric.
>
> The tests will come in patch 3 of the series which are partly shared with the equivalent
> arm implementation.
>
> Bootstrapped and tested on aarch64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-07-24  Kyrylo Tkachov
>
>     * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
>     (*neg<mode>2_compare0): Rename to...
>     (neg<mode>2_compare0): ... This.
>     * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
>     CSNEG sequence in MOD by power of 2 case.
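
For anyone who wants to convince themselves of the arithmetic in the quoted sequences, here is a minimal C sketch that models each one instruction by instruction and compares the result with C's own % operator. It is not part of the patch: the function names (wrap_neg, smod_pow2_csneg, smod2_csneg, smod256_shift, smod_models_mismatches) are made up for illustration, the flags are modelled only as far as the MI and GE conditions need, and the unsigned-to-signed conversions assume the usual two's-complement wraparound that GCC provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two's-complement negate with wraparound, i.e. the value NEGS writes.
   (Hypothetical helper; relies on GCC-style modular conversion.) */
static int64_t wrap_neg(int64_t x)
{
    return (int64_t)(0 - (uint64_t)x);
}

/* Model of the proposed sequence for x % n, n a positive power of 2. */
static int64_t smod_pow2_csneg(int64_t x, int64_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    int64_t neg = wrap_neg(x);       /* negs  x1, x0              */
    int mi = neg < 0;                /* N flag set by the negs    */
    int64_t lo = x & (n - 1);        /* and   x0, x0, #(N - 1)    */
    int64_t nlo = neg & (n - 1);     /* and   x1, x1, #(N - 1)    */
    return mi ? lo : -nlo;           /* csneg x0, x0, x1, mi      */
}

/* Model of the shorter N == 2 sequence. */
static int64_t smod2_csneg(int64_t x)
{
    int ge = x >= 0;                 /* cmp   x0, xzr             */
    int64_t bit = x & 1;             /* and   x0, x0, 1           */
    return ge ? bit : -bit;          /* csneg x0, x0, x0, ge      */
}

/* Model of the existing expand_smod_pow2 output for N == 256. */
static int64_t smod256_shift(int64_t x)
{
    /* asr x1, x0, 63 ; lsr x1, x1, 56  ->  255 if x < 0, else 0 */
    uint64_t bias = (x < 0 ? UINT64_MAX : 0) >> 56;
    uint64_t t = (uint64_t)x + bias; /* add   x0, x0, x1          */
    t &= 255;                        /* and   x0, x0, 255         */
    return (int64_t)t - (int64_t)bias; /* sub x0, x0, x1          */
}

/* Compare every model against the % operator on a few edge values.
   Returns the number of mismatches found (0 means they all agree). */
static int smod_models_mismatches(void)
{
    static const int64_t samples[] = {
        0, 1, -1, 2, -2, 3, -3, 255, -255, 256, -256, 257, -257,
        INT64_MAX, INT64_MIN, INT64_MIN + 1
    };
    int bad = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int64_t x = samples[i];
        for (int shift = 1; shift <= 62; shift++) {
            int64_t n = (int64_t)1 << shift;
            if (smod_pow2_csneg(x, n) != x % n)
                bad++;
        }
        if (smod2_csneg(x) != x % 2)
            bad++;
        if (smod256_shift(x) != x % 256)
            bad++;
    }
    return bad;
}
```

Calling smod_models_mismatches() from a small test driver should return 0. The INT64_MIN sample is the interesting one: the negate wraps back to INT64_MIN, so the MI condition holds and the sequence returns the masked x0, which is 0 for every N up to 2^62, matching INT64_MIN % N.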