Date: Fri, 18 Sep 2015 14:16:00 -0000
From: James Greenhalgh
To: Wilco Dijkstra
Cc: "'GCC Patches'"
Subject: Re: [PATCH][AArch64][1/5] Improve immediate generation
Message-ID: <20150918141419.GA16108@arm.com>
References: <000901d0e57b$c1767b50$446371f0$@com>
In-Reply-To: <000901d0e57b$c1767b50$446371f0$@com>

On Wed, Sep 02, 2015 at 01:34:48PM +0100, Wilco Dijkstra wrote:
> This patch reimplements aarch64_bitmask_imm using bitwise arithmetic rather
> than a slow binary search. The algorithm searches for a sequence of set bits.
> If there are no more set bits and not all bits are set, it is a valid mask.
> Otherwise it determines the distance to the next set bit and checks the mask
> is repeated across the full 64 bits. Native performance is 5-6x faster on
> typical queries.
>
> No change in generated code, passes GCC regression/bootstrap.

OK.

Thanks,
James

> ChangeLog:
> 2015-09-02  Wilco Dijkstra
>
> 	* gcc/config/aarch64/aarch64.c (aarch64_bitmask_imm):
> 	Reimplement using faster algorithm.
>
> ---
>  gcc/config/aarch64/aarch64.c | 62 +++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 53 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c666dce..ba1b77e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3301,19 +3301,63 @@ aarch64_movw_imm (HOST_WIDE_INT val, machine_mode mode)
>  	  || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
>  }
>
> +/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
> +
> +static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
> +  {
> +    0x0000000100000001ull,
> +    0x0001000100010001ull,
> +    0x0101010101010101ull,
> +    0x1111111111111111ull,
> +    0x5555555555555555ull,
> +  };
> +
>
>  /* Return true if val is a valid bitmask immediate.  */
> +
>  bool
> -aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode mode)
> +aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
>  {
> -  if (GET_MODE_SIZE (mode) < 8)
> -    {
> -      /* Replicate bit pattern.  */
> -      val &= (HOST_WIDE_INT) 0xffffffff;
> -      val |= val << 32;
> -    }
> -  return bsearch (&val, aarch64_bitmasks, AARCH64_NUM_BITMASKS,
> -		  sizeof (aarch64_bitmasks[0]), aarch64_bitmasks_cmp) != NULL;
> +  unsigned HOST_WIDE_INT val, tmp, mask, first_one, next_one;
> +  int bits;
> +
> +  /* Check for a single sequence of one bits and return quickly if so.
> +     The special cases of all ones and all zeroes returns false.  */
> +  val = (unsigned HOST_WIDE_INT) val_in;
> +  tmp = val + (val & -val);
> +
> +  if (tmp == (tmp & -tmp))
> +    return (val + 1) > 1;
> +
> +  /* Replicate 32-bit immediates so we can treat them as 64-bit.  */
> +  if (mode == SImode)
> +    val = (val << 32) | (val & 0xffffffff);
> +
> +  /* Invert if the immediate doesn't start with a zero bit - this means we
> +     only need to search for sequences of one bits.  */
> +  if (val & 1)
> +    val = ~val;
> +
> +  /* Find the first set bit and set tmp to val with the first sequence of one
> +     bits removed.  Return success if there is a single sequence of ones.  */
> +  first_one = val & -val;
> +  tmp = val & (val + first_one);
> +
> +  if (tmp == 0)
> +    return true;
> +
> +  /* Find the next set bit and compute the difference in bit position.  */
> +  next_one = tmp & -tmp;
> +  bits = clz_hwi (first_one) - clz_hwi (next_one);
> +  mask = val ^ tmp;
> +
> +  /* Check the bit position difference is a power of 2, and that the first
> +     sequence of one bits fits within 'bits' bits.  */
> +  if ((mask >> bits) != 0 || bits != (bits & -bits))
> +    return false;
> +
> +  /* Check the sequence of one bits is repeated 64/bits times.  */
> +  return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
> }
>
>
> --
> 1.8.3
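For anyone who wants to try the quoted algorithm outside of GCC, here is a standalone sketch (a paraphrase, not the GCC source): HOST_WIDE_INT becomes uint64_t, clz_hwi becomes __builtin_clzll, and the SImode replication step is dropped since the input is taken to be a full 64-bit value. The function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multipliers replicating a field of width 32, 16, 8, 4, or 2 bits
   across 64 bits, as in the patch's bitmask_imm_mul table.  */
static const uint64_t bitmask_imm_mul[] = {
  0x0000000100000001ull,
  0x0001000100010001ull,
  0x0101010101010101ull,
  0x1111111111111111ull,
  0x5555555555555555ull,
};

/* Return true if VAL is a run of ones repeated in a power-of-two
   sized field, i.e. a valid AArch64 bitmask immediate.
   All-zeros and all-ones are rejected.  */
static bool
is_bitmask_imm (uint64_t val)
{
  /* Adding the lowest set bit clears the lowest run of ones; if the
     result has at most one bit set, VAL was a single contiguous run.  */
  uint64_t tmp = val + (val & -val);
  if ((tmp & (tmp - 1)) == 0)
    return val + 1 > 1;   /* reject 0 and ~0 */

  /* Ensure bit 0 is clear so every run of ones is bounded.  */
  if (val & 1)
    val = ~val;

  /* Strip the first run of ones; if nothing is left, accept.  */
  uint64_t first_one = val & -val;
  tmp = val & (val + first_one);
  if (tmp == 0)
    return true;

  /* Distance in bit positions between the first and second runs.  */
  uint64_t next_one = tmp & -tmp;
  int bits = __builtin_clzll (first_one) - __builtin_clzll (next_one);
  uint64_t mask = val ^ tmp;   /* the first run of ones by itself */

  /* The distance must be a power of two and the run must fit in it.  */
  if ((mask >> bits) != 0 || (bits & (bits - 1)) != 0)
    return false;

  /* The run must repeat every BITS positions across all 64 bits.  */
  return val == mask * bitmask_imm_mul[__builtin_clz ((unsigned) bits) - 26];
}
```

For example, 0x5555555555555555 (a one bit repeated every 2 bits), 0x00ff00ff00ff00ff (eight ones every 16 bits), and 0xff (a single run) are accepted, while 0x1234 is rejected because its runs of ones are not evenly repeated.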