From: "Wilco Dijkstra"
To: "'GCC Patches'"
Subject: [PATCH][AArch64][1/5] Improve immediate generation
Date: Wed, 02 Sep 2015 12:35:00 -0000
Message-ID: <000901d0e57b$c1767b50$446371f0$@com>

This patch reimplements aarch64_bitmask_imm using bitwise arithmetic rather
than a slow binary search.  The algorithm searches for a sequence of set bits.
If there are no more set bits and not all bits are set, it is a valid mask.
Otherwise it determines the distance to the next set bit and checks whether
the mask repeats across the full 64 bits.  Native performance is 5-6x faster
on typical queries.

No change in generated code, passes GCC regression/bootstrap.
ChangeLog:
2015-09-02  Wilco Dijkstra

	* gcc/config/aarch64/aarch64.c (aarch64_bitmask_imm): Reimplement
	using faster algorithm.

---
 gcc/config/aarch64/aarch64.c | 62 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c666dce..ba1b77e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3301,19 +3301,63 @@ aarch64_movw_imm (HOST_WIDE_INT val, machine_mode mode)
	  || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
 }
 
+/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
+
+static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
+  {
+    0x0000000100000001ull,
+    0x0001000100010001ull,
+    0x0101010101010101ull,
+    0x1111111111111111ull,
+    0x5555555555555555ull,
+  };
+
 
 /* Return true if val is a valid bitmask immediate.  */
+
 bool
-aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode mode)
+aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
 {
-  if (GET_MODE_SIZE (mode) < 8)
-    {
-      /* Replicate bit pattern.  */
-      val &= (HOST_WIDE_INT) 0xffffffff;
-      val |= val << 32;
-    }
-  return bsearch (&val, aarch64_bitmasks, AARCH64_NUM_BITMASKS,
-		  sizeof (aarch64_bitmasks[0]), aarch64_bitmasks_cmp) != NULL;
+  unsigned HOST_WIDE_INT val, tmp, mask, first_one, next_one;
+  int bits;
+
+  /* Check for a single sequence of one bits and return quickly if so.
+     The special cases of all ones and all zeroes return false.  */
+  val = (unsigned HOST_WIDE_INT) val_in;
+  tmp = val + (val & -val);
+
+  if (tmp == (tmp & -tmp))
+    return (val + 1) > 1;
+
+  /* Replicate 32-bit immediates so we can treat them as 64-bit.  */
+  if (mode == SImode)
+    val = (val << 32) | (val & 0xffffffff);
+
+  /* Invert if the immediate doesn't start with a zero bit - this means we
+     only need to search for sequences of one bits.  */
+  if (val & 1)
+    val = ~val;
+
+  /* Find the first set bit and set tmp to val with the first sequence of one
+     bits removed.  Return success if there is a single sequence of ones.  */
+  first_one = val & -val;
+  tmp = val & (val + first_one);
+
+  if (tmp == 0)
+    return true;
+
+  /* Find the next set bit and compute the difference in bit position.  */
+  next_one = tmp & -tmp;
+  bits = clz_hwi (first_one) - clz_hwi (next_one);
+  mask = val ^ tmp;
+
+  /* Check the bit position difference is a power of 2, and that the first
+     sequence of one bits fits within 'bits' bits.  */
+  if ((mask >> bits) != 0 || bits != (bits & -bits))
+    return false;
+
+  /* Check the sequence of one bits is repeated 64/bits times.  */
+  return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
 }
 
 
--
1.8.3