From: "Wilco Dijkstra"
To: "'GCC Patches'"
Subject: [PATCH][AArch64][1/5] Improve immediate generation
Date: Wed, 02 Sep 2015 12:35:00 -0000
Message-ID: <000901d0e57b$c1767b50$446371f0$@com>

This patch reimplements aarch64_bitmask_imm using bitwise arithmetic rather
than a slow binary search.  The algorithm searches for a sequence of set bits.
If there are no more set bits and not all bits are set, it is a valid mask.
Otherwise it determines the distance to the next set bit and checks whether
the mask repeats across the full 64 bits.  Native performance is 5-6x faster
on typical queries.

No change in generated code, passes GCC regression/bootstrap.
ChangeLog:
2015-09-02  Wilco Dijkstra

	* gcc/config/aarch64/aarch64.c (aarch64_bitmask_imm): Reimplement
	using faster algorithm.

---
 gcc/config/aarch64/aarch64.c | 62 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c666dce..ba1b77e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3301,19 +3301,63 @@ aarch64_movw_imm (HOST_WIDE_INT val, machine_mode mode)
	  || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
 }
 
+/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
+
+static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
+  {
+    0x0000000100000001ull,
+    0x0001000100010001ull,
+    0x0101010101010101ull,
+    0x1111111111111111ull,
+    0x5555555555555555ull,
+  };
+
 
 /* Return true if val is a valid bitmask immediate.  */
+
 bool
-aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode mode)
+aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
 {
-  if (GET_MODE_SIZE (mode) < 8)
-    {
-      /* Replicate bit pattern.  */
-      val &= (HOST_WIDE_INT) 0xffffffff;
-      val |= val << 32;
-    }
-  return bsearch (&val, aarch64_bitmasks, AARCH64_NUM_BITMASKS,
-		  sizeof (aarch64_bitmasks[0]), aarch64_bitmasks_cmp) != NULL;
+  unsigned HOST_WIDE_INT val, tmp, mask, first_one, next_one;
+  int bits;
+
+  /* Check for a single sequence of one bits and return quickly if so.
+     The special cases of all ones and all zeroes return false.  */
+  val = (unsigned HOST_WIDE_INT) val_in;
+  tmp = val + (val & -val);
+
+  if (tmp == (tmp & -tmp))
+    return (val + 1) > 1;
+
+  /* Replicate 32-bit immediates so we can treat them as 64-bit.  */
+  if (mode == SImode)
+    val = (val << 32) | (val & 0xffffffff);
+
+  /* Invert if the immediate doesn't start with a zero bit - this means we
+     only need to search for sequences of one bits.  */
+  if (val & 1)
+    val = ~val;
+
+  /* Find the first set bit and set tmp to val with the first sequence of one
+     bits removed.  Return success if there is a single sequence of ones.  */
+  first_one = val & -val;
+  tmp = val & (val + first_one);
+
+  if (tmp == 0)
+    return true;
+
+  /* Find the next set bit and compute the difference in bit position.  */
+  next_one = tmp & -tmp;
+  bits = clz_hwi (first_one) - clz_hwi (next_one);
+  mask = val ^ tmp;
+
+  /* Check the bit position difference is a power of 2, and that the first
+     sequence of one bits fits within 'bits' bits.  */
+  if ((mask >> bits) != 0 || bits != (bits & -bits))
+    return false;
+
+  /* Check the sequence of one bits is repeated 64/bits times.  */
+  return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
 }
 
 
--
1.8.3