From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-401250-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 69712 invoked by alias); 25 Jun 2015 11:52:00 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Mail-Followup-To: Richard Biener <richard.guenther@gmail.com>,GCC Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: Remove redundant AND from count reduction loop
References: <87pp4m8mkp.fsf@e105548-lin.cambridge.arm.com>	<alpine.DEB.2.20.1506232307180.1715@laptop-mg.saclay.inria.fr>	<CAFiYyc0F3YqAjr2+EMf8STUBrVN+bE3aYpC+1nnstSaf2oiaDg@mail.gmail.com>	<87egl1sa2p.fsf@e105548-lin.cambridge.arm.com>	<CAFiYyc0_JiQSZ=M1NqZAaDgh6kYy_EXtpZp9cd2NJOtTG46ang@mail.gmail.com>	<87a8vps6p1.fsf@e105548-lin.cambridge.arm.com>	<CAFiYyc1M4-j8QOWSkowHp8fh5D9ScYACEDPQ0KHH52tEBrP54g@mail.gmail.com>	<871th1s322.fsf@e105548-lin.cambridge.arm.com>	<CAFiYyc1Q_NJftYjH49v+konjyiOZw_hzJ6un31R_mizdhk60=w@mail.gmail.com>	<87twtxqlbq.fsf@e105548-lin.cambridge.arm.com>	<CAFiYyc3qGd_L-dSGN7vefXV+901xhWj0YfYk0pA_w5C9VxwQ4A@mail.gmail.com>	<87pp4kqk4l.fsf@e105548-lin.cambridge.arm.com>	<CAFiYyc0KeCps7BQXzdUo=t3uqx=uz3SH6Nae44KgZQu+fzyhXg@mail.gmail.com>
Date: Thu, 25 Jun 2015 11:52:00 -0000
In-Reply-To: <CAFiYyc0KeCps7BQXzdUo=t3uqx=uz3SH6Nae44KgZQu+fzyhXg@mail.gmail.com>	(Richard Biener's message of "Thu, 25 Jun 2015 11:05:05 +0100")
Message-ID: <87lhf8qa3p.fsf@e105548-lin.cambridge.arm.com>
User-Agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/24.3 (gnu/linux)
MIME-Version: 1.0
X-MC-Unique: i4UMXhRcR0yqiEmWoUaxeg-1
Content-Type: text/plain; charset=WINDOWS-1252
Content-Transfer-Encoding: quoted-printable
X-SW-Source: 2015-06/txt/msg01785.txt.bz2

Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Jun 25, 2015 at 10:15 AM, Richard Sandiford
>> Index: gcc/match.pd
>> ===================================================================
>> --- gcc/match.pd        2015-06-24 20:24:31.344998571 +0100
>> +++ gcc/match.pd        2015-06-24 20:24:31.340998617 +0100
>> @@ -1014,6 +1014,26 @@ along with GCC; see the file COPYING3.
>>    (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
>>    (cnd @0 @2 @1)))
>>
>> +/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C), since vector comparisons
>> +   return all-1 or all-0 results.  */
>> +/* ??? We could instead convert all instances of the vec_cond to negate,
>> +   but that isn't necessarily a win on its own.  */
>> +(simplify
>> + (plus:c @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>> + (if (VECTOR_TYPE_P (type)
>> +      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
>> +      && (TYPE_MODE (TREE_TYPE (type))
>> +          == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
>> +  (minus @3 (view_convert @0))))
>> +
>> +/* ... likewise A - (B vcmp C ? 1 : 0) -> A + (B vcmp C).  */
>> +(simplify
>> + (minus @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>> + (if (VECTOR_TYPE_P (type)
>> +      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
>> +      && (TYPE_PRECISION (TREE_TYPE (type))
>> +          == TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@0)))))
>
> Either TYPE_PRECISION or TYPE_MODE please ;)

Bah.  The main reason I hate cut-&-paste is that I'm so hopeless at it.

> I think that TYPE_MODE is more correct if you consider (minus V4SF
> (view_convert:V4SF (vec_cond V4SI V4SI V4SI)) where you would end up
> with a non-sensical TYPE_PRECISION query on V4SF.  So probably
> VECTOR_INTEGER_TYPE_P again, then TYPE_PRECISION is good.

Actually, they were both meant to be TYPE_MODE, as below.  Is this OK?

Thanks,
Richard


gcc/
	* match.pd: Add patterns for vec_conds between 1 and 0.

gcc/testsuite/
	* gcc.target/aarch64/vect-add-sub-cond.c: New test.
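
As a quick sanity check of the identity behind the new patterns, here is a
minimal sketch using the GCC vector extensions, in which a vector comparison
yields -1 (all ones) in true lanes and 0 in false lanes.  The type name v4si
and the helper names are made up for illustration and are not part of the
patch; the sketch assumes a compiler that supports vector_size and vector
comparisons (GCC or Clang).

```c
#include <stdint.h>

/* Illustration only: v4si and the helper names below are hypothetical.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* A + (B < C ? 1 : 0), written with an explicit AND against 1.  */
static v4si
add_cond_with_and (v4si a, v4si b, v4si c)
{
  v4si one = { 1, 1, 1, 1 };
  return a + ((b < c) & one);
}

/* The same value without the AND: true lanes of B < C are -1,
   so subtracting the comparison result adds 1 in those lanes.  */
static v4si
add_cond_without_and (v4si a, v4si b, v4si c)
{
  return a - (b < c);
}

/* Return 1 if every lane of X equals the same lane of Y.  */
static int
all_lanes_equal (v4si x, v4si y)
{
  for (int i = 0; i < 4; ++i)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

The second helper is the shape the plus pattern aims for; the minus pattern
is the mirror image, with A + (B vcmp C) replacing A - (B vcmp C ? 1 : 0).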

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2015-06-25 11:06:50.462827031 +0100
+++ gcc/match.pd	2015-06-25 11:07:23.742445798 +0100
@@ -1014,6 +1014,26 @@ along with GCC; see the file COPYING3.
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
   (cnd @0 @2 @1)))
 
+/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C), since vector comparisons
+   return all-1 or all-0 results.  */
+/* ??? We could instead convert all instances of the vec_cond to negate,
+   but that isn't necessarily a win on its own.  */
+(simplify
+ (plus:c @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
+ (if (VECTOR_TYPE_P (type)
+      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+      && (TYPE_MODE (TREE_TYPE (type))
+          == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
+  (minus @3 (view_convert @0))))
+
+/* ... likewise A - (B vcmp C ? 1 : 0) -> A + (B vcmp C).  */
+(simplify
+ (minus @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
+ (if (VECTOR_TYPE_P (type)
+      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+      && (TYPE_MODE (TREE_TYPE (type))
+          == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
+  (plus @3 (view_convert @0))))
 
 /* Simplifications of comparisons.  */
 
Index: gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c
===================================================================
--- /dev/null	2015-06-02 17:27:28.541944012 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c	2015-06-25 11:06:50.458827055 +0100
@@ -0,0 +1,94 @@
+/* Make sure that vector comparison results are not unnecessarily ANDed
+   with vectors of 1.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define COUNT1(X) if (X) count += 1
+#define COUNT2(X) if (X) count -= 1
+#define COUNT3(X) count += (X)
+#define COUNT4(X) count -= (X)
+
+#define COND1(X) (X)
+#define COND2(X) ((X) ? 1 : 0)
+#define COND3(X) ((X) ? -1 : 0)
+#define COND4(X) ((X) ? 0 : 1)
+#define COND5(X) ((X) ? 0 : -1)
+
+#define TEST_LT(X, Y) ((X) < (Y))
+#define TEST_LE(X, Y) ((X) <= (Y))
+#define TEST_GT(X, Y) ((X) > (Y))
+#define TEST_GE(X, Y) ((X) >= (Y))
+#define TEST_EQ(X, Y) ((X) == (Y))
+#define TEST_NE(X, Y) ((X) != (Y))
+
+#define COUNT_LOOP(ID, TYPE, CMP_ARRAY, TEST, COUNT) \
+  TYPE \
+  reduc_##ID (__typeof__ (CMP_ARRAY[0]) x) \
+  { \
+    TYPE count = 0; \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      COUNT (TEST (CMP_ARRAY[i], x)); \
+    return count; \
+  }
+
+#define COND_LOOP(ID, ARRAY, CMP_ARRAY, TEST, COND) \
+  void \
+  plus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] += COND (TEST (CMP_ARRAY[i], x)); \
+  } \
+  void \
+  plusc_##ID (void) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] += COND (TEST (CMP_ARRAY[i], 10)); \
+  } \
+  void \
+  minus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] -= COND (TEST (CMP_ARRAY[i], x)); \
+  } \
+  void \
+  minusc_##ID (void) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] -= COND (TEST (CMP_ARRAY[i], 1)); \
+  }
+
+#define ALL_LOOPS(ID, ARRAY, CMP_ARRAY, TEST) \
+  typedef __typeof__(ARRAY[0]) ID##_type; \
+  COUNT_LOOP (ID##_1, ID##_type, CMP_ARRAY, TEST, COUNT1) \
+  COUNT_LOOP (ID##_2, ID##_type, CMP_ARRAY, TEST, COUNT2) \
+  COUNT_LOOP (ID##_3, ID##_type, CMP_ARRAY, TEST, COUNT3) \
+  COUNT_LOOP (ID##_4, ID##_type, CMP_ARRAY, TEST, COUNT4) \
+  COND_LOOP (ID##_1, ARRAY, CMP_ARRAY, TEST, COND1) \
+  COND_LOOP (ID##_2, ARRAY, CMP_ARRAY, TEST, COND2) \
+  COND_LOOP (ID##_3, ARRAY, CMP_ARRAY, TEST, COND3) \
+  COND_LOOP (ID##_4, ARRAY, CMP_ARRAY, TEST, COND4) \
+  COND_LOOP (ID##_5, ARRAY, CMP_ARRAY, TEST, COND5)
+
+signed int asi[1024] __attribute__ ((aligned (16)));
+unsigned int aui[1024] __attribute__ ((aligned (16)));
+signed long long asl[1024] __attribute__ ((aligned (16)));
+unsigned long long aul[1024] __attribute__ ((aligned (16)));
+float af[1024] __attribute__ ((aligned (16)));
+double ad[1024] __attribute__ ((aligned (16)));
+
+ALL_LOOPS (si_si, asi, asi, TEST_LT)
+ALL_LOOPS (ui_si, aui, asi, TEST_LE)
+ALL_LOOPS (si_ui, asi, aui, TEST_GT)
+ALL_LOOPS (ui_ui, aui, aui, TEST_GE)
+ALL_LOOPS (sl_sl, asl, asl, TEST_NE)
+ALL_LOOPS (ul_ul, aul, aul, TEST_EQ)
+ALL_LOOPS (si_f, asi, af, TEST_LE)
+ALL_LOOPS (ui_f, aui, af, TEST_GT)
+ALL_LOOPS (sl_d, asl, ad, TEST_GE)
+ALL_LOOPS (ul_d, aul, ad, TEST_GT)
+
+/* { dg-final { scan-assembler-not "\tand\t" } } */
+/* { dg-final { scan-assembler-not "\tld\[^\t\]*\t\[wx\]" } } */
+/* { dg-final { scan-assembler-not "\tst\[^\t\]*\t\[wx\]" } } */
+/* { dg-final { scan-assembler "\tldr\tq" } } */
+/* { dg-final { scan-assembler "\tstr\tq" } } */
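
For readers who would rather not expand the macros by hand, the following is
roughly what two of the loops generated by ALL_LOOPS (si_f, asi, af, TEST_LE)
look like.  This is a hand expansion for illustration (the si_f_type typedef
is written out as signed int), so it is worth checking against the output of
gcc -E before relying on it.

```c
/* Hand expansion of two loops from ALL_LOOPS (si_f, asi, af, TEST_LE);
   illustration only, not part of the patch.  */
signed int asi[1024] __attribute__ ((aligned (16)));
float af[1024] __attribute__ ((aligned (16)));

/* COUNT_LOOP with COUNT1: a conditional count reduction.  */
signed int
reduc_si_f_1 (float x)
{
  signed int count = 0;
  for (unsigned int i = 0; i < 1024; ++i)
    if (af[i] <= x)
      count += 1;
  return count;
}

/* COND_LOOP with COND2: an element-wise conditional increment.  */
void
plus_si_f_2 (float x)
{
  for (unsigned int i = 0; i < 1024; ++i)
    asi[i] += (af[i] <= x) ? 1 : 0;
}
```

Both loops add a 0-or-1 value derived from a comparison, which is the case
the scan-assembler-not "\tand\t" directive is meant to cover once the loops
are vectorized.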