From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-468907-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 89590 invoked by alias); 11 Dec 2017 13:11:45 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 89573 invoked by uid 89); 11 Dec 2017 13:11:44 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-16.4 required=5.0 tests=BAYES_00,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_NUMSUBJECT,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=k_8
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 11 Dec 2017 13:11:42 +0000
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id 35EB92D1EF9;	Mon, 11 Dec 2017 13:11:41 +0000 (UTC)
Received: from tucnak.zalov.cz (ovpn-116-34.ams2.redhat.com [10.36.116.34])	by smtp.corp.redhat.com (Postfix) with ESMTPS id 965B62D076;	Mon, 11 Dec 2017 13:11:40 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])	by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id vBBDBauk007078;	Mon, 11 Dec 2017 14:11:37 +0100
Received: (from jakub@localhost)	by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id vBBDBYPl007077;	Mon, 11 Dec 2017 14:11:34 +0100
Date: Mon, 11 Dec 2017 13:11:00 -0000
From: Jakub Jelinek <jakub@redhat.com>
To: Kilian Verhetsel <kilian.verhetsel@uclouvain.be>
Cc: Richard Biener <richard.guenther@gmail.com>,        Alan Hayward <Alan.Hayward@arm.com>,        GCC Patches <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>
Subject: Re: [PATCH] Fix result for conditional reductions matching at index 0
Message-ID: <20171211131134.GL2353@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <87d14brhj6.fsf@uclouvain.be> <CAFiYyc3VrW4bVB9WXQ4yMhPRn_FpV6T3vAA=6YLVi5CdZopHEg@mail.gmail.com> <87zi7fbn07.fsf@uclouvain.be> <CAFiYyc2GcuaKcD8XgfT0XkFd=gCRFiaM=eHMXFTK1P1Ct4rWzw@mail.gmail.com> <E4166ABC-97D0-4A48-B1C6-5EAB06CC1979@arm.com> <CAFiYyc2jzbC4x-h=+8ksNz4Oc90U4XMqrhqNHnwf+Tj=84Vckg@mail.gmail.com> <87wp2ib6aj.fsf@uclouvain.be> <20171208181501.GB2353@tucnak> <87po7lleh4.fsf@uclouvain.be>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <87po7lleh4.fsf@uclouvain.be>
User-Agent: Mutt/1.7.1 (2016-10-04)
X-IsSubscribed: yes
X-SW-Source: 2017-12/txt/msg00610.txt.bz2

On Mon, Dec 11, 2017 at 11:56:55AM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > As it doesn't apply, I can't easily check what the patch generates
> > on the PR80631 testcases vs. my thoughts on that; though, if it emits
> > something more complicated for the simple cases, perhaps we could improve
> > those (not handle it like COND_REDUCTION, but instead as former
> > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> > if 0 isn't usable for the condition never matched value.
> 
> While you could use values different from 0, I'm not sure this can be
> done as efficiently.  0 is convenient because a single bitwise-and
> between the index vector and the condition will set lanes that do not
> contain a match to 0.

Of course it can be done efficiently, what we care most is that the body of
the vectorized loop is efficient.  Whether we choose -1, 0 or 124 as the
COND_EXPR not ever meant value matters only before that loop (when we need
to load that into a register holding vector of all those constants) and
then a scalar comparison on the REDUC_* result.  Load of -1 vector on some targets
is as expensive as load of 0, for arbitrary value worst case it is one
memory load compared to a specialized zero register (or set all bits)
instruction.  On the other side, by not using any offsetted iteration var,
one can reuse the vector register that holds the IV, which can be used in
some loops too and thus decrease register pressure.
And while comparison against 0 is sometimes one scalar insn
cheaper than comparison against other value, if the insn producing it
already sets the flags, I doubt it is the case here, so it is exactly the
same cost.  Not to mention that in your patch you are instead subtracting
one in the scalar code.

> Jakub Jelinek <jakub@redhat.com> writes:
> > First of all, I fail to see why we don't handle negative step,
> > that can be done with REDUC_MIN instead of REDUC_MAX.
> 
> That would not work without first using values different from 0 to
> indicate the absence of matches (except in cases where all indices are
> negative), which I assume is why the test was there in the first place.
> 
> Below is the patch with fixed formatting and changes to make it apply
> cleanly to r255537 (as far as I can tell this was simply caused by some
> variables and constants having been renamed).

Thanks, it applies cleanly now
> +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> +	    || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +		== INTEGER_INDUC_COND_REDUCTION))
> +	   && reduc_fn == IFN_LAST)»

contains a character at the end of line that makes it not to compile.

Trying to understand your patch, here is the difference with your patch
between additional:
--- tree-vect-loop.c	2017-12-11 13:39:35.619122907 +0100
+++ tree-vect-loop.c	2017-12-11 13:35:27.000000000 +0100
@@ -6021,8 +6021,8 @@ vectorizable_reduction (gimple *stmt, gi
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "condition expression based on "
 			     "integer induction.\n");
-	  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-	    = INTEGER_INDUC_COND_REDUCTION;
+/*	  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	    = INTEGER_INDUC_COND_REDUCTION; */
 	}
so that COND_REDUCTION is used, and the case with
INTEGER_INDUC_COND_REDUCTION with your patch on:
int v[256] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo ()
{
  int k, r = -1;
  for (k = 0; k < 256; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

   vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
   vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
+  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };
 
   <bb 3> [local count: 139586436]:
   # k_12 = PHI <k_8(9), 0(2)>
   # r_13 = PHI <r_3(9), -1(2)>
   # ivtmp_11 = PHI <ivtmp_2(9), 256(2)>
   # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 0, 1, 2, 3, 4, 5, 6, 7 }(2)>
-  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { -1, -1, -1, -1, -1, -1, -1, -1 }(2)>
+  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
   # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
-  # ivtmp_30 = PHI <ivtmp_31(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
-  # _32 = PHI <_33(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
-  # ivtmp_43 = PHI <ivtmp_44(9), 0(2)>
+  # ivtmp_31 = PHI <ivtmp_32(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
+  # _33 = PHI <_34(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
+  # ivtmp_41 = PHI <ivtmp_42(9), 0(2)>
   vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
   vect__1.4_27 = MEM[(int *)vectp_v.2_25];
   _1 = v[k_12];
   vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
   r_3 = _1 == 77 ? k_12 : r_13;
   k_8 = k_12 + 1;
   ivtmp_2 = ivtmp_11 - 1;
   vectp_v.2_26 = vectp_v.2_25 + 32;
-  _33 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, ivtmp_30, _32>;
-  ivtmp_31 = ivtmp_30 + { 8, 8, 8, 8, 8, 8, 8, 8 };
-  ivtmp_44 = ivtmp_43 + 1;
-  if (ivtmp_44 < 32)
+  _34 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, ivtmp_31, _33>;
+  ivtmp_32 = ivtmp_31 + { 8, 8, 8, 8, 8, 8, 8, 8 };
+  ivtmp_42 = ivtmp_41 + 1;
+  if (ivtmp_42 < 32)
     goto <bb 9>; [92.31%]
   else
     goto <bb 18>; [7.69%]

...
   <bb 18> [local count: 10737418]:
   # r_19 = PHI <r_3(3)>
-  # vect_r_3.6_34 = PHI <vect_r_3.6_29(3)>
-  # _45 = PHI <_33(3)>
-  _35 = REDUC_MAX (_45);
-  _36 = {_35, _35, _35, _35, _35, _35, _35, _35};
-  _37 = { 0, 0, 0, 0, 0, 0, 0, 0 };
-  _38 = _45 == _36;
-  _39 = VEC_COND_EXPR <_38, vect_r_3.6_34, _37>;
-  _40 = VIEW_CONVERT_EXPR<vector(8) unsigned int>(_39);
-  _41 = REDUC_MAX (_40);
-  _42 = (int) _41;
-  if (_42 != 0)
+  # vect_r_3.6_35 = PHI <vect_r_3.6_29(3)>
+  # _43 = PHI <_34(3)>
+  _36 = REDUC_MAX (_43);
+  _37 = {_36, _36, _36, _36, _36, _36, _36, _36};
+  _38 = _36 + 4294967295;
+  _39 = (int) _38;
+  stmp_r_3.7_40 = _36 == 0 ? -1 : _39;
+  if (stmp_r_3.7_40 != 0)

Nothing uses vect_cst__30, the only real change is using 0 instead of -1
as the starting value for vect_r_3.1_24, and the REDUC_MAX stuff, where
the most important is that vect_r_3.6_35 is completely ignored with
INTEGER_INDUC_COND_EXPR and thus we rely on DCE to remove it.

So, my preference would be your patch goes in without the »
character and then we incrementally add SIMPLE_INTEGER_INDUC_COND_EXPR
which will be used if vectorizable_reduction determines possibilities
to use some value smaller than all iterations, and then tweak the cases
that can be handled with your INTEGER_INDUC_COND_EXPR or
SIMPLE_INTEGER_INDUC_COND_EXPR instead of COND_EXPR (negative step,
variable base).

	Jakub