Re: [PATCH] Fix result for conditional reductions matching at index 0

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jakub Jelinek <jakub@redhat.com>
To: Kilian Verhetsel <kilian.verhetsel@uclouvain.be>
Cc: Richard Biener <richard.guenther@gmail.com>,
	       Alan Hayward <Alan.Hayward@arm.com>,
	       GCC Patches <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>
Subject: Re: [PATCH] Fix result for conditional reductions matching at index 0
Date: Mon, 11 Dec 2017 13:11:00 -0000	[thread overview]
Message-ID: <20171211131134.GL2353@tucnak> (raw)
In-Reply-To: <87po7lleh4.fsf@uclouvain.be>

On Mon, Dec 11, 2017 at 11:56:55AM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > As it doesn't apply, I can't easily check what the patch generates
> > on the PR80631 testcases vs. my thoughts on that; though, if it emits
> > something more complicated for the simple cases, perhaps we could improve
> > those (not handle it like COND_REDUCTION, but instead as former
> > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> > if 0 isn't usable for the condition never matched value.
> 
> While you could use values different from 0, I'm not sure this can be
> done as efficiently.  0 is convenient because a single bitwise-and
> between the index vector and the condition will set lanes that do not
> contain a match to 0.

Of course it can be done efficiently, what we care most is that the body of
the vectorized loop is efficient.  Whether we choose -1, 0 or 124 as the
COND_EXPR not ever meant value matters only before that loop (when we need
to load that into a register holding vector of all those constants) and
then a scalar comparison on the REDUC_* result.  Load of -1 vector on some targets
is as expensive as load of 0, for arbitrary value worst case it is one
memory load compared to a specialized zero register (or set all bits)
instruction.  On the other side, by not using any offsetted iteration var,
one can reuse the vector register that holds the IV, which can be used in
some loops too and thus decrease register pressure.
And while comparison against 0 is sometimes one scalar insn
cheaper than comparison against other value, if the insn producing it
already sets the flags, I doubt it is the case here, so it is exactly the
same cost.  Not to mention that in your patch you are instead subtracting
one in the scalar code.

> Jakub Jelinek <jakub@redhat.com> writes:
> > First of all, I fail to see why we don't handle negative step,
> > that can be done with REDUC_MIN instead of REDUC_MAX.
> 
> That would not work without first using values different from 0 to
> indicate the absence of matches (except in cases where all indices are
> negative), which I assume is why the test was there in the first place.
> 
> Below is the patch with fixed formatting and changes to make it apply
> cleanly to r255537 (as far as I can tell this was simply caused by some
> variables and constants having been renamed).

Thanks, it applies cleanly now
> +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> +	    || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +		== INTEGER_INDUC_COND_REDUCTION))
> +	   && reduc_fn == IFN_LAST)Â»

contains a character at the end of line that makes it not to compile.

Trying to understand your patch, here is the difference with your patch
between additional:
--- tree-vect-loop.c	2017-12-11 13:39:35.619122907 +0100
+++ tree-vect-loop.c	2017-12-11 13:35:27.000000000 +0100
@@ -6021,8 +6021,8 @@ vectorizable_reduction (gimple *stmt, gi
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "condition expression based on "
 			     "integer induction.\n");
-	  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-	    = INTEGER_INDUC_COND_REDUCTION;
+/*	  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	    = INTEGER_INDUC_COND_REDUCTION; */
 	}
so that COND_REDUCTION is used, and the case with
INTEGER_INDUC_COND_REDUCTION with your patch on:
int v[256] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo ()
{
  int k, r = -1;
  for (k = 0; k < 256; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

   vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
   vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
+  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };
 
   <bb 3> [local count: 139586436]:
   # k_12 = PHI <k_8(9), 0(2)>
   # r_13 = PHI <r_3(9), -1(2)>
   # ivtmp_11 = PHI <ivtmp_2(9), 256(2)>
   # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 0, 1, 2, 3, 4, 5, 6, 7 }(2)>
-  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { -1, -1, -1, -1, -1, -1, -1, -1 }(2)>
+  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
   # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
-  # ivtmp_30 = PHI <ivtmp_31(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
-  # _32 = PHI <_33(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
-  # ivtmp_43 = PHI <ivtmp_44(9), 0(2)>
+  # ivtmp_31 = PHI <ivtmp_32(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
+  # _33 = PHI <_34(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
+  # ivtmp_41 = PHI <ivtmp_42(9), 0(2)>
   vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
   vect__1.4_27 = MEM[(int *)vectp_v.2_25];
   _1 = v[k_12];
   vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
   r_3 = _1 == 77 ? k_12 : r_13;
   k_8 = k_12 + 1;
   ivtmp_2 = ivtmp_11 - 1;
   vectp_v.2_26 = vectp_v.2_25 + 32;
-  _33 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, ivtmp_30, _32>;
-  ivtmp_31 = ivtmp_30 + { 8, 8, 8, 8, 8, 8, 8, 8 };
-  ivtmp_44 = ivtmp_43 + 1;
-  if (ivtmp_44 < 32)
+  _34 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, ivtmp_31, _33>;
+  ivtmp_32 = ivtmp_31 + { 8, 8, 8, 8, 8, 8, 8, 8 };
+  ivtmp_42 = ivtmp_41 + 1;
+  if (ivtmp_42 < 32)
     goto <bb 9>; [92.31%]
   else
     goto <bb 18>; [7.69%]

...
   <bb 18> [local count: 10737418]:
   # r_19 = PHI <r_3(3)>
-  # vect_r_3.6_34 = PHI <vect_r_3.6_29(3)>
-  # _45 = PHI <_33(3)>
-  _35 = REDUC_MAX (_45);
-  _36 = {_35, _35, _35, _35, _35, _35, _35, _35};
-  _37 = { 0, 0, 0, 0, 0, 0, 0, 0 };
-  _38 = _45 == _36;
-  _39 = VEC_COND_EXPR <_38, vect_r_3.6_34, _37>;
-  _40 = VIEW_CONVERT_EXPR<vector(8) unsigned int>(_39);
-  _41 = REDUC_MAX (_40);
-  _42 = (int) _41;
-  if (_42 != 0)
+  # vect_r_3.6_35 = PHI <vect_r_3.6_29(3)>
+  # _43 = PHI <_34(3)>
+  _36 = REDUC_MAX (_43);
+  _37 = {_36, _36, _36, _36, _36, _36, _36, _36};
+  _38 = _36 + 4294967295;
+  _39 = (int) _38;
+  stmp_r_3.7_40 = _36 == 0 ? -1 : _39;
+  if (stmp_r_3.7_40 != 0)

Nothing uses vect_cst__30, the only real change is using 0 instead of -1
as the starting value for vect_r_3.1_24, and the REDUC_MAX stuff, where
the most important is that vect_r_3.6_35 is completely ignored with
INTEGER_INDUC_COND_EXPR and thus we rely on DCE to remove it.

So, my preference would be your patch goes in without the Â»
character and then we incrementally add SIMPLE_INTEGER_INDUC_COND_EXPR
which will be used if vectorizable_reduction determines possibilities
to use some value smaller than all iterations, and then tweak the cases
that can be handled with your INTEGER_INDUC_COND_EXPR or
SIMPLE_INTEGER_INDUC_COND_EXPR instead of COND_EXPR (negative step,
variable base).

	Jakub

next prev parent reply	other threads:[~2017-12-11 13:11 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-21 11:41 Kilian Verhetsel
2017-11-21 14:06 ` Richard Biener
2017-11-21 16:49   ` Kilian Verhetsel
2017-11-21 17:09     ` Alan Hayward
2017-11-22  9:17     ` Richard Biener
2017-11-22 11:15       ` Alan Hayward
2017-11-22 15:07         ` Richard Biener
2017-11-22 17:23           ` Kilian Verhetsel
2017-11-23 10:30             ` Alan Hayward
2017-11-23 12:39               ` Richard Biener
2017-12-08 18:15             ` Jakub Jelinek
2017-12-11 10:57               ` Kilian Verhetsel
2017-12-11 13:11                 ` Jakub Jelinek [this message]
2017-12-11 13:51                   ` Jakub Jelinek
2017-12-11 17:00                     ` Kilian Verhetsel
2017-12-11 17:06                       ` Jakub Jelinek
2017-12-11 21:22                       ` [PATCH] Fix result for conditional reductions matching at index 0 (PR tree-optimization/80631) Jakub Jelinek
2017-12-12  7:57                         ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171211131134.GL2353@tucnak \
    --to=jakub@redhat.com \
    --cc=Alan.Hayward@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kilian.verhetsel@uclouvain.be \
    --cc=nd@arm.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).