Re: [RFC][PR82479] missing popcount builtin detection

From: "Bin.Cheng" <amker.cheng@gmail.com>
To: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Cc: Richard Biener <richard.guenther@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC][PR82479] missing popcount builtin detection
Date: Thu, 17 May 2018 10:05:00 -0000
Message-ID: <CAHFci2_4J3MgyjNDUPkBknTORQ8Uq1nExnN=KzFLnU_pQ47BOg@mail.gmail.com>
In-Reply-To: <CAELXzTNWmeULCBA8rgn5zq3UZHrNP0qK4dW_0gYHTqxyDy9CJA@mail.gmail.com>

On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,
>
> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
>> <kugan.vivekanandarajah@linaro.org> wrote:
>>> Hi Richard,
>>>
>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:
>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Thanks for the review.
>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I
>>>>>>>>> would like to queue this for review for next stage 1.
>>>>>>>>>
>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.
>>>>>>>>> 2. This does not distribute loop to detect popcount (like
>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct
>>>>>>>>> me if I am wrong.
>>>>>>>>
>>>>>>>> But then it has no business inside loop distribution but instead is
>>>>>>>> doing final value
>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think
>>>>>>>> final value replacement would already do the correct thing if you
>>>>>>>> teached number of
>>>>>>>> iteration analysis that niter for
>>>>>>>>
>>>>>>>>   <bb 3> [local count: 955630224]:
>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>
>>>>>>>>   _1 = b_11 + -1;
>>>>>>>>   b_8 = _1 & b_11;
>>>>>>>>   if (b_8 != 0)
>>>>>>>>     goto <bb 6>; [89.00%]
>>>>>>>>   else
>>>>>>>>     goto <bb 8>; [11.00%]
>>>>>>>>
>>>>>>>>   <bb 6> [local count: 850510900]:
>>>>>>>>   goto <bb 3>; [100.00%]
>>>>>>>
>>>>>>> I am looking into this approach. What should be the scalar evolution
>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
>>>>>>> and can this be represented with the scev?
>>>>>>
>>>>>> No, it's not affine and thus cannot be represented.  You only need the
>>>>>> scalar evolution of the counting IV which is already handled and
>>>>>> the number of iteration analysis needs to handle the above IV - this
>>>>>> is the missing part.
>>>>> Thanks for the clarification. I am now matching this loop pattern in
>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions
>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in
>>>>> the loop preheater and setting the loop niter with this. This will be
>>>>> used by the final value replacement. Is this what you wanted?
>>>>
>>>> No, you shouldn't insert a popcount stmt but instead the niter
>>>> GENERIC tree should be a CALL_EXPR to popcount with the
>>>> appropriate argument.
>>>
>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
>>> niter in tree_niter_desc can take such.
>>>
>>> Attached patch now does this. Also had to add support for CALL_EXPR in
>>> few places to handle niter with CALL_EXPR. Does this look OK?
>>
>> Overall this looks ok - the patch includes changes in places that I don't think
>> need changes such as chrec_convert_1 or extract_ops_from_tree.
>> The expression_expensive_p change should be more specific than making
>> all calls inexpensive as well.
>
> Changed it.
>
>>
>> The verify_ssa change looks bogus, you do
>>
>> +  dest = gimple_phi_result (count_phi);
>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
>> +
>> +  var = build_call_expr (fn, 1, src);
>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
>> +                       build_int_cst (TREE_TYPE (dest), 1));
>>
>> why do you allocate a new SSA name here?  It seems unused
>> as you overwrive 'var' with the CALL_EXPR immediately.
> Changed now.
>
>>
>> I didn't review the pattern matching thoroughly nor the exact place you
>> call it.  But
>>
>> +      if (check_popcount_pattern (loop, &count))
>> +       {
>> +         niter->assumptions = boolean_false_node;
>> +         niter->control.base = NULL_TREE;
>> +         niter->control.step = NULL_TREE;
>> +         niter->control.no_overflow = false;
>> +         niter->niter = count;
>> +         niter->assumptions = boolean_true_node;
>> +         niter->may_be_zero = boolean_false_node;
>> +         niter->max = -1;
>> +         niter->bound = NULL_TREE;
>> +         niter->cmp = ERROR_MARK;
>> +         return true;
>> +       }
>>
>> simply setting may_be_zero to false looks fishy.
> Should I set this to (argument to popcount == zero)?
No, I think that's unnecessary.  The number of iterations is computed
as: may_be_zero ? 0 : niter;
Here niter itself evaluates to zero when the popcount argument is zero,
so niters is computed correctly even with may_be_zero set to false.

I think the point is that may_be_zero being false doesn't imply that
niters is non-zero.
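
To make this concrete, here is a small stand-alone C sketch (not GCC
code).  The helper names iterations_by_loop, final_niters and niter_expr
are made up for illustration, and it assumes the loop form without
header copying, where the niter expression is simply popcount (b):

```c
#include <stdbool.h>

/* Count how many times the body of "while (b) b &= b - 1;" runs.  */
static unsigned
iterations_by_loop (unsigned b)
{
  unsigned n = 0;
  while (b)
    {
      b &= b - 1;   /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* Model of how the final count is derived from a niter descriptor:
   may_be_zero ? 0 : niter.  */
static unsigned
final_niters (bool may_be_zero, unsigned niter)
{
  return may_be_zero ? 0 : niter;
}

/* Hypothetical stand-in for the niter expression built for this loop.  */
static unsigned
niter_expr (unsigned b)
{
  return (unsigned) __builtin_popcount (b);
}
```

With may_be_zero fixed to false, final_niters (false, niter_expr (0))
still yields 0, which matches iterations_by_loop (0), because the
expression itself is zero for a zero argument.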

>
>> Try with -fno-tree-loop-ch.
> I changed the pattern matching to handle loop without header copying
> too. Looks a bit complicated checking all the conditions. Wondering if
> this can be done in a simpler and easier to read way.
>
>>  Also max should not be negative,
>> it should be the number of bits in the IV type?
>
> Changed this too.
>>
>> A related testcase could be that we can completely peel
>> a loop like the following which iterates at most 8 times:
>>
>> int a[8];
>> void foo (unsigned char ctrl)
>> {
>>   int c = 0;
>>   while (ctrl)
>>     {
>>        ctrl = ctrl & (ctrl - 1);
>>        a[c++] = ctrl;
>>     }
>> }
>
> Hmm, this is an interesting test case but as I am now trying to match
> a loop which does popcount, this is not handled.
>
>
> Attaching the current version for review.
Here are some comments.

> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 7a54c5f..f390321 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
>    return (!integer_zerop (niter->assumptions));
>  }
>
> +/* See if LOOP is a popcout implementation of the form
> +
> +    int c = 0;
> +    while (b) {
> +    b = b & (b - 1);
> +    c++;
> +    }
> +
> +    If so, Set NITER to  __builtin_popcount (b) - 1
> +    return true if we did, false otherwise.  */
> +
> +static bool
> +check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max)
> +{
> +  tree lhs, rhs;
> +  tree dest;
> +  gimple *and_minus_one;
> +  gimple *phi;
> +  int count = 0;
> +  gimple *count_stmt = NULL;
> +  bool adjust = true;
> +
> +  if (!single_exit (loop))
> +    return false;
> +
> +  /* Check loop terminating branch is like
> +     if (b != 0).  */
> +  gimple *stmt = last_stmt (loop->header);
> +  if (!stmt
> +      || gimple_code (stmt) != GIMPLE_COND
> +      || !zerop (gimple_cond_rhs (stmt)))
The check doesn't fully match the comment: NE_EXPR is not checked here.
Also, the "(TREE_CODE (lhs) != SSA_NAME)" check below can be moved up to
this point, so that the simple checks come first.

> +    return false;
> +
> +  /* Check the loop closed SSA definition for just the variable c defined in
> +     loop.  */
> +  basic_block bb = single_exit (loop)->dest;
single_exit is called several times; could you call it once and use
the returned edge instead?  I am not sure GCC is smart enough to
remove the repeated calls.  Same for loop_latch_edge.

> +  for (gphi_iterator gpi = gsi_start_phis (bb);
> +       !gsi_end_p (gpi); gsi_next (&gpi))
> +    {
> +      phi = gpi.phi ();
> +      count++;
> +    }
> +
> +  if (count != 1)
> +    return false;
> +
> +  rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx);
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +  count_stmt = SSA_NAME_DEF_STMT (rhs);
> +  lhs = gimple_cond_lhs (stmt);
> +  if (TREE_CODE (lhs) != SSA_NAME)
> +    return false;
> +  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
> +
> +  /* Depending on copy-header is performed, feeding PHI stmts might be in
> +     the loop header or loop exit, handle this.  */
> +  if (gimple_code (count_stmt) == GIMPLE_PHI)
> +    {
> +      tree t;
> +      if (gimple_code (and_stmt) != GIMPLE_PHI
> +      || gimple_bb (and_stmt) != single_exit (loop)->src
> +      || gimple_bb (count_stmt) != single_exit (loop)->src)
> +    return false;
> +      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);
> +      if (TREE_CODE (t) != SSA_NAME)
> +    return false;
> +      count_stmt = SSA_NAME_DEF_STMT (t);
> +      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
> +      if (TREE_CODE (t) != SSA_NAME)
> +    return false;
> +      and_stmt = SSA_NAME_DEF_STMT (t);
> +      adjust = false;
> +    }
> +
> +  /* Make sure we have a count by one.  */
> +  if (!is_gimple_assign (count_stmt)
> +      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)
> +      || !integer_onep (gimple_assign_rhs2 (count_stmt)))
> +    return false;
> +
> +  /* Cheeck "b = b & (b - 1)" is calculated.  */
Typo: "Cheeck" should be "Check".

> +  if (!is_gimple_assign (and_stmt)
> +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
> +    return false;
> +
> +  lhs = gimple_assign_rhs1 (and_stmt);
> +  rhs = gimple_assign_rhs2 (and_stmt);
> +  if (TREE_CODE (lhs) == SSA_NAME
> +      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
> +      && is_gimple_assign (and_minus_one)
> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
> +      lhs = rhs;
> +  else if (TREE_CODE (rhs) == SSA_NAME
> +      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
> +      && is_gimple_assign (and_minus_one)
> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
Could you avoid duplication by factoring the condition into an inline
function?  They are exactly the same for lhs/rhs.

> +      ;
> +  else
> +    return false;
> +
> +  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
> +      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))
Here you already have lhs set correctly, so you don't need to check the
and_stmt operands directly.  You can even merge this check into the one
above.

> +    return false;
> +
> +  /* Check the recurrence.  */
> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
> +  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
> +  if (gimple_code (phi) != GIMPLE_PHI
> +      || gimple_code (src_phi) != GIMPLE_PHI)
I think this is redundant since lhs is equal to
gimple_assign_rhs1 (and_minus_one).  So phi == src_phi is always true?

> +    return false;
> +
> +  dest = gimple_assign_lhs (count_stmt);
> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
> +  if (adjust)
> +    *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),
> +              build_call_expr (fn, 1, src),
> +              build_int_cst (TREE_TYPE (dest), 1));
> +  else
> +    *niter = build_call_expr (fn, 1, src);
> +  *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
> +  return true;
> +}
> +
> +
>  /* Like number_of_iterations_exit_assumptions, but return TRUE only if
>     the niter information holds unconditionally.  */
>
> @@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge exit,
>    gcond *stmt;
>    if (!number_of_iterations_exit_assumptions (loop, exit, niter,
>                            &stmt, every_iteration))
> -    return false;
> +    {
> +      tree count;
> +      HOST_WIDE_INT max;
> +      if (check_popcount_pattern (loop, &count, &max))
> +    {
> +      niter->assumptions = boolean_false_node;
> +      niter->control.base = NULL_TREE;
> +      niter->control.step = NULL_TREE;
> +      niter->control.no_overflow = false;
> +      niter->niter = count;
> +      niter->assumptions = boolean_true_node;
> +      niter->may_be_zero = boolean_false_node;
> +      niter->max = max;
> +      niter->bound = NULL_TREE;
> +      niter->cmp = ERROR_MARK;
> +      return true;
> +    }
Better to merge this compound statement into check_popcount_pattern
and rename it to something like number_of_iterations_popcount.

I wonder whether the less efficient popcount version should be checked
too, like:

int count = 0;
while (x)
{
  count += x & 1;
  x = x >> 1;
}
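
For comparison, a stand-alone C sketch of the two forms (not GCC code;
the function names and the iters out-parameter are made up for
illustration).  Both produce the same count, but the shift-based loop
runs once per bit position up to the highest set bit rather than once
per set bit, so its iteration count is not popcount (x) and it would
presumably need separate handling:

```c
/* Form matched by the patch: clears the lowest set bit each time, so the
   body runs once per set bit.  */
static int
popcount_clear_lowest (unsigned x, int *iters)
{
  int count = 0;
  *iters = 0;
  while (x)
    {
      x = x & (x - 1);
      count++;
      (*iters)++;
    }
  return count;
}

/* Shift-based form: tests one bit per iteration, so the body runs once
   per bit position up to and including the highest set bit.  */
static int
popcount_shift (unsigned x, int *iters)
{
  int count = 0;
  *iters = 0;
  while (x)
    {
      count += x & 1;
      x = x >> 1;
      (*iters)++;
    }
  return count;
}
```

For x = 0x90 both functions return 2, but the first loop iterates twice
while the second iterates 8 times.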

Thanks,
bin

> +      return false;
> +    }
>
>    if (integer_nonzerop (niter->assumptions))
>      return true;
> --
> 2.7.4
>
