[Bug tree-optimization/109088] GCC does not always vectorize conditional reduction

From: "juzhe.zhong at rivai dot ai" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
Date: Wed, 27 Sep 2023 07:34:41 +0000
Message-ID: <bug-109088-4-PcK74xZbZ2@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109088-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #9)
> (In reply to JuzheZhong from comment #8)
> > It's because the order of the operations we are doing:
> > 
> > For code as follows:
> > 
> > result += mask ? a[i] + x : 0;
> > 
> > GCC:
> > result_ssa_1 = PHI <result_ssa_2, 0>
> > ...
> > STMT 1. tmp = a[i] + x;
> > STMT 2. tmp2 = tmp + result_ssa_1;
> > STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
> > 
> > Here we can see both STMT 2 and STMT 3 are using 'result_ssa_1',
> > we end up with 2 uses of the PHI result. Then, we failed to vectorize.
> > 
> > Whereas LLVM:
> > 
> > result_ssa_1 = PHI <result_ssa_2, 0>
> > ...
> > IR 1. tmp = a[i] + x;
> > IR 2. tmp2 = mask ? tmp : 0;
> > IR 3. result_ssa_2 = tmp2 + result_ssa_1.
> 
> For floating point these are not equivalent (adding zero isn't a no-op).

Yes, I agree these are not equivalent for floating-point.
But they are equivalent if we specify -ffast-math.

I have double-checked LLVM: by default it also fails to vectorize the
conditional floating-point reduction.

However, if we pass -ffast-math to LLVM, it generates the same
if-conversion IR sequence as for integers, and vectorization then succeeds.
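
To make the floating-point difference concrete, here is a minimal C sketch of the two if-converted forms. The function names are made up for illustration; this is not code from GCC or LLVM. It assumes IEEE 754 arithmetic with the default rounding mode and no -ffast-math, in which case (-0.0) + 0.0 is +0.0, so adding the selected zero is not a no-op.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* GCC-style form: add first, then select between the new and old value.
   The old value of 'result' is used twice per iteration.  */
double reduce_select_last (const double *a, const bool *mask, size_t n,
                           double x, double init)
{
  assert (a != NULL && mask != NULL);
  double result = init;
  for (size_t i = 0; i < n; i++)
    {
      double tmp = a[i] + x;
      double tmp2 = tmp + result;
      result = mask[i] ? tmp2 : result;
    }
  return result;
}

/* LLVM-style form: select the addend or zero, then add unconditionally.
   The old value of 'result' is used only once per iteration.  */
double reduce_select_first (const double *a, const bool *mask, size_t n,
                            double x, double init)
{
  assert (a != NULL && mask != NULL);
  double result = init;
  for (size_t i = 0; i < n; i++)
    {
      double tmp = a[i] + x;
      double tmp2 = mask[i] ? tmp : 0.0;
      result = tmp2 + result;
    }
  return result;
}
```

With init = -0.0 and an all-false mask, the first form keeps the sign bit of the initial value while the second form loses it. That is one observable difference which -ffast-math allows the compiler to ignore.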

> 
> > LLVM only has 1 use.
> > 
> > Is it reasonable to swap the order in match.pd ?
> 
> if-conversion could be taught to swap this (it's if-conversion creating
> the IL for conditional reductions) when valid.  IIRC Robin Dapp also has
> a patch to make if-conversion emit .COND_ADD instead which should make
> it even better to vectorize.

I know that patch; Robin is trying to fix the issue (in-order reduction) that
I posted.

I have confirmed that the patch does not help here, since it does not modify
the code for this case: we still end up with multiple uses of the PHI result
in the conditional reduction.

The reduction fails because of this check:

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used outside of the inner loop we cannot handle uses of the reduction
     value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction used in loop.\n");
      return NULL;
    }

When nphi_def_loop_uses > 1, we fail to vectorize.
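
As a rough illustration of why the two statement orders differ for this check, the following toy C sketch counts how many operand slots use a given SSA name. The struct and function are hypothetical and only model the idea; they are not the vectorizer's real data structures or its actual counting code.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_OPS = 3 };

/* Hypothetical toy statement: one result and up to three operands.
   SSA names are encoded as ints; unused operand slots hold -1.  */
struct toy_stmt
{
  int lhs;
  int ops[MAX_OPS];
};

/* Count how many operand slots in the loop body use NAME.  */
size_t count_uses (const struct toy_stmt *stmts, size_t n, int name)
{
  assert (stmts != NULL || n == 0);
  size_t uses = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < MAX_OPS; j++)
      if (stmts[i].ops[j] == name)
        uses++;
  return uses;
}
```

Encoding the GCC sequence (STMT 1 to STMT 3) this way gives two uses of result_ssa_1, while the LLVM sequence (IR 1 to IR 3) gives one.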

I have checked the LLVM code, and I think we can extend this function:

strip_nop_cond_scalar_reduction

We should be able to strip statements one by one until we reach the
use of the PHI result.

For example, LLVM is able to handle this case:

for ()
  if (cond)
    result += a[i] + b[i] + c[i] + .... 

No matter how many variables are added in the conditional reduction,
LLVM handles it, because it keeps iterating over the statements until it
reaches the PHI result:

result_ssa_1 = PHI <>
tmp1 = result_ssa_1 + a[i];
tmp2 = tmp1 + b[i];
tmp3 = tmp2 + c[i];
....

We keep iterating until we find result_ssa_1, which holds the reduction variable.
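
The walk I have in mind can be sketched in C on a toy representation. Everything here (struct add_stmt, find_def, chain_reaches_phi) is a hypothetical illustration, neither LLVM's nor GCC's real code, and it leaves out checks a real implementation would need, such as verifying that each intermediate value has a single use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy statement: lhs = op0 + op1, SSA names encoded as ints.  */
struct add_stmt
{
  int lhs;
  int op0;
  int op1;
};

/* Return the statement defining NAME, or NULL if the block has none.  */
static const struct add_stmt *
find_def (const struct add_stmt *stmts, size_t n, int name)
{
  for (size_t i = 0; i < n; i++)
    if (stmts[i].lhs == name)
      return &stmts[i];
  return NULL;
}

/* Starting at START, follow the chain of additions backwards until a
   statement uses PHI_RESULT directly.  Return 1 if found, 0 otherwise.  */
int chain_reaches_phi (const struct add_stmt *stmts, size_t n,
                       int start, int phi_result)
{
  assert (stmts != NULL || n == 0);
  int cur = start;
  /* Each statement is visited at most once, so N steps are enough.  */
  for (size_t step = 0; step < n; step++)
    {
      const struct add_stmt *def = find_def (stmts, n, cur);
      if (def == NULL)
        return 0;
      if (def->op0 == phi_result || def->op1 == phi_result)
        return 1;
      /* Continue through whichever operand is itself defined by an add.  */
      cur = find_def (stmts, n, def->op0) != NULL ? def->op0 : def->op1;
    }
  return 0;
}
```

For the tmp1/tmp2/tmp3 chain above, starting from tmp3 the walk visits tmp2 and then tmp1, where it finds result_ssa_1.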

Is LLVM's approach reasonable for GCC?

If yes, I can port the LLVM code to GCC.

Thanks.
