[Bug c/112457] New: Possible better vectorization of different reduction min/max reduction

public inbox for gcc-bugs@sourceware.org

* [Bug c/112457] New: Possible better vectorization of different reduction min/max reduction
@ 2023-11-09 12:27 juzhe.zhong at rivai dot ai
  2023-11-09 12:29 ` [Bug c/112457] " juzhe.zhong at rivai dot ai
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-09 12:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

            Bug ID: 112457
           Summary: Possible better vectorization of different reduction
                    min/max reduction
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Hi, Richard.

GCC 14 supports almost all features of RVV.

I plan to help improve the GCC loop vectorizer in GCC 15.

Fixing the TSVC FAILs is one of my goals.

Currently we can vectorize the following case:

int idx = 0;
int max = 0;
void foo (int n, int * __restrict a)
{
  for (int i = 0; i < n; ++i) {
    max = max < a[i] ? a[i] : max;
  }
}

However, if we rewrite it as follows, vectorization fails:

void foo2 (int n, int * __restrict a)
{
  for (int i = 0; i < n; ++i) {
    if (max < a[i]) {
      max = a[i];
    } else
      max = max;
  }
}
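
For reference, a minimal sketch showing that the two loop shapes compute the
same value, so the difference lies only in how the vectorizer sees them. The
names max_ternary and max_branch are hypothetical, and max is passed as a
parameter instead of being a global, which sidesteps the store to the global
and is only meant to show semantic equivalence:

```c
/* Ternary form, the shape of foo above.  */
int max_ternary (int max, int n, const int *a)
{
  for (int i = 0; i < n; ++i)
    max = max < a[i] ? a[i] : max;
  return max;
}

/* Branch form, the shape of foo2 above, including the no-op else arm.  */
int max_branch (int max, int n, const int *a)
{
  for (int i = 0; i < n; ++i) {
    if (max < a[i])
      max = a[i];
    else
      max = max;
  }
  return max;
}
```

Calling both with the same start value and array should always give the same
result.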

I also noticed another interesting possible vectorization enhancement, which
is inspired by this LLVM patch:
https://reviews.llvm.org/D143465

A more advanced case, taken from the LLVM patch, is a max reduction that also
tracks the index:

void foo3 (int n, int * __restrict a)
{
  for (int i = 0; i < n; ++i) {
    if (max < a[i]) {
      idx = i;
      max = a[i];
    }
  }
}
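
As a scalar reference for what any vectorized version has to preserve: because
the comparison is a strict <, idx ends up as the first position holding the
maximum. A minimal sketch, where struct max_idx and max_with_index are
hypothetical names used in place of the globals:

```c
/* Hypothetical result pair; the loop above uses the globals max and idx.  */
struct max_idx { int max; int idx; };

/* Same loop as foo3, returning the pair instead of writing globals.  */
struct max_idx max_with_index (int n, const int *a)
{
  struct max_idx r = { 0, 0 };   /* matches the initial max = 0, idx = 0 */
  for (int i = 0; i < n; ++i) {
    if (r.max < a[i]) {
      r.idx = i;
      r.max = a[i];
    }
  }
  return r;
}
```

With a = {3, 7, 2, 7} this gives max = 7 and idx = 1, not 3, so ties have to
be resolved toward the lowest index.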

Is this a valuable optimization? If so, I will add it to my TODO list.

Thanks.


* [Bug c/112457] Possible better vectorization of different reduction min/max reduction
  2023-11-09 12:27 [Bug c/112457] New: Possible better vectorization of different reduction min/max reduction juzhe.zhong at rivai dot ai
@ 2023-11-09 12:29 ` juzhe.zhong at rivai dot ai
  2023-11-09 12:45 ` [Bug tree-optimization/112457] " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-09 12:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Reference: https://godbolt.org/z/9M1jWzMdx


* [Bug tree-optimization/112457] Possible better vectorization of different reduction min/max reduction
  2023-11-09 12:27 [Bug c/112457] New: Possible better vectorization of different reduction min/max reduction juzhe.zhong at rivai dot ai
  2023-11-09 12:29 ` [Bug c/112457] " juzhe.zhong at rivai dot ai
@ 2023-11-09 12:45 ` rguenth at gcc dot gnu.org
  2024-01-02  8:58 ` juzhe.zhong at rivai dot ai
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-09 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |tree-optimization
             Blocks|                            |53947

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, this is because MAX_EXPR detection fails when store motion inserts flags
(the max = max is elided) to avoid store-data races.  Also, with -Ofast we
avoid the flags, but then the next phiopt pass comes too late to discover the
MAX after store motion is applied.
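
To illustrate the store-motion point, here is a hand-written sketch of roughly
what the loop looks like once the store to the global max is moved out of the
loop and guarded by a flag. This is not actual GCC output; the names
foo2_after_store_motion, max_tmp and stored are hypothetical:

```c
#include <stdbool.h>

int max = 0;

/* Hand-written approximation of foo2 after store motion: the global is
   read once, updated in a local, and written back only if the loop
   actually stored to it, so no store is introduced on paths where the
   original program had none.  */
void foo2_after_store_motion (int n, int * __restrict a)
{
  int max_tmp = max;      /* hypothetical name for the hoisted value */
  bool stored = false;    /* hypothetical name for the inserted flag */
  for (int i = 0; i < n; ++i) {
    if (max_tmp < a[i]) {
      max_tmp = a[i];
      stored = true;
    }
  }
  if (stored)
    max = max_tmp;
}
```

The flag update inside the conditional means the loop body is no longer a
plain "max = max < a[i] ? a[i] : max" candidate.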

The more practical example is

int foo2 (int max, int n, int * __restrict a)
{
  for (int i = 0; i < n; ++i)
    if (max < a[i]) {
        max = a[i];
    }
  return max;
}

and that's handled OK.  For your second example, the index reduction, there
are already bug reports.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/112457] Possible better vectorization of different reduction min/max reduction
  2023-11-09 12:27 [Bug c/112457] New: Possible better vectorization of different reduction min/max reduction juzhe.zhong at rivai dot ai
  2023-11-09 12:29 ` [Bug c/112457] " juzhe.zhong at rivai dot ai
  2023-11-09 12:45 ` [Bug tree-optimization/112457] " rguenth at gcc dot gnu.org
@ 2024-01-02  8:58 ` juzhe.zhong at rivai dot ai
  2024-01-02 16:03 ` xry111 at gcc dot gnu.org
  2024-01-08  9:00 ` rguenth at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-01-02  8:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Created attachment 56973
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56973&action=edit
min/max reduction approach with index

Hi, Richi.

I have watched all the slides and videos from the 2023 LLVM development
meeting.

It turns out they already have a feasible approach for supporting min/max
reduction with index.

Is it OK if I implement it by following the LLVM approach?

The attachment contains the slides from the LLVM development meeting that
cover min/max reduction with index.

Thanks.


* [Bug tree-optimization/112457] Possible better vectorization of different reduction min/max reduction
  2023-11-09 12:27 [Bug c/112457] New: Possible better vectorization of different reduction min/max reduction juzhe.zhong at rivai dot ai
                   ` (2 preceding siblings ...)
  2024-01-02  8:58 ` juzhe.zhong at rivai dot ai
@ 2024-01-02 16:03 ` xry111 at gcc dot gnu.org
  2024-01-08  9:00 ` rguenth at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: xry111 at gcc dot gnu.org @ 2024-01-02 16:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #4 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
There is also:

double
test (double *p)
{
  double ret = p[0];
  for (int i = 1; i < 4; i++)
    ret = __builtin_fmin (ret, p[i]);
  return ret;
}

This is not vectorized.

And

double
test (double *p)
{
  double ret = __builtin_inf(); /* or __builtin_nan("") */
  for (int i = 0; i < 4; i++)
    ret = __builtin_fmin (ret, p[i]);
  return ret;
}

is compiled to:

  _16 = .REDUC_FMIN (vect__4.7_17);
  _22 = .REDUC_FMIN ({  Inf,  Inf,  Inf,  Inf }); 
  _20 = .FMIN (_16, _22); [tail call]
  return _20;

So there is a redundant .FMIN operation.


* [Bug tree-optimization/112457] Possible better vectorization of different reduction min/max reduction
  2023-11-09 12:27 [Bug c/112457] New: Possible better vectorization of different reduction min/max reduction juzhe.zhong at rivai dot ai
                   ` (3 preceding siblings ...)
  2024-01-02 16:03 ` xry111 at gcc dot gnu.org
@ 2024-01-08  9:00 ` rguenth at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-08  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
You want to find the duplicate bug report for the min/max + index reductions.
IIRC the issue is that we fail the reduction detection because of multi-use,
and we should really have two conditional reductions, one on the value and
one on the index, without trying to be too clever by combining them into a
single one.

That is, don't try to invent something completely new based on what LLVM does,
but understand what's missing in GCC's handling of conditional reductions
(it can do conditional value and conditional index reductions just fine,
just not both at the same time, IIRC).
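
To make the "two conditional reductions" idea concrete, here is a scalar
simulation of one possible lane-wise scheme. It is only an illustration under
assumed names (VF, max_index_lanes) and an assumed epilogue; it is neither
GCC's implementation nor the LLVM approach from the slides:

```c
enum { VF = 4 };   /* assumed vectorization factor, for illustration only */

/* Scalar simulation of a lane-wise max-with-index reduction: each lane
   keeps its own conditional value reduction and conditional index
   reduction, both driven by the same per-lane comparison.  */
void max_index_lanes (int n, const int *a, int *max, int *idx)
{
  int lane_max[VF], lane_idx[VF];
  for (int l = 0; l < VF; ++l) {
    lane_max[l] = *max;
    lane_idx[l] = *idx;
  }

  /* Element i is handled by lane i % VF, as in a vector loop.  */
  for (int i = 0; i < n; ++i) {
    int l = i % VF;
    if (lane_max[l] < a[i]) {
      lane_max[l] = a[i];   /* conditional value reduction */
      lane_idx[l] = i;      /* conditional index reduction */
    }
  }

  /* Assumed epilogue: take the largest lane value and, among lanes that
     hold it, the smallest index, to match the scalar loop on ties.  */
  int best_max = lane_max[0], best_idx = lane_idx[0];
  for (int l = 1; l < VF; ++l) {
    if (lane_max[l] > best_max
        || (lane_max[l] == best_max && lane_idx[l] < best_idx)) {
      best_max = lane_max[l];
      best_idx = lane_idx[l];
    }
  }
  *max = best_max;
  *idx = best_idx;
}
```

The sketch only shows that the value and the index can be carried as two
separate reductions sharing one condition; how the lanes are best combined in
GCC is left open here.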

