public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102512] New: Redundant max/min operation for vector reduction
From: crazylht at gmail dot com @ 2021-09-28  7:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102512

            Bug ID: 102512
           Summary: Redundant max/min operation for vector reduction
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-*-* i?86-*-*

cat test.c

#define MAX(a, b) ((a) > (b) ? (a) : (b))

short
foo1 (short* p)
{
  short max = p[0];
  for (int i = 0; i != 8; i++)
    max = MAX(max, p[i]);
  return max;
}

short
foo2 (short* p)
{
  short max = p[0];
  for (int i = 1; i != 8; i++)
    max = MAX(max, p[i]);
  return max;
}

gcc -O3 -mavx2 -S

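In foo1, the first MAX_EXPR <_10, vect__4.7_13> is redundant: _10 is a splat
of p[0], which is already lane 0 of vect__4.7_13, so the subsequent
.REDUC_MAX produces the same result without it.
In foo2, the vectorizer fails to recognize the .REDUC_MAX pattern and instead
open-codes the reduction with VEC_PERM_EXPR/MAX_EXPR pairs.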

;; Function foo1 (foo1, funcdef_no=0, decl_uid=2991, cgraph_uid=1, symbol_order=0)

.248t.optimized
short int foo1 (short int * p)
{
  vector(8) short int vect_max_11.8;
  vector(8) short int vect__4.7;
  short int max;
  vector(8) short int _10;
  short int _20;

  <bb 2> [local count: 119292720]:
  max_9 = *p_8(D);
  _10 = {max_9, max_9, max_9, max_9, max_9, max_9, max_9, max_9};
  vect__4.7_13 = MEM <vector(8) short int> [(short int *)p_8(D)];
  vect_max_11.8_14 = MAX_EXPR <_10, vect__4.7_13>;
  _20 = .REDUC_MAX (vect_max_11.8_14); [tail call]
  return _20;

}
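To make the redundancy concrete, the foo1 dump above can be transcribed into scalar C. This is a minimal sketch, assuming plain 8-element arrays as a stand-in for vector(8) short int; it does not reproduce the AVX2 code GCC emits, and the function names are hypothetical. Because _10 splats p[0], and p[0] is also lane 0 of the loaded vector, the lane-wise MAX_EXPR cannot change the horizontal maximum.

```c
#include <stddef.h>

#define LANES 8 /* vector(8) short int in the foo1 dump */

/* Horizontal max over all lanes: the role .REDUC_MAX plays in the dump. */
static short reduc_max (const short v[LANES])
{
  short m = v[0];
  for (size_t i = 1; i < LANES; i++)
    m = v[i] > m ? v[i] : m;
  return m;
}

/* Hypothetical transcription of the optimized foo1 dump, including the
   MAX_EXPR against the splat of p[0] (_10).  */
short foo1_as_dumped (const short *p)
{
  short splat[LANES], vec[LANES], acc[LANES];
  for (size_t i = 0; i < LANES; i++)
    {
      splat[i] = p[0];                                /* _10          */
      vec[i] = p[i];                                  /* vect__4.7_13 */
      acc[i] = splat[i] > vec[i] ? splat[i] : vec[i]; /* MAX_EXPR     */
    }
  return reduc_max (acc);                             /* .REDUC_MAX   */
}

/* Same result without the splat: p[0] is already lane 0 of the load.  */
short foo1_without_splat (const short *p)
{
  return reduc_max (p);
}
```

Both functions return the same value for any input, which is exactly why the MAX_EXPR feeding .REDUC_MAX could be dropped.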



;; Function foo2 (foo2, funcdef_no=1, decl_uid=3000, cgraph_uid=2, symbol_order=1)

short int foo2 (short int * p)
{
  short int stmp_max_11.21;
  vector(4) short int vect_max_11.20;
  vector(4) short int vect__4.19;
  short int max;
  short int _4;
  short int _25;
  vector(4) short int _30;
  short int _34;
  vector(4) short int _38;
  vector(4) short int _39;
  vector(4) short int _40;
  vector(4) short int _41;
  short int _44;
  short int _46;

  <bb 2> [local count: 268435454]:
  max_9 = *p_8(D);
  _30 = {max_9, max_9, max_9, max_9};
  vect__4.19_35 = MEM <vector(4) short int> [(short int *)p_8(D) + 2B];
  vect_max_11.20_36 = MAX_EXPR <_30, vect__4.19_35>;
  _38 = VEC_PERM_EXPR <vect_max_11.20_36, { 0, 0, 0, 0 }, { 2, 3, 4, 5 }>;
  _39 = MAX_EXPR <vect_max_11.20_36, _38>;
  _40 = VEC_PERM_EXPR <_39, { 0, 0, 0, 0 }, { 1, 2, 3, 4 }>;
  _41 = MAX_EXPR <_39, _40>;
  stmp_max_11.21_42 = BIT_FIELD_REF <_41, 16, 0>;
  _4 = MEM[(short int *)p_8(D) + 10B];
  _46 = MEM[(short int *)p_8(D) + 12B];
  _34 = MAX_EXPR <_4, _46>;
  _25 = MEM[(short int *)p_8(D) + 14B];
  _44 = MAX_EXPR <_25, stmp_max_11.21_42>;
  max_26 = MAX_EXPR <_34, _44>;
  return max_26;

}
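The foo2 dump above can likewise be read as scalar C. This is a sketch under the same assumption (plain 4-element arrays model vector(4) short int; names are hypothetical). The load at byte offset 2B starts at p[1], the two VEC_PERM_EXPR/MAX_EXPR pairs fold four lanes into lane 0, and p[5]..p[7] are handled by a scalar epilogue. The zero padding introduced by the permutes never reaches lane 0, so it does not affect the result for negative inputs.

```c
#include <stddef.h>

/* Lane-wise MAX_EXPR on two 4-lane "vectors" (plain arrays here). */
static void max4 (short out[4], const short a[4], const short b[4])
{
  for (size_t i = 0; i < 4; i++)
    out[i] = a[i] > b[i] ? a[i] : b[i];
}

static short smax (short a, short b)
{
  return a > b ? a : b;
}

/* Hypothetical scalar transcription of the optimized foo2 dump (VF = 4). */
short foo2_as_dumped (const short *p)
{
  short splat[4], load[4], v36[4], v39[4], v41[4];

  for (size_t i = 0; i < 4; i++)
    {
      splat[i] = p[0];     /* _30 = {max_9, max_9, max_9, max_9}  */
      load[i] = p[1 + i];  /* vect__4.19_35: byte offset 2B = p[1] */
    }
  max4 (v36, splat, load); /* vect_max_11.20_36 */

  /* _38 = VEC_PERM_EXPR <_36, {0,0,0,0}, {2,3,4,5}>:
     selectors 4 and 5 pick lanes of the zero vector.  */
  short perm38[4] = { v36[2], v36[3], 0, 0 };
  max4 (v39, v36, perm38); /* _39 */

  /* _40 = VEC_PERM_EXPR <_39, {0,0,0,0}, {1,2,3,4}>.  */
  short perm40[4] = { v39[1], v39[2], v39[3], 0 };
  max4 (v41, v39, perm40); /* _41 */

  /* Only lane 0 is extracted (BIT_FIELD_REF <_41, 16, 0>); it combines
     all four lanes of v36 and never mixes with the zero padding.  */
  short stmp = v41[0];

  /* Scalar epilogue for p[5], p[6], p[7] (byte offsets 10B, 12B, 14B). */
  short tail = smax (p[5], p[6]); /* _34 */
  short rest = smax (p[7], stmp); /* _44 */
  return smax (tail, rest);       /* max_26 */
}
```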


* [Bug tree-optimization/102512] Redundant max/min operation before vector reduction
From: pinskia at gcc dot gnu.org @ 2021-09-28  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102512

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-28
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I wonder if, in the prologue for the second case, when we know the size was
originally greater than 4, we could just do an overlapping load and do the max.
This won't fix the issue fully, but it will produce better code than we
currently do.

Otherwise confirmed.
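One possible reading of the overlapping-load idea, sketched for foo2 only: the seven elements p[1]..p[7] are covered by two 4-lane loads, p[1..4] and p[4..7], which overlap at p[4]. Since max is idempotent, visiting p[4] twice is harmless, and the scalar epilogue disappears. This is an illustration under that interpretation, using plain arrays for vectors and hypothetical function names; it is not how GCC implements anything today.

```c
#include <stddef.h>

#define VF 4 /* 4 x short lanes, matching vector(4) in the foo2 dump */

static short smax (short a, short b)
{
  return a > b ? a : b;
}

/* Hypothetical: max of p[0..n-1] for VF <= n <= 2*VF using two possibly
   overlapping VF-lane loads instead of a scalar epilogue.  */
static short max_overlap (const short *p, size_t n)
{
  short acc[VF];
  for (size_t i = 0; i < VF; i++)
    acc[i] = smax (p[i], p[n - VF + i]); /* second load ends at p[n-1] */
  short m = acc[0];
  for (size_t i = 1; i < VF; i++)
    m = smax (m, acc[i]);
  return m;
}

/* foo2: max of p[0] and p[1..7]; the 7-element tail is covered by
   loads of p[1..4] and p[4..7].  */
short foo2_overlap (const short *p)
{
  return smax (p[0], max_overlap (p + 1, 7));
}
```

The precondition n <= 2*VF is what makes the two loads cover every element; the "size greater than 4" knowledge mentioned in the comment is what makes the second load in bounds.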


* [Bug tree-optimization/102512] Redundant max/min operation before vector reduction
From: rguenth at gcc dot gnu.org @ 2021-09-28  9:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102512

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
  max_9 = *p_8(D);
  _10 = {max_9, max_9, max_9, max_9, max_9, max_9, max_9, max_9};
  vect__4.7_13 = MEM <vector(8) short int> [(short int *)p_8(D)];
  vect_max_11.8_14 = MAX_EXPR <_10, vect__4.7_13>;
  _20 = .REDUC_MAX (vect_max_11.8_14); [tail call]

it's a bit difficult to improve here - match.pd doesn't like MEMs too much
and this all just collapses because _10 is a splat of element zero of
vect__4.7_13 ...

In theory the vectorizer could use the first full vector as initial value
or of course a vector of all SHORT_MIN.  But the intent of using the first
scalar value was that this would optimize better ...
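The first alternative mentioned above, using the first full vector as the initial value, can be sketched as follows. This assumes n is a positive multiple of the vector length, models vectors as plain arrays, and uses a hypothetical function name. For foo1 (n == 8) the vector loop body never runs, leaving only the load and the horizontal reduction, with no MAX against a splat of p[0].

```c
#include <stddef.h>

#define VF 8 /* vector(8) short int, as in the foo1 dump */

/* Hypothetical sketch: max of p[0..n-1] for n a positive multiple of VF,
   with the accumulator initialized from the first full vector instead of
   a splat of p[0].  */
short max_first_vector_init (const short *p, size_t n)
{
  short acc[VF];
  for (size_t l = 0; l < VF; l++)
    acc[l] = p[l];                       /* initial value: first vector */

  for (size_t i = VF; i + VF <= n; i += VF)
    for (size_t l = 0; l < VF; l++)      /* lane-wise MAX_EXPR */
      acc[l] = p[i + l] > acc[l] ? p[i + l] : acc[l];

  short m = acc[0];                      /* horizontal reduction */
  for (size_t l = 1; l < VF; l++)
    m = acc[l] > m ? acc[l] : m;
  return m;
}
```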


* [Bug tree-optimization/102512] Redundant max/min operation before vector reduction
From: rguenth at gcc dot gnu.org @ 2021-09-28  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102512

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
>   max_9 = *p_8(D);
>   _10 = {max_9, max_9, max_9, max_9, max_9, max_9, max_9, max_9};
>   vect__4.7_13 = MEM <vector(8) short int> [(short int *)p_8(D)];
>   vect_max_11.8_14 = MAX_EXPR <_10, vect__4.7_13>;
>   _20 = .REDUC_MAX (vect_max_11.8_14); [tail call]
> 
> it's a bit difficult to improve here - match.pd doesn't like MEMs too much
> and this all just collapses because _10 is a splat of element zero of
> vect__4.7_13 ...
> 
> In theory the vectorizer could use the first full vector as initial value
> or of course a vector of all SHORT_MIN.  But the intent of using the first
> scalar value was that this would optimize better ...

That is, the alternative is to apply the 'short max = p[0]' "bias" after
the epilogue and have the initial value be { SHORT_MIN, ... }.
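The second alternative, starting from an all-minimum vector and applying the p[0] "bias" only after the reduction epilogue, might look like this. It is a sketch assuming 4-lane vectors modelled as plain arrays (as in the foo2 dump) and a hypothetical function name; SHRT_MIN from <limits.h> is the standard C spelling of the SHORT_MIN used in the comment. The same routine covers foo1 (start = 0) and foo2 (start = 1).

```c
#include <limits.h>
#include <stddef.h>

#define VF 4 /* 4 x short lanes */

static short smax (short a, short b)
{
  return a > b ? a : b;
}

/* Hypothetical sketch for loops of the form
     max = p[0]; for (i = start; i != n; i++) max = MAX (max, p[i]);  */
short max_bias_after (const short *p, size_t start, size_t n)
{
  short acc[VF];
  for (size_t l = 0; l < VF; l++)
    acc[l] = SHRT_MIN;               /* neutral element for max */

  size_t i = start;
  for (; i + VF <= n; i += VF)       /* vector loop */
    for (size_t l = 0; l < VF; l++)
      acc[l] = smax (acc[l], p[i + l]);

  short m = acc[0];                  /* reduction epilogue */
  for (size_t l = 1; l < VF; l++)
    m = smax (m, acc[l]);

  for (; i < n; i++)                 /* scalar remainder */
    m = smax (m, p[i]);

  return smax (m, p[0]);             /* apply the 'max = p[0]' bias last */
}
```

Because SHRT_MIN is the identity for max, the initial vector contributes nothing, and the single scalar MAX with p[0] at the end replaces the splat-and-MAX at the start.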


* [Bug tree-optimization/102512] Redundant max/min operation before vector reduction
From: pinskia at gcc dot gnu.org @ 2021-12-29  6:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102512

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
