public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can
@ 2024-08-12  6:50 kugan at gcc dot gnu.org
  2024-08-12  7:03 ` [Bug tree-optimization/116338] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: kugan at gcc dot gnu.org @ 2024-08-12  6:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338

            Bug ID: 116338
           Summary: GCC is not vectoring TSVC s255 while clang can
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

reduced test case:

typedef float real_t;
extern __attribute__((aligned(64))) real_t a[32000], b[32000];

void s255()
{   
    real_t x, y;
    x = b[32000 -1];
    y = b[32000 -2];
    for (int i = 0; i < 32000; i++) {
        a[i] = (b[i] + x + y) * (real_t).333;
        y = x;
        x = b[i];
    }

}

gcc is not able to vectorize the loop whereas clang can. See
https://godbolt.org/z/64Kxaahqr
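
For illustration only, here is a sketch of why the loop is vectorizable in principle: after the first two iterations, x and y are just b[i-1] and b[i-2], so the carried scalars can be replaced by loads. The function names and the size parameter n are made up for this sketch (the test case uses fixed global arrays), the sketch assumes a and b do not overlap, and it is not a claim about what clang actually generates.

```c
#include <assert.h>
#include <stddef.h>

typedef float real_t;

/* The loop from the report, parameterised on n (hypothetical signature). */
void s255_scalar(real_t *a, const real_t *b, size_t n)
{
    assert(n >= 2);
    real_t x = b[n - 1];
    real_t y = b[n - 2];
    for (size_t i = 0; i < n; i++) {
        a[i] = (b[i] + x + y) * (real_t).333;
        y = x;
        x = b[i];
    }
}

/* Same computation with the carried scalars replaced by loads.
   The first two iterations are peeled because they read the tail of b. */
void s255_no_recurrence(real_t *a, const real_t *b, size_t n)
{
    assert(n >= 2);
    a[0] = (b[0] + b[n - 1] + b[n - 2]) * (real_t).333;
    a[1] = (b[1] + b[0] + b[n - 1]) * (real_t).333;
    for (size_t i = 2; i < n; i++)
        a[i] = (b[i] + b[i - 1] + b[i - 2]) * (real_t).333;
}

/* Returns 1 when both variants produce identical output on a small input. */
int s255_variants_agree(void)
{
    enum { N = 37 };
    real_t b[N], a1[N], a2[N];
    for (int i = 0; i < N; i++)
        b[i] = (real_t)(i * 3 % 11) + (real_t)0.5;
    s255_scalar(a1, b, N);
    s255_no_recurrence(a2, b, N);
    for (int i = 0; i < N; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```

Both variants evaluate the sum in the same order, so the outputs should compare equal without a tolerance.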

gcc -v
Using built-in specs.
COLLECT_GCC=/home/kvivekananda/install/bin/gcc
COLLECT_LTO_WRAPPER=/home/kvivekananda/install/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc_base/configure --prefix=/home/kvivekananda/install/
--enable-languages=c,c++,fortran,lto,objc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 15.0.0 20240618 (experimental) (GCC)


* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
  2024-08-12  6:50 [Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can kugan at gcc dot gnu.org
@ 2024-08-12  7:03 ` pinskia at gcc dot gnu.org
  2024-08-19 14:15 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-12  7:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-08-12
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.


* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
  2024-08-12  6:50 [Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can kugan at gcc dot gnu.org
  2024-08-12  7:03 ` [Bug tree-optimization/116338] " pinskia at gcc dot gnu.org
@ 2024-08-19 14:15 ` rguenth at gcc dot gnu.org
  2024-08-20  7:24 ` kugan at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-08-19 14:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is the recurrence

  <bb 2> [local count: 10737416]:
  x_10 = b[31999];
  y_11 = b[31998];

  <bb 3> [local count: 1063004408]:
  # x_18 = PHI <_1(5), x_10(2)>
  # y_19 = PHI <x_18(5), y_11(2)>
  _1 = b[i_20];
..

  <bb 5> [local count: 1052266995]:
  goto <bb 3>; [100.00%]

We handle some cases via vect_phi_first_order_recurrence_p; somebody needs
to dig into why this one isn't (or can't be) handled with that mechanism.
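
To make the distinction concrete, a source-level sketch of the two shapes follows. The function names and the parameter n are hypothetical, and the sum is simplified compared to s255. The first loop is the plain first-order shape (the latch value of x comes from a load); the second is the chained shape from this bug, where the latch value of y is x, which is itself a loop-header PHI. Whether a particular GCC build vectorizes either loop was not checked here.

```c
#include <assert.h>
#include <stddef.h>

typedef float real_t;

/* First-order recurrence: x carries b[i-1] into the next iteration, and
   its latch value is defined by an ordinary load, not by a PHI.  */
void first_order(real_t *a, const real_t *b, size_t n)
{
    assert(n >= 1);
    real_t x = b[n - 1];
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + x;
        x = b[i];
    }
}

/* Chained recurrence as in s255: y's latch value is x, and x is itself a
   loop-header PHI, so y effectively carries b[i-2].  */
void chained(real_t *a, const real_t *b, size_t n)
{
    assert(n >= 2);
    real_t x = b[n - 1];
    real_t y = b[n - 2];
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + x + y;
        y = x;
        x = b[i];
    }
}

/* Returns 1 if both loops match their closed forms
   a[i] = b[i] + b[i-1] and a[i] = b[i] + b[i-1] + b[i-2],
   with indices taken modulo n.  */
int recurrences_match_closed_form(void)
{
    enum { N = 16 };
    real_t b[N], a1[N], a2[N];
    for (int i = 0; i < N; i++)
        b[i] = (real_t)(i * i % 7);
    first_order(a1, b, N);
    chained(a2, b, N);
    for (int i = 0; i < N; i++) {
        real_t bm1 = b[(i + N - 1) % N];
        real_t bm2 = b[(i + N - 2) % N];
        if (a1[i] != b[i] + bm1 || a2[i] != b[i] + bm1 + bm2)
            return 0;
    }
    return 1;
}
```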


* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
  2024-08-12  6:50 [Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can kugan at gcc dot gnu.org
  2024-08-12  7:03 ` [Bug tree-optimization/116338] " pinskia at gcc dot gnu.org
  2024-08-19 14:15 ` rguenth at gcc dot gnu.org
@ 2024-08-20  7:24 ` kugan at gcc dot gnu.org
  2024-08-20  7:43 ` rguenth at gcc dot gnu.org
  2024-08-21  3:45 ` kugan at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: kugan at gcc dot gnu.org @ 2024-08-20  7:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338

--- Comment #3 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> The issue is the recurrence
> 
>   <bb 2> [local count: 10737416]:
>   x_10 = b[31999];
>   y_11 = b[31998];
> 
>   <bb 3> [local count: 1063004408]:
>   # x_18 = PHI <_1(5), x_10(2)>
>   # y_19 = PHI <x_18(5), y_11(2)>
>   _1 = b[i_20];
> ..
> 
>   <bb 5> [local count: 1052266995]:
>   goto <bb 3>; [100.00%]
> 
> We handle some cases via vect_phi_first_order_recurrence_p; somebody needs
> to dig into why this one isn't (or can't be) handled with that mechanism.

  /* Ensure the loop latch definition is from within the loop.  */
  edge latch = loop_latch_edge (loop);
  tree ldef = PHI_ARG_DEF_FROM_EDGE (phi, latch);
  if (TREE_CODE (ldef) != SSA_NAME
      || SSA_NAME_IS_DEFAULT_DEF (ldef)
      || is_a <gphi *> (SSA_NAME_DEF_STMT (ldef))
      || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (ldef))))
    return false;

(gdb) p debug_tree (ldef)
 <ssa_name 0xfffff7979900
    type <real_type 0xfffff796d0a8 real_t sizes-gimplified SF
        size <integer_cst 0xfffff7a86150 constant 32>
        unit-size <integer_cst 0xfffff7a86168 constant 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xfffff7a8b2a0 precision:32
        pointer_to_this <pointer_type 0xfffff79b2b28>>
    visited var <var_decl 0xfffff79b1510 x>
    def_stmt x_18 = PHI <_1(5), x_10(2)>
    version:18>
$1 = void


That is, the PHI argument defined along the loop latch is itself defined by a
PHI stmt in this case, so the is_a <gphi *> check above rejects it.


* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
  2024-08-12  6:50 [Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can kugan at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-08-20  7:24 ` kugan at gcc dot gnu.org
@ 2024-08-20  7:43 ` rguenth at gcc dot gnu.org
  2024-08-21  3:45 ` kugan at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-08-20  7:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
You can try to see whether adding an SSA copy would make this supported; it
seems that not allowing a PHI is simply a missed feature.


* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
  2024-08-12  6:50 [Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can kugan at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-08-20  7:43 ` rguenth at gcc dot gnu.org
@ 2024-08-21  3:45 ` kugan at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: kugan at gcc dot gnu.org @ 2024-08-21  3:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338

--- Comment #5 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> You can try to see whether adding an SSA copy would make this supported; it
> seems that not allowing a PHI is simply a missed feature.

We now fail in
  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used ouside of the inner loop we cannot handle uses of the reduction
     value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)

Even if I comment this out, I see:
t1.c:16:25: note:   worklist: examine stmt: _22 = x_18 + y_19;
t1.c:16:25: note:   vect_is_simple_use: operand x_18 = PHI <_1(5), x_10(2)>, type of def: unknown
t1.c:16:25: missed:   Unsupported pattern.
t1.c:10:6: missed:   not vectorized: unsupported use in stmt.
t1.c:16:25: missed:  unexpected pattern.
t1.c:16:25: note:  ***** Analysis failed with vector mode V4SF

Do we need to somehow mark both the PHI stmts as part of the first-order
reduction?


  <bb 3> [local count: 1063004408]:
  # x_18 = PHI <_1(5), x_10(2)>
  # y_19 = PHI <x_18(5), y_11(2)>
  # i_20 = PHI <i_13(5), 0(2)>
  # ivtmp_17 = PHI <ivtmp_16(5), 32000(2)>
  _1 = b[i_20];
  _22 = x_18 + y_19;
  _3 = _1 + _22;
  _4 = _3 * 3.33000004291534423828125e-1;
  a[i_20] = _4;
  i_13 = i_20 + 1;
  ivtmp_16 = ivtmp_17 - 1;
  if (ivtmp_16 != 0)
    goto <bb 5>; [98.99%]
  else
    goto <bb 4>; [1.01%]

  <bb 5> [local count: 1052266995]:
  goto <bb 3>; [100.00%]
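
As a rough picture of what handling both PHIs would have to produce, here is a scalar emulation of a vector loop with an assumed vectorization factor of 4. The x and y "vectors" are both built by permuting the previous and current loads of b, shifted by one and two lanes respectively. This is only a sketch under those assumptions: the names, the VF constant, the parameter n and the restriction to n being a multiple of VF are all made up for illustration, and it is not GCC's generated code.

```c
#include <assert.h>
#include <stddef.h>

typedef float real_t;
enum { VF = 4 };   /* assumed vectorization factor (V4SF) */

/* Scalar reference: the loop from the test case, parameterised on n. */
void s255_ref(real_t *a, const real_t *b, size_t n)
{
    assert(n >= 2);
    real_t x = b[n - 1], y = b[n - 2];
    for (size_t i = 0; i < n; i++) {
        a[i] = (b[i] + x + y) * (real_t).333;
        y = x;
        x = b[i];
    }
}

/* Emulated vector loop; requires n to be a non-zero multiple of VF. */
void s255_blocked(real_t *a, const real_t *b, size_t n)
{
    assert(n >= VF && n % VF == 0);
    real_t prev[VF] = { 0 };
    /* Only the last two lanes of prev are read before being overwritten:
       they hold the initial y (lane VF-2) and x (lane VF-1). */
    prev[VF - 2] = b[n - 2];
    prev[VF - 1] = b[n - 1];

    for (size_t i = 0; i < n; i += VF) {
        real_t cur[VF], xv[VF], yv[VF];
        for (int l = 0; l < VF; l++)
            cur[l] = b[i + l];
        /* xv = { prev[3], cur[0], cur[1], cur[2] }: shift by one lane. */
        for (int l = 0; l < VF; l++)
            xv[l] = (l < 1) ? prev[VF - 1 + l] : cur[l - 1];
        /* yv = { prev[2], prev[3], cur[0], cur[1] }: shift by two lanes. */
        for (int l = 0; l < VF; l++)
            yv[l] = (l < 2) ? prev[VF - 2 + l] : cur[l - 2];
        for (int l = 0; l < VF; l++)
            a[i + l] = (cur[l] + xv[l] + yv[l]) * (real_t).333;
        for (int l = 0; l < VF; l++)
            prev[l] = cur[l];
    }
}

/* Returns 1 when the blocked loop matches the scalar reference. */
int s255_blocked_matches_ref(void)
{
    enum { N = 32 };
    real_t b[N], a1[N], a2[N];
    for (int i = 0; i < N; i++)
        b[i] = (real_t)(i * 5 % 13) + (real_t)0.25;
    s255_ref(a1, b, N);
    s255_blocked(a2, b, N);
    for (int i = 0; i < N; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```

Each lane uses the same addition order as the scalar loop, so the two outputs should compare equal exactly.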
