public inbox for gcc-bugs@sourceware.org
* [Bug c/106989] New: GCC fail to vectorize and clang succeed
@ 2022-09-20 23:05 juzhe.zhong at rivai dot ai
  2022-09-20 23:21 ` [Bug tree-optimization/106989] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2022-09-20 23:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

            Bug ID: 106989
           Summary: GCC fail to vectorize and clang succeed
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

https://godbolt.org/z/v5arbjh3n

ARM clang can vectorize this case, but ARM GCC fails to.
Can anyone fix it, or give me some guidance on how to fix it?


code:
typedef float real_t;

#define iterations 100000
#define LEN_1D 32000
#define LEN_2D 256
real_t flat_2d_array[LEN_2D*LEN_2D];

real_t x[LEN_1D];

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int indx[LEN_1D];

real_t* __restrict__ xx;
real_t* yy;
real_t s243(void)
{
    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D-1; i++) {
            a[i] = b[i] + c[i  ] * d[i];
            b[i] = a[i] + d[i  ] * e[i];
            a[i] = b[i] + a[i+1] * d[i];
        }
    }
}


* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
From: pinskia at gcc dot gnu.org @ 2022-09-20 23:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
           Keywords|                            |missed-optimization
          Component|c                           |tree-optimization
           Severity|normal                      |enhancement


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
From: crazylht at gmail dot com @ 2022-09-20 23:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---

> real_t* __restrict__ xx;
> real_t* yy;
> real_t s243(void)
> {
>     for (int nl = 0; nl < iterations; nl++) {
>         for (int i = 0; i < LEN_1D-1; i++) {
>             a[i] = b[i] + c[i  ] * d[i];
>             b[i] = a[i] + d[i  ] * e[i];
>             a[i] = b[i] + a[i+1] * d[i];
>         }
>     }
> }

If the code is manually changed as below, GCC can vectorize the loop.
real_t s243(void)
{
    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D-1; i++) {
//          a[i] = b[i] + c[i  ] * d[i]; propagate it into next line.
            b[i] = b[i] + c[i  ] * d[i] + d[i  ] * e[i];
            a[i] = b[i] + a[i+1] * d[i];
        }
    }
}


* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
From: pinskia at gcc dot gnu.org @ 2022-09-21  0:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
/app/example.cpp:20:25: note:   Detected interleaving store a[i_27] and a[i_27]
/app/example.cpp:20:25: note:   Queuing group with duplicate access for fixup
/app/example.cpp:20:25: note:   zero step in outer loop.
/app/example.cpp:20:25: note:   zero step in outer loop.
/app/example.cpp:20:25: missed:   not vectorized: complicated access pattern.
/app/example.cpp:22:18: missed:   not vectorized: complicated access pattern.
/app/example.cpp:20:25: missed:  bad data access.
...

/app/example.cpp:21:27: note:   dependence distance  = 0.
/app/example.cpp:21:27: note:   dependence distance == 0 between b[i_27] and b[i_27]
/app/example.cpp:21:27: note:   dependence distance  = 1.
/app/example.cpp:22:18: missed:   not vectorized, possible dependence between data-refs a[i_27] and a[_9]
/app/example.cpp:21:27: missed:  bad data dependence.
/app/example.cpp:21:27: note:  ***** Analysis  failed with vector mode V4SF

There is a missing DSE (dead store elimination) beforehand:
  # VUSE <.MEM_28>
  _1 = bD.3768[i_27];
  # VUSE <.MEM_28>
  _2 = cD.3769[i_27];
  # VUSE <.MEM_28>
  _3 = dD.3770[i_27];
  _4 = _2 * _3;
  _5 = _1 + _4;
  # .MEM_19 = VDEF <.MEM_28>
  aD.3767[i_27] = _5;
  # VUSE <.MEM_19>
  _6 = eD.3771[i_27];
  _7 = _3 * _6;
  _8 = _5 + _7;
  # .MEM_20 = VDEF <.MEM_19>
  bD.3768[i_27] = _8;
  # RANGE [irange] int [1, 31999] NONZERO 0x7fff
  _9 = i_27 + 1;
  # VUSE <.MEM_20>
  _10 = aD.3767[_9];
  _11 = _3 * _10;
  _12 = _8 + _11;
  # .MEM_21 = VDEF <.MEM_20>
  aD.3767[i_27] = _12;

DSE does not notice that the store defining MEM_19 does not touch the load:
  # VUSE <.MEM_20>
  _10 = aD.3767[_9];

And that it is redundant with the store defining MEM_21.
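
To make the dead-store observation concrete, here is a minimal source-level sketch (not GCC output) of the loop with the first store to a[i] dropped. The function names s243_original and s243_dse, the length parameter n, and passing the arrays as pointers are assumptions made for illustration; the temporary t plays the role of _5 in the dump above. The two versions only agree when a, b, c, d and e are distinct arrays, as they are in the report.

```c
#include <stddef.h>

typedef float real_t;

/* Loop body as in the report: the first store to a[i] is overwritten
   by the third statement, and only a[i + 1] is read in between.  */
void s243_original(size_t n, real_t *a, real_t *b,
                   const real_t *c, const real_t *d, const real_t *e)
{
    for (size_t i = 0; i + 1 < n; i++) {
        a[i] = b[i] + c[i] * d[i];
        b[i] = a[i] + d[i] * e[i];
        a[i] = b[i] + a[i + 1] * d[i];
    }
}

/* Hypothetical result of removing the dead store: the value stays in a
   temporary (like _5 in the GIMPLE) instead of going through a[i].
   Assumes the five arrays do not overlap.  */
void s243_dse(size_t n, real_t *a, real_t *b,
              const real_t *c, const real_t *d, const real_t *e)
{
    for (size_t i = 0; i + 1 < n; i++) {
        real_t t = b[i] + c[i] * d[i];
        b[i] = t + d[i] * e[i];
        a[i] = b[i] + a[i + 1] * d[i];
    }
}
```

With the first store gone, each iteration writes a[i] once and reads only a[i + 1], which is the shape the manually rewritten loop in comment #1 also has.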


* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
From: pinskia at gcc dot gnu.org @ 2022-09-21  0:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The DSE happens but only at the RTL level ....


* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
From: juzhe.zhong at rivai dot ai @ 2022-09-21  0:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #3)
> The DSE happens but only at the RTL level ....

Is it a good idea to do data-ref in DSE and remove the first redundant store?


* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
From: rguenth at gcc dot gnu.org @ 2022-09-21  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-09-21
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #4)
> (In reply to Andrew Pinski from comment #3)
> > The DSE happens but only at the RTL level ....
> 
> Is it a good idea to do data-ref in DSE and remove the first redundant store?

Probably - of course that makes things more expensive.  The case at hand
can probably be handled by what LIM's mem_refs_may_alias_p does, using

  get_inner_reference_aff (mem1->mem.ref, &off1, &size1);
  get_inner_reference_aff (mem2->mem.ref, &off2, &size2);
  aff_combination_expand (&off1, ttae_cache);
  aff_combination_expand (&off2, ttae_cache);
  aff_combination_scale (&off1, -1);
  aff_combination_add (&off2, &off1);

  if (aff_comb_cannot_overlap_p (&off2, size1, size2))
    return false;

but I'm not sure we should add more code doing things like that ...
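
As a rough illustration of the idea behind that check, the sketch below models a byte offset as "constant + scale * i" for a single symbolic index i, subtracts the two offsets, and answers "cannot overlap" only when the difference is a known constant that separates the two accesses. This is a simplified stand-alone model, not GCC's aff_tree code: struct affine_off, affine_diff and cannot_overlap are hypothetical names, and the 4-byte element size used in the usage note is an assumption about float.

```c
#include <stdbool.h>

/* Simplified model of a byte offset: constant + scale * i, where i is
   the one symbolic loop index.  This is NOT GCC's aff_tree.  */
struct affine_off {
    long constant;  /* constant part of the byte offset */
    long scale;     /* multiplier of the symbolic index i */
};

/* Compute off2 - off1, mirroring the scale-by-minus-one-then-add steps.  */
struct affine_off affine_diff(struct affine_off off1, struct affine_off off2)
{
    struct affine_off diff = { off2.constant - off1.constant,
                               off2.scale - off1.scale };
    return diff;
}

/* Return true only when the two accesses provably cannot overlap.
   diff is (offset of access 2) - (offset of access 1).  */
bool cannot_overlap(struct affine_off diff, long size1, long size2)
{
    if (diff.scale != 0)
        return false;                       /* still depends on i: give up */
    if (diff.constant < 0)
        return diff.constant + size2 <= 0;  /* access 2 ends before access 1 starts */
    return size1 <= diff.constant;          /* access 2 starts after access 1 ends */
}
```

For the case at hand, the store to a[i] would be { 0, 4 } and the load of a[i+1] would be { 4, 4 }; the difference is the constant 4, which is not smaller than the 4-byte store, so the model reports that the load cannot read the stored bytes.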

If we think that firing up dataref analysis at the point we discover the
possible use is too expensive, we could also optimistically queue such uses
and, only when we find a killing def (and thus the store would be dead),
process the queued uses, checking whether each one really reads the store.
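
A toy model of that queuing scheme, under heavy simplification: statements following the store are a flat array rather than a walk over virtual operands, and the "expensive" analysis is a stand-in that just returns a precomputed flag and counts how often it runs. All names here (struct stmt, expensive_use_check, store_is_dead, MAX_QUEUED) are hypothetical and do not correspond to anything in tree-ssa-dse.

```c
#include <stdbool.h>
#include <stddef.h>

enum stmt_kind { STMT_OTHER, STMT_POSSIBLE_USE, STMT_KILLING_DEF };

struct stmt {
    enum stmt_kind kind;
    bool really_uses;   /* stand-in for the result of the expensive analysis */
};

#define MAX_QUEUED 16

/* Stand-in for an expensive data-reference check; counts invocations.  */
static bool expensive_use_check(const struct stmt *s, int *calls)
{
    ++*calls;
    return s->really_uses;
}

/* Walk the statements after a store.  Possible uses are only queued;
   the expensive check runs once a killing def shows the effort can pay off.  */
bool store_is_dead(const struct stmt *stmts, size_t n, int *calls)
{
    const struct stmt *queue[MAX_QUEUED];
    size_t queued = 0;

    for (size_t i = 0; i < n; i++) {
        switch (stmts[i].kind) {
        case STMT_POSSIBLE_USE:
            if (queued == MAX_QUEUED)
                return false;           /* too many candidates: stay conservative */
            queue[queued++] = &stmts[i];
            break;
        case STMT_KILLING_DEF:
            for (size_t q = 0; q < queued; q++)
                if (expensive_use_check(queue[q], calls))
                    return false;       /* a queued use really reads the store */
            return true;                /* every queued use was a false alarm */
        case STMT_OTHER:
            break;
        }
    }
    return false;   /* no killing def found: nothing expensive was run */
}
```

The point of the design is visible in the last return: when no killing def is ever found, the queued uses are dropped without paying for the analysis at all.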

But well, maybe just try the simplest approach and measure the compile-time
effect.  That is, in

          /* If the statement is a use the store is not dead.  */
          else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
            {
              /* Handle common cases where we can easily build an ao_ref
                 structure for USE_STMT and in doing so we find that the
                 references hit non-live bytes and thus can be ignored.

for a gimple assignment, check dr_may_alias_p after analyzing both
stmts (we can of course at least cache the DR for 'stmt').  This would be
guarded with flag_expensive_optimizations (thus -O2+).  You would then also
need to initialize loops and SCEV.


* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
From: pinskia at gcc dot gnu.org @ 2022-09-22 15:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Exact dup of bug 99407.

*** This bug has been marked as a duplicate of bug 99407 ***

