[Bug tree-optimization/110428] New: missed CSE with VLA vectors

* [Bug tree-optimization/110428] New: missed CSE with VLA vectors
@ 2023-06-27  8:37 rguenth at gcc dot gnu.org
  2023-06-27  8:39 ` [Bug tree-optimization/110428] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-27  8:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

            Bug ID: 110428
           Summary: missed CSE with VLA vectors
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#include <stdint.h>

void __attribute__((noinline,noclone))
foo (uint16_t *out, uint16_t *res)
{
  int mask[] = { 0, 1, 1, 1, 1, 1, 1, 1 };
  int i;
  for (i = 0; i < 8; ++i)
    {
      if (mask[i])
        out[i] = 33;
    }
  uint16_t o0 = out[0];
  uint16_t o7 = out[3];
  uint16_t o14 = out[6];
  uint16_t o15 = out[7];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

With -march=armv9.3-a -O3 -g0 -fno-vect-cost-model we fail to CSE the
out[] loads after vectorization.
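
As a rough illustration of what CSE would achieve here (a hand-written sketch,
not compiler output): mask[0] is 0 and mask[1..7] are 1, so out[0] is never
stored to while out[3], out[6] and out[7] are all known to hold 33 after the
loop. The names foo_ref, foo_cse and foo_variants_agree are hypothetical and
only serve this example; foo_ref is the testcase above with the attributes
dropped.

```c
#include <stdint.h>
#include <string.h>

/* The testcase from the report, renamed (hypothetical name).  */
static void
foo_ref (uint16_t *out, uint16_t *res)
{
  int mask[] = { 0, 1, 1, 1, 1, 1, 1, 1 };
  int i;
  for (i = 0; i < 8; ++i)
    {
      if (mask[i])
        out[i] = 33;
    }
  uint16_t o0 = out[0];
  uint16_t o7 = out[3];
  uint16_t o14 = out[6];
  uint16_t o15 = out[7];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

/* Hand-written sketch of the same function with the loads from
   out[3], out[6] and out[7] replaced by the stored value 33.
   This is an assumption about the desired result, not GCC output.  */
static void
foo_cse (uint16_t *out, uint16_t *res)
{
  int i;
  /* mask[0] is 0, so only out[1..7] are written.  */
  for (i = 1; i < 8; ++i)
    out[i] = 33;
  /* out[0] is not written by the loop, so it still has to be loaded,
     and the load must stay ahead of the stores to res[] in case the
     two pointers alias.  */
  uint16_t o0 = out[0];
  res[0] = o0;
  res[2] = 33;
  res[4] = 33;
  res[6] = 33;
}

/* Returns 1 if both variants leave out[] and res[] in the same state
   for one sample input.  */
int
foo_variants_agree (void)
{
  uint16_t out_a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  uint16_t out_b[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  uint16_t res_a[8] = { 0 };
  uint16_t res_b[8] = { 0 };
  foo_ref (out_a, res_a);
  foo_cse (out_b, res_b);
  return memcmp (out_a, out_b, sizeof out_a) == 0
         && memcmp (res_a, res_b, sizeof res_a) == 0;
}
```

Comparing the generated assembly of the original foo against something like
foo_cse is one way to see whether the loads after the loop were eliminated.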

* [Bug tree-optimization/110428] missed CSE with VLA vectors
  2023-06-27  8:37 [Bug tree-optimization/110428] New: missed CSE with VLA vectors rguenth at gcc dot gnu.org
@ 2023-06-27  8:39 ` rguenth at gcc dot gnu.org
  2023-06-27 10:12 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-27  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
On x86_64 for example with -march=znver4 we can perform the required CSE.

* [Bug tree-optimization/110428] missed CSE with VLA vectors
  2023-06-27  8:37 [Bug tree-optimization/110428] New: missed CSE with VLA vectors rguenth at gcc dot gnu.org
  2023-06-27  8:39 ` [Bug tree-optimization/110428] " rguenth at gcc dot gnu.org
@ 2023-06-27 10:12 ` rguenth at gcc dot gnu.org
  2023-06-27 10:23 ` juzhe.zhong at rivai dot ai
  2023-06-27 10:34 ` juzhe.zhong at rivai dot ai
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-27 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
.

*** This bug has been marked as a duplicate of bug 110430 ***

* [Bug tree-optimization/110428] missed CSE with VLA vectors
  2023-06-27  8:37 [Bug tree-optimization/110428] New: missed CSE with VLA vectors rguenth at gcc dot gnu.org
  2023-06-27  8:39 ` [Bug tree-optimization/110428] " rguenth at gcc dot gnu.org
  2023-06-27 10:12 ` rguenth at gcc dot gnu.org
@ 2023-06-27 10:23 ` juzhe.zhong at rivai dot ai
  2023-06-27 10:34 ` juzhe.zhong at rivai dot ai
  3 siblings, 0 replies; 5+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-06-27 10:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juzhe.zhong at rivai dot ai

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, I think for VLS vectors we should be able to enhance CSE for the
following case:

#include <stdint.h>

void __attribute__((noinline,noclone))
foo (int *out, int *res, unsigned int n)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < n+16; ++i)
    {
      if (mask[i])
        out[i] = i;
    }
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

Since n is an unsigned int and the loop condition is i < n + 16, ARM SVE
fails to CSE. Is that right?

* [Bug tree-optimization/110428] missed CSE with VLA vectors
  2023-06-27  8:37 [Bug tree-optimization/110428] New: missed CSE with VLA vectors rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-06-27 10:23 ` juzhe.zhong at rivai dot ai
@ 2023-06-27 10:34 ` juzhe.zhong at rivai dot ai
  3 siblings, 0 replies; 5+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-06-27 10:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to JuzheZhong from comment #3)
> Hi, I think for VLS vectors, we should be able the enhance CSE for this
> following case:
> 
> #include <stdint.h>
> 
> void __attribute__((noinline,noclone))
> foo (int *out, int *res, unsigned int n)
> {
>   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
>   int i;
>   for (i = 0; i < n+16; ++i)
>     {
>       if (mask[i])
>         out[i] = i;
>     }
>   int o0 = out[0];
>   int o7 = out[7];
>   int o14 = out[14];
>   int o15 = out[15];
>   res[0] = o0;
>   res[2] = o7;
>   res[4] = o14;
>   res[6] = o15;
> }
> 
> since n is unsigned int number, i < n + 16, ARM SVE fail to CSE.
> Is it right?


Maybe that case is too complicated, so I tried the following case instead:


void __attribute__((noinline,noclone))
foo (int *out, int *res, unsigned int n)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 16; ++i)
    {
      if (mask[i])
        out[i] = i;
    }
  for (i = 16; i < n + 16; ++i)
    {
      if (mask[i])
        out[i] = i;
    }
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

This case is simpler, so shouldn't it be CSEd? I tried it on SVE, and GCC
failed to CSE.
