[Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
@ 2015-05-14 14:18 ysrumyan at gmail dot com
  2015-05-14 14:19 ` [Bug tree-optimization/66142] " ysrumyan at gmail dot com
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: ysrumyan at gmail dot com @ 2015-05-14 14:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

            Bug ID: 66142
           Summary: Loop is not vectorized because not sufficient support
                    for GOMP_SIMD_LANE
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ysrumyan at gmail dot com
  Target Milestone: ---

The attached test-case compiled with "-Ofast -fopenmp -march=core-avx2" options
contains loop marked with pragma omp simd which is not vectorized:
t1.cpp:66:20: note: not vectorized: data ref analysis failed MEM[(struct vec_
*)_9] = 1.0e+0;
where index is defined as
  _12 = GOMP_SIMD_LANE (simduid.0_11(D));
  _9 = &D.4231[_12].org;

Note that this test was extracted from important benchmark and loop in question
is vectorized if we revert fix for 59984.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
@ 2015-05-14 14:19 ` ysrumyan at gmail dot com
  2015-05-22 10:09 ` jakub at gcc dot gnu.org
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: ysrumyan at gmail dot com @ 2015-05-14 14:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #1 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 35541
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35541&action=edit
test-case to reproduce

Must be compiled with -Ofast -fopenmp -march=core-avx2 options.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
  2015-05-14 14:19 ` [Bug tree-optimization/66142] " ysrumyan at gmail dot com
@ 2015-05-22 10:09 ` jakub at gcc dot gnu.org
  2015-05-22 11:12 ` jakub at gcc dot gnu.org
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-22 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
On this testcase, I believe it is most important to change FRE and SRA.
Without -fopenmp, we have before fre2:
  MEM[(struct ray_ *)&ray] = 1.0e+0;
  MEM[(struct ray_ *)&ray + 4B] = _8;
  ray.dir.x = x_10;
  ray.dir.y = y_11;
...
  _23 = ray.org.x;
  _26 = ray.org.y;
  _29 = ray.dir.x;
  _31 = ray.dir.y;
and fre2 optimizes this into the values stored in there.  With -fopenmp, we
have the SIMD_LANE based array references:
  _12 = GOMP_SIMD_LANE (simduid.0_11(D));
  _9 = &D.2881[_12].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
  D.2881[_12].dir.x = x_13;
  D.2881[_12].dir.y = y_14;
...
  _26 = MEM[(const struct Ray *)&D.2881][_12].org.x;
  _29 = MEM[(const struct Ray *)&D.2881][_12].org.y;
  _32 = MEM[(const struct Ray *)&D.2881][_12].dir.x;
  _34 = MEM[(const struct Ray *)&D.2881][_12].dir.y;
that fre2 isn't able to optimize.  Doing something here on the vectorizer side
is too late, we really need to scalarize those or remove completely (through
fre2).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
  2015-05-14 14:19 ` [Bug tree-optimization/66142] " ysrumyan at gmail dot com
  2015-05-22 10:09 ` jakub at gcc dot gnu.org
@ 2015-05-22 11:12 ` jakub at gcc dot gnu.org
  2015-05-22 11:40 ` jakub at gcc dot gnu.org
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-22 11:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, but fre2 is able to optimize those:
  D.2881[_12].dir.x = x_13;
...
  _32 = MEM[(const struct Ray *)&D.2881][_12].dir.x;
cases to
  _32 = x_13;
The problem in FRE2 is only with the MEM_REFs.
If in the testcase
-      ray.org = p;
+      ray.org.x = p.x;
+      ray.org.y = p.y;
then FRE2 handles it completely and the problem is the same between -fopenmp
and -fno-openmp - ifcvt not ifconverting the loop.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (2 preceding siblings ...)
  2015-05-22 11:12 ` jakub at gcc dot gnu.org
@ 2015-05-22 11:40 ` jakub at gcc dot gnu.org
  2015-05-22 11:54 ` jakub at gcc dot gnu.org
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-22 11:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, and the reason for the ifcvt failure is that we have an unoptimized
virtual PHI in the loop that ifcvt gives up on:
  # .MEM_55 = PHI <.MEM_55(7), .MEM_5(D)(2)>
Only phicprop2 pass is able to optimize that away (and replace all uses
of .MEM_55 with .MEM_5(D)).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (3 preceding siblings ...)
  2015-05-22 11:40 ` jakub at gcc dot gnu.org
@ 2015-05-22 11:54 ` jakub at gcc dot gnu.org
  2015-05-22 14:51 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-22 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 35596
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35596&action=edit
gcc6-pr66142.patch

Fix for the ifcvt issue.  With this the modified testcase is vectorized.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (4 preceding siblings ...)
  2015-05-22 11:54 ` jakub at gcc dot gnu.org
@ 2015-05-22 14:51 ` jakub at gcc dot gnu.org
  2015-05-25  9:35 ` jakub at gcc dot gnu.org
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-22 14:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
As for the FRE (actually SCCVN) issue, seems this affects more than
GOMP_SIMD_LANE, seems there is insufficient canonicalization of the references
performed by SCCVN, so the same references, but in slightly different form,
aren't considered the same.
Please see e.g.
struct A { float x, y; };
struct B { struct A u, v; };
void bar (struct A *);

float
f1 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = &x[y].u;
  *q = p;
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (&p);
  return f;
}

float
f2 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  x[y].u = p;
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (&p);
  return f;
}

float
f3 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = &x[y].u;
  __builtin_memcpy (&q, &p.x, sizeof (float));
  __builtin_memcpy (&q->y, &p.y, sizeof (float));
  *q = p;
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (&p);
  return f;
}

float
f4 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  __builtin_memcpy (&x[y].u.x, &p.x, sizeof (float));
  __builtin_memcpy (&x[y].u.y, &p.y, sizeof (float));
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (&p);
  return f;
}

at -O2, FRE is able to optimize f2 and f4, but not f1 and f3, despite all 4
functions doing the same thing.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (5 preceding siblings ...)
  2015-05-22 14:51 ` jakub at gcc dot gnu.org
@ 2015-05-25  9:35 ` jakub at gcc dot gnu.org
  2015-05-25 12:48 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-25  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Slightly improved variant of #c6 testcase:
struct A { float x, y; };
struct B { struct A u; };
void bar (struct A *);

float
f1 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = &x[y].u;
  *q = p;
  float f = x[y].u.x + x[y].u.y;
  bar (&p);
  return f;
}

float
f2 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  x[y].u = p;
  float f = x[y].u.x + x[y].u.y;
  bar (&p);
  return f;
}

float
f3 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = &x[y].u;
  __builtin_memcpy (&q->x, &p.x, sizeof (float));
  __builtin_memcpy (&q->y, &p.y, sizeof (float));
  *q = p;
  float f = x[y].u.x + x[y].u.y;
  bar (&p);
  return f;
}

float
f4 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  __builtin_memcpy (&x[y].u.x, &p.x, sizeof (float));
  __builtin_memcpy (&x[y].u.y, &p.y, sizeof (float));
  float f = x[y].u.x + x[y].u.y;
  bar (&p);
  return f;
}


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (6 preceding siblings ...)
  2015-05-25  9:35 ` jakub at gcc dot gnu.org
@ 2015-05-25 12:48 ` jakub at gcc dot gnu.org
  2015-05-26  8:28 ` rguenther at suse dot de
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-25 12:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I guess it depends on what exactly SCCVN uses the operand vectors
created/optimized by copy_reference_ops_from_ref/valueize_refs_1 for.
If it is used only for tracking what is the value of the reference (the value
stored into that piece of memory), or also for canonicalization of references
to certain more canonical form, or both.  If only the value of the reference,
then
supposedly it would be nice to get far more canonicalization in the references
(e.g. turn (perhaps multiple nested) COMPONENT_REFs into MEM_REFs with offset
(except perhaps for bitfields, perhaps even do something about ARRAY_REFs, so
that say even:
struct A { float x, y; };
struct B { struct A u; };
void bar (struct A *);

float
f2 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  char *z = (char *) x;
  z += y * sizeof (struct B);
  z += __builtin_offsetof (struct B, u.y);
  x[y].u = p;
  float f = *(float *) z;
  bar (&p);
  return f;
}
is handled).  But, supposedly it would be undesirable to canonicalize all the
COMPONENT_REFs to the MEM_REFs (at least early on, e.g. for
__builtin_object_size (, 1) purposes, or aliasing etc.).
Richi, any thoughts on this?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (7 preceding siblings ...)
  2015-05-25 12:48 ` jakub at gcc dot gnu.org
@ 2015-05-26  8:28 ` rguenther at suse dot de
  2015-05-26  8:42 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenther at suse dot de @ 2015-05-26  8:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 25 May 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |rguenth at gcc dot gnu.org
> 
> --- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> --- I guess 
> it depends on what exactly SCCVN uses the operand vectors 
> created/optimized by copy_reference_ops_from_ref/valueize_refs_1 for. If 
> it is used only for tracking what is the value of the reference (the 
> value stored into that piece of memory), or also for canonicalization of 
> references to certain more canonical form, or both.  If only the value 

It's only for tracking the value (and to get a canonical form for
its internal hash-tables and compare function).  I think this bug
may be related to PR63916.

> of the reference, then supposedly it would be nice to get far more 
> canonicalization in the references (e.g. turn (perhaps multiple nested) 
> COMPONENT_REFs into MEM_REFs with offset (except perhaps for bitfields, 
> perhaps even do something about ARRAY_REFs, so that say even: struct A { 
> float x, y; }; struct B { struct A u; }; void bar (struct A *);
> 
> float
> f2 (struct B *x, int y)
> {
>   struct A p;
>   p.x = 1.0f;
>   p.y = 2.0f;
>   char *z = (char *) x;
>   z += y * sizeof (struct B);
>   z += __builtin_offsetof (struct B, u.y);
>   x[y].u = p;
>   float f = *(float *) z;
>   bar (&p);
>   return f;
> }
> is handled).  But, supposedly it would be undesirable to canonicalize all the
> COMPONENT_REFs to the MEM_REFs (at least early on, e.g. for
> __builtin_object_size (, 1) purposes, or aliasing etc.).
> Richi, any thoughts on this?

On the f1 testcase it probably fails to look through the aggregate
copy in vn_reference_lookup_3.  I'll have a short look later.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (8 preceding siblings ...)
  2015-05-26  8:28 ` rguenther at suse dot de
@ 2015-05-26  8:42 ` rguenth at gcc dot gnu.org
  2015-05-26 13:56 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-05-26  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2015-05-26
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Testing

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c        (revision 223574)
+++ gcc/tree-ssa-sccvn.c        (working copy)
@@ -1894,7 +1894,12 @@ vn_reference_lookup_3 (ao_ref *ref, tree
       size2 = lhs_ref.size;
       maxsize2 = lhs_ref.max_size;
       if (maxsize2 == -1
-         || (base != base2 && !operand_equal_p (base, base2, 0))
+         || (base != base2
+             && (TREE_CODE (base) != MEM_REF
+                 || TREE_CODE (base2) != MEM_REF
+                 || TREE_OPERAND (base, 0) != TREE_OPERAND (base2, 0)
+                 || !tree_int_cst_equal (TREE_OPERAND (base, 1),
+                                         TREE_OPERAND (base2, 1))))
          || offset2 > offset
          || offset2 + size2 < offset + maxsize)
        return (void *)-1;

which fixes the comment#7 testcase.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (9 preceding siblings ...)
  2015-05-26  8:42 ` rguenth at gcc dot gnu.org
@ 2015-05-26 13:56 ` rguenth at gcc dot gnu.org
  2015-05-26 14:43 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-05-26 13:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Tue May 26 13:55:40 2015
New Revision: 223697

URL: https://gcc.gnu.org/viewcvs?rev=223697&root=gcc&view=rev
Log:
2015-05-26  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/66142
        * tree-ssa-sccvn.c (vn_reference_lookup_3): Manually compare
        MEM_REFs for the same base address.

        * gcc.dg/tree-ssa/ssa-fre-44.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-sccvn.c


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (10 preceding siblings ...)
  2015-05-26 13:56 ` rguenth at gcc dot gnu.org
@ 2015-05-26 14:43 ` rguenth at gcc dot gnu.org
  2015-05-27  7:56 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-05-26 14:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed(?).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (11 preceding siblings ...)
  2015-05-26 14:43 ` rguenth at gcc dot gnu.org
@ 2015-05-27  7:56 ` rguenth at gcc dot gnu.org
  2015-05-28 13:25 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-05-27  7:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
As said in comment #9, this is PR63916 I think.  We have

  _9 = &D.3665[_11].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
...
  _24 = MEM[(const struct Ray *)&D.3665][_11].org.x;
  _27 = MEM[(const struct Ray *)&D.3665][_11].org.y;

which ultimately is from inlining of

  D.3665[D.3663].org = p;
  D.3665[D.3663].dir.x = x;
  D.3665[D.3663].dir.y = y;
  D.3664[D.3663].t = 1.0e+10;
  D.3664[D.3663].h = 0;
  D.3667 = &D.3665[D.3663];
  D.3668 = &D.3664[D.3663];
  bar (D.3668, D.3667, &spheres[0]);

to

  p.x = 1.0e+0;  
  p.y = _9;
...
  D.3665[_15].org = p;
  D.3665[_15].dir.x = x_16;
  D.3665[_15].dir.y = y_17;
  D.3664[_15].t = 1.0e+10;
  D.3664[_15].h = 0;
  _23 = &D.3665[_15];
  _24 = &D.3664[_15];
  _32 = MEM[(const struct Ray *)_23].org.x;

forwprop cannot forward &D.3665[_15] into the dereference (it would create
a bogus access paths even if types were matching and TBAA would be fine).
And SCCVN isn't able to "canonicalize" the accesses (because it doesn't
look at stmt defs when building the canonical address of an access).
Note that PRE uses the canonical form for expression insertion so we have
to be careful what we put in the canonical form (or separate the address
compute from the actual dereference).

Yes, lowering the access more in the SCCVN machinery would be possible
(say, use the affine combination machinery with the powers of
affine-comb-expand (and making that use the VN lattice...)), but the
issue with this is always PRE insertion and us _not_ wanting to have
all PRE inserted loads be fully lowered to address-compute plus indirection.

I hope to get to PR63916 this stage1.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (12 preceding siblings ...)
  2015-05-27  7:56 ` rguenth at gcc dot gnu.org
@ 2015-05-28 13:25 ` rguenth at gcc dot gnu.org
  2015-05-29 13:07 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-05-28 13:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Thu May 28 13:24:53 2015
New Revision: 223816

URL: https://gcc.gnu.org/viewcvs?rev=223816&root=gcc&view=rev
Log:
2015-05-28  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/66142
        * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle non-GIMPLE
        values better in memcpy destination handling.  Handle non-aliasing
        we discover here.

        * gcc.dg/tree-ssa/ssa-fre-44.c: Fixup.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
    trunk/gcc/tree-ssa-sccvn.c


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (13 preceding siblings ...)
  2015-05-28 13:25 ` rguenth at gcc dot gnu.org
@ 2015-05-29 13:07 ` jakub at gcc dot gnu.org
  2015-06-02 13:18 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-29 13:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Fri May 29 13:06:23 2015
New Revision: 223863

URL: https://gcc.gnu.org/viewcvs?rev=223863&root=gcc&view=rev
Log:
        PR tree-optimization/66142
        * tree-if-conv.c (if_convertible_phi_p): Don't give up on
        virtual phis that feed themselves.

        * gcc.dg/vect/pr66142.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr66142.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-if-conv.c


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (14 preceding siblings ...)
  2015-05-29 13:07 ` jakub at gcc dot gnu.org
@ 2015-06-02 13:18 ` rguenth at gcc dot gnu.org
  2015-06-02 13:34 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-06-02 13:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so even with PR63916 rudimentary fixed we hit the issue that in

  _9 = &D.3665[_11].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
...
  _24 = MEM[(const struct Ray *)&D.3665][_11].org.x;

the store to MEM[(struct vec_ *)_9 + 4B] aliases MEM[(const struct Ray
*)&D.3665][_11].org.x.

And the rudimentary fix for PR63916 doesn't allow for the forwarding to succeed
into MEM[(struct vec_ *)_9 + 4B] either.

Without SRA we get

  D.3665[_11].org = p;
  D.3665[_11].dir.x = x_12;
  D.3665[_11].dir.y = y_13;
  D.3664[_11].t = 1.0e+10;
  D.3664[_11].h = 0;
  _25 = MEM[(const struct Ray *)&D.3665][_11].org.x;

and the look-through-aggregate-copy code bails out at

      /* See if the assignment kills REF.  */
      base2 = ao_ref_base (&lhs_ref);
      offset2 = lhs_ref.offset;
      size2 = lhs_ref.size;
      maxsize2 = lhs_ref.max_size;
      if (maxsize2 == -1
          || (base != base2
              && (TREE_CODE (base) != MEM_REF
                  || TREE_CODE (base2) != MEM_REF
                  || TREE_OPERAND (base, 0) != TREE_OPERAND (base2, 0)
                  || !tree_int_cst_equal (TREE_OPERAND (base, 1),
                                          TREE_OPERAND (base2, 1))))
          || offset2 > offset
          || offset2 + size2 < offset + maxsize)
        return (void *)-1;

stmt_kills_ref_p fails to compare
MEM[(const struct Ray *)&D.3665][_15].org and D.3665[_15].org
equal via operand_equal_p.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (15 preceding siblings ...)
  2015-06-02 13:18 ` rguenth at gcc dot gnu.org
@ 2015-06-02 13:34 ` rguenth at gcc dot gnu.org
  2015-06-03  8:03 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-06-02 13:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
Let's re-open this for the original testcase.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (16 preceding siblings ...)
  2015-06-02 13:34 ` rguenth at gcc dot gnu.org
@ 2015-06-03  8:03 ` rguenth at gcc dot gnu.org
  2015-09-14 14:10 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-06-03  8:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
Bug 66142 depends on bug 63916, which changed state.

Bug 63916 Summary: value-numbering fails to forward variable addresses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63916

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (17 preceding siblings ...)
  2015-06-03  8:03 ` rguenth at gcc dot gnu.org
@ 2015-09-14 14:10 ` rguenth at gcc dot gnu.org
  2015-09-17 12:36 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-09-14 14:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yes, I plan to come back to this during stage3.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (18 preceding siblings ...)
  2015-09-14 14:10 ` rguenth at gcc dot gnu.org
@ 2015-09-17 12:36 ` rguenth at gcc dot gnu.org
  2015-09-18  7:57 ` rguenth at gcc dot gnu.org
  2021-05-04 13:17 ` rguenth at gcc dot gnu.org
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-09-17 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so if we fix SCCVN further we hit its (designed) limitation with stmt
walking not following backedges (but the use is in a loop and the def outside).
We avoid doing the work to determine if sth is loop invariant here.
This was trying to improve things with SRA disabled - I'm going to still push
those changes if testing works out fine.

Now with SRA enabled we have a better situation in this regard but face

  _9 = &D.4309[_12].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
  D.4309[_12].dir.x = x_13;
  D.4309[_12].dir.y = y_14;
  D.4308[_12].t = 1.0e+10;
  D.4308[_12].h = 0;
  _20 = &D.4308[_12];
  _26 = MEM[(const struct Ray *)&D.4309][_12].org.x;

which as said isn't friendly enough for the simplistic address forward
propagation we have.  The issue here is that we can't disambiguate
against

  MEM[(struct vec_ *)_9 + 4B] = _8;

when looking up MEM[(const struct Ray *)&D.4309][_12].org.x; because
vn_reference_maybe_forwprop_address doesn't forward into MEM[(struct vec_ *)_9
+ 4B] as its offset is not zero:

      /* If that didn't work because the address isn't invariant propagate
         the reference tree from the address operation in case the current
         dereference isn't offsetted.  */
      if (!addr_base
          && *i_p == ops->length () - 1
          && off == 0
          /* This makes us disable this transform for PRE where the
             reference ops might be also used for code insertion which
             is invalid.  */
          && default_vn_walk_kind == VN_WALKREWRITE)
        {
          auto_vec<vn_reference_op_s, 32> tem;
          copy_reference_ops_from_ref (TREE_OPERAND (addr, 0), &tem);
          ops->pop ();
          ops->pop ();
          ops->safe_splice (tem);
          --*i_p;
          return true;

it also likely wouldn't help vn_reference_lookup_3 to do the disambiguation
as that performs just

      lhs_ops = valueize_refs_1 (lhs_ops, &valueized_anything);
      if (valueized_anything)
        {
          lhs_ref_ok = ao_ref_init_from_vn_reference (&lhs_ref,
                                                      get_alias_set (lhs),
                                                      TREE_TYPE (lhs),
lhs_ops);
          if (lhs_ref_ok
              && !refs_may_alias_p_1 (ref, &lhs_ref, true))
            return NULL;

but there is still the variable offset to cope with.  We'd need to perform
some disambiguation on lhs_ref vs. vr.

But it's not clear how to represent &D.4309[_12].org forwarded into
MEM[(struct vec_ *)_9 + 4B].  It's kind-of a COMPONENT_REF with just
a offset added.

Let's collect the improvements I have sofar in my local tree and see if they
otherwise work.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (19 preceding siblings ...)
  2015-09-17 12:36 ` rguenth at gcc dot gnu.org
@ 2015-09-18  7:57 ` rguenth at gcc dot gnu.org
  2021-05-04 13:17 ` rguenth at gcc dot gnu.org
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-09-18  7:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Sep 18 07:57:00 2015
New Revision: 227896

URL: https://gcc.gnu.org/viewcvs?rev=227896&root=gcc&view=rev
Log:
2015-09-18  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/66142
        * fold-const.c (operand_equal_p): When OEP_ADDRESS_OF
        treat MEM[&x] and x the same.
        * tree-ssa-sccvn.h (vn_reference_fold_indirect): Remove.
        * tree-ssa-sccvn.c (vn_reference_fold_indirect): Return true
        when we simplified sth.
        (vn_reference_maybe_forwprop_address): Likewise.
        (valueize_refs_1): When we simplified through
        vn_reference_fold_indirect or vn_reference_maybe_forwprop_address
        set valueized_anything to true.
        (vn_reference_lookup_3): Use stmt_kills_ref_p to see whether
        one ref kills the other instead of just a offset-based test.
        * tree-ssa-alias.c (stmt_kills_ref_p): Use OEP_ADDRESS_OF
        for the operand_equal_p test to compare bases and also compare
        sizes.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/tree-ssa-alias.c
    trunk/gcc/tree-ssa-sccvn.c
    trunk/gcc/tree-ssa-sccvn.h


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
  2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
                   ` (20 preceding siblings ...)
  2015-09-18  7:57 ` rguenth at gcc dot gnu.org
@ 2021-05-04 13:17 ` rguenth at gcc dot gnu.org
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-05-04 13:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
Bug 66142 depends on bug 66379, which changed state.

Bug 66379 Summary: SCCVN doesn't handle aggregate array element accesses very well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66379

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-05-04 13:17 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-14 14:18 [Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE ysrumyan at gmail dot com
2015-05-14 14:19 ` [Bug tree-optimization/66142] " ysrumyan at gmail dot com
2015-05-22 10:09 ` jakub at gcc dot gnu.org
2015-05-22 11:12 ` jakub at gcc dot gnu.org
2015-05-22 11:40 ` jakub at gcc dot gnu.org
2015-05-22 11:54 ` jakub at gcc dot gnu.org
2015-05-22 14:51 ` jakub at gcc dot gnu.org
2015-05-25  9:35 ` jakub at gcc dot gnu.org
2015-05-25 12:48 ` jakub at gcc dot gnu.org
2015-05-26  8:28 ` rguenther at suse dot de
2015-05-26  8:42 ` rguenth at gcc dot gnu.org
2015-05-26 13:56 ` rguenth at gcc dot gnu.org
2015-05-26 14:43 ` rguenth at gcc dot gnu.org
2015-05-27  7:56 ` rguenth at gcc dot gnu.org
2015-05-28 13:25 ` rguenth at gcc dot gnu.org
2015-05-29 13:07 ` jakub at gcc dot gnu.org
2015-06-02 13:18 ` rguenth at gcc dot gnu.org
2015-06-02 13:34 ` rguenth at gcc dot gnu.org
2015-06-03  8:03 ` rguenth at gcc dot gnu.org
2015-09-14 14:10 ` rguenth at gcc dot gnu.org
2015-09-17 12:36 ` rguenth at gcc dot gnu.org
2015-09-18  7:57 ` rguenth at gcc dot gnu.org
2021-05-04 13:17 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).