[Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body
@ 2020-12-07 13:57 wwwhhhyyy333 at gmail dot com
  2020-12-07 14:18 ` [Bug tree-optimization/98176] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2020-12-07 13:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

            Bug ID: 98176
           Summary: Loop invariant memory could not be hoisted when
                    nonpure_call in loop body
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
  Target Milestone: ---

For testcase

#include <cmath>

void foo(float *x, float tx, float *ret, int n)
{

#pragma omp simd
    for (int i = 0; i < n; i++)
    {
        float s,c;

        sincosf(x[i] * tx, &s, &c);

        *ret += s * c;   
    }
}

It could not be vectorized with -Ofast -fopenmp-simd -std=c++11

https://gcc.godbolt.org/z/ba77az

By manually hoist it could be vectorized with simd clone

void foo(float *x, float tx, float *ret, int n)
{
    float tmp = 0.0f;

#pragma omp simd
    for (int i = 0; i < n; i++)
    {
        float s,c;

        sincosf(x[i] * tx, &s, &c);

        tmp += s*c;      
    }
    *ret += tmp;
}

https://gcc.godbolt.org/z/bea17x

Is it possible for lim to perform store motion on case like this?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
@ 2020-12-07 14:18 ` rguenth at gcc dot gnu.org
  2020-12-08  2:19 ` wwwhhhyyy333 at gmail dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-12-07 14:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that x[i] may alias *ret and that PRE did half of the store-motion
job only so we can't recognize the reduction pattern.  I'm not sure whether
#pragma omp simd guarantees there's no forward dependence between x[]
and ret but I guess it's at least valid that x == ret, no?

The vectorizer relies on reductions being "scalar" so something would need to
do the store-motion for us.

I doubt the call is the issue btw.  The following fails the same way:

#include <cmath>

void foo(float *x, float tx, float *ret, int n)
{

#pragma omp simd
    for (int i = 0; i < n; i++)
    {
        float s,c;

        s = c = x[i] * tx;

        *ret += s * c;
    }
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
  2020-12-07 14:18 ` [Bug tree-optimization/98176] " rguenth at gcc dot gnu.org
@ 2020-12-08  2:19 ` wwwhhhyyy333 at gmail dot com
  2020-12-08  7:09 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2020-12-08  2:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #2 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
>> I doubt the call is the issue btw.

The aliasing could be removed by 

float foo(int *x, int n, float tx)   
{
        float ret[n];

        #pragma omp simd
        for (int i = 0; i < n; i++)
        {
            float s, c;                    

            s = c = tx * x[i];

            ret[0] += s*c;
        }

        return ret[0];
}

This is successfully vectorized, and the dump from lim2 has:

Moving statement
ret.1__I_lsm.7 = (*ret.1_18)[0];

But for 

float foo(int *x, int n, float tx)   
{
        float ret[n];

        #pragma omp simd
        for (int i = 0; i < n; i++)
        {
            float s, c;                    

            sincosf( tx * x[i] , &s, &c );

            ret[0] += s*c;
        }

        return ret[0];
}

It still could not be vectorized. I did initial debugging and see
tree-ssa-loop-im.c has

if (nonpure_call_p (stmt))
  {                                        
     maybe_never = true; 
     outermost = NULL;                      
  }

So no store-motion chance for any future statement in such block.

As a comparison, this could also be vectorized with simd clone:

float foo(int *x, int n, float tx)   
{
        float ret[n];

        #pragma omp simd
        for (int i = 0; i < n; i++)
        {
            float s, c;                    

            s = c = sinf( tx * x[i]);

            ret[0] += s*c;
        }

        return ret[0];
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
  2020-12-07 14:18 ` [Bug tree-optimization/98176] " rguenth at gcc dot gnu.org
  2020-12-08  2:19 ` wwwhhhyyy333 at gmail dot com
@ 2020-12-08  7:09 ` rguenth at gcc dot gnu.org
  2020-12-08  8:29 ` wwwhhhyyy333 at gmail dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-12-08  7:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongyu Wang from comment #2)
> >> I doubt the call is the issue btw.
> 
> The aliasing could be removed by 
> 
> float foo(int *x, int n, float tx)   
> {
>         float ret[n];
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             s = c = tx * x[i];
>  
>             ret[0] += s*c;
>         }
> 
>         return ret[0];
> }
> 
> This is successfully vectorized, and the dump from lim2 has:
> 
> Moving statement
> ret.1__I_lsm.7 = (*ret.1_18)[0];
> 
> But for 
> 
> float foo(int *x, int n, float tx)   
> {
>         float ret[n];
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             sincosf( tx * x[i] , &s, &c );
>  
>             ret[0] += s*c;
>         }
> 
>         return ret[0];
> }
> 
> It still could not be vectorized. I did initial debugging and see
> tree-ssa-loop-im.c has

I see ret[0] has store-motion applied.  You don't see it vectorized
because GCC doesn't know how to vectorize sincos (or cexpi which is
what it lowers it to).

If you replace sincosf with a random call then you'll hit the issue
that LIMs dependence analysis doesn't handle it at all since it cannot
represent it.  That will block further optimization in the loop.

That can possibly be improved.

> if (nonpure_call_p (stmt))
>   {                                        
>      maybe_never = true; 
>      outermost = NULL;                      
>   }
> 
> So no store-motion chance for any future statement in such block.

That's another issue - the call may not return.  Here the granularity
is per BB and thus loads/stores in the same BB are not considered for
sinking.

> As a comparison, this could also be vectorized with simd clone:
> 
> float foo(int *x, int n, float tx)   
> {
>         float ret[n];
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             s = c = sinf( tx * x[i]);
>  
>             ret[0] += s*c;
>         }
> 
>         return ret[0];
> }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
                   ` (2 preceding siblings ...)
  2020-12-08  7:09 ` rguenth at gcc dot gnu.org
@ 2020-12-08  8:29 ` wwwhhhyyy333 at gmail dot com
  2020-12-08  9:25 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2020-12-08  8:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #4 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Richard Biener from comment #3)

> I see ret[0] has store-motion applied.  You don't see it vectorized
> because GCC doesn't know how to vectorize sincos (or cexpi which is
> what it lowers it to).

I doubt so, after manually store motion

#include <cmath>

float foo(                
                int *x,                      
                int n,                 
                float tx
                )   
{
        float ret[n];
        float tmp;

        #pragma omp simd
        for (int i = 0; i < n; i++)
        {
            float s, c;                    

            sincosf( tx * x[i] , &s, &c );  

            tmp += s*c; 
        }

        ret[0] += tmp; 

        return ret[0];
}

with -Ofast -fopenmp-simd -std=c++11 it could be vectorized to call   
_ZGVbN4vvv_sincosf

ret[0] is moved for sinf() case, but not sincosf() with above options.

> 
> If you replace sincosf with a random call then you'll hit the issue
> that LIMs dependence analysis doesn't handle it at all since it cannot
> represent it.  That will block further optimization in the loop.
> 
> That can possibly be improved.
> 

So could LIMs dependence analysis handle known library function and just
analyze their memory parameter? Random call may have unknown behavior.

> > if (nonpure_call_p (stmt))
> >   {                                        
> >      maybe_never = true; 
> >      outermost = NULL;                      
> >   }
> > 
> > So no store-motion chance for any future statement in such block.
> 
> That's another issue - the call may not return.  Here the granularity
> is per BB and thus loads/stores in the same BB are not considered for
> sinking.
> 

IMHO the condition may be too strict for known library calls.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
                   ` (3 preceding siblings ...)
  2020-12-08  8:29 ` wwwhhhyyy333 at gmail dot com
@ 2020-12-08  9:25 ` rguenth at gcc dot gnu.org
  2020-12-08 10:36 ` wwwhhhyyy333 at gmail dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-12-08  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-12-08

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongyu Wang from comment #4)
> (In reply to Richard Biener from comment #3)
>  
> > I see ret[0] has store-motion applied.  You don't see it vectorized
> > because GCC doesn't know how to vectorize sincos (or cexpi which is
> > what it lowers it to).
> 
> I doubt so, after manually store motion
> 
> #include <cmath>
> 
> float foo(                
>                 int *x,                      
>                 int n,                 
>                 float tx
>                 )   
> {
>         float ret[n];
>         float tmp;
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             sincosf( tx * x[i] , &s, &c );  
> 
>             tmp += s*c; 
>         }
> 
>         ret[0] += tmp; 
> 
>         return ret[0];
> }
> 
> with -Ofast -fopenmp-simd -std=c++11 it could be vectorized to call   
> _ZGVbN4vvv_sincosf
> 
> ret[0] is moved for sinf() case, but not sincosf() with above options.

What target are you targeting?  Can you provide the sincosf prototype
from your math.h?  (please attach preprocessed source).

I cannot reproduce sincosf _not_ being lowered to cexpif and thus
no longer having memory writes.

> > 
> > If you replace sincosf with a random call then you'll hit the issue
> > that LIMs dependence analysis doesn't handle it at all since it cannot
> > represent it.  That will block further optimization in the loop.
> > 
> > That can possibly be improved.
> > 
> 
> So could LIMs dependence analysis handle known library function and just
> analyze their memory parameter? Random call may have unknown behavior.

Well, in gather_mem_refs_stmt we have

  mem = simple_mem_ref_in_stmt (stmt, &is_stored);
  if (!mem)
    {
      /* We use the shared mem_ref for all unanalyzable refs.  */
      id = UNANALYZABLE_MEM_ID;

so in this path you could in theory special-case sincos{,f,l} and
handle it by adding two refs.  Note this will likely break down once
LIM decides to apply some motion to one of the refs since it doesn't
know how to re-materialize them (they'll be treated as independent
but are really tied together).

So the other option is to improve things by having multiple
"UNANALYZABLE_MEM_ID"s and while not handling them as candidates for
invariant or store motion handle them in the dependence analysis part,
possibly by just using the classical stmt-based dependence tests.
But this requires quite some refactoring.

> > > if (nonpure_call_p (stmt))
> > >   {                                        
> > >      maybe_never = true; 
> > >      outermost = NULL;                      
> > >   }
> > > 
> > > So no store-motion chance for any future statement in such block.
> > 
> > That's another issue - the call may not return.  Here the granularity
> > is per BB and thus loads/stores in the same BB are not considered for
> > sinking.
> > 
> 
> IMHO the condition may be too strict for known library calls.

Yes.

For a LIM testcase an example with a memcpy might be more practically
relevant.

For refactoring I'd start with classifying the unanalyzable refs as
separate ref ID, marking it with another bit like ref_unanalyzed in
in_mem_ref and asserting there's a single access of such refs.
The mem_refs_may_alias_p code then needs to use stmt-based alias
queries instead of refs_may_alias_p_1 using accesses_in_loop[0]->stmt.

And code testing for UNANALYZABLE_MEM_ID now needs to look at the
ref_unanalyzed flag to not consider those refs for transforms.

Note this may blow up the memory requirements for testcases with lots
of "unanalyzable" refs.

The nonpure-call code is more difficult to improve, even sincos can not return
when the access to s or c traps.  Analyzing the arguments might help here.
If you disregard that detail I think all ECF_LEAF|ECF_NOTHROW functions
return normally.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
                   ` (4 preceding siblings ...)
  2020-12-08  9:25 ` rguenth at gcc dot gnu.org
@ 2020-12-08 10:36 ` wwwhhhyyy333 at gmail dot com
  2020-12-15  9:13 ` wwwhhhyyy333 at gmail dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2020-12-08 10:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #6 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Richard Biener from comment #5)
> (In reply to Hongyu Wang from comment #4)
> > (In reply to Richard Biener from comment #3)
> >  
> > > I see ret[0] has store-motion applied.  You don't see it vectorized
> > > because GCC doesn't know how to vectorize sincos (or cexpi which is
> > > what it lowers it to).
> > 
> > I doubt so, after manually store motion
> > 
> > #include <cmath>
> > 
> > float foo(                
> >                 int *x,                      
> >                 int n,                 
> >                 float tx
> >                 )   
> > {
> >         float ret[n];
> >         float tmp;
> > 
> >         #pragma omp simd
> >         for (int i = 0; i < n; i++)
> >         {
> >             float s, c;                    
> > 
> >             sincosf( tx * x[i] , &s, &c );  
> > 
> >             tmp += s*c; 
> >         }
> > 
> >         ret[0] += tmp; 
> > 
> >         return ret[0];
> > }
> > 
> > with -Ofast -fopenmp-simd -std=c++11 it could be vectorized to call   
> > _ZGVbN4vvv_sincosf
> > 
> > ret[0] is moved for sinf() case, but not sincosf() with above options.
> 
> What target are you targeting?  Can you provide the sincosf prototype
> from your math.h?  (please attach preprocessed source).
> 
> I cannot reproduce sincosf _not_ being lowered to cexpif and thus
> no longer having memory writes.
> 

I used g++ on godbolt:

https://gcc.godbolt.org/z/rv45MK

Below extern is sufficient for g++ to vectorize the code

__attribute__ ((__simd__ ("notinbranch"))) extern void sincosf (float __x,
float *__sinx, float *__cosx); 

compiled with -Ofast -fopenmp-simd -std=c++11 -march=x86-64

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
                   ` (5 preceding siblings ...)
  2020-12-08 10:36 ` wwwhhhyyy333 at gmail dot com
@ 2020-12-15  9:13 ` wwwhhhyyy333 at gmail dot com
  2021-07-07 12:34 ` rguenth at gcc dot gnu.org
  2021-07-08  5:25 ` wwwhhhyyy333 at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2020-12-15  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #7 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Richard Biener from comment #5)
> Yes.
> 
> For a LIM testcase an example with a memcpy might be more practically
> relevant.
> 
> For refactoring I'd start with classifying the unanalyzable refs as
> separate ref ID, marking it with another bit like ref_unanalyzed in
> in_mem_ref and asserting there's a single access of such refs.
> The mem_refs_may_alias_p code then needs to use stmt-based alias
> queries instead of refs_may_alias_p_1 using accesses_in_loop[0]->stmt.
> 
> And code testing for UNANALYZABLE_MEM_ID now needs to look at the
> ref_unanalyzed flag to not consider those refs for transforms.
> 
> Note this may blow up the memory requirements for testcases with lots
> of "unanalyzable" refs.
> 
> The nonpure-call code is more difficult to improve, even sincos can not
> return
> when the access to s or c traps.  Analyzing the arguments might help here.
> If you disregard that detail I think all ECF_LEAF|ECF_NOTHROW functions
> return normally.

Thanks for the suggestion, I did some refactor accordingly and this case could
be vectorized. 

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 92e5a8dd774..3e3e81bc36f 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -119,6 +119,8 @@ public:
                                   (its index in memory_accesses.refs_list)  */
   unsigned ref_canonical : 1;   /* Whether mem.ref was canonicalized.  */
   unsigned ref_decomposed : 1;  /* Whether the ref was hashed from mem.  */
+  unsigned ref_unanalyzed : 1; /* Whether the ref was unanalyzed memory.  */
+
   hashval_t hash;              /* Its hash value.  */

   /* The memory access itself and associated caching of alias-oracle
@@ -260,7 +262,14 @@ static bool refs_independent_p (im_mem_ref *, im_mem_ref
*, bool = true);
 #define UNANALYZABLE_MEM_ID 0

 /* Whether the reference was analyzable.  */
-#define MEM_ANALYZABLE(REF) ((REF)->id != UNANALYZABLE_MEM_ID)
+#define MEM_ANALYZABLE(REF) ((REF)->id != UNANALYZABLE_MEM_ID          \
+                            && !(REF)->ref_unanalyzed)
+
+#define REF_ID_UNANALYZABLE(id)                                               
\
+  (id == UNANALYZABLE_MEM_ID                                           \
+   || ((memory_accesses.refs_list[id])                                 \
+       && (memory_accesses.refs_list[id]->ref_unanalyzed)) \
+   )

 static struct lim_aux_data *
 init_lim_data (gimple *stmt)
@@ -829,7 +838,8 @@ set_profitable_level (gimple *stmt)
   set_level (stmt, gimple_bb (stmt)->loop_father, get_lim_data
(stmt)->max_loop);
 }

-/* Returns true if STMT is a call that has side effects.  */
+/* Returns true if STMT is a call that has side effects, or it is
+   not a function call with ECF_LEAF | ECF_NOTHROW.  */

 static bool
 nonpure_call_p (gimple *stmt)
@@ -837,6 +847,11 @@ nonpure_call_p (gimple *stmt)
   if (gimple_code (stmt) != GIMPLE_CALL)
     return false;

+  /* Simplified here, better to analyze call parameter.  */
+  int flags = gimple_call_flags (stmt);
+  if (flags & (ECF_LEAF | ECF_NOTHROW))
+    return false;
+
   return gimple_has_side_effects (stmt);
 }

@@ -1377,6 +1392,7 @@ mem_ref_alloc (ao_ref *mem, unsigned hash, unsigned id)
   ref->id = id;
   ref->ref_canonical = false;
   ref->ref_decomposed = false;
+  ref->ref_unanalyzed = false;
   ref->hash = hash;
   ref->stored = NULL;
   ref->loaded = NULL;
@@ -1461,9 +1477,13 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt)
   mem = simple_mem_ref_in_stmt (stmt, &is_stored);
   if (!mem)
     {
-      /* We use the shared mem_ref for all unanalyzable refs.  */
-      id = UNANALYZABLE_MEM_ID;
-      ref = memory_accesses.refs_list[id];
+      /* Mark unanaylzable refs with different id and skip analysis. */
+      id = memory_accesses.refs_list.length ();
+      ref = mem_ref_alloc (NULL, 0, id);
+      ref->ref_unanalyzed = true;
+      memory_accesses.refs_list.safe_push (ref);
+      record_mem_ref_loc (ref, stmt, NULL);
+
       if (dump_file && (dump_flags & TDF_DETAILS))
        {
          fprintf (dump_file, "Unanalyzed memory reference %u: ", id);
@@ -1576,7 +1596,7 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt)
       mark_ref_stored (ref, loop);
     }
   /* A not simple memory op is also a read when it is a write.  */
-  if (!is_stored || id == UNANALYZABLE_MEM_ID)
+  if (!is_stored || REF_ID_UNANALYZABLE (id))
     {
       bitmap_set_bit (&memory_accesses.refs_loaded_in_loop[loop->num],
ref->id);
       mark_ref_loaded (ref, loop);
@@ -1701,6 +1721,31 @@ mem_refs_may_alias_p (im_mem_ref *mem1, im_mem_ref
*mem2,
   poly_widest_int size1, size2;
   aff_tree off1, off2;

+  /* For refs marked as unanalyzed, use stmt_based alias analysis
+     and returns false when one mem_ref used by this unanalyzed stmt*/
+  if (mem1->ref_unanalyzed
+      || mem2->ref_unanalyzed)
+    {
+      if (mem1->ref_unanalyzed
+         && !mem2->ref_unanalyzed)
+       {
+         gcc_assert (mem1->accesses_in_loop.length() == 1);
+         gimple *stmt = mem1->accesses_in_loop[0].stmt;
+         if (ref_maybe_used_by_stmt_p (stmt, &mem2->mem, tbaa_p))
+           return true;
+       }
+      else if(!mem1->ref_unanalyzed)
+       {
+         gcc_assert (mem2->accesses_in_loop.length() == 1);
+         gimple *stmt = mem2->accesses_in_loop[0].stmt;
+         if (ref_maybe_used_by_stmt_p (stmt, &mem1->mem, tbaa_p))
+           return true;
+       }
+      else
+       return true;
+      return false;
+    }
+
   /* Perform basic offset and type-based disambiguation.  */
   if (!refs_may_alias_p_1 (&mem1->mem, &mem2->mem, tbaa_p))
     return false;
@@ -2423,7 +2468,7 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree
vdef,
        }
       lim_aux_data *data = get_lim_data (def);
       gcc_assert (data);
-      if (data->ref == UNANALYZABLE_MEM_ID)
+      if (REF_ID_UNANALYZABLE (data->ref) )
        return -1;
       /* One of the stores we want to apply SM to and we've not yet seen.  */
       else if (bitmap_clear_bit (refs_not_in_seq, data->ref))
-- 

Does it looks correct? 

And this includes too many unanalyzed ids when I tried to compile a big
program... Is there any possibility to avoid the memory usage?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
                   ` (6 preceding siblings ...)
  2020-12-15  9:13 ` wwwhhhyyy333 at gmail dot com
@ 2021-07-07 12:34 ` rguenth at gcc dot gnu.org
  2021-07-08  5:25 ` wwwhhhyyy333 at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-07 12:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2020-12-08 00:00:00         |2021-7-7

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
I've just pushed the change g:9f34b780b0461ec7b2b2defe96e44ab616ea2aa3 which
allows store-motion to handle some unanalyzed refs (but not yet calls because I
wanted a testcase showing usefulness).

I'm failing to reproduce with the sincos example since sincos is transformed
to __builtin_cexpi for me.  When using

float frexpf(float x, int *exp);
float foo(int *x, int n, float tx)
{
        float ret[n];
        int exp; 
        for (int i = 0; i < n; i++)
          ret[0] += frexpf (tx, &exp);
        return ret[0];
}

as alternate example we are not able to move ret[] as it seems we consider
the access to not always happen since it seems we consider frexpf possibly
not returning because it's a non-pure call:

/* Fills ALWAYS_EXECUTED_IN information for basic blocks, i.e.
   for each such basic block bb records the outermost loop for that execution
   of its header implies execution of bb.  */

static void
fill_always_executed_in (void)
{
  basic_block bb;
  class loop *loop;

  auto_sbitmap contains_call (last_basic_block_for_fn (cfun));
  bitmap_clear (contains_call);
  FOR_EACH_BB_FN (bb, cfun)
    {
      gimple_stmt_iterator gsi;
      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
        {
          if (nonpure_call_p (gsi_stmt (gsi)))
            break;
        }

      if (!gsi_end_p (gsi))
        bitmap_set_bit (contains_call, bb->index);
    }

  for (loop = current_loops->tree_root->inner; loop; loop = loop->next)
    fill_always_executed_in_1 (loop, contains_call);
}

so I don't think it buys us anything to handle calls yet.  sincos would
also be considered as possibly not returning.

I don't think we have any IPA analysis helping us here (yet).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body
  2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
                   ` (7 preceding siblings ...)
  2021-07-07 12:34 ` rguenth at gcc dot gnu.org
@ 2021-07-08  5:25 ` wwwhhhyyy333 at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2021-07-08  5:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #9 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Richard Biener from comment #8)

> I'm failing to reproduce with the sincos example since sincos is transformed
> to __builtin_cexpi for me.  When using

I always generate sincosf with g++ -Ofast -fopenmp-simd -std=c++11, perhaps it
is related to libm? I'm using RHEL8 with glibc 2.28.

> so I don't think it buys us anything to handle calls yet.  sincos would
> also be considered as possibly not returning.
> 

Perhaps, since the sincosf case could only be vectorized with #pragma omp simd.
But I think it is better to allow those functions with libmvec implementation
if the input params are proved to be safe (such as local variables).

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-07-08  5:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-07 13:57 [Bug tree-optimization/98176] New: Loop invariant memory could not be hoisted when nonpure_call in loop body wwwhhhyyy333 at gmail dot com
2020-12-07 14:18 ` [Bug tree-optimization/98176] " rguenth at gcc dot gnu.org
2020-12-08  2:19 ` wwwhhhyyy333 at gmail dot com
2020-12-08  7:09 ` rguenth at gcc dot gnu.org
2020-12-08  8:29 ` wwwhhhyyy333 at gmail dot com
2020-12-08  9:25 ` rguenth at gcc dot gnu.org
2020-12-08 10:36 ` wwwhhhyyy333 at gmail dot com
2020-12-15  9:13 ` wwwhhhyyy333 at gmail dot com
2021-07-07 12:34 ` rguenth at gcc dot gnu.org
2021-07-08  5:25 ` wwwhhhyyy333 at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).