On Fri, 14 Jun 2019, Jan Hubicka wrote: > Hi, > this patch removes nonoverlapping_component_refs_of_decl_p and replaces > it by nonoverlapping_component_refs_p. Those functions are basically > equivalent, however the first assumes that the outer type of memory access > is type of decl which is not true in the gimple memory model. > > nonoverlapping_component_refs_p may wind up more expensive by doing > quick sort. If the outer types match, we could use the same walk as in > nonoverlapping_component_refs_of_decl_p (and I can easilly implement it > in there saing some qsort for the indirect oracles too) but since > nonoverlapping_component_refs_p already handles common case of shallow > access paths and does not show in profiles I am not convinced it is > worth the code duplication. > > I think the oracle is mostly redudnant with aliasing_component_refs - in fact I > have troubles constructing testcases it would catch that the second would not > except for expliting the -1 corner cases of same_types_for_tbaa, so I decided > to leave this for later. The only case where this function is stronger is the > case of match of types in the middle of access path. This require some > non-const indexed array refs to exploit it. Perhaps we could integrate both > tests and do nonoverlapping_component_refs_p only if one of many handled > components walks we do earlier concludes that it has chance to suceed. > > Point of this patch is however to fix the code duplication and also add > missing checks for view converts - I don't see how it would be safe to use > the access paths here when it is not in the other two oracles. nonoverlapping_component_refs_of_decl_p was added by Eric before I moved nonoverlapping_component_refs_p from RTL to trees. It's nice to see them merged. Still I'd like to see it done in two steps, first making them more equivalent by adding missing checks, best with actually failing testcases (as said, GIMPLE testcases with arbitrary typed/TBAAed accesses are easy to write but even a C testcase should eventually work here). > I implemented assert checking that whenever nonoverlapping_component_refs_of_decl_p > so does nonoverlapping_component_refs_p and found two issues: > 1) the first does not give up seeing handled component that is not > COMPONENT_REF while other does. > This prevents it to disambiguate for example > foo.bar.array[1]; > from > foo.baz.array[1]; > where bar and baz are fields of structure foo. This is valid True. Copied from the RTL routine ;) > 2) It gives up upon seeing bitfield while it may disambiguate later in the > patch like. > foo.bar.bitfield1; > from > foo.baz.bitfield2; Not sure, it should first see bar/baz and disambiguate on that. > > Here we can not tell that bitfield1 is different from bitfield2 (at least > we do not do so currently claiming that it is not different for RTL, but > I think in that case we should not see bitfield in the access path) We do - this was added for actual miscompiles. We could of course make sure to strip components from MEM_EXPRs. This is all about RTL memory accesses being byte-granular while the GIMPLE alias oracle doing bit-granular disambiguations so the check is also on the conservative side. > but we can still disambiguate based on bar/baz. And we already should? Note nonoverlapping_component_refs_of_decl_p is quite cheap compared to nonoverlapping_component_refs_p (which at least is no longer quadratic as it was in the RTL implementation). As you said it has several correctness issues (explicit VIEW_CONVERT_EXPR also comes to my mind here). > Patch changes > Alias oracle query stats: refs_may_alias_p: 3021539 disambiguations, 3321136 queries ref_maybe_used_by_call_p: 7118 disambiguations, 3047133 queries call_may_clobber_ref_p: 817 disambiguations, 817 queries aliasing_component_ref_p: 2050 disambiguations, 19908 queries TBAA oracle: 1419961 disambiguations 2916692 queries 555158 are in alias set 0 575103 queries asked about the same object 0 queries asked about the same alias set 0 access volatile 252874 are dependent in the DAG 113596 are aritificially in conflict with void * > > to > > Alias oracle query stats: > refs_may_alias_p: 3021521 disambiguations, 3321121 queries > ref_maybe_used_by_call_p: 7118 disambiguations, 3047115 queries > call_may_clobber_ref_p: 817 disambiguations, 817 queries > nonoverlapping_component_refs_p: 25 disambiguations, 83528 queries > aliasing_component_refs_p: 2050 disambiguations, 19908 queries > TBAA oracle: 1419961 disambiguations 2916690 queries > 555158 are in alias set 0 > 575103 queries asked about the same object > 0 queries asked about the same alias set > 0 access volatile > 252872 are dependent in the DAG > 113596 are aritificially in conflict with void * > > So at least for tramp3d nonoverlapping_component_refs_p doesn't do any miracles. > There are fewer disambiguations caused by the new view convert verification. > > It however also causes: > > FAIL: gcc.dg/vect/pr66142.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops in function" 1 > FAIL: gcc.dg/vect/pr66142.c scan-tree-dump-times vect "vectorized 1 loops in function" 1 > > Test does: > > struct A { float x, y; }; > struct B { struct A t, w; }; > > static inline float > bar (const struct B *x) > { > struct A r; > float b, c, d; > r.x = x->t.x; > r.y = x->t.y; > > which gets inlined to > > void > foo (float *a, float *b, float *c) > { > int i; > float z = 0.0f; > float u = *a; > #pragma omp simd > for (i = 0; i < 32; i++) > { > float x = b[i]; > float y = c[i]; > struct B r; > r.t.x = 1.0f; > r.t.y = u; > r.w.x = x; > r.w.y = y; > z += bar (&r); > } > *a = z; > } > > SIMD code turns struct B r into array of size 32. > > The first difference is fre1. Here we give up on the oracle because of: > > @@ -488,15 +505,10 @@ > D.2200[_20].t.y = u_13; > D.2200[_20].w.x = x_22; > D.2200[_20].w.y = y_24; > - _30 = MEM [(const struct B *)&D.2200][_20].t.x; > - _34 = MEM [(const struct B *)&D.2200][_20].t.y; > - _35 = MEM [(const struct B *)&D.2200][_20].w.x; > - _36 = _30 * _35; > - _38 = y_24 * _34; > - b_39 = _36 + _38; > - _40 = _30 * _30; > - _41 = b_39 + _40; > - _42 = _34 * _34; > + _38 = u_13 * y_24; > + b_39 = x_22 + _38; > + _41 = b_39 + 1.0e+0; > + _42 = u_13 * u_13; > > It is truly silly to not optimize this, but there is a type mismatch > between "struct B[32]" and "struct B" which we perceive as a view > convert (same_type_for_tbaa_p return false which is bit ridiculout > because precisely this case could be supported). > (We would miss this if struct B was dynamically allocated previously > too.) I've noted elsewhere that what we consider a VIEW_CONVERT_EXPR should be improved (independently, maybe as a first step), factoring a mem_is_view_converting () check. > However in general we could run into this scenario since the type > mismatch is a result of forwprop1 handling > > D.2200[_20].t.x = 1.0e+0; > D.2200[_20].t.y = u_13; > D.2200[_20].w.x = x_22; > D.2200[_20].w.y = y_24; > _29 = &D.2200[_20]; > _30 = MEM[(const struct B *)_29].t.x; > _34 = MEM[(const struct B *)_29].t.y; > _35 = MEM[(const struct B *)_29].w.x; > _36 = _30 * _35; > _37 = MEM[(const struct B *)_29].w.y; > > So if there was outer struct, then the view convert would indeed happen. No, a forward propagation should never result in a view-convert perceived MEM unless the MEM was already view-converting. But the issue is that the special trick in forwprop that does /* If the RHS is a plain dereference and the value type is the same as that of the pointed-to type of the address we can put the dereferenced address on the RHS preserving the original alias-type. */ perserves the original alias-type. It's been too long that I remember all the details ;) > Any ideas how to solve this? I guess one option would be to delay such lossy > forward subtitutions for one of later forwprops/PRE since for most of SSA > optimizers the earlier sequence should be transparent enough and there > is ahope that one can put constant in. As said our notion what is a view-conversion and what not should probably allow simple component cases. Richard. > Shall I XFAIL the testcase? > > Bootstrapped/regtested x86_64-linux, OK? > > Honza > > > * tree-ssa-alias.c (alias_stats): Add > nonoverlapping_component_refs_p_may_alias and > nonoverlapping_component_refs_p_no_alias. > (dump_alias_stats): Dump it. > (nonoverlapping_component_refs_of_decl_p): Remove. > (nonoverlapping_component_refs_p): Walk all handled components; > do not give up on diambiguation on first failed match. > (decl_refs_may_alias_p): Watch for view converts; > use nonoverlapping_component_refs_of_decl_p. > > Index: tree-ssa-alias.c > =================================================================== > --- tree-ssa-alias.c (revision 272283) > +++ tree-ssa-alias.c (working copy) > @@ -100,6 +100,8 @@ static struct { > unsigned HOST_WIDE_INT call_may_clobber_ref_p_no_alias; > unsigned HOST_WIDE_INT aliasing_component_refs_p_may_alias; > unsigned HOST_WIDE_INT aliasing_component_refs_p_no_alias; > + unsigned HOST_WIDE_INT nonoverlapping_component_refs_p_may_alias; > + unsigned HOST_WIDE_INT nonoverlapping_component_refs_p_no_alias; > } alias_stats; > > void > @@ -124,7 +126,13 @@ dump_alias_stats (FILE *s) > alias_stats.call_may_clobber_ref_p_no_alias, > alias_stats.call_may_clobber_ref_p_no_alias > + alias_stats.call_may_clobber_ref_p_may_alias); > - fprintf (s, " aliasing_component_ref_p: " > + fprintf (s, " nonoverlapping_component_refs_p: " > + HOST_WIDE_INT_PRINT_DEC" disambiguations, " > + HOST_WIDE_INT_PRINT_DEC" queries\n", > + alias_stats.nonoverlapping_component_refs_p_no_alias, > + alias_stats.nonoverlapping_component_refs_p_no_alias > + + alias_stats.nonoverlapping_component_refs_p_may_alias); > + fprintf (s, " aliasing_component_refs_p: " > HOST_WIDE_INT_PRINT_DEC" disambiguations, " > HOST_WIDE_INT_PRINT_DEC" queries\n", > alias_stats.aliasing_component_refs_p_no_alias, > @@ -1029,106 +1037,6 @@ aliasing_component_refs_p (tree ref1, > return false; > } > > -/* Return true if we can determine that component references REF1 and REF2, > - that are within a common DECL, cannot overlap. */ > - > -static bool > -nonoverlapping_component_refs_of_decl_p (tree ref1, tree ref2) > -{ > - auto_vec component_refs1; > - auto_vec component_refs2; > - > - /* Create the stack of handled components for REF1. */ > - while (handled_component_p (ref1)) > - { > - component_refs1.safe_push (ref1); > - ref1 = TREE_OPERAND (ref1, 0); > - } > - if (TREE_CODE (ref1) == MEM_REF) > - { > - if (!integer_zerop (TREE_OPERAND (ref1, 1))) > - return false; > - ref1 = TREE_OPERAND (TREE_OPERAND (ref1, 0), 0); > - } > - > - /* Create the stack of handled components for REF2. */ > - while (handled_component_p (ref2)) > - { > - component_refs2.safe_push (ref2); > - ref2 = TREE_OPERAND (ref2, 0); > - } > - if (TREE_CODE (ref2) == MEM_REF) > - { > - if (!integer_zerop (TREE_OPERAND (ref2, 1))) > - return false; > - ref2 = TREE_OPERAND (TREE_OPERAND (ref2, 0), 0); > - } > - > - /* Bases must be either same or uncomparable. */ > - gcc_checking_assert (ref1 == ref2 > - || (DECL_P (ref1) && DECL_P (ref2) > - && compare_base_decls (ref1, ref2) != 0)); > - > - /* Pop the stacks in parallel and examine the COMPONENT_REFs of the same > - rank. This is sufficient because we start from the same DECL and you > - cannot reference several fields at a time with COMPONENT_REFs (unlike > - with ARRAY_RANGE_REFs for arrays) so you always need the same number > - of them to access a sub-component, unless you're in a union, in which > - case the return value will precisely be false. */ > - while (true) > - { > - do > - { > - if (component_refs1.is_empty ()) > - return false; > - ref1 = component_refs1.pop (); > - } > - while (!RECORD_OR_UNION_TYPE_P (TREE_TYPE (TREE_OPERAND (ref1, 0)))); > - > - do > - { > - if (component_refs2.is_empty ()) > - return false; > - ref2 = component_refs2.pop (); > - } > - while (!RECORD_OR_UNION_TYPE_P (TREE_TYPE (TREE_OPERAND (ref2, 0)))); > - > - /* Beware of BIT_FIELD_REF. */ > - if (TREE_CODE (ref1) != COMPONENT_REF > - || TREE_CODE (ref2) != COMPONENT_REF) > - return false; > - > - tree field1 = TREE_OPERAND (ref1, 1); > - tree field2 = TREE_OPERAND (ref2, 1); > - > - /* ??? We cannot simply use the type of operand #0 of the refs here > - as the Fortran compiler smuggles type punning into COMPONENT_REFs > - for common blocks instead of using unions like everyone else. */ > - tree type1 = DECL_CONTEXT (field1); > - tree type2 = DECL_CONTEXT (field2); > - > - /* We cannot disambiguate fields in a union or qualified union. */ > - if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) > - return false; > - > - if (field1 != field2) > - { > - /* A field and its representative need to be considered the > - same. */ > - if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 > - || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) > - return false; > - /* Different fields of the same record type cannot overlap. > - ??? Bitfields can overlap at RTL level so punt on them. */ > - if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) > - return false; > - return true; > - } > - } > - > - return false; > -} > - > /* qsort compare function to sort FIELD_DECLs after their > DECL_FIELD_CONTEXT TYPE_UID. */ > > @@ -1154,40 +1062,66 @@ nonoverlapping_component_refs_p (const_t > { > if (!flag_strict_aliasing > || !x || !y > - || TREE_CODE (x) != COMPONENT_REF > - || TREE_CODE (y) != COMPONENT_REF) > - return false; > + || !handled_component_p (x) > + || !handled_component_p (y)) > + { > + ++alias_stats.nonoverlapping_component_refs_p_may_alias; > + return false; > + } > > auto_vec fieldsx; > - while (TREE_CODE (x) == COMPONENT_REF) > + while (handled_component_p (x)) > { > - tree field = TREE_OPERAND (x, 1); > - tree type = DECL_FIELD_CONTEXT (field); > - if (TREE_CODE (type) == RECORD_TYPE) > - fieldsx.safe_push (field); > + if (TREE_CODE (x) == COMPONENT_REF) > + { > + tree field = TREE_OPERAND (x, 1); > + tree type = DECL_FIELD_CONTEXT (field); > + if (TREE_CODE (type) == RECORD_TYPE) > + fieldsx.safe_push (field); > + } > x = TREE_OPERAND (x, 0); > } > if (fieldsx.length () == 0) > - return false; > + { > + ++alias_stats.nonoverlapping_component_refs_p_may_alias; > + return false; > + } > auto_vec fieldsy; > - while (TREE_CODE (y) == COMPONENT_REF) > + while (handled_component_p (y)) > { > - tree field = TREE_OPERAND (y, 1); > - tree type = DECL_FIELD_CONTEXT (field); > - if (TREE_CODE (type) == RECORD_TYPE) > - fieldsy.safe_push (TREE_OPERAND (y, 1)); > + if (TREE_CODE (y) == COMPONENT_REF) > + { > + tree field = TREE_OPERAND (y, 1); > + tree type = DECL_FIELD_CONTEXT (field); > + if (TREE_CODE (type) == RECORD_TYPE) > + fieldsy.safe_push (TREE_OPERAND (y, 1)); > + } > y = TREE_OPERAND (y, 0); > } > if (fieldsy.length () == 0) > - return false; > + { > + ++alias_stats.nonoverlapping_component_refs_p_may_alias; > + return false; > + } > > /* Most common case first. */ > if (fieldsx.length () == 1 > && fieldsy.length () == 1) > - return ((DECL_FIELD_CONTEXT (fieldsx[0]) > - == DECL_FIELD_CONTEXT (fieldsy[0])) > - && fieldsx[0] != fieldsy[0] > - && !(DECL_BIT_FIELD (fieldsx[0]) && DECL_BIT_FIELD (fieldsy[0]))); > + { > + if ((DECL_FIELD_CONTEXT (fieldsx[0]) > + == DECL_FIELD_CONTEXT (fieldsy[0])) > + && fieldsx[0] != fieldsy[0] > + && !(DECL_BIT_FIELD (fieldsx[0]) && DECL_BIT_FIELD (fieldsy[0]))) > + { > + ++alias_stats.nonoverlapping_component_refs_p_no_alias; > + return true; > + } > + else > + { > + ++alias_stats.nonoverlapping_component_refs_p_may_alias; > + return false; > + } > + } > > if (fieldsx.length () == 2) > { > @@ -1215,19 +1149,26 @@ nonoverlapping_component_refs_p (const_t > if (typex == typey) > { > /* We're left with accessing different fields of a structure, > - no possible overlap. */ > + no possible overlap. > + If we can not disambiguate, do not give up and try to continue: > + it is possible that there are multiple matching types on the > + access patch and each of them may result in disambiguation. */ > if (fieldx != fieldy) > { > /* A field and its representative need to be considered the > same. */ > if (DECL_BIT_FIELD_REPRESENTATIVE (fieldx) == fieldy > || DECL_BIT_FIELD_REPRESENTATIVE (fieldy) == fieldx) > - return false; > + ; > /* Different fields of the same record type cannot overlap. > ??? Bitfields can overlap at RTL level so punt on them. */ > - if (DECL_BIT_FIELD (fieldx) && DECL_BIT_FIELD (fieldy)) > - return false; > - return true; > + else if (DECL_BIT_FIELD (fieldx) && DECL_BIT_FIELD (fieldy)) > + ; > + else > + { > + ++alias_stats.nonoverlapping_component_refs_p_no_alias; > + return true; > + } > } > } > if (TYPE_UID (typex) < TYPE_UID (typey)) > @@ -1245,10 +1186,11 @@ nonoverlapping_component_refs_p (const_t > } > while (1); > > + ++alias_stats.nonoverlapping_component_refs_p_may_alias; > return false; > } > > > /* Return true if two memory references based on the variables BASE1 > and BASE2 constrained to [OFFSET1, OFFSET1 + MAX_SIZE1) and > [OFFSET2, OFFSET2 + MAX_SIZE2) may alias. REF1 and REF2 > @@ -1271,11 +1212,41 @@ decl_refs_may_alias_p (tree ref1, tree b > if (!ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2)) > return false; > > + /* Access path oracle below needs to know both references. */ > + if (!ref1 || !ref2) > + return true; > + > + /* If the decl is accessed via a MEM_REF, reconstruct the base > + we can use for TBAA. */ > + tree dbase1 = ref1; > + > + while (handled_component_p (dbase1)) > + dbase1 = TREE_OPERAND (dbase1, 0); > + if (TREE_CODE (dbase1) == MEM_REF > + || TREE_CODE (dbase1) == TARGET_MEM_REF) > + { > + tree ptrtype1 = TREE_TYPE (TREE_OPERAND (dbase1, 1)); > + /* If first reference is view-converted, give up now. */ > + if (same_type_for_tbaa (TREE_TYPE (dbase1), TREE_TYPE (ptrtype1)) != 1) > + return true; > + } > + > + tree dbase2 = ref2; > + > + while (handled_component_p (dbase2)) > + dbase2 = TREE_OPERAND (dbase2, 0); > + if (TREE_CODE (dbase2) == MEM_REF > + || TREE_CODE (dbase2) == TARGET_MEM_REF) > + { > + tree ptrtype2 = TREE_TYPE (TREE_OPERAND (dbase2, 1)); > + /* If second reference is view-converted, give up now. */ > + if (same_type_for_tbaa (TREE_TYPE (dbase2), TREE_TYPE (ptrtype2)) != 1) > + return true; > + } > + > /* For components with variable position, the above test isn't sufficient, > so we disambiguate component references manually. */ > - if (ref1 && ref2 > - && handled_component_p (ref1) && handled_component_p (ref2) > - && nonoverlapping_component_refs_of_decl_p (ref1, ref2)) > + if (nonoverlapping_component_refs_p (ref1, ref2)) > return false; > > return true; > -- Richard Biener SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)