public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Teach gimple_canonical_types_compatible_p about incomplete types
@ 2015-05-25  1:32 Jan Hubicka
  2015-05-25 17:33 ` Bernhard Reutner-Fischer
  2015-05-26  8:45 ` Richard Biener
  0 siblings, 2 replies; 18+ messages in thread
From: Jan Hubicka @ 2015-05-25  1:32 UTC (permalink / raw)
  To: gcc-patches, rguenther

Richard,
here is next patch of the series.  It adds all the logic for defining type
equivalence that globs all complete types together in order to make incomplete
type equivalent to every complete variant.

Effect of recursing on pointers
===============================

This is, of course, intended for hashing pointers, but I did not bundle it
here.  When used in the obvious way to hash pointers (not included in this
patch), the number of distinct canonical types in Firefox increase by 50%:

[WPA] GIMPLE canonical type table: size 32749, 21596 elements, 1637039 searches, 222326 collisions (ratio: 0.135810)
[WPA] GIMPLE canonical type pointer-map: 21596 elements, 2271778 searches
to:
[WPA] GIMPLE canonical type table: size 65521, 32680 elements, 1637039 searches, 475073 collisions (ratio: 0.290203)
[WPA] GIMPLE canonical type pointer-map: 32680 elements, 2303368 searches       

I also noticed the increase in collision ratio; it is still not bad and we
can strenghten the hash little bit incrementally.

Here is a testcase where we improve:

truct a
{
  int a; int *b;
};
struct a *ptr;
void test (void);
void linker_error (void);

struct b
{
  int a; short *b;
};
struct b *ptr2;
inline void
test()
{
  ptr2->a=2;
}

void
main()
{
  asm ("":"=r"(ptr));
  asm ("":"=r"(ptr2));
  while (1) {
  ptr->a=1;
  test();
  if (ptr->a != 1)
    linker_error ();
   }
}

with -fno-early-inlining we are able to disambiguate ptr->a from ptr2->a
because we disambiguate struct a from struct b. For some reason we don't do
that if I move the secon ASM statement to test() that looks like missed
optimization.

Now the change does not really translate to great increase of disambiguations
for Firefox (it seems more in noise). The reason is the pointer_type globbing
in alias.c.

Globbing pointer types in alias.c
=================================

This code seemed very odd to me, but after some tought it does make sense for
C/C++.  It seemed to me that if we want to support alias of two different
pointers of same type but different qualification, we must make it part of the
TYPE_CANONICAL construction because that effectively unifies the compound types
build from them.  This is (fortunately) not really true: in C stores to array
types do not exist and structures are identified by their names, not by
structure. This means that this extra pointer equality does not propagate up
(only compound types in interest are pointers) and globbing them late indeed
allows to access pointer to A by pointer to B globally without fear of
disasters.

C cross-TU aliasing rules
and why I don't think globing pointers works for LTO of C only programs
========================================================================

In LTO however the situation is different.  We discussed following
cross-language unification rules

 1) type names does not matter (C standard is clear on that)
 2) field names does not matter (in C/C++ they do but we think matching the
    names between C and other languages may be impractical)
 3) qualifiers does not matter (they do for C, byt it may be impractical
    for other languages)
 4) references and pointers should be considered equal because we want to
    interface languages with references only to C.
    I suppose this is important for fortran

These rules do build structural equality up and thus do need to be part of
canonical type computation machinery.  
I looked in detail to what C standard say. It has notion of compatible type
and composite type.  The relevant clauses of C standard are:

  1  Two types have compatible type if their types are the same.
     Additional rules for determining whether two types are compatible are described
     in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.6 for
     declarators.  Moreover, two structure, union, or enumerated types declared
     in separate translation units are compatible if their tags and members satisfy
     the following requirements:  If one is declared with a tag, the other shall be
     declared with the same tag.  If both are completed anywhere within their
     respective translation units, then the following additional requirements apply:
     there shall be a one-to-one correspondence between their members such that each
     pair of corresponding members are declared with compatible types; if one member
     of the pair is declared with an alignment specifier, the other is declared with
     an equivalent alignment specifier; and if one member of the pair is declared
     with a name, the other is declared with the same name.  For two structures,
     corresponding members shall be declared in the same order.   For two
     structures or unions, corresponding bit-fields shall have the same widths.  For
     two enumerations, corresponding members shall have the same values.

(tag is "struct a"/"struct b", so C standard actually require names to match
 for structures/unions/enums to be compatible but by transitivity it needs
 to match compatible structures without a tag, so indeed we can not use type names
 and preserve transitivity of our notion of compatibility even considering C alone.

 We get wrong unions, because we insist on compatibility in order, while
 standard doesn't.

 We also do not compare alignments. This is probably not important)

  2  For two pointer types to be compatible, both shall be identically
     qualified and both shall be pointers to compatible types.

(here we skip the qualification for a reason by punting to TREE_CODE (TREE_TYPE (t)),
 but we get wrong some cases where compatible types have different codes, see
 bellow)

  6  For two array types to be compatible, both shall have compatible element
     types, and if both size specifiers are present, and are integer constant
     expressions, then both size specifiers shall have the same constant value.
     If the two array types are used in a context which requires them to be
     compatible, it is undefined behavior if the two size specifiers evaluate
     to unequal values.

(ok, here I think variable sized arrays are going to be fun, because such array
 of unknown size is a complete type and may be compatible with any array.
 Because we glob away arrays for alias.c purposes, this is less of a problem,
 but it propagates up to structure.  Strictly speaking struct a {int a[5];}
 is compatible with struct a {int a[b];} from other unit.

 Taking this to consequences, by transitivity, we may need to drop array sizes
 from consideration completely at least when some unit constains a variably sized array
 of compatible type that we do not really know in advance. Also when matching types
 of fields of structures and we may need to stop comparing field offsets after
 tripping over a first field that reucursively contains an array.

 So in particular pointer to variable sized array is compatible with pointer
 to constantly sized array that we get wrong because of the extra loop assigning
 TYPE_CANONICAL to read in function bodies.  I believe canonical hash need
 to be persistent for this reason and we need to keep canonical type computation
 incremental)

  15 For two function types to be compatible, both shall specify compatible
     return types.  Moreover, the parameter type lists, if both are present,
     shall agree in the number of parameters and in use of the ellipsis
     terminator; corresponding parameters shall have compatible types.  If one
     type has a parameter type list and the other type is specified by a
     function declarator that is not part of a function definition and that
     contains an empty identifier list, the parameter list shall not have an
     ellipsis terminator and the type of each parameter shall be compatible
     with the type that results from the application of the
     default argument promotions.  If one type has a parameter type list and the
     other type is specified by a function definition that contains a (possibly
     empty) identifier list, both shall agree in the number of parameters, and
     the type of each prototype parameter shall be compatible with the type
     that results from the application of the default argument promotions to
     the type of the corresponding identifier.   (In the determination of type
     compatibility and of a composite type, each parameter declared with
     function or array type is taken as having the adjusted type and each
     parameter declared with qualified type is taken as having the unqualified
     version of its declared type.)

(we do not match prototype with ellypsis to a declaration with additional
parameters.  Because function compatibility matters only for warnings and
for pointer types, where we probably want to use return value only, it is
not that much of trouble.

There are couple examples in standard that are interesting read
EXAMPLE 5    The following are all compatible function prototype declarators.
double maximum(int n, int m, double a[n][m]);
double maximum(int n, int m, double a[*][*]);
double maximum(int n, int m, double a[ ][*]);
double maximum(int n, int m, double a[ ][m]);
as are:
void f(double (* restrict a)[5]);
void f(double a[restrict][5]);
void f(double a[restrict 3][5]);
void f(double a[restrict static 3][5]);)

  2  Each enumerated type shall be compatible with char ,  a  signed integer
     type, or an unsigned integer type. The choice of type is
     implementation-defined, but  shall be capable of representing the values
     of all the members of the enumeration.    The enumerated type is
     incomplete until immediately after the that terminates the list of
     enumerator declarations, and complete thereafter.

(we ignore this completely as far as I know, it is easy to fix though, all
 we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)

  10 For two qualified types to be compatible, both shall have the identically
     qualified version of a compatible type; the order of type qualifiers
     within a list of specifiers or qualifiers does not affect the specified type.

Now I think in order to get C standard type compatiblity to imply
gimple_canonical_types_compatible we need to implement all the above globbing
rules as part of canonical type computation, not only punt at pointers in
alias.c

My reading is that for example

struct a {char *a;};

is compatible with

struct a {enum *a;};

defined in other compilation unit.

So in addition to globbing disucssed I suppose:

 5) we probably want to consider char * same as void * to support K&R and
    non-K&R C code mixing.
 6) we want to consider void * equivalent to void ** because GCC is doing
    type punning here (still?) and we think people generally mix them.
 7) pointers to array types should be same as pointers to elements
    again because C and C++ does not really have accesses to arrays but
    other languages may.
 8) i think to be correct by C language standard we need to glob enum with char
    tough I do not quite see how standard conforming program should use it given
    that standard does not say if it is char/unsigned char/signed char.

Now I think that if we want to support all of above, we need to make it part
of TYPE_CANONICAL calculation, not part of alias set computation. Otherwise
we won't.

C++, C++-C interpoperability
============================
For now I do not see why we can't handle C++ types as C types as the type
equivalence for C++ seems just monotonously more strict.  C++ types with
linkage specify also language linkage.  C++ standard consider two types
distinct if they differ by language linkage:

   The default language linkage of all function types, function names, and
   variable names is C++ language linkage. Two function types with different
   language linkages are distinct types even if they are otherwise identical
   C and C++ linkages are defined by standard.

I specify how types are compatible in C++ linkage and rest is left to
implementaiton.

I think we are safe to enforce C style rules on C++ type as C++ rules
are just monotonously strict.

I think in longer term I should extend ODR wanrings to complain about language
linkage violations (as specified by Section 7.5) and then we could be able to
use ODR to disambiguate canonical types on everything that is in C++ linkage
assuming that existing codebases are resonably free of the violations.

Fortran-C interoperability, universal pointer type, aggregates
==============================================================

I did my excercise and also read Fortran aliasing and C interoperability rules.
Fortran only aliasing seems just the same as aliasing within a single
translation unit and thus rather easy given that fortran has only one hack in
gfc_get_alias_set.

It has some interesting points WRT C language.  It has type
that is compatible with all C pointers (contraidcting C standard that
say they may have different representations), it also has type that is
compatible with both signed and unsigned char. It states that
aggregate compatibility only by types of elements (i.e. array and
structure of 4 ints is the same or single element array and union)
on those types.

Fortran explicitly marks C interoperable types and I think the
issues can be reproduced by fortran only testcases so I think we
want make Fortran FE to explicitly drop those odd guys to alias set 0.
I will look into producing testcases for bugzilla, but that will get us
around need to fix that all at LTO level.

Overall plan
============

So I propose the following:
 1) we add the incomplete types mode to gimple_canonical_types_compatible_p
    as suggested by this patch
 2) I will add the individual globbing rules to the functions one by one
 3) once done, we enable recursion on pointer
 4) we drop the alias.c globing of pointer_type for in_lto_p.

Independently on that I will send a patch for more fine grained globbing
within alias.c for non-LTO compilation so we avoid cases where LTO alias
sets would be actually stronger than what we do without LTO and also
because this adds a lot of extra disambiguations to Firefox.

I hope the machinery will become flexible enough so we can add extra globbing
rules if we find them necessary during the stage1. I would also porpose adding
C standard compliant mode to gimple_canonical_types_compatible_p and use it to
produce declaration mismatch warnings in lto-symtab when both decls come from
C/C++ the same way as we already do ODR.  That is very useful sanity check that
will warn us when our implementation is wrong and/or programs are often
breaking it. We can drop the weird testcases to testsuite to be sure we won't
regress in future.

Based on experience, we may switch to C/C++ compliant equivalence when we know
that whole unit is C/C++ and we do not have to deal with odd cases from other
languages.

Honza

	* tree.c (gimple_canonical_types_compatible_p): Turn
	TRUST_TYPE_CANONICAL into FLAGS parameter; support
	comparsion with MATCH_WITH_INCOMPLETE_TYPE.
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 223632)
+++ lto/lto.c	(working copy)
@@ -295,7 +295,9 @@
 static unsigned long num_canonical_type_hash_entries;
 static unsigned long num_canonical_type_hash_queries;
 
-static void iterative_hash_canonical_type (tree type, inchash::hash &hstate);
+static void iterative_hash_canonical_type (tree type, inchash::hash &hstate,
+					   unsigned int flags
+					     = MATCH_TYPE_CANONICAL);
 static hashval_t gimple_canonical_type_hash (const void *p);
 static void gimple_register_canonical_type_1 (tree t, hashval_t hash);
 
@@ -302,26 +304,36 @@
 /* Returning a hash value for gimple type TYPE.
 
    The hash value returned is equal for types considered compatible
-   by gimple_canonical_types_compatible_p.  */
+   by gimple_canonical_types_compatible_p.  FLAGS values are interpretted
+   same way as by this function.  */
 
 static hashval_t
-hash_canonical_type (tree type)
+hash_canonical_type (tree type, unsigned int flags = MATCH_TYPE_CANONICAL)
 {
   inchash::hash hstate;
 
-  /* We compute alias sets only for types that needs them.
-     Be sure we do not recurse to something else as we can not hash incomplete
-     types in a way they would have same hash value as compatible complete
-     types.  */
-  gcc_checking_assert (type_with_alias_set_p (type));
+  /* Check that we encounter incomplete types only with
+     MATCH_WITH_INCOMPLETE_TYPE.
 
+     Also check that no one tries to use hashing in compbination with 
+     !MATCH_TYPE_CANONICAL.  In this case gimple_canonical_types_compatible_p
+     is not transitive and thus does not produce equivalence on all types.  */
+  gcc_checking_assert ((flags & MATCH_TYPE_CANONICAL)
+		       && ((flags & MATCH_WITH_INCOMPLETE_TYPE)
+			   || type_with_alias_set_p (type)));
+
   /* Combine a few common features of types so that types are grouped into
      smaller sets; when searching for existing matching types to merge,
      only existing types having the same features as the new type will be
      checked.  */
   hstate.add_int (TREE_CODE (type));
-  hstate.add_int (TYPE_MODE (type));
 
+  /* Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
+  if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
+      || (TREE_CODE (type) != ARRAY_TYPE
+	  && !AGGREGATE_TYPE_P (type)))
+    hstate.add_int (TYPE_MODE (type));
+
   /* Incorporate common features of numerical types.  */
   if (INTEGRAL_TYPE_P (type)
       || SCALAR_FLOAT_TYPE_P (type)
@@ -355,7 +367,8 @@
     hstate.add_int (TYPE_STRING_FLAG (type));
 
   /* For array types hash the domain bounds and the string flag.  */
-  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
+  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type)
+      && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
     {
       hstate.add_int (TYPE_STRING_FLAG (type));
       /* OMP lowering can introduce error_mark_node in place of
@@ -366,30 +379,35 @@
 	inchash::add_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), hstate);
     }
 
-  /* Recurse for aggregates with a single element type.  */
+  /* Recurse for aggregates with a single element type.
+     We are safe to drop MATCH_INCOMPLETE.  There is no way to build
+     an array of incomplete types.   */
   if (TREE_CODE (type) == ARRAY_TYPE
       || TREE_CODE (type) == COMPLEX_TYPE
       || TREE_CODE (type) == VECTOR_TYPE)
-    iterative_hash_canonical_type (TREE_TYPE (type), hstate);
+    iterative_hash_canonical_type (TREE_TYPE (type), hstate,
+				   flags & ~MATCH_WITH_INCOMPLETE_TYPE);
 
   /* Incorporate function return and argument types.  */
   if (TREE_CODE (type) == FUNCTION_TYPE || TREE_CODE (type) == METHOD_TYPE)
     {
-      unsigned na;
-      tree p;
+      iterative_hash_canonical_type (TREE_TYPE (type), hstate, flags);
 
-      iterative_hash_canonical_type (TREE_TYPE (type), hstate);
-
-      for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
+      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
+	  || TREE_CODE (type) == METHOD_TYPE)
 	{
-	  iterative_hash_canonical_type (TREE_VALUE (p), hstate);
-	  na++;
+	  unsigned na;
+	  tree p;
+	  for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
+	    {
+	      iterative_hash_canonical_type (TREE_VALUE (p), hstate, flags);
+	      na++;
+	    }
+	  hstate.add_int (na);
 	}
-
-      hstate.add_int (na);
     }
 
-  if (RECORD_OR_UNION_TYPE_P (type))
+  if (RECORD_OR_UNION_TYPE_P (type) && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
     {
       unsigned nf;
       tree f;
@@ -397,7 +415,7 @@
       for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
 	if (TREE_CODE (f) == FIELD_DECL)
 	  {
-	    iterative_hash_canonical_type (TREE_TYPE (f), hstate);
+	    iterative_hash_canonical_type (TREE_TYPE (f), hstate, flags);
 	    nf++;
 	  }
 
@@ -407,14 +425,22 @@
   return hstate.end();
 }
 
-/* Returning a hash value for gimple type TYPE combined with VAL.  */
+/* Returning a hash value for gimple type TYPE combined with VAL.
+   FLAGS are the same as for gimple_conanical_types_compatible_p.  */
 
 static void
-iterative_hash_canonical_type (tree type, inchash::hash &hstate)
+iterative_hash_canonical_type (tree type, inchash::hash &hstate,
+			       unsigned int flags)
 {
   hashval_t v;
+
+  /* TYPE_CANONICAL reflects equivalence classes with
+     !MATCH_WITH_INCOMPLETE_TYPE.   If we are matching incomplete types,
+     then we need to recurse.  */
+  if (flags & MATCH_WITH_INCOMPLETE_TYPE)
+    v = hash_canonical_type (type, flags);
   /* An already processed type.  */
-  if (TYPE_CANONICAL (type))
+  else if (TYPE_CANONICAL (type))
     {
       type = TYPE_CANONICAL (type);
       v = gimple_canonical_type_hash (type);
@@ -497,6 +523,14 @@
 {
   if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t))
     return;
+  /* Only types that can be handled in memory need canonical types.
+     Function and methods are never accessed. Also we do not need canonical
+     types for incomplete types with exception of arrays - structures may end
+     with incomplete arrays that may be referenced.  */
+  if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE
+      || (!COMPLETE_TYPE_P (t)
+	  && (TREE_CODE (t) != ARRAY_TYPE || !COMPLETE_TYPE_P (TREE_TYPE (t)))))
+    return;
 
   gimple_register_canonical_type_1 (t, hash_canonical_type (t));
 }
Index: tree.c
===================================================================
--- tree.c	(revision 223633)
+++ tree.c	(working copy)
@@ -12702,12 +12702,24 @@
 /* Return true iff T1 and T2 are structurally identical for what
    TBAA is concerned.  
    This function is used both by lto.c canonical type merging and by the
-   verifier.  If TRUST_TYPE_CANONICAL we do not look into structure of types
-   that have TYPE_CANONICAL defined and assume them equivalent.  */
+   verifier.  
 
+   If flags sets MATCH_TYPE_CANONICAL we assume that TYPE_CANONICAL is set in a
+   way that two types have the same canonical type if and only if
+   gimple_canonical_types_compatible_p (t1,t2, 0) is true.  This is used
+   to cut down recursion during LTO canonical type comptuation.  When this flag
+   is set we also sanity check that we are going to produce equivalence relation
+   that is needed to drive the hashtable in lto.c.
+
+   if MATCH_WITH_INCOMPLETE_TYPE is true, then we do not use any information
+   from complete types and thus i.e. all RECORD_TYPE are equivlaent to other
+   RECORD_TYPEs.  This is the only equivalence possible if one require
+   incomplete type to be in the same equivalence class with all its
+   completetions.  */
+
 bool
 gimple_canonical_types_compatible_p (const_tree t1, const_tree t2,
-				     bool trust_type_canonical)
+				     unsigned int flags)
 {
   /* Before starting to set up the SCC machinery handle simple cases.  */
 
@@ -12719,28 +12731,28 @@
   if (t1 == NULL_TREE || t2 == NULL_TREE)
     return false;
 
-  /* We consider complete types always compatible with incomplete type.
-     This does not make sense for canonical type calculation and thus we
-     need to ensure that we are never called on it.
+  /* Check that either flags allow incomplete types or both types are complete.
+     This is necessary to ensure transitivity for canonical type merging.
 
-     FIXME: For more correctness the function probably should have three modes
-	1) mode assuming that types are complete mathcing their structure
-	2) mode allowing incomplete types but producing equivalence classes
-	   and thus ignoring all info from complete types
-	3) mode allowing incomplete types to match complete but checking
-	   compatibility between complete types.
+     FIXME: with !MATCH_TYPE_CANONICAL we probably should allow match between
+     incomplete type and complete type as defined by language standards.  No
+     one however rely on it so far.  */
 
-     1 and 2 can be used for canonical type calculation. 3 is the real
-     definition of type compatibility that can be used i.e. for warnings during
-     declaration merging.  */
-
-  gcc_assert (!trust_type_canonical
+  gcc_assert ((flags & MATCH_WITH_INCOMPLETE_TYPE)
+	      || !(flags & MATCH_TYPE_CANONICAL)
 	      || (type_with_alias_set_p (t1) && type_with_alias_set_p (t2)));
   /* If the types have been previously registered and found equal
      they still are.  */
   if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
-      && trust_type_canonical)
-    return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
+      && (flags & MATCH_TYPE_CANONICAL))
+    {
+      if (TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2))
+	return true;
+      /* TYPE_CANONICAL is always computed with an assumption that the type
+	 is complete.  */
+      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE))
+	return false;
+    }
 
   /* Can't be the same type if the types don't have the same code.  */
   if (TREE_CODE (t1) != TREE_CODE (t2))
@@ -12753,8 +12765,12 @@
       || TREE_CODE (t1) == NULLPTR_TYPE)
     return true;
 
-  /* Can't be the same type if they have different mode.  */
-  if (TYPE_MODE (t1) != TYPE_MODE (t2))
+  /* Can't be the same type if they have different mode.
+     Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
+  if ((!(flags & MATCH_WITH_INCOMPLETE_TYPE)
+       || (TREE_CODE (t1) != ARRAY_TYPE
+           && !AGGREGATE_TYPE_P (t1)))
+      && TYPE_MODE (t1) != TYPE_MODE (t2))
     return false;
 
   /* Non-aggregate types can be handled cheaply.  */
@@ -12783,8 +12799,7 @@
 	{
 	  if (TYPE_ADDR_SPACE (TREE_TYPE (t1))
 	      != TYPE_ADDR_SPACE (TREE_TYPE (t2)))
-	    return false;
-
+ 	    return false;
 	  if (TREE_CODE (TREE_TYPE (t1)) != TREE_CODE (TREE_TYPE (t2)))
 	    return false;
 	}
@@ -12792,9 +12807,9 @@
       /* Tail-recurse to components.  */
       if (TREE_CODE (t1) == VECTOR_TYPE
 	  || TREE_CODE (t1) == COMPLEX_TYPE)
-	return gimple_canonical_types_compatible_p (TREE_TYPE (t1),
-						    TREE_TYPE (t2),
-						    trust_type_canonical);
+	return gimple_canonical_types_compatible_p
+		 (TREE_TYPE (t1), TREE_TYPE (t2),
+		  flags & ~MATCH_WITH_INCOMPLETE_TYPE);
 
       return true;
     }
@@ -12804,12 +12819,20 @@
     {
     case ARRAY_TYPE:
       /* Array types are the same if the element types are the same and
-	 the number of elements are the same.  */
-      if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
-						trust_type_canonical)
+	 the number of elements are the same.
+
+	 When MATCH_WITH_INCOMPLETE_TYPE is set, bypass the check
+	 on number of elements.
+	 When recursing, clear MATCH_WITH_INCOMPLETE_TYPE because there is
+	 no way to make incomplete array of array.  */
+      if (!gimple_canonical_types_compatible_p
+	     (TREE_TYPE (t1), TREE_TYPE (t2),
+	      flags & ~MATCH_WITH_INCOMPLETE_TYPE)
 	  || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
 	  || TYPE_NONALIASED_COMPONENT (t1) != TYPE_NONALIASED_COMPONENT (t2))
 	return false;
+      else if (flags & MATCH_WITH_INCOMPLETE_TYPE)
+	return true;
       else
 	{
 	  tree i1 = TYPE_DOMAIN (t1);
@@ -12848,11 +12871,19 @@
     case METHOD_TYPE:
     case FUNCTION_TYPE:
       /* Function types are the same if the return type and arguments types
-	 are the same.  */
+	 are the same.
+	 It is possible that function pointers have return values and parameters
+	 of incomplete types; permit that by not clearing
+	 MATCH_WITH_INCOMPLETE_TYPE  */
       if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
-						trust_type_canonical))
+						flags))
 	return false;
 
+      /* We must permit a match between !prototype_p and prototype_p for
+	 functions; methods are never !prototype_p.  */
+      if ((flags & MATCH_WITH_INCOMPLETE_TYPE)
+	  && TREE_CODE (t1) == FUNCTION_TYPE)
+	return true;
       if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
 	return true;
       else
@@ -12864,8 +12895,7 @@
 	       parms1 = TREE_CHAIN (parms1), parms2 = TREE_CHAIN (parms2))
 	    {
 	      if (!gimple_canonical_types_compatible_p
-		     (TREE_VALUE (parms1), TREE_VALUE (parms2),
-		      trust_type_canonical))
+		     (TREE_VALUE (parms1), TREE_VALUE (parms2), flags))
 		return false;
 	    }
 
@@ -12881,6 +12911,15 @@
       {
 	tree f1, f2;
 
+	/* C standrad require incomplete structures and unions to be
+	   considered compatible with complete ones regardless their TYPE_NAME
+	   when they come from different translation units.
+	   We must consider transitive closure here, so 
+	   every structure/union is equivalent to each other.  */
+	   
+	if (flags & MATCH_WITH_INCOMPLETE_TYPE)
+	  return true;
+
 	/* For aggregate types, all the fields must be the same.  */
 	for (f1 = TYPE_FIELDS (t1), f2 = TYPE_FIELDS (t2);
 	     f1 || f2;
@@ -12897,8 +12936,7 @@
 	    if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
 		|| !gimple_compare_field_offset (f1, f2)
 		|| !gimple_canonical_types_compatible_p
-		      (TREE_TYPE (f1), TREE_TYPE (f2),
-		       trust_type_canonical))
+		      (TREE_TYPE (f1), TREE_TYPE (f2), flags))
 	      return false;
 	  }
 
@@ -12961,7 +12999,7 @@
 	      with variably sized arrays because their sizes possibly
 	      gimplified to different variables.  */
 	   && !variably_modified_type_p (ct, NULL)
-	   && !gimple_canonical_types_compatible_p (t, ct, false))
+	   && !gimple_canonical_types_compatible_p (t, ct, 0))
     {
       error ("TYPE_CANONICAL is not compatible");
       debug_tree (ct);
Index: tree.h
===================================================================
--- tree.h	(revision 223632)
+++ tree.h	(working copy)
@@ -4569,9 +4569,21 @@
 extern unsigned int tree_map_base_hash (const void *);
 extern int tree_map_base_marked_p (const void *);
 extern void DEBUG_FUNCTION verify_type (const_tree t);
-extern bool gimple_canonical_types_compatible_p (const_tree, const_tree,
-						 bool trust_type_canonical = true);
 
+/* Flags used by gimple_canonical_types_compatible_p.  */
+enum gimple_canonical_types_compatible_flags
+  {
+    /* Asume that TYPE_CANONICAL is set in a way that two types have
+       the same canonical type if and only if
+       gimple_canonical_types_compatible_p (t1,t2, 0) is true.  */
+    MATCH_TYPE_CANONICAL = 1,
+    /* Match all types as if they were incomplete.  */
+    MATCH_WITH_INCOMPLETE_TYPE = 2
+  };
+extern bool gimple_canonical_types_compatible_p
+    (const_tree, const_tree,
+     unsigned int flags = MATCH_TYPE_CANONICAL);
+
 #define tree_map_eq tree_map_base_eq
 extern unsigned int tree_map_hash (const void *);
 #define tree_map_marked_p tree_map_base_marked_p

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-25  1:32 Teach gimple_canonical_types_compatible_p about incomplete types Jan Hubicka
@ 2015-05-25 17:33 ` Bernhard Reutner-Fischer
  2015-05-26  1:18   ` Jan Hubicka
  2015-05-26  8:45 ` Richard Biener
  1 sibling, 1 reply; 18+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-05-25 17:33 UTC (permalink / raw)
  To: Jan Hubicka, gcc-patches, rguenther

On May 25, 2015 1:49:45 AM GMT+02:00, Jan Hubicka <hubicka@ucw.cz> wrote:


>2  Each enumerated type shall be compatible with char ,  a  signed
>integer
>     type, or an unsigned integer type. The choice of type is
>implementation-defined, but  shall be capable of representing the
>values
>     of all the members of the enumeration.    The enumerated type is
>     incomplete until immediately after the that terminates the list of
>     enumerator declarations, and complete thereafter.
>
>(we ignore this completely as far as I know, it is easy to fix though,
>all
> we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)

Don't forget -fshort-enum though.

Thanks,

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-25 17:33 ` Bernhard Reutner-Fischer
@ 2015-05-26  1:18   ` Jan Hubicka
  2015-05-29 21:37     ` Joseph Myers
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2015-05-26  1:18 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: Jan Hubicka, gcc-patches, rguenther

> On May 25, 2015 1:49:45 AM GMT+02:00, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
> 
> >2  Each enumerated type shall be compatible with char ,  a  signed
> >integer
> >     type, or an unsigned integer type. The choice of type is
> >implementation-defined, but  shall be capable of representing the
> >values
> >     of all the members of the enumeration.    The enumerated type is
> >     incomplete until immediately after the that terminates the list of
> >     enumerator declarations, and complete thereafter.
> >
> >(we ignore this completely as far as I know, it is easy to fix though,
> >all
> > we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)
> 
> Don't forget -fshort-enum though.

I believe -fshort-enum is makes us non-complian to the C standard and thus
we are free to not follow this rule :)

Honza

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-25  1:32 Teach gimple_canonical_types_compatible_p about incomplete types Jan Hubicka
  2015-05-25 17:33 ` Bernhard Reutner-Fischer
@ 2015-05-26  8:45 ` Richard Biener
  2015-05-26 17:36   ` Jan Hubicka
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Biener @ 2015-05-26  8:45 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Mon, 25 May 2015, Jan Hubicka wrote:

> Richard,
> here is next patch of the series.  It adds all the logic for defining type
> equivalence that globs all complete types together in order to make incomplete
> type equivalent to every complete variant.
> 
> Effect of recursing on pointers
> ===============================
> 
> This is, of course, intended for hashing pointers, but I did not bundle it
> here.  When used in the obvious way to hash pointers (not included in this
> patch), the number of distinct canonical types in Firefox increase by 50%:
> 
> [WPA] GIMPLE canonical type table: size 32749, 21596 elements, 1637039 searches, 222326 collisions (ratio: 0.135810)
> [WPA] GIMPLE canonical type pointer-map: 21596 elements, 2271778 searches
> to:
> [WPA] GIMPLE canonical type table: size 65521, 32680 elements, 1637039 searches, 475073 collisions (ratio: 0.290203)
> [WPA] GIMPLE canonical type pointer-map: 32680 elements, 2303368 searches       
> 
> I also noticed the increase in collision ratio; it is still not bad and we
> can strenghten the hash little bit incrementally.
> 
> Here is a testcase where we improve:
> 
> truct a
> {
>   int a; int *b;
> };
> struct a *ptr;
> void test (void);
> void linker_error (void);
> 
> struct b
> {
>   int a; short *b;
> };
> struct b *ptr2;
> inline void
> test()
> {
>   ptr2->a=2;
> }
> 
> void
> main()
> {
>   asm ("":"=r"(ptr));
>   asm ("":"=r"(ptr2));
>   while (1) {
>   ptr->a=1;
>   test();
>   if (ptr->a != 1)
>     linker_error ();
>    }
> }
> 
> with -fno-early-inlining we are able to disambiguate ptr->a from ptr2->a
> because we disambiguate struct a from struct b. For some reason we don't do
> that if I move the secon ASM statement to test() that looks like missed
> optimization.
> 
> Now the change does not really translate to great increase of disambiguations
> for Firefox (it seems more in noise). The reason is the pointer_type globbing
> in alias.c.

Yeah, we only get the improvement because of some "hack" in the tree
alias oracle which also uses the base object for TBAA.

> Globbing pointer types in alias.c
> =================================
> 
> This code seemed very odd to me, but after some tought it does make sense for
> C/C++.  It seemed to me that if we want to support alias of two different
> pointers of same type but different qualification, we must make it part of the
> TYPE_CANONICAL construction because that effectively unifies the compound types
> build from them.  This is (fortunately) not really true: in C stores to array
> types do not exist and structures are identified by their names, not by
> structure. This means that this extra pointer equality does not propagate up
> (only compound types in interest are pointers) and globbing them late indeed
> allows to access pointer to A by pointer to B globally without fear of
> disasters.
> 
> C cross-TU aliasing rules
> and why I don't think globing pointers works for LTO of C only programs
> ========================================================================
> 
> In LTO however the situation is different.  We discussed following
> cross-language unification rules
> 
>  1) type names does not matter (C standard is clear on that)
>  2) field names does not matter (in C/C++ they do but we think matching the
>     names between C and other languages may be impractical)
>  3) qualifiers does not matter (they do for C, byt it may be impractical
>     for other languages)
>  4) references and pointers should be considered equal because we want to
>     interface languages with references only to C.
>     I suppose this is important for fortran
> 
> These rules do build structural equality up and thus do need to be part of
> canonical type computation machinery.  
> I looked in detail to what C standard say. It has notion of compatible type
> and composite type.  The relevant clauses of C standard are:
> 
>   1  Two types have compatible type if their types are the same.
>      Additional rules for determining whether two types are compatible are described
>      in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.6 for
>      declarators.  Moreover, two structure, union, or enumerated types declared
>      in separate translation units are compatible if their tags and members satisfy
>      the following requirements:  If one is declared with a tag, the other shall be
>      declared with the same tag.  If both are completed anywhere within their
>      respective translation units, then the following additional requirements apply:
>      there shall be a one-to-one correspondence between their members such that each
>      pair of corresponding members are declared with compatible types; if one member
>      of the pair is declared with an alignment specifier, the other is declared with
>      an equivalent alignment specifier; and if one member of the pair is declared
>      with a name, the other is declared with the same name.  For two structures,
>      corresponding members shall be declared in the same order.   For two
>      structures or unions, corresponding bit-fields shall have the same widths.  For
>      two enumerations, corresponding members shall have the same values.
> 
> (tag is "struct a"/"struct b", so C standard actually require names to match
>  for structures/unions/enums to be compatible but by transitivity it needs
>  to match compatible structures without a tag, so indeed we can not use type names
>  and preserve transitivity of our notion of compatibility even considering C alone.
> 
>  We get wrong unions, because we insist on compatibility in order, while
>  standard doesn't.

Yeah, we should fix that.  And in fact, for cross-language LTO I don't
see why

  union { int a; char c; };

and

  union { int a; short s; };

should not be compatible - they have a common member after all.  So
I'd like to glob all unions that have the same size (and as improvement
over that, that have at least one compatible member).  That also get's
rid of the issue that we'd need to sort union members for the comparison
to avoid quadraticness (as long as we don't check for that one compatible
member).

Oh, and are

  union { int a; };

and

  struct { int a; };

not compatible?  They are layout-wise at least.  Likewise the struct
and union { int a; short s; } with the same argument as the two-union
case.

>  We also do not compare alignments. This is probably not important)

Correct - alignment doesn't enter TBAA.

>   2  For two pointer types to be compatible, both shall be identically
>      qualified and both shall be pointers to compatible types.
> 
> (here we skip the qualification for a reason by punting to TREE_CODE 
> (TREE_TYPE (t)),
>  but we get wrong some cases where compatible types have different codes, see
>  bellow)

Yes, using TREE_CODE is likely not correct.

>   6  For two array types to be compatible, both shall have compatible element
>      types, and if both size specifiers are present, and are integer constant
>      expressions, then both size specifiers shall have the same constant value.
>      If the two array types are used in a context which requires them to be
>      compatible, it is undefined behavior if the two size specifiers evaluate
>      to unequal values.
> 
> (ok, here I think variable sized arrays are going to be fun, because such array
>  of unknown size is a complete type and may be compatible with any array.
>  Because we glob away arrays for alias.c purposes, this is less of a problem,
>  but it propagates up to structure.  Strictly speaking struct a {int a[5];}
>  is compatible with struct a {int a[b];} from other unit.

Yeah, fun.

>  Taking this to consequences, by transitivity, we may need to drop array sizes
>  from consideration completely at least when some unit constains a variably sized array
>  of compatible type that we do not really know in advance. Also when matching types
>  of fields of structures and we may need to stop comparing field offsets after
>  tripping over a first field that reucursively contains an array.

Indeed, if we first see struct a { int a[3]; } and struct b { int a[2]; }
and then struct c { int a[b]; } we'd need to glob all structs to a single
TYPE_CANONICAL at that point (which of course doesn't work).

>  So in particular pointer to variable sized array is compatible with pointer
>  to constantly sized array that we get wrong because of the extra loop assigning
>  TYPE_CANONICAL to read in function bodies.  I believe canonical hash need
>  to be persistent for this reason and we need to keep canonical type computation
>  incremental)

Keeping the hash is fine with me (and using it in the function-code).  I
suppose we should incrementally fix that (and the union case above) before
adding any new fancy stuff.

>   15 For two function types to be compatible, both shall specify compatible
>      return types.  Moreover, the parameter type lists, if both are present,
>      shall agree in the number of parameters and in use of the ellipsis
>      terminator; corresponding parameters shall have compatible types.  If one
>      type has a parameter type list and the other type is specified by a
>      function declarator that is not part of a function definition and that
>      contains an empty identifier list, the parameter list shall not have an
>      ellipsis terminator and the type of each parameter shall be compatible
>      with the type that results from the application of the
>      default argument promotions.  If one type has a parameter type list and the
>      other type is specified by a function definition that contains a (possibly
>      empty) identifier list, both shall agree in the number of parameters, and
>      the type of each prototype parameter shall be compatible with the type
>      that results from the application of the default argument promotions to
>      the type of the corresponding identifier.   (In the determination of type
>      compatibility and of a composite type, each parameter declared with
>      function or array type is taken as having the adjusted type and each
>      parameter declared with qualified type is taken as having the unqualified
>      version of its declared type.)
> 
> (we do not match prototype with ellypsis to a declaration with additional
> parameters.  Because function compatibility matters only for warnings and
> for pointer types, where we probably want to use return value only, it is
> not that much of trouble.
> 
> There are couple examples in standard that are interesting read
> EXAMPLE 5    The following are all compatible function prototype declarators.
> double maximum(int n, int m, double a[n][m]);
> double maximum(int n, int m, double a[*][*]);
> double maximum(int n, int m, double a[ ][*]);
> double maximum(int n, int m, double a[ ][m]);
> as are:
> void f(double (* restrict a)[5]);
> void f(double a[restrict][5]);
> void f(double a[restrict 3][5]);
> void f(double a[restrict static 3][5]);)

Not sure why you get into functions here at all ...

>   2  Each enumerated type shall be compatible with char ,  a  signed integer
>      type, or an unsigned integer type. The choice of type is
>      implementation-defined, but  shall be capable of representing the values
>      of all the members of the enumeration.    The enumerated type is
>      incomplete until immediately after the that terminates the list of
>      enumerator declarations, and complete thereafter.
> 
> (we ignore this completely as far as I know, it is easy to fix though, all
>  we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)

Yes, we don't make a distinction between ENUMERAL_TYPE and INTEGER_TYPE.

>   10 For two qualified types to be compatible, both shall have the identically
>      qualified version of a compatible type; the order of type qualifiers
>      within a list of specifiers or qualifiers does not affect the specified type.
> 
> Now I think in order to get C standard type compatiblity to imply
> gimple_canonical_types_compatible we need to implement all the above globbing
> rules as part of canonical type computation, not only punt at pointers in
> alias.c
> 
> My reading is that for example
> 
> struct a {char *a;};
> 
> is compatible with
> 
> struct a {enum *a;};
> 
> defined in other compilation unit.

Yes, as said above the TREE_CODE trick in the pointer-type handing is
wrong.  We can as well just drop it ...

> So in addition to globbing disucssed I suppose:
> 
>  5) we probably want to consider char * same as void * to support K&R and
>     non-K&R C code mixing.
>  6) we want to consider void * equivalent to void ** because GCC is doing
>     type punning here (still?) and we think people generally mix them.

I think void * is equivalent to _any_ other pointer type.  It's certainly
used in that way in C (to provide abstraction).  As said to the alias.c
globbing, the choice was to either glob all pointer types or make
void * (and void **) have alias-set zero.  Experiments showed that doing
the latter is more harmful for code generation.

>  7) pointers to array types should be same as pointers to elements
>     again because C and C++ does not really have accesses to arrays but
>     other languages may.

Yes.

>  8) i think to be correct by C language standard we need to glob enum 
> with char
>     tough I do not quite see how standard conforming program should use 
>     it given that standard does not say if it is char/unsigned 
>     char/signed char.

I think it depends on the actual enum, no?  So for forward declarations
like

 enum Foo;
 struct X { enum Foo *p; };

you face the same issue as with void *.

> Now I think that if we want to support all of above, we need to make it part
> of TYPE_CANONICAL calculation, not part of alias set computation. Otherwise
> we won't.

Yes, strictly speaking we need to implement all alias.c globbing in
TYPE_CANONICAL calculation as well to get composition rules right.

> C++, C++-C interpoperability
> ============================
> For now I do not see why we can't handle C++ types as C types as the type
> equivalence for C++ seems just monotonously more strict.  C++ types with
> linkage specify also language linkage.  C++ standard consider two types
> distinct if they differ by language linkage:
> 
>    The default language linkage of all function types, function names, and
>    variable names is C++ language linkage. Two function types with different
>    language linkages are distinct types even if they are otherwise identical
>    C and C++ linkages are defined by standard.
> 
> I specify how types are compatible in C++ linkage and rest is left to
> implementaiton.
> 
> I think we are safe to enforce C style rules on C++ type as C++ rules
> are just monotonously strict.
> 
> I think in longer term I should extend ODR wanrings to complain about language
> linkage violations (as specified by Section 7.5) and then we could be able to
> use ODR to disambiguate canonical types on everything that is in C++ linkage
> assuming that existing codebases are resonably free of the violations.
> 
> Fortran-C interoperability, universal pointer type, aggregates
> ==============================================================
> 
> I did my excercise and also read Fortran aliasing and C interoperability rules.
> Fortran only aliasing seems just the same as aliasing within a single
> translation unit and thus rather easy given that fortran has only one hack in
> gfc_get_alias_set.
> 
> It has some interesting points WRT C language.  It has type
> that is compatible with all C pointers (contraidcting C standard that
> say they may have different representations), it also has type that is
> compatible with both signed and unsigned char. It states that
> aggregate compatibility only by types of elements (i.e. array and
> structure of 4 ints is the same or single element array and union)
> on those types.
> 
> Fortran explicitly marks C interoperable types and I think the
> issues can be reproduced by fortran only testcases so I think we
> want make Fortran FE to explicitly drop those odd guys to alias set 0.
> I will look into producing testcases for bugzilla, but that will get us
> around need to fix that all at LTO level.
> 
> Overall plan
> ============
> 
> So I propose the following:
>  1) we add the incomplete types mode to gimple_canonical_types_compatible_p
>     as suggested by this patch
>  2) I will add the individual globbing rules to the functions one by one
>  3) once done, we enable recursion on pointer
>  4) we drop the alias.c globing of pointer_type for in_lto_p.

I don't think we want to do 4), especially not only for in_lto_p.  I'm
not sure we should spend time on improving accuracy before fixing
correctness (on the current precision level).

> Independently on that I will send a patch for more fine grained globbing
> within alias.c for non-LTO compilation so we avoid cases where LTO alias
> sets would be actually stronger than what we do without LTO and also
> because this adds a lot of extra disambiguations to Firefox.
> 
> I hope the machinery will become flexible enough so we can add extra globbing
> rules if we find them necessary during the stage1. I would also porpose adding
> C standard compliant mode to gimple_canonical_types_compatible_p and use it to
> produce declaration mismatch warnings in lto-symtab when both decls come from
> C/C++ the same way as we already do ODR.  That is very useful sanity check that
> will warn us when our implementation is wrong and/or programs are often
> breaking it. We can drop the weird testcases to testsuite to be sure we won't
> regress in future.
> 
> Based on experience, we may switch to C/C++ compliant equivalence when we know
> that whole unit is C/C++ and we do not have to deal with odd cases from other
> languages.

I'm not convinced the patch is a step in the right direction.

Can we please first fix pointer handing for TYPE_CANONICAL by dropping
the TREE_CODE handling and fix union handling by only looking at
union size?

Originally I wanted to make LTO type compatibility really layout-based
(thus struct { int i[2]; } and struct { int i; int j; } match for 
example).

I wonder if for non-cross-language-LTO we shouldn't simply stream
TYPE_CANONICAL, at stream-in record a TYPE_CANONICAL vs. uses vector
map and "fix" up canonical types globally (we'd need to stream
"local" canonical types in the global trees then).

Richard.

> Honza
> 
> 	* tree.c (gimple_canonical_types_compatible_p): Turn
> 	TRUST_TYPE_CANONICAL into FLAGS parameter; support
> 	comparsion with MATCH_WITH_INCOMPLETE_TYPE.
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c	(revision 223632)
> +++ lto/lto.c	(working copy)
> @@ -295,7 +295,9 @@
>  static unsigned long num_canonical_type_hash_entries;
>  static unsigned long num_canonical_type_hash_queries;
>  
> -static void iterative_hash_canonical_type (tree type, inchash::hash &hstate);
> +static void iterative_hash_canonical_type (tree type, inchash::hash &hstate,
> +					   unsigned int flags
> +					     = MATCH_TYPE_CANONICAL);
>  static hashval_t gimple_canonical_type_hash (const void *p);
>  static void gimple_register_canonical_type_1 (tree t, hashval_t hash);
>  
> @@ -302,26 +304,36 @@
>  /* Returning a hash value for gimple type TYPE.
>  
>     The hash value returned is equal for types considered compatible
> -   by gimple_canonical_types_compatible_p.  */
> +   by gimple_canonical_types_compatible_p.  FLAGS values are interpretted
> +   same way as by this function.  */
>  
>  static hashval_t
> -hash_canonical_type (tree type)
> +hash_canonical_type (tree type, unsigned int flags = MATCH_TYPE_CANONICAL)
>  {
>    inchash::hash hstate;
>  
> -  /* We compute alias sets only for types that needs them.
> -     Be sure we do not recurse to something else as we can not hash incomplete
> -     types in a way they would have same hash value as compatible complete
> -     types.  */
> -  gcc_checking_assert (type_with_alias_set_p (type));
> +  /* Check that we encounter incomplete types only with
> +     MATCH_WITH_INCOMPLETE_TYPE.
>  
> +     Also check that no one tries to use hashing in compbination with 
> +     !MATCH_TYPE_CANONICAL.  In this case gimple_canonical_types_compatible_p
> +     is not transitive and thus does not produce equivalence on all types.  */
> +  gcc_checking_assert ((flags & MATCH_TYPE_CANONICAL)
> +		       && ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> +			   || type_with_alias_set_p (type)));
> +
>    /* Combine a few common features of types so that types are grouped into
>       smaller sets; when searching for existing matching types to merge,
>       only existing types having the same features as the new type will be
>       checked.  */
>    hstate.add_int (TREE_CODE (type));
> -  hstate.add_int (TYPE_MODE (type));
>  
> +  /* Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
> +  if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> +      || (TREE_CODE (type) != ARRAY_TYPE
> +	  && !AGGREGATE_TYPE_P (type)))
> +    hstate.add_int (TYPE_MODE (type));
> +
>    /* Incorporate common features of numerical types.  */
>    if (INTEGRAL_TYPE_P (type)
>        || SCALAR_FLOAT_TYPE_P (type)
> @@ -355,7 +367,8 @@
>      hstate.add_int (TYPE_STRING_FLAG (type));
>  
>    /* For array types hash the domain bounds and the string flag.  */
> -  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
> +  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type)
> +      && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
>      {
>        hstate.add_int (TYPE_STRING_FLAG (type));
>        /* OMP lowering can introduce error_mark_node in place of
> @@ -366,30 +379,35 @@
>  	inchash::add_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), hstate);
>      }
>  
> -  /* Recurse for aggregates with a single element type.  */
> +  /* Recurse for aggregates with a single element type.
> +     We are safe to drop MATCH_INCOMPLETE.  There is no way to build
> +     an array of incomplete types.   */
>    if (TREE_CODE (type) == ARRAY_TYPE
>        || TREE_CODE (type) == COMPLEX_TYPE
>        || TREE_CODE (type) == VECTOR_TYPE)
> -    iterative_hash_canonical_type (TREE_TYPE (type), hstate);
> +    iterative_hash_canonical_type (TREE_TYPE (type), hstate,
> +				   flags & ~MATCH_WITH_INCOMPLETE_TYPE);
>  
>    /* Incorporate function return and argument types.  */
>    if (TREE_CODE (type) == FUNCTION_TYPE || TREE_CODE (type) == METHOD_TYPE)
>      {
> -      unsigned na;
> -      tree p;
> +      iterative_hash_canonical_type (TREE_TYPE (type), hstate, flags);
>  
> -      iterative_hash_canonical_type (TREE_TYPE (type), hstate);
> -
> -      for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
> +      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> +	  || TREE_CODE (type) == METHOD_TYPE)
>  	{
> -	  iterative_hash_canonical_type (TREE_VALUE (p), hstate);
> -	  na++;
> +	  unsigned na;
> +	  tree p;
> +	  for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
> +	    {
> +	      iterative_hash_canonical_type (TREE_VALUE (p), hstate, flags);
> +	      na++;
> +	    }
> +	  hstate.add_int (na);
>  	}
> -
> -      hstate.add_int (na);
>      }
>  
> -  if (RECORD_OR_UNION_TYPE_P (type))
> +  if (RECORD_OR_UNION_TYPE_P (type) && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
>      {
>        unsigned nf;
>        tree f;
> @@ -397,7 +415,7 @@
>        for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
>  	if (TREE_CODE (f) == FIELD_DECL)
>  	  {
> -	    iterative_hash_canonical_type (TREE_TYPE (f), hstate);
> +	    iterative_hash_canonical_type (TREE_TYPE (f), hstate, flags);
>  	    nf++;
>  	  }
>  
> @@ -407,14 +425,22 @@
>    return hstate.end();
>  }
>  
> -/* Returning a hash value for gimple type TYPE combined with VAL.  */
> +/* Returning a hash value for gimple type TYPE combined with VAL.
> +   FLAGS are the same as for gimple_conanical_types_compatible_p.  */
>  
>  static void
> -iterative_hash_canonical_type (tree type, inchash::hash &hstate)
> +iterative_hash_canonical_type (tree type, inchash::hash &hstate,
> +			       unsigned int flags)
>  {
>    hashval_t v;
> +
> +  /* TYPE_CANONICAL reflects equivalence classes with
> +     !MATCH_WITH_INCOMPLETE_TYPE.   If we are matching incomplete types,
> +     then we need to recurse.  */
> +  if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> +    v = hash_canonical_type (type, flags);
>    /* An already processed type.  */
> -  if (TYPE_CANONICAL (type))
> +  else if (TYPE_CANONICAL (type))
>      {
>        type = TYPE_CANONICAL (type);
>        v = gimple_canonical_type_hash (type);
> @@ -497,6 +523,14 @@
>  {
>    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t))
>      return;
> +  /* Only types that can be handled in memory need canonical types.
> +     Function and methods are never accessed. Also we do not need canonical
> +     types for incomplete types with exception of arrays - structures may end
> +     with incomplete arrays that may be referenced.  */
> +  if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE
> +      || (!COMPLETE_TYPE_P (t)
> +	  && (TREE_CODE (t) != ARRAY_TYPE || !COMPLETE_TYPE_P (TREE_TYPE (t)))))
> +    return;
>  
>    gimple_register_canonical_type_1 (t, hash_canonical_type (t));
>  }
> Index: tree.c
> ===================================================================
> --- tree.c	(revision 223633)
> +++ tree.c	(working copy)
> @@ -12702,12 +12702,24 @@
>  /* Return true iff T1 and T2 are structurally identical for what
>     TBAA is concerned.  
>     This function is used both by lto.c canonical type merging and by the
> -   verifier.  If TRUST_TYPE_CANONICAL we do not look into structure of types
> -   that have TYPE_CANONICAL defined and assume them equivalent.  */
> +   verifier.  
>  
> +   If flags sets MATCH_TYPE_CANONICAL we assume that TYPE_CANONICAL is set in a
> +   way that two types have the same canonical type if and only if
> +   gimple_canonical_types_compatible_p (t1,t2, 0) is true.  This is used
> +   to cut down recursion during LTO canonical type comptuation.  When this flag
> +   is set we also sanity check that we are going to produce equivalence relation
> +   that is needed to drive the hashtable in lto.c.
> +
> +   if MATCH_WITH_INCOMPLETE_TYPE is true, then we do not use any information
> +   from complete types and thus i.e. all RECORD_TYPE are equivlaent to other
> +   RECORD_TYPEs.  This is the only equivalence possible if one require
> +   incomplete type to be in the same equivalence class with all its
> +   completetions.  */
> +
>  bool
>  gimple_canonical_types_compatible_p (const_tree t1, const_tree t2,
> -				     bool trust_type_canonical)
> +				     unsigned int flags)
>  {
>    /* Before starting to set up the SCC machinery handle simple cases.  */
>  
> @@ -12719,28 +12731,28 @@
>    if (t1 == NULL_TREE || t2 == NULL_TREE)
>      return false;
>  
> -  /* We consider complete types always compatible with incomplete type.
> -     This does not make sense for canonical type calculation and thus we
> -     need to ensure that we are never called on it.
> +  /* Check that either flags allow incomplete types or both types are complete.
> +     This is necessary to ensure transitivity for canonical type merging.
>  
> -     FIXME: For more correctness the function probably should have three modes
> -	1) mode assuming that types are complete mathcing their structure
> -	2) mode allowing incomplete types but producing equivalence classes
> -	   and thus ignoring all info from complete types
> -	3) mode allowing incomplete types to match complete but checking
> -	   compatibility between complete types.
> +     FIXME: with !MATCH_TYPE_CANONICAL we probably should allow match between
> +     incomplete type and complete type as defined by language standards.  No
> +     one however rely on it so far.  */
>  
> -     1 and 2 can be used for canonical type calculation. 3 is the real
> -     definition of type compatibility that can be used i.e. for warnings during
> -     declaration merging.  */
> -
> -  gcc_assert (!trust_type_canonical
> +  gcc_assert ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> +	      || !(flags & MATCH_TYPE_CANONICAL)
>  	      || (type_with_alias_set_p (t1) && type_with_alias_set_p (t2)));
>    /* If the types have been previously registered and found equal
>       they still are.  */
>    if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
> -      && trust_type_canonical)
> -    return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
> +      && (flags & MATCH_TYPE_CANONICAL))
> +    {
> +      if (TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2))
> +	return true;
> +      /* TYPE_CANONICAL is always computed with an assumption that the type
> +	 is complete.  */
> +      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE))
> +	return false;
> +    }
>  
>    /* Can't be the same type if the types don't have the same code.  */
>    if (TREE_CODE (t1) != TREE_CODE (t2))
> @@ -12753,8 +12765,12 @@
>        || TREE_CODE (t1) == NULLPTR_TYPE)
>      return true;
>  
> -  /* Can't be the same type if they have different mode.  */
> -  if (TYPE_MODE (t1) != TYPE_MODE (t2))
> +  /* Can't be the same type if they have different mode.
> +     Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
> +  if ((!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> +       || (TREE_CODE (t1) != ARRAY_TYPE
> +           && !AGGREGATE_TYPE_P (t1)))
> +      && TYPE_MODE (t1) != TYPE_MODE (t2))
>      return false;
>  
>    /* Non-aggregate types can be handled cheaply.  */
> @@ -12783,8 +12799,7 @@
>  	{
>  	  if (TYPE_ADDR_SPACE (TREE_TYPE (t1))
>  	      != TYPE_ADDR_SPACE (TREE_TYPE (t2)))
> -	    return false;
> -
> + 	    return false;
>  	  if (TREE_CODE (TREE_TYPE (t1)) != TREE_CODE (TREE_TYPE (t2)))
>  	    return false;
>  	}
> @@ -12792,9 +12807,9 @@
>        /* Tail-recurse to components.  */
>        if (TREE_CODE (t1) == VECTOR_TYPE
>  	  || TREE_CODE (t1) == COMPLEX_TYPE)
> -	return gimple_canonical_types_compatible_p (TREE_TYPE (t1),
> -						    TREE_TYPE (t2),
> -						    trust_type_canonical);
> +	return gimple_canonical_types_compatible_p
> +		 (TREE_TYPE (t1), TREE_TYPE (t2),
> +		  flags & ~MATCH_WITH_INCOMPLETE_TYPE);
>  
>        return true;
>      }
> @@ -12804,12 +12819,20 @@
>      {
>      case ARRAY_TYPE:
>        /* Array types are the same if the element types are the same and
> -	 the number of elements are the same.  */
> -      if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
> -						trust_type_canonical)
> +	 the number of elements are the same.
> +
> +	 When MATCH_WITH_INCOMPLETE_TYPE is set, bypass the check
> +	 on number of elements.
> +	 When recursing, clear MATCH_WITH_INCOMPLETE_TYPE because there is
> +	 no way to make incomplete array of array.  */
> +      if (!gimple_canonical_types_compatible_p
> +	     (TREE_TYPE (t1), TREE_TYPE (t2),
> +	      flags & ~MATCH_WITH_INCOMPLETE_TYPE)
>  	  || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
>  	  || TYPE_NONALIASED_COMPONENT (t1) != TYPE_NONALIASED_COMPONENT (t2))
>  	return false;
> +      else if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> +	return true;
>        else
>  	{
>  	  tree i1 = TYPE_DOMAIN (t1);
> @@ -12848,11 +12871,19 @@
>      case METHOD_TYPE:
>      case FUNCTION_TYPE:
>        /* Function types are the same if the return type and arguments types
> -	 are the same.  */
> +	 are the same.
> +	 It is possible that function pointers have return values and parameters
> +	 of incomplete types; permit that by not clearing
> +	 MATCH_WITH_INCOMPLETE_TYPE  */
>        if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
> -						trust_type_canonical))
> +						flags))
>  	return false;
>  
> +      /* We must permit a match between !prototype_p and prototype_p for
> +	 functions; methods are never !prototype_p.  */
> +      if ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> +	  && TREE_CODE (t1) == FUNCTION_TYPE)
> +	return true;
>        if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
>  	return true;
>        else
> @@ -12864,8 +12895,7 @@
>  	       parms1 = TREE_CHAIN (parms1), parms2 = TREE_CHAIN (parms2))
>  	    {
>  	      if (!gimple_canonical_types_compatible_p
> -		     (TREE_VALUE (parms1), TREE_VALUE (parms2),
> -		      trust_type_canonical))
> +		     (TREE_VALUE (parms1), TREE_VALUE (parms2), flags))
>  		return false;
>  	    }
>  
> @@ -12881,6 +12911,15 @@
>        {
>  	tree f1, f2;
>  
> +	/* C standrad require incomplete structures and unions to be
> +	   considered compatible with complete ones regardless their TYPE_NAME
> +	   when they come from different translation units.
> +	   We must consider transitive closure here, so 
> +	   every structure/union is equivalent to each other.  */
> +	   
> +	if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> +	  return true;
> +
>  	/* For aggregate types, all the fields must be the same.  */
>  	for (f1 = TYPE_FIELDS (t1), f2 = TYPE_FIELDS (t2);
>  	     f1 || f2;
> @@ -12897,8 +12936,7 @@
>  	    if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
>  		|| !gimple_compare_field_offset (f1, f2)
>  		|| !gimple_canonical_types_compatible_p
> -		      (TREE_TYPE (f1), TREE_TYPE (f2),
> -		       trust_type_canonical))
> +		      (TREE_TYPE (f1), TREE_TYPE (f2), flags))
>  	      return false;
>  	  }
>  
> @@ -12961,7 +12999,7 @@
>  	      with variably sized arrays because their sizes possibly
>  	      gimplified to different variables.  */
>  	   && !variably_modified_type_p (ct, NULL)
> -	   && !gimple_canonical_types_compatible_p (t, ct, false))
> +	   && !gimple_canonical_types_compatible_p (t, ct, 0))
>      {
>        error ("TYPE_CANONICAL is not compatible");
>        debug_tree (ct);
> Index: tree.h
> ===================================================================
> --- tree.h	(revision 223632)
> +++ tree.h	(working copy)
> @@ -4569,9 +4569,21 @@
>  extern unsigned int tree_map_base_hash (const void *);
>  extern int tree_map_base_marked_p (const void *);
>  extern void DEBUG_FUNCTION verify_type (const_tree t);
> -extern bool gimple_canonical_types_compatible_p (const_tree, const_tree,
> -						 bool trust_type_canonical = true);
>  
> +/* Flags used by gimple_canonical_types_compatible_p.  */
> +enum gimple_canonical_types_compatible_flags
> +  {
> +    /* Asume that TYPE_CANONICAL is set in a way that two types have
> +       the same canonical type if and only if
> +       gimple_canonical_types_compatible_p (t1,t2, 0) is true.  */
> +    MATCH_TYPE_CANONICAL = 1,
> +    /* Match all types as if they were incomplete.  */
> +    MATCH_WITH_INCOMPLETE_TYPE = 2
> +  };
> +extern bool gimple_canonical_types_compatible_p
> +    (const_tree, const_tree,
> +     unsigned int flags = MATCH_TYPE_CANONICAL);
> +
>  #define tree_map_eq tree_map_base_eq
>  extern unsigned int tree_map_hash (const void *);
>  #define tree_map_marked_p tree_map_base_marked_p
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-26  8:45 ` Richard Biener
@ 2015-05-26 17:36   ` Jan Hubicka
  2015-05-27  8:38     ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2015-05-26 17:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches

> > Now the change does not really translate to great increase of disambiguations
> > for Firefox (it seems more in noise). The reason is the pointer_type globbing
> > in alias.c.
> 
> Yeah, we only get the improvement because of some "hack" in the tree
> alias oracle which also uses the base object for TBAA.

Why that is hack? Dereferencing a pointer makes it clear the type of memory location
pointed to is known, we should use that info.
> 
> Yeah, we should fix that.  And in fact, for cross-language LTO I don't
> see why
> 
>   union { int a; char c; };
> 
> and
> 
>   union { int a; short s; };
> 
> should not be compatible - they have a common member after all.  So
> I'd like to glob all unions that have the same size (and as improvement

Well, none of language standards I saw so far expect this to happen.
Going to extremes, you can always put variable sized char array to union
and by transitivity glob everything with everything.

> over that, that have at least one compatible member).  That also get's
> rid of the issue that we'd need to sort union members for the comparison
> to avoid quadraticness (as long as we don't check for that one compatible
> member).

Yeah, sorting is possible by using the hash values.
> 
> Oh, and are
> 
>   union { int a; };
> 
> and
> 
>   struct { int a; };
> 
> not compatible?  They are layout-wise at least.  Likewise the struct
> and union { int a; short s; } with the same argument as the two-union
> case.

Applying this rule you have

union { char a[n]; } compatible with every union and thus also
union {int a;}
struct { int a;}
int a;

Which would disable TBAA completely.
> 
> >  We also do not compare alignments. This is probably not important)
> 
> Correct - alignment doesn't enter TBAA.

Yep, I think the alignment compare in C standard basically is there to say that
structures must have same lyaout.
> > void f(double (* restrict a)[5]);
> > void f(double a[restrict][5]);
> > void f(double a[restrict 3][5]);
> > void f(double a[restrict static 3][5]);)
> 
> Not sure why you get into functions here at all ...

Basically it matters only if we want to disambiguate function pointers.
> 
> >   2  Each enumerated type shall be compatible with char ,  a  signed integer
> >      type, or an unsigned integer type. The choice of type is
> >      implementation-defined, but  shall be capable of representing the values
> >      of all the members of the enumeration.    The enumerated type is
> >      incomplete until immediately after the that terminates the list of
> >      enumerator declarations, and complete thereafter.
> > 
> > (we ignore this completely as far as I know, it is easy to fix though, all
> >  we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)
> 
> Yes, we don't make a distinction between ENUMERAL_TYPE and INTEGER_TYPE.

hstate.add_int (TREE_CODE (type));

makes them different.  I think we want to produce "simplified" code that turns
REFERENCE_TYPE to POINTER_TYPE and ENUMERAL_TYPE to INTEGER_TYPE.
I will send patch fo that.
> 
> >   10 For two qualified types to be compatible, both shall have the identically
> >      qualified version of a compatible type; the order of type qualifiers
> >      within a list of specifiers or qualifiers does not affect the specified type.
> > 
> > Now I think in order to get C standard type compatiblity to imply
> > gimple_canonical_types_compatible we need to implement all the above globbing
> > rules as part of canonical type computation, not only punt at pointers in
> > alias.c
> > 
> > My reading is that for example
> > 
> > struct a {char *a;};
> > 
> > is compatible with
> > 
> > struct a {enum *a;};
> > 
> > defined in other compilation unit.
> 
> Yes, as said above the TREE_CODE trick in the pointer-type handing is
> wrong.  We can as well just drop it ...

struct a {char a;};

is compatible with

struct a {enum a;};

I would say we just want to simplify the codes and peel for
pointers instead of TREE_TYPE (t) compare look for actual pointed to type
(peeling out POINTER_TYPE/RECORD_TYPE/ARRAY_TYPEs)
> >  8) i think to be correct by C language standard we need to glob enum 
> > with char
> >     tough I do not quite see how standard conforming program should use 
> >     it given that standard does not say if it is char/unsigned 
> >     char/signed char.
> 
> I think it depends on the actual enum, no?  So for forward declarations
> like
> 
>  enum Foo;
>  struct X { enum Foo *p; };
> 
> you face the same issue as with void *.

Well, handling all enums as integers should solve this.
> > Overall plan
> > ============
> > 
> > So I propose the following:
> >  1) we add the incomplete types mode to gimple_canonical_types_compatible_p
> >     as suggested by this patch
> >  2) I will add the individual globbing rules to the functions one by one
> >  3) once done, we enable recursion on pointer
> >  4) we drop the alias.c globing of pointer_type for in_lto_p.
> 
> I don't think we want to do 4), especially not only for in_lto_p.  I'm
> not sure we should spend time on improving accuracy before fixing
> correctness (on the current precision level).

Yep, fixing correctness is 2).  I will look into that.
> > Based on experience, we may switch to C/C++ compliant equivalence when we know
> > that whole unit is C/C++ and we do not have to deal with odd cases from other
> > languages.
> 
> I'm not convinced the patch is a step in the right direction.
> 
> Can we please first fix pointer handing for TYPE_CANONICAL by dropping
> the TREE_CODE handling and fix union handling by only looking at
> union size?

OK, I will send patch for TYPE_CODE compare.
I am not quite convinced about the unions as per example abov.
> 
> Originally I wanted to make LTO type compatibility really layout-based
> (thus struct { int i[2]; } and struct { int i; int j; } match for 
> example).
> 
> I wonder if for non-cross-language-LTO we shouldn't simply stream
> TYPE_CANONICAL, at stream-in record a TYPE_CANONICAL vs. uses vector
> map and "fix" up canonical types globally (we'd need to stream
> "local" canonical types in the global trees then).

Hmm, can you explain bit more what you have in mind?  I did play
with non-cross-language canonical type streaming for anonymous
namespace C++ types (that by definition can not be compatible
with anything from other language).
I can return to that.

Honza
> 
> Richard.
> 
> > Honza
> > 
> > 	* tree.c (gimple_canonical_types_compatible_p): Turn
> > 	TRUST_TYPE_CANONICAL into FLAGS parameter; support
> > 	comparsion with MATCH_WITH_INCOMPLETE_TYPE.
> > Index: lto/lto.c
> > ===================================================================
> > --- lto/lto.c	(revision 223632)
> > +++ lto/lto.c	(working copy)
> > @@ -295,7 +295,9 @@
> >  static unsigned long num_canonical_type_hash_entries;
> >  static unsigned long num_canonical_type_hash_queries;
> >  
> > -static void iterative_hash_canonical_type (tree type, inchash::hash &hstate);
> > +static void iterative_hash_canonical_type (tree type, inchash::hash &hstate,
> > +					   unsigned int flags
> > +					     = MATCH_TYPE_CANONICAL);
> >  static hashval_t gimple_canonical_type_hash (const void *p);
> >  static void gimple_register_canonical_type_1 (tree t, hashval_t hash);
> >  
> > @@ -302,26 +304,36 @@
> >  /* Returning a hash value for gimple type TYPE.
> >  
> >     The hash value returned is equal for types considered compatible
> > -   by gimple_canonical_types_compatible_p.  */
> > +   by gimple_canonical_types_compatible_p.  FLAGS values are interpretted
> > +   same way as by this function.  */
> >  
> >  static hashval_t
> > -hash_canonical_type (tree type)
> > +hash_canonical_type (tree type, unsigned int flags = MATCH_TYPE_CANONICAL)
> >  {
> >    inchash::hash hstate;
> >  
> > -  /* We compute alias sets only for types that needs them.
> > -     Be sure we do not recurse to something else as we can not hash incomplete
> > -     types in a way they would have same hash value as compatible complete
> > -     types.  */
> > -  gcc_checking_assert (type_with_alias_set_p (type));
> > +  /* Check that we encounter incomplete types only with
> > +     MATCH_WITH_INCOMPLETE_TYPE.
> >  
> > +     Also check that no one tries to use hashing in compbination with 
> > +     !MATCH_TYPE_CANONICAL.  In this case gimple_canonical_types_compatible_p
> > +     is not transitive and thus does not produce equivalence on all types.  */
> > +  gcc_checking_assert ((flags & MATCH_TYPE_CANONICAL)
> > +		       && ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +			   || type_with_alias_set_p (type)));
> > +
> >    /* Combine a few common features of types so that types are grouped into
> >       smaller sets; when searching for existing matching types to merge,
> >       only existing types having the same features as the new type will be
> >       checked.  */
> >    hstate.add_int (TREE_CODE (type));
> > -  hstate.add_int (TYPE_MODE (type));
> >  
> > +  /* Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
> > +  if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +      || (TREE_CODE (type) != ARRAY_TYPE
> > +	  && !AGGREGATE_TYPE_P (type)))
> > +    hstate.add_int (TYPE_MODE (type));
> > +
> >    /* Incorporate common features of numerical types.  */
> >    if (INTEGRAL_TYPE_P (type)
> >        || SCALAR_FLOAT_TYPE_P (type)
> > @@ -355,7 +367,8 @@
> >      hstate.add_int (TYPE_STRING_FLAG (type));
> >  
> >    /* For array types hash the domain bounds and the string flag.  */
> > -  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
> > +  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type)
> > +      && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
> >      {
> >        hstate.add_int (TYPE_STRING_FLAG (type));
> >        /* OMP lowering can introduce error_mark_node in place of
> > @@ -366,30 +379,35 @@
> >  	inchash::add_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), hstate);
> >      }
> >  
> > -  /* Recurse for aggregates with a single element type.  */
> > +  /* Recurse for aggregates with a single element type.
> > +     We are safe to drop MATCH_INCOMPLETE.  There is no way to build
> > +     an array of incomplete types.   */
> >    if (TREE_CODE (type) == ARRAY_TYPE
> >        || TREE_CODE (type) == COMPLEX_TYPE
> >        || TREE_CODE (type) == VECTOR_TYPE)
> > -    iterative_hash_canonical_type (TREE_TYPE (type), hstate);
> > +    iterative_hash_canonical_type (TREE_TYPE (type), hstate,
> > +				   flags & ~MATCH_WITH_INCOMPLETE_TYPE);
> >  
> >    /* Incorporate function return and argument types.  */
> >    if (TREE_CODE (type) == FUNCTION_TYPE || TREE_CODE (type) == METHOD_TYPE)
> >      {
> > -      unsigned na;
> > -      tree p;
> > +      iterative_hash_canonical_type (TREE_TYPE (type), hstate, flags);
> >  
> > -      iterative_hash_canonical_type (TREE_TYPE (type), hstate);
> > -
> > -      for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
> > +      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +	  || TREE_CODE (type) == METHOD_TYPE)
> >  	{
> > -	  iterative_hash_canonical_type (TREE_VALUE (p), hstate);
> > -	  na++;
> > +	  unsigned na;
> > +	  tree p;
> > +	  for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
> > +	    {
> > +	      iterative_hash_canonical_type (TREE_VALUE (p), hstate, flags);
> > +	      na++;
> > +	    }
> > +	  hstate.add_int (na);
> >  	}
> > -
> > -      hstate.add_int (na);
> >      }
> >  
> > -  if (RECORD_OR_UNION_TYPE_P (type))
> > +  if (RECORD_OR_UNION_TYPE_P (type) && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
> >      {
> >        unsigned nf;
> >        tree f;
> > @@ -397,7 +415,7 @@
> >        for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
> >  	if (TREE_CODE (f) == FIELD_DECL)
> >  	  {
> > -	    iterative_hash_canonical_type (TREE_TYPE (f), hstate);
> > +	    iterative_hash_canonical_type (TREE_TYPE (f), hstate, flags);
> >  	    nf++;
> >  	  }
> >  
> > @@ -407,14 +425,22 @@
> >    return hstate.end();
> >  }
> >  
> > -/* Returning a hash value for gimple type TYPE combined with VAL.  */
> > +/* Returning a hash value for gimple type TYPE combined with VAL.
> > +   FLAGS are the same as for gimple_conanical_types_compatible_p.  */
> >  
> >  static void
> > -iterative_hash_canonical_type (tree type, inchash::hash &hstate)
> > +iterative_hash_canonical_type (tree type, inchash::hash &hstate,
> > +			       unsigned int flags)
> >  {
> >    hashval_t v;
> > +
> > +  /* TYPE_CANONICAL reflects equivalence classes with
> > +     !MATCH_WITH_INCOMPLETE_TYPE.   If we are matching incomplete types,
> > +     then we need to recurse.  */
> > +  if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +    v = hash_canonical_type (type, flags);
> >    /* An already processed type.  */
> > -  if (TYPE_CANONICAL (type))
> > +  else if (TYPE_CANONICAL (type))
> >      {
> >        type = TYPE_CANONICAL (type);
> >        v = gimple_canonical_type_hash (type);
> > @@ -497,6 +523,14 @@
> >  {
> >    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t))
> >      return;
> > +  /* Only types that can be handled in memory need canonical types.
> > +     Function and methods are never accessed. Also we do not need canonical
> > +     types for incomplete types with exception of arrays - structures may end
> > +     with incomplete arrays that may be referenced.  */
> > +  if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE
> > +      || (!COMPLETE_TYPE_P (t)
> > +	  && (TREE_CODE (t) != ARRAY_TYPE || !COMPLETE_TYPE_P (TREE_TYPE (t)))))
> > +    return;
> >  
> >    gimple_register_canonical_type_1 (t, hash_canonical_type (t));
> >  }
> > Index: tree.c
> > ===================================================================
> > --- tree.c	(revision 223633)
> > +++ tree.c	(working copy)
> > @@ -12702,12 +12702,24 @@
> >  /* Return true iff T1 and T2 are structurally identical for what
> >     TBAA is concerned.  
> >     This function is used both by lto.c canonical type merging and by the
> > -   verifier.  If TRUST_TYPE_CANONICAL we do not look into structure of types
> > -   that have TYPE_CANONICAL defined and assume them equivalent.  */
> > +   verifier.  
> >  
> > +   If flags sets MATCH_TYPE_CANONICAL we assume that TYPE_CANONICAL is set in a
> > +   way that two types have the same canonical type if and only if
> > +   gimple_canonical_types_compatible_p (t1,t2, 0) is true.  This is used
> > +   to cut down recursion during LTO canonical type comptuation.  When this flag
> > +   is set we also sanity check that we are going to produce equivalence relation
> > +   that is needed to drive the hashtable in lto.c.
> > +
> > +   if MATCH_WITH_INCOMPLETE_TYPE is true, then we do not use any information
> > +   from complete types and thus i.e. all RECORD_TYPE are equivlaent to other
> > +   RECORD_TYPEs.  This is the only equivalence possible if one require
> > +   incomplete type to be in the same equivalence class with all its
> > +   completetions.  */
> > +
> >  bool
> >  gimple_canonical_types_compatible_p (const_tree t1, const_tree t2,
> > -				     bool trust_type_canonical)
> > +				     unsigned int flags)
> >  {
> >    /* Before starting to set up the SCC machinery handle simple cases.  */
> >  
> > @@ -12719,28 +12731,28 @@
> >    if (t1 == NULL_TREE || t2 == NULL_TREE)
> >      return false;
> >  
> > -  /* We consider complete types always compatible with incomplete type.
> > -     This does not make sense for canonical type calculation and thus we
> > -     need to ensure that we are never called on it.
> > +  /* Check that either flags allow incomplete types or both types are complete.
> > +     This is necessary to ensure transitivity for canonical type merging.
> >  
> > -     FIXME: For more correctness the function probably should have three modes
> > -	1) mode assuming that types are complete mathcing their structure
> > -	2) mode allowing incomplete types but producing equivalence classes
> > -	   and thus ignoring all info from complete types
> > -	3) mode allowing incomplete types to match complete but checking
> > -	   compatibility between complete types.
> > +     FIXME: with !MATCH_TYPE_CANONICAL we probably should allow match between
> > +     incomplete type and complete type as defined by language standards.  No
> > +     one however rely on it so far.  */
> >  
> > -     1 and 2 can be used for canonical type calculation. 3 is the real
> > -     definition of type compatibility that can be used i.e. for warnings during
> > -     declaration merging.  */
> > -
> > -  gcc_assert (!trust_type_canonical
> > +  gcc_assert ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +	      || !(flags & MATCH_TYPE_CANONICAL)
> >  	      || (type_with_alias_set_p (t1) && type_with_alias_set_p (t2)));
> >    /* If the types have been previously registered and found equal
> >       they still are.  */
> >    if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
> > -      && trust_type_canonical)
> > -    return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
> > +      && (flags & MATCH_TYPE_CANONICAL))
> > +    {
> > +      if (TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2))
> > +	return true;
> > +      /* TYPE_CANONICAL is always computed with an assumption that the type
> > +	 is complete.  */
> > +      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE))
> > +	return false;
> > +    }
> >  
> >    /* Can't be the same type if the types don't have the same code.  */
> >    if (TREE_CODE (t1) != TREE_CODE (t2))
> > @@ -12753,8 +12765,12 @@
> >        || TREE_CODE (t1) == NULLPTR_TYPE)
> >      return true;
> >  
> > -  /* Can't be the same type if they have different mode.  */
> > -  if (TYPE_MODE (t1) != TYPE_MODE (t2))
> > +  /* Can't be the same type if they have different mode.
> > +     Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
> > +  if ((!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +       || (TREE_CODE (t1) != ARRAY_TYPE
> > +           && !AGGREGATE_TYPE_P (t1)))
> > +      && TYPE_MODE (t1) != TYPE_MODE (t2))
> >      return false;
> >  
> >    /* Non-aggregate types can be handled cheaply.  */
> > @@ -12783,8 +12799,7 @@
> >  	{
> >  	  if (TYPE_ADDR_SPACE (TREE_TYPE (t1))
> >  	      != TYPE_ADDR_SPACE (TREE_TYPE (t2)))
> > -	    return false;
> > -
> > + 	    return false;
> >  	  if (TREE_CODE (TREE_TYPE (t1)) != TREE_CODE (TREE_TYPE (t2)))
> >  	    return false;
> >  	}
> > @@ -12792,9 +12807,9 @@
> >        /* Tail-recurse to components.  */
> >        if (TREE_CODE (t1) == VECTOR_TYPE
> >  	  || TREE_CODE (t1) == COMPLEX_TYPE)
> > -	return gimple_canonical_types_compatible_p (TREE_TYPE (t1),
> > -						    TREE_TYPE (t2),
> > -						    trust_type_canonical);
> > +	return gimple_canonical_types_compatible_p
> > +		 (TREE_TYPE (t1), TREE_TYPE (t2),
> > +		  flags & ~MATCH_WITH_INCOMPLETE_TYPE);
> >  
> >        return true;
> >      }
> > @@ -12804,12 +12819,20 @@
> >      {
> >      case ARRAY_TYPE:
> >        /* Array types are the same if the element types are the same and
> > -	 the number of elements are the same.  */
> > -      if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
> > -						trust_type_canonical)
> > +	 the number of elements are the same.
> > +
> > +	 When MATCH_WITH_INCOMPLETE_TYPE is set, bypass the check
> > +	 on number of elements.
> > +	 When recursing, clear MATCH_WITH_INCOMPLETE_TYPE because there is
> > +	 no way to make incomplete array of array.  */
> > +      if (!gimple_canonical_types_compatible_p
> > +	     (TREE_TYPE (t1), TREE_TYPE (t2),
> > +	      flags & ~MATCH_WITH_INCOMPLETE_TYPE)
> >  	  || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
> >  	  || TYPE_NONALIASED_COMPONENT (t1) != TYPE_NONALIASED_COMPONENT (t2))
> >  	return false;
> > +      else if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +	return true;
> >        else
> >  	{
> >  	  tree i1 = TYPE_DOMAIN (t1);
> > @@ -12848,11 +12871,19 @@
> >      case METHOD_TYPE:
> >      case FUNCTION_TYPE:
> >        /* Function types are the same if the return type and arguments types
> > -	 are the same.  */
> > +	 are the same.
> > +	 It is possible that function pointers have return values and parameters
> > +	 of incomplete types; permit that by not clearing
> > +	 MATCH_WITH_INCOMPLETE_TYPE  */
> >        if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
> > -						trust_type_canonical))
> > +						flags))
> >  	return false;
> >  
> > +      /* We must permit a match between !prototype_p and prototype_p for
> > +	 functions; methods are never !prototype_p.  */
> > +      if ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +	  && TREE_CODE (t1) == FUNCTION_TYPE)
> > +	return true;
> >        if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
> >  	return true;
> >        else
> > @@ -12864,8 +12895,7 @@
> >  	       parms1 = TREE_CHAIN (parms1), parms2 = TREE_CHAIN (parms2))
> >  	    {
> >  	      if (!gimple_canonical_types_compatible_p
> > -		     (TREE_VALUE (parms1), TREE_VALUE (parms2),
> > -		      trust_type_canonical))
> > +		     (TREE_VALUE (parms1), TREE_VALUE (parms2), flags))
> >  		return false;
> >  	    }
> >  
> > @@ -12881,6 +12911,15 @@
> >        {
> >  	tree f1, f2;
> >  
> > +	/* C standrad require incomplete structures and unions to be
> > +	   considered compatible with complete ones regardless their TYPE_NAME
> > +	   when they come from different translation units.
> > +	   We must consider transitive closure here, so 
> > +	   every structure/union is equivalent to each other.  */
> > +	   
> > +	if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> > +	  return true;
> > +
> >  	/* For aggregate types, all the fields must be the same.  */
> >  	for (f1 = TYPE_FIELDS (t1), f2 = TYPE_FIELDS (t2);
> >  	     f1 || f2;
> > @@ -12897,8 +12936,7 @@
> >  	    if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
> >  		|| !gimple_compare_field_offset (f1, f2)
> >  		|| !gimple_canonical_types_compatible_p
> > -		      (TREE_TYPE (f1), TREE_TYPE (f2),
> > -		       trust_type_canonical))
> > +		      (TREE_TYPE (f1), TREE_TYPE (f2), flags))
> >  	      return false;
> >  	  }
> >  
> > @@ -12961,7 +12999,7 @@
> >  	      with variably sized arrays because their sizes possibly
> >  	      gimplified to different variables.  */
> >  	   && !variably_modified_type_p (ct, NULL)
> > -	   && !gimple_canonical_types_compatible_p (t, ct, false))
> > +	   && !gimple_canonical_types_compatible_p (t, ct, 0))
> >      {
> >        error ("TYPE_CANONICAL is not compatible");
> >        debug_tree (ct);
> > Index: tree.h
> > ===================================================================
> > --- tree.h	(revision 223632)
> > +++ tree.h	(working copy)
> > @@ -4569,9 +4569,21 @@
> >  extern unsigned int tree_map_base_hash (const void *);
> >  extern int tree_map_base_marked_p (const void *);
> >  extern void DEBUG_FUNCTION verify_type (const_tree t);
> > -extern bool gimple_canonical_types_compatible_p (const_tree, const_tree,
> > -						 bool trust_type_canonical = true);
> >  
> > +/* Flags used by gimple_canonical_types_compatible_p.  */
> > +enum gimple_canonical_types_compatible_flags
> > +  {
> > +    /* Asume that TYPE_CANONICAL is set in a way that two types have
> > +       the same canonical type if and only if
> > +       gimple_canonical_types_compatible_p (t1,t2, 0) is true.  */
> > +    MATCH_TYPE_CANONICAL = 1,
> > +    /* Match all types as if they were incomplete.  */
> > +    MATCH_WITH_INCOMPLETE_TYPE = 2
> > +  };
> > +extern bool gimple_canonical_types_compatible_p
> > +    (const_tree, const_tree,
> > +     unsigned int flags = MATCH_TYPE_CANONICAL);
> > +
> >  #define tree_map_eq tree_map_base_eq
> >  extern unsigned int tree_map_hash (const void *);
> >  #define tree_map_marked_p tree_map_base_marked_p
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-26 17:36   ` Jan Hubicka
@ 2015-05-27  8:38     ` Richard Biener
  2015-05-27 22:53       ` Jan Hubicka
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2015-05-27  8:38 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Tue, 26 May 2015, Jan Hubicka wrote:

> > > Now the change does not really translate to great increase of disambiguations
> > > for Firefox (it seems more in noise). The reason is the pointer_type globbing
> > > in alias.c.
> > 
> > Yeah, we only get the improvement because of some "hack" in the tree
> > alias oracle which also uses the base object for TBAA.
> 
> Why that is hack? Dereferencing a pointer makes it clear the type of memory location
> pointed to is known, we should use that info.
> > 
> > Yeah, we should fix that.  And in fact, for cross-language LTO I don't
> > see why
> > 
> >   union { int a; char c; };
> > 
> > and
> > 
> >   union { int a; short s; };
> > 
> > should not be compatible - they have a common member after all.  So
> > I'd like to glob all unions that have the same size (and as improvement
> 
> Well, none of language standards I saw so far expect this to happen.
> Going to extremes, you can always put variable sized char array to union
> and by transitivity glob everything with everything.

I'm speaking of cross-language LTO - that leaves the language standards
territorry and requires us to apply common sense.

> > over that, that have at least one compatible member).  That also get's
> > rid of the issue that we'd need to sort union members for the comparison
> > to avoid quadraticness (as long as we don't check for that one compatible
> > member).
> 
> Yeah, sorting is possible by using the hash values.
> > 
> > Oh, and are
> > 
> >   union { int a; };
> > 
> > and
> > 
> >   struct { int a; };
> > 
> > not compatible?  They are layout-wise at least.  Likewise the struct
> > and union { int a; short s; } with the same argument as the two-union
> > case.
> 
> Applying this rule you have
> 
> union { char a[n]; } compatible with every union and thus also
> union {int a;}
> struct { int a;}
> int a;
> 
> Which would disable TBAA completely.

See ;)  At least we have the int a; vs. struct { int a; } issue
with Fortran vs. C compatibility (there is even a PR about this).

> > 
> > >  We also do not compare alignments. This is probably not important)
> > 
> > Correct - alignment doesn't enter TBAA.
> 
> Yep, I think the alignment compare in C standard basically is there to 
> say that structures must have same lyaout.
> > > void f(double (* restrict a)[5]);
> > > void f(double a[restrict][5]);
> > > void f(double a[restrict 3][5]);
> > > void f(double a[restrict static 3][5]);)
> > 
> > Not sure why you get into functions here at all ...
> 
> Basically it matters only if we want to disambiguate function pointers.
> > 
> > >   2  Each enumerated type shall be compatible with char ,  a  signed integer
> > >      type, or an unsigned integer type. The choice of type is
> > >      implementation-defined, but  shall be capable of representing the values
> > >      of all the members of the enumeration.    The enumerated type is
> > >      incomplete until immediately after the that terminates the list of
> > >      enumerator declarations, and complete thereafter.
> > > 
> > > (we ignore this completely as far as I know, it is easy to fix though, all
> > >  we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)
> > 
> > Yes, we don't make a distinction between ENUMERAL_TYPE and INTEGER_TYPE.
> 
> hstate.add_int (TREE_CODE (type));

in alias.c I mean.

> makes them different.  I think we want to produce "simplified" code that turns
> REFERENCE_TYPE to POINTER_TYPE and ENUMERAL_TYPE to INTEGER_TYPE.
> I will send patch fo that.

Thanks.

> > 
> > >   10 For two qualified types to be compatible, both shall have the identically
> > >      qualified version of a compatible type; the order of type qualifiers
> > >      within a list of specifiers or qualifiers does not affect the specified type.
> > > 
> > > Now I think in order to get C standard type compatiblity to imply
> > > gimple_canonical_types_compatible we need to implement all the above globbing
> > > rules as part of canonical type computation, not only punt at pointers in
> > > alias.c
> > > 
> > > My reading is that for example
> > > 
> > > struct a {char *a;};
> > > 
> > > is compatible with
> > > 
> > > struct a {enum *a;};
> > > 
> > > defined in other compilation unit.
> > 
> > Yes, as said above the TREE_CODE trick in the pointer-type handing is
> > wrong.  We can as well just drop it ...
> 
> struct a {char a;};
> 
> is compatible with
> 
> struct a {enum a;};
> 
> I would say we just want to simplify the codes and peel for
> pointers instead of TREE_TYPE (t) compare look for actual pointed to type
> (peeling out POINTER_TYPE/RECORD_TYPE/ARRAY_TYPEs)
> > >  8) i think to be correct by C language standard we need to glob enum 
> > > with char
> > >     tough I do not quite see how standard conforming program should use 
> > >     it given that standard does not say if it is char/unsigned 
> > >     char/signed char.
> > 
> > I think it depends on the actual enum, no?  So for forward declarations
> > like
> > 
> >  enum Foo;
> >  struct X { enum Foo *p; };
> > 
> > you face the same issue as with void *.
> 
> Well, handling all enums as integers should solve this.

But sizeof (enum Foo) depends on the enum, so no, it won't solve it.
There are no incomplete integer types but incomplete enum types.
(ok, this is probably a GNU extension issue that we have enums larger
than unsigned int).

> > > Overall plan
> > > ============
> > > 
> > > So I propose the following:
> > >  1) we add the incomplete types mode to gimple_canonical_types_compatible_p
> > >     as suggested by this patch
> > >  2) I will add the individual globbing rules to the functions one by one
> > >  3) once done, we enable recursion on pointer
> > >  4) we drop the alias.c globing of pointer_type for in_lto_p.
> > 
> > I don't think we want to do 4), especially not only for in_lto_p.  I'm
> > not sure we should spend time on improving accuracy before fixing
> > correctness (on the current precision level).
> 
> Yep, fixing correctness is 2).  I will look into that.
> > > Based on experience, we may switch to C/C++ compliant equivalence when we know
> > > that whole unit is C/C++ and we do not have to deal with odd cases from other
> > > languages.
> > 
> > I'm not convinced the patch is a step in the right direction.
> > 
> > Can we please first fix pointer handing for TYPE_CANONICAL by dropping
> > the TREE_CODE handling and fix union handling by only looking at
> > union size?
> 
> OK, I will send patch for TYPE_CODE compare.
> I am not quite convinced about the unions as per example abov.
> > 
> > Originally I wanted to make LTO type compatibility really layout-based
> > (thus struct { int i[2]; } and struct { int i; int j; } match for 
> > example).
> > 
> > I wonder if for non-cross-language-LTO we shouldn't simply stream
> > TYPE_CANONICAL, at stream-in record a TYPE_CANONICAL vs. uses vector
> > map and "fix" up canonical types globally (we'd need to stream
> > "local" canonical types in the global trees then).
> 
> Hmm, can you explain bit more what you have in mind?  I did play
> with non-cross-language canonical type streaming for anonymous
> namespace C++ types (that by definition can not be compatible
> with anything from other language).
> I can return to that.

First of all retain TYPE_CANONICAL if we have a single source language
(we still have to merge them I guess, and hopefully tree merging will
do the correct thing here).

Then avoid the transitivity constraint by computing TYPE_CANONICAL
"globally" (thus not requiring incremental compute to work).  That's
going to work only if we record all canonical types and its uses
(&TYPE_CANONICAL) or the main variants that don't have a canonical
type yet (even for cross-language LTO the original type-canonicals
denote minimal coalescing we have to preserve).

That said, eventually we'd want to stream TYPE_CANONICAL for
correctness (at least for verification that in the end for
two types where the original TYPE_CANONICAL was the same the
LTO idea of TYPE_CANONICAL is also the same - possibly that's
ensured by your type verifier checking the LTO compute computes
the same outcome).

Richard.

> Honza
> > 
> > Richard.
> > 
> > > Honza
> > > 
> > > 	* tree.c (gimple_canonical_types_compatible_p): Turn
> > > 	TRUST_TYPE_CANONICAL into FLAGS parameter; support
> > > 	comparsion with MATCH_WITH_INCOMPLETE_TYPE.
> > > Index: lto/lto.c
> > > ===================================================================
> > > --- lto/lto.c	(revision 223632)
> > > +++ lto/lto.c	(working copy)
> > > @@ -295,7 +295,9 @@
> > >  static unsigned long num_canonical_type_hash_entries;
> > >  static unsigned long num_canonical_type_hash_queries;
> > >  
> > > -static void iterative_hash_canonical_type (tree type, inchash::hash &hstate);
> > > +static void iterative_hash_canonical_type (tree type, inchash::hash &hstate,
> > > +					   unsigned int flags
> > > +					     = MATCH_TYPE_CANONICAL);
> > >  static hashval_t gimple_canonical_type_hash (const void *p);
> > >  static void gimple_register_canonical_type_1 (tree t, hashval_t hash);
> > >  
> > > @@ -302,26 +304,36 @@
> > >  /* Returning a hash value for gimple type TYPE.
> > >  
> > >     The hash value returned is equal for types considered compatible
> > > -   by gimple_canonical_types_compatible_p.  */
> > > +   by gimple_canonical_types_compatible_p.  FLAGS values are interpretted
> > > +   same way as by this function.  */
> > >  
> > >  static hashval_t
> > > -hash_canonical_type (tree type)
> > > +hash_canonical_type (tree type, unsigned int flags = MATCH_TYPE_CANONICAL)
> > >  {
> > >    inchash::hash hstate;
> > >  
> > > -  /* We compute alias sets only for types that needs them.
> > > -     Be sure we do not recurse to something else as we can not hash incomplete
> > > -     types in a way they would have same hash value as compatible complete
> > > -     types.  */
> > > -  gcc_checking_assert (type_with_alias_set_p (type));
> > > +  /* Check that we encounter incomplete types only with
> > > +     MATCH_WITH_INCOMPLETE_TYPE.
> > >  
> > > +     Also check that no one tries to use hashing in compbination with 
> > > +     !MATCH_TYPE_CANONICAL.  In this case gimple_canonical_types_compatible_p
> > > +     is not transitive and thus does not produce equivalence on all types.  */
> > > +  gcc_checking_assert ((flags & MATCH_TYPE_CANONICAL)
> > > +		       && ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +			   || type_with_alias_set_p (type)));
> > > +
> > >    /* Combine a few common features of types so that types are grouped into
> > >       smaller sets; when searching for existing matching types to merge,
> > >       only existing types having the same features as the new type will be
> > >       checked.  */
> > >    hstate.add_int (TREE_CODE (type));
> > > -  hstate.add_int (TYPE_MODE (type));
> > >  
> > > +  /* Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
> > > +  if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +      || (TREE_CODE (type) != ARRAY_TYPE
> > > +	  && !AGGREGATE_TYPE_P (type)))
> > > +    hstate.add_int (TYPE_MODE (type));
> > > +
> > >    /* Incorporate common features of numerical types.  */
> > >    if (INTEGRAL_TYPE_P (type)
> > >        || SCALAR_FLOAT_TYPE_P (type)
> > > @@ -355,7 +367,8 @@
> > >      hstate.add_int (TYPE_STRING_FLAG (type));
> > >  
> > >    /* For array types hash the domain bounds and the string flag.  */
> > > -  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
> > > +  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type)
> > > +      && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
> > >      {
> > >        hstate.add_int (TYPE_STRING_FLAG (type));
> > >        /* OMP lowering can introduce error_mark_node in place of
> > > @@ -366,30 +379,35 @@
> > >  	inchash::add_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), hstate);
> > >      }
> > >  
> > > -  /* Recurse for aggregates with a single element type.  */
> > > +  /* Recurse for aggregates with a single element type.
> > > +     We are safe to drop MATCH_INCOMPLETE.  There is no way to build
> > > +     an array of incomplete types.   */
> > >    if (TREE_CODE (type) == ARRAY_TYPE
> > >        || TREE_CODE (type) == COMPLEX_TYPE
> > >        || TREE_CODE (type) == VECTOR_TYPE)
> > > -    iterative_hash_canonical_type (TREE_TYPE (type), hstate);
> > > +    iterative_hash_canonical_type (TREE_TYPE (type), hstate,
> > > +				   flags & ~MATCH_WITH_INCOMPLETE_TYPE);
> > >  
> > >    /* Incorporate function return and argument types.  */
> > >    if (TREE_CODE (type) == FUNCTION_TYPE || TREE_CODE (type) == METHOD_TYPE)
> > >      {
> > > -      unsigned na;
> > > -      tree p;
> > > +      iterative_hash_canonical_type (TREE_TYPE (type), hstate, flags);
> > >  
> > > -      iterative_hash_canonical_type (TREE_TYPE (type), hstate);
> > > -
> > > -      for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
> > > +      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +	  || TREE_CODE (type) == METHOD_TYPE)
> > >  	{
> > > -	  iterative_hash_canonical_type (TREE_VALUE (p), hstate);
> > > -	  na++;
> > > +	  unsigned na;
> > > +	  tree p;
> > > +	  for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))
> > > +	    {
> > > +	      iterative_hash_canonical_type (TREE_VALUE (p), hstate, flags);
> > > +	      na++;
> > > +	    }
> > > +	  hstate.add_int (na);
> > >  	}
> > > -
> > > -      hstate.add_int (na);
> > >      }
> > >  
> > > -  if (RECORD_OR_UNION_TYPE_P (type))
> > > +  if (RECORD_OR_UNION_TYPE_P (type) && !(flags & MATCH_WITH_INCOMPLETE_TYPE))
> > >      {
> > >        unsigned nf;
> > >        tree f;
> > > @@ -397,7 +415,7 @@
> > >        for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
> > >  	if (TREE_CODE (f) == FIELD_DECL)
> > >  	  {
> > > -	    iterative_hash_canonical_type (TREE_TYPE (f), hstate);
> > > +	    iterative_hash_canonical_type (TREE_TYPE (f), hstate, flags);
> > >  	    nf++;
> > >  	  }
> > >  
> > > @@ -407,14 +425,22 @@
> > >    return hstate.end();
> > >  }
> > >  
> > > -/* Returning a hash value for gimple type TYPE combined with VAL.  */
> > > +/* Returning a hash value for gimple type TYPE combined with VAL.
> > > +   FLAGS are the same as for gimple_conanical_types_compatible_p.  */
> > >  
> > >  static void
> > > -iterative_hash_canonical_type (tree type, inchash::hash &hstate)
> > > +iterative_hash_canonical_type (tree type, inchash::hash &hstate,
> > > +			       unsigned int flags)
> > >  {
> > >    hashval_t v;
> > > +
> > > +  /* TYPE_CANONICAL reflects equivalence classes with
> > > +     !MATCH_WITH_INCOMPLETE_TYPE.   If we are matching incomplete types,
> > > +     then we need to recurse.  */
> > > +  if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +    v = hash_canonical_type (type, flags);
> > >    /* An already processed type.  */
> > > -  if (TYPE_CANONICAL (type))
> > > +  else if (TYPE_CANONICAL (type))
> > >      {
> > >        type = TYPE_CANONICAL (type);
> > >        v = gimple_canonical_type_hash (type);
> > > @@ -497,6 +523,14 @@
> > >  {
> > >    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t))
> > >      return;
> > > +  /* Only types that can be handled in memory need canonical types.
> > > +     Function and methods are never accessed. Also we do not need canonical
> > > +     types for incomplete types with exception of arrays - structures may end
> > > +     with incomplete arrays that may be referenced.  */
> > > +  if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE
> > > +      || (!COMPLETE_TYPE_P (t)
> > > +	  && (TREE_CODE (t) != ARRAY_TYPE || !COMPLETE_TYPE_P (TREE_TYPE (t)))))
> > > +    return;
> > >  
> > >    gimple_register_canonical_type_1 (t, hash_canonical_type (t));
> > >  }
> > > Index: tree.c
> > > ===================================================================
> > > --- tree.c	(revision 223633)
> > > +++ tree.c	(working copy)
> > > @@ -12702,12 +12702,24 @@
> > >  /* Return true iff T1 and T2 are structurally identical for what
> > >     TBAA is concerned.  
> > >     This function is used both by lto.c canonical type merging and by the
> > > -   verifier.  If TRUST_TYPE_CANONICAL we do not look into structure of types
> > > -   that have TYPE_CANONICAL defined and assume them equivalent.  */
> > > +   verifier.  
> > >  
> > > +   If flags sets MATCH_TYPE_CANONICAL we assume that TYPE_CANONICAL is set in a
> > > +   way that two types have the same canonical type if and only if
> > > +   gimple_canonical_types_compatible_p (t1,t2, 0) is true.  This is used
> > > +   to cut down recursion during LTO canonical type comptuation.  When this flag
> > > +   is set we also sanity check that we are going to produce equivalence relation
> > > +   that is needed to drive the hashtable in lto.c.
> > > +
> > > +   if MATCH_WITH_INCOMPLETE_TYPE is true, then we do not use any information
> > > +   from complete types and thus i.e. all RECORD_TYPE are equivlaent to other
> > > +   RECORD_TYPEs.  This is the only equivalence possible if one require
> > > +   incomplete type to be in the same equivalence class with all its
> > > +   completetions.  */
> > > +
> > >  bool
> > >  gimple_canonical_types_compatible_p (const_tree t1, const_tree t2,
> > > -				     bool trust_type_canonical)
> > > +				     unsigned int flags)
> > >  {
> > >    /* Before starting to set up the SCC machinery handle simple cases.  */
> > >  
> > > @@ -12719,28 +12731,28 @@
> > >    if (t1 == NULL_TREE || t2 == NULL_TREE)
> > >      return false;
> > >  
> > > -  /* We consider complete types always compatible with incomplete type.
> > > -     This does not make sense for canonical type calculation and thus we
> > > -     need to ensure that we are never called on it.
> > > +  /* Check that either flags allow incomplete types or both types are complete.
> > > +     This is necessary to ensure transitivity for canonical type merging.
> > >  
> > > -     FIXME: For more correctness the function probably should have three modes
> > > -	1) mode assuming that types are complete mathcing their structure
> > > -	2) mode allowing incomplete types but producing equivalence classes
> > > -	   and thus ignoring all info from complete types
> > > -	3) mode allowing incomplete types to match complete but checking
> > > -	   compatibility between complete types.
> > > +     FIXME: with !MATCH_TYPE_CANONICAL we probably should allow match between
> > > +     incomplete type and complete type as defined by language standards.  No
> > > +     one however rely on it so far.  */
> > >  
> > > -     1 and 2 can be used for canonical type calculation. 3 is the real
> > > -     definition of type compatibility that can be used i.e. for warnings during
> > > -     declaration merging.  */
> > > -
> > > -  gcc_assert (!trust_type_canonical
> > > +  gcc_assert ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +	      || !(flags & MATCH_TYPE_CANONICAL)
> > >  	      || (type_with_alias_set_p (t1) && type_with_alias_set_p (t2)));
> > >    /* If the types have been previously registered and found equal
> > >       they still are.  */
> > >    if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
> > > -      && trust_type_canonical)
> > > -    return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
> > > +      && (flags & MATCH_TYPE_CANONICAL))
> > > +    {
> > > +      if (TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2))
> > > +	return true;
> > > +      /* TYPE_CANONICAL is always computed with an assumption that the type
> > > +	 is complete.  */
> > > +      if (!(flags & MATCH_WITH_INCOMPLETE_TYPE))
> > > +	return false;
> > > +    }
> > >  
> > >    /* Can't be the same type if the types don't have the same code.  */
> > >    if (TREE_CODE (t1) != TREE_CODE (t2))
> > > @@ -12753,8 +12765,12 @@
> > >        || TREE_CODE (t1) == NULLPTR_TYPE)
> > >      return true;
> > >  
> > > -  /* Can't be the same type if they have different mode.  */
> > > -  if (TYPE_MODE (t1) != TYPE_MODE (t2))
> > > +  /* Can't be the same type if they have different mode.
> > > +     Incomplete arrays and aggregates do not have TYPE_MODE defined.  */
> > > +  if ((!(flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +       || (TREE_CODE (t1) != ARRAY_TYPE
> > > +           && !AGGREGATE_TYPE_P (t1)))
> > > +      && TYPE_MODE (t1) != TYPE_MODE (t2))
> > >      return false;
> > >  
> > >    /* Non-aggregate types can be handled cheaply.  */
> > > @@ -12783,8 +12799,7 @@
> > >  	{
> > >  	  if (TYPE_ADDR_SPACE (TREE_TYPE (t1))
> > >  	      != TYPE_ADDR_SPACE (TREE_TYPE (t2)))
> > > -	    return false;
> > > -
> > > + 	    return false;
> > >  	  if (TREE_CODE (TREE_TYPE (t1)) != TREE_CODE (TREE_TYPE (t2)))
> > >  	    return false;
> > >  	}
> > > @@ -12792,9 +12807,9 @@
> > >        /* Tail-recurse to components.  */
> > >        if (TREE_CODE (t1) == VECTOR_TYPE
> > >  	  || TREE_CODE (t1) == COMPLEX_TYPE)
> > > -	return gimple_canonical_types_compatible_p (TREE_TYPE (t1),
> > > -						    TREE_TYPE (t2),
> > > -						    trust_type_canonical);
> > > +	return gimple_canonical_types_compatible_p
> > > +		 (TREE_TYPE (t1), TREE_TYPE (t2),
> > > +		  flags & ~MATCH_WITH_INCOMPLETE_TYPE);
> > >  
> > >        return true;
> > >      }
> > > @@ -12804,12 +12819,20 @@
> > >      {
> > >      case ARRAY_TYPE:
> > >        /* Array types are the same if the element types are the same and
> > > -	 the number of elements are the same.  */
> > > -      if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
> > > -						trust_type_canonical)
> > > +	 the number of elements are the same.
> > > +
> > > +	 When MATCH_WITH_INCOMPLETE_TYPE is set, bypass the check
> > > +	 on number of elements.
> > > +	 When recursing, clear MATCH_WITH_INCOMPLETE_TYPE because there is
> > > +	 no way to make incomplete array of array.  */
> > > +      if (!gimple_canonical_types_compatible_p
> > > +	     (TREE_TYPE (t1), TREE_TYPE (t2),
> > > +	      flags & ~MATCH_WITH_INCOMPLETE_TYPE)
> > >  	  || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
> > >  	  || TYPE_NONALIASED_COMPONENT (t1) != TYPE_NONALIASED_COMPONENT (t2))
> > >  	return false;
> > > +      else if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +	return true;
> > >        else
> > >  	{
> > >  	  tree i1 = TYPE_DOMAIN (t1);
> > > @@ -12848,11 +12871,19 @@
> > >      case METHOD_TYPE:
> > >      case FUNCTION_TYPE:
> > >        /* Function types are the same if the return type and arguments types
> > > -	 are the same.  */
> > > +	 are the same.
> > > +	 It is possible that function pointers have return values and parameters
> > > +	 of incomplete types; permit that by not clearing
> > > +	 MATCH_WITH_INCOMPLETE_TYPE  */
> > >        if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
> > > -						trust_type_canonical))
> > > +						flags))
> > >  	return false;
> > >  
> > > +      /* We must permit a match between !prototype_p and prototype_p for
> > > +	 functions; methods are never !prototype_p.  */
> > > +      if ((flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +	  && TREE_CODE (t1) == FUNCTION_TYPE)
> > > +	return true;
> > >        if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
> > >  	return true;
> > >        else
> > > @@ -12864,8 +12895,7 @@
> > >  	       parms1 = TREE_CHAIN (parms1), parms2 = TREE_CHAIN (parms2))
> > >  	    {
> > >  	      if (!gimple_canonical_types_compatible_p
> > > -		     (TREE_VALUE (parms1), TREE_VALUE (parms2),
> > > -		      trust_type_canonical))
> > > +		     (TREE_VALUE (parms1), TREE_VALUE (parms2), flags))
> > >  		return false;
> > >  	    }
> > >  
> > > @@ -12881,6 +12911,15 @@
> > >        {
> > >  	tree f1, f2;
> > >  
> > > +	/* C standrad require incomplete structures and unions to be
> > > +	   considered compatible with complete ones regardless their TYPE_NAME
> > > +	   when they come from different translation units.
> > > +	   We must consider transitive closure here, so 
> > > +	   every structure/union is equivalent to each other.  */
> > > +	   
> > > +	if (flags & MATCH_WITH_INCOMPLETE_TYPE)
> > > +	  return true;
> > > +
> > >  	/* For aggregate types, all the fields must be the same.  */
> > >  	for (f1 = TYPE_FIELDS (t1), f2 = TYPE_FIELDS (t2);
> > >  	     f1 || f2;
> > > @@ -12897,8 +12936,7 @@
> > >  	    if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
> > >  		|| !gimple_compare_field_offset (f1, f2)
> > >  		|| !gimple_canonical_types_compatible_p
> > > -		      (TREE_TYPE (f1), TREE_TYPE (f2),
> > > -		       trust_type_canonical))
> > > +		      (TREE_TYPE (f1), TREE_TYPE (f2), flags))
> > >  	      return false;
> > >  	  }
> > >  
> > > @@ -12961,7 +12999,7 @@
> > >  	      with variably sized arrays because their sizes possibly
> > >  	      gimplified to different variables.  */
> > >  	   && !variably_modified_type_p (ct, NULL)
> > > -	   && !gimple_canonical_types_compatible_p (t, ct, false))
> > > +	   && !gimple_canonical_types_compatible_p (t, ct, 0))
> > >      {
> > >        error ("TYPE_CANONICAL is not compatible");
> > >        debug_tree (ct);
> > > Index: tree.h
> > > ===================================================================
> > > --- tree.h	(revision 223632)
> > > +++ tree.h	(working copy)
> > > @@ -4569,9 +4569,21 @@
> > >  extern unsigned int tree_map_base_hash (const void *);
> > >  extern int tree_map_base_marked_p (const void *);
> > >  extern void DEBUG_FUNCTION verify_type (const_tree t);
> > > -extern bool gimple_canonical_types_compatible_p (const_tree, const_tree,
> > > -						 bool trust_type_canonical = true);
> > >  
> > > +/* Flags used by gimple_canonical_types_compatible_p.  */
> > > +enum gimple_canonical_types_compatible_flags
> > > +  {
> > > +    /* Asume that TYPE_CANONICAL is set in a way that two types have
> > > +       the same canonical type if and only if
> > > +       gimple_canonical_types_compatible_p (t1,t2, 0) is true.  */
> > > +    MATCH_TYPE_CANONICAL = 1,
> > > +    /* Match all types as if they were incomplete.  */
> > > +    MATCH_WITH_INCOMPLETE_TYPE = 2
> > > +  };
> > > +extern bool gimple_canonical_types_compatible_p
> > > +    (const_tree, const_tree,
> > > +     unsigned int flags = MATCH_TYPE_CANONICAL);
> > > +
> > >  #define tree_map_eq tree_map_base_eq
> > >  extern unsigned int tree_map_hash (const void *);
> > >  #define tree_map_marked_p tree_map_base_marked_p
> > > 
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-27  8:38     ` Richard Biener
@ 2015-05-27 22:53       ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2015-05-27 22:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches

> On Tue, 26 May 2015, Jan Hubicka wrote:
> 
> > > > Now the change does not really translate to great increase of disambiguations
> > > > for Firefox (it seems more in noise). The reason is the pointer_type globbing
> > > > in alias.c.
> > > 
> > > Yeah, we only get the improvement because of some "hack" in the tree
> > > alias oracle which also uses the base object for TBAA.
> > 
> > Why that is hack? Dereferencing a pointer makes it clear the type of memory location
> > pointed to is known, we should use that info.
> > > 
> > > Yeah, we should fix that.  And in fact, for cross-language LTO I don't
> > > see why
> > > 
> > >   union { int a; char c; };
> > > 
> > > and
> > > 
> > >   union { int a; short s; };
> > > 
> > > should not be compatible - they have a common member after all.  So
> > > I'd like to glob all unions that have the same size (and as improvement
> > 
> > Well, none of language standards I saw so far expect this to happen.
> > Going to extremes, you can always put variable sized char array to union
> > and by transitivity glob everything with everything.
> 
> I'm speaking of cross-language LTO - that leaves the language standards
> territorry and requires us to apply common sense.

Well, language standards do help here to some degree.  C++, Fortran, Ada, Java
(probably go too, I did not read it yet) do cover interface to other language
(generally C that allows also interoperability between non-C languages).  In
some cases the inter-operable types needs to be marked (in Fortran, to some
degree to C++ by language linkage and Ada) that allows us to punt to alias set
0 in very ezoteric cases. Older Fortran standards seem to be such a case to me.
Fortran 2008 has some interesting bits, too.

In this way C is the most evil language, because every type in C program can
potentially be part of cross-language interface, and, its cross-TU type
compatibility rules are aftertought and not a direct application of inter-TU.

I am trying trying to understand those rules, be sure that we do is compatible with
these standards.  We can make our own extensions, but it is a slipperly ground
- we need to think of consequences and how much extra compaibility rules we
want to commit ourselves to support.

In those cases I would go from practical example to refine our unrestanding of
the problem.

> > Applying this rule you have
> > 
> > union { char a[n]; } compatible with every union and thus also
> > union {int a;}
> > struct { int a;}
> > int a;
> > 
> > Which would disable TBAA completely.
> 
> See ;)  At least we have the int a; vs. struct { int a; } issue
> with Fortran vs. C compatibility (there is even a PR about this).

Yep :) Fortran is bit esoteric though not as esoteric anymore when you do not
consider old language standards.
http://www.j3-fortran.org/doc/year/10/10-007.pdf
look into page 443-455. There are few notes I made about it.

The types that binds to C are declared with BIND(C). BIND(C) does not seem to
appear in Fortran 77 standard, so legacy codeabses won't do that.  We can
handle that though - either detect units containing fortran 77 + different
language and turn into -fno-strict-aliasing (or more generous globbing) and/or
we can add a warning, like -WOdr, that will tell you when Fortran program is
interfacing C without BIND(C) attribute or breaks the Fortran's
interoperability rules and suggest users to use -fno-strict-aliasing in that
case or update the codebase.

Fortran has few interesting ideas.
 - It has type C_SIGNED_CHAR that is defined to be compatible with both unsigned
   char and signed char (but not char)
 - It has C_PTR that is the "universal pointer" but only for non-functions
 - It has C_FUNPTR that is the "universal pointer" but only for functions.  We
   are safe to map both to ptr_type_node.

rest of types more or less uniquely match.  We can do some globbing for character
types or we can just make C_SIGNED_CHAR to have alias set 0.

Page 449 deals with interoperability of derived types.

  A  Fortran  derived  type  is interoperable with  a  C  struct  type  if  and
  only  if  the  Fortran  type  has  the BIND 13 attribute ( 4.5.2), the Fortran
  derived type and the C struct type have the same number of components, and the
  14 components of the Fortran derived type would interoperate with corresponding
  components of the C struct type 15 as described in 15.3.5 and 15.3.6 if the
  components were variables.  A component of a Fortran derived type and 16 a
  component of a C struct type correspond if they are declared in the same
  relative position in their respective 17 type de\fnitions.

So structures compare by fileds, ignoring names.

  There is no Fortran type that is interoperable with a C struct type that
  contains a bitfield or that contains a 19 exible array member.  There is no
  Fortran type that is interoperable with a C union type.

No unions :)

   A Fortran variable that is a named array is interoperable if and only if its
   type and type parameters are interoperable , it is not a coarray , it is of
   explicit shape or assumed size, and if it is of type character its length is
   not assumed or declared by an expression that is not a constant expression

Arrays seems to rule out things with no direct C equivalent.  It also continues
by saying when the bounds are considered compatible.

There is also no legal way to call varargs function from Fortran.
THere is also some detail about what variables can be interoperated (no COMMON) and
how to bind the C symbols.
> 
> But sizeof (enum Foo) depends on the enum, so no, it won't solve it.
> There are no incomplete integer types but incomplete enum types.

By incomplete integer types I mean int and char, for example.

> (ok, this is probably a GNU extension issue that we have enums larger
> than unsigned int).

Yep, we also have -fshort-enum, both are GNU extensions and C standard expect
all enums to be ints. It does not promise that long enum is compatible with
long long, because it is not part of the language.

I would say, that for purpposes of canonical type computation we are thus safe
to glob all enums to "int" or "unsigned int".  For the pointer code, we can just
consider 
enum a *ptr;
for incmplete types to be the universal pointer void *ptr.
> 
> First of all retain TYPE_CANONICAL if we have a single source language
> (we still have to merge them I guess, and hopefully tree merging will
> do the correct thing here).

I don't believe it will.  Consider

struct a {struct b *ptr;}

wrt

struct b {int a;};
struct a {struct b *ptr;};

their canonicals should be merged.  We need to merge TYPE_CANONICAL by at least
all type compatibility rules the source language demands, even for single
language units.

I however do like the idea of streaming TYPE_CANONICAL.  First of all it makes
variadic types easier.  We can arrange for array:

int a[p];

to have canonical type

int a[*];

The canonical type does not refer local declarations and can go into the global
stream and be handled the usual way.  When streaming the function body, we will
only get a new variant of the type and we can just retain the current logic
of type canonical.

We also can use TYPE_CANONICAL as they are a ls long as the TYPE_CANONICAL
are local to unit (anonymous namespaces for C++).
> 
> Then avoid the transitivity constraint by computing TYPE_CANONICAL
> "globally" (thus not requiring incremental compute to work).  That's

Te transitivity is not about incrementality - yes, it would be nice to avoid
the incrementality as it will let us to be significantly stronger in special
cases.  Thus the idea of placing canonical types of variadic types into
global stream.

The transitivity is needed by alias.c to work. It starts by a DAG and builds
its transitive closure. The edges in the transitive closure corresponds to
posisble aliases. This is where the transitivity comes from.
I did not put that much of tought into removing that constraint, but it
would be interesting to get that defined and right.

> going to work only if we record all canonical types and its uses
> (&TYPE_CANONICAL) or the main variants that don't have a canonical
> type yet (even for cross-language LTO the original type-canonicals
> denote minimal coalescing we have to preserve).
> 
> That said, eventually we'd want to stream TYPE_CANONICAL for
> correctness (at least for verification that in the end for
> two types where the original TYPE_CANONICAL was the same the
> LTO idea of TYPE_CANONICAL is also the same - possibly that's
> ensured by your type verifier checking the LTO compute computes
> the same outcome).

Yes, that was one of main things I wanted to check by the verifier.

Honza

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-26  1:18   ` Jan Hubicka
@ 2015-05-29 21:37     ` Joseph Myers
  2015-05-29 21:46       ` Jan Hubicka
  0 siblings, 1 reply; 18+ messages in thread
From: Joseph Myers @ 2015-05-29 21:37 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Bernhard Reutner-Fischer, gcc-patches, rguenther

On Tue, 26 May 2015, Jan Hubicka wrote:

> > On May 25, 2015 1:49:45 AM GMT+02:00, Jan Hubicka <hubicka@ucw.cz> wrote:
> > 
> > 
> > >2  Each enumerated type shall be compatible with char ,  a  signed
> > >integer
> > >     type, or an unsigned integer type. The choice of type is
> > >implementation-defined, but  shall be capable of representing the
> > >values
> > >     of all the members of the enumeration.    The enumerated type is
> > >     incomplete until immediately after the that terminates the list of
> > >     enumerator declarations, and complete thereafter.
> > >
> > >(we ignore this completely as far as I know, it is easy to fix though,
> > >all
> > > we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)
> > 
> > Don't forget -fshort-enum though.
> 
> I believe -fshort-enum is makes us non-complian to the C standard and thus
> we are free to not follow this rule :)

-fshort-enums is perfectly compatible with the C standard.  The choice of 
integer type depends on the enumerated type in question - different 
enumerated types can be compatible with different integer types.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-29 21:37     ` Joseph Myers
@ 2015-05-29 21:46       ` Jan Hubicka
  2015-05-30  3:28         ` Jan Hubicka
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2015-05-29 21:46 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jan Hubicka, Bernhard Reutner-Fischer, gcc-patches, rguenther

> On Tue, 26 May 2015, Jan Hubicka wrote:
> 
> > > On May 25, 2015 1:49:45 AM GMT+02:00, Jan Hubicka <hubicka@ucw.cz> wrote:
> > > 
> > > 
> > > >2  Each enumerated type shall be compatible with char ,  a  signed
> > > >integer
> > > >     type, or an unsigned integer type. The choice of type is
> > > >implementation-defined, but  shall be capable of representing the
> > > >values
> > > >     of all the members of the enumeration.    The enumerated type is
> > > >     incomplete until immediately after the that terminates the list of
> > > >     enumerator declarations, and complete thereafter.
> > > >
> > > >(we ignore this completely as far as I know, it is easy to fix though,
> > > >all
> > > > we need is to make ENUMERATION_TYPE pretend to be INTEGER_TYPE)
> > > 
> > > Don't forget -fshort-enum though.
> > 
> > I believe -fshort-enum is makes us non-complian to the C standard and thus
> > we are free to not follow this rule :)
> 
> -fshort-enums is perfectly compatible with the C standard.  The choice of 
> integer type depends on the enumerated type in question - different 
> enumerated types can be compatible with different integer types.

I see, so it does not need to be actual "int"/"unsigned int".  I suppose we are then
safe to just treat ENUMERATION_TYPE as INTEGER_TYPE.

Thank you for clarification!
Honza
> 
> -- 
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-29 21:46       ` Jan Hubicka
@ 2015-05-30  3:28         ` Jan Hubicka
  2015-05-30 15:38           ` Bernhard Reutner-Fischer
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Jan Hubicka @ 2015-05-30  3:28 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Joseph Myers, Bernhard Reutner-Fischer, gcc-patches, rguenther

Joseph, Richard,
this is patch implementing the ENUM/INGEGER globbing and also POINTER/REFERENCE
(though I don't know if that one follows by some standard rules).
Joseph, does the attached testcase make sense for you? Is it defined? It is my
first attempt to really interpret C standard to detail.

Ideally I would like to have testcases for all the globbing we do and reasoning
why it is needed.

Bootstraped/regtested ppc64le-linux. OK?

Honza

	* lto.c (hash_canonical_type): Use tree_code_for_canonical_type_merging.

	* tree.h (tree_code_for_canonical_type_merging): New function.
	* tree.c (gimple_canonical_types_compatible_p): Use
	tree_code_for_canonical_type_merging..
	* gcc.dg/lto/c-compatible-types_0.c: New testcase.
	* gcc.dg/lto/c-compatible-types_1.c: New testcase.
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 223877)
+++ lto/lto.c	(working copy)
@@ -319,7 +319,7 @@
      smaller sets; when searching for existing matching types to merge,
      only existing types having the same features as the new type will be
      checked.  */
-  hstate.add_int (TREE_CODE (type));
+  hstate.add_int (tree_code_for_canonical_type_merging (TREE_CODE (type)));
   hstate.add_int (TYPE_MODE (type));
 
   /* Incorporate common features of numerical types.  */
Index: testsuite/gcc.dg/lto/c-compatible-types_0.c
===================================================================
--- testsuite/gcc.dg/lto/c-compatible-types_0.c	(revision 0)
+++ testsuite/gcc.dg/lto/c-compatible-types_0.c	(working copy)
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+/* By C standard Each enumerated type shall be compatible with char, a  signed
+   integer, type, or an unsigned integer type. The choice of type is
+   implementation-defined.  Check that enum and unsigned int match.  */
+unsigned int a;
+unsigned int *b;
+void t();
+
+void reset ()
+{
+  asm("":"=r"(a):"0"(0));
+}
+int
+main()
+{
+  asm("":"=r"(a):"0"(1));
+  asm("":"=r"(b):"0"(&a));
+  t();
+  return 0;
+}
Index: testsuite/gcc.dg/lto/c-compatible-types_1.c
===================================================================
--- testsuite/gcc.dg/lto/c-compatible-types_1.c	(revision 0)
+++ testsuite/gcc.dg/lto/c-compatible-types_1.c	(working copy)
@@ -0,0 +1,19 @@
+enum a {test1, test2};
+enum a a;
+enum a *b;
+
+void reset (void);
+
+void
+t()
+{
+  if (a != test2)
+    __builtin_abort ();
+  if (*b != test2)
+    __builtin_abort ();
+  reset ();
+  if (a != test1)
+    __builtin_abort ();
+  if (*b != test1)
+    __builtin_abort ();
+}
Index: tree.c
===================================================================
--- tree.c	(revision 223877)
+++ tree.c	(working copy)
@@ -12877,7 +12877,8 @@
     return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
 
   /* Can't be the same type if the types don't have the same code.  */
-  if (TREE_CODE (t1) != TREE_CODE (t2))
+  if (tree_code_for_canonical_type_merging (TREE_CODE (t1))
+      != tree_code_for_canonical_type_merging (TREE_CODE (t2)))
     return false;
 
   /* Qualifiers do not matter for canonical type comparison purposes.  */
Index: tree.h
===================================================================
--- tree.h	(revision 223877)
+++ tree.h	(working copy)
@@ -4598,7 +4598,28 @@
 extern void DEBUG_FUNCTION verify_type (const_tree t);
 extern bool gimple_canonical_types_compatible_p (const_tree, const_tree,
 						 bool trust_type_canonical = true);
+/* Return simplified tree code of type that is used for canonical type merging.  */
+inline enum tree_code
+tree_code_for_canonical_type_merging (enum tree_code code)
+{
+  /* By C standard, each enumerated type shall be compatible with char,
+     a signed integer, or an unsigned integer.  The choice of type is
+     implementation defined (in our case it depends on -fshort-enum).
 
+     For this reason we make no distinction between ENUMERAL_TYPE and INTEGER
+     type and compare only by their signedness and precision.  */
+  if (code == ENUMERAL_TYPE)
+    return INTEGER_TYPE;
+  /* To allow inter-operability between languages having references and
+     C, we consider reference types and pointers alike.  Note that this is
+     not strictly necessary for C-Fortran 2008 interoperability because
+     Fortran define C_PTR type that needs to be compatible with C pointers
+     and we handle this one as ptr_type_node.  */
+  if (code == REFERENCE_TYPE)
+    return POINTER_TYPE;
+  return code;
+}
+
 #define tree_map_eq tree_map_base_eq
 extern unsigned int tree_map_hash (const void *);
 #define tree_map_marked_p tree_map_base_marked_p

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-30  3:28         ` Jan Hubicka
@ 2015-05-30 15:38           ` Bernhard Reutner-Fischer
  2015-05-30 21:14             ` Jan Hubicka
  2015-06-01 19:47           ` Joseph Myers
  2015-06-03 11:41           ` Richard Biener
  2 siblings, 1 reply; 18+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-05-30 15:38 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joseph Myers, gcc-patches, rguenther

On May 30, 2015 12:56:26 AM GMT+02:00, Jan Hubicka <hubicka@ucw.cz> wrote:

>Index: tree.h
>===================================================================
>--- tree.h	(revision 223877)
>+++ tree.h	(working copy)
>@@ -4598,7 +4598,28 @@
> extern void DEBUG_FUNCTION verify_type (const_tree t);
>extern bool gimple_canonical_types_compatible_p (const_tree,
>const_tree,
> 						 bool trust_type_canonical = true);
>+/* Return simplified tree code of type that is used for canonical type
>merging.  */
>+inline enum tree_code
>+tree_code_for_canonical_type_merging (enum tree_code code)
>+{
>+  /* By C standard, each enumerated type shall be compatible with
>char,
>+     a signed integer, or an unsigned integer.  The choice of type is
>+     implementation defined (in our case it depends on -fshort-enum).

Please drop the mention of -fshort-enum as Joseph clarified.

thanks,

> 
>+     For this reason we make no distinction between ENUMERAL_TYPE and
>INTEGER
>+     type and compare only by their signedness and precision.  */
>+  if (code == ENUMERAL_TYPE)
>+    return INTEGER_TYPE;
>+  /* To allow inter-operability between languages having references
>and
>+     C, we consider reference types and pointers alike.  Note that
>this is
>+     not strictly necessary for C-Fortran 2008 interoperability
>because
>+     Fortran define C_PTR type that needs to be compatible with C
>pointers
>+     and we handle this one as ptr_type_node.  */
>+  if (code == REFERENCE_TYPE)
>+    return POINTER_TYPE;
>+  return code;
>+}
>+
> #define tree_map_eq tree_map_base_eq
> extern unsigned int tree_map_hash (const void *);
> #define tree_map_marked_p tree_map_base_marked_p


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-30 15:38           ` Bernhard Reutner-Fischer
@ 2015-05-30 21:14             ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2015-05-30 21:14 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Jan Hubicka, Joseph Myers, gcc-patches, rguenther

> On May 30, 2015 12:56:26 AM GMT+02:00, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
> >Index: tree.h
> >===================================================================
> >--- tree.h	(revision 223877)
> >+++ tree.h	(working copy)
> >@@ -4598,7 +4598,28 @@
> > extern void DEBUG_FUNCTION verify_type (const_tree t);
> >extern bool gimple_canonical_types_compatible_p (const_tree,
> >const_tree,
> > 						 bool trust_type_canonical = true);
> >+/* Return simplified tree code of type that is used for canonical type
> >merging.  */
> >+inline enum tree_code
> >+tree_code_for_canonical_type_merging (enum tree_code code)
> >+{
> >+  /* By C standard, each enumerated type shall be compatible with
> >char,
> >+     a signed integer, or an unsigned integer.  The choice of type is
> >+     implementation defined (in our case it depends on -fshort-enum).
> 
> Please drop the mention of -fshort-enum as Joseph clarified.

I think the comment there is correct -fshort-enum will make us to pick different
integer types based on number of values, but they will always interoperable with
some normal integer type.

Honza
> 
> thanks,

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-30  3:28         ` Jan Hubicka
  2015-05-30 15:38           ` Bernhard Reutner-Fischer
@ 2015-06-01 19:47           ` Joseph Myers
  2015-06-02 17:34             ` Jan Hubicka
  2015-06-03 11:41           ` Richard Biener
  2 siblings, 1 reply; 18+ messages in thread
From: Joseph Myers @ 2015-06-01 19:47 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Bernhard Reutner-Fischer, gcc-patches, rguenther

On Sat, 30 May 2015, Jan Hubicka wrote:

> Joseph, does the attached testcase make sense for you? Is it defined? It is my
> first attempt to really interpret C standard to detail.

I suppose it's defined if unsigned int is the type chosen as compatible 
with that enum.  The test should be skipped for short_enums targets 
(arm-eabi bare metal) (you can't simply use -fno-short-enums as then that 
will fail the link-time compatibility checking).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-06-01 19:47           ` Joseph Myers
@ 2015-06-02 17:34             ` Jan Hubicka
  2015-06-02 17:43               ` Joseph Myers
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2015-06-02 17:34 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jan Hubicka, Bernhard Reutner-Fischer, gcc-patches, rguenther

> On Sat, 30 May 2015, Jan Hubicka wrote:
> 
> > Joseph, does the attached testcase make sense for you? Is it defined? It is my
> > first attempt to really interpret C standard to detail.
> 
> I suppose it's defined if unsigned int is the type chosen as compatible 
> with that enum.  The test should be skipped for short_enums targets 
> (arm-eabi bare metal) (you can't simply use -fno-short-enums as then that 
> will fail the link-time compatibility checking).

thanks. I did not notice we have -fshort-enum by default targets. I suppose we want:
/* { dg-xfail-if "" { arm-eabi-* } { "*" } { "" } } */

Honza
> 
> -- 
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-06-02 17:34             ` Jan Hubicka
@ 2015-06-02 17:43               ` Joseph Myers
  2015-06-02 17:55                 ` Jan Hubicka
  0 siblings, 1 reply; 18+ messages in thread
From: Joseph Myers @ 2015-06-02 17:43 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Bernhard Reutner-Fischer, gcc-patches, rguenther

On Tue, 2 Jun 2015, Jan Hubicka wrote:

> > On Sat, 30 May 2015, Jan Hubicka wrote:
> > 
> > > Joseph, does the attached testcase make sense for you? Is it defined? It is my
> > > first attempt to really interpret C standard to detail.
> > 
> > I suppose it's defined if unsigned int is the type chosen as compatible 
> > with that enum.  The test should be skipped for short_enums targets 
> > (arm-eabi bare metal) (you can't simply use -fno-short-enums as then that 
> > will fail the link-time compatibility checking).
> 
> thanks. I did not notice we have -fshort-enum by default targets. I suppose we want:
> /* { dg-xfail-if "" { arm-eabi-* } { "*" } { "" } } */

Well, not that (which matches "eabi" against the vendor part of the 
triplet), but skip for the short_enums effective-target keyword.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-06-02 17:43               ` Joseph Myers
@ 2015-06-02 17:55                 ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2015-06-02 17:55 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jan Hubicka, Bernhard Reutner-Fischer, gcc-patches, rguenther

> > thanks. I did not notice we have -fshort-enum by default targets. I suppose we want:
> > /* { dg-xfail-if "" { arm-eabi-* } { "*" } { "" } } */
> 
> Well, not that (which matches "eabi" against the vendor part of the 
> triplet), but skip for the short_enums effective-target keyword.
Ok. Did not know about short_enums (my dejagnu-fu is still very limited :( )

/* { dg-skip-if "require -fno-short-enums to work" {target short_enums} } */

Alternatively I suppose I can add enum value set to INT_MAX to force enum to be large.

Honza
> 
> -- 
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-05-30  3:28         ` Jan Hubicka
  2015-05-30 15:38           ` Bernhard Reutner-Fischer
  2015-06-01 19:47           ` Joseph Myers
@ 2015-06-03 11:41           ` Richard Biener
  2015-06-03 22:12             ` Jan Hubicka
  2 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2015-06-03 11:41 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joseph Myers, Bernhard Reutner-Fischer, gcc-patches

On Sat, 30 May 2015, Jan Hubicka wrote:

> Joseph, Richard,
> this is patch implementing the ENUM/INGEGER globbing and also POINTER/REFERENCE
> (though I don't know if that one follows by some standard rules).
> Joseph, does the attached testcase make sense for you? Is it defined? It is my
> first attempt to really interpret C standard to detail.
> 
> Ideally I would like to have testcases for all the globbing we do and reasoning
> why it is needed.
> 
> Bootstraped/regtested ppc64le-linux. OK?

Works for me.  (what about BOOLEAN_TYPE?)

Thanks,
Richard.

> Honza
> 
> 	* lto.c (hash_canonical_type): Use tree_code_for_canonical_type_merging.
> 
> 	* tree.h (tree_code_for_canonical_type_merging): New function.
> 	* tree.c (gimple_canonical_types_compatible_p): Use
> 	tree_code_for_canonical_type_merging..
> 	* gcc.dg/lto/c-compatible-types_0.c: New testcase.
> 	* gcc.dg/lto/c-compatible-types_1.c: New testcase.
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c	(revision 223877)
> +++ lto/lto.c	(working copy)
> @@ -319,7 +319,7 @@
>       smaller sets; when searching for existing matching types to merge,
>       only existing types having the same features as the new type will be
>       checked.  */
> -  hstate.add_int (TREE_CODE (type));
> +  hstate.add_int (tree_code_for_canonical_type_merging (TREE_CODE (type)));
>    hstate.add_int (TYPE_MODE (type));
>  
>    /* Incorporate common features of numerical types.  */
> Index: testsuite/gcc.dg/lto/c-compatible-types_0.c
> ===================================================================
> --- testsuite/gcc.dg/lto/c-compatible-types_0.c	(revision 0)
> +++ testsuite/gcc.dg/lto/c-compatible-types_0.c	(working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +/* By C standard Each enumerated type shall be compatible with char, a  signed
> +   integer, type, or an unsigned integer type. The choice of type is
> +   implementation-defined.  Check that enum and unsigned int match.  */
> +unsigned int a;
> +unsigned int *b;
> +void t();
> +
> +void reset ()
> +{
> +  asm("":"=r"(a):"0"(0));
> +}
> +int
> +main()
> +{
> +  asm("":"=r"(a):"0"(1));
> +  asm("":"=r"(b):"0"(&a));
> +  t();
> +  return 0;
> +}
> Index: testsuite/gcc.dg/lto/c-compatible-types_1.c
> ===================================================================
> --- testsuite/gcc.dg/lto/c-compatible-types_1.c	(revision 0)
> +++ testsuite/gcc.dg/lto/c-compatible-types_1.c	(working copy)
> @@ -0,0 +1,19 @@
> +enum a {test1, test2};
> +enum a a;
> +enum a *b;
> +
> +void reset (void);
> +
> +void
> +t()
> +{
> +  if (a != test2)
> +    __builtin_abort ();
> +  if (*b != test2)
> +    __builtin_abort ();
> +  reset ();
> +  if (a != test1)
> +    __builtin_abort ();
> +  if (*b != test1)
> +    __builtin_abort ();
> +}
> Index: tree.c
> ===================================================================
> --- tree.c	(revision 223877)
> +++ tree.c	(working copy)
> @@ -12877,7 +12877,8 @@
>      return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
>  
>    /* Can't be the same type if the types don't have the same code.  */
> -  if (TREE_CODE (t1) != TREE_CODE (t2))
> +  if (tree_code_for_canonical_type_merging (TREE_CODE (t1))
> +      != tree_code_for_canonical_type_merging (TREE_CODE (t2)))
>      return false;
>  
>    /* Qualifiers do not matter for canonical type comparison purposes.  */
> Index: tree.h
> ===================================================================
> --- tree.h	(revision 223877)
> +++ tree.h	(working copy)
> @@ -4598,7 +4598,28 @@
>  extern void DEBUG_FUNCTION verify_type (const_tree t);
>  extern bool gimple_canonical_types_compatible_p (const_tree, const_tree,
>  						 bool trust_type_canonical = true);
> +/* Return simplified tree code of type that is used for canonical type merging.  */
> +inline enum tree_code
> +tree_code_for_canonical_type_merging (enum tree_code code)
> +{
> +  /* By C standard, each enumerated type shall be compatible with char,
> +     a signed integer, or an unsigned integer.  The choice of type is
> +     implementation defined (in our case it depends on -fshort-enum).
>  
> +     For this reason we make no distinction between ENUMERAL_TYPE and INTEGER
> +     type and compare only by their signedness and precision.  */
> +  if (code == ENUMERAL_TYPE)
> +    return INTEGER_TYPE;
> +  /* To allow inter-operability between languages having references and
> +     C, we consider reference types and pointers alike.  Note that this is
> +     not strictly necessary for C-Fortran 2008 interoperability because
> +     Fortran define C_PTR type that needs to be compatible with C pointers
> +     and we handle this one as ptr_type_node.  */
> +  if (code == REFERENCE_TYPE)
> +    return POINTER_TYPE;
> +  return code;
> +}
> +
>  #define tree_map_eq tree_map_base_eq
>  extern unsigned int tree_map_hash (const void *);
>  #define tree_map_marked_p tree_map_base_marked_p
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Teach gimple_canonical_types_compatible_p about incomplete types
  2015-06-03 11:41           ` Richard Biener
@ 2015-06-03 22:12             ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2015-06-03 22:12 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jan Hubicka, Joseph Myers, Bernhard Reutner-Fischer, gcc-patches

> On Sat, 30 May 2015, Jan Hubicka wrote:
> 
> > Joseph, Richard,
> > this is patch implementing the ENUM/INGEGER globbing and also POINTER/REFERENCE
> > (though I don't know if that one follows by some standard rules).
> > Joseph, does the attached testcase make sense for you? Is it defined? It is my
> > first attempt to really interpret C standard to detail.
> > 
> > Ideally I would like to have testcases for all the globbing we do and reasoning
> > why it is needed.
> > 
> > Bootstraped/regtested ppc64le-linux. OK?
> 
> Works for me.  (what about BOOLEAN_TYPE?)

No idea. So far I did not find anything in the language standards that would strictly
require to merge these two though I see it would make sense when mixing K&R and Ansi-C
units...

I am going to push out patch that complains about decl merging where memory locations are
TBAA incompatible. We will get warnings on these then and we shall see how much it hit
us.

Honza

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-06-03 22:09 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-25  1:32 Teach gimple_canonical_types_compatible_p about incomplete types Jan Hubicka
2015-05-25 17:33 ` Bernhard Reutner-Fischer
2015-05-26  1:18   ` Jan Hubicka
2015-05-29 21:37     ` Joseph Myers
2015-05-29 21:46       ` Jan Hubicka
2015-05-30  3:28         ` Jan Hubicka
2015-05-30 15:38           ` Bernhard Reutner-Fischer
2015-05-30 21:14             ` Jan Hubicka
2015-06-01 19:47           ` Joseph Myers
2015-06-02 17:34             ` Jan Hubicka
2015-06-02 17:43               ` Joseph Myers
2015-06-02 17:55                 ` Jan Hubicka
2015-06-03 11:41           ` Richard Biener
2015-06-03 22:12             ` Jan Hubicka
2015-05-26  8:45 ` Richard Biener
2015-05-26 17:36   ` Jan Hubicka
2015-05-27  8:38     ` Richard Biener
2015-05-27 22:53       ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).