Hi, this email might be of interest to people working on the gimple level and those dealing with the C++ front-end. I have been looking at how to do devirtualization in gcc on the gimple level. Eventually I want to do it intraprocedurally through folding, in indirect inlining and also propagate types in IPA-CP. In order to learn how to find the target of a virtual call if the base type is known, I have had a look at the current state of folding of OBJ_TYPE_REF calls, which I thought was already working, only to find out it really wasn't. We did not see errors just because the sole user of gimple_fold_obj_type_ref() in tree-ssa-ccp.c called it in only the simplest of circumstances, namely when OBJ_TYPE_REF_OBJECT was an address of a declaration, even though often it is an address of a series of COMPONENT_REFs of a declaration. First I have simply attempted to fold OBJ_TYPE_REFs with such object references as well but I have run into ICEs when the object had multiple ancestors and the called virtual function came from the second one (or any but the first one). The problem is that fold_obj_type_ref, as it is now, simply takes OBJ_TYPE_REF_TOKEN as an index i into the virtual table of the given type and returns ith method in the list of virtual methods of that type it acquires from its BINFO. The problem is that the first virtual method of the second ancestor has index zero in the ancestor (and thus OBJ_TYPE_REF_TOKEN is zero) but not zero in the big containing object. Thus fold_obj_type_ref returns a wrong method which eventually leads to some type checking error or miscompilation. So my second attempt was to require that all the fields in the OBJ_TYPE_REF_OBJECT chain of COMPONENT_REFs are all first ancestors through base_binfos. This worked as expected and for example raised the number of folded calls in the g++ testsuite to 2906 from just 22. However, I still was not able to fold my multiple inheritance example. Examining that example (which is a testcase of the patch below) I have realized that BASE_BINFOs in it have lists of virtual methods of their own which point to the correct methods (i.e. base_binfo representing A in B had B::foo(), not A:foo() in the list). Therefore my third attempt was to traverse the base_binfo tree while traversing the component_refs and use the corresponding list of virtual methods. Unfortunately, I have found that some base_binfos have wrong lists of virtual methods with the methods corresponding to the ancestor in them. However, I have also found out that such binfos have BINFO_FLAG_2 (which is TREE_FLAG_2) unset and BINFO_FLAG_5 set and so I could simply refuse to fold upon encountering them. That worked and I could still fold my example but not much more, the number of folded expressions in the testcase was back to 22. So my fourth and so far the last attempt to tackle this issue was to combine the previous two approaches, I use the binfo of the true type of the whole object as long as the fields in COMPONENT_REFs refer to first ancestors and resort to traversing the base_binfos if multiple inheritance is detected. This way the number of folded virtual calls in the testcases is back to 2906 and my example is successfully devirtualized too. Of course, this is quite awkward. If I have missed anything and the process of locating the correct method can be simplified somehow, I'd like to know. I would especially like to ask the C++ people to tell me if the general idea is sensible and I would appreciate their comments on the binfo flags 2 and 5 as well. As I have written in the beginning, this is just the first step. I need to employ a similar mechanism in devirtualization in indirect inlining and IPA-CP which inevitably mandates some quite fun semantics of jump-functions. Therefore if you can comment on the general idea, please try to do so soon. The patch is an RFC one intended for 4.6. Nevertheless, it passes bootstrap and testing (including Ada) on x86_64-linux (rev 153007). Some of the interfaces will probably change a bit as I will generalize the binfo search so that it can be used by indirect inlining and IPA-CP but at the moment I do not intend to do any big changes. To give some more numbers, the patch folds all 8 OBJ_TYPE_REFs that have an ADDR_EXPR of a part of a declaration as their object in the set of libstdc++ benchmarks. The runtime of the ultra-artificial testcase attached to this email (when compiled with -O3) dropped from 22 seconds to 7 seconds on my desktop. Thanks for any comments, Martin 2009-10-21 Martin Jambor * gimple.c (get_base_binfo_for_type): New function. (get_relevant_ref_binfo_single_inh): New function. (get_relevant_ref_binfo_multi_inh): New function. (gimple_fold_obj_type_ref_known_binfo): New function. (gimple_fold_obj_type_ref): Get binfo from get_relevant_ref_binfo_single_inh and get_relevant_ref_binfo_multi_inh and use gimple_fold_obj_type_ref_known_binfo. * gimple.h (gimple_fold_obj_type_ref): Declare. (gimple_fold_obj_type_ref_known_binfo): Declare. * tree-ssa-ccp.c (fold_gimple_call): Simplify condition for folding virtual calls and call gimple_fold_obj_type_ref. Index: icln/gcc/gimple.c =================================================================== --- icln.orig/gcc/gimple.c +++ icln/gcc/gimple.c @@ -4563,24 +4563,135 @@ gimple_decl_printable_name (tree decl, i return IDENTIFIER_POINTER (DECL_NAME (decl)); } +/* Search for a base binfo of BINFO that corresponds to TYPE and return it if + it is found or NULL_TREE if it is not. */ -/* Fold a OBJ_TYPE_REF expression to the address of a function. - KNOWN_TYPE carries the true type of OBJ_TYPE_REF_OBJECT(REF). Adapted - from cp_fold_obj_type_ref, but it tolerates types with no binfo - data. */ +static tree +get_base_binfo_for_type (tree binfo, tree type) +{ + int i; + tree base_binfo; + tree res = NULL_TREE; + + for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++) + if (TREE_TYPE (base_binfo) == type) + { + gcc_assert (!res); + res = base_binfo; + } -tree -gimple_fold_obj_type_ref (tree ref, tree known_type) + return res; +} + +/* Return a binfo describing the true type of object referenced by expression + REF if all component references access first ancestors. REF can consist of + a series of COMPONENT_REFs of with a declaration or an INDIREC_REF. It can + also be just a simple declaration, indirect reference or an SSA_NAME. If + the function discoveres an INIDRECT_REF or an SSA_NAME, it will assume that + the encapsulating type is KNOWN_TYPE or return NULL_TREE if KNOWN_TYPE is + NULL_TREE. Otherwise the first nonartifical field declaration or the base + declaration will be examined to get the encapsulating type. If a + COMPONENT_REF referencing an ancestor which is not the first one, this + function returns NULL_TREE and sets *try_multi to true. */ + +static tree +get_relevant_ref_binfo_single_inh (tree ref, tree known_type, bool *try_multi) { - HOST_WIDE_INT index; - HOST_WIDE_INT i; - tree v; - tree fndecl; + while (true) + { + if (TREE_CODE (ref) == COMPONENT_REF) + { + tree par_type; + tree binfo, base_binfo; + tree field = TREE_OPERAND (ref, 1); + + if (!DECL_ARTIFICIAL (field)) + return TYPE_BINFO (TREE_TYPE (field)); + + par_type = TREE_TYPE (TREE_OPERAND (ref, 0)); + binfo = TYPE_BINFO (par_type); + if (!binfo + || BINFO_N_BASE_BINFOS (binfo) == 0) + { + if (try_multi) + *try_multi = 1; + return NULL_TREE; + } + base_binfo = BINFO_BASE_BINFO (binfo, 0); + if (TREE_TYPE (base_binfo) != TREE_TYPE (field)) + { + if (try_multi) + *try_multi = 1; + return NULL_TREE; + } + + ref = TREE_OPERAND (ref, 0); + } + else if (DECL_P (ref)) + return TYPE_BINFO (TREE_TYPE (ref)); + else if (known_type + && (TREE_CODE (ref) == SSA_NAME + || TREE_CODE (ref) == INDIRECT_REF)) + return TYPE_BINFO (known_type); + else + return NULL_TREE; + } +} + - if (TYPE_BINFO (known_type) == NULL_TREE) +/* Return a binfo describing the part of object referenced by expression REF. + This can and often is a base_binfo of a descendatn binfo. REF can consist of + a series of COMPONENT_REFs of with a declaration or an INDIREC_REF. It can + also be just a simple declaration, indirect reference or an SSA_NAME. If + the function discoveres an INIDRECT_REF or an SSA_NAME, it will assume that + the encapsulating type is KNOWN_TYPE or return NULL_TREE if KNOWN_TYPE is + NULL_TREE. Otherwise the first nonartifical field declaration or the base + declaration will be examined to get the encapsulating type. */ + +static tree +get_relevant_ref_binfo_multi_inh (tree ref, tree known_type) +{ + if (DECL_P (ref)) + return TYPE_BINFO (TREE_TYPE (ref)); + else if (TREE_CODE (ref) == COMPONENT_REF) + { + tree desc_binfo; + tree field = TREE_OPERAND (ref, 1); + + if (!DECL_ARTIFICIAL (field)) + return TYPE_BINFO (TREE_TYPE (field)); + desc_binfo = get_relevant_ref_binfo_multi_inh (TREE_OPERAND (ref, 0), + known_type); + if (!desc_binfo) + return NULL_TREE; + return get_base_binfo_for_type (desc_binfo, TREE_TYPE (field)); + } + else if (known_type + && (TREE_CODE (ref) == SSA_NAME + || TREE_CODE (ref) == INDIRECT_REF)) + return TYPE_BINFO (known_type); + else return NULL_TREE; +} + - v = BINFO_VIRTUALS (TYPE_BINFO (known_type)); +/* Fold a OBJ_TYPE_REF expression to the address of a function. KNOWN_BINFO + carries the binfo describing the true type of OBJ_TYPE_REF_OBJECT(REF). */ + +tree +gimple_fold_obj_type_ref_known_binfo (tree ref, tree known_binfo) +{ + HOST_WIDE_INT index, i; + tree v, fndecl; + + /* If binfo flag 2 is not set, this binfo does not "have its own virtual + table" (according to cp/cp-tree.h) and cannot be safely used for + devirtualization. (Purely empirical experience also shows that we can + also bail out if flag 5 _is_ set but I have no reason to prefer one + way over the other.) */ + if (!BINFO_FLAG_2 (known_binfo)) + return NULL_TREE; + v = BINFO_VIRTUALS (known_binfo); index = tree_low_cst (OBJ_TYPE_REF_TOKEN (ref), 1); i = 0; while (i != index) @@ -4591,15 +4702,33 @@ gimple_fold_obj_type_ref (tree ref, tree } fndecl = TREE_VALUE (v); + return build_fold_addr_expr (fndecl); +} -#ifdef ENABLE_CHECKING - gcc_assert (tree_int_cst_equal (OBJ_TYPE_REF_TOKEN (ref), - DECL_VINDEX (fndecl))); -#endif - cgraph_node (fndecl)->local.vtable_method = true; +/* Fold a OBJ_TYPE_REF expression to the address of a function. If KNOWN_TYPE + is not NULL_TREE, it is the true type of the outmost encapsulating object if + that comes from a pointer SSA_NAME. If the true outmost encapsulating type + can be determined from a declaration OBJ_TYPE_REF_OBJECT(REF), it is used + regardless of KNOWN_TYPE (which thuc can be NULL_TREE). */ - return build_fold_addr_expr (fndecl); +tree +gimple_fold_obj_type_ref (tree ref, tree known_type) +{ + tree obj = OBJ_TYPE_REF_OBJECT (ref); + bool try_multi = false; + tree binfo; + + if (TREE_CODE (obj) == ADDR_EXPR) + obj = TREE_OPERAND (obj, 0); + + binfo = get_relevant_ref_binfo_single_inh (obj, known_type, &try_multi); + if (!binfo && try_multi) + binfo = get_relevant_ref_binfo_multi_inh (obj, known_type); + if (binfo) + return gimple_fold_obj_type_ref_known_binfo (ref, binfo); + else + return NULL_TREE; } #include "gt-gimple.h" Index: icln/gcc/tree-ssa-ccp.c =================================================================== --- icln.orig/gcc/tree-ssa-ccp.c +++ icln/gcc/tree-ssa-ccp.c @@ -2924,9 +2924,6 @@ fold_gimple_call (gimple_stmt_iterator * } else { - /* Check for resolvable OBJ_TYPE_REF. The only sorts we can resolve - here are when we've propagated the address of a decl into the - object slot. */ /* ??? Should perhaps do this in fold proper. However, doing it there requires that we create a new CALL_EXPR, and that requires copying EH region info to the new node. Easier to just do it @@ -2934,19 +2931,11 @@ fold_gimple_call (gimple_stmt_iterator * /* ??? Is there a good reason not to do this in fold_stmt_inplace? */ callee = gimple_call_fn (stmt); if (TREE_CODE (callee) == OBJ_TYPE_REF - && lang_hooks.fold_obj_type_ref - && TREE_CODE (OBJ_TYPE_REF_OBJECT (callee)) == ADDR_EXPR - && DECL_P (TREE_OPERAND - (OBJ_TYPE_REF_OBJECT (callee), 0))) + && TREE_CODE (OBJ_TYPE_REF_OBJECT (callee)) == ADDR_EXPR) { tree t; - /* ??? Caution: Broken ADDR_EXPR semantics means that - looking at the type of the operand of the addr_expr - can yield an array type. See silly exception in - check_pointer_types_r. */ - t = TREE_TYPE (TREE_TYPE (OBJ_TYPE_REF_OBJECT (callee))); - t = lang_hooks.fold_obj_type_ref (callee, t); + t = gimple_fold_obj_type_ref (callee, NULL_TREE); if (t) { gimple_call_set_fn (stmt, t); Index: icln/gcc/gimple.h =================================================================== --- icln.orig/gcc/gimple.h +++ icln/gcc/gimple.h @@ -859,6 +859,7 @@ unsigned get_gimple_rhs_num_ops (enum tr gimple gimple_alloc_stat (enum gimple_code, unsigned MEM_STAT_DECL); const char *gimple_decl_printable_name (tree, int); tree gimple_fold_obj_type_ref (tree, tree); +tree gimple_fold_obj_type_ref_known_binfo (tree, tree); /* Returns true iff T is a valid GIMPLE statement. */ extern bool is_gimple_stmt (tree); Index: icln/gcc/testsuite/g++.dg/otr-fold-1.C =================================================================== --- /dev/null +++ icln/gcc/testsuite/g++.dg/otr-fold-1.C @@ -0,0 +1,77 @@ +/* Verify that simple virtual calls are inlined even without early + inlining, even when a typecast to an ancestor is involved along the + way. */ +/* { dg-do run } */ +/* { dg-options "-O -fdump-tree-optimized-slim" } */ + +extern "C" void abort (void); + +class Distraction +{ +public: + float f; + double d; + Distraction () + { + f = 8.3; + d = 10.2; + } + virtual float bar (float z); +}; + +class A +{ +public: + int data; + virtual int foo (int i); +}; + + +class B : public Distraction, public A +{ +public: + virtual int foo (int i); +}; + +float Distraction::bar (float z) +{ + f += z; + return f/2; +} + +int A::foo (int i) +{ + return i + 1; +} + +int B::foo (int i) +{ + return i + 2; +} + +int __attribute__ ((noinline,noclone)) get_input(void) +{ + return 1; +} + +static int middleman_1 (class A *obj, int i) +{ + return obj->foo (i); +} + +static int middleman_2 (class B *obj, int i) +{ + return middleman_1 (obj, i); +} + +int main (int argc, char *argv[]) +{ + class B b; + + if (middleman_2 (&b, get_input ()) != 3) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump "= B::foo" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */