public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC] Whole Program Devirtualization
@ 2021-08-20  9:48 Feng Xue OS
  2021-08-20 13:45 ` Martin Liška
  0 siblings, 1 reply; 5+ messages in thread
From: Feng Xue OS @ 2021-08-20  9:48 UTC (permalink / raw)
  To: gcc; +Cc: JiangNing OS

1. Background

C++ Devirtualization is meant to determine all possible class type
identities of a virtual dynamic call, and try to transform it to direct
static call, or certain kind of statements with lower runtime cost.
Current implementation in GCC tends to adopt a somewhat conservative
scheme, originated from a safe but defensive assumption that class
hierarchy of a type (except for FINAL class or class in anonymous
namespace) is incomplete, and might be extended by external library and
module. Therefore, most of qualified virtual calls would be partially
devirtualized to a speculative form of direct call as the following,
guarded by a function address check and a failsafe branch.

    if (obj->vfn_ptr == &T::FN)
        obj->T::FN();
    else
        obj->vfn_ptr();

Here, full devirtualization is required if we want to get vcalls
concisely transformed with no guard test. In theory, when whole program
compilation is enforced, we could enable such kind of devirtualization
based on the closed-world assumption about contained class types. But
"whole program" does not ensure that class hierarchy of a type never
span to dependent C++ libraries (one is libstdc++), which would result
in incorrect devirtualization. An example is given below to demonstrate
the problem. 

    // Has been pre-compiled to a library
    class Base {
        virtual void method() = 0;
    };

    class Derive_in_Library : public Base {
	virtual void method()  { ... }
    };

    Base *get_object() 
    {
        return new Derive_in_Library();
    }

    -------------------------------------------------------
  
    // User source code to compile
    class Base {
        virtual void method() = 0;
    };

    class Derive : public Base {
	virtual void method()  { ... }
    };

    void foo()
    {
      Base *obj = get_object();
 
      obj->method();
    }

If there is no way to inspect entire class hierarchy comprised by
relevant types in the library and user source code, whole program
analysis would improperly think of 'Derive' as sole descendant of
'Base', and triggers devirtualizing 'obj->method' to 'Derive::method',
which is definitely unexpected transformation. To target this
complication, we should figure out those types that are referenced by
regular object file and library, and exclude them from candidates of
full devirtualization. This RFC will discuss some implementable
approaches to identify whether type is qualified for full
devirtualization under whole program assumption (so also called whole-
program devirtualization, WPD for short), and new optimizations
inspired by whole program type analysis.


2. Whole-program local type

In this RFC, we merely care about polymorphic class (class contains
virtual function), since only this kind of type is meaningful to
devirtualization. A class type is identified as local in terms of
whole-program scope iff there is no instantiation of the type or its
derived types in regular object files and libraries to link with.
A whole-program local type is safe to apply WPD. Here are two
mechanisms for us to perform this check on a given type: 

2.1 Typeinfo symbol resolution by linker-plugin

Essentially, devirtualization is driven by analysis on vtable lifetime,
which begins once the vtable is attached to a new complete object or
sub-object during instantiation of a class or its derived class.

Seemingly, we could find whether a class type is referenced in object
file or library by tracking availability of its vtable symbol. But
vtable might be purged and we are still interested in its belonging
class type. Refer to the library code in above example, vtable of
'Base' is unused since it neither participate construction of 'Base'
nor 'Derive_in_Library', but we still must know if 'Base' is live
in the library to ensure correct result.

Moreover, once a class is virtual inherited, it will have multiple
vtables (including construction vtables), but the way of looking up
class via vtable symbol requires one-to-one correspondence between
them, then it does not work.

Beside vtable symbol, class instantiation also creates references to
typeinfo symbols of itself and all its parent classes. At the same time,
each class has unique typeinfo, which could act as a desirable type
marker, and be used to perform type lookup in object file and library.
Anyway, the approach is not 100% safe, specifically, when typeinfo
symbols are invisible or missed, for example, when libraries to link
against was built with -fno-weak or -fno-rtti. But at least for
libstc++, these symbols should be present for the sake of allowing
dynamic_cast in user source code.

Lto-linker-plugin will work with linker to do a whole-program symbol
resolution before LTO/WPA happens, and attach binding information to
symbols. Given a typeinfo symbol, if it is resolved as
LDPR_PREVAILING_DEF_IRONLY (only referenced from IR code, with no
references from regular objects), its corresponding class type is
deemed to be whole-program local. In above example, resolution of
typeinfo symbols generated for user code will be:

    class Base {                  //  global type
	// _ZTI4Base (typeinfo for Base)      LDPR_PREVAILING_DEF
        virtual void method() = 0;
    };

    class Derive : public Base {  // local type
	// _ZTI6Derive (typeinfo for Derive)  LDPR_PREVAILING_DEF_IRONLY
	virtual void method()  { ... }
    };

2.2 Symbol visibility of class type

As a common practice to design interface for a library, anything that
is expected to be visible by user program would be explicitly decorated
with __attribute__((__visibility__("default"))) or equivalent syntax at
the source level. Since C++ class type has linkage as function and
variable, it could also be exported using this practice, which is
actually followed by implementation of libstdc++. Then adding
-fvisibility=hidden to compilation of user source code would make those
undecorated class types be hidden (symbol privatization by
-fwhole-program nearly does the same thing), thus compiler can easily
make a distinction between local and global types solely based on
visibility, which can be reflected on corresponding vtable symbol. For
above example, if 'Base' is supposed to be exported in a library header
file, we know that 'Base' is not whole-program local, and suppress full
devirtualization on it.

    // Library header file: Base.h
    class Base __attribute__((__visibility__("default"))) { // global type
        virtual void method() = 0;
    };

    -------------------------------------------------------

    // User source code to compile
    #include <Base.h>

    class Derive : public Base {  // -fvisibility=hidden, local type
	virtual void method()  { ... }
    };

This approach is adopted by LLVM, and works for C++ application whose
C++ libraries to link with correctly announce exported class types as
how libstdc++ does, which is probably true for most independent self-
contained system or 3rd-part libraries. However, defining class type
w/o visibility specification is normal C++ coding practice, especially
in user source code, then it will be unsafe if the C++ application is
separated to more than one internally-used libraries and modules.
Additionally, this approach is unapplicable on system that does not
support ELF symbol visibility, such as Windows.


3. New whole-program optimizations

3.1 Virtual function elimination

Currently, GCC only does VFE under very limited conditions (for class in
anonymous namespace). Now this can be extended to unused whole-program local
class whose virtual functions will not added to devirtualization targets
of virtual calls. Another fine-grained VFE is also possible but with more
complexities, that is to remove virtual function that is never called, 
while its belonging class is live.

3.2 Virtual call pure-const analysis

Side effect of a virtual call could be determined if inheritance graph
of object type is complete, and this analysis can simply be archived
by iteratively parsing all possible function targets.

3.3 Virtual constant replacement

There is a common scenario in design of class hierarchy, that each
class type has a constant attribute shared by all objects of that
class. For example, it could be a unique kind number, against which
safe type cast becomes a comparison over kind number + a light
static_cast, with no need of dynamic_cast.

Instead of keeping this constant attribute in a class object field, it
is always implemented as a simple const function which just returns
value of that attribute, and this is why we call it virtual constant.
An example:

    enum NodeType {
        T_NODE,
	T_SUB_NODE_1,
	T_SUB_NODE_2
    };

    class Node {
        /* Attribute may be duplicate. */
	virtual bool isRoot()  { return true; }

        /* Attribute is unique. */
        virtual NodeType getNodeType() { return T_NODE; }
    };

    class SubNode1 : public Node {
	virtual bool isRoot()  { return false; }
        virtual NodeType getNodeType() { return T_SUB_NODE_1; }
    };

    class SubNode2 : public Node {
	virtual bool isRoot()  { return false; }
        virtual NodeType getNodeType() { return T_SUB_NODE_2; }
    };

For whole-program local class hierarchy, it is a natural optimization
to move virtual constant into vtable (since vtable can be seen as an
information repository to keep class-wide data), and replace virtual
call with a memory load from vtable. As to where to save virtual
constant, there are two choices:

  o. The vtable slot of virtual function, but it will cause ABI-
     incompatible codes.

  o. Re-layout vtable to add a new slot, primarily at the head of
     vtable.

Since a vtable slot is of pointer-size, generally wide enough, a more
aggressive means is to compact as many virtual constants into one slot
as possible.

For comparison over unique virtual constant (getNodeType is example),
if class hierarchy contains no virtual inheritance, the virtual constant
value and vtable is perfect one-to-one correspondence. It can be further
lowered to comparison over vtable address, which saves one memory load.
To illustrate optimization effect, the first is a C++ statement,
following which three pieces of gimple dump are also given, respectively 
representing original gimple IR, gimple IR of loading virtual constant
from vtable, and gimple IR of direct vtable comparison.

   if (obj->getNodeType() == T_NODE)
      ...

   // Original gimple IR
   _31 = obj_30->_vptr.Node;
   _32 = MEM[(int (*__vtbl_ptr_type) () *)_31 + 8B];
   _33 = OBJ_TYPE_REF(_11;(struct Node)obj_30->1) (obj_30);
   if (_33 == T_NODE)
     ...

   // Load virtual constant from vtable
   _31 = obj_30->_vptr.Node;
   _32 = MEM[(int *)_31 + 8B];
   if (_32 == T_NODE)
     ...

   // Direct vtable comparison
   _31 = obj_30->_vptr.Node;
   if (_31 == &_ZTS4Node)   // vtable for Node
     ...

We are planning to work on tasks proposed in this RFC step by step.
In fact, some of them have been completed in a prototype, but not sure
if its basic idea on how to identify a whole-program local class is
acceptable or not, so hope your comments on this. Thanks.

Best Regards,
Feng

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Whole Program Devirtualization
  2021-08-20  9:48 [RFC] Whole Program Devirtualization Feng Xue OS
@ 2021-08-20 13:45 ` Martin Liška
  2021-09-01  1:46   ` PING: " Feng Xue OS
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Liška @ 2021-08-20 13:45 UTC (permalink / raw)
  To: Feng Xue OS, gcc; +Cc: JiangNing OS, Jan Hubicka

... adding Honza to CC who spent quite some time on devirtualization pass.

Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* PING: [RFC] Whole Program Devirtualization
  2021-08-20 13:45 ` Martin Liška
@ 2021-09-01  1:46   ` Feng Xue OS
  2021-09-14  2:35     ` PING^2: " Feng Xue OS
  0 siblings, 1 reply; 5+ messages in thread
From: Feng Xue OS @ 2021-09-01  1:46 UTC (permalink / raw)
  To: Martin Liška, Jan Hubicka, gcc; +Cc: JiangNing OS

Honza,

  How do you think about proposal in this RFC? Thanks a lot.

Best Regards,
Feng
________________________________________
From: Martin Liška <mliska@suse.cz>
Sent: Friday, August 20, 2021 9:45 PM
To: Feng Xue OS; gcc@gcc.gnu.org
Cc: JiangNing OS; Jan Hubicka
Subject: Re: [RFC] Whole Program Devirtualization

... adding Honza to CC who spent quite some time on devirtualization pass.

Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* PING^2: [RFC] Whole Program Devirtualization
  2021-09-01  1:46   ` PING: " Feng Xue OS
@ 2021-09-14  2:35     ` Feng Xue OS
  2021-09-15 13:58       ` Martin Jambor
  0 siblings, 1 reply; 5+ messages in thread
From: Feng Xue OS @ 2021-09-14  2:35 UTC (permalink / raw)
  To: Martin Liška, Jan Hubicka, Richard Biener, gcc; +Cc: JiangNing OS

Ping again.

Thanks,
Feng

________________________________________
From: Feng Xue OS <fxue@os.amperecomputing.com>
Sent: Wednesday, September 1, 2021 9:46 AM
To: Martin Liška; Jan Hubicka; gcc@gcc.gnu.org
Cc: JiangNing OS
Subject: PING: [RFC] Whole Program Devirtualization

Honza,

  How do you think about proposal in this RFC? Thanks a lot.

Best Regards,
Feng
________________________________________
From: Martin Liška <mliska@suse.cz>
Sent: Friday, August 20, 2021 9:45 PM
To: Feng Xue OS; gcc@gcc.gnu.org
Cc: JiangNing OS; Jan Hubicka
Subject: Re: [RFC] Whole Program Devirtualization

... adding Honza to CC who spent quite some time on devirtualization pass.

Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PING^2: [RFC] Whole Program Devirtualization
  2021-09-14  2:35     ` PING^2: " Feng Xue OS
@ 2021-09-15 13:58       ` Martin Jambor
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Jambor @ 2021-09-15 13:58 UTC (permalink / raw)
  To: Feng Xue OS, Martin Liška, Jan Hubicka, Richard Biener, gcc
  Cc: JiangNing OS

Hi Feng,

On Tue, Sep 14 2021, Feng Xue OS via Gcc wrote:
> Ping again.
>

I have read your email but do not have much to comment.  I think that
you have identified all the potential issues yourself.  IIUC, I am
afraid that they will mean that at least the "Typeinfo symbol resolution
by linker-plugin" and "Virtual constant replacement" (but I suspect also
other things you propose) will not be safe to perform by default at any
optimization level save -Ofast.  The cost-benefit evaluation of having
it in the compiler will be difficult, but if you believe there will be
users (and we will be able to document the semantics of the new options
well) and the code is not very difficult to maintain, then presumably
there will be no hard objections, I think.

Thanks for looking into such techniques and discussing them here,

Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-09-15 13:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-20  9:48 [RFC] Whole Program Devirtualization Feng Xue OS
2021-08-20 13:45 ` Martin Liška
2021-09-01  1:46   ` PING: " Feng Xue OS
2021-09-14  2:35     ` PING^2: " Feng Xue OS
2021-09-15 13:58       ` Martin Jambor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).