* GCC [RFC] Whole Program Devirtualization
@ 2021-08-20 12:36 Basile Starynkevitch
  2021-08-21  8:41 ` Jonathan Wakely
  2021-08-23  2:23 ` Feng Xue OS
  0 siblings, 2 replies; 3+ messages in thread
From: Basile Starynkevitch @ 2021-08-20 12:36 UTC (permalink / raw)
  To: fxue; +Cc: basile.starynkevitch, gcc

Hello Feng Xue OS


Your project is interesting, but ambitious.

I think the major points are:

*whole program analysis*. Static analysis tools like 
https://frama-c.com/ or https://github.com/bstarynk/bismon/ could be 
relevant, as could projects like https://www.decoder-project.eu/. 
Cross-compilation makes things harder.

*abstract interpretation* might be relevant (but is difficult and costly 
to implement). See the Wikipedia article on abstract interpretation.

*size of the whole program which is analyzed*.  If the entire program 
(including system libraries like libc) has e.g. fewer than ten thousand 
routines and fewer than a million GIMPLE instructions in total, it makes 
sense. But if the entire program is as large as the Linux kernel, or the 
GCC compiler, or the Firefox browser (all have many millions of lines of 
source code), you probably won't be able to do whole program 
devirtualization in a few years of human work.


*computed gotos* or *labels as values* (see 
https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html for more) make 
this difficult. They do exist, and could probably be hidden in GNU glibc 
or libstdc++ internal code.
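
For readers unfamiliar with the extension, the following is a minimal 
sketch of labels as values, written in GNU C++ (it is not standard C++). 
The three-opcode bytecode and the names `Op` and `run` are hypothetical, 
invented only for illustration. Each `goto *` is an indirect jump whose 
target is only known at run time, which is the kind of control flow that 
a static analysis has to approximate.

```cpp
// GNU extension (accepted by GCC and Clang); not standard C++.
#include <cassert>
#include <cstddef>

// Hypothetical three-opcode bytecode, for illustration only.
enum Op : unsigned char { OP_INC = 0, OP_DBL = 1, OP_HALT = 2 };

// Runs `code` on an accumulator starting at `acc`, dispatching through
// a table of label addresses instead of a switch statement.
// `code` must end with OP_HALT and contain only valid opcodes.
int run(const unsigned char *code, int acc) {
  assert(code != nullptr);
  // `&&label` yields the address of a label as a `void *`.
  static void *const dispatch[] = { &&do_inc, &&do_dbl, &&do_halt };
  std::size_t pc = 0;
  goto *dispatch[code[pc]];  // computed goto: target chosen at run time

do_inc:
  acc += 1;
  ++pc;
  goto *dispatch[code[pc]];
do_dbl:
  acc *= 2;
  ++pc;
  goto *dispatch[code[pc]];
do_halt:
  return acc;
}
```

With the program `{OP_INC, OP_INC, OP_DBL, OP_HALT}` and a starting 
accumulator of 1, `run` returns 6.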

*asm statements are difficult*. They usually appear inside your libc. 
How would you deal with them?

*Can you afford a month of computer time to compile a large piece of 
software* with your whole program devirtualizer? In most cases, no, but 
Pitrat's book /Artificial Beings - the conscience of a conscious 
machine/ (ISBN 9781848211018) suggests cases where it might make sense 
(he describes a "compiler-like system" which runs for a month of CPU 
time).

My recommendation would be to *first code a simple GCC plugin as a proof 
of concept*, which rejects programs that could not be realistically 
devirtualized and otherwise stores a representation of them somewhere 
(in a database, perhaps). I worked 3 years full time on 
https://github.com/bstarynk/bismon/ to achieve a similar goal (I don't 
claim to have succeeded, and I don't have any more funding). My guess is 
that some of its code could be useful to you; if so, contact me by email 
both at work (basile.starynkevitch@cea.fr) and at home 
(basile@starynkevitch.net).

The most important thing: limit your ambition at first. Write a document 
(at least an internal one) stating what you won't do.


Cheers

-- 
Basile Starynkevitch                  <basile@starynkevitch.net>
(only mine opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/



* Re: GCC [RFC] Whole Program Devirtualization
  2021-08-20 12:36 GCC [RFC] Whole Program Devirtualization Basile Starynkevitch
@ 2021-08-21  8:41 ` Jonathan Wakely
  2021-08-23  2:23 ` Feng Xue OS
  1 sibling, 0 replies; 3+ messages in thread
From: Jonathan Wakely @ 2021-08-21  8:41 UTC (permalink / raw)
  To: Basile Starynkevitch; +Cc: Feng Xue OS, gcc, basile.starynkevitch

On Fri, 20 Aug 2021, 13:37 Basile Starynkevitch, <basile@starynkevitch.net>
wrote:

>
> *computed gotos* or *labels as values* (see
> https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html for more) are
> making this difficult. But they do exist, and probably could be hidden
> in GNU glibc or libstdc++ internal code.
>

There are none in libstdc++.


* Re: GCC [RFC] Whole Program Devirtualization
  2021-08-20 12:36 GCC [RFC] Whole Program Devirtualization Basile Starynkevitch
  2021-08-21  8:41 ` Jonathan Wakely
@ 2021-08-23  2:23 ` Feng Xue OS
  1 sibling, 0 replies; 3+ messages in thread
From: Feng Xue OS @ 2021-08-23  2:23 UTC (permalink / raw)
  To: Basile Starynkevitch; +Cc: basile.starynkevitch, gcc, Jan Hubicka, JiangNing OS

We are not going to create a new devirtualization framework from
scratch; we just hope it will be an enhancement of the current
speculative devirtualization. The process does not need to parse native
code in libraries; it relies only on the existing lightweight symbol
resolution done by the LTO prelinker. C++ virtual dispatch is expected
to be translated to GIMPLE IR from C++ source. If a user attempts to
hand-craft it using embedded asm, that should be considered UB with
respect to the C++ ABI.
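
As a rough illustration of what speculative devirtualization means at
the source level, here is a hand-written C++ sketch. The class names and
both functions are hypothetical. The `typeid` guard is only a stand-in:
as far as I understand it, the compiler's own transformation works on
GIMPLE and compares the function address loaded from the vtable against
the guessed target, rather than comparing type information.

```cpp
#include <typeinfo>

struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const = 0;
};
struct Triangle : Shape {
  int sides() const override { return 3; }
};
struct Square : Shape {
  int sides() const override { return 4; }
};

// Plain virtual call: an indirect call through the vtable.
int sides_virtual(const Shape &s) { return s.sides(); }

// Hand-written analogue of a speculatively devirtualized call:
// guess the likely dynamic type, guard the guess, keep a fallback.
int sides_speculative(const Shape &s) {
  if (typeid(s) == typeid(Triangle)) {
    // Qualified call: bound statically, so it can be inlined.
    return static_cast<const Triangle &>(s).Triangle::sides();
  }
  // The guess was wrong: fall back to ordinary virtual dispatch.
  return s.sides();
}
```

Both functions return the same value for every `Shape`; only the
dispatch differs. If a whole-program analysis could prove that
`Triangle` is the only class ever instantiated, the guard and the
fallback could in principle be dropped entirely.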

The compile time of whole-program analysis is not as terrible as you
think; it is realistically acceptable even when the code base is very
large. As far as I know, Google enables WPD when building Chrome,
although that is based on LLVM.

Thanks,
Feng
