* GCC [RFC] Whole Program Devirtualization
@ 2021-08-20 12:36 Basile Starynkevitch
2021-08-21 8:41 ` Jonathan Wakely
2021-08-23 2:23 ` Feng Xue OS
0 siblings, 2 replies; 3+ messages in thread
From: Basile Starynkevitch @ 2021-08-20 12:36 UTC (permalink / raw)
To: fxue; +Cc: basile.starynkevitch, gcc
Hello Feng Xue OS
Your project is interesting, but ambitious.
I think the major points are:
*whole program analysis*. Static analysis tools like
https://frama-c.com/ or https://github.com/bstarynk/bismon/ could be
relevant, as could projects like https://www.decoder-project.eu/.
Cross-compilation makes things harder.
*abstract interpretation* might be relevant (but difficult and costly to
implement). See Wikipedia.
*size of the whole program which is analyzed*. If the entire program
(including system libraries like libc) has e.g. fewer than ten thousand
routines and fewer than a million GIMPLE instructions in total, it makes
sense. But if the entire program is as large as the Linux kernel, or the
GCC compiler, or the Firefox browser (all have many millions of lines of
source code) you probably won't be able to do whole program
devirtualization in a few years of human work.
*computed gotos* or *labels as values* (see
https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html for more)
make this difficult. But they do exist, and could well be hidden in GNU
glibc or libstdc++ internal code.
*asm statements are difficult*. They usually appear inside your libc.
How would you deal with them?
*Can you afford a month of computer time to compile a large piece of
software* with your whole program devirtualizer? In most cases, no, but
Pitrat's book /Artificial Beings - the conscience of a conscious machine/
(ISBN 9781848211018) suggests cases where it might make sense (he
describes a "compiler like system" which runs for a month of CPU time).
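To make the computed-goto point above concrete, here is a minimal sketch of
the GNU "labels as values" extension, written as C++ for g++ or clang++. It
is not taken from glibc; the function `run`, the three opcodes, and the
`demo_run` helper are made up for illustration. Every `goto *` is an
indirect jump whose target is only known at run time, which is what a whole
program analysis has to reason about.

```cpp
// Requires GCC or Clang: `&&label` and `goto *ptr` are GNU extensions,
// not standard C++.
#include <cassert>
#include <cstddef>

// Hypothetical mini-interpreter: opcode 0 = increment, 1 = double, 2 = halt.
int run(const unsigned char* code) {
  // Table of label addresses, indexed by opcode.
  static void* const dispatch[] = { &&op_inc, &&op_dbl, &&op_halt };
  int acc = 0;
  std::size_t pc = 0;

  goto *dispatch[code[pc]];   // indirect jump chosen by the first opcode

op_inc:
  acc += 1;
  ++pc;
  goto *dispatch[code[pc]];   // indirect jump to the next handler
op_dbl:
  acc *= 2;
  ++pc;
  goto *dispatch[code[pc]];
op_halt:
  return acc;
}

// Usage: increment twice, double once, then halt.
void demo_run() {
  const unsigned char prog[] = { 0, 0, 1, 2 };
  assert(run(prog) == 4);
}
```

The sketch assumes well-formed input (every opcode is 0, 1 or 2 and the
program ends with a halt); a real interpreter would validate that.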
My recommendation would be to *first code a simple GCC plugin as a proof
of concept*, which rejects programs that could not realistically be
devirtualized, and otherwise stores a representation of them somewhere
(in some database perhaps). I worked 3 years full time on
https://github.com/bstarynk/bismon/ to achieve a similar goal (and I
don't claim to have succeeded, and I don't have any more funding). My
guess is that some of its code could be useful to you (in that case,
contact me by email both at work basile.starynkevitch@cea.fr and at home
basile@starynkevitch.net).
The most important thing: limit your ambition at first. Write a document
(at least an internal one) stating what you won't do.
Cheers
--
Basile Starynkevitch <basile@starynkevitch.net>
(opinions are mine only)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/
* Re: GCC [RFC] Whole Program Devirtualization
From: Jonathan Wakely @ 2021-08-21 8:41 UTC (permalink / raw)
To: Basile Starynkevitch; +Cc: Feng Xue OS, gcc, basile.starynkevitch
On Fri, 20 Aug 2021, 13:37 Basile Starynkevitch, <basile@starynkevitch.net>
wrote:
>
> *computed gotos* or *labels as values* (see
> https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html for more) are
> making this difficult. But they do exist, and probably could be hidden
> in GNU glibc or libstdc++ internal code.
>
There are none in libstdc++.
* Re: GCC [RFC] Whole Program Devirtualization
From: Feng Xue OS @ 2021-08-23 2:23 UTC (permalink / raw)
To: Basile Starynkevitch; +Cc: basile.starynkevitch, gcc, Jan Hubicka, JiangNing OS
We are not going to create a new devirtualization framework from
scratch; we just hope to enhance the current speculative
devirtualization. The process does not need to parse native code in
libraries, but relies only on the existing lightweight symbol resolution
done by the LTO prelinker. C++ virtual dispatch is expected to be
translated to GIMPLE IR from C++ source; if a user attempts to
hand-craft it using embedded asm, that should be considered
undefined behavior with respect to the C++ ABI.
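As a rough source-level picture of what speculative devirtualization does
(a hand-written sketch only; the compiler works on GIMPLE and, as far as I
understand, guards on the call target rather than on `typeid`), a virtual
call is turned into a guarded direct call with the indirect call kept as a
fallback. The `Shape`, `Triangle` and `Square` types and the function names
below are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square : Shape { int sides() const override { return 4; } };

// Ordinary virtual dispatch: an indirect call through the vtable.
int sides_virtual(const Shape& s) { return s.sides(); }

// Hand-written analogue of a speculative direct call: guess the likely
// dynamic type, call its implementation directly (so it can be inlined),
// and fall back to the indirect call when the guess is wrong.
int sides_speculative(const Shape& s) {
  if (typeid(s) == typeid(Triangle))
    return static_cast<const Triangle&>(s).Triangle::sides();  // direct call
  return s.sides();                                            // fallback
}

// Usage: both paths must agree for every dynamic type.
void check_paths_agree() {
  Triangle t;
  Square q;
  assert(sides_speculative(t) == sides_virtual(t));
  assert(sides_speculative(q) == sides_virtual(q));
}
```

If whole-program information showed that only one override of `sides()`
could ever be reached at a call site, the guard and the fallback could in
principle be dropped, which is the kind of improvement hoped for here.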
The compile time of whole-program analysis is not as terrible as you
think; it is realistically acceptable even when the code base is
very large. As far as I know, Google enables WPD when building Chrome,
although that is based on LLVM.
Thanks,
Feng