Re: GSoC proposal: Provide optimizations feedback through post-compilation messages

public inbox for gcc@gcc.gnu.org

* Re: GSoC proposal: Provide optimizations feedback through post-compilation messages
@ 2012-04-02 19:57 Thibault Raffaillac
  2012-04-04 20:05 ` Tomasz Borowik
  0 siblings, 1 reply; 5+ messages in thread
From: Thibault Raffaillac @ 2012-04-02 19:57 UTC (permalink / raw)
  To: gcc

Bump!

Let me renew my interest in contributing through GSoC with post-compilation
feedback (this was not an early April joke). Do you think it could lead to an
acceptable GSoC proposal? (mentor interested?)

@Tomasz:
On the interaction side I totally agree that communication between compiler and
programmer is scarce (and there is room for improvement). Focusing too soon on
the editor would overlook the wide range of user needs though, as:
_ some users do not use an IDE (and will kindly refuse);
_ some users do not need more communication, as they already know what GCC can
  and cannot do;
_ some users do not want more communication, as they have other business to
  focus on;

I think the editor being split from the compiler is a good thing. There still
exist tools to expose static analysis data from the compiler (and choose the
editor to visualize it with), but fundamentally they assist the programmer
rather than help him/her improve. Instead of gathering loads of data on the
optimizations/analyses performed, and filtering it for visualization by the
user, we could report the optimization technique used so that the user truly
knows what GCC is capable of (instead of guessing by observation).

My proposal is thus not to be confused with a static analysis visualization:
the programmer learns what techniques are implemented in GCC (or in compilers
in general), how to write code that is more easily compiled, and can further
browse the Internet for detailed theory on the techniques involved.

The point on the possible-optimizations-which-could-be-enabled-if-specific-
-constraint-is-lifted is particularly interesting, but is also extremely risky
if the compiler makes a stupid remark on a constraint which can "obviously"
(for the programmer) not be lifted. If ever, I would introduce it with a LOT of
care.

Thibault
ps: As for an editor with real-time feedback on static analysis and more, I am
100% with you :) (and there are some promising prototypes, like in this talk:
http://vimeo.com/36579366)

> Hello all,
> 
> My name is Thibault Raffaillac, CS degree student at Kungliga Tekniska Högskolan,
> Stockholm, Sweden (in double-degree partnership with Ecole Centrale Marseille,
> France).
> GCC currently provides no concise way to inform the user whether it applied an
> expected optimization (ie, it "understood" the code). As a result, some will do
> premature optimizations when they do not trust the compiler, and some others
> will create overly convoluted code with blind belief in the compiler. This is
> especially relevant for users non-initiated to the internals of GCC.
> The project I would like to propose is a feedback for the optimizations
> performed by GCC. To avoid binding users to the compiler, I would focus on some
> very standard optimizations across vendors, or for some specific yet nice
> features I would indicate their specificity to GCC/an architecture.
> 
> The feedback would be triggered when compilation is successful, and display a
> couple of different messages each time it is run:
> gcc --feedback test.c
> test.c:xx:x: info: All operands being constant, constant folding was applied to assign '2560' to 'a'
> test.c:xx:x: info: GCC could not fold constants here because...
> test.c:xx:x: info: As integers are stored in binary format, strength reduction was applied to replace '* 8' by '<< 3'
> test.c:xx:x: info: Basic block vectorization was applied to pack the 3 independent additions into a single SIMD instruction
> test.c:xx:x: info: GCC implements unordered_map as open-addressed hash tables, with double hashing probing
> 
> As a difference with the internal verbose messages, here they would form a set,
> and the system would remember those already displayed and decrease their
> frequency of occurrence between compilations. All messages would explain what
> triggered them, cite the optimization name, and describe the consequence.
> 
> As for the work plan, it would consist in:
> _ Enumerating all possible messages in the messages set.
> _ Implementing a function receiving feedback from each optimization unit and
>   choosing whether to display it: info_printf(enum INFO_INDEX, const char*, ...);
> _ Write a formatting guide for adding messages in the set.
> 
> My academic background includes compiler construction, C programming and Human-
> Computer Interactions. I am very much interested in the usability of compilers
> (on which I am currently carrying my degree thesis -
> http://www.csc.kth.se/~traf/traf-sketch.pdf) and thus would be glad to
> contribute to GCC.
> 
> If this can be of interest, suggestions are welcome!
> 
> Best regards,
> Thibault (http://www.csc.kth.se/~traf/)

* Re: GSoC proposal: Provide optimizations feedback through post-compilation messages
  2012-04-02 19:57 GSoC proposal: Provide optimizations feedback through post-compilation messages Thibault Raffaillac
@ 2012-04-04 20:05 ` Tomasz Borowik
  2012-04-12 18:15   ` Thibault Raffaillac
  0 siblings, 1 reply; 5+ messages in thread
From: Tomasz Borowik @ 2012-04-04 20:05 UTC (permalink / raw)
  To: gcc

On Mon, 2 Apr 2012 19:57:20 +0000
Thibault Raffaillac <traf@kth.se> wrote:

> Bump!
> 
> Let me renew my interest in contributing through GSoC with post-compilation
> feedback (this was not an early April joke). Do you think it could lead to an
> acceptable GSoC proposal? (mentor interested?)

Feedback can be scarce, but don't let that stop you from submitting a
proposal.
Either way, can you keep me informed about any progress? I might wish to help,
though that would probably be later in the cycle (I've got a lot queued up for
the coming months).

> @Tomasz:
> On the interaction side I totally agree that communication between compiler and
> programmer is scarce (and there is room for improvement). Focusing too soon on
> the editor would overlook the wide range of user needs though, as:
> _ some users do not use an IDE (and will kindly refuse);
> _ some users do not need more communication, as they already know what GCC can
>   and cannot do;
> _ some users do not want more communication, as they have other business to
>   focus on;

Sure, I'm one of the people who don't use an IDE, as it causes more
issues than it solves for me. This isn't meant for everyone, the same
way nothing else is; it just can't be ;p Still, looking at it alongside
other languages and different IDEs, I'd say my way of tackling the issues
is more usable and useful than most others, and could easily see wider
adoption. Btw, my experience is mostly in low-level kernel/driver
programming, 2D/3D graphics, and games.

> I think the editor being split from the compiler is a good thing. There still
> exist tools to expose static analysis data from the compiler (and choose the
> editor to visualize it with), but fundamentally they are assisting him/her
> rather than helping him/her improve. Instead of gathering loads of data on the
> optimizations/analysis performed, and filtering it for visualization by the
> user, we could relate the optimization technique used so that the user truly
> knows what GCC is capable of (instead of guessing by observation).

Great, that's exactly what I'm aiming at :) It's not just presenting the
results of static analysis in real time; I actually dislike most
kinds of it, like finding memory leaks. To me that seems like an attempt
to make the computer do what it's really bad at (understanding the
code). I just want to give the programmer the fullest picture of the
situation but at the same time make it so it doesn't become noise that
interferes. More or less you can say the goal is "To provide feedback
that allows the user to extend his understanding of the program". That
mostly means giving access to all the information that can be
unambiguously concluded from the code by the computer. To what degree
we carry it and how much the compiler is involved is only a question of
practicality and performance.

> My proposal is thus not to be confused with a static analysis visualization:
> the programmer learns what techniques are implemented in GCC (or in compilers
> in general), how to write code that is more easily compiled, and can further
> browse the Internet for detailed theory on the techniques involved.

Perfect! However, how to do that so that it actually works seems a bit
complex. The first (practically unsolvable) issue is what actually
constitutes better code, as given two pieces, one may be faster in some
circumstances and the other in different ones. But as I understand it, that's
not really what we're trying to tell the user; rather, we want him to
explore for himself what's possible, what the results are, and why
they are the way they are? I'm guessing this will unfortunately (or
fortunately) require him to actually see and understand the intermediate
code, see how it changes after different optimizations, and see the
output assembly. Personally I really need/want that ;) Though my end
target is a bit more to "broaden" the abstraction when programming
(both up and down), so not to just show what's happening with the code
but also allow the programmer to interact with it on that lower level.
LLVM seems like the perfect fit for that but I've got some gripes with
it, and that is still far away in the future.

> The point on the possible-optimizations-which-could-be-enabled-if-specific-
> -constraint-is-lifted is particularly interesting, but is also extremely risky
> if the compiler makes a stupid remark on a constraint which can "obviously"
> (for the programmer) not be lifted. If ever, I would introduce it with a LOT of
> care.

Yes and no. First of all, I don't necessarily mean for the
compiler/editor to suggest anything to the programmer; rather, if the
programmer asks, it just says what's physically possible, not what's
right, since if the compiler could tell that, it would just perform the
optimization. Furthermore, the situation with my source code is that I
can probably make all this in such a form that it is actually usable
and useful, which seems to me close to impossible with normal languages.
I can also, with almost no effort, store within the source code the
"dialogue" between the programmer and compiler: once he has analyzed
something, he can make a quick "don't report this ever again" note
with a reason for other developers. Also, I personally think that if a
programmer wants to shoot himself in the foot, he should be allowed to
do that as soon/fast as possible, as that is the most important learning
tool, though warnings are obviously still very much welcome.

> Thibault
> ps: As for an editor with real-time feedback on static analysis and more, I am
> 100% with you :) (and there are some promising prototypes, like in this talk:
> http://vimeo.com/36579366)
> 

Unfortunately I only saw 36m of it as it broke and seeking doesn't work
on vimeo for me, so I'll watch the rest later. To me it touches on some
of the right issues/concepts but in slightly the wrong way, and it
completely ignores some issues. First of all the exact things he's
showing are extremely limited in their applicability. The graphical and
circuit stuff is very domain specific and mostly already done, the
issue with them usually boils down to performance, and it's the same
issue we hit with giant IDEs. That's also an issue I'm addressing with
my language, and the prognosis for the future looks very promising, as it
often does more than xcode/eclipse and can be faster than simple text
editors like geany/kate. Also, the kind of "instant" he's showing (the
video might be partly to blame) is a bit far from my definition of
instant; I've used xcode and it isn't instant even in the most basic
operations like switching between files.

The example with the animation of the leaf highlights an important
issue: very often you know exactly what you want/need but don't
know how to get it. With programming I very often hit a situation where
I know what the assembly could look like, but have no idea how to make
gcc output it like that, or even whether it already is like that, since
checking would take too long.

The more immediate problem to me is one of scale, as no one has a problem
with those kinds of code or circuits or whatever. The question is how
do you improve your performance and quality of output when you're
dealing with between 50k and 10M lines of code. And to me that is for
the most part by giving the programmer exactly the information he is
looking for immediately when he needs it, without slowing down the
basic tasks or interrupting the workflow.

The main difference between his demos or other IDEs and what I'm doing is
that theirs is still just a layer above the source code, just a feature
added ad hoc after designing the language. What I'm doing is actually
part of the language: in my solution the information stored as the
source code can itself be presented as text, graphs, trees, tables or
whatever happens to be most efficient for the task at hand.

To give the most basic (to the point of primitiveness) example when
you're working with an application that has let's say 50 different
kinds of objects it displays on the screen and those are arranged in a
certain hierarchy of classes, you'll usually organize the code to match
that hierarchy (meaning just bundle the methods and types together).
But at the same time all of them have stuff cross-cutting between them
like the mechanism for printing them on screen, or handling
mouse/keyboard input. So depending on whether you are changing/adding a
new object or a mechanism for working on/between them, you'll want to
organize the source code differently, and that's trivially simple with
my solution as you can have as many perspectives as you wish, either
manual or automatic like call-graphs. Keep in mind that's just one very
tiny part of the whole, meaning I'm not saying that solves all the
issues in the world;p

-- 
Tomasz Borowik

* RE: GSoC proposal: Provide optimizations feedback through post-compilation messages
  2012-04-04 20:05 ` Tomasz Borowik
@ 2012-04-12 18:15   ` Thibault Raffaillac
  0 siblings, 0 replies; 5+ messages in thread
From: Thibault Raffaillac @ 2012-04-12 18:15 UTC (permalink / raw)
  To: Tomasz Borowik, gcc

Quite lengthy but very interesting mail! It took me a while to formulate a proper reply :)

> Feedback can be scarce, but don't let that stop you from submitting a
> proposal.
> Either way, can you keep me informed about any progress? I might wish to help
> though that would probably be later in the cycle (got a lot queued up for
> the coming months).

Submitted :) The reviews are not too positive yet; my biggest efforts go into
making my plan clear. If there is any progress, help will be much appreciated.

> Great that's exactly what I'm aiming at:) It's not just presenting the
> results of static analysis in real-time, as I actually dislike most
> kinds of it like finding memory leaks, to me that seems like an attempt
> to make the computer do what it's really bad at (understanding the
> code). I just want to give the programmer the fullest picture of the
> situation but at the same time make it so it doesn't become noise that
> interferes. More or less you can say the goal is "To provide feedback
> that allows the user to extend his understanding of the program". That
> mostly means giving access to all the information that can be
> unambiguously concluded from the code by the computer. To what degree
> we carry it and how much the compiler is involved is only a question of
> practicality and performance.

I quite agree for the most part, still there is a subtle nuance on which I want
to argue: Do we really help the programmer by offering all the valuable
information that is possible to infer? Ten years from now, would he/she be a
better programmer if we had not let him/her strive to simulate the program in
mind, or code a portion in assembly and finally learn about machine
architecture?

My point is to avoid creating an interface that "assists" or "helps" the
programmer, as he/she might become dependent on it. This only helps in the
short term, and the only person who ever learns something is the one who
actually creates the compiler. If one statement could sum up my view, it would
be that "the user improves through his/her use of the interface" (here the
feedback messages).

How does it make a difference in practice? I want to minimize the information given :)
The reason I want to introduce feedback messages is that this particular
information (the inner workings of compilers) is very hard to find in practice.
I want to give a slight help to put the user on the rails, nothing more.

> Perfect! However, how to do that so that it actually works seems a bit
> complex. The first (practically unsolvable) issue is what actually
> constitutes better code, as given two pieces one may be faster in some
> circumstances while the other in different. But as I understand that's
> not really what we're trying to tell the user, rather we want him to
> explore for himself what's possible and what are the results and why
> they are the way they are? I'm guessing this will unfortunately (or
> fortunately) require him to actually see and understand the intermediate
> code, see how it changes after different optimizations, and see the
> output assembly. Personally I really need/want that;) Though my end
> target is a bit more to "broaden" the abstraction when programming
> (both up and down), so not to just show what's happening with the code
> but also allow the programmer to interact with it on that lower level.
> LLVM seems like the perfect fit for that but I've got some gripes with
> it, and that is still far away in the future.

Excellent! Letting the user explore by himself sounds great, and seeing the
output assembly/IR alongside is indeed a must. I like the idea that compilation
is a cooperation between programmer and machine (as far as the programmer is
inclined to help, of course). It would also be nice to see compilation split
at value range propagation, so that one could verify it is properly computed
before proceeding to the optimizations.
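
To make the value range propagation point concrete, here is a small
hypothetical C function (the name last_digit_clamped is invented for this
example). Since x % 10 on an unsigned value always lies in [0, 9], a range
analysis can prove that the check below never fires. GCC's existing
-fdump-tree-all option writes out per-pass dumps where one could look for
this; which pass removes the branch, and what the dump files are called,
depends on the GCC version.

```c
/* Hypothetical example: the range of d is known to be [0, 9]. */
unsigned last_digit_clamped(unsigned x)
{
    unsigned d = x % 10;

    /* With the range [0, 9] established, this comparison is provably
       false, so the branch is a candidate for removal. */
    if (d > 9)
        return 9;

    return d;
}
```

Being able to inspect the computed range before the later passes run is the
kind of checkpoint I have in mind.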

> Unfortunately I only saw 36m of it as it broke and seeking doesn't work
> on vimeo for me, so I'll watch the rest later. To me it touches on some
> of the right issues/concepts but in slightly the wrong way, and it
> completely ignores some issues.

Agreed. (Only the first half of the video is relevant for the programming
prototype)

Thibault

* Re: GSoC proposal: Provide optimizations feedback through post-compilation messages
  2012-03-27 22:34 Thibault Raffaillac
@ 2012-03-30  0:01 ` Tomasz Borowik
  0 siblings, 0 replies; 5+ messages in thread
From: Tomasz Borowik @ 2012-03-30  0:01 UTC (permalink / raw)
  To: gcc

On Tue, 27 Mar 2012 22:33:39 +0000
Thibault Raffaillac <traf@kth.se> wrote:

> Hello all,
> 
> My name is Thibault Raffaillac, CS degree student at Kungliga Tekniska Högskolan,
> Stockholm, Sweden (in double-degree partnership with Ecole Centrale Marseille,
> France).
> GCC currently provides no concise way to inform the user whether it applied an
> expected optimization (ie, it "understood" the code). As a result, some will do
> premature optimizations when they do not trust the compiler, and some others
> will create overly convoluted code with blind belief in the compiler. This is
> especially relevant for users non-initiated to the internals of GCC.
> The project I would like to propose is a feedback for the optimizations
> performed by GCC. To avoid binding users to the compiler, I would focus on some
> very standard optimizations across vendors, or for some specific yet nice
> features I would indicate their specificity to GCC/an architecture.
> 
> The feedback would be triggered when compilation is successful, and display a
> couple of different messages each time it is run:
> gcc --feedback test.c
> test.c:xx:x: info: All operands being constant, constant folding was applied to assign '2560' to 'a'
> test.c:xx:x: info: GCC could not fold constants here because...
> test.c:xx:x: info: As integers are stored in binary format, strength reduction was applied to replace '* 8' by '<< 3'
> test.c:xx:x: info: Basic block vectorization was applied to pack the 3 independent additions into a single SIMD instruction
> test.c:xx:x: info: GCC implements unordered_map as open-addressed hash tables, with double hashing probing
> 
> As a difference with the internal verbose messages, here they would form a set,
> and the system would remember those already displayed and decrease their
> frequency of occurrence between compilations. All messages would explain what
> triggered them, cite the optimization name, and describe the consequence.
> 
> As for the work plan, it would consist in:
> _ Enumerating all possible messages in the messages set.
> _ Implementing a function receiving feedback from each optimization unit and
>   choosing whether to display it: info_printf(enum INFO_INDEX, const char*, ...);
> _ Write a formatting guide for adding messages in the set.
> 
> My academic background includes compiler construction, C programming and Human-
> Computer Interactions. I am very much interested in the usability of compilers
> (on which I am currently carrying my degree thesis -
> http://www.csc.kth.se/~traf/traf-sketch.pdf) and thus would be glad to
> contribute to GCC.
> 
> If this can be of interest, suggestions are welcome!
> 
> Best regards,
> Thibault (http://www.csc.kth.se/~traf/)
> 

Hi Thibault,

I completely agree, and it's actually a part of what I'm targeting in the long term, so I think we might be able to join forces. I'm also thinking of a gsoc project though in different areas (there's an email in the list about them on 19.03), so maybe we could do separate parts that combine into something even more awesome;)

I think a huge part of the issue is in the medium of communication between the programmer and compiler. I'm targeting an environment where the source code editor practically becomes the compiler's front-end. My project allows extremely dynamic presentation of the source code, so I can e.g.
 - easily inform the programmer about anything in an unobtrusive manner within the code, 
 - give him different perspectives of the same code,
 - allow him to give precise and detailed information to the compiler about possible code optimizations without making the code unreadable.

The first two points may seem already solved by eclipse, xcode or whatever other gigantic ide, but I'm talking about a much larger scale of feedback presented instantly like: ex/implicit and inferred typing info, constant folds, dead code, unfolded loops, data flow, vector operations, tree view of expressions.

The first issue is that for any non-trivial amount of code you'll end up with thousands of messages, 90% of which are probably not very interesting (similarly to warnings in a certain style of objective programming in C). As long as the output is not interleaved with the code at the right place and the delay from writing to getting feedback is too long, the feature will lose much of its usefulness. Though don't misunderstand me, I think it's still better to have the info in any form than not.

The last point is probably the most important, as there is often a large number of optimizations that cannot be done due to, for example, pointer aliasing rules, even though the programmer knows that the optimization is safe. I can easily add literally hundreds of markers like "this expression is volatile", "the result of this function call will not change within this loop", "these two pointers don't alias", and it wouldn't obfuscate the code as much as with normal languages. Furthermore, my editor can easily list only the meaningful options for a given expression, with full descriptions of what they do.
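
Plain C already has a few markers of this kind. As a rough sketch (the function names are made up for the example): the C99 restrict qualifier expresses "these two pointers don't alias", and GCC's pure function attribute is one way to say that a call's result cannot change within a loop as long as its arguments and the memory it reads stay the same. Whether the compiler then actually vectorizes or hoists anything depends on the target and optimization level.

```c
#include <stddef.h>

/* `restrict` is the programmer's promise that dst and src do not
   overlap, which removes one obstacle to vectorizing the loop. */
void scale_into(float *restrict dst, const float *restrict src,
                float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* GCC's `pure` attribute: no side effects, and the result depends only
   on the arguments and on memory the function reads. */
__attribute__((pure))
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        n += (*s == c);
    return n;
}

/* The call below has loop-invariant inputs, so with the `pure` promise
   the compiler is allowed to evaluate it once instead of `times` times. */
size_t repeat_count(const char *s, char c, size_t times)
{
    size_t total = 0;
    for (size_t i = 0; i < times; i++)
        total += count_char(s, c);
    return total;
}
```

Even these two markers already add visual noise to the signatures, which is the readability cost I mean.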

-- 
Tomasz Borowik <timon37@lavabit.com>

* GSoC proposal: Provide optimizations feedback through post-compilation messages
@ 2012-03-27 22:34 Thibault Raffaillac
  2012-03-30  0:01 ` Tomasz Borowik
  0 siblings, 1 reply; 5+ messages in thread
From: Thibault Raffaillac @ 2012-03-27 22:34 UTC (permalink / raw)
  To: gcc

Hello all,

My name is Thibault Raffaillac, CS degree student at Kungliga Tekniska Högskolan,
Stockholm, Sweden (in double-degree partnership with Ecole Centrale Marseille,
France).
GCC currently provides no concise way to inform the user whether it applied an
expected optimization (i.e., whether it "understood" the code). As a result,
some users optimize prematurely because they do not trust the compiler, while
others write overly convoluted code with blind faith in the compiler. This is
especially relevant for users unfamiliar with the internals of GCC.
The project I would like to propose is a feedback mechanism for the
optimizations performed by GCC. To avoid binding users to the compiler, I would
focus on optimizations that are standard across vendors; for the few useful
features specific to GCC or to an architecture, I would state that specificity.

The feedback would be triggered when compilation is successful, and display a
couple of different messages each time it is run:
gcc --feedback test.c
test.c:xx:x: info: All operands being constant, constant folding was applied to assign '2560' to 'a'
test.c:xx:x: info: GCC could not fold constants here because...
test.c:xx:x: info: As integers are stored in binary format, strength reduction was applied to replace '* 8' by '<< 3'
test.c:xx:x: info: Basic block vectorization was applied to pack the 3 independent additions into a single SIMD instruction
test.c:xx:x: info: GCC implements unordered_map as open-addressed hash tables, with double hashing probing
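
For illustration only, a test.c along the following lines contains the kind
of code the constant folding, strength reduction and vectorization messages
refer to. The --feedback option does not exist in GCC; the function names are
hypothetical, and whether each optimization is actually applied depends on
the optimization level and the target.

```c
/* test.c: hypothetical input for the proposed --feedback option. */

/* All operands are constants, so 40 * 64 can be folded to 2560. */
int folded(void)
{
    int a = 40 * 64;
    return a;
}

/* Multiplying by a power of two is a candidate for strength reduction
   ('* 8' may be replaced by '<< 3'). */
unsigned scaled(unsigned x)
{
    return x * 8;
}

/* Three independent additions that basic block vectorization might
   pack into a single SIMD instruction on a suitable target. */
void add3(int *out, const int *p, const int *q)
{
    out[0] = p[0] + q[0];
    out[1] = p[1] + q[1];
    out[2] = p[2] + q[2];
}
```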

Unlike the internal verbose messages, these would form a set, and the system
would remember those already displayed and decrease their frequency of
occurrence between compilations. All messages would explain what triggered
them, cite the optimization name, and describe the consequence.

As for the work plan, it would consist of:
_ Enumerating all possible messages in the message set.
_ Implementing a function receiving feedback from each optimization unit and
  choosing whether to display it: info_printf(enum INFO_INDEX, const char*, ...);
_ Writing a formatting guide for adding messages to the set.
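
A minimal sketch of what info_printf could look like, under several
assumptions that are not part of the plan above: the enum members, the int
return value, and the throttling policy (show a message on its 1st, 2nd, 4th,
8th... trigger) are all invented for illustration. The counters live in
memory only; remembering them between compilations would additionally need
some persistent storage.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical message identifiers; only the enum name comes from the plan. */
enum INFO_INDEX {
    INFO_CONSTANT_FOLDING,
    INFO_STRENGTH_REDUCTION,
    INFO_BB_VECTORIZATION,
    INFO_COUNT
};

/* Number of times each message has been triggered so far (in memory only). */
static unsigned long info_seen[INFO_COUNT];

/* Assumed policy: display when the trigger count is a power of two, so a
   repeated message becomes progressively rarer. */
static int info_should_display(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Record one trigger of message `index` and print it to stderr if the
   policy allows. Returns 1 if printed, 0 if suppressed or out of range. */
int info_printf(enum INFO_INDEX index, const char *fmt, ...)
{
    va_list ap;

    if ((int)index < 0 || index >= INFO_COUNT)
        return 0;
    if (!info_should_display(++info_seen[index]))
        return 0;

    fputs("info: ", stderr);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

An optimization unit would then call, for example,
info_printf(INFO_STRENGTH_REDUCTION, "replaced '* %d' by '<< %d'", 8, 3);
and leave the decision to display to the function.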

My academic background includes compiler construction, C programming and
Human-Computer Interaction. I am very much interested in the usability of
compilers (on which I am currently writing my degree thesis -
http://www.csc.kth.se/~traf/traf-sketch.pdf) and thus would be glad to
contribute to GCC.

If this can be of interest, suggestions are welcome!

Best regards,
Thibault (http://www.csc.kth.se/~traf/)
