public inbox for gcc-help@gcc.gnu.org
* Questions about __attribute((cold))
@ 2008-03-23  4:02 John Fine
  2008-03-24 18:08 ` John Fine
  2008-03-24 20:27 ` Brian Dessent
  0 siblings, 2 replies; 4+ messages in thread
From: John Fine @ 2008-03-23  4:02 UTC (permalink / raw)
  To: gcc-help

I want to understand some things about __attribute__((cold)) that were 
unclear from the documentation.  I have a few reasons, including one 
specific use that does not seem to work.

I am using (GCC) 4.4.0 20080319 (experimental)
I constructed several sample cpp files and compiled with
g++  -O3 -S -fstrict-aliasing -fverbose-asm test.cpp
and looked at test.s to see the generated code.  I have not yet managed 
to construct even one case in which __attribute__((cold)) makes a 
difference.

For the first specific use I have in mind, consider code with the basic 
structure:

while (xxx1) {
   xxx2;
   if (xxx3) {
       xxx4; }
   xxx5; }

Here each xxxN is a placeholder for some useful code.  I want to invent 
a syntax to be used across my project in several such cases to get the 
compiler to push xxx4 out of line to avoid polluting the L1 cache.

One version that seems to work, but whose syntax I dislike, is:

#define rarely_true(x) __builtin_expect(x,0)
while (xxx1) {
   xxx2;
   if ( rarely_true(xxx3) ) {
       xxx4; }
   xxx5; }
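
A minimal, compilable sketch of this pattern, for anyone who wants to 
reproduce the comparison.  The function name sum_and_count_negatives and 
the choice of a negative value as the rare case are assumptions made for 
illustration; only rarely_true and __builtin_expect come from the 
message.  The macro here adds !!(x) so that any scalar condition is 
reduced to 0 or 1 before it reaches __builtin_expect.  Whether the rare 
block actually moves out of line depends on the compiler version and 
flags.

```cpp
#include <cstddef>
#include <vector>

// Macro name taken from the message; !!(x) normalizes the condition to 0 or 1.
#define rarely_true(x) __builtin_expect(!!(x), 0)

// Hypothetical example: sums the non-negative values and counts the
// negative ones. The negative branch plays the role of the rare xxx4.
long sum_and_count_negatives(const std::vector<int>& values,
                             std::size_t& negatives) {
    long total = 0;
    negatives = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {  // xxx1
        int v = values[i];                              // xxx2
        if (rarely_true(v < 0)) {                       // xxx3
            ++negatives;                                // xxx4 (expected rare)
            v = 0;
        }
        total += v;                                     // xxx5
    }
    return total;
}
```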

The documentation of "cold" seems to say that declaring a function as 
cold will tag the block containing the call to that function as cold, so 
I hoped to be able to use something like this (which syntax I prefer):

inline void __attribute__((cold)) rarely_execute() {}
while (xxx1) {
   xxx2;
   if (xxx3) {
       rarely_execute();
       xxx4; }
   xxx5; }

That generates code identical to the version without rarely_execute(), 
unlike the rarely_true(...) version above, which generates code that 
looks better.

I don't know whether I misunderstood what cold is supposed to do, 
whether the fact that the function does nothing causes the cold 
attribute to be ignored, or something else.
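
One possible explanation, offered as an assumption and not confirmed 
anywhere in this thread, is that an empty inline function is inlined 
away before the compiler estimates branch probabilities, so no call to a 
cold function remains in the block.  A sketch that keeps a real call in 
place follows.  It combines the noinline attribute with an empty asm 
statement, which the GCC documentation for noinline suggests for keeping 
calls to side-effect-free functions from being optimized away.  
clamp_sum and its clamping logic are hypothetical.

```cpp
// Marker function: cold and noinline, so a real call remains in the block.
__attribute__((cold, noinline)) void rarely_execute() {
    // Empty asm acts as a side effect so the call is not deleted.
    __asm__ __volatile__("");
}

// Hypothetical example: sums values, clamping any value above limit.
// The clamping branch plays the role of the rare xxx4.
int clamp_sum(const int* values, int n, int limit) {
    int total = 0;
    for (int i = 0; i < n; ++i) {   // xxx1
        int v = values[i];          // xxx2
        if (v > limit) {            // xxx3
            rarely_execute();       // call to a cold function
            v = limit;              // xxx4
        }
        total += v;                 // xxx5
    }
    return total;
}
```

The cost is that a real call instruction stays on the rare path, so this 
is not a zero-cost replacement for __builtin_expect.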

I have also tried some examples where I want cold on a non-empty 
function for other purposes (such as error reporting), and even there I 
have not constructed an example where it makes a difference.  For that 
use, though, I have not tried as many different constructs.
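
For the error-reporting case, a test file of the kind involved might 
look like the following sketch.  report_error and count_in_range are 
hypothetical names, and noinline is added on the assumption that keeping 
the call out of line makes any effect of cold easier to spot in test.s.  
The GCC documentation states that paths leading to calls of cold 
functions are marked as unlikely by the branch prediction mechanism, and 
that cold functions are optimized for size and on many targets placed in 
a special subsection of the text section.

```cpp
#include <cstdio>

// Hypothetical error reporter: cold marks it as unlikely to run, noinline
// keeps the call visible instead of merging the body into callers.
__attribute__((cold, noinline))
void report_error(const char* what, int value) {
    std::fprintf(stderr, "error: %s (%d)\n", what, value);
}

// Counts the values that lie in [0, limit); any other value is reported.
int count_in_range(const int* values, int n, int limit) {
    int ok = 0;
    for (int i = 0; i < n; ++i) {
        if (values[i] < 0 || values[i] >= limit) {
            report_error("value out of range", values[i]);  // rare path
            continue;
        }
        ++ok;
    }
    return ok;
}
```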

I understand that the usual advice for these situations is to use 
profile guided optimization instead of programmer guided optimization.  
I think my reasons for rejecting that are valid, but I'd prefer not to 
discuss them here.  I'd appreciate some help with programmer guided 
optimization methods without a lot of advice against the general idea.


* Re: Questions about __attribute((cold))
  2008-03-23  4:02 Questions about __attribute((cold)) John Fine
@ 2008-03-24 18:08 ` John Fine
  2008-03-24 20:27 ` Brian Dessent
  1 sibling, 0 replies; 4+ messages in thread
From: John Fine @ 2008-03-24 18:08 UTC (permalink / raw)
  To: John Fine; +Cc: gcc-help

I want to clarify a few things that probably weren't clear in my initial 
question.

I'm asking primarily about a feature that I think is new in GCC 4.3.  I 
downloaded, built, and tested only version 4.4.0, in case there has been 
further recent work on this new feature, but my question is primarily 
about version 4.3.  If you can help me understand how 4.3 behaves, 
please don't let my use of 4.4.0 for my own testing stop you from 
replying.

In the example below, I would like to know if there is some way to 
define rarely_execute() so that its use will tell the compiler that xxx4 
will be executed far less often than xxx2 and xxx5.  I thought 
__attribute__((cold)) might do that, and I tried it.  But if some other 
way of defining rarely_execute() will do that, I'd like to know.

I would also like to know whether __attribute__((cold)) does or doesn't 
change the compilation of the block containing the call.  The 
documentation seems to say it does.  I looked at the gcc source code and 
there seems to be code to achieve that.  But I haven't found the effect 
in the actual generated asm code.

John Fine wrote:

> I am using (GCC) 4.4.0 20080319 (experimental)


> inline void __attribute__((cold)) rarely_execute() {}
> while (xxx1) {
>   xxx2;
>   if (xxx3) {
>       rarely_execute();
>       xxx4; }
>   xxx5; }


* Re: Questions about __attribute((cold))
  2008-03-23  4:02 Questions about __attribute((cold)) John Fine
  2008-03-24 18:08 ` John Fine
@ 2008-03-24 20:27 ` Brian Dessent
  2008-03-24 21:25   ` John Fine
  1 sibling, 1 reply; 4+ messages in thread
From: Brian Dessent @ 2008-03-24 20:27 UTC (permalink / raw)
  To: John Fine; +Cc: gcc-help

John Fine wrote:

> while (xxx1) {
>    xxx2;
>    if (xxx3) {
>        xxx4; }
>    xxx5; }
> 
> Where each xxxn is replaced by some useful code.  I want to invent a
> syntax to be used across my project in several such cases to get the
> compiler to push xxx4 out of line to avoid polluting the L1 cache.

You may also need -freorder-blocks-and-partition for that.  From
invoke.texi:

> @item -freorder-blocks-and-partition
> @opindex freorder-blocks-and-partition
> In addition to reordering basic blocks in the compiled function, in order
> to reduce number of taken branches, partitions hot and cold basic blocks
> into separate sections of the assembly and .o files, to improve
> paging and cache locality performance.
> 
> This optimization is automatically turned off in the presence of
> exception handling, for linkonce sections, for functions with a user-defined
> section attribute and on any architecture that does not support named
> sections.

If you read that section of the manual it looks like -freorder-functions
and -freorder-blocks are enabled by default at -O2 and -O3, but not
-freorder-blocks-and-partition.  To me that implies that hot/cold
partitioning by default will happen at the function level but not at the
basic block level (as you seek) unless you also specify
-freorder-blocks-and-partition.

Brian


* Re: Questions about __attribute((cold))
  2008-03-24 20:27 ` Brian Dessent
@ 2008-03-24 21:25   ` John Fine
  0 siblings, 0 replies; 4+ messages in thread
From: John Fine @ 2008-03-24 21:25 UTC (permalink / raw)
  To: gcc-help

I'll try some tests to see what that does, but I have two reasons to 
expect that it is not what I want.

I would prefer not to put the cold blocks in separate sections, because 
that may require unnecessarily large branch instructions.  I am 
concerned primarily with the L1 cache.  For the L1 cache, most of the 
possible benefit is achieved by placing all the cold blocks right after 
the final return of the function.  There is no need to push them to a 
separate section.  Pushing them to a separate section would matter for 
virtual memory locality.

When I used __builtin_expect to tell the compiler about a cold block, it 
did push that cold block to right after the final return of the 
function.  So without any change to the command line options, it does 
know what to do with cold blocks.  The question seems to be whether it 
recognizes that a block is cold (based on that block calling a cold 
function).  The option you suggest deals with what to do with a cold 
block, not with how to recognize one.

Brian Dessent wrote:

>>@item -freorder-blocks-and-partition
>>@opindex freorder-blocks-and-partition
>>In addition to reordering basic blocks in the compiled function, in order
>>to reduce number of taken branches, partitions hot and cold basic blocks
>>into separate sections of the assembly and .o files, to improve
>>paging and cache locality performance.
