public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/40093]  New: Optimization by functios reordering.
@ 2009-05-10 16:39 vvv at ru dot ru
  2009-05-10 16:43 ` [Bug c/40093] " vvv at ru dot ru
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: vvv at ru dot ru @ 2009-05-10 16:39 UTC (permalink / raw)
  To: gcc-bugs

Because memory controller prefetch memory blocks, execution time of functions
calls sequence depend on order this functions in memory. For example:
4 calls:

call func1
call func2
call func3
call func4

faster in case of direct functions order in memmory:
.p2align 4
func1:
      ret
.p2align 4
func2:
      ret
.p2align 4
func3:
      ret
.p2align 4
func4:
      ret

and slow in case inverse order:
.p2align 4
func4:
      ret
.p2align 4
func3:
      ret
.p2align 4
func2:
      ret
.p2align 4
func1:
      ret

Unfortunately, inverse order is typical for C/C++.
what do you think about this kind optimization?


-- 
           Summary: Optimization by functios reordering.
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: vvv at ru dot ru


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug c/40093] Optimization by functios reordering.
  2009-05-10 16:39 [Bug c/40093] New: Optimization by functios reordering vvv at ru dot ru
@ 2009-05-10 16:43 ` vvv at ru dot ru
  2009-05-10 16:49 ` [Bug middle-end/40093] " pinskia at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: vvv at ru dot ru @ 2009-05-10 16:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from vvv at ru dot ru  2009-05-10 16:43 -------
Created an attachment (id=17847)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17847&action=view)
Example direct/inverse calls

Simple example. RDTSC ticks for direct and inverse sequence of calls.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/40093] Optimization by functios reordering.
  2009-05-10 16:39 [Bug c/40093] New: Optimization by functios reordering vvv at ru dot ru
  2009-05-10 16:43 ` [Bug c/40093] " vvv at ru dot ru
@ 2009-05-10 16:49 ` pinskia at gcc dot gnu dot org
  2009-05-10 18:08 ` vvv at ru dot ru
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-05-10 16:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2009-05-10 16:49 -------
This should have been done already with cgraph order.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |middle-end


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/40093] Optimization by functios reordering.
  2009-05-10 16:39 [Bug c/40093] New: Optimization by functios reordering vvv at ru dot ru
  2009-05-10 16:43 ` [Bug c/40093] " vvv at ru dot ru
  2009-05-10 16:49 ` [Bug middle-end/40093] " pinskia at gcc dot gnu dot org
@ 2009-05-10 18:08 ` vvv at ru dot ru
  2009-05-10 18:10 ` pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: vvv at ru dot ru @ 2009-05-10 18:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from vvv at ru dot ru  2009-05-10 18:08 -------
(In reply to comment #2)
> This should have been done already with cgraph order.

Unfortunately, I can see inverse order only in separate source file. Inverse
but not optimized.
Example:
// file order1.c 
#include <stdio.h>
main(int argc, char **argv)
{int i,j,k,l;
i=func1();
j=func2();
k=func3();
l=func4();
printf("%d %d %d %d\n",i,j,k,l);
}
=====================================
// file order2.c 
int func1(){ return(F(4));}
int func2(){ return(F(3));}
int func3(){ return(F(2));}
int func4(){ return(F(1));}
=====================================
// file order3.c 
int F(int x){ return(x);}

# gcc --version
gcc (GCC) 4.5.0 20090508 (experimental)
# gcc -o order order3.c order2.c order1.c -O2
# objdump -d order

0000000000400520 <F>:
  400520:       89 f8                   mov    %edi,%eax
  400522:       c3                      retq   

0000000000400530 <func4>:
  400530:       bf 01 00 00 00          mov    $0x1,%edi
  400535:       31 c0                   xor    %eax,%eax
  400537:       e9 e4 ff ff ff          jmpq   400520 <F>

0000000000400540 <func3>:
  400540:       bf 02 00 00 00          mov    $0x2,%edi
  400545:       31 c0                   xor    %eax,%eax
  400547:       e9 d4 ff ff ff          jmpq   400520 <F>

0000000000400550 <func2>:
  400550:       bf 03 00 00 00          mov    $0x3,%edi
  400555:       31 c0                   xor    %eax,%eax
  400557:       e9 c4 ff ff ff          jmpq   400520 <F>

0000000000400560 <func1>:
  400560:       bf 04 00 00 00          mov    $0x4,%edi
  400565:       31 c0                   xor    %eax,%eax
  400567:       e9 b4 ff ff ff          jmpq   400520 <F>

0000000000400570 <main>:
  400570:       48 89 5c 24 e8          mov    %rbx,-0x18(%rsp)
  400575:       48 89 6c 24 f0          mov    %rbp,-0x10(%rsp)
  40057a:       31 c0                   xor    %eax,%eax
  40057c:       4c 89 64 24 f8          mov    %r12,-0x8(%rsp)
  400581:       48 83 ec 18             sub    $0x18,%rsp
  400585:       e8 d6 ff ff ff          callq  400560 <func1>
  40058a:       89 c3                   mov    %eax,%ebx
  40058c:       31 c0                   xor    %eax,%eax
  40058e:       e8 bd ff ff ff          callq  400550 <func2>
  400593:       89 c5                   mov    %eax,%ebp
  400595:       31 c0                   xor    %eax,%eax
  400597:       e8 a4 ff ff ff          callq  400540 <func3>
  40059c:       41 89 c4                mov    %eax,%r12d
  40059f:       31 c0                   xor    %eax,%eax
  4005a1:       e8 8a ff ff ff          callq  400530 <func4>
  4005a6:       44 89 e1                mov    %r12d,%ecx
  4005a9:       41 89 c0                mov    %eax,%r8d
  4005ac:       89 ea                   mov    %ebp,%edx
  4005ae:       89 de                   mov    %ebx,%esi
  4005b0:       48 8b 6c 24 08          mov    0x8(%rsp),%rbp
  4005b5:       48 8b 1c 24             mov    (%rsp),%rbx
  4005b9:       4c 8b 64 24 10          mov    0x10(%rsp),%r12
  4005be:       bf bc 06 40 00          mov    $0x4006bc,%edi
  4005c3:       31 c0                   xor    %eax,%eax
  4005c5:       48 83 c4 18             add    $0x18,%rsp
  4005c9:       e9 42 fe ff ff          jmpq   400410 <printf@plt>
=====================================

But optimal:

0000000000400520 <main>:
  400520:       48 89 5c 24 e8          mov    %rbx,-0x18(%rsp)
  400525:       48 89 6c 24 f0          mov    %rbp,-0x10(%rsp)
  40052a:       31 c0                   xor    %eax,%eax
  40052c:       4c 89 64 24 f8          mov    %r12,-0x8(%rsp)
  400531:       48 83 ec 18             sub    $0x18,%rsp
  400535:       e8 46 00 00 00          callq  400580 <func1>
  40053a:       89 c3                   mov    %eax,%ebx
  40053c:       31 c0                   xor    %eax,%eax
  40053e:       e8 4d 00 00 00          callq  400590 <func2>
  400543:       89 c5                   mov    %eax,%ebp
  400545:       31 c0                   xor    %eax,%eax
  400547:       e8 54 00 00 00          callq  4005a0 <func3>
  40054c:       41 89 c4                mov    %eax,%r12d
  40054f:       31 c0                   xor    %eax,%eax
  400551:       e8 5a 00 00 00          callq  4005b0 <func4>
  400556:       44 89 e1                mov    %r12d,%ecx
  400559:       41 89 c0                mov    %eax,%r8d
  40055c:       89 ea                   mov    %ebp,%edx
  40055e:       89 de                   mov    %ebx,%esi
  400560:       48 8b 6c 24 08          mov    0x8(%rsp),%rbp
  400565:       48 8b 1c 24             mov    (%rsp),%rbx
  400569:       4c 8b 64 24 10          mov    0x10(%rsp),%r12
  40056e:       bf bc 06 40 00          mov    $0x4006bc,%edi
  400573:       31 c0                   xor    %eax,%eax
  400575:       48 83 c4 18             add    $0x18,%rsp
  400579:       e9 92 fe ff ff          jmpq   400410 <printf@plt>

0000000000400580 <func1>:
  400580:       bf 01 00 00 00          mov    $0x1,%edi
  400585:       31 c0                   xor    %eax,%eax
  400587:       e9 34 00 00 00          jmpq   4005c0 <F>

0000000000400590 <func2>:
  400590:       bf 02 00 00 00          mov    $0x2,%edi
  400595:       31 c0                   xor    %eax,%eax
  400597:       e9 24 00 00 00          jmpq   4005c0 <F>

00000000004005a0 <func3>:
  4005a0:       bf 03 00 00 00          mov    $0x3,%edi
  4005a5:       31 c0                   xor    %eax,%eax
  4005a7:       e9 14 00 00 00          jmpq   4005c0 <F>

00000000004005b0 <func4>:
  4005b0:       bf 04 00 00 00          mov    $0x4,%edi
  4005b5:       31 c0                   xor    %eax,%eax
  4005b7:       e9 04 00 00 00          jmpq   4005c0 <F>

00000000004005c0 <F>:
  4005c0:       89 f8                   mov    %edi,%eax
  4005c2:       c3                      retq   
===========

BTW, how about reordering idea for data?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/40093] Optimization by functios reordering.
  2009-05-10 16:39 [Bug c/40093] New: Optimization by functios reordering vvv at ru dot ru
                   ` (2 preceding siblings ...)
  2009-05-10 18:08 ` vvv at ru dot ru
@ 2009-05-10 18:10 ` pinskia at gcc dot gnu dot org
  2009-05-10 18:20 ` vvv at ru dot ru
  2009-06-04 14:50 ` hubicka at gcc dot gnu dot org
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-05-10 18:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gcc dot gnu dot org  2009-05-10 18:10 -------
Well you need whole program to get the behavior which you want.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/40093] Optimization by functios reordering.
  2009-05-10 16:39 [Bug c/40093] New: Optimization by functios reordering vvv at ru dot ru
                   ` (3 preceding siblings ...)
  2009-05-10 18:10 ` pinskia at gcc dot gnu dot org
@ 2009-05-10 18:20 ` vvv at ru dot ru
  2009-06-04 14:50 ` hubicka at gcc dot gnu dot org
  5 siblings, 0 replies; 7+ messages in thread
From: vvv at ru dot ru @ 2009-05-10 18:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from vvv at ru dot ru  2009-05-10 18:20 -------
(In reply to comment #4)
> Well you need whole program to get the behavior which you want.

Yes. Of course, it's no problem for small single-programmer project, but it's
problem for big projects like Linux Kernel.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/40093] Optimization by functios reordering.
  2009-05-10 16:39 [Bug c/40093] New: Optimization by functios reordering vvv at ru dot ru
                   ` (4 preceding siblings ...)
  2009-05-10 18:20 ` vvv at ru dot ru
@ 2009-06-04 14:50 ` hubicka at gcc dot gnu dot org
  5 siblings, 0 replies; 7+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2009-06-04 14:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from hubicka at gcc dot gnu dot org  2009-06-04 14:50 -------
There is simple algoritm reordering functions so calls more commonly leads to
following function in memory. (just order calls by frequency and concatenate
them into sequences and then order sequences to promote forward calls).

The problem here is that this conflicts with the optimizations propagatng
information top-down across callgraph (like stack alignment).

So I can write easilly pass that will give desired order, but I don't want
production of RTL bodies to happen this order and thus we need some assembler
support for this to order stuff properly (gas has subsections that would do
fine). I was just bit lazy to implement this so far.


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-06-04 14:50:20
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-06-04 14:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-10 16:39 [Bug c/40093] New: Optimization by functios reordering vvv at ru dot ru
2009-05-10 16:43 ` [Bug c/40093] " vvv at ru dot ru
2009-05-10 16:49 ` [Bug middle-end/40093] " pinskia at gcc dot gnu dot org
2009-05-10 18:08 ` vvv at ru dot ru
2009-05-10 18:10 ` pinskia at gcc dot gnu dot org
2009-05-10 18:20 ` vvv at ru dot ru
2009-06-04 14:50 ` hubicka at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).