public inbox for gcc-help@gcc.gnu.org
* Confusing optimization
@ 2010-05-09 19:25 Luca Béla Palkovics
  2010-05-10  4:49 ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Luca Béla Palkovics @ 2010-05-09 19:25 UTC (permalink / raw)
  To: gcc-help

void a()
{
	// ... do my stuff
}

void b()
{
	// ... do my stuff
}

int main(int argc, char *argv[])
{
	a();
	b();	
}

>g++ main.cpp -O3
>./a.out
a takes 125ms
b takes 340ms

Now the same, but separated:
int main(int argc, char *argv[])
{
	a();
}

>g++ main.cpp -O3
>./a.out
a takes 85ms

Is this normal? b has nothing to do with a, so why does a get slower?
(b is also faster without a.)

Luca.


* Re: Confusing optimization
  2010-05-09 19:25 Confusing optimization Luca Béla Palkovics
@ 2010-05-10  4:49 ` Ian Lance Taylor
  2010-05-10 14:27   ` Luca Béla Palkovics
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Lance Taylor @ 2010-05-10  4:49 UTC (permalink / raw)
  To: Luca Béla Palkovics; +Cc: gcc-help

"Luca Béla Palkovics" <luca.bela.palkovics@gmail.com> writes:

> Is this normal? b has nothing to do with a, so why does a get slower?
> (b is also faster without a.)

There are a number of possibilities.  It's hard to know what is
happening without an exact test case.  You also neglected to say what
platform you are running on.

Some possibilities are:

1) Measurement error.  Surprisingly often people are not measuring
   what they think they are measuring, and you didn't provide any
   details about how you got your timings.

2) Instruction cache effects, if a() and b() call other functions.
   When both are linked together, those other functions will be at
   different addresses, and whether they are contiguous may change,
   all affecting the instruction cache.

3) Exact alignment of loop starts may shift when both are linked
   together, affecting the processor's branch optimizers if it has
   any.  Similarly, the exact alignment of labels may shift.  You can
   control these using gcc options like -falign-functions,
   -falign-jumps, -falign-labels, -falign-loops.
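
One rough way to check whether placement differs between the two builds is
to print where each function starts relative to a cache-line boundary. The
sketch below is illustrative only: a() and b() are empty stand-ins, the
helper names line_offset and report are made up, and the 64-byte line size
is an assumption about the CPU.

```cpp
#include <cstdint>
#include <cstdio>

// Stand-ins for the real functions; their bodies are not shown in the thread.
void a() {}
void b() {}

// Hypothetical helper: byte offset of a function's entry point within a
// cache line of `line` bytes (64 is an assumption, not a measured value).
std::uintptr_t line_offset(void (*fn)(), std::uintptr_t line = 64) {
    return reinterpret_cast<std::uintptr_t>(fn) % line;
}

// Print the address and offset so that two builds can be compared by eye.
void report(const char *name, void (*fn)()) {
    std::printf("%s at %p, offset %lu in a 64-byte line\n", name,
                reinterpret_cast<void *>(fn),
                static_cast<unsigned long>(line_offset(fn)));
}
```

Calling report("a", a) in a build with b() and in one without it shows
whether the entry point moved. This only covers function entry points; loop
and label alignment inside a function would need the disassembly.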

There are other, less likely, possibilities.

Ian


* Re: Re: Confusing optimization
  2010-05-10  4:49 ` Ian Lance Taylor
@ 2010-05-10 14:27   ` Luca Béla Palkovics
  2010-05-10 16:41     ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Luca Béla Palkovics @ 2010-05-10 14:27 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

On Sunday, 09.05.2010, at 21:49 -0700, Ian Lance Taylor wrote:
> Some possibilities are:
> 
> 1) Measurement error.  Surprisingly often people are not measuring
>    what they think they are measuring, and you didn't provide any
>    details about how you got your timings.
The function is extern "C" and I measure the time outside it.

long start=GetTime(); //own function using gettimeofday (returns ms)
A();
long time=GetTime()-start;
//Output time..

So it should be correct.
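
For reference, a helper of the kind described here might look like the
sketch below. This is a guess at the shape of GetTime, not the poster's
actual code: only the name, the use of gettimeofday, and the millisecond
unit come from the message. The return type is widened to long long as an
assumption, to avoid overflow where long is 32 bits.

```cpp
#include <sys/time.h>  // gettimeofday (POSIX)

// Wall-clock time in milliseconds since the epoch.
// Assumed implementation; the original GetTime is not shown in the thread.
long long GetTime() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<long long>(tv.tv_sec) * 1000 + tv.tv_usec / 1000;
}
```

Because this reads wall-clock time, a difference of two readings also
includes any time the process spent waiting while other work ran.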


> 2) Instruction cache effects, if a() and b() call other functions.
>    When both are linked together, those other functions will be at
>    different addresses, and whether they are contiguous may change,
>    all affecting the instruction cache.
> 3) Exact alignment of loop starts may shift when both are linked
>    together, affecting the processor's branch optimizers if it has
>    any.  Similarly, the exact alignment of labels may shift.  You can
>    control these using gcc options like -falign-functions,
>    -falign-jumps, -falign-labels, -falign-loops.
Well, I have now looked at the asm.
Platform: Ubuntu 10.04 with g++ 4.4.3.

In each pair of lines below, the first line is function "a" alone (125ms)
and the second line is function "a" with "b" (130ms).

...
.text:00000000004023BA                 jmp     short loc_402425
.text:00000000004023BA                 jmp     short loc_40241F
...
.text:0000000000402425                 cmp     [rsp+298h+var_118], 3
.text:000000000040241F                 cmp     [rsp+298h+var_118], 3
.text:000000000040242D                 jz      short loc_40243E
.text:0000000000402427                 jz      short loc_402438
...
LOOP START
...
.text:000000000040243E                 cmp     [rsp+298h+var_110], 0FFFFFEh
.text:0000000000402438                 cmp     [rsp+298h+var_110], 0FFFFFEh
.text:000000000040244A                 setle   al
.text:0000000000402444                 setle   al
.text:000000000040244D                 test    al, al
.text:0000000000402447                 test    al, al
.text:000000000040244F                 jnz     loc_4023BC
.text:0000000000402449                 jnz     loc_4023BC
...
.text:00000000004023C3                 cmp     [rsp+298h+var_98], 3
.text:00000000004023BC                 cmp     [rsp+298h+var_58], 3
.text:00000000004023CA                 jz      short loc_4023D9
.text:00000000004023C4                 jz      short loc_4023D3
...
.text:00000000004023D9                 mov     rax, [rsp+298h+var_110]
.text:00000000004023D3                 mov     rax, [rsp+298h+var_110]
.text:00000000004023E1                 add     [rsp+298h+var_90], rax
.text:00000000004023DB                 add     [rsp+298h+var_50], rax
.text:00000000004023E9                 cmp     [rsp+298h+var_D8], 3
.text:00000000004023E3                 cmp     [rsp+298h+var_98], 3
.text:00000000004023F1                 jz      short loc_4023FF
.text:00000000004023EB                 jz      short loc_4023F9
...
.text:00000000004023FF                 inc     [rsp+298h+var_D0]
.text:00000000004023F9                 inc     [rsp+298h+var_90]
.text:0000000000402407                 cmp     [rsp+298h+var_118], 3
.text:0000000000402401                 cmp     [rsp+298h+var_118], 3
.text:000000000040240F                 jz      short loc_40241D
.text:0000000000402409                 jz      short loc_402417
...
.text:000000000040241D                 inc     [rsp+298h+var_110]
.text:0000000000402417                 inc     [rsp+298h+var_110]
.text:0000000000402425                 cmp     [rsp+298h+var_118], 3
.text:000000000040241F                 cmp     [rsp+298h+var_118], 3
.text:000000000040242D                 jz      short loc_40243E
.text:0000000000402427                 jz      short loc_402438 
...
LOOP END
...

C++ code:
for(i=0L;i<0xFFFFFFL;i++)
{
	Temp+=i;
	Test++;
}

The =, ++, += and < operators are overloaded functions.
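
To make the loop above concrete, here is a minimal sketch of a type that
provides the operators the loop needs. The type Value, its single long
field, and the function sum_below are invented for illustration; the real
class is not shown in the thread and may do considerably more work per
operation.

```cpp
// Hypothetical stand-in for the poster's class; not the real implementation.
struct Value {
    long v = 0;

    Value() = default;
    Value(long x) : v(x) {}

    Value &operator=(long x) { v = x; return *this; }              // i = 0L
    Value operator++(int) { Value old = *this; ++v; return old; }  // i++
    Value &operator+=(const Value &o) { v += o.v; return *this; }  // Temp += i
    bool operator<(long x) const { return v < x; }                 // i < limit
};

// Same loop shape as in the message, with the bound passed as a parameter.
Value sum_below(long limit) {
    Value i, Temp, Test;
    for (i = 0L; i < limit; i++) {
        Temp += i;
        Test++;
    }
    return Temp;
}
```

With operators this small the compiler can usually inline every call, so
the cost of such a loop depends on what the real operators do inside.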

So it must be the cache or the alignment.
Maybe I should flip my ifs; the jz is taken every time. Hmm.


I hope I answered this mail in the correct way.
This is only my third time using a mailing list. :P

Luca.


* Re: Confusing optimization
  2010-05-10 14:27   ` Luca Béla Palkovics
@ 2010-05-10 16:41     ` Ian Lance Taylor
  2010-05-12 14:24       ` Luca Béla Palkovics
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Lance Taylor @ 2010-05-10 16:41 UTC (permalink / raw)
  To: Luca Béla Palkovics; +Cc: gcc-help

"Luca Béla Palkovics" <luca.bela.palkovics@gmail.com> writes:

> On Sunday, 09.05.2010, at 21:49 -0700, Ian Lance Taylor wrote:
>> Some possibilities are:
>> 
>> 1) Measurement error.  Surprisingly often people are not measuring
>>    what they think they are measuring, and you didn't provide any
>>    details about how you got your timings.
> The function is extern "C" and I measure the time outside it.
>
> long start=GetTime(); //own function using gettimeofday (returns ms)
> A();
> long time=GetTime()-start;
> //Output time..
>
> So it should be correct.

Using gettimeofday means that the measurements are highly sensitive to
any other load on the machine.  To get measurements this way, you must
at the very least run the function many times in a loop, and you need
an outer loop to test first one function then the other.  Then you
need to average all the results.
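
The procedure described above could be sketched roughly as follows. This is
one possible shape, not a definitive harness: the name average_ms and its
parameters are made up, std::chrono::steady_clock is swapped in for
gettimeofday, and rounds and calls_per_round are assumed to be positive.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Average wall-clock milliseconds per call for each function in `fns`.
// The outer loop alternates between the functions so that background load
// is spread across all of them; the inner loop repeats each call.
std::vector<double> average_ms(const std::vector<void (*)()> &fns,
                               int rounds, int calls_per_round) {
    using clock = std::chrono::steady_clock;
    std::vector<double> total_ms(fns.size(), 0.0);

    for (int r = 0; r < rounds; ++r) {
        for (std::size_t f = 0; f < fns.size(); ++f) {
            auto start = clock::now();
            for (int c = 0; c < calls_per_round; ++c) fns[f]();
            std::chrono::duration<double, std::milli> dt = clock::now() - start;
            total_ms[f] += dt.count();
        }
    }
    // Turn the accumulated totals into a per-call average.
    for (double &t : total_ms) t /= static_cast<double>(rounds) * calls_per_round;
    return total_ms;
}
```

Passing both a and b in one call gives two averages measured under similar
conditions, which is easier to compare than two single timings.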

If you are interested in how much CPU time the functions require,
using the clock function will be more accurate.  However, clock can be
misleading if the functions make any system calls.
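
A minimal sketch of measuring with clock, assuming a hypothetical helper
named cpu_ms that takes the function and a repeat count:

```cpp
#include <ctime>

// Processor time used by `calls` invocations of `fn`, in milliseconds.
// std::clock reports CPU time consumed by the process, not elapsed time.
double cpu_ms(void (*fn)(), int calls) {
    std::clock_t start = std::clock();
    for (int c = 0; c < calls; ++c) fn();
    std::clock_t end = std::clock();
    return 1000.0 * static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

The resolution of clock varies by platform, so short functions generally
need a large repeat count before the result is meaningful.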

Ian


* Re: Confusing optimization
  2010-05-10 16:41     ` Ian Lance Taylor
@ 2010-05-12 14:24       ` Luca Béla Palkovics
  0 siblings, 0 replies; 5+ messages in thread
From: Luca Béla Palkovics @ 2010-05-12 14:24 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

On Monday, 10.05.2010, at 09:41 -0700, Ian Lance Taylor wrote:
> If you are interested in how much CPU time the functions require,
> using the clock function will be more accurate.  However, clock can be
> misleading if the functions make any system calls.
Okay, I will check out the clock function.
I have rewritten a lot of my code and the problem disappeared.

Thanks a lot for the help and information.

Luca.

