public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/37312]  New: -Os significantly faster than -O2 on test case
@ 2008-09-01 11:22 andi-gcc at firstfloor dot org
  2008-09-01 11:23 ` [Bug tree-optimization/37312] " andi-gcc at firstfloor dot org
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: andi-gcc at firstfloor dot org @ 2008-09-01 11:22 UTC (permalink / raw)
  To: gcc-bugs

[component might be wrong]

The appended test case is significantly faster (~5%) with -Os -funroll-all-loops
than with -O2 -funroll-all-loops in gcc 4.4 (gcc version 4.4.0 20080829, which
is shortly after the IRA merge) on a Core2 (Merom).

In earlier gcc versions the two perform about the same. The -Os result is an
improvement over all earlier versions (good!), but -O2 should get it too.

I tried -fno-tree-pre, as was suggested, and it made no difference.
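
For readers who want to reproduce a comparison like this, a minimal timing
harness might look like the sketch below. The attachment is not reproduced
here, so byte_sum is a hypothetical stand-in for the routine under test, and
time_routine and csum_fn are illustrative names. One would build the same file
twice, once with -O2 -funroll-all-loops and once with -Os -funroll-all-loops,
and compare the reported times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Illustrative signature for a routine under test (an assumption). */
typedef uint32_t (*csum_fn)(const unsigned char *buf, size_t len);

/* Hypothetical stand-in; the real routines are in the attachment. */
static uint32_t byte_sum(const unsigned char *buf, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += buf[i];
    return acc;
}

/* Call fn over buf `reps` times and return the CPU time in seconds. */
static double time_routine(csum_fn fn, unsigned char *buf, size_t len,
                           unsigned reps)
{
    volatile uint32_t sink = 0;
    assert(fn != NULL && buf != NULL && len > 0);
    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++) {
        buf[0] = (unsigned char)r; /* vary the input on every pass */
        sink = fn(buf, len);       /* volatile store keeps the result live */
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Because the harness calls the routine through a function pointer, it may keep
the compiler from inlining it; call the routine directly if inlining is part
of what is being measured.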


-- 
           Summary: -Os significantly faster than -O2 on test case
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: andi-gcc at firstfloor dot org
  GCC host triplet: x86_64-linux
GCC target triplet: x86-64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37312



* [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case
  2008-09-01 11:22 [Bug tree-optimization/37312] New: -Os significantly faster than -O2 on test case andi-gcc at firstfloor dot org
@ 2008-09-01 11:23 ` andi-gcc at firstfloor dot org
  2008-09-01 13:43 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: andi-gcc at firstfloor dot org @ 2008-09-01 11:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from andi-gcc at firstfloor dot org  2008-09-01 11:22 -------
Created an attachment (id=16178)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16178&action=view)
test case

checksum functions extracted from the Linux kernel.

Not preprocessed, but should compile on any x86 ISO-C system
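
The attachment itself is not reproduced in this thread. As an illustration
only, checksum code of this kind typically wraps an add plus add-with-carry
pair in inline asm, roughly as in the sketch below. The helper names and the
portable fallback branch are assumptions added here, not taken from the
attachment.

```c
#include <assert.h>
#include <stdint.h>

/* Add b to a and fold the carry-out back into the result
   (end-around carry).  Illustrative helper, not the attached code. */
static inline uint32_t add32_with_carry(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %2, %0\n\t"   /* a += b, sets the carry flag */
            "adcl $0, %0"       /* a += carry */
            : "=r"(a)
            : "0"(a), "rm"(b)
            : "cc");
    return a;
#else
    /* Fallback for other targets: do the add in 64 bits. */
    uint64_t s = (uint64_t)a + b;
    return (uint32_t)s + (uint32_t)(s >> 32);
#endif
}

/* Small self-check showing the wrap-around behaviour. */
static void add32_selfcheck(void)
{
    assert(add32_with_carry(1u, 2u) == 3u);
    assert(add32_with_carry(0xffffffffu, 1u) == 1u);
}
```

The compiler cannot see inside the asm string, so it can only schedule and
allocate registers around it.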



* [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case
  2008-09-01 11:22 [Bug tree-optimization/37312] New: -Os significantly faster than -O2 on test case andi-gcc at firstfloor dot org
  2008-09-01 11:23 ` [Bug tree-optimization/37312] " andi-gcc at firstfloor dot org
@ 2008-09-01 13:43 ` rguenth at gcc dot gnu dot org
  2008-09-01 14:21 ` andi-gcc at firstfloor dot org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-01 13:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2008-09-01 13:42 -------
Uh, well.  The code is mostly inline assembly, which doesn't give GCC much
freedom to do anything.  I guess -O2 simply optimizes "too much" around the
asm.  Try not using inline assembly instead.



* [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case
  2008-09-01 11:22 [Bug tree-optimization/37312] New: -Os significantly faster than -O2 on test case andi-gcc at firstfloor dot org
  2008-09-01 11:23 ` [Bug tree-optimization/37312] " andi-gcc at firstfloor dot org
  2008-09-01 13:43 ` rguenth at gcc dot gnu dot org
@ 2008-09-01 14:21 ` andi-gcc at firstfloor dot org
  2008-09-01 14:37 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: andi-gcc at firstfloor dot org @ 2008-09-01 14:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from andi-gcc at firstfloor dot org  2008-09-01 14:20 -------
Thanks for the us^whelpful comment. If you can suggest a way to do
carry-preserving addition without inline assembler, that would be fine;
otherwise not.

-Os seems to do something that improves it, at least (and that is new in 4.4;
4.3 didn't do that).

I suppose -O2 does something more that then makes it worse again.

I merely filed it because I thought it would be interesting to fix that
something so that it does not pessimize code.
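
One possible way to get carry-preserving addition in plain C, offered only as
a sketch, is to accumulate into a wider type so the carries collect in the
upper half and then fold them back in. The function name below is
hypothetical, and nothing in this thread says whether this would match the
inline asm version in speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit words with end-around carry, using a 64-bit accumulator
   in place of the carry flag.  Illustrative name, not from the test case. */
static uint32_t csum_words_portable(const uint32_t *buf, size_t n)
{
    uint64_t acc = 0;

    /* With at most 2^32 - 1 words the 64-bit accumulator cannot overflow. */
    assert(n <= UINT32_MAX);

    for (size_t i = 0; i < n; i++)
        acc += buf[i];

    /* Fold the collected carries into the low 32 bits.  The first fold
       can itself carry out once, so fold a second time. */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}
```

Later GCC releases also provide overflow-checking builtins such as
__builtin_add_overflow, but those are not available in the 4.4 series
discussed here.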





* [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case
  2008-09-01 11:22 [Bug tree-optimization/37312] New: -Os significantly faster than -O2 on test case andi-gcc at firstfloor dot org
                   ` (2 preceding siblings ...)
  2008-09-01 14:21 ` andi-gcc at firstfloor dot org
@ 2008-09-01 14:37 ` rguenth at gcc dot gnu dot org
  2008-09-01 20:41   ` Andrew Thomas Pinski
  2008-09-01 20:42 ` [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case with -funroll-all-loops pinskia at gmail dot com
  2008-09-02 20:46 ` pinskia at gcc dot gnu dot org
  5 siblings, 1 reply; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-01 14:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2008-09-01 14:36 -------
Well, now -Os -funroll-all-loops doesn't do any unrolling anymore while it did
before.  With -O2 you get what you ask for - unrolled loops.

-funroll-all-loops isn't really a flag to be used in general.



* Re: [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case
  2008-09-01 14:37 ` rguenth at gcc dot gnu dot org
@ 2008-09-01 20:41   ` Andrew Thomas Pinski
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Thomas Pinski @ 2008-09-01 20:41 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs

This is mostly because of extra register moves that IRA sometimes
introduces. There is another bug about inline asm and the return
register.


* [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case with -funroll-all-loops
  2008-09-01 11:22 [Bug tree-optimization/37312] New: -Os significantly faster than -O2 on test case andi-gcc at firstfloor dot org
                   ` (3 preceding siblings ...)
  2008-09-01 14:37 ` rguenth at gcc dot gnu dot org
@ 2008-09-01 20:42 ` pinskia at gmail dot com
  2008-09-02 20:46 ` pinskia at gcc dot gnu dot org
  5 siblings, 0 replies; 8+ messages in thread
From: pinskia at gmail dot com @ 2008-09-01 20:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gmail dot com  2008-09-01 20:41 -------
Subject: Re:  -Os significantly faster than -O2 on test case

This is mostly because of extra register moves that IRA sometimes
introduces. There is another bug about inline asm and the return
register.


* [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case with -funroll-all-loops
  2008-09-01 11:22 [Bug tree-optimization/37312] New: -Os significantly faster than -O2 on test case andi-gcc at firstfloor dot org
                   ` (4 preceding siblings ...)
  2008-09-01 20:42 ` [Bug tree-optimization/37312] -Os significantly faster than -O2 on test case with -funroll-all-loops pinskia at gmail dot com
@ 2008-09-02 20:46 ` pinskia at gcc dot gnu dot org
  5 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-09-02 20:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from pinskia at gcc dot gnu dot org  2008-09-02 20:45 -------
The main difference between -O2 and -Os is that csum_partial is inlined for -Os
and unrolling is disabled for -Os.
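
To look at the inlining half of that difference separately from the unrolling
half, one option is to pin the inlining decision with GCC function attributes
and build at a single optimization level. The sketch below uses a hypothetical
stand-in body, since the real csum_partial lives in the attachment; all three
function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for csum_partial (an assumption).  always_inline asks GCC
   to inline this into its callers regardless of the -O level. */
static inline __attribute__((always_inline))
uint32_t sum_forced_inline(const unsigned char *buf, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += buf[i];
    return acc;
}

/* Same computation behind a real call: noinline keeps GCC from
   inlining this wrapper into its callers. */
static __attribute__((noinline))
uint32_t sum_never_inline(const unsigned char *buf, size_t len)
{
    return sum_forced_inline(buf, len);
}

/* Both variants must return the same value; only the generated
   code around the call site differs. */
static uint32_t sum_checked(const unsigned char *buf, size_t len)
{
    uint32_t a = sum_forced_inline(buf, len);
    uint32_t b = sum_never_inline(buf, len);
    assert(a == b);
    return a;
}
```

Timing callers of each variant, with and without -funroll-all-loops, would
show which of the two factors accounts for more of the gap.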


