[Bug c/34027] New: [4.3 regression] -Os code size nearly doubled

public inbox for gcc-bugs@sourceware.org

* [Bug c/34027]  New: [4.3 regression] -Os code size nearly doubled
@ 2007-11-08 11:39 bunk at stusta dot de
  2007-11-08 21:45 ` [Bug tree-optimization/34027] " rguenth at gcc dot gnu dot org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: bunk at stusta dot de @ 2007-11-08 11:39 UTC (permalink / raw)
  To: gcc-bugs

For months now, PR32044 has made it impossible to compile the Linux kernel with
gcc 4.3, but I'll leave the discussion of who is technically at fault (and
whether gcc 4.3 will ever be able to compile the Linux kernel) to the people who
know more about such things.

But the fact that the code emitted with -Os is nearly twice as big is
a regression in gcc.

Test case:

$ cat test.c
unsigned long long foobar(unsigned long long ns)
{
  while(ns >= 1000000000L)
    ns -= 1000000000L;
  return ns;
}
$ gcc --version
gcc (GCC) 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)
...
$ gcc -Os test.c -c -o old-gcc.o
$ /usr/local/DIR/gcc-svn20071108/bin/gcc -Os test.c -c -o new-gcc.o
$ objdump -D old-gcc.o 

old-gcc.o:     file format elf32-i386

Disassembly of section .text:

00000000 <foobar>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 45 08                mov    0x8(%ebp),%eax
   6:   8b 55 0c                mov    0xc(%ebp),%edx
   9:   eb 08                   jmp    13 <foobar+0x13>
   b:   05 00 36 65 c4          add    $0xc4653600,%eax
  10:   83 d2 ff                adc    $0xffffffff,%edx
  13:   83 fa 00                cmp    $0x0,%edx
  16:   77 f3                   ja     b <foobar+0xb>
  18:   3d ff c9 9a 3b          cmp    $0x3b9ac9ff,%eax
  1d:   77 ec                   ja     b <foobar+0xb>
  1f:   5d                      pop    %ebp
  20:   c3                      ret    
Disassembly of section .comment:
...
$ objdump -D new-gcc.o 

new-gcc.o:     file format elf32-i386

Disassembly of section .text:

00000000 <foobar>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   57                      push   %edi
   4:   56                      push   %esi
   5:   53                      push   %ebx
   6:   bb 00 36 65 c4          mov    $0xc4653600,%ebx
   b:   83 ec 0c                sub    $0xc,%esp
   e:   8b 7d 0c                mov    0xc(%ebp),%edi
  11:   8b 75 08                mov    0x8(%ebp),%esi
  14:   6a 00                   push   $0x0
  16:   68 00 ca 9a 3b          push   $0x3b9aca00
  1b:   57                      push   %edi
  1c:   56                      push   %esi
  1d:   e8 fc ff ff ff          call   1e <foobar+0x1e>
  22:   83 c4 10                add    $0x10,%esp
  25:   69 ca 00 36 65 c4       imul   $0xc4653600,%edx,%ecx
  2b:   29 c1                   sub    %eax,%ecx
  2d:   f7 e3                   mul    %ebx
  2f:   01 f0                   add    %esi,%eax
  31:   8d 14 11                lea    (%ecx,%edx,1),%edx
  34:   11 fa                   adc    %edi,%edx
  36:   8d 65 f4                lea    -0xc(%ebp),%esp
  39:   5b                      pop    %ebx
  3a:   5e                      pop    %esi
  3b:   5f                      pop    %edi
  3c:   5d                      pop    %ebp
  3d:   c3                      ret    
Disassembly of section .comment:
...
$


-- 
           Summary: [4.3 regression] -Os code size nearly doubled
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bunk at stusta dot de
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34027



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
@ 2007-11-08 21:45 ` rguenth at gcc dot gnu dot org
  2007-11-09 12:15 ` jakub at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-11-08 21:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2007-11-08 21:45 -------
Confirmed.

Also, on 64-bit x86_64 we don't recognize that this computes the modulus, but generate

foobar:
.LFB2:  
        movl    $1000000000, %esi
        movq    %rdi, %rax
        xorl    %edx, %edx
        divq    %rsi
        imulq   $-1000000000, %rax, %rax
        addq    %rdi, %rax
        ret

for

unsigned long long foobar(unsigned long long ns)
{
  return ns % 1000000000L;
}

we produce instead

foobar2:
.LFB3:  
        movl    $1000000000, %edx
        movq    %rdi, %rax
        movq    %rdx, %rcx
        xorl    %edx, %edx
        divq    %rcx
        movq    %rdx, %rax
        ret

which is smaller and faster.  Likewise the 32bit variant:

foobar2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        pushl   $0
        pushl   $1000000000
        pushl   12(%ebp)
        pushl   8(%ebp)
        call    __umoddi3
        addl    $16, %esp
        leave
        ret

which would make this argument moot (ok, only by cheating ;)).  The problem
is supposedly that we don't fold

(chrec_apply
  (varying_loop = 1
)
  (chrec = {ns_2(D), +, 0x0ffffffffc4653600}_1)
  (x = ns_2(D) /[fl] 1000000000)
  (res = ns_2(D) + (ns_2(D) /[fl] 1000000000) * 0x0ffffffffc4653600))

which is ns_2 - (ns_2 / 1000000000) * 1000000000.
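The identity behind that last line can be checked directly in C. This is a
minimal sketch, assuming 64-bit unsigned wraparound; the names chrec_form,
modulus_form and check_chrec_identity are made up for illustration and are not
GCC internals. The constant 0xffffffffc4653600 is 2^64 - 1000000000, so adding
a multiple of it is the same as subtracting that multiple of 1000000000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step of the induction variable: -1000000000 modulo 2^64. */
#define NEG_BILLION UINT64_C(0xffffffffc4653600)

/* Value the chrec_apply dump describes for the loop result:
   ns + (ns / 1000000000) * 0xffffffffc4653600, with wraparound. */
uint64_t chrec_form(uint64_t ns)
{
    return ns + (ns / UINT64_C(1000000000)) * NEG_BILLION;
}

/* What the expression should be folded to. */
uint64_t modulus_form(uint64_t ns)
{
    return ns % UINT64_C(1000000000);
}

/* Spot-check that both forms agree, including at the extremes. */
void check_chrec_identity(void)
{
    const uint64_t samples[] = { 0, 1, 999999999, 1000000000, 1000000001,
                                 UINT64_C(123456789012345678), UINT64_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(chrec_form(samples[i]) == modulus_form(samples[i]));
}
```

Calling check_chrec_identity() from a test driver should pass silently if the
two forms really are equivalent for these inputs.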


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
  BugsThisDependsOn|                            |32044
             Status|UNCONFIRMED                 |NEW
          Component|c                           |tree-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2007-11-08 21:45:30
               date|                            |
   Target Milestone|---                         |4.3.0



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
  2007-11-08 21:45 ` [Bug tree-optimization/34027] " rguenth at gcc dot gnu dot org
@ 2007-11-09 12:15 ` jakub at gcc dot gnu dot org
  2007-11-09 12:20 ` rguenther at suse dot de
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-11-09 12:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from jakub at gcc dot gnu dot org  2007-11-09 12:15 -------
I think whether the modulus will be bigger or smaller is terribly hard to
estimate.  Really, if you file -Os regressions, you should at least compile the
whole kernel and compare the resulting sizes, rather than cherry
picking one example.  E.g. on ppc64 computing modulus rather than doing the
loop is definitely much shorter.
IMHO if the kernel wants to avoid using modulus, it should just say so:
unsigned long long foobar(unsigned long long ns)
{
  while(ns >= 1000000000L) {
    ns -= 1000000000L;
    asm ("" : "=r" (ns) : "0" (ns));
  }
  return ns;
}
will do that just fine.
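
For comparison, here is a self-contained sketch of that idea next to the plain
modulus. It assumes GCC-style extended asm (Clang accepts it too); the names
mod_by_loop, mod_by_operator and check_loop_matches_modulus are hypothetical.
The empty asm emits no instructions, but because it claims to rewrite ns the
compiler can no longer derive a closed form for the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Subtraction loop with an empty asm acting as an optimization barrier. */
unsigned long long mod_by_loop(unsigned long long ns)
{
    while (ns >= 1000000000ULL) {
        ns -= 1000000000ULL;
        /* Output tied to the input ("0"): the value passes through
           unchanged, but the compiler must treat it as opaque.
           __asm__ is the spelling that also works under -std=c99. */
        __asm__ ("" : "=r" (ns) : "0" (ns));
    }
    return ns;
}

/* Reference result using the % operator. */
unsigned long long mod_by_operator(unsigned long long ns)
{
    return ns % 1000000000ULL;
}

/* Keep the inputs small: the loop iterates ns / 1000000000 times. */
void check_loop_matches_modulus(void)
{
    const unsigned long long samples[] = { 0ULL, 999999999ULL,
                                           1000000000ULL, 5000000123ULL };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(mod_by_loop(samples[i]) == mod_by_operator(samples[i]));
}
```

Both functions return the same values; the difference is only in what the
optimizer is allowed to do with the loop.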



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
  2007-11-08 21:45 ` [Bug tree-optimization/34027] " rguenth at gcc dot gnu dot org
  2007-11-09 12:15 ` jakub at gcc dot gnu dot org
@ 2007-11-09 12:20 ` rguenther at suse dot de
  2007-11-09 12:30 ` jakub at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenther at suse dot de @ 2007-11-09 12:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenther at suse dot de  2007-11-09 12:20 -------
Subject: Re:  [4.3 regression] -Os code size
 nearly doubled

On Fri, 9 Nov 2007, jakub at gcc dot gnu dot org wrote:

> ------- Comment #2 from jakub at gcc dot gnu dot org  2007-11-09 12:15 -------
> I think whether the modulus will be bigger or smaller is terribly hard to
> estimate.  Really, if you file -Os regressions, you should at least compile the
> whole kernel and compare the resulting sizes, rather than cherry
> picking one example.  E.g. on ppc64 computing modulus rather than doing the
> loop is definitely much shorter.
> IMHO if the kernel wants to avoid using modulus, it should just say so:
> unsigned long long foobar(unsigned long long ns)
> {
>   while(ns >= 1000000000L) {
>     ns -= 1000000000L;
>     asm ("" : "=r" (ns) : "0" (ns));
>   }
>   return ns;
> }
> will do that just fine.

Yes, just that at the moment we don't produce the modulus but use
a division, a multiplication and a subtraction.

Richard.



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
                   ` (2 preceding siblings ...)
  2007-11-09 12:20 ` rguenther at suse dot de
@ 2007-11-09 12:30 ` jakub at gcc dot gnu dot org
  2007-11-09 12:37 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-11-09 12:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jakub at gcc dot gnu dot org  2007-11-09 12:30 -------
So then shouldn't this bug be about:
unsigned long long
foo (unsigned long long ns)
{
  return ns % 1000000000L;
}

unsigned long long
bar (unsigned long long ns)
{
  return ns - (ns / 1000000000L) * 1000000000L;
}

not compiling the same code at -Os?  On x86_64 with -O2 it actually produces
identical code with the subtraction, supposedly that's faster.  Guess even
(ns / 1000000000L) * 1000000000L should be folded into
ns - (ns % 1000000000L).
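
Both identities are easy to sanity-check in plain C. The following is only a
sketch: the function names are hypothetical stand-ins for foo and bar above,
and BILLION is introduced here for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define BILLION 1000000000ULL

/* Remainder written with the operator (foo above). */
unsigned long long rem_direct(unsigned long long ns)
{
    return ns % BILLION;
}

/* Remainder written as divide, multiply, subtract (bar above). */
unsigned long long rem_divmul(unsigned long long ns)
{
    return ns - (ns / BILLION) * BILLION;
}

/* Largest multiple of BILLION not above ns, via divide and multiply. */
unsigned long long floor_divmul(unsigned long long ns)
{
    return (ns / BILLION) * BILLION;
}

/* The same value, via the remainder. */
unsigned long long floor_sub(unsigned long long ns)
{
    return ns - (ns % BILLION);
}

/* Spot-check that each pair agrees. */
void check_fold_identities(void)
{
    const unsigned long long samples[] = { 0ULL, 1ULL, 999999999ULL,
                                           1000000000ULL, 2500000000ULL,
                                           18446744073709551615ULL };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(rem_direct(samples[i]) == rem_divmul(samples[i]));
        assert(floor_divmul(samples[i]) == floor_sub(samples[i]));
    }
}
```

If the compiler folds both patterns, each pair should end up as identical
machine code.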



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
                   ` (3 preceding siblings ...)
  2007-11-09 12:30 ` jakub at gcc dot gnu dot org
@ 2007-11-09 12:37 ` rguenther at suse dot de
  2007-11-10  7:58 ` bunk at stusta dot de
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenther at suse dot de @ 2007-11-09 12:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenther at suse dot de  2007-11-09 12:37 -------
Subject: Re:  [4.3 regression] -Os code size
 nearly doubled

On Fri, 9 Nov 2007, jakub at gcc dot gnu dot org wrote:

> ------- Comment #4 from jakub at gcc dot gnu dot org  2007-11-09 12:30 -------
> So then shouldn't this bug be about:
> unsigned long long
> foo (unsigned long long ns)
> {
>   return ns % 1000000000L;
> }
> 
> unsigned long long
> bar (unsigned long long ns)
> {
>   return ns - (ns / 1000000000L) * 1000000000L;
> }
> 
> not compiling the same code at -Os?  On x86_64 with -O2 it actually produces
> identical code with the subtraction, supposedly that's faster.  Guess even
> (ns / 1000000000L) * 1000000000L should be folded into
> ns - (ns % 1000000000L).

With -O2 we express the division by the constant by multiplication / add
sequences.  But for both we get the extra multiplication:

bar:
.LFB3:
        movl    $1000000000, %esi
        movq    %rdi, %rax
        xorl    %edx, %edx
        divq    %rsi
        movq    %rdi, %rcx
        imulq   $1000000000, %rax, %rdx
        subq    %rdx, %rcx
        movq    %rcx, %rax
        ret

bar:
.LFB3:
        movq    %rdi, %rdx
        movabsq $19342813113834067, %rax
        shrq    $9, %rdx
        mulq    %rdx 
        shrq    $11, %rdx
        imulq   $1000000000, %rdx, %rdx
        subq    %rdx, %rdi
        movq    %rdi, %rax
        ret

because we miss this folding opportunity.
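
For readers following the second listing: it computes the quotient without a
divide instruction. Below is a sketch in C of what that instruction sequence
appears to do, with the constants copied from the assembly. The names mulhi64,
div_billion_by_multiply and mod_billion_by_multiply are hypothetical, and the
reading of the constants is an interpretation of the listing, not something
stated in the thread. Since 1000000000 = 2^9 * 1953125, the code shifts right
by 9, takes the high 64 bits of the product with 19342813113834067
(2^75 / 1953125, rounded up), and shifts right by 11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b (what mulq leaves in %rdx),
   built from 32-bit halves so no 128-bit type is needed. */
uint64_t mulhi64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    /* Sum of the middle terms; this cannot overflow 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Quotient ns / 1000000000: shrq $9, mulq, shrq $11. */
uint64_t div_billion_by_multiply(uint64_t ns)
{
    return mulhi64(ns >> 9, UINT64_C(19342813113834067)) >> 11;
}

/* Remainder as in the listing: the extra imulq and subq. */
uint64_t mod_billion_by_multiply(uint64_t ns)
{
    return ns - div_billion_by_multiply(ns) * UINT64_C(1000000000);
}

/* Compare against the ordinary operators on a few inputs. */
void check_multiply_sequence(void)
{
    const uint64_t samples[] = { 0, 999999999, 1000000000, 1999999999,
                                 UINT64_C(123456789012345678), UINT64_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(div_billion_by_multiply(samples[i])
               == samples[i] / UINT64_C(1000000000));
        assert(mod_billion_by_multiply(samples[i])
               == samples[i] % UINT64_C(1000000000));
    }
}
```

The final multiply and subtract in mod_billion_by_multiply correspond to the
imulq/subq pair that the missing fold leaves behind in both listings.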

Richard.



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
                   ` (4 preceding siblings ...)
  2007-11-09 12:37 ` rguenther at suse dot de
@ 2007-11-10  7:58 ` bunk at stusta dot de
  2007-11-10 23:54 ` rguenth at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bunk at stusta dot de @ 2007-11-10  7:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from bunk at stusta dot de  2007-11-10 07:58 -------
I am removing the dependency on PR32044:

This bug is really just something I observed by chance when looking at the
kernel compilation problem, but unless I completely misunderstood your comments
here, whatever is required to fix this issue does not depend on PR32044 being
fixed.

Also, the other way round, __umoddi3 wouldn't be better than __udivdi3 for the
kernel, although it might be what gets emitted after this bug gets fixed.


-- 

bunk at stusta dot de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|32044                       |



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
                   ` (5 preceding siblings ...)
  2007-11-10  7:58 ` bunk at stusta dot de
@ 2007-11-10 23:54 ` rguenth at gcc dot gnu dot org
  2007-11-12 13:24 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-11-10 23:54 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2007-11-10 23:54 -------
I have a patch.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-11-08 21:45:30         |2007-11-10 23:54:00
               date|                            |



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
                   ` (6 preceding siblings ...)
  2007-11-10 23:54 ` rguenth at gcc dot gnu dot org
@ 2007-11-12 13:24 ` rguenth at gcc dot gnu dot org
  2007-11-12 13:28 ` rguenth at gcc dot gnu dot org
  2007-11-12 15:01 ` rguenth at gcc dot gnu dot org
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-11-12 13:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2007-11-12 13:24 -------
Subject: Bug 34027

Author: rguenth
Date: Mon Nov 12 13:24:06 2007
New Revision: 130097

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=130097
Log:
2007-11-12  Richard Guenther  <rguenther@suse.de>

        PR middle-end/34027
        * fold-const.c (fold_binary): Fold n - (n / m) * m to n % m.
        (fold_binary): Fold unsigned FLOOR_DIV_EXPR to TRUNC_DIV_EXPR.

        * gcc.dg/pr34027-1.c: New testcase.
        * gcc.dg/pr34027-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/pr34027-1.c
    trunk/gcc/testsuite/gcc.dg/pr34027-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
                   ` (7 preceding siblings ...)
  2007-11-12 13:24 ` rguenth at gcc dot gnu dot org
@ 2007-11-12 13:28 ` rguenth at gcc dot gnu dot org
  2007-11-12 15:01 ` rguenth at gcc dot gnu dot org
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-11-12 13:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from rguenth at gcc dot gnu dot org  2007-11-12 13:28 -------
We now generate with -Os -m32

foobar:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        pushl   $0
        pushl   $1000000000
        pushl   12(%ebp)
        pushl   8(%ebp)
        call    __umoddi3
        addl    $16, %esp
        leave
        ret

and with -O2 -m32:

foobar:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    12(%ebp), %edx
        movl    8(%ebp), %eax
        cmpl    $0, %edx
        ja      .L5
        cmpl    $999999999, %eax
        ja      .L5
        leave
        ret

which for -Os is smaller than what we generated with 4.2 (and for -O2 it
is slightly larger).

So, fixed.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
  2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
                   ` (8 preceding siblings ...)
  2007-11-12 13:28 ` rguenth at gcc dot gnu dot org
@ 2007-11-12 15:01 ` rguenth at gcc dot gnu dot org
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-11-12 15:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rguenth at gcc dot gnu dot org  2007-11-12 15:01 -------
Of course, in the -O2 case I forgot to paste the rest of the function; here it is:

        .p2align 4,,7
        .p2align 3
.L5:
        addl    $-1000000000, %eax
        adcl    $-1, %edx
        movl    $1000000000, 8(%esp)
        movl    $0, 12(%esp)
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        call    __umoddi3
        leave
        ret

so unfortunately the expanders don't deal with modulus the same way as with
the div/mul sequence.



