* [Bug tree-optimization/34027] [4.3 regression] -Os code size nearly doubled
2007-11-08 11:39 [Bug c/34027] New: [4.3 regression] -Os code size nearly doubled bunk at stusta dot de
@ 2007-11-08 21:45 ` rguenth at gcc dot gnu dot org
2007-11-09 12:15 ` jakub at gcc dot gnu dot org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-11-08 21:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2007-11-08 21:45 -------
Confirmed.
Also, on 64-bit x86_64 we don't recognize that this computes the modulus, and instead generate
foobar:
.LFB2:
movl $1000000000, %esi
movq %rdi, %rax
xorl %edx, %edx
divq %rsi
imulq $-1000000000, %rax, %rax
addq %rdi, %rax
ret
For
unsigned long long foobar(unsigned long long ns)
{
return ns % 1000000000L;
}
we produce instead
foobar2:
.LFB3:
movl $1000000000, %edx
movq %rdi, %rax
movq %rdx, %rcx
xorl %edx, %edx
divq %rcx
movq %rdx, %rax
ret
which is smaller and faster. Likewise the 32bit variant:
foobar2:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
pushl $0
pushl $1000000000
pushl 12(%ebp)
pushl 8(%ebp)
call __umoddi3
addl $16, %esp
leave
ret
which would make this argument moot (ok, only by cheating ;)). The problem
is supposedly that we don't fold
(chrec_apply
(varying_loop = 1
)
(chrec = {ns_2(D), +, 0x0ffffffffc4653600}_1)
(x = ns_2(D) /[fl] 1000000000)
(res = ns_2(D) + (ns_2(D) /[fl] 1000000000) * 0x0ffffffffc4653600))
which is ns_2 - (ns_2 / 1000000000) * 1000000000.
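The constant in the chrec dump is -1000000000 reduced modulo 2^64, so the `res` expression is the remainder written in wrapping unsigned arithmetic. A minimal C sketch of that reading follows; the names `NEG_BILLION`, `chrec_final_value` and `matches_modulus` are made up for illustration and are not GCC identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* 0xffffffffc4653600 is 2^64 - 1000000000, i.e. -1000000000 as a 64-bit value. */
#define NEG_BILLION UINT64_C(0xffffffffc4653600)

/* Hypothetical name: the value the dump computes,
   ns + (ns / 1000000000) * -1000000000, using wrapping unsigned arithmetic. */
static uint64_t chrec_final_value(uint64_t ns)
{
    return ns + (ns / UINT64_C(1000000000)) * NEG_BILLION;
}

/* Returns 1 when the wrapped form agrees with the plain remainder. */
static int matches_modulus(uint64_t ns)
{
    return chrec_final_value(ns) == ns % UINT64_C(1000000000);
}
```

Since unsigned overflow wraps in C, the addition and multiplication here are well defined for every input.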
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu dot
| |org
BugsThisDependsOn| |32044
Status|UNCONFIRMED |NEW
Component|c |tree-optimization
Ever Confirmed|0 |1
Keywords| |missed-optimization
Last reconfirmed|0000-00-00 00:00:00 |2007-11-08 21:45:30
date| |
Target Milestone|--- |4.3.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34027
From: jakub at gcc dot gnu dot org @ 2007-11-09 12:15 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from jakub at gcc dot gnu dot org 2007-11-09 12:15 -------
I think whether the modulus will be bigger or smaller is terribly hard to
estimate. Really, if you file -Os regressions, you should at least compile the
whole kernel and compare the resulting sizes, rather than cherry-picking
one example. E.g. on ppc64, computing the modulus rather than doing the loop
is definitely much shorter.
IMHO if the kernel wants to avoid using modulus, it should just say so:
unsigned long long foobar(unsigned long long ns)
{
while(ns >= 1000000000L) {
ns -= 1000000000L;
asm ("" : "=r" (ns) : "0" (ns));
}
return ns;
}
will do that just fine.
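For readers unfamiliar with the trick: as far as I understand it, the empty asm names ns as both output and input, so the compiler has to assume the value may have changed and cannot replace the loop with a closed-form expression. A small sketch under that assumption, using the `__asm__` spelling (GCC/Clang extension) so it also builds in strict ISO modes; `ns_loop` and `ns_mod` are illustrative names only.

```c
#include <assert.h>

/* The loop from the comment above, with the barrier kept inside the body. */
static unsigned long long ns_loop(unsigned long long ns)
{
    while (ns >= 1000000000ULL) {
        ns -= 1000000000ULL;
        /* Empty asm: ns is both output ("=r") and input ("0"), so its value
           is opaque to the optimizers after each iteration. */
        __asm__ ("" : "=r" (ns) : "0" (ns));
    }
    return ns;
}

/* Reference result the loop is expected to match. */
static unsigned long long ns_mod(unsigned long long ns)
{
    return ns % 1000000000ULL;
}
```

The loop runs ns / 1000000000 times, so it only makes sense when ns is known to be a small multiple of a second.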
From: rguenther at suse dot de @ 2007-11-09 12:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rguenther at suse dot de 2007-11-09 12:20 -------
Subject: Re: [4.3 regression] -Os code size nearly doubled
On Fri, 9 Nov 2007, jakub at gcc dot gnu dot org wrote:
> ------- Comment #2 from jakub at gcc dot gnu dot org 2007-11-09 12:15 -------
> I think whether the modulus will be bigger or smaller is terribly hard to
> estimate. Really, if you file -Os regressions, you should at least compile the
> whole kernel and compare the resulting sizes, rather than cherry-picking
> one example. E.g. on ppc64, computing the modulus rather than doing the loop
> is definitely much shorter.
> IMHO if the kernel wants to avoid using modulus, it should just say so:
> unsigned long long foobar(unsigned long long ns)
> {
> while(ns >= 1000000000L) {
> ns -= 1000000000L;
> asm ("" : "=r" (ns) : "0" (ns));
> }
> return ns;
> }
> will do that just fine.
Yes, just that at the moment we don't produce the modulus but use
a division, a multiplication and a subtraction.
Richard.
From: jakub at gcc dot gnu dot org @ 2007-11-09 12:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from jakub at gcc dot gnu dot org 2007-11-09 12:30 -------
So then shouldn't this bug be about:
unsigned long long
foo (unsigned long long ns)
{
return ns % 1000000000L;
}
unsigned long long
bar (unsigned long long ns)
{
return ns - (ns / 1000000000L) * 1000000000L;
}
not compiling to the same code at -Os? On x86_64 with -O2 it actually produces
identical code with the subtraction; supposedly that's faster. I guess even
(ns / 1000000000L) * 1000000000L should be folded into
ns - (ns % 1000000000L).
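Both folds mentioned above rest on the identity n == (n / m) * m + n % m. A minimal sketch that spells out each side as a separate function, so the pairs can be compared; the helper names and the `BILLION` macro are hypothetical.

```c
#include <assert.h>

#define BILLION 1000000000ULL

/* First fold: n - (n / m) * m ... */
static unsigned long long sub_div_mul(unsigned long long n)
{
    return n - (n / BILLION) * BILLION;
}

/* ... should be the same value as n % m. */
static unsigned long long mod_form(unsigned long long n)
{
    return n % BILLION;
}

/* Second fold: (n / m) * m ... */
static unsigned long long div_mul(unsigned long long n)
{
    return (n / BILLION) * BILLION;
}

/* ... should be the same value as n - (n % m). */
static unsigned long long sub_mod(unsigned long long n)
{
    return n - (n % BILLION);
}
```

Which form is cheaper depends on the target, which is the point being argued in this thread.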
From: rguenther at suse dot de @ 2007-11-09 12:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from rguenther at suse dot de 2007-11-09 12:37 -------
Subject: Re: [4.3 regression] -Os code size nearly doubled
On Fri, 9 Nov 2007, jakub at gcc dot gnu dot org wrote:
> ------- Comment #4 from jakub at gcc dot gnu dot org 2007-11-09 12:30 -------
> So then shouldn't this bug be about:
> unsigned long long
> foo (unsigned long long ns)
> {
> return ns % 1000000000L;
> }
>
> unsigned long long
> bar (unsigned long long ns)
> {
> return ns - (ns / 1000000000L) * 1000000000L;
> }
>
> not compiling the same code at -Os? On x86_64 with -O2 it actually produces
> identical code with the subtraction, supposedly that's faster. Guess even
> (ns / 1000000000L) * 1000000000L should be folded into
> ns - (ns % 1000000000L).
With -O2 we express the division by the constant by multiplication / add
sequences. But for both we get the extra multiplication (first at -Os, then at -O2):
bar:
.LFB3:
movl $1000000000, %esi
movq %rdi, %rax
xorl %edx, %edx
divq %rsi
movq %rdi, %rcx
imulq $1000000000, %rax, %rdx
subq %rdx, %rcx
movq %rcx, %rax
ret
bar:
.LFB3:
movq %rdi, %rdx
movabsq $19342813113834067, %rax
shrq $9, %rdx
mulq %rdx
shrq $11, %rdx
imulq $1000000000, %rdx, %rdx
subq %rdx, %rdi
movq %rdi, %rax
ret
because we miss this folding opportunity.
Richard.
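The second listing replaces the divide with a multiply by a precomputed reciprocal. Here is a rough C rendering of that instruction sequence, assuming a 64-bit GCC or Clang target where `unsigned __int128` is available; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the -O2 listing: shrq $9, mulq by the constant (keeping the high
   64 bits of the 128-bit product, as mulq leaves them in %rdx), shrq $11. */
static uint64_t div_billion_magic(uint64_t ns)
{
    const uint64_t magic = UINT64_C(19342813113834067); /* ceil(2^84 / 10^9) */
    unsigned __int128 product = (unsigned __int128)(ns >> 9) * magic;
    uint64_t high = (uint64_t)(product >> 64);
    return high >> 11;
}

/* The imulq/subq tail: ns - quotient * 1000000000. */
static uint64_t mod_billion_magic(uint64_t ns)
{
    return ns - div_billion_magic(ns) * UINT64_C(1000000000);
}
```

The initial shift by 9 works because 1000000000 is 2^9 * 1953125, so only the odd factor needs the reciprocal multiply.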
From: bunk at stusta dot de @ 2007-11-10 7:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from bunk at stusta dot de 2007-11-10 07:58 -------
I am removing the dependency on PR32044:
This bug is really just something I observed by chance when looking at the
kernel compilation problem, but unless I completely misunderstood your comments
here, whatever is required to fix this issue does not depend on PR32044 being
fixed.
Also, the other way round, __umoddi3 wouldn't be better than __udivdi3 for the
kernel, although it might be what gets emitted after this bug gets fixed.
--
bunk at stusta dot de changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn|32044 |
From: rguenth at gcc dot gnu dot org @ 2007-11-10 23:54 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from rguenth at gcc dot gnu dot org 2007-11-10 23:54 -------
I have a patch.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |rguenth at gcc dot gnu dot
|dot org |org
Status|NEW |ASSIGNED
Last reconfirmed|2007-11-08 21:45:30 |2007-11-10 23:54:00
date| |
From: rguenth at gcc dot gnu dot org @ 2007-11-12 13:24 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from rguenth at gcc dot gnu dot org 2007-11-12 13:24 -------
Subject: Bug 34027
Author: rguenth
Date: Mon Nov 12 13:24:06 2007
New Revision: 130097
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=130097
Log:
2007-11-12 Richard Guenther <rguenther@suse.de>
PR middle-end/34027
* fold-const.c (fold_binary): Fold n - (n / m) * m to n % m.
(fold_binary): Fold unsigned FLOOR_DIV_EXPR to TRUNC_DIV_EXPR.
* gcc.dg/pr34027-1.c: New testcase.
* gcc.dg/pr34027-2.c: Likewise.
Added:
trunk/gcc/testsuite/gcc.dg/pr34027-1.c
trunk/gcc/testsuite/gcc.dg/pr34027-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
trunk/gcc/testsuite/ChangeLog
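On the second ChangeLog entry: floor and truncating division only differ when the exact quotient is negative and not an integer, which cannot happen for unsigned operands. The sketch below illustrates the difference with signed values; it is plain C and not taken from fold-const.c, and the helper names are hypothetical.

```c
#include <assert.h>

/* C's '/' truncates toward zero (the TRUNC_DIV_EXPR behaviour). */
static long long trunc_div(long long a, long long b)
{
    return a / b;
}

/* Floor division rounds toward negative infinity (the FLOOR_DIV_EXPR behaviour). */
static long long floor_div(long long a, long long b)
{
    long long q = a / b;
    /* Step down by one when there is a remainder and the signs differ. */
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q -= 1;
    return q;
}
```

With non-negative operands the adjustment never fires, so the two functions return the same quotient.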
From: rguenth at gcc dot gnu dot org @ 2007-11-12 13:28 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2007-11-12 13:28 -------
We now generate with -Os -m32
foobar:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
pushl $0
pushl $1000000000
pushl 12(%ebp)
pushl 8(%ebp)
call __umoddi3
addl $16, %esp
leave
ret
and with -O2 -m32:
foobar:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 12(%ebp), %edx
movl 8(%ebp), %eax
cmpl $0, %edx
ja .L5
cmpl $999999999, %eax
ja .L5
leave
ret
which for -Os is smaller than what we generated with 4.2 (and for -O2 it
is slightly larger).
So, fixed.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
From: rguenth at gcc dot gnu dot org @ 2007-11-12 15:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from rguenth at gcc dot gnu dot org 2007-11-12 15:01 -------
Of course, in the -O2 case I forgot to paste the rest of the function; here it is:
.p2align 4,,7
.p2align 3
.L5:
addl $-1000000000, %eax
adcl $-1, %edx
movl $1000000000, 8(%esp)
movl $0, 12(%esp)
movl %eax, (%esp)
movl %edx, 4(%esp)
call __umoddi3
leave
ret
so unfortunately the expanders don't deal with modulus the same way as with
the div/mul sequence.
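Read together, the two -O2 -m32 listings look like the original loop with its first iteration peeled off. A hypothetical C rendering of that shape (my reading of the assembly, not compiler output):

```c
#include <assert.h>

/* Hypothetical name: C equivalent of the combined -O2 -m32 listings. */
static unsigned long long foobar_o2_shape(unsigned long long ns)
{
    /* Fast path: high word is zero and low word is at most 999999999. */
    if (ns <= 999999999ULL)
        return ns;
    /* .L5: addl/adcl subtract 1000000000 once, then __umoddi3 takes the
       remainder of what is left. */
    return (ns - 1000000000ULL) % 1000000000ULL;
}
```

The result is still ns % 1000000000 for every input, but the library call is only reached when ns is at least one second.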