public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/30354] New: -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2007-01-02 22:09 UTC
To: gcc-bugs
gcc -O2 usually optimizes a/CONST. In many cases code is not only significantly
faster, but also smaller:
unsigned f(unsigned a) { return a/10; }
gcc 4.1.1 -O2:
movl $-858993459, %eax
mull 4(%esp)
shrl $3, %edx
movl %edx, %eax
ret
gcc 4.1.1 -Os:
movl 4(%esp), %eax
movl $10, %edx
movl %edx, %ecx
xorl %edx, %edx
divl %ecx
ret
Unfortunately, gcc -Os never uses this optimization.
Note that with code proposed in bug 28417 a/CONST can be optimized even further
(we can use smaller mul constant and avoid shrl) when we know from VRP that
value of a is always small enough.
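For illustration, the -O2 sequence above can be written out in C. This is a minimal sketch, not GCC's own code: the name div10_mulhi is made up for the example. The constant 0xCCCCCCCD is the unsigned form of the $-858993459 immediate in the listing, and the shift by 35 combines "take the high 32 bits of the product" (the value mull leaves in %edx) with the following shrl $3.

```c
#include <stdint.h>

/* Hypothetical helper: a / 10 computed the way the -O2 listing does it. */
uint32_t div10_mulhi(uint32_t a)
{
    /* 0xCCCCCCCD == (uint32_t)-858993459, the immediate in the listing. */
    uint64_t product = (uint64_t)a * 0xCCCCCCCDu;
    /* High half of the product (>> 32) followed by shrl $3: 35 bits total. */
    return (uint32_t)(product >> 35);
}
```

The multiply produces a 64-bit result, so the C version widens to uint64_t before multiplying; on i386 the mull instruction delivers the same 64 bits in %edx:%eax.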
--
Summary: -Os doesn't optimize a/CONST even if it saves size.
Product: gcc
Version: 4.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: vda dot linux at googlemail dot com
GCC build triplet: i386-pc-linux-gnu
GCC host triplet: i386-pc-linux-gnu
GCC target triplet: i386-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: pinskia at gcc dot gnu dot org @ 2007-01-02 22:45 UTC
To: gcc-bugs
------- Comment #1 from pinskia at gcc dot gnu dot org 2007-01-02 22:45 -------
I think this is really a cost issue with the x86 back-end, rather than with the
middle-end.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|rtl-optimization |target
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2007-07-25 15:05 UTC
To: gcc-bugs
------- Comment #2 from vda dot linux at googlemail dot com 2007-07-25 15:05 -------
Created an attachment (id=13973)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13973&action=view)
Fix: adjust div cost for -Os on i386
Patch was tested with 4.2.1, I guess it will apply to other versions of gcc as
it is quite trivial.
Test program with lots of randomly generated constant divisions and %'s was
compiled by patched gcc with different costs of division:
   text    data     bss     dec     hex filename
 257731       0       0  257731   3eec3 421.org/t-Os.o
 256788       0       0  256788   3eb14 421.7/t-Os.o
 256788       0       0  256788   3eb14 421.8/t-Os.o
 257377       0       0  257377   3ed61 421.9/t-Os.o
 257825       0       0  257825   3ef21 421.10/t-Os.o
Seems like (at least on 4.2.1) cost of 8 is giving smallest code.
Among 15000 divisions in test program, only signed divisions by power-of-two
grew in size (but they become MUCH faster, as they don't even use multiply now,
let alone div):
@@ -1703 +1703 @@
-0000000f T id_x_16
+00000012 T id_x_16
@@ -1836 +1836 @@
-0000000f T id_x_2
+0000000e T id_x_2
@@ -2030 +2030 @@
-0000000f T id_x_32
+00000012 T id_x_32
id_x_16 was:
movl 4(%esp), %eax
movl $16, %edx
movl %edx, %ecx
cltd
idivl %ecx
ret
Now it is:
movl 4(%esp), %edx
movl %edx, %eax
sarl $31, %eax
andl $15, %eax
addl %edx, %eax
sarl $4, %eax
ret
and also unsigned_x / 28, unsigned_x / 13952, unsigned_x / 56 grew by 1 byte.
The rest either were not changed (and still use div insn) or shrank (typically
by 1 byte, record holders are "unsigned_x / 641" and "unsigned_x / 6700417" -
shrank by 4 bytes).
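As a reading aid, the patched id_x_16 sequence corresponds roughly to the C below. This is a sketch under one stated assumption: right-shifting a negative int32_t is an arithmetic shift, which is implementation-defined in C but is what GCC does on x86. The name sdiv16_bias is invented for the example.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the patched id_x_16 listing. */
int32_t sdiv16_bias(int32_t x)
{
    /* sarl $31: all ones if x is negative, zero otherwise
       (assumes >> on a negative value is an arithmetic shift). */
    int32_t sign = x >> 31;
    /* andl $15: bias is 15 for negative x and 0 for non-negative x. */
    int32_t bias = sign & 15;
    /* addl + sarl $4: the bias makes the shift round toward zero,
       matching C's signed division. */
    return (x + bias) >> 4;
}
```

Without the bias a plain shift would round toward minus infinity, so for example -1 >> 4 would give -1 instead of the 0 that -1 / 16 requires.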
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2007-07-25 15:09 UTC
To: gcc-bugs
------- Comment #3 from vda dot linux at googlemail dot com 2007-07-25 15:09 -------
Created an attachment (id=13974)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13974&action=view)
Auto-generated test program with 15000 constant divs/mods
Test program, bzipped.
Build with
gcc -fomit-frame-pointer -Os t.c
and see sizes:
nm --size-sort t-Os.o | sort -k3
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2007-07-25 15:17 UTC
To: gcc-bugs
------- Comment #4 from vda dot linux at googlemail dot com 2007-07-25 15:17 -------
Created an attachment (id=13975)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13975&action=view)
Test program generator
Test program was generated using gen_test.c
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2007-07-25 15:22 UTC
To: gcc-bugs
------- Comment #5 from vda dot linux at googlemail dot com 2007-07-25 15:22 -------
Forgot to mention:
* generator tests signed and unsigned divisions and modulus, both const / x and
x / const, and also tests u = a / b; v = a % b; construct.
* you need to filter gen_test output to weed out dups:
./gen_test | sort | uniq >t.c
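The sort | uniq step only works if every generated function occupies exactly one line. The attached gen_test.c is not reproduced in this thread, so the following is a hypothetical sketch of what emitting one such line could look like, not the real generator. The function name emit_div_func and the exact output layout are assumptions; only the id_x_N / ud_x_N naming is taken from the symbols quoted in this thread.

```c
#include <stdio.h>

/* Hypothetical sketch, NOT the attached gen_test.c.
   Writes one single-line test function for "x / c" into buf and
   returns what snprintf returns. Keeping each function on one line
   is what lets "sort | uniq" drop duplicates. */
int emit_div_func(char *buf, size_t n, int is_signed, unsigned c)
{
    const char *type = is_signed ? "int" : "unsigned";
    char prefix = is_signed ? 'i' : 'u';   /* id_x_16, ud_x_641, ... */
    return snprintf(buf, n, "%s %cd_x_%u(%s x) { return x / %u; }\n",
                    type, prefix, c, type, c);
}
```

For instance, a signed divisor of 16 would yield the line "int id_x_16(int x) { return x / 16; }".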
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: aldot at gcc dot gnu dot org @ 2009-06-05 16:19 UTC
To: gcc-bugs
------- Comment #6 from aldot at gcc dot gnu dot org 2009-06-05 16:19 -------
CC'ing honza as i386 maintainer
--
aldot at gcc dot gnu dot org changed:
What                     |Removed |Added
----------------------------------------------------------------------------
CC                       |        |aldot at gcc dot gnu dot org, hubicka at gcc dot gnu dot org
OtherBugsDependingOnThis |        |37515
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: hubicka at gcc dot gnu dot org @ 2009-06-06 13:41 UTC
To: gcc-bugs
------- Comment #7 from hubicka at gcc dot gnu dot org 2009-06-06 13:41 -------
It seems to make sense to bump cost of idiv a bit, given the fact that there
are register pressure implications.
I would, however, like to understand which code sequences we produce that are
estimated to be long but end up being shorter in practice. Would it be possible
to give me some examples of constants where it is important to bump the cost
to 8? It is possible we can simply fix cost estimation in divmod expansion
instead.
Honza
--
hubicka at gcc dot gnu dot org changed:
What                  |Removed             |Added
----------------------------------------------------------------------------
Status                |UNCONFIRMED         |NEW
Ever Confirmed        |0                   |1
Last reconfirmed date |0000-00-00 00:00:00 |2009-06-06 13:41:23
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2009-06-21 16:11 UTC
To: gcc-bugs
------- Comment #8 from vda dot linux at googlemail dot com 2009-06-21 16:11 -------
(In reply to comment #7)
> It seems to make sense to bump cost of idiv a bit, given the fact that there
> are register pressure implications.
>
> I would, however, like to understand which code sequences we produce that are
> estimated to be long but end up being shorter in practice. Would it be possible
> to give me some examples of constants where it is important to bump the cost
> to 8? It is possible we can simply fix cost estimation in divmod expansion
> instead.
Attached t.c.bz2 is a good source file to experiment with.
With last month's svn snapshot of gcc, I did the following:
/usr/app/gcc-4.4.svn.20090528/bin/gcc -g0 -Os -fomit-frame-pointer \
    -ffunction-sections -c t.c
objdump -dr t.o >t.asm
with and without the patch, and compared results. (-ffunction-sections is used
merely because it makes "objdump -dr" output much more suitable for diffing).
Here is the diff between unpatched and patched gcc's code generated for int_x /
16:
Disassembly of section .text.id_x_16:
0000000000000000 <id_x_16>:
- 0: 89 f8 mov %edi,%eax
- 2: ba 10 00 00 00 mov $0x10,%edx
- 7: 89 d1 mov %edx,%ecx
- 9: 99 cltd
- a: f7 f9 idiv %ecx
- c: c3 retq
+ 0: 8d 47 0f lea 0xf(%rdi),%eax
+ 3: 85 ff test %edi,%edi
+ 5: 0f 49 c7 cmovns %edi,%eax
+ 8: c1 f8 04 sar $0x4,%eax
+ b: c3 retq
int_x / 2:
Disassembly of section .text.id_x_2:
0000000000000000 <id_x_2>:
0: 89 f8 mov %edi,%eax
- 2: ba 02 00 00 00 mov $0x2,%edx
- 7: 89 d1 mov %edx,%ecx
- 9: 99 cltd
- a: f7 f9 idiv %ecx
- c: c3 retq
+ 2: c1 e8 1f shr $0x1f,%eax
+ 5: 01 f8 add %edi,%eax
+ 7: d1 f8 sar %eax
+ 9: c3 retq
As you can see, the code became smaller and *much* faster (not even a mul insn is
used now).
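The two patched x86_64 sequences can be paraphrased in C as follows. This is a sketch with invented names (sdiv16_cmov, sdiv2_signbit), and it again assumes that >> on a negative int32_t is an arithmetic shift, as it is with GCC on x86. The listing computes x + 15 unconditionally with lea and then selects with cmovns; the C version adds only on the negative path so that it never overflows a signed int.

```c
#include <stdint.h>

/* int_x / 16: lea 0xf(%rdi),%eax; test; cmovns; sar $4. */
int32_t sdiv16_cmov(int32_t x)
{
    /* cmovns keeps plain x when it is non-negative, else x + 15. */
    int32_t v = (x >= 0) ? x : x + 15;
    return v >> 4;  /* sar $4, arithmetic shift assumed */
}

/* int_x / 2: shr $31 extracts the sign bit, add, then sar by one. */
int32_t sdiv2_signbit(int32_t x)
{
    /* Logical shift of the unsigned value: 1 if x < 0, else 0. */
    int32_t sign_bit = (int32_t)((uint32_t)x >> 31);
    return (x + sign_bit) >> 1;
}
```

For a divisor of 2 the bias is just the sign bit itself, which is why that case needs neither a mask nor a conditional move.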
Here is an example of unsigned_x / 641. In this case, code size is the same,
but the code is faster:
Disassembly of section .text.ud_x_641:
0000000000000000 <ud_x_641>:
- 0: ba 81 02 00 00 mov $0x281,%edx
- 5: 89 f8 mov %edi,%eax
- 7: 89 d1 mov %edx,%ecx
- 9: 31 d2 xor %edx,%edx
- b: f7 f1 div %ecx
+ 0: 89 f8 mov %edi,%eax
+ 2: 48 69 c0 81 3d 66 00 imul $0x663d81,%rax,%rax
+ 9: 48 c1 e8 20 shr $0x20,%rax
d: c3 retq
There is not a single instance of code growth. Either newer gcc is better or
maybe code growth cases are in 32-bit code only.
I will attach t64.asm.diff, take a look if you want to see all changes in
generated code.
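The unsigned_x / 641 case needs no shift beyond taking the upper half of the product because 641 * 6700417 = 2^32 + 1, and 0x663d81 is 6700417. A minimal C rendering of the patched x86_64 sequence, with a made-up function name, might look like this:

```c
#include <stdint.h>

/* Hypothetical helper mirroring the patched ud_x_641 listing:
   a 64-bit imul by 0x663d81 followed by shr $0x20. */
uint32_t udiv641_mulhi(uint32_t x)
{
    /* 0x663d81 == 6700417, and 641 * 6700417 == 2^32 + 1. */
    uint64_t product = (uint64_t)x * 0x663d81u;
    return (uint32_t)(product >> 32);
}
```

The product exceeds x * 2^32 / 641 by less than 2^32 / 641, which is too little to push the quotient up to the next integer for any 32-bit x.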
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2009-06-21 16:12 UTC
To: gcc-bugs
------- Comment #9 from vda dot linux at googlemail dot com 2009-06-21 16:12 -------
Created an attachment (id=18040)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18040&action=view)
Comparison of generated code with 4.4.svn.20090528 on x86_64
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: rguenth at gcc dot gnu dot org @ 2009-06-21 16:26 UTC
To: gcc-bugs
------- Comment #10 from rguenth at gcc dot gnu dot org 2009-06-21 16:25 -------
Do we have correct size estimates on idiv with a constant argument at all?
I don't see length attributes on it ...
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2009-06-21 16:47 UTC
To: gcc-bugs
------- Comment #11 from vda dot linux at googlemail dot com 2009-06-21 16:47 -------
In 32-bit code, there are indeed a few cases of code growth.
Here is a full list (id_XXX are signed divides, ud_XXX are unsigned ones):
-00000000 0000000f T id_x_4
+00000000 00000012 T id_x_4
-00000000 0000000f T id_x_8
+00000000 00000012 T id_x_8
-00000000 0000000f T id_x_16
+00000000 00000012 T id_x_16
-00000000 0000000f T id_x_32
+00000000 00000012 T id_x_32
-00000000 00000010 T ud_x_28
+00000000 00000015 T ud_x_28
-00000000 00000010 T ud_x_56
+00000000 00000015 T ud_x_56
-00000000 00000010 T ud_x_13952
+00000000 00000015 T ud_x_13952
They fall into two groups. Signed divisions by power-of-2 grew by 3 bytes but
they are *much faster* now, and considering how often people type "x / 4" and
think "this will be optimized to shift", forgetting that their x is signed, and
they therefore will have divide insn (!), I see it as a good trade. Code
comparison:
00000000 <id_x_16>:
- 0: 8b 44 24 04 mov 0x4(%esp),%eax
- 4: ba 10 00 00 00 mov $0x10,%edx
- 9: 89 d1 mov %edx,%ecx
- b: 99 cltd
- c: f7 f9 idiv %ecx
- e: c3 ret
+ 0: 8b 54 24 04 mov 0x4(%esp),%edx
+ 4: 89 d0 mov %edx,%eax
+ 6: c1 f8 1f sar $0x1f,%eax
+ 9: 83 e0 0f and $0xf,%eax
+ c: 01 d0 add %edx,%eax
+ e: c1 f8 04 sar $0x4,%eax
+ 11: c3 ret
The second group is just a few rare cases where the "multiply by reciprocal"
optimization happens to require more processing and code is 5 bytes longer:
00000000 <ud_x_56>:
- 0: 8b 44 24 04 mov 0x4(%esp),%eax
- 4: ba 38 00 00 00 mov $0x38,%edx
- 9: 89 d1 mov %edx,%ecx
- b: 31 d2 xor %edx,%edx
- d: f7 f1 div %ecx
- f: c3 ret
+ 0: 53 push %ebx
+ 1: 8b 4c 24 08 mov 0x8(%esp),%ecx
+ 5: bb 25 49 92 24 mov $0x24924925,%ebx
+ a: c1 e9 03 shr $0x3,%ecx
+ d: 89 c8 mov %ecx,%eax
+ f: f7 e3 mul %ebx
+ 11: 5b pop %ebx
+ 12: 89 d0 mov %edx,%eax
+ 14: c3 ret
This is rare: only three cases in the entire t.c.bz2.
They are far outweighed by 474 cases where code got smaller.
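For reference, the longer ud_x_56 sequence above amounts to the C below: since 56 = 8 * 7, the code first shifts right by 3 and then divides by 7 with a multiply-high. The name udiv56_preshift is invented for this sketch. The pre-shift appears to be what lets a 32-bit multiplier suffice here, because the value being multiplied is then below 2^29.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the patched ud_x_56 listing. */
uint32_t udiv56_preshift(uint32_t x)
{
    /* shr $0x3: divide by 8 first. */
    uint32_t y = x >> 3;
    /* mul by 0x24924925 (613566757, i.e. 2^32 / 7 rounded up)
       and keep %edx, the high 32 bits: divide by 7. */
    return (uint32_t)(((uint64_t)y * 0x24924925u) >> 32);
}
```

The extra bytes in the listing come from the separate shift plus the push/pop of %ebx needed to hold the multiplier, not from the arithmetic itself.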
Most of them are saving only one byte. For example, unsigned_x / 100:
00000000 <ud_x_100>:
- 0: 8b 44 24 04 mov 0x4(%esp),%eax
- 4: ba 64 00 00 00 mov $0x64,%edx
- 9: 89 d1 mov %edx,%ecx
- b: 31 d2 xor %edx,%edx
- d: f7 f1 div %ecx
- f: c3 ret
+ 0: b8 1f 85 eb 51 mov $0x51eb851f,%eax
+ 5: f7 64 24 04 mull 0x4(%esp)
+ 9: 89 d0 mov %edx,%eax
+ b: c1 e8 05 shr $0x5,%eax
+ e: c3 ret
Some cases got shorter by 2 or 4 bytes:
-00000000 00000010 T ud_x_3
+00000000 0000000e T ud_x_3
-00000000 00000010 T ud_x_9
+00000000 0000000e T ud_x_9
-00000000 00000010 T ud_x_67
+00000000 0000000e T ud_x_67
-00000000 00000010 T ud_x_641
+00000000 0000000c T ud_x_641
-00000000 00000010 T ud_x_6700417
+00000000 0000000c T ud_x_6700417
For example, unsigned_x / 9:
00000000 <ud_x_9>:
- 0: 8b 44 24 04 mov 0x4(%esp),%eax
- 4: ba 09 00 00 00 mov $0x9,%edx
- 9: 89 d1 mov %edx,%ecx
- b: 31 d2 xor %edx,%edx
- d: f7 f1 div %ecx
- f: c3 ret
+ 0: b8 39 8e e3 38 mov $0x38e38e39,%eax
+ 5: f7 64 24 04 mull 0x4(%esp)
+ 9: 89 d0 mov %edx,%eax
+ b: d1 e8 shr %eax
+ d: c3 ret
and unsigned_x / 641:
00000000 <ud_x_641>:
- 0: 8b 44 24 04 mov 0x4(%esp),%eax
- 4: ba 81 02 00 00 mov $0x281,%edx
- 9: 89 d1 mov %edx,%ecx
- b: 31 d2 xor %edx,%edx
- d: f7 f1 div %ecx
- f: c3 ret
+ 0: b8 81 3d 66 00 mov $0x663d81,%eax
+ 5: f7 64 24 04 mull 0x4(%esp)
+ 9: 89 d0 mov %edx,%eax
+ b: c3 ret
I will attach t32.asm.diff now
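The multipliers in these listings are consistent with rounding 2^(32 + shift) / d up to the next integer, where shift is the count of the trailing shr (5 for 100, 1 for 9, 0 for 641). The sketch below only checks that relationship; it is not a description of how GCC chooses its constants, and both function names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper: 2^(32 + shift) / d, rounded up.
   Requires d != 0 and shift < 31 so the arithmetic stays in 64 bits. */
uint64_t magic_multiplier(uint32_t d, unsigned shift)
{
    uint64_t pow2 = (uint64_t)1 << (32 + shift);
    return (pow2 + d - 1) / d;
}

/* x / d as "multiply, keep the high half, shift right by shift".
   Only correct for (m, shift) pairs that are exact for every 32-bit x,
   such as the ones appearing in the listings. */
uint32_t udiv_mulhi(uint32_t x, uint32_t m, unsigned shift)
{
    return (uint32_t)(((uint64_t)x * m) >> (32 + shift));
}
```

With these, magic_multiplier(100, 5) reproduces 0x51eb851f and magic_multiplier(9, 1) reproduces 0x38e38e39, the two immediates shown for unsigned_x / 100 and unsigned_x / 9.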
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda dot linux at googlemail dot com @ 2009-06-21 16:48 UTC
To: gcc-bugs
------- Comment #12 from vda dot linux at googlemail dot com 2009-06-21 16:48 -------
Created an attachment (id=18041)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18041&action=view)
Comparison of generated code with 4.4.svn.20090528 on i86
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: hubicka at gcc dot gnu dot org @ 2009-06-30 13:36 UTC
To: gcc-bugs
------- Comment #13 from hubicka at gcc dot gnu dot org 2009-06-30 13:36 -------
Hmm,
looking at the cases it seems that main reason for the win is the fact that
idiv needs integer load instruction that has long immediate and we don't
optimize these for -Os well.
I suppose for -Os following is wrong:
    case CONST_INT:
    case CONST:
    case LABEL_REF:
    case SYMBOL_REF:
      if (TARGET_64BIT && !x86_64_immediate_operand (x, VOIDmode))
        *total = 3;
      else if (TARGET_64BIT && !x86_64_zext_immediate_operand (x, VOIDmode))
        *total = 2;
      else if (flag_pic && SYMBOLIC_CONST (x)
               && (!TARGET_64BIT
                   || (!GET_CODE (x) != LABEL_REF
                       && (GET_CODE (x) != SYMBOL_REF
                           || !SYMBOL_REF_LOCAL_P (x)))))
        *total = 1;
      else
        *total = 0;
      return true;
It probably should return the actual size of a load instruction with a full-sized
immediate, and the individual cases matching RTL codes should know where an
instruction allows a cheap immediate operand encoding and prevent the recursion
from counting the operand size itself.
I will look into this but won't complain if someone beats me :))
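To make the size argument concrete, the byte counts can be tallied directly from the 32-bit ud_x_100 listings in comment #11. The sketch below is only arithmetic on those listings, not GCC's cost model; the enum and function names are invented for the example.

```c
/* Encoded lengths in bytes, read off the 32-bit ud_x_100 listings. */
enum {
    LEN_MOV_STACK_TO_REG = 4, /* mov 0x4(%esp),%eax */
    LEN_MOV_IMM32        = 5, /* mov $imm32,%reg    */
    LEN_MOV_REG_REG      = 2, /* mov %edx,%ecx      */
    LEN_XOR_REG_REG      = 2, /* xor %edx,%edx      */
    LEN_DIV_REG          = 2, /* div %ecx           */
    LEN_MUL_STACK        = 4, /* mull 0x4(%esp)     */
    LEN_SHR_IMM8         = 3, /* shr $0x5,%eax      */
    LEN_RET              = 1  /* ret                */
};

/* Unpatched: load x, load the divisor, copy it, clear %edx, div, ret. */
unsigned div_sequence_bytes(void)
{
    return LEN_MOV_STACK_TO_REG + LEN_MOV_IMM32 + LEN_MOV_REG_REG
         + LEN_XOR_REG_REG + LEN_DIV_REG + LEN_RET;
}

/* Patched: load the multiplier, mull, move the high half, shr, ret. */
unsigned mul_sequence_bytes(void)
{
    return LEN_MOV_IMM32 + LEN_MUL_STACK + LEN_MOV_REG_REG
         + LEN_SHR_IMM8 + LEN_RET;
}
```

The totals come to 16 and 15 bytes, matching the function sizes reported by nm. Both variants pay the same 5 bytes for a 32-bit immediate load, so a div instruction that looks short by itself still drags that load along with it.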
Honza
--
hubicka at gcc dot gnu dot org changed:
What                  |Removed                           |Added
----------------------------------------------------------------------------
AssignedTo            |unassigned at gcc dot gnu dot org |hubicka at gcc dot gnu dot org
Status                |NEW                               |ASSIGNED
Last reconfirmed date |2009-06-06 13:41:23               |2009-06-30 13:36:11
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: steven at gcc dot gnu dot org @ 2010-01-08 9:06 UTC
To: gcc-bugs
------- Comment #14 from steven at gcc dot gnu dot org 2010-01-08 09:06 -------
Honza, you said in comment #13 that you would look at this -- got news?
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: pinskia at gcc dot gnu.org @ 2021-12-21 11:42 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: vda.linux at googlemail dot com @ 2013-01-18 10:29 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354
--- Comment #16 from Denis Vlasenko <vda.linux at googlemail dot com> 2013-01-18 10:29:12 UTC ---
(In reply to comment #15)
> Honza, did you find time to have a look?
>
> I think this regressed a lot in 4.6
Not really - it's just .eh_frame section.
I re-ran the tests with two gcc's I have here and sizes look like this:
   text    data     bss     dec     hex filename
 257731       0       0  257731   3eec3 divmod-4.2.1-Os.o
 242787       0       0  242787   3b463 divmod-4.6.3-Os.o
Stock (unpatched) gcc improved, juggles registers better. For example:
int ib_100_x(int x) { return (100 / x) ^ (100 % x); }
0: b8 64 00 00 00 mov $0x64,%eax
5: 99 cltd
6: f7 7c 24 04 idivl 0x4(%esp)
- a: 31 c2 xor %eax,%edx
- c: 89 d0 mov %edx,%eax
- e: c3 ret
+ a: 31 d0 xor %edx,%eax
+ c: c3 ret
I believe my patch would improve things still - it is orthogonal to register
allocation.
BTW, just so that we are all on the same page wrt compiler options:
here's the script I use to compile, disassemble, and extract function sizes
from test program in comment 3. Tweakable by setting $PREFIX and/or $CC:
gencode.sh
==========
#!/bin/sh
#PREFIX="i686-"
test "$PREFIX" || PREFIX=""
test "$CC" || CC="${PREFIX}gcc"
test "$OBJDUMP" || OBJDUMP="${PREFIX}objdump"
test "$NM" || NM="${PREFIX}nm"
CC_VER=`$CC --version | sed -n 's/[^ ]* [^ ]* \([3-9]\.[1-9][^ ]*\).*/\1/p'`
test "$CC_VER" || exit 1
build()
{
    opt=$1
    bname=divmod-$CC_VER${opt}${nail}
    # -ffunction-sections makes disasm easier to understand
    # (insn offsets start from 0 within every function).
    # -fno-exceptions -fno-asynchronous-unwind-tables: die, .eh_frame, die!
    $CC \
        -m32 \
        -fomit-frame-pointer \
        -ffunction-sections \
        -fno-exceptions \
        -fno-asynchronous-unwind-tables \
        ${opt} t.c -c -o $bname.o \
    && $OBJDUMP -dr $bname.o >$bname.asm \
    && $NM --size-sort $bname.o | sort -k3 >$bname.nm
}
build -Os
#build -O2 #not interesting
#build -O3 #not interesting
size *.o | tee SIZES
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
From: aldot at gcc dot gnu.org @ 2012-06-28 23:29 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354
Bernhard Reutner-Fischer <aldot at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Last reconfirmed|2009-06-30 13:36:11 |2012-06-29
Host|i386-pc-linux-gnu |
Build|i386-pc-linux-gnu |
--- Comment #15 from Bernhard Reutner-Fischer <aldot at gcc dot gnu.org> 2012-06-28 23:28:28 UTC ---
Honza, did you find time to have a look?
I think this regressed a lot in 4.6
$ for i in 3.4 4.2 4.4 4.5 4.6 4.7 4.8;do gcc-$i -fomit-frame-pointer -m32 -Os \
    -o t-$i.o -c t.c;done
$ size t-*.o
   text    data     bss     dec     hex filename
 254731       0       0  254731   3e30b t-3.4.o
 257731       0       0  257731   3eec3 t-4.2.o
 242787       0       0  242787   3b463 t-4.4.o
 242787       0       0  242787   3b463 t-4.5.o
 542811       0       0  542811   8485b t-4.6.o
 542811       0       0  542811   8485b t-4.7.o
 542811       0       0  542811   8485b t-4.8.o
where:
gcc version 3.4.6 (Debian 3.4.6-10)
gcc version 4.2.4 (Debian 4.2.4-6)
gcc version 4.4.7 (Debian 4.4.7-1)
gcc version 4.5.3 (Debian 4.5.3-12)
gcc version 4.6.3 (Debian 4.6.3-1)
gcc version 4.7.1 (Debian 4.7.1-2)
4.8 was pristine (just unrelated fixups)
gcc version 4.8.0 20120514 (experimental) [fixups revision
19d3eef:c8f5cfb:cbf2756acd7df8cfb441025e4512b97b6ef2fd10] (GCC)