public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/21182] New: gcc can use registers but uses stack instead
From: vda at port dot imtp dot ilyichevsk dot odessa dot ua @ 2005-04-23 22:30 UTC (permalink / raw)
To: gcc-bugs
In this long but relatively simple function, gcc could
keep all frequently used local variables in registers,
but it fails to do so.
gcc can be forced to do this optimization with asm("reg") modifiers
on the variables. The resulting code is ~1k smaller.
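For readers unfamiliar with asm("reg") modifiers, here is a minimal sketch of the GNU C local register variable syntax. It is not taken from the attached testcase: the function mix(), the macro IN_REG and the choice of registers are hypothetical, and on anything other than GNU C for 32-bit x86 the macro expands to nothing. GCC only promises the binding when the variable is used as an inline asm operand, so treat this as a hint to the allocator rather than a guarantee.

```c
#include <stdint.h>

/* Assumption: GNU C targeting 32-bit x86. Elsewhere IN_REG() expands to
 * nothing and the variables become ordinary locals. */
#if defined(__GNUC__) && defined(__i386__)
#define IN_REG(name) __asm__(name)
#else
#define IN_REG(name)
#endif

/* Hypothetical stand-in for a few steps of a bit-mixing round; the real
 * testcase is much longer. */
uint32_t mix(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    /* Each working variable is tied to a named hardware register. */
    register uint32_t r0 IN_REG("eax") = a;
    register uint32_t r1 IN_REG("ecx") = b;
    register uint32_t r2 IN_REG("edx") = c;
    register uint32_t r3 IN_REG("esi") = d;

    r3 ^= r0;
    r1 &= r3;
    r2 |= r1;
    r0 = ~r0 ^ r2;

    return r0 ^ r1 ^ r2 ^ r3;
}
```

Compiling such a file with -S, with and without the register bindings, and comparing the two assembly outputs is one way to see whether the allocator spills to the stack.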
# gcc -v
Reading specs from
/.share/usr/app/gcc-3.4.3/bin/../lib/gcc/i386-pc-linux-gnu/3.4.3/specs
Configured with: ../gcc-3.4.3/configure --prefix=/usr/app/gcc-3.4.3
--exec-prefix=/usr/app/gcc-3.4.3 --bindir=/usr/bin --sbindir=/usr/sbin
--libexecdir=/usr/app/gcc-3.4.3/libexec --datadir=/usr/app/gcc-3.4.3/share
--sysconfdir=/etc --sharedstatedir=/usr/app/gcc-3.4.3/var/com
--localstatedir=/usr/app/gcc-3.4.3/var --libdir=/usr/lib
--includedir=/usr/include --infodir=/usr/info --mandir=/usr/man
--with-slibdir=/usr/app/gcc-3.4.3/lib --with-local-prefix=/usr/local
--with-gxx-include-dir=/usr/app/gcc-3.4.3/include/g++-v3
--enable-languages=c,c++ --with-system-zlib --disable-nls --enable-threads=posix
i386-pc-linux-gnu
Thread model: posix
gcc version 3.4.3
--
Summary: gcc can use registers but uses stack instead
Product: gcc
Version: 3.4.3
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: vda at port dot imtp dot ilyichevsk dot odessa dot ua
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i386-pc-linux-gnu
GCC host triplet: i386-pc-linux-gnu
GCC target triplet: i386-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: vda at port dot imtp dot ilyichevsk dot odessa dot ua @ 2005-04-23 22:32 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From vda at port dot imtp dot ilyichevsk dot odessa dot ua 2005-04-23 22:32 -------
Created an attachment (id=8719)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8719&action=view)
Testcase. Change #if 0 into #if 1 and compare the resulting asm.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: pinskia at gcc dot gnu dot org @ 2005-04-23 22:39 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-04-23 22:39 -------
Hmm, on the mainline, we get for wc -l:
1613 t.s
1459 t1.s
t1 is the normal #if 0.
Note I used "-O2 -fomit-frame-pointer".
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: vda at port dot imtp dot ilyichevsk dot odessa dot ua @ 2005-04-23 22:49 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From vda at port dot imtp dot ilyichevsk dot odessa dot ua 2005-04-23 22:49 -------
Aha!
I found out that gcc will use registers with -O3, but not with -O2.
# gcc -O3 serpent.c -S -o serpent-O3.s
# gcc -O2 serpent.c -S -o serpent-O2.s
# ls -l
-rw-r--r-- 1 root root 27975 Apr 24 01:47 serpent-O2.s
-rw-r--r-- 1 root root 21566 Apr 24 01:47 serpent-O3.s
# wc -l serpent-O2.s serpent-O3.s
1558 serpent-O2.s
1265 serpent-O3.s
2823 total
I don't have 4.0.0 here yet...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: vda at port dot imtp dot ilyichevsk dot odessa dot ua @ 2005-04-23 22:54 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From vda at port dot imtp dot ilyichevsk dot odessa dot ua 2005-04-23 22:54 -------
Here is a comparison of the -O2 and -O3 code.
The -O3 code has all modified variables in registers
and is thus smaller and most likely faster.
-O2:
serpent_encrypt:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $256, %esp
movl 8(%ebp), %edx
movl 16(%ebp), %eax
movl 12(%eax), %ebx
movl 12(%edx), %ecx
xorl %ebx, %ecx
movl (%edx), %edi
movl %ecx, -20(%ebp)
xorl (%eax), %edi
movl 8(%edx), %ecx
movl 4(%edx), %ebx
movl -20(%ebp), %esi
xorl 8(%eax), %ecx
orl %edi, -20(%ebp)
xorl 4(%eax), %ebx
xorl %ebx, -20(%ebp)
xorl %esi, %edi
xorl %ecx, %esi
andl %edi, %ebx
xorl %edi, %ecx
notl %esi
xorl -20(%ebp), %edi
movl %edx, -16(%ebp)
-O3:
serpent_encrypt:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
pushl %edx
movl 8(%ebp), %edi
movl 16(%ebp), %ecx
movl 12(%edi), %eax
xorl 12(%ecx), %eax
movl 8(%edi), %esi
movl 4(%edi), %edx
movl (%edi), %ebx
xorl 8(%ecx), %esi
xorl 4(%ecx), %edx
xorl (%ecx), %ebx
movl %eax, %ecx
orl %ebx, %ecx
xorl %eax, %ebx
xorl %esi, %eax
xorl %edx, %ecx
notl %eax
andl %ebx, %edx
xorl %eax, %edx
xorl %ebx, %esi
xorl %ecx, %ebx
orl %ebx, %eax
xorl %esi, %ebx
andl %edx, %esi
xorl %esi, %eax
notl %edx
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: vda at port dot imtp dot ilyichevsk dot odessa dot ua @ 2005-04-24 13:05 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From vda at port dot imtp dot ilyichevsk dot odessa dot ua 2005-04-24 13:05 -------
With 4.0.0: gcc -O2 gives the same result as gcc -O3,
which is better than gcc 3.4.3 -O2 but worse than 3.4.3 -O3.
For example:
movl %edx, -20(%ebp)
orl %ecx, %edi
movl %ebx, %esi
xorl %ecx, %esi
andl %eax, %ebx
xorl %edi, %ebx
movl %eax, %ecx
notl %ecx
xorl %ebx, %ecx
orl %edi, %eax
xorl %eax, %esi
rorl $19, %esi
rorl $29, -20(%ebp)
xorl %esi, %ebx
xorl -20(%ebp), %ecx
xorl -20(%ebp), %ebx
rorl $31, %ebx
leal 0(,%esi,8), %edx
1) Why was %edx stored in -20(%ebp)? %edx is not used
in the following insns, so its value could stay in the register
and we could keep working on it there.
2) rorl $31, %ebx is equivalent to roll $1, %ebx, but the 1-bit
rotate insn is smaller.
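The equivalence in point 2 can be checked with a short sketch. The helper names rol32 and ror32 are illustrative, not from the testcase; the general identity for 32-bit values is that rotating right by n equals rotating left by 32 - n.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. The masks keep both shift
 * counts in 0..31, so n == 0 does not trigger an undefined shift. */
uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate a 32-bit value right by n bits. */
uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Returns 1 when rotating right by n matches rotating left by 32 - n
 * for the given value, 0 otherwise. */
int ror_matches_rol(uint32_t x, unsigned n)
{
    return ror32(x, n) == rol32(x, 32 - n);
}
```

On 32-bit x86 the rotate-by-1 forms have their own opcode with no immediate byte, which is why roll $1 encodes shorter than rorl $31.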
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: steven at gcc dot gnu dot org @ 2005-05-07 15:24 UTC (permalink / raw)
To: gcc-bugs
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-05-07 15:23:12
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: pinskia at gcc dot gnu.org @ 2013-01-18 0:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> 2013-01-18 00:57:00 UTC ---
It would be interesting to try the trunk which has a newer register allocator
than even 4.6.x/4.7.x.
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: vda.linux at googlemail dot com @ 2013-01-18 0:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
--- Comment #8 from Denis Vlasenko <vda.linux at googlemail dot com> 2013-01-18 00:55:37 UTC ---
Grrr, another mistake. Correcting again:
Conclusion:
gcc-3.4.3 -O3 was close to ideal.
^^^^^^^^^
gcc-4.2.1 is worse.
gcc-4.6.3 got better a bit, still not as good as gcc-3.4.3 -O3 used to be.
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: vda.linux at googlemail dot com @ 2013-01-18 0:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
Denis Vlasenko <vda.linux at googlemail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vda.linux at googlemail dot
                   |                            |com
--- Comment #7 from Denis Vlasenko <vda.linux at googlemail dot com> 2013-01-18 00:51:01 UTC ---
"gcc-4.6.3 got better a bit, still not as good as gcc-4.6.3 -O3."
I meant:
gcc-4.6.3 got better a bit, still not as good as gcc-3.4.3 -O3 used to be.
* [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
From: vda.linux at googlemail dot com @ 2013-01-18 0:48 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182
--- Comment #6 from Denis Vlasenko <vda.linux at googlemail dot com> 2013-01-18 00:48:23 UTC ---
Created attachment 29200
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29200
Updated testcase, build helper, and results of testing with different gcc
versions
Tarball contains:
serpent.c:
the original testcase, only with "#ifdef NAIL_REGS" instead of "#if 0" which
allows test compiles w/o editing it. Basically, "gcc -DNAIL_REGS serpent.c"
will try to force gcc to use only registers instead of stack.
gencode.sh:
builds serpent.c with -O2 and -O3, with and without -DNAIL_REGS. The object
file names contain the gcc version and the options used. The objects are then
objdump'ed and the output is saved. Tweakable by setting $PREFIX and/or $CC.
-fomit-frame-pointer is not used: the testcase can be compiled so that the
stack is not used even without that option.
Disassembly:
serpent-O2-3.4.3.asm
serpent-O2-4.2.1.asm
serpent-O2-4.6.3.asm
serpent-O2-DNAIL_REGS-3.4.3.asm
serpent-O2-DNAIL_REGS-4.2.1.asm
serpent-O2-DNAIL_REGS-4.6.3.asm
serpent-O3-3.4.3.asm
serpent-O3-4.2.1.asm
serpent-O3-4.6.3.asm
serpent-O3-DNAIL_REGS-3.4.3.asm
serpent-O3-DNAIL_REGS-4.2.1.asm
serpent-O3-DNAIL_REGS-4.6.3.asm
Object files:
   text    data     bss     dec     hex filename
   3260       0       0    3260     cbc serpent-O2-DNAIL_REGS-3.4.3.o
   3260       0       0    3260     cbc serpent-O3-DNAIL_REGS-3.4.3.o
   3292       0       0    3292     cdc serpent-O3-3.4.3.o
   3536       0       0    3536     dd0 serpent-O2-4.6.3.o
   3536       0       0    3536     dd0 serpent-O3-4.6.3.o
   3845       0       0    3845     f05 serpent-O2-DNAIL_REGS-4.6.3.o
   3845       0       0    3845     f05 serpent-O3-DNAIL_REGS-4.6.3.o
   3877       0       0    3877     f25 serpent-O2-4.2.1.o
   3877       0       0    3877     f25 serpent-O3-4.2.1.o
   4302       0       0    4302    10ce serpent-O2-3.4.3.o
   4641       0       0    4641    1221 serpent-O2-DNAIL_REGS-4.2.1.o
   4641       0       0    4641    1221 serpent-O3-DNAIL_REGS-4.2.1.o
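To put the text sizes in perspective, the sketch below computes how many bytes each plain (non-NAIL_REGS) build adds over the smallest object, serpent-O2-DNAIL_REGS-3.4.3.o at 3260 bytes. The sizes are copied from the table; the struct, function names and report format are illustrative only. For gcc-3.4.3 -O2 the difference is 4302 - 3260 = 1042 bytes, consistent with the "~1k smaller" figure in the original report.

```c
#include <stddef.h>
#include <stdio.h>

/* Text sizes copied from the size table (plain builds only). */
struct build {
    const char *name;
    unsigned text;
};

static const struct build builds[] = {
    { "serpent-O3-3.4.3.o", 3292 },
    { "serpent-O2-4.6.3.o", 3536 },
    { "serpent-O2-4.2.1.o", 3877 },
    { "serpent-O2-3.4.3.o", 4302 },
};

/* Baseline: text size of serpent-O2-DNAIL_REGS-3.4.3.o. */
enum { BASELINE_TEXT = 3260 };

/* Bytes of text a build adds over the baseline. */
unsigned extra_bytes(unsigned text)
{
    return text - BASELINE_TEXT;
}

/* The same difference as a percentage of the baseline. */
double overhead_percent(unsigned text)
{
    return 100.0 * extra_bytes(text) / BASELINE_TEXT;
}

/* Print one line per build with its absolute and relative overhead. */
void print_report(void)
{
    for (size_t i = 0; i < sizeof builds / sizeof builds[0]; i++)
        printf("%-20s +%4u bytes (%.1f%%)\n", builds[i].name,
               extra_bytes(builds[i].text),
               overhead_percent(builds[i].text));
}
```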
Take a look inside serpent-O2-DNAIL_REGS-3.4.3.asm file.
This is what I want to get without asm hacks: the smallest code, uses no stack.
gcc-3.4.3 -O3 comes close: it does spill a few words to stack (search for
(%ebp)), but is generally good code (close to ideal?).
All other attempts fare worse:
gcc-3.4.3 -O2: code is significantly worse than -O3.
gcc-4.2.1 -O2/-O3: code is better than gcc-3.4.3 -O2, worse than gcc-4.6.3.
gcc-4.6.3 -O2/-O3: six instances of spills to stack. Code is still not as good
as gcc-3.4.3 -O3. (-DNAIL_REGS only confuses it more, unlike 3.4.3).
Stack usage summary:
$ grep 'sub.*,%esp' *.asm | grep -v DNAIL_REGS
serpent-O2-3.4.3.asm: 6: 81 ec 00 01 00 00 sub $0x100,%esp
serpent-O2-4.2.1.asm: 6: 83 ec 78 sub $0x78,%esp
serpent-O2-4.6.3.asm: 4: 83 ec 04 sub $0x4,%esp
serpent-O3-4.2.1.asm: 6: 83 ec 78 sub $0x78,%esp
serpent-O3-4.6.3.asm: 4: 83 ec 04 sub $0x4,%esp
(serpent-O3-3.4.3.asm is not listed, but it allocates and uses one word on
stack by push insn).
Modules with best (= minimal) stack usage:
$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O2-DNAIL_REGS-3.4.3.asm
6: 8b 75 08 mov 0x8(%ebp),%esi
9: 8b 7d 10 mov 0x10(%ebp),%edi
ca9: 8b 75 0c mov 0xc(%ebp),%esi
$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O3-3.4.3.asm
7: 8b 7d 08 mov 0x8(%ebp),%edi
a: 8b 4d 10 mov 0x10(%ebp),%ecx
18c: 89 7d f0 mov %edi,-0x10(%ebp)
1dd: 8b 45 f0 mov -0x10(%ebp),%eax
23b: 8b 75 f0 mov -0x10(%ebp),%esi
299: 8b 7d f0 mov -0x10(%ebp),%edi
432: 8b 55 f0 mov -0x10(%ebp),%edx
4a0: 8b 4d f0 mov -0x10(%ebp),%ecx
50e: 8b 7d f0 mov -0x10(%ebp),%edi
84f: 8b 45 f0 mov -0x10(%ebp),%eax
8b9: 8b 75 f0 mov -0x10(%ebp),%esi
923: 8b 7d f0 mov -0x10(%ebp),%edi
cb6: 8b 55 0c mov 0xc(%ebp),%edx
$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O3-4.6.3.asm
7: 8b 4c 24 20 mov 0x20(%esp),%ecx
b: 8b 44 24 18 mov 0x18(%esp),%eax
22e: 89 0c 24 mov %ecx,(%esp)
239: 23 3c 24 and (%esp),%edi
588: 89 0c 24 mov %ecx,(%esp)
58f: 23 3c 24 and (%esp),%edi
8f4: 89 0c 24 mov %ecx,(%esp)
8fd: 23 3c 24 and (%esp),%edi
c60: 89 0c 24 mov %ecx,(%esp)
c6b: 23 3c 24 and (%esp),%edi
d37: 89 14 24 mov %edx,(%esp)
d5a: 8b 44 24 1c mov 0x1c(%esp),%eax
d5e: 33 14 24 xor (%esp),%edx
Conclusion:
gcc-4.6.3 -O3 was close to ideal.
gcc-4.2.1 is worse.
gcc-4.6.3 got better a bit, still not as good as gcc-4.6.3 -O3.