From: "vda.linux at googlemail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/21182] gcc can use registers but uses stack instead
Date: Fri, 18 Jan 2013 00:48:00 -0000
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182

--- Comment #6 from Denis Vlasenko 2013-01-18 00:48:23 UTC ---
Created attachment 29200
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29200
Updated testcase, build helper, and results of testing with different gcc versions

The tarball contains:

serpent.c: the original testcase, except that it uses "#ifdef NAIL_REGS" instead of "#if 0", so both variants can be compiled without editing the file. "gcc -DNAIL_REGS serpent.c" tries to force gcc to keep everything in registers instead of on the stack.

gencode.sh: builds serpent.c with -O2 and -O3, each with and without -DNAIL_REGS. The object file names contain the gcc version and the options used. The objects are then disassembled with objdump and the output is saved. The script can be tweaked by setting $PREFIX and/or $CC.
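serpent.c itself is not reproduced here. As a hypothetical illustration of what a NAIL_REGS-style switch can look like, the sketch below applies the linear transformation from the Serpent specification to four 32-bit words. When built with -DNAIL_REGS by GCC for i386, it asks for fixed registers through GCC local register variables. The function name serpent_lt and the register choice are assumptions, not taken from the testcase. GCC only guarantees that a local register variable occupies the named register when it is used as an extended-asm operand, so this is a strong hint to the allocator, not a hard guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must be 1..31 to avoid a shift by 32. */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical example, not code from serpent.c:
 * Serpent's linear transformation, applied in place to x[0..3]. */
void serpent_lt(uint32_t x[4])
{
#if defined(NAIL_REGS) && defined(__GNUC__) && defined(__i386__)
    /* GCC local register variables: request specific i386 registers.
     * Placement is only guaranteed for asm operands, so this is a hint. */
    register uint32_t x0 asm("eax") = x[0];
    register uint32_t x1 asm("ecx") = x[1];
    register uint32_t x2 asm("edx") = x[2];
    register uint32_t x3 asm("esi") = x[3];
#else
    /* Plain locals: the register allocator decides what stays in registers. */
    uint32_t x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
#endif
    x0 = rol32(x0, 13);
    x2 = rol32(x2, 3);
    x1 ^= x0 ^ x2;
    x3 ^= x2 ^ (x0 << 3);
    x1 = rol32(x1, 1);
    x3 = rol32(x3, 7);
    x0 ^= x1 ^ x3;
    x2 ^= x3 ^ (x1 << 7);
    x0 = rol32(x0, 5);
    x2 = rol32(x2, 22);
    x[0] = x0; x[1] = x1; x[2] = x2; x[3] = x3;
}
```

To see whether the hint helps, one could compare, e.g., "gcc -m32 -O2 -S" output with and without -DNAIL_REGS and look for stack slots in each.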
No -fomit-frame-pointer is used: the testcase can be compiled so that the stack is not used even without that option.

Disassembly:
  serpent-O2-3.4.3.asm
  serpent-O2-4.2.1.asm
  serpent-O2-4.6.3.asm
  serpent-O2-DNAIL_REGS-3.4.3.asm
  serpent-O2-DNAIL_REGS-4.2.1.asm
  serpent-O2-DNAIL_REGS-4.6.3.asm
  serpent-O3-3.4.3.asm
  serpent-O3-4.2.1.asm
  serpent-O3-4.6.3.asm
  serpent-O3-DNAIL_REGS-3.4.3.asm
  serpent-O3-DNAIL_REGS-4.2.1.asm
  serpent-O3-DNAIL_REGS-4.6.3.asm

Object files:
   text   data    bss    dec    hex  filename
   3260      0      0   3260    cbc  serpent-O2-DNAIL_REGS-3.4.3.o
   3260      0      0   3260    cbc  serpent-O3-DNAIL_REGS-3.4.3.o
   3292      0      0   3292    cdc  serpent-O3-3.4.3.o
   3536      0      0   3536    dd0  serpent-O2-4.6.3.o
   3536      0      0   3536    dd0  serpent-O3-4.6.3.o
   3845      0      0   3845    f05  serpent-O2-DNAIL_REGS-4.6.3.o
   3845      0      0   3845    f05  serpent-O3-DNAIL_REGS-4.6.3.o
   3877      0      0   3877    f25  serpent-O2-4.2.1.o
   3877      0      0   3877    f25  serpent-O3-4.2.1.o
   4302      0      0   4302   10ce  serpent-O2-3.4.3.o
   4641      0      0   4641   1221  serpent-O2-DNAIL_REGS-4.2.1.o
   4641      0      0   4641   1221  serpent-O3-DNAIL_REGS-4.2.1.o

Take a look at serpent-O2-DNAIL_REGS-3.4.3.asm. This is what I want to get without asm hacks: the smallest code, and no stack usage.

gcc-3.4.3 -O3 comes close: it does spill a few words to the stack (search for "(%ebp)"), but it is generally good code (close to ideal?).

All other attempts fare worse:
gcc-3.4.3 -O2: code is significantly worse than with -O3.
gcc-4.2.1 -O2/-O3: code is better than gcc-3.4.3 -O2, but worse than gcc-4.6.3.
gcc-4.6.3 -O2/-O3: six instances of spills to the stack. Code is still not as good as gcc-3.4.3 -O3. (-DNAIL_REGS only confuses it more, unlike 3.4.3.)

Stack usage summary:
$ grep 'sub.*,%esp' *.asm | grep -v DNAIL_REGS
serpent-O2-3.4.3.asm:   6:  81 ec 00 01 00 00   sub    $0x100,%esp
serpent-O2-4.2.1.asm:   6:  83 ec 78            sub    $0x78,%esp
serpent-O2-4.6.3.asm:   4:  83 ec 04            sub    $0x4,%esp
serpent-O3-4.2.1.asm:   6:  83 ec 78            sub    $0x78,%esp
serpent-O3-4.6.3.asm:   4:  83 ec 04            sub    $0x4,%esp

(serpent-O3-3.4.3.asm is not listed, but it allocates and uses one word on the stack via a push instruction.)
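To read the "sub $N,%esp" lines in the stack usage summary as stack words: on i386 a stack slot is 4 bytes, so $0x100 reserves 256 bytes (64 words), $0x78 reserves 120 bytes (30 words), and $0x4 reserves a single word. The helpers below are a hypothetical sketch (the names esp_frame_bytes and esp_frame_words are illustrative and not part of gencode.sh) that extracts N from one line of objdump output and converts it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return N from a "sub $N,%esp" instruction found in
 * one line of objdump output (bytes reserved by the prologue), or -1 if the
 * line contains no such instruction. */
long esp_frame_bytes(const char *line)
{
    const char *p = strstr(line, "sub");
    if (p == NULL)
        return -1;
    p = strchr(p, '$');                  /* start of the immediate operand */
    if (p == NULL)
        return -1;
    char *end;
    long imm = strtol(p + 1, &end, 0);   /* base 0 accepts the "0x" prefix */
    if (end == p + 1 || strncmp(end, ",%esp", 5) != 0)
        return -1;                       /* immediate not subtracted from %esp */
    return imm;
}

/* Convert a byte count into 4-byte i386 stack words; -1 stays -1. */
long esp_frame_words(long bytes)
{
    return bytes < 0 ? -1 : bytes / 4;
}
```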
Modules with the best (= minimal) stack usage:

$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O2-DNAIL_REGS-3.4.3.asm
     6:  8b 75 08        mov    0x8(%ebp),%esi
     9:  8b 7d 10        mov    0x10(%ebp),%edi
   ca9:  8b 75 0c        mov    0xc(%ebp),%esi

$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O3-3.4.3.asm
     7:  8b 7d 08        mov    0x8(%ebp),%edi
     a:  8b 4d 10        mov    0x10(%ebp),%ecx
   18c:  89 7d f0        mov    %edi,-0x10(%ebp)
   1dd:  8b 45 f0        mov    -0x10(%ebp),%eax
   23b:  8b 75 f0        mov    -0x10(%ebp),%esi
   299:  8b 7d f0        mov    -0x10(%ebp),%edi
   432:  8b 55 f0        mov    -0x10(%ebp),%edx
   4a0:  8b 4d f0        mov    -0x10(%ebp),%ecx
   50e:  8b 7d f0        mov    -0x10(%ebp),%edi
   84f:  8b 45 f0        mov    -0x10(%ebp),%eax
   8b9:  8b 75 f0        mov    -0x10(%ebp),%esi
   923:  8b 7d f0        mov    -0x10(%ebp),%edi
   cb6:  8b 55 0c        mov    0xc(%ebp),%edx

$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O3-4.6.3.asm
     7:  8b 4c 24 20     mov    0x20(%esp),%ecx
     b:  8b 44 24 18     mov    0x18(%esp),%eax
   22e:  89 0c 24        mov    %ecx,(%esp)
   239:  23 3c 24        and    (%esp),%edi
   588:  89 0c 24        mov    %ecx,(%esp)
   58f:  23 3c 24        and    (%esp),%edi
   8f4:  89 0c 24        mov    %ecx,(%esp)
   8fd:  23 3c 24        and    (%esp),%edi
   c60:  89 0c 24        mov    %ecx,(%esp)
   c6b:  23 3c 24        and    (%esp),%edi
   d37:  89 14 24        mov    %edx,(%esp)
   d5a:  8b 44 24 1c     mov    0x1c(%esp),%eax
   d5e:  33 14 24        xor    (%esp),%edx

Conclusion: gcc-3.4.3 -O3 was close to ideal. gcc-4.2.1 is worse. gcc-4.6.3 got a bit better, but is still not as good as gcc-3.4.3 -O3.