From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 11309 invoked by alias); 6 Feb 2015 13:56:23 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 6383 invoked by uid 48); 6 Feb 2015 13:56:19 -0000
From: "enkovich.gnu at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/64960] New: Inefficient address pre-computation in PIC mode
Date: Fri, 06 Feb 2015 13:56:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: enkovich.gnu at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg00573.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64960

            Bug ID: 64960
           Summary: Inefficient address pre-computation in PIC mode
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: enkovich.gnu at gmail dot com

After EBX was unfixed in the i386 PIC target, addresses of static objects may
be loaded from the GOT and placed on the stack for later use.  This allows the
PIC register to be reused for other purposes.  But when the PIC register is
still in use (e.g. for calls), this can make the generated code less efficient.

Here is an example:

>cat test.c
void f (int);
int val1, *val2, val3;

int test (int max)
{
  int i;

  for (i = 0; i < max; i++)
    {
      val1 += val2[i];
      f (val3);
    }
}

>gcc test.c -O2 -fPIE -S -m32 -ffixed-esi -ffixed-edi -ffixed-edx
>cat test.s
...
        movl    val1@GOT(%ebx), %eax    <-- may be removed
        xorl    %ebp, %ebp
        movl    %eax, 4(%esp)           <-- may be removed
        movl    val2@GOT(%ebx), %eax    <-- may be removed
        movl    %eax, 8(%esp)           <-- may be removed
        movl    val3@GOT(%ebx), %eax    <-- may be removed
        movl    %eax, 12(%esp)          <-- may be removed
.L3:
        movl    8(%esp), %eax           <-- equal to movl val2@GOT(%ebx), %eax
        subl    $12, %esp
        movl    (%eax), %ecx
        movl    16(%esp), %eax          <-- equal to movl val1@GOT(%ebx), %eax
        movl    (%ecx,%ebp,4), %ecx
        addl    %ecx, (%eax)
        addl    $1, %ebp
        movl    24(%esp), %eax          <-- equal to movl val3@GOT(%ebx), %eax
        pushl   (%eax)
        call    f@PLT
        addl    $16, %esp
        cmpl    %ebp, 32(%esp)
        jne     .L3
...

Also, an address stored on the stack does not benefit from the static object
optimization performed by the linker, which transforms movl @GOT into an lea
instruction.  It would be useful to avoid early address computation when the
PIC register is still available at the point where the address is used.

Here is the code generated by GCC 4.9:

        xorl    %ebp, %ebp
.L2:
        movl    val2@GOT(%ebx), %eax
        subl    $12, %esp
        movl    (%eax), %ecx
        movl    val1@GOT(%ebx), %eax
        movl    (%ecx,%ebp,4), %ecx
        addl    %ecx, (%eax)
        addl    $1, %ebp
        movl    val3@GOT(%ebx), %eax
        pushl   (%eax)
        call    f@PLT
        addl    $16, %esp
        cmpl    16(%esp), %ebp
        jne     .L2

Compiler used: gcc (GCC) 5.0.0 20150205 (experimental).
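
As a rough source-level analogy only (the real transformation happens on RTL, not on C source), the difference between the two listings can be sketched as below. The names test_reload, test_hoisted and check_equivalence, and the stub body of f, are hypothetical and are not part of the original test case. Compiling this file is not claimed to reproduce either assembly listing; it only shows that both loop shapes compute the same result.

```c
/* Globals from the report's test.c. */
int val1, *val2, val3;

/* Hypothetical stand-in for the external f(); it only records its calls. */
static int f_calls;
static int f_last;
void f(int x)
{
    f_calls++;
    f_last = x;
}

/* Shape of the GCC 4.9 loop: every global is addressed inside the loop
   (in the listing, through val@GOT(%ebx) on each iteration). */
void test_reload(int max)
{
    for (int i = 0; i < max; i++) {
        val1 += val2[i];
        f(val3);
    }
}

/* Shape of the GCC 5 loop: the three addresses are computed once before
   the loop (in the report's listing they are then kept in stack slots)
   and only dereferenced inside it. */
void test_hoisted(int max)
{
    int *p_val1 = &val1;
    int **p_val2 = &val2;
    int *p_val3 = &val3;

    for (int i = 0; i < max; i++) {
        *p_val1 += (*p_val2)[i];
        f(*p_val3);
    }
}

/* Runs both variants on the same data; returns 1 if the sums agree. */
int check_equivalence(void)
{
    static int data[4] = {1, 2, 3, 4};
    val2 = data;
    val3 = 7;

    val1 = 0;
    test_reload(4);
    int sum_reload = val1;

    val1 = 0;
    test_hoisted(4);
    int sum_hoisted = val1;

    return sum_reload == sum_hoisted && sum_reload == 10;
}
```

The hoisted form pays off only if the pointers stay in registers. With ESI, EDI and EDX fixed, as in the command line above, they are spilled, so each use becomes a stack load while EBX still has to stay live for the call to f.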