From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 20143 invoked by alias); 20 Feb 2010 08:28:45 -0000
Received: (qmail 20112 invoked by uid 48); 20 Feb 2010 08:28:32 -0000
Date: Sat, 20 Feb 2010 08:28:00 -0000
Subject: [Bug target/43129] New: Simplify global variable's address loading with option -fpic
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "carrot at google dot com"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-02/txt/msg02031.txt.bz2

Compile the following code with options -march=armv5te -mthumb -Os -fpic

    extern int i;
    int foo(int j)
    {
      int t = i;
      i = j;
      return t;
    }

GCC generates the following code:

    foo:
            ldr     r3, .L2             // A
            ldr     r2, .L2+4           // B
    .LPIC0:
            add     r3, pc              // A
            ldr     r3, [r3, r2]        // B
            @ sp needed for prologue
            ldr     r2, [r3]
            str     r0, [r3]
            mov     r0, r2
            bx      lr
    .L3:
            .align  2
    .L2:
            .word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+4)    // A
            .word   i(GOT)                              // B

The instructions marked A compute the address of the GOT. The instructions
marked B load the global variable's GOT entry to get the actual address of the
global variable. There are 4 instructions and 2 constant pool entries in total.

This can be simplified by using the fact that the offset from label .LPIC0 to
any GOT entry is fixed at link time. The result is:

    foo:
            ldr     r3, .L2             // C
    .LPIC0:
            add     r3, pc              // C
            ldr     r3, [r3]            // C
            @ sp needed for prologue
            ldr     r2, [r3]
            str     r0, [r3]
            mov     r0, r2
            bx      lr
    .L3:
            .align  2
    .L2:
            .word   ABS_ADDRESS_OF_GOT_ENTRY_FOR_i -(.LPIC0+4)    // C

The instructions marked C load the actual address of a global variable. This
sequence uses only 3 instructions and 1 constant pool entry, so it is both
smaller and faster.

But it is not always beneficial to use instruction sequence C. If there are
many global variable accesses, the GOT address computed by sequence A is
shared, and with code sequence B each global variable needs 2 extra
instructions to load its address.
With code sequence C, each global variable needs 3 extra instructions to load
its address. Suppose there are n global variables. The code size needed to
compute the actual addresses with instruction sequences A and B is:

    code_size(A) + code_size(B) * n = 2*2 + 4 + (2*2 + 4) * n = 8n + 8    <1>

The code size needed with instruction sequence C is:

    code_size(C) * n = (3*2 + 4) * n = 10n    <2>

Setting <1> = <2>, we get

    8n + 8 = 10n
    n = 4

So if there are more than 4 global variable accesses, instruction sequences A
and B are smaller; if there are fewer than 4, instruction sequence C is
smaller. If there are exactly 4, both methods have the same code size, but code
sequence C has one less memory load (the load in instructions A) and uses one
less register (the global register holding the GOT address). So code sequence C
is still faster.

For the ARM instruction set, both methods use the same instruction sequences
but with different code sizes. Now we have

    4 * 2 + 4 + (4 * 2 + 4) * n = (4 * 3 + 4) * n
    n = 3

So the threshold value of n is 3 for the ARM instruction set.

Now the problem is how to represent the offset from a code label to a global
variable's GOT entry. Ian mentioned that the ARM relocation R_ARM_GOT_PREL can
be used, but I can't find how to represent this relocation in the GNU
assembler. Any suggestions?

-- 
           Summary: Simplify global variable's address loading with option
                    -fpic
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: carrot at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129
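
The break-even arithmetic in this report can be cross-checked with a small C
sketch. It is only an illustration of the size model stated above (2-byte
Thumb or 4-byte ARM instructions, 4-byte constant pool words). The helper
names size_ab, size_c and break_even are hypothetical and are not part of GCC.

```c
#include <assert.h>

/* Every constant pool entry is one 4-byte word (assumption taken from the
   size formulas in the report). */
enum { POOL_WORD = 4 };

/* Sequences A+B: the GOT base is computed once (2 instructions + 1 word),
   then each global variable costs 2 instructions + 1 word. */
static int size_ab(int insn_bytes, int n)
{
    assert(insn_bytes > 0 && n >= 0);
    return (2 * insn_bytes + POOL_WORD) + (2 * insn_bytes + POOL_WORD) * n;
}

/* Sequence C: each global variable costs 3 instructions + 1 word. */
static int size_c(int insn_bytes, int n)
{
    assert(insn_bytes > 0 && n >= 0);
    return (3 * insn_bytes + POOL_WORD) * n;
}

/* Smallest n >= 1 at which sequence C stops being strictly smaller
   than sequences A+B. */
static int break_even(int insn_bytes)
{
    int n = 1;
    while (size_c(insn_bytes, n) < size_ab(insn_bytes, n))
        n++;
    return n;
}
```

Under this model, break_even(2) gives 4 for Thumb and break_even(4) gives 3 for
ARM, which agrees with the thresholds derived above. At those values of n both
methods have the same size.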