From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 10252 invoked by alias); 24 Jul 2002 18:34:54 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 10147 invoked from network); 24 Jul 2002 18:34:53 -0000
Received: from unknown (HELO hub.ott.qnx.com) (209.226.137.76)
  by sources.redhat.com with SMTP; 24 Jul 2002 18:34:53 -0000
Received: from smtp.ott.qnx.com (smtp.ott.qnx.com [10.0.2.158])
  by hub.ott.qnx.com (8.9.3/8.9.3) with ESMTP id OAA16466
  for ; Wed, 24 Jul 2002 14:32:09 -0400
Received: from node128.ott.qnx.com (node128 [10.0.0.128])
  by smtp.ott.qnx.com (8.8.8/8.6.12) with ESMTP id OAA21628
  for ; Wed, 24 Jul 2002 14:27:58 -0400
Received: by node128.ott.qnx.com (8.9.3/8.9.3) id SAA897593406
  for gcc@gcc.gnu.org; Wed, 24 Jul 2002 18:32:36 GMT
Message-Id: <200207241832.SAA897593406@node128.ott.qnx.com>
Subject: QNX Neutrino mips PIC - RFC
To: gcc@gcc.gnu.org
Date: Wed, 24 Jul 2002 15:53:00 -0000
From: "Graeme Peterson"
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2002-07/txt/msg01171.txt.bz2

Hi, all.

I am working on QNX Neutrino support in gcc, gdb and binutils for
arm, mips, ppc, sh4 and x86.

Here is a doc describing the QNX PIC conventions used on mips.
Any and all feedback appreciated. I have also posted this on the
binutils mailing list.

Thanks in advance.

Regards,
GP

=======================

This is a preliminary document describing the calling convention used
by PIC code on QNX/Neutrino running on the MIPS.

A. Introduction

The MIPS ABI describes a calling convention for implementing
Position-Independent Code ('PIC'). While the ABI calling convention is
well-established, it has a couple of drawbacks which make it less than
ideal for use in an embedded environment. These are listed below:

  1.
     Because of the way an ABI PIC function determines the address of
     its Global Offset Table ('GOT'), it requires its own address to
     be passed in register $25 on function entry. The practical
     consequence of this requirement is that all functions (in both
     the PIC libraries and the executable) have to do indirect calls,
     i.e. calls through register $25. This means that all code must
     be compiled PIC (and pay the size penalty of PIC code).

  2. The GCC compiler/assembler does not do a great job at code
     generation for MIPS 'abicalls' code (PIC code). Unnecessary NOPs
     are inserted, and the function prologue always contains the code
     to compute the GOT address, even if there are no GOT references
     within that function.

The first point was particularly troublesome, as it meant that all
applications using a shared library would have to be compiled PIC,
which results in a significant code size increase.

For these reasons, we decided to make simple modifications to the
calling convention, which solve the first problem. While coding the
new calling convention in GCC, we also implemented various
optimizations, which reduce the code expansion of PIC code.

The following sections describe the new MIPS PIC convention, hereafter
called "QNX PIC".

B. QNX PIC calling convention on MIPS

The calling convention for PIC code follows the ABI spec for register
assignment, stack layout and parameter passing. However, it differs
from the ABI in the following respects:

  1. PIC code should never clobber the gp ($28) register.

  2. PIC code reserves register s7 ($23) to store the address of its
     GOT. All symbol references within that PIC module ("library")
     are made through the GOT, and are thus addressed as offsets
     from s7.

  3. Every PIC function which needs to access a symbol from the GOT
     should load register s7 at the end of the function prologue,
     before any GOT symbols are accessed.
     The code used to load s7 with the address of the GOT is as
     follows:

             bltzal  $0,0
             nop
         0:  lui     $s7, %gothi
             addiu   $s7, $s7, %gotlo
             add     $s7, $s7, $ra

     The %gothi / %gotlo pair are special relocations output by the
     assembler. Since the above code implicitly destroys $ra and $s7,
     they must be saved in the function prologue prior to the loading
     of the GOT.

  4. All function calls from a PIC function have to be indirect
     calls, done through a register. However, the register does not
     have to be $25 as in ABI PIC code. For example:

             la      $t3, printf
             jalr    $t3

     which becomes:

             lw      $t3,printf@got($s7)
             jalr    $t3

     Note that the notation "printf@got" simply means "offset of the
     address entry for printf in the GOT".

  5. All global data references also have to be done through the
     GOT, i.e.:

             lw      $t1,myglobal@got($s7)
             lw      $t0,0($t1)

With the changes above, QNX PIC code is truly relocatable, and does
not require the calling code to be compiled PIC. Thus, the non-library
code (the "executables") can be normally-compiled MIPS objects.

C. Relocations

In order for the executable and the library to share global data, we
must define a new copy relocation type. This is similar to what is
already defined in the X86 and PPC ABIs. The new relocation is defined
as follows:

        #define R_MIPS_QNX_COPY 126

An R_MIPS_QNX_COPY relocation is emitted by the linker whenever a data
symbol defined in a shared library is used in an executable. It
results in space being allocated for this symbol in the executable's
bss. At process startup, the dynamic linker copies the data from the
library to the process, and ensures that all library code points to
the executable's copy of the symbol.

D. Calling library functions from the main executable

Calling functions in the library from non-PIC code (i.e. from the main
executable) must be done through stubs. These are generated
automatically by the linker for any function that is located in a
shared library and is called by the main executable.
The stub's purpose is to load that function's address from the
executable's GOT, and then jump to the function. For example, if the
executable calls printf(), then the following stub will be generated
(and the executable will actually call this stub instead of directly
calling printf):

        printf_stub:
                lw      $25, printf@got($gp)
                jr      $25
                nop

E. Toolchain modifications

In order to implement QNX PIC code generation, the following
modifications to the toolchain were needed:

1. CC1

Modify cc1 so that, when the -mqnxpic option is passed, it generates
code which follows the above calling convention. Note that the code to
compute the GOT address in the function prologue is generated by the
assembler. The compiler outputs the ".cpload" pseudo-op, which the
assembler expands. The compiler also instructs the assembler to
generate QNX PIC code by emitting the ".set qnxpiccalls" at the
beginning of every assembly file.

An example of cc1 output for QNX PIC code is shown below:

__________________________________________
        .file   1 "test.c"
        .qnxpiccalls
gcc2_compiled.:
__gnu_compiled_c:
        .globl  main
        .ent    main
main:
        .frame  $fp,72,$31      # vars= 32, regs= 4/0, args= 24, extra= 0
        subu    $sp,$sp,72
        sw      $ra,68($sp)
        sw      $fp,64($sp)
        sw      $s7,60($sp)
        sw      $s0,56($sp)
        move    $fp,$sp
        .cpload $31             # Pseudo-op to load GOT ptr into s7
        la      $16,printf
        jal     $31,$16
__________________________________________

Thus, registers which need to be saved are pushed on the stack in the
function prologue, including $ra and $s7 which are destroyed by the
".cpload" pseudo-op.

2. GAS

The GNU assembler ("GAS") was also modified to generate QNX PIC code.
As mentioned above, the ".set qnxpiccalls" pseudo-op can be used to
indicate to the assembler that QNX PIC code is being generated. The
assembler will also expand the ".cpload" pseudo-op into the right code
sequence (including the appropriate relocations).

The assembler's behavior with respect to global symbols defined in the
current source file was modified.
The default behavior is for the assembler to emit a single "section"
GOT symbol for the file's global data, and compute the addresses of
the data symbols as offsets from that section symbol. This has the
advantage of saving GOT entries for global symbols which are only used
in the source file where they are defined, but has the disadvantage
that it is impossible to override which copy of a given global symbol
that source file points to. Thus, when several libraries define the
same data symbol, it may not be possible to have all functions point
to the same copy of that symbol. In the case of QNX PIC code, all
global symbols get a distinct GOT entry, which solves that problem.

Modifications were also done so that GAS does not emit unnecessary
nops when generating code for mips2+ CPUs. Other optimizations
included replacing the "nop" in the ".cpload" pseudo-op by an
appropriate op-code, if one was found in the function prologue.

The output from GAS for the above assembly code is shown below:

--------------------------------------
        addiu   $sp,$sp,-72
        sw      $ra,68($sp)
        sw      $fp,64($sp)
        sw      $s7,60($sp)
        bltzal  $zero,0f
        sw      $s0,56($sp)     # Assembler optimization
0:      lui     $s7,0x0         # GOTHI
        addiu   $s7,$s7,0       # GOTLO
        addu    $s7,$s7,$ra
        lw      $s0,0($s7)      # GOT16: offset of printf in GOT
        jalr    $s0
---------------------------------------

3. LD

Modifications were also done to 'ld', the GNU linker. The first was to
generate the R_MIPS_QNX_COPY relocations. The second was to have the
linker generate the proper stubs.

F. Toolchain optimizations

GCC code generation was optimized in several ways:

- Calls to static functions within the same module are done using a
  branch ('bal') instead of a jump. This is implicitly
  position-independent.

- Do not output the .cpload pseudo-op (to load the GOT address into
  s7) for functions that do not require it.
  This includes leaf functions that do not reference any global data,
  non-leaf functions that only call themselves recursively, and
  functions which only call static functions in the same module.

- Allow GCC to optimize the filling of the branch delay slot for QNX
  PIC code.

- Have GCC explicitly load function addresses into a register and do
  jumps through that register, instead of having the assembler expand
  this. This allows GCC to do common subexpression elimination of
  function addresses, and also allows the GCC scheduler to do the
  address load a few cycles before the jump.
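
As a rough illustration of the second optimization in the list above
(deciding when the .cpload pseudo-op can be left out), the rule can be
sketched in Python. This is not the GCC implementation: FunctionInfo,
needs_cpload and their fields are hypothetical names invented for this
sketch, and it encodes only the three cases named in this document.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionInfo:
    """Hypothetical per-function summary; not a real GCC data structure."""
    name: str
    is_static: bool = False
    global_data_refs: set = field(default_factory=set)  # global data symbols used
    callees: set = field(default_factory=set)           # names of called functions


def needs_cpload(fn, module):
    """Return True if fn must load the GOT address into s7.

    module maps function names to FunctionInfo for every function
    defined in the same module as fn.
    """
    # Any global data reference goes through the GOT.
    if fn.global_data_refs:
        return True
    for callee in fn.callees:
        # Self-recursion is listed in the document as not needing the GOT.
        if callee == fn.name:
            continue
        # Static functions in the same module are reached with 'bal'.
        target = module.get(callee)
        if target is not None and target.is_static:
            continue
        # Every other call is an indirect call through a GOT entry.
        return True
    return False


# Made-up module used only to exercise the rule.
module = {
    "helper": FunctionInfo("helper", is_static=True),
    "fact": FunctionInfo("fact", callees={"fact"}),
    "wrapper": FunctionInfo("wrapper", callees={"helper"}),
    "main": FunctionInfo("main", callees={"printf"}),
    "counter": FunctionInfo("counter", global_data_refs={"myglobal"}),
}

for name in sorted(module):
    print(name, needs_cpload(module[name], module))
```

In this made-up module only main (which calls printf) and counter
(which reads myglobal) need the GOT pointer; the other three functions
fall under the cases listed above.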