From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-57082-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 10252 invoked by alias); 24 Jul 2002 18:34:54 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 10147 invoked from network); 24 Jul 2002 18:34:53 -0000
Received: from unknown (HELO hub.ott.qnx.com) (209.226.137.76)
  by sources.redhat.com with SMTP; 24 Jul 2002 18:34:53 -0000
Received: from smtp.ott.qnx.com (smtp.ott.qnx.com [10.0.2.158])
	by hub.ott.qnx.com (8.9.3/8.9.3) with ESMTP id OAA16466
	for <gcc@gcc.gnu.org>; Wed, 24 Jul 2002 14:32:09 -0400
Received: from node128.ott.qnx.com (node128 [10.0.0.128]) by smtp.ott.qnx.com (8.8.8/8.6.12) with ESMTP id OAA21628 for <gcc@gcc.gnu.org>; Wed, 24 Jul 2002 14:27:58 -0400
Received: by node128.ott.qnx.com (8.9.3/8.9.3) id SAA897593406
	for gcc@gcc.gnu.org; Wed, 24 Jul 2002 18:32:36 GMT
Message-Id: <200207241832.SAA897593406@node128.ott.qnx.com>
Subject: QNX Neutrino mips PIC - RFC
To: gcc@gcc.gnu.org
Date: Wed, 24 Jul 2002 15:53:00 -0000
From: "Graeme Peterson" <gp@qnx.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2002-07/txt/msg01171.txt.bz2

Hi, all.

I am working on QNX Neutrino support in gcc, gdb and binutils
for arm, mips, ppc, sh4 and x86.

Here is a doc describing the QNX PIC conventions used on
mips.  Any and all feedback appreciated.  

I have also posted this on the binutils mailing list.

Thanks in advance.

Regards,
GP

=======================


This is a preliminary document describing the calling convention 
used by PIC code on QNX/Neutrino running on the MIPS.


A. Introduction

The MIPS ABI describes a calling-convention for implementing Position-
Independent Code ('PIC'). While the ABI calling convention is well-established, 
it has a couple of drawbacks which make it less than ideal for use in an
embedded environment. These are listed below:

1. Because of the way an ABI PIC function determines the address of its 
Global Offset Table ('GOT'), it requires its own address to be passed
in register $25 on function entry. The practical consequence of this 
requirement is that all functions (in both the PIC libraries 
and the executable) have to do indirect calls, i.e. calls through
register $25. This means that all code must be compiled PIC (and pay the size
penalty of PIC code).

2. The GCC compiler/assembler does not do a great job of code generation for 
MIPS 'abicalls' code (PIC code). Unnecessary NOPs are inserted, and the
function prologue always contains the code to compute the GOT address, even
if there are no GOT references within that function.

The first point was particularly troublesome, as it meant that all applications
using a shared library would have to be compiled PIC, which results in
a significant code size increase.

To address these drawbacks, we decided to make some simple modifications to
the calling convention, which solve the first problem. While coding the
new calling convention in GCC, we also implemented various optimizations that
reduce the code expansion of PIC code. The following sections describe the
new MIPS PIC convention, hereafter called "QNX PIC".


B. QNX PIC calling convention on MIPS

The calling convention for PIC code follows the ABI spec for register 
assignment, stack layout and parameter passing. However, it differs from
the ABI in the following respects:

1. PIC code should never damage the gp ($28) register.

2. PIC code reserves register s7 ($23) to store the address of its GOT.
   All symbol references within that PIC module ("library") are made 
   through the GOT, and are thus addressed as offsets from s7. 

3. Every PIC function which needs to access a symbol from the GOT should
   load register s7 at the end of the function prologue, before any GOT symbols
   are accessed. The code used to load s7 with the address of the GOT is as 
   follows:

   bltzal 	$0, 0f
   nop
0: lui		$s7, %gothi
   addiu	$s7, $s7, %gotlo
   addu		$s7, $s7, $ra

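   The %gothi / %gotlo pair are special relocations output by the assembler.
   Because $0 is never negative, the bltzal is never taken, but it still
   writes its link address (the address of the lui at label 0) into $ra.

   Since the above code implicitly destroys $ra and $s7, they must be saved 
   in the function prologue prior to the loading of the GOT. 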

4. All function calls from a PIC function have to be indirect calls, done
   through a register. However, the register does not have to be $25 as in
   ABI PIC code. For example, the call

   la 	$t3, printf
   jalr	$t3 

   is expanded by the assembler to:

   lw	$t3,printf@got($s7)
   jalr	$t3

   Note that the notation "printf@got" simply means "offset of the address 
   entry for printf in the GOT".

5. All global data references also have to be done through the GOT, i.e.:

   lw	$t1,myglobal@got($s7)
   lw	$t0,0($t1)
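
The arithmetic behind the GOT-pointer sequence in item 3 can be sketched
in C. This is an illustration under assumptions, not the toolchain's
code: the exact semantics of %gothi / %gotlo are not spelled out above,
so the sketch assumes they split the 32-bit offset from label 0 to the
GOT the same way the standard MIPS %hi / %lo operators split an address
(the high half is rounded to compensate for addiu sign-extending the low
half). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed %gothi: upper 16 bits, rounded so that adding the
 * sign-extended lower half reproduces the full offset. */
uint16_t got_hi(uint32_t offset)
{
    return (uint16_t)((offset + 0x8000u) >> 16);
}

/* Assumed %gotlo: lower 16 bits of the offset. */
uint16_t got_lo(uint32_t offset)
{
    return (uint16_t)(offset & 0xFFFFu);
}

/* Models: lui $s7,hi ; addiu $s7,$s7,lo ; addu $s7,$s7,$ra
 * where ra is the link address written by bltzal (label 0). */
uint32_t load_got_pointer(uint32_t ra, uint16_t hi, uint16_t lo)
{
    uint32_t s7 = (uint32_t)hi << 16;           /* lui                  */
    s7 += ((uint32_t)lo ^ 0x8000u) - 0x8000u;   /* addiu (sign-extends) */
    s7 += ra;                                   /* addu                 */
    return s7;
}

/* Round trip: split the offset, run the sequence, and check that the
 * result is the GOT address again. */
uint32_t got_address(uint32_t ra, uint32_t got)
{
    uint32_t offset = got - ra;
    uint32_t s7 = load_got_pointer(ra, got_hi(offset), got_lo(offset));
    assert(s7 == got);
    return s7;
}
```

The arithmetic is modulo 2^32, so the same split works whether the GOT
lies above or below the code.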

With the changes above, QNX PIC code is truly relocatable, and does not
require the calling code to be compiled PIC. Thus, the non-library code 
(the "executables") can be normally-compiled MIPS objects.
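
In C terms, the GOT accesses of items 4 and 5 each cost one extra
pointer load. The sketch below models the GOT as an array of slots
addressed from s7. It is illustrative only: the slot indices, the
function "twice" and the table "example_got" are made up for the
example, and in real code the assembler and linker assign the offsets.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(int);

/* One GOT slot holds either a function address or a data address. */
union got_slot {
    int_fn func;
    int   *data;
};

/* Hypothetical slot indices, standing in for "sym@got" offsets. */
enum { GOT_TWICE = 0, GOT_MYGLOBAL = 1, GOT_SIZE = 2 };

/* Models: lw $t3,twice@got($s7) ; jalr $t3 */
int call_through_got(const union got_slot *s7, int arg)
{
    assert(s7 != NULL);
    int_fn target = s7[GOT_TWICE].func;   /* load address from the GOT */
    return target(arg);                   /* indirect call             */
}

/* Models: lw $t1,myglobal@got($s7) ; lw $t0,0($t1) */
int read_global_through_got(const union got_slot *s7)
{
    assert(s7 != NULL);
    const int *addr = s7[GOT_MYGLOBAL].data;  /* address from the GOT */
    return *addr;                             /* the data itself      */
}

/* Sample symbols and a GOT that points at them. */
int myglobal = 42;
int twice(int x) { return 2 * x; }

union got_slot example_got[GOT_SIZE] = {
    [GOT_TWICE]    = { .func = twice },
    [GOT_MYGLOBAL] = { .data = &myglobal },
};
```

Because every reference goes through a slot, repointing a slot changes
what the library code sees without touching the code itself.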


C. Relocations

In order for the executable and the library to share global data, we
must define a new copy relocation type. This is similar to what is
already defined in the X86 and PPC ABIs. The new relocation is defined
as follows:

#define R_MIPS_QNX_COPY 126

An R_MIPS_QNX_COPY relocation is emitted by the linker whenever a data symbol
defined in a shared library is used in an executable. It results in space
being allocated for this symbol in the executable's bss. At process startup,
the dynamic linker copies the data from the library to the process, and 
ensures that all library code points to the executable's copy of the
symbol.
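
The startup-time handling of a copy relocation can be pictured roughly
as follows. This is a sketch, not the QNX dynamic linker: the struct
"copy_reloc" and its fields are hypothetical and stand for the
information the dynamic linker would have after looking the symbol up
in both the executable and the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical resolved form of one R_MIPS_QNX_COPY relocation. */
struct copy_reloc {
    void       *exe_copy;      /* space reserved in the executable's bss */
    const void *lib_initial;   /* initial data in the shared library     */
    size_t      size;          /* size of the symbol in bytes            */
    void      **lib_got_slot;  /* library GOT entry for the symbol       */
};

void apply_copy_reloc(const struct copy_reloc *r)
{
    assert(r != NULL && r->exe_copy != NULL && r->lib_got_slot != NULL);
    memcpy(r->exe_copy, r->lib_initial, r->size); /* copy initial value  */
    *r->lib_got_slot = r->exe_copy;               /* library now uses the
                                                     executable's copy   */
}

/* Small self-check with made-up data; returns 1 on success. */
int demo_copy_reloc(void)
{
    static int lib_value = 1234;      /* definition inside the library */
    static int exe_copy;              /* executable's bss, starts at 0 */
    void *lib_got_slot = &lib_value;  /* library initially uses its own */

    struct copy_reloc r = { &exe_copy, &lib_value,
                            sizeof lib_value, &lib_got_slot };
    apply_copy_reloc(&r);
    return exe_copy == 1234 && lib_got_slot == (void *)&exe_copy;
}
```

The second step is only possible because the library reaches the symbol
through its GOT, as required by the calling convention in section B.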


D. Calling library functions from the main executable

Calling functions in the library from non-PIC code (i.e. from the main 
executable) must be done through stubs. These are generated automatically 
by the linker for any function that is located in a shared library
and is called by the main executable. The stub's purpose is to load that
function's address from the executable's GOT, and then jump to the 
function. For example, if the executable calls printf(), then the following
stub will be generated (and the executable will actually call this stub
instead of directly calling printf):

	printf_stub:
		lw	$25, printf@got($gp)
		jr	$25
		nop


E. Toolchain modifications

In order to implement QNX PIC code generation, the following modifications 
to the toolchain were needed:

1. CC1:
	Modify cc1 so that, when the -mqnxpic option is passed, it generates
	code which follows the above calling convention. Note that the code
	to compute the GOT address in the function prologue is generated by
	the assembler: the compiler outputs the ".cpload" pseudo-op, which
	the assembler expands. The compiler also instructs the assembler to
	generate QNX PIC code by emitting ".set qnxpiccalls" at the
	beginning of every assembly file. An example of cc1 output for
	QNX PIC code is shown below:

__________________________________________ 
	.file	1 "test.c"
	.set	qnxpiccalls
gcc2_compiled.:
__gnu_compiled_c:
	.globl	main
	.ent	main
main:
	.frame	$fp,72,$31		# vars= 32, regs= 4/0, args= 24, extra= 0
	subu	$sp,$sp,72
	sw	$ra,68($sp)
	sw	$fp,64($sp)
	sw	$s7,60($sp)
	sw	$s0,56($sp)
	move	$fp,$sp
	.cpload $31			# Pseudo-op to load GOT ptr into s7
	la	$16,printf
	jal	$31,$16
__________________________________________
	
	Thus, registers which need to be saved are pushed on the stack
	in the function prologue, including $ra and $s7 which are destroyed
	by the ".cpload" pseudo-op.

2. GAS
	The GNU assembler ("GAS") was also modified to generate QNX PIC code.
	As mentioned above, the ".set qnxpiccalls" pseudo-op can be used to
	indicate to the assembler that QNX PIC code is being generated. The
	assembler will also expand the ".cpload" pseudo-op into the right 
	code sequence (including the appropriate relocations). 

	The assembler's behavior with respect to global symbols 
	defined in the current source file was modified. The default
	behavior is for the assembler to emit a single "section" GOT symbol
	for the file's global data, and to compute the addresses of the data
	symbols as offsets from that section symbol. This has the advantage
	of saving GOT entries for global symbols which are only used in the
	source file where they are defined, but has the disadvantage that it
	is impossible to override which copy of a given global symbol that
	source file points to. Thus, when several libraries define the same
	data symbol, it may not be possible to have all functions point to
	the same copy of that symbol. In the case of QNX PIC code, all global
	symbols get a distinct GOT entry, which solves that problem.
	
	Modifications were also made so that GAS does not emit unnecessary
	nops when generating code for mips2+ CPUs. Other optimizations
	include replacing the "nop" in the expansion of the ".cpload"
	pseudo-op with an appropriate instruction from the function prologue,
	if one is found. The output from GAS for the above assembly code is
	shown below:

--------------------------------------
	addiu	$sp,$sp,-72
	sw		$ra,68($sp)
	sw		$fp,64($sp)
	sw		$s7,60($sp)
	bltzal	$zero,0f
	sw		$s0,56($sp)			# Assembler optimization
0:	lui		$s7,0x0				# GOTHI
	addiu	$s7,$s7,0				# GOTLO
	addu	$s7,$s7,$ra
	lw		$s0,0($s7)			# GOT16: offset of printf in GOT
	jalr	$s0
---------------------------------------


3. LD
	Modifications were also made to 'ld', the GNU linker. The first was
	to generate the R_MIPS_QNX_COPY relocations. The second was to have
	the linker generate the proper stubs.  
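
The "one distinct GOT entry per global symbol" rule described under GAS
above can be pictured as a lookup-or-insert table. The sketch below is
a deliberately simplified model, not GAS or ld code: the table type,
its fixed capacity and the 4-byte slot size are assumptions, and real
GOTs also contain reserved entries that are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_GOT_SLOTS 64
#define GOT_SLOT_SIZE 4     /* bytes per entry, assuming 32-bit MIPS */

/* Hypothetical table of symbols that already own a GOT slot. */
struct got_table {
    const char *names[MAX_GOT_SLOTS];
    int         count;
};

/* Returns the byte offset of the symbol's slot, allocating a new slot
 * the first time a symbol is seen. Returns -1 if the table is full. */
int got_offset_for(struct got_table *t, const char *symbol)
{
    assert(t != NULL && symbol != NULL);
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], symbol) == 0)
            return i * GOT_SLOT_SIZE;       /* existing slot */
    if (t->count == MAX_GOT_SLOTS)
        return -1;
    t->names[t->count] = symbol;            /* new, distinct slot */
    return t->count++ * GOT_SLOT_SIZE;
}

/* Zero-initialised, so it starts out empty. */
struct got_table example_table;
```

Since each symbol has its own slot, each slot can later be bound to
whichever definition of that symbol should win.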


F. Toolchain optimizations

	GCC code generation was optimized in several ways:

	- Calls to static functions within the same module are done 
	using a branch ('bal') instead of a jump. This is implicitly 
	position-independent.

	- Do not output the .cpload pseudo-op (to load the GOT address into 
	s7) for functions that do not require it. This includes leaf 
	functions that do not reference any global data, non-leaf 
	functions that only call themselves recursively, and functions 
	which only call static functions in the same module.

	- Allow GCC to optimize the filling of the branch delay slot for
	QNX PIC code.

	- Have GCC explicitly load function addresses into a register and 
	do jumps through that register, instead of having the assembler 
	expand this. This allows GCC to do common subexpression 
	elimination of function addresses, and also allows the GCC 
	scheduler to do the address load a few cycles 
	before the jump.
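
The rule for omitting .cpload can be written as a small predicate. This
is a sketch of the conditions listed above, not the code in GCC: the
"fn_summary" record and the callee classification are hypothetical
stand-ins for whatever the compiler knows about a function when it
emits the prologue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of each call made by a function. */
enum callee_kind {
    CALLEE_SELF,                /* direct recursion                  */
    CALLEE_STATIC_SAME_MODULE,  /* reached with 'bal', no GOT needed */
    CALLEE_OTHER                /* address must come from the GOT    */
};

/* Hypothetical per-function summary. */
struct fn_summary {
    bool                    references_global_data;
    const enum callee_kind *callees;
    size_t                  n_callees;
};

/* True if the prologue must load the GOT address into s7. */
bool needs_cpload(const struct fn_summary *f)
{
    assert(f != NULL);
    if (f->references_global_data)
        return true;
    for (size_t i = 0; i < f->n_callees; i++)
        if (f->callees[i] == CALLEE_OTHER)
            return true;
    return false;
}
```

A function for which the predicate is false also has no need to save
and restore s7 and ra on account of the GOT load.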