From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16015 invoked by alias); 10 Mar 2013 23:43:46 -0000
Received: (qmail 15973 invoked by uid 48); 10 Mar 2013 23:43:32 -0000
From: "olegendo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/56592] New: [SH] Add vector ABI
Date: Sun, 10 Mar 2013 23:43:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: olegendo at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2013-03/txt/msg00814.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56592

Bug #: 56592
Summary: [SH] Add vector ABI
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: olegendo@gcc.gnu.org
CC: kkojima@gcc.gnu.org
Target: sh*-*-*

On SH there are a couple of ABI related issues which unfortunately can't be
all fixed without breaking binary compatibility.  Thus the idea to add a new
ABI which can be selected by a target -mabi=vector option.
Already existing ABIs could also be selected based on this option:
  -mrenesas   -> -mabi=renesas
  -mnorenesas -> -mabi=gnu

Some of the primary issues that the vector ABI is supposed to improve are:

----------------------
PR 13423 sh-elf: V4SFmode passed in integer registers
Float vectors, float arrays (of fixed size) or structs of floats, when passed
by value, should be passed in FP regs entirely.  The current ABI allows
passing of up to 8 FP regs (FR4..FR11), so there would be space to pass two
4D float vectors.  It should also be possible to return 4D float vectors in
registers.  Since FR0..FR11 are call clobbered, they can as well be used to
return multiple vectors.

----------------------
PR 53513 SH Target: Add support for fschg and fpchg insns
Although this PR could be solved without breaking the ABI too much, there are
some issues which could be fixed in a new ABI.
The current approach is to use two global variables (__fpscr_values) in order
to perform FPU single/double mode switching.  The default FPU precision
setting is defined by an -m option.  Currently there are three such FPU
default modes:
  - double mode default
  - single mode default
  - single mode only

When changing the FPU mode the current FPSCR setting is overwritten with one
of the global values from __fpscr_values.  This is the fastest way (on
non-SH4A) to perform a mode switch, but it has some disadvantages.  One of
them is PR 6526.  In general all information in FPSCR is lost after performing
a mode switch this way, e.g. it is not possible to read FPU exception causes
after a series of operations.  Moreover, in multi-threaded environments it is
not possible to set the default FPSCR setting (e.g. rounding mode or denormal
handling) for threads independently.

In order to minimize mode switches the function signature can be taken into
account when deciding the default FPU precision for a particular function.
E.g.
when a function has any double precision arguments, it can be assumed that
the function will use the double values in some way.  Thus the default entry
mode for such a function should be 'double'.  Similarly, for functions that
return double values it can be better to leave the function with 'double'
mode.  Because of this, '-m4 -mvabi' and '-m4-single -mvabi' would actually
result in the same ABI.

It should also be possible to override the FPSCR.PR settings for function
entry and function leave via function attributes.  This can be useful e.g. in
cases where hand written asm FPU routines that expect certain settings are
invoked from C/C++ code.  E.g. code that uses the 'frchg' insn to flip the
FPSCR.FR bit on SH4 must be executed with FPSCR.PR = 0.

----------------------
PR 52441 Target: Double sign/zero extensions for function arguments
Values that are < 32 bit in size and are passed in registers usually have
undefined high bits.  The standard GNU calling convention thus performs
sign/zero extension of such values before the function call and inside the
function itself.  The Renesas calling convention (-mrenesas) however only
extends values inside the function.
Whether an extension is actually required at all depends on how the value is
used.  This is known only inside of a function.  Thus adopting the Renesas
calling convention in this case is more efficient.

----------------------
Register ordering for arguments.
I don't remember in which PR this was mentioned but the current GNU calling
convention allocates FR registers on big endian like:
  FR4 = arg0
  FR5 = arg1
  FR6 = arg2
  FR7 = arg3
  ...

and on little endian:
  FR4 = arg1
  FR5 = arg0
  FR6 = arg3
  FR7 = arg2
  ...

This can make writing endian neutral asm code more complicated.  The ordering
for little endian should be the same as for big endian (which is also
equivalent to the -mrenesas ABI).

----------------------
Alignment of double precision FP values.
Currently the default alignment for those is 32 bit and can be changed to
64 bit by the option -mdalign.  In order to be able to maximize the
utilization of 64 bit fmov insns, 64 bit double alignment should be the
default.

----------------------
Boolean function return values
A boolean return value of a function tends to be produced inside the function
by using some sort of comparison insns which store the comparison result in
the T bit.  The T bit is then transferred to a GP reg before returning from
the function.  On the caller side, the value in the GP reg is then often
tested for != 0 followed by a conditional branch.  The redundant != 0 test
can be eliminated by returning boolean values in the T bit directly.
However, there might be compatibility problems with C code that typedefs its
own bool type as signed/unsigned char or something else.

----------------------
Variadic functions
Passing variable number of arguments ('...') over the stack as it is
currently done with -mrenesas tends to produce more efficient code,
especially when traversing the va_list.

----------------------
ABI summary I've got so far

R0..R3:           Call-clobbered.
                  Function return values / scratch registers.
                  High bits of values < 32 bit are undefined.

R4..R7:           Call-clobbered.
                  Function arguments / scratch registers.
                  High bits of values < 32 bit are undefined.

R8..R15:          Call-saved.
                  R15: stack pointer
                  R14: frame pointer (optional)
                  R12: GOT pointer (optional, for PIC code)

PR:               Call-saved.  Function return address.

SR.S:             '0' (MAC saturation disabled) at function entry and
                  function leave.

SR.T:             Call-clobbered.  Boolean return value.

SR.M, SR.Q:       Call-clobbered.

Other SR bits:    Ignored by the compiler.

GBR:              Call-saved.
                  Pointer to current execution context (thread).

MACL,MACH:        Call-clobbered.  Scratch registers.

FPUL:             Call-clobbered.  Scratch register.

FR0..FR3:         Call-clobbered.
                  Function return values / scratch registers.

FR4..FR7:         Call-clobbered.
                  Function arguments / return values / scratch registers.
FR8..FR11:        Call-clobbered.
                  Function arguments / scratch registers.

FR12..FR15:       Call-saved.  Local variables.

XF0..XF15:        Undefined, not modified by compiler generated code.

FPSCR.FR:         Undefined, not modified by compiler generated code.

FPSCR.SZ:         '0' (32 bit fmov) on function entry / leave by default.

FPSCR.PR:         Function entry:
                    '0' (single precision) if the function takes no floating
                    point arguments, or if the number of 'float' arguments
                    is greater than the number of 'double' arguments,
                    '1' otherwise.
                  Function leave:
                    Unmodified if the function returns 'void' or integral
                    values or aggregates.
                    '0' if the function returns more 'float' values than
                    'double' values,
                    '1' otherwise.
                  '0' on exception handler entry.

Other FPSCR bits: Undefined, not modified by compiler generated code.

When counting the number of 'float' and 'double' values elements of vectors
are counted as individual values.  I.e. a 4D 'float' vector has more 'float'
values than a 2D 'double' vector has 'double' values.  va_args are ignored.

Function argument/return value aggregates are decomposed so that the
individual members can be passed in different register classes, based on the
data type.  E.g.

  struct FuncArg
  {
    int a;        // -> r4
    int b;        // -> r5
    float c;      // -> fr4
  };

  struct FuncArg
  {
    int a;        // -> r4
    int b;        // -> r5
    float c;      // -> fr4
    double d;     // -> dr6 (fr6:fr7)
    bool e;       // -> T
    float f;      // -> fr5
  };

  struct FuncArg
  {
    int a;        // -> r4
    int b;        // -> r5
    int c;        // -> r6
    int d;        // -> r7
  };

  struct FuncArg
  {
    int a;        // -> r4
    int b;        // -> r5
    int c;        // -> r6
    long long d;  // -> stack
    short e;      // -> r7
  };

  struct FuncArg
  {
    float a;      // -> fr4
    float b;      // -> fr5
    float c;      // -> fr6
    float d;      // -> fr7
  };

Return values/aggregates that don't fit into registers are returned partially
in registers and partially onto the caller's stack.  In this case R2 is used
to pass the hidden pointer to the remaining return values.
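
The struct decomposition examples above could be modelled roughly like the following C sketch.  It is only an illustration, not GCC code: the names 'enum kind', 'struct alloc_state' and 'assign_member' are made up here, and two rules are guesses that the proposal does not state (a 64 bit integer is assumed to need two consecutive free GP regs, and a second 'bool' is assumed to fall back to a GP reg once T is taken).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Member kinds that appear in the FuncArg examples (hypothetical names). */
enum kind { K_INT, K_INT64, K_FLOAT, K_DOUBLE, K_BOOL };

/* One "used" flag per argument register. */
struct alloc_state {
    bool gp[4];   /* r4..r7 */
    bool fp[8];   /* fr4..fr11 */
    bool t_used;  /* SR.T */
};

/* Writes the location chosen for one aggregate member into buf,
   e.g. "r4", "fr5", "dr6", "T" or "stack". */
void assign_member(struct alloc_state *s, enum kind k, char *buf, size_t len)
{
    int i;

    switch (k) {
    case K_BOOL:
        if (!s->t_used) {
            s->t_used = true;
            snprintf(buf, len, "T");
            return;
        }
        /* Assumption: a second bool is treated like a small integer. */
        /* fall through */
    case K_INT:
        /* Values of up to 32 bit take the lowest free GP reg. */
        for (i = 0; i < 4; i++)
            if (!s->gp[i]) {
                s->gp[i] = true;
                snprintf(buf, len, "r%d", i + 4);
                return;
            }
        break;
    case K_INT64:
        /* Assumption: needs two consecutive free GP regs. */
        for (i = 0; i + 1 < 4; i++)
            if (!s->gp[i] && !s->gp[i + 1]) {
                s->gp[i] = s->gp[i + 1] = true;
                snprintf(buf, len, "r%d:r%d", i + 4, i + 5);
                return;
            }
        break;
    case K_FLOAT:
        /* Lowest free FR reg, which also back-fills holes (fr5 above). */
        for (i = 0; i < 8; i++)
            if (!s->fp[i]) {
                s->fp[i] = true;
                snprintf(buf, len, "fr%d", i + 4);
                return;
            }
        break;
    case K_DOUBLE:
        /* Lowest free even/odd FR pair, i.e. a DR reg. */
        for (i = 0; i + 1 < 8; i += 2)
            if (!s->fp[i] && !s->fp[i + 1]) {
                s->fp[i] = s->fp[i + 1] = true;
                snprintf(buf, len, "dr%d", i + 4);
                return;
            }
        break;
    }
    /* Nothing suitable left: this member goes onto the stack. */
    snprintf(buf, len, "stack");
}
```

Starting from a zeroed 'struct alloc_state' and calling 'assign_member' once per member in declaration order gives the same locations as the comments in the second and fourth FuncArg examples, under the assumptions named above.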
Argument aggregates that don't fit into registers are passed partially in
registers and the remaining pieces are pushed onto the stack.

va_args are passed on the stack entirely (simpler traversal of va_list).

'double' values are passed in DR registers, where the high 32 bits are passed
in FR(n*2) and the low 32 bits in FR(n*2+1) regardless of the endian setting.

4D 'float' vectors are passed in FV registers, i.e. FR(n*4), in order to
avoid reg copies before vector insns (fipr, ftrv).

SH targets that don't support double precision floating-point in hardware
handle the operations in software, but should accept the same ABI otherwise.
This would fix e.g. PR 36939.

I'm not sure how to integrate untyped calls and whether this kind of ABI
would require additional extensions to GDB.  Probably there are also lots of
other details missing for this to be a complete ABI definition.
Any suggestions and feedback are highly appreciated.
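
As a small illustration of the DR and FV register rules above, the following C sketch splits a 'double' into the two 32 bit words that would go into FR(n*2) and FR(n*2+1), and computes the first FR reg of a DR or FV register.  All names ('struct dr_pair', 'split_double', 'dr_first_fr', 'fv_first_fr') are made up for this example, and it assumes a host with 64 bit IEEE 754 doubles.

```c
#include <stdint.h>
#include <string.h>

/* The two halves of a 'double' as they would sit in a DR register. */
struct dr_pair {
    uint32_t fr_even;  /* FR(n*2):   high 32 bits */
    uint32_t fr_odd;   /* FR(n*2+1): low 32 bits */
};

/* Splits a double into its high and low 32 bit words.  The shift works
   on the value, so the result does not depend on the host byte order. */
struct dr_pair split_double(double value)
{
    uint64_t bits;
    struct dr_pair p;

    /* memcpy avoids aliasing problems when reading the bit pattern. */
    memcpy(&bits, &value, sizeof bits);
    p.fr_even = (uint32_t)(bits >> 32);
    p.fr_odd = (uint32_t)(bits & 0xFFFFFFFFu);
    return p;
}

/* First FR reg number of the n-th DR register, i.e. FR(n*2). */
unsigned dr_first_fr(unsigned n)
{
    return n * 2;
}

/* First FR reg number of the n-th FV register, i.e. FR(n*4). */
unsigned fv_first_fr(unsigned n)
{
    return n * 4;
}
```

For example, 'split_double(1.0)' yields 0x3FF00000 for the even FR reg and 0 for the odd one, and 'dr_first_fr(3)' is 6, which corresponds to dr6 (fr6:fr7) in the struct example.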