public inbox for gnu-gabi@sourceware.org
* [AArch64 ELF ABI] Vector calls and lazy binding on AArch64
@ 2019-01-01  0:00 Szabolcs Nagy
From: Szabolcs Nagy @ 2019-01-01  0:00 UTC (permalink / raw)
  To: GNU C Library, Binutils, GCC Development, gnu-gabi
  Cc: nd, Ramana Radhakrishnan, Richard Earnshaw, Tejas Belagod,
	Richard Sandiford, Steve Ellcey, Richard Henderson

The lazy binding code of aarch64 currently only preserves q0-q7 of the
fp registers, but for an SVE call [AAPCS64+SVE] it should preserve p0-p3
and z0-z23, and for an AdvSIMD vector call [VABI64] it should preserve
q0-q23. (Vector calls are extensions of the base PCS [AAPCS64].)

A possible fix is to save and restore the additional register state in
the lazy binding entry code; this was discussed in

  https://sourceware.org/ml/libc-alpha/2018-08/msg00017.html

The main objections were:

(1) Linux may optimize the kernel entry code for processes that don't
    use SVE, so lazy binding should avoid accessing SVE registers.

(2) If this is fixed in the dynamic linker, vector calls will not be
    backward compatible with old glibc.

(3) The saved SVE register state can be large (> 8K), so binaries that
    work today may run out of stack space on an SVE system during lazy
    binding (which can e.g. happen in a signal handler on a tiny stack).

and the proposed solution was to force bind-now semantics for vector
functions, e.g. by not calling them via the PLT. This turned out to be
harder than I expected. I no longer think (1) and (2) are critically
important, but (3) is a correctness issue that is hard to argue away:
fixing it would require larger stack allocations to accommodate the
worst-case stack size increase, but stack allocation is not always
under the control of glibc, so glibc cannot provide strict guarantees.

Some approaches to make symbols "bind now" were discussed at

  https://groups.google.com/forum/#!topic/generic-abi/Bfb2CwX-u4M

The ABI change draft is below the notes. It requires marking, in the
ELF symbol table, the symbols of functions that follow the vector PCS
(or other variant PCS conventions). This is most relevant to dynamic
linkers with lazy binding support and to ELF linkers targeting AArch64,
but assemblers will need to be updated too.

Note 1: the dynamic linker may have to run user code during lazy binding
because of ifunc resolvers, so it cannot avoid clobbering fp regs.

Note 2: the tlsdesc entry is also affected by (3), so either the
initial DTV setup should avoid clobbering fp regs or the SVE register
state should not be callee-preserved by the tlsdesc call ABI. The latter
was chosen: it is backward compatible with old dynamic linkers, but TLS
access from SVE code is now as expensive as an extern call, since the
caller has to spill.

Note 3: the signal frame and SVE register spills in code using SVE can
also lead to variable stack usage (AT_MINSIGSTKSZ was introduced to
address the former issue on Linux), so it is a valid approach to just
increase the minimum stack size limits on aarch64 compared to other
targets (this is less invasive, but does not fix old binaries).

Note 4: the proposal requires marking symbols in asm and elf objects, so
it is not compatible with existing tooling (old as or ld cannot create
valid vector function symbol references or definitions) and it is only
effective with a new dynamic linker.

Note 5: -fno-plt style code generation for vector function calls might
have worked too, but on aarch64 it requires compiler and linker changes
to avoid PLT in position-dependent code when that is emitted for the
sake of pointer equality. It also requires tightening the ABI to ensure
the static linker does not introduce PLT when processing certain static
relocations. This approach would generate suboptimal statically linked
code (the no-plt code is hard to relax into direct calls on aarch64),
and it would be fragile (easy to accidentally introduce a PLT) and hard
to diagnose.

Note 6: the proposed solution applies to both SVE calls and AdvSIMD
vector calls, even though some issues only apply to SVE.

Note 7: a separate dynamic linker entry point for variant PCS calls
may be introduced (this requires further ELF changes for a PLT0-like
stub), or the dynamic linker may decide to always preserve all registers
or to always bind symbols at load time.


AAELF64: in the Symbol Table section add

 st_other Values
     The  st_other  member  of  a symbol table entry specifies the symbol's
     visibility in the lowest 2 bits.  The top 6 bits  are  unused  in  the
     generic  ELF ABI [SCO-ELF], and while there are no values reserved for
     processor-specific semantics, many other architectures have used these
     bits.

     The  defined  processor-specific  st_other  flag  values are listed in
     Table 4-5-1.

 Table 4-5-1, Processor specific st_other flags
             +------------------------+------+---------------------+
             |Name                    | Mask | Comment             |
             +------------------------+------+---------------------+
             |STO_AARCH64_VARIANT_PCS | 0x80 | The        function |
             |                        |      | associated with the |
             |                        |      | symbol may follow a |
             |                        |      | variant   procedure |
             |                        |      | call  standard with |
             |                        |      | different  register |
             |                        |      | usage convention.   |
             +------------------------+------+---------------------+

     A  symbol  table entry that is marked with the STO_AARCH64_VARIANT_PCS
     flag set in its st_other field may be associated with a function  that
     follows  a  variant  procedure  call  standard with different register
     usage convention from the one  defined  in  the  base  procedure  call
     standard  for  the  list  of  argument,  caller-saved and callee-saved
     registers [AAPCS64].  The rules  in  the  Call  and  Jump  relocations
     section  still  apply to such functions, and if a subroutine is called
     via a symbol reference that  is  marked  with  STO_AARCH64_VARIANT_PCS
     then  code that runs between the calling routine and called subroutine
     must preserve the contents of all registers except IP0,  IP1  and  the
     condition code flags [AAPCS64].

     Static  linkers  must  preserve  the  marking  and propagate it to the
     dynamic symbol table if any reference or definition of the  symbol  is
     marked  with STO_AARCH64_VARIANT_PCS, and add a DT_AARCH64_VARIANT_PCS
     dynamic tag if required by the Dynamic Section section.

     NOTE:
        In particular, when a call is made via the PLT entry  of  a  symbol
        marked with STO_AARCH64_VARIANT_PCS, a dynamic linker cannot assume
        that the call follows the register usage  convention  of  the  base
        procedure call standard.

        An  example  of  a  function  that follows a variant procedure call
        standard with different register usage convention is one that takes
        parameters in scalable vector or predicate registers.


AAELF64: in the Dynamic Section section add

 Table 5-4, AArch64 specific dynamic array tags
   +-----------------------+------------+-------+------------+---------------+
   |Name                   | Value      | d_un  | Executable | Shared Object |
   +-----------------------+------------+-------+------------+---------------+
   |DT_AARCH64_VARIANT_PCS | 0x70000005 | d_val | Platform   | Platform      |
   |                       |            |       | specific   | specific      |
   +-----------------------+------------+-------+------------+---------------+

     DT_AARCH64_VARIANT_PCS must be present if there are  R_<CLS>_JUMP_SLOT
     relocations     that     reference    symbols    marked    with    the
     STO_AARCH64_VARIANT_PCS flag set in their st_other field.


VABI64: after the Vector Procedure Call Standard section add

 Dynamic linking for AAVPCS
     On ELF platforms with dynamic linking support, symbol definitions  and
     references must be marked with the STO_AARCH64_VARIANT_PCS flag set in
     their st_other field if the following holds:

     1. the symbol is visible outside of its defining component (executable
        file or shared object), and

     2. the  symbol  is  associated  with  a  function following the AAVPCS
        convention.

     For more information on STO_AARCH64_VARIANT_PCS, see AAELF64.

     NOTE:
        Marking all function symbol definitions and references is  a  valid
        way of implementing this requirement.


[AAELF64]: ELF for the Arm 64-bit Architecture (AArch64)
           https://developer.arm.com/docs/ihi0056/latest
[VABI64]:  Vector Function ABI Specification for AArch64
           https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
[AAPCS64]: Procedure Call Standard for the Arm 64-bit Architecture (AArch64)
           https://developer.arm.com/docs/ihi0055/latest
[AAPCS64+SVE]: Procedure Call Standard for the ARM 64-bit Architecture
           (AArch64) with SVE support
           https://developer.arm.com/docs/100986/latest
[SCO-ELF]: System V Application Binary Interface
           http://www.sco.com/developers/gabi/

