public inbox for gdb@sourceware.org
From: "H.J. Lu" <hjl.tools@gmail.com>
To: discuss@x86-64.org
Cc: GCC Development <gcc@gcc.gnu.org>,
	Binutils <binutils@sourceware.org>,
		GNU C Library <libc-alpha@sourceware.org>,
	GDB <gdb@sourceware.org>,
	x32-abi@googlegroups.com
Subject: [x86-64 psABI] RFC: Extend x86-64 psABI to support x32
Date: Mon, 14 May 2012 17:31:00 -0000	[thread overview]
Message-ID: <CAMe9rOqE84CeCEZHxahccP2obgb50zdJWuY0z3UzWnDYn=g_4A@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 414 bytes --]

Hi,

Support for the x32 psABI:

http://sites.google.com/site/x32abi/

is added in Linux kernel 3.4-rc1.  X32 uses the ILP32 model for the x86-64
instruction set, with long and pointers both 4 bytes in size.  X32 is
already supported in GCC 4.7.0 and binutils 2.22.  I am now working
to integrate x32 support into GLIBC 2.16 and GDB 7.5.  Here is a
patch to extend x86-64 psABI for x32.  Any comments?
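
As a quick way to see the two data models side by side, here is a minimal
sketch in C.  It assumes a GCC-compatible compiler that predefines the
macros listed in the development.tex hunk of the patch; the names
data_model and current_data_model are hypothetical and used only for
illustration.

```c
#include <stddef.h>

/* Hypothetical enum, not part of the psABI: the data model this
   translation unit was compiled for. */
enum data_model { MODEL_OTHER, MODEL_LP64, MODEL_X32 };

/* Select the model from the predefined macros.  __ILP32__ alone is not
   enough, since a plain 32-bit x86 target may define it as well, so it
   is combined with __x86_64__. */
enum data_model current_data_model(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return MODEL_X32;
#elif defined(__x86_64__) && defined(__LP64__)
    return MODEL_LP64;
#else
    return MODEL_OTHER;
#endif
}

/* Compile-time checks of the sizes given in the Scalar Types table. */
#if defined(__x86_64__) && defined(__ILP32__)
_Static_assert(sizeof(long) == 4, "X32: long is 4 bytes");
_Static_assert(sizeof(void *) == 4, "X32: pointers are 4 bytes");
_Static_assert(sizeof(long long) == 8, "X32: long long is 8 bytes");
#elif defined(__x86_64__) && defined(__LP64__)
_Static_assert(sizeof(long) == 8, "LP64: long is 8 bytes");
_Static_assert(sizeof(void *) == 8, "LP64: pointers are 8 bytes");
_Static_assert(sizeof(long long) == 8, "LP64: long long is 8 bytes");
#endif
```

With GCC 4.7 or later, building with -mx32 should take the X32 branch and
building with -m64 the LP64 branch; on other targets the function simply
reports MODEL_OTHER.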

Thanks.

-- 
H.J.

[-- Attachment #2: psabi-x32.patch --]
[-- Type: application/octet-stream, Size: 27058 bytes --]

2012-05-14  H.J. Lu  <hongjiu.lu@intel.com>

	* abi.tex (title): Mention LP64/X32. 
	(author): Add H.J. Lu and Milind Girkar.
	Include x32.tex.

	* development.tex: Add _ILP32 and __ILP32__ for x32.  Also
	document _LP64 and __LP64__.

	* dl.tex: List X32 program interpreter.

	* introduction.tex (Introduction): Add a label.
	Describe X32 and LP64.

	* low-level-sys-info.tex (Scalar Types table): Add X32/LP64 to
	long and long long.  Modify long and pointer types for X32 and
	LP64.  Use \myfontsize instead of \small.
	(Architectural Constraints): Add a label.  Mention small model
	for X32.

	* macros.tex (myfontsize): New.

	* object-files.tex (Programming Model): New subsubsection.
	(File Class): Likewise.
	(Data Encoding): Likewise.
	(Processor identification): Likewise.
	(Relocation Types): Add wordclass.  Allow Elf32_Rel relocations
	within x32 executable files or shared objects.
	(Relocation Types): Use small font.  Mark R_X86_64_GLOB_DAT,
	R_X86_64_JUMP_SLOT, R_X86_64_RELATIVE and R_X86_64_IRELATIVE
	with wordclass.  Mark R_X86_64_PC64, R_X86_64_GOTOFF64 and
	R_X86_64_SIZE64 used only for LP64.  Add R_X86_64_RELATIVE64 for
	x32.

	* x32.tex: New file.

diff --git a/abi.tex b/abi.tex
index 2b56d94..4de644e 100644
--- a/abi.tex
+++ b/abi.tex
@@ -5,13 +5,16 @@
 \begin{document}
 
 \author{Edited by\\
+  H.J. Lu\thanks{hongjiu.lu@intel.com},
+  Milind Girkar\thanks{milind.girkar@intel.com},\\
   Michael Matz\thanks{matz@suse.de},
   Jan Hubi\v{c}ka\thanks{jh@suse.cz}, Andreas Jaeger\thanks{aj@suse.de},
   Mark Mitchell\thanks{mark@codesourcery.com}}
 
 \title{System V Application Binary Interface\\
-{\Large AMD64 Architecture Processor Supplement\\
-Draft Version \version}}
+{\Large AMD64 Architecture Processor Supplement}\\
+{\large (With LP64 and X32 Programming Models)}\\
+{\Large Draft Version \version}}
 \maketitle
 \tableofcontents
 \listoftables
@@ -99,6 +102,7 @@ Draft Version \version}}
   place or removed completely.}
 \include{conventions}
 \include{fortran}
+\include{x32}
 
 \appendix
 \include{kernel}
diff --git a/development.tex b/development.tex
index d1388b5..10669e1 100644
--- a/development.tex
+++ b/development.tex
@@ -2,18 +2,24 @@
 \chapter{Development Environment}
 
 During compilation of C or C++ code at least the symbols in
-table \ref{prepro_defines} are defined by the pre-processor.
+table \ref{prepro_defines} are defined by the pre-processor
+\footnote{\code{_LP64} and \code{__LP64__} were added to GCC 3.3 in
+March, 2003.}.
 
 \begin{table}[H]
 \Hrule
 \caption{Predefined Pre-Processor Symbols}
 \label{prepro_defines}
-  \begin{center}\code{
-    \begin{tabular}[t]{l}
-      __amd64\\
-      __amd64__\\
-      __x86_64\\
-      __x86_64__\\
+  \begin{center}\small\code{
+    \begin{tabular}[t]{ll}
+      __amd64      & Defined for both LP64 and X32 programming models.\\
+      __amd64__    & Defined for both LP64 and X32 programming models.\\
+      __x86_64     & Defined for both LP64 and X32 programming models.\\
+      __x86_64__   & Defined for both LP64 and X32 programming models.\\
+      _LP64        & Defined for LP64 programming model.\\
+      __LP64__     & Defined for LP64 programming model.\\
+      _ILP32       & Defined for X32 programming model.\\
+      __ILP32__    & Defined for X32 programming model.\\
     \end{tabular}
   }\end{center}
 \Hrule
diff --git a/dl.tex b/dl.tex
index a67f4f8..8fecec7 100644
--- a/dl.tex
+++ b/dl.tex
@@ -355,17 +355,24 @@ use.
 
 \subsection{Program Interpreter}
 
-There is one valid \textindex{program interpreter} for
-programs conforming to the \xARCH ABI:
-
-\bigskip
-\path{/lib/ld64.so.1}
-
-However, Linux puts this in
-
-\bigskip
-\path{/lib64/ld-linux-x86-64.so.2}
+The valid \textindex{program interpreter} for each data model of programs
+conforming to the \xARCH ABI is listed in Figure \ref{interp}, which also
+contains the \textindex{program interpreter} path used by Linux.
 
+\begin{figure}
+  \caption{\xARCH Program Interpreter}
+  \label{interp}
+  \begin{center}
+    \begin{tabular}[t]{l|l|l}
+      \multicolumn{1}{c}{Data Model} & \multicolumn{1}{c}{Path} &
+      \multicolumn{1}{c}{Linux Path} \\
+      \hline
+      LP64 & \path{/lib/ld64.so.1} & \path{/lib64/ld-linux-x86-64.so.2} \\
+      \hline
+      X32 & \path{/lib/ldx32.so.1} & \path{/libx32/ld-linux-x32.so.2} \\
+    \end{tabular}
+  \end{center}
+\end{figure}
 
 \subsection{Initialization and Termination Functions}
 
diff --git a/introduction.tex b/introduction.tex
index 2148ab9..aa89e87 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -1,4 +1,4 @@
-\chapter{Introduction}
+\chapter{Introduction\label{intro}}
 
 The AMD64\footnote{AMD64 has been previously called x86-64.  The
   latter name is used in a number of places out of historical reasons
@@ -15,6 +15,13 @@ compatibility modes.  The \xARCH ABI does not apply to such programs;
 this document applies only programs running in the ``long'' mode
 provided by the \xARCH architecture.
 
+Binaries using the \xARCH instruction set may program to either a 32-bit
+model, in which the C data types \code{int}, \code{long} and all
+pointer types are 32-bit objects (X32); or to a 64-bit model,
+in which the C \code{int} type is 32 bits but the C \code{long} type
+and all pointer types are 64-bit objects (LP64). This specification
+covers both the LP64 and the X32 programming models.
+
 Except where otherwise noted, the \xARCH architecture ABI follows the
 conventions described in the \intelabi.  Rather than replicate the
 entire contents of the \intelabi, the \xARCH ABI indicates only those
diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index b030e42..15b5a5d 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -32,7 +32,7 @@ scalar types and the processor's.  \code{__int128}, \code{__float128},
   \caption{Scalar Types}\label{basic-types}
 { % Use small here - the table is still too large
   % Has anybody an idea how to shrink the table so that it fits the page?
-  \small
+  \myfontsize
   \begin{tabular}{l|l|c|c|l}
     \hline\noalign{\smallskip}
      & &  & \multicolumn{1}{c|}{Alignment} & \multicolumn{1}{c|}{\xARCH} \\
@@ -58,12 +58,19 @@ scalar types and the processor's.  \code{__int128}, \code{__float128},
     \cline{2-5}
     & \texttt{unsigned int} & 4 & 4 & unsigned \fourbyte \\
     \cline{2-5}
-    & \texttt{long} & 8 & 8 & signed \eightbyte \\
-    & \texttt{signed long} & & \\
-    & \texttt{long long} & & \\
+    & \texttt{long (LP64)} & 8 & 8 & signed \eightbyte \\
+    & \texttt{signed long (LP64)} & & \\
+    \cline{2-5}
+    & \texttt{unsigned long (LP64)} & 8 & 8 & unsigned \eightbyte \\
+    \cline{2-5}
+    & \texttt{long (X32)} & 4 & 4 & signed \fourbyte \\
+    & \texttt{signed long (X32)} & & \\
+    \cline{2-5}
+    & \texttt{unsigned long (X32)} & 4 & 4 & unsigned \fourbyte \\
+    \cline{2-5}
+    & \texttt{long long} & 8 & 8 & signed \eightbyte \\
     & \texttt{signed long long} & & \\
     \cline{2-5}
-    & \texttt{unsigned long} & 8 & 8 & unsigned \eightbyte \\
     & \texttt{unsigned long long} & 8 & 8 & unsigned \eightbyte \\
     \cline{2-5}
     & \texttt{__int128}$^{\dagger\dagger}$ & 16 & 16 & signed \sixteenbyte \\
@@ -71,8 +78,12 @@ scalar types and the processor's.  \code{__int128}, \code{__float128},
     \cline{2-5}
     & \texttt{unsigned __int128}$^{\dagger\dagger}$ & 16 & 16 & unsigned \sixteenbyte \\
     \hline
-    Pointer & \texttt{\textit{any-type} *} & 8 & 8 & unsigned \eightbyte \\
-    & \texttt{\textit{any-type} (*)()} & & \\
+    Pointer
+    & \texttt{\textit{any-type} * (LP64)} & 8 & 8 & unsigned \eightbyte \\
+    & \texttt{\textit{any-type} (*)() (LP64)} & & \\
+    \cline{2-5}
+    & \texttt{\textit{any-type} * (X32)} & 4 & 4 & unsigned \fourbyte \\
+    & \texttt{\textit{any-type} (*)() (X32)} & & \\
     \hline
     Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\
     \cline{2-5}
@@ -188,9 +199,16 @@ integral values of a specified size.
       \texttt{int} & 1 to 32 & 0 to $2^{w}-1$ \\
       \texttt{unsigned int} & & 0 to $2^{w}-1$ \\
       \hline
-      \texttt{signed long} & & $-2^{w - 1}$ to $2^{w-1}-1$ \\
-      \texttt{long} & 1 to 64 & 0 to $2^{w}-1$ \\
-      \texttt{unsigned long} & & 0 to $2^{w}-1$ \\
+      \texttt{signed long (LP64)} & & $-2^{w - 1}$ to $2^{w-1}-1$ \\
+      \texttt{long (LP64)} & 1 to 64 & 0 to $2^{w}-1$ \\
+      \texttt{unsigned long (LP64)} & & 0 to $2^{w}-1$ \\
+      \hline
+      \texttt{long (X32)} & 1 to 32 & 0 to $2^{w}-1$ \\
+      \texttt{unsigned long (X32)} & & 0 to $2^{w}-1$ \\
+      \hline
+      \texttt{signed long long} & & $-2^{w - 1}$ to $2^{w-1}-1$ \\
+      \texttt{long long} & 1 to 64 & 0 to $2^{w}-1$ \\
+      \texttt{unsigned long long} & & 0 to $2^{w}-1$ \\
     \end{tabular}
   \end{center}
 \Hrule
@@ -1102,7 +1120,7 @@ operations such as calling functions, accessing static objects, and
 transferring control from one part of a program to another.  Unlike
 previous material, this material is not normative.
 
-\subsection{Architectural Constraints}
+\subsection{Architectural Constraints\label{models}}
 
 The \xARCH architecture usually does not allow an instruction to encode
 arbitrary
@@ -1233,6 +1251,9 @@ that are of general interest:
 
 \end{description}
 
+Only the small code model and the small position independent code model
+(\textindex{PIC}) are used in X32 binaries.
+
 \subsection{Conventions}
 
 In this document some special assembler symbols are used in the coding
diff --git a/macros.tex b/macros.tex
index 0d20eac..9c4f915 100644
--- a/macros.tex
+++ b/macros.tex
@@ -107,6 +107,8 @@
 
 \newcommand*{\cbnew}{\marginpar{\textsf{New}}}
 
+\newcommand{\myfontsize}{\fontsize{9}{11}\selectfont}
+
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: "abi"
diff --git a/object-files.tex b/object-files.tex
index 4705e96..d36f4c2 100644
--- a/object-files.tex
+++ b/object-files.tex
@@ -5,22 +5,28 @@
 
 \subsection{Machine Information}
 
-For file identification in \texttt{e_ident}, the \xARCH architecture
-requires the following values.
+\subsubsection{Programming Model}
 
-\begin{table}[H]
-\Hrule
-  \caption{\xARCH Identification}
-  \begin{center}
-    \begin{tabular}[t]{l|l}
-      \multicolumn{1}{c}{Position} & \multicolumn{1}{c}{Value} \\
-      \hline
-      \texttt{e_ident[EI_CLASS]} & \texttt{ELFCLASS64} \\
-      \texttt{e_ident[EI_DATA]} & \texttt{ELFDATA2LSB}
-    \end{tabular}
-  \end{center}
-\Hrule
-\end{table}
+As described in Section \ref{intro}, binaries using the \xARCH instruction
+set may program to either a 32-bit model, in which the C data
+types \code{int}, \code{long} and all pointer types are 32-bit objects
+(X32); or to a 64-bit model, in which the C \code{int} type is 32 bits
+but the C \code{long} type and all pointer types are 64-bit objects (LP64).
+This specification describes binaries that use either the X32 or the LP64
+model.
+
+\subsubsection{File Class}
+
+For \xARCH X32 objects, the file class value in \texttt{e_ident[EI_CLASS]}
+must be \texttt{ELFCLASS32}. For \xARCH LP64 objects, the file class value
+must be \texttt{ELFCLASS64}.
+
+\subsubsection{Data Encoding}
+
+For the data encoding in \texttt{e_ident[EI_DATA]}, \xARCH objects use
+\texttt{ELFDATA2LSB}.
+
+\subsubsection{Processor identification}
 
 Processor identification resides in the ELF headers
 \texttt{e_machine} member and must have the value
@@ -397,6 +403,8 @@ Figure \ref{reloc_fields} shows the allowed relocatable fields.
                   with arbitrary byte alignment.  These values use
                   the same byte order as other word values in the
                   \xARCH architecture. \\
+\textit{wordclass} & This specifies \textit{word64} for LP64 and
+		     specifies \textit{word32} for X32. \\ 
 \end{tabular*}
 
 The following notations are used for specifying relocations in table
@@ -421,13 +429,19 @@ The following notations are used for specifying relocations in table
   relocation entry.
 \end{description}
 
-The \xARCH ABI architectures uses only \texttt{Elf64_Rela} relocation
+The \xARCH LP64 ABI architecture uses only \texttt{Elf64_Rela} relocation
 entries with explicit addends.  The \code{r_addend} member serves as
 the relocation addend.
 
+The \xARCH X32 ABI architecture uses only \texttt{Elf32_Rela} relocation
+entries in relocatable files.  Relocations contained within executable
+files or shared objects may use either \texttt{Elf32_Rela} or
+\texttt{Elf32_Rel} relocation entries.
+
 \begin{table}[H]
 \Hrule
   \caption{Relocation Types}
+  \small
   \label{tab-relocations}
   \begin{center}
     \begin{tabular}[t]{l|r|l|l}
@@ -442,9 +456,9 @@ the relocation addend.
       \texttt{R_X86_64_GOT32} & 3 & \textit{word32} & \texttt{G + A} \\
       \texttt{R_X86_64_PLT32} & 4 & \textit{word32} & \texttt{L + A - P} \\
       \texttt{R_X86_64_COPY}  & 5 & none            & none \\
-      \texttt{R_X86_64_GLOB_DAT} & 6 & \textit{word64} & \texttt{S} \\
-      \texttt{R_X86_64_JUMP_SLOT} & 7 & \textit{word64} & \texttt{S} \\
-      \texttt{R_X86_64_RELATIVE} & 8 & \textit{word64} & \texttt{B + A} \\
+      \texttt{R_X86_64_GLOB_DAT} & 6 & \textit{wordclass} & \texttt{S} \\
+      \texttt{R_X86_64_JUMP_SLOT} & 7 & \textit{wordclass} & \texttt{S} \\
+      \texttt{R_X86_64_RELATIVE} & 8 & \textit{wordclass} & \texttt{B + A} \\
       \texttt{R_X86_64_GOTPCREL} & 9 & \textit{word32} & \texttt{G + GOT + A - P} \\
       \texttt{R_X86_64_32}    & 10 & \textit{word32} & \texttt{S + A} \\
       \texttt{R_X86_64_32S}   & 11 & \textit{word32} & \texttt{S + A} \\
@@ -460,17 +474,22 @@ the relocation addend.
       \texttt{R_X86_64_DTPOFF32}   & 21 & \textit{word32} &  \\
       \texttt{R_X86_64_GOTTPOFF}   & 22 & \textit{word32} &  \\
       \texttt{R_X86_64_TPOFF32}   & 23 & \textit{word32} &  \\
-      \texttt{R_X86_64_PC64}  & 24 & \textit{word64} & \texttt{S + A - P} \\
-      \texttt{R_X86_64_GOTOFF64} & 25 & \textit{word64} & \texttt{S + A - GOT} \\
+      \texttt{R_X86_64_PC64} $^\dagger$ & 24 & \textit{word64} & \texttt{S + A - P} \\
+      \texttt{R_X86_64_GOTOFF64} $^\dagger$ & 25 & \textit{word64} & \texttt{S + A - GOT} \\
       \texttt{R_X86_64_GOTPC32} & 26 & \textit{word32} & \texttt{GOT + A - P} \\
       \texttt{R_X86_64_SIZE32} & 32 & \textit{word32} & \texttt{Z + A} \\
-      \texttt{R_X86_64_SIZE64} & 33 & \textit{word64} & \texttt{Z + A} \\
+      \texttt{R_X86_64_SIZE64} $^\dagger$ & 33 & \textit{word64} & \texttt{Z + A} \\
       \texttt{R_X86_64_GOTPC32_TLSDESC} & 34 & \textit{word32} &  \\
       \texttt{R_X86_64_TLSDESC_CALL} & 35 & none &  \\
       \texttt{R_X86_64_TLSDESC} & 36 & \textit{word64}$\times 2$ & \\
-      \texttt{R_X86_64_IRELATIVE} & 37 & \textit{word64} & \texttt{indirect (B + A)}\\
+      \texttt{R_X86_64_IRELATIVE} & 37 & \textit{wordclass} & \texttt{indirect (B + A)}\\
+      \texttt{R_X86_64_RELATIVE64} $^{\dagger\dagger}$ & 38 & \textit{word64} & \texttt{B + A} \\
 %      \texttt{R_X86_64_GOT64} & 16 & \textit{word64} & \texttt{G + A} \\
 %      \texttt{R_X86_64_PLT64} & 17 & \textit{word64} & \texttt{L + A - P} \\
+     \cline{1-4}
+    \multicolumn{3}{l}{\small $^\dagger$ This relocation is used only for LP64.}\\
+    \multicolumn{3}{l}{\small $^{\dagger\dagger}$ This relocation only
+    appears in X32 executable files or shared objects.}\\
     \end{tabular}
   \end{center}
 \Hrule
diff --git a/x32.tex b/x32.tex
new file mode 100644
index 0000000..d419e52
--- /dev/null
+++ b/x32.tex
@@ -0,0 +1,442 @@
+\chapter{X32 Programming Model\label{x32}}
+
+\section{Parameter Passing}
+When a value of pointer type is returned or passed in a register, bits 32
+to 63 shall be zero.
+
+\section{Address Space}
+
+\xARCH X32 binaries reside in the lower 32 bits of the 64-bit virtual
+address space and all addresses are 32 bits in size.  They should conform
+to the \textindex{small code model} or the
+\textindex{small position independent code model} (\textindex{PIC})
+described in Section \ref{models}.
+
+\section{Thread-Local Storage Support}
+
+X32 Thread-Local Storage (TLS) support is based on the LP64 TLS
+implementation with some modifications.
+
+\subsection{Global Thread-Local Variable}
+
+For a global thread-local variable x:
+
+\begin{verbatim}
+extern __thread int x;
+\end{verbatim}
+
+\begin{description}
+\item[\textindex{General Dynamic Model}]
+  Load address of \code{x} into \reg{rax}
+
+\begin{table}[H]
+\Hrule
+\caption{General Dynamic Model Code Sequence}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & .byte & 0x66			& 0x00 & leaq  & x@tlsgd(\%rip),\%rdi \\
+0x01 & leaq  & x@tlsgd(\%rip),\%rdi	& 0x07 & .word & 0x6666 \\
+0x08 & .word & 0x6666			& 0x09 & rex64 & \\
+0x0a & rex64 &				& 0x0a & call  & \_\_tls\_get\_addr@plt \\
+0x0b & call  & \_\_tls\_get\_addr@plt	&      &       & \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Initial Exec Model}]
+  Load address of \code{x} into \reg{rax}
+
+\begin{table}[H]
+\Hrule
+\caption{Initial Exec Model Code Sequence}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & movq & \%fs:0,\%rax		& 0x00 & movl & \%fs:0,\%eax \\
+0x09 & addq & x@gottpoff(\%rip),\%rax	& 0x08 & addl & x@gottpoff(\%rip),\%eax \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Initial Exec Model, II}]
+  Load value of \code{x} into \reg{edi}.  The \code{\%fs:(\%eax)} memory
+  operand can't be used for X32 because its effective address is the base
+  address of \code{\%fs} plus the value of \reg{eax} zero-extended to 64
+  bits, which is incorrect when \reg{eax} holds a negative value.
+
+\begin{table}[H]
+\Hrule
+\caption{Initial Exec Model Code Sequence, II}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & movq & x@gottpoff(\%rip),\%rax	& 0x00 & movq & x@gottpoff(\%rip),\%rax \\
+0x07 & movl & \%fs:(\%rax),\%edi	& 0x07 & movl & \%fs:(\%rax),\%edi \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\end{description}
+
+\subsection{Static Thread-Local Variable}
+
+For a static thread-local variable x:
+
+\begin{verbatim}
+static __thread int x;
+\end{verbatim}
+
+\begin{description}
+\item[\textindex{Local Dynamic Model}]
+  Load address of \code{x} into \reg{rax}
+
+\begin{table}[H]
+\Hrule
+\caption{Local Dynamic Model Code Sequence With Lea}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & leaq  & x@tlsld(\%rip),\%rdi	& 0x00 & leaq & x@tlsld(\%rip),\%rdi\\
+0x07 & call  & \_\_tls\_get\_addr@plt	& 0x07 & call & \_\_tls\_get\_addr@plt\\
+0x0c & leaq  & x@dtpoff(\%rax),\%rax	& 0x0c & leal & x@dtpoff(\%rax),\%eax\\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+or
+
+\begin{table}[H]
+\Hrule
+\caption{Local Dynamic Model Code Sequence With Add}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & leaq  & x@tlsld(\%rip),\%rdi	& 0x00 & leaq & x@tlsld(\%rip),\%rdi\\
+0x07 & call  & \_\_tls\_get\_addr@plt	& 0x07 & call & \_\_tls\_get\_addr@plt\\
+0x0c & addq  & \$x@dtpoff,\%rax		& 0x0c & addl & \$x@dtpoff,\%eax\\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Local Dynamic Model, II}]
+  Load value of \code{x} into \reg{edi}
+
+\begin{table}[H]
+\Hrule
+\caption{Local Dynamic Model Code Sequence, II}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & movq & \%fs:0,\%rax		& 0x00 & movl & \%fs:0,\%eax \\
+0x09 & movl & x@dtpoff(\%rax),\%edi	& 0x08 & movl & x@dtpoff(\%rax),\%edi \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Local Exec Model}]
+  Load address of \code{x} into \reg{rax}
+
+\begin{table}[H]
+\Hrule
+\caption{Local Exec Model Code Sequence With Lea}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & movq & \%fs:0,\%rax		& 0x00 & movl & \%fs:0,\%eax \\
+0x09 & leaq & x@tpoff(\%rax),\%rax	& 0x08 & leal & x@tpoff(\%rax),\%eax \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+or
+
+\begin{table}[H]
+\Hrule
+\caption{Local Exec Model Code Sequence With Add}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & movq & \%fs:0,\%rax		& 0x00 & movl & \%fs:0,\%eax \\
+0x09 & addq & \$x@tpoff,\%rax		& 0x08 & addl & \$x@tpoff,\%eax \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Local Exec Model, II}]
+  Load value of \code{x} into \reg{edi}
+
+\begin{table}[H]
+\Hrule
+\caption{Local Exec Model Code Sequence, II}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & movq & \%fs:0,\%rax		& 0x00 & movl & \%fs:0,\%eax \\
+0x09 & movl & x@tpoff(\%rax),\%edi	& 0x08 & movl & x@tpoff(\%rax),\%edi \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Local Exec Model, III}]
+  Load value of \code{x} into \reg{edi}
+
+\begin{table}[H]
+\Hrule
+\caption{Local Exec Model Code Sequence, III}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LP64} & \multicolumn{3}{c}{X32} \\
+\hline
+0x00 & movl & \%fs:x@tpoff,\%edi	& 0x00 & movl & \%fs:x@tpoff,\%edi \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\end{description}
+
+\subsection{TLS Linker Optimization}
+
+\begin{description}
+\item[\textindex{General Dynamic To Initial Exec}]
+  Load address of \code{x} into \reg{rax}
+
+\begin{table}[H]
+\Hrule
+\caption{GD -> IE Code Transition}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{GD} & \multicolumn{3}{c}{IE} \\
+\hline
+0x00 & leaq  & x@tlsgd(\%rip),\%rdi	& 0x00 & movl  & \%fs:0, \%eax \\
+0x07 & .word & 0x6666			& 0x08 & addq  & x@gottpoff(\%rip),\%rax\\
+0x09 & rex64 &				&      &       & \\
+0x0a & call  & \_\_tls\_get\_addr@plt	&      &       & \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\begin{table}[H]
+\Hrule
+\caption{GD -> LE Code Transition}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{GD} & \multicolumn{3}{c}{LE} \\
+\hline
+0x00 & leaq  & x@tlsgd(\%rip),\%rdi	& 0x00 & movl  & \%fs:0, \%eax \\
+0x07 & .word & 0x6666			& 0x08 & leal  & x@tpoff(\%rax),\%eax\\
+0x09 & rex64 &				&      &       & \\
+0x0a & call  & \_\_tls\_get\_addr@plt	&      &       & \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Initial Exec To Local Exec}]
+  Load address of \code{x} into \reg{rax}
+
+\begin{table}[H]
+\Hrule
+\caption{IE -> LE Code Transition With Lea}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{IE} & \multicolumn{3}{c}{LE} \\
+\hline
+0x00 & movl & \%fs:0,\%eax		& 0x00 & movl & \%fs:0,\%eax \\
+0x08 & addl & x@gottpoff(\%rip),\%eax	& 0x08 & leal & x@tpoff(\%rax),\%eax \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+or
+
+\begin{table}[H]
+\Hrule
+\caption{IE -> LE Code Transition With Add}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{IE} & \multicolumn{3}{c}{LE} \\
+\hline
+0x00 & movl & \%fs:0,\%eax		& 0x00 & movl & \%fs:0,\%eax \\
+0x08 & addl & x@gottpoff(\%rip),\%eax	& 0x08 & addl & \$x@tpoff,\%eax \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Initial Exec To Local Exec, II}]
+  Load value of \code{x} into \reg{edi}.
+
+\begin{table}[H]
+\Hrule
+\caption{IE -> LE Code Transition, II}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{IE} & \multicolumn{3}{c}{LE} \\
+\hline
+0x00 & movq & x@gottpoff(\%rip),\%rax	& 0x00 & movq & x@tpoff,\%rax \\
+0x07 & movl & \%fs:(\%rax),\%edi	& 0x07 & movl & \%fs:(\%rax),\%edi \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Local Dynamic to Local Exec}]
+  Load address of \code{x} into \reg{rax}
+
+\begin{table}[H]
+\Hrule
+\caption{LD -> LE Code Transition With Lea}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LD} & \multicolumn{3}{c}{LE} \\
+\hline
+0x00 & leaq  & x@tlsld(\%rip),\%rdi	& 0x00 & nopl & 0x0(\%rax) \\
+0x07 & call  & \_\_tls\_get\_addr@plt	& 0x04 & movl & \%fs:0,\%eax\\
+0x0c & leal  & x@dtpoff(\%rax),\%eax	& 0x0c & leal & x@tpoff(\%rax),\%eax\\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+or
+
+\begin{table}[H]
+\Hrule
+\caption{LD -> LE Code Transition With Add}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LD} & \multicolumn{3}{c}{LE} \\
+\hline
+0x00 & leaq  & x@tlsld(\%rip),\%rdi	& 0x00 & nopl & 0x0(\%rax) \\
+0x07 & call  & \_\_tls\_get\_addr@plt	& 0x04 & movl & \%fs:0,\%eax\\
+0x0c & addq  & \$x@dtpoff,\%rax		& 0x0c & addl & \$x@tpoff,\%eax\\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\item[\textindex{Local Dynamic To Local Exec, II}]
+  Load value of \code{x} into \reg{edi}.
+
+\begin{table}[H]
+\Hrule
+\caption{LD -> LE Code Transition, II}
+\begin{center}
+\small\code{
+\begin{tabular}{lll|lll}
+\multicolumn{3}{c}{LD} & \multicolumn{3}{c}{LE} \\
+\hline
+0x00 & leaq  & x@tlsld(\%rip),\%rdi	& 0x00 & nopl & 0x0(\%rax) \\
+0x07 & call  & \_\_tls\_get\_addr@plt	& 0x04 & movl & \%fs:0,\%eax\\
+0x0c & movl  & x@dtpoff(\%rax),\%eax	& 0x0c & movl & x@tpoff(\%rax),\%eax\\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
+
+\end{description}
+
+\section{Kernel Support}
+The kernel should limit the stack and addresses returned from system calls
+to the range $0x00000000$ to $0xffffffff$.
+
+\section{Coding Examples}
+
+Although X32 binaries run in the 64-bit mode, not all 64-bit instructions
+are supported. This section discusses example code sequences for
+fundamental operations which are different from the 64-bit mode.
+
+\subsection{Indirect Branch}
+
+Since an indirect branch via memory loads a 64-bit address from the memory
+location, it is not supported in X32.  An indirect branch via register
+should be used instead.  The 32-bit address from memory is loaded into
+the lower 32 bits of a register, which automatically zeroes
+the upper 32 bits of the register.  Then the indirect branch can be
+performed via the 64-bit register.
+
+\begin{table}[H]
+\Hrule
+\caption{Indirect Branch}
+\begin{center}
+\code{
+\begin{tabular}{ll|ll}
+\multicolumn{2}{c}{LP64} & \multicolumn{2}{c}{X32} \\
+\hline
+call & *\%rax          & call & *\%rax \\
+\hline
+call & *func\_p(\%rip) & movl & func\_p(\%rip), \%eax \\
+     &                 & call & *\%rax \\
+\hline
+call & *func\_p        & movl & func\_p, \%eax \\
+     &                 & call & *\%rax \\
+\hline
+jmp  & *\%rax          & jmp  & *\%rax \\
+\hline
+jmp  & *func\_p(\%rip) & movl & func\_p(\%rip), \%eax \\
+     &                 & jmp  & *\%rax \\
+\hline
+jmp  & *func\_p        & movl & func\_p, \%eax \\
+     &                 & jmp  & *\%rax \\
+\end{tabular}
+}
+\end{center}
+\Hrule
+\end{table}
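
The restriction on %fs:(%eax) noted under "Initial Exec Model, II" in
x32.tex comes down to address arithmetic, which the following minimal
sketch in C tries to show.  It only models the arithmetic and does not
touch real TLS; the function names, the example %fs base and the example
offset are all hypothetical, and it assumes the thread-pointer offset of
the variable is negative, which is the case the note describes.

```c
#include <stdint.h>

/* Address formed when the 32-bit offset is zero-extended before being
   added to the %fs base, as with a %fs:(%eax) operand. */
uint64_t tls_addr_zero_extended(uint64_t fs_base, int32_t tp_offset)
{
    return fs_base + (uint64_t)(uint32_t)tp_offset;
}

/* Address formed when the offset is a sign-extended 64-bit value, as
   with %fs:(%rax) after loading the offset with a 64-bit movq. */
uint64_t tls_addr_sign_extended(uint64_t fs_base, int32_t tp_offset)
{
    return fs_base + (uint64_t)(int64_t)tp_offset;
}
```

For an assumed %fs base of 0x10000 and an offset of -8, the sign-extended
form yields 0xfff8, just below the base, while the zero-extended form
yields 0x10000fff8, which is nearly 4 GiB above it.  For non-negative
offsets the two forms agree.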
