public inbox for libffi-discuss@sourceware.org
* [PATCH 02/13] x86: Remove some conditional compilation
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
@ 2014-11-07 15:30 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 10/13] x86: Add support for Go closures Richard Henderson
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:30 UTC (permalink / raw)
  To: libffi-discuss

Remove ifdefs, made possible by the ffi_abi unification.
---
 src/x86/ffi.c | 81 +++++++++++++++++++++++------------------------------------
 1 file changed, 32 insertions(+), 49 deletions(-)
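
The second hunk replaces the open-coded `(cif->bytes + 15) & ~0xF` with the `ALIGN` macro. The following is a minimal sketch, assuming `ALIGN` is defined as in `ffi_common.h` of this period. It uses two helpers, `round16_mask` and `round16_align`, whose names are hypothetical. The sketch shows that the macro's second argument is the alignment itself rather than a mask, so rounding up to 16 bytes is spelled `ALIGN (bytes, 16)`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the ALIGN macro in libffi's ffi_common.h of this era:
   round V up to a multiple of A, where A is a power-of-two alignment.  */
#define ALIGN(v, a)  (((((size_t) (v))-1) | ((a)-1))+1)

/* Hypothetical helper: the hand-written rounding removed by this patch.  */
static size_t
round16_mask (size_t bytes)
{
  return (bytes + 15) & ~(size_t) 0xF;
}

/* Hypothetical helper: the same rounding via ALIGN.  The argument is the
   alignment (16), not the mask (15).  */
static size_t
round16_align (size_t bytes)
{
  return ALIGN (bytes, 16);
}
```

With a second argument of 15, the macro does not round to a multiple of 16. For example, `ALIGN (17, 15)` evaluates to 31.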

diff --git a/src/x86/ffi.c b/src/x86/ffi.c
index 90e3f79..98aa008 100644
--- a/src/x86/ffi.c
+++ b/src/x86/ffi.c
@@ -217,34 +217,25 @@ ffi_status ffi_prep_cif_machdep(ffi_cif *cif)
 
     case FFI_TYPE_STRUCT:
 #ifndef X86
+      /* ??? This should be a different ABI rather than an ifdef.  */
       if (cif->rtype->size == 1)
-        {
-          cif->flags = FFI_TYPE_SMALL_STRUCT_1B; /* same as char size */
-        }
+	cif->flags = FFI_TYPE_SMALL_STRUCT_1B;	/* same as char size */
       else if (cif->rtype->size == 2)
-        {
-          cif->flags = FFI_TYPE_SMALL_STRUCT_2B; /* same as short size */
-        }
+	cif->flags = FFI_TYPE_SMALL_STRUCT_2B;	/* same as short size */
       else if (cif->rtype->size == 4)
-        {
-          cif->flags = FFI_TYPE_INT; /* same as int type */
-        }
+	cif->flags = FFI_TYPE_INT;		/* same as int type */
       else if (cif->rtype->size == 8)
-        {
-          cif->flags = FFI_TYPE_SINT64; /* same as int64 type */
-        }
+	cif->flags = FFI_TYPE_SINT64;		/* same as int64 type */
       else
 #endif
-        {
-#ifdef X86_WIN32
-          if (cif->abi == FFI_MS_CDECL)
-            cif->flags = FFI_TYPE_MS_STRUCT;
-          else
-#endif
-            cif->flags = FFI_TYPE_STRUCT;
-          /* allocate space for return value pointer */
-          cif->bytes += ALIGN(sizeof(void*), FFI_SIZEOF_ARG);
-        }
+	{
+	  if (cif->abi == FFI_MS_CDECL)
+	    cif->flags = FFI_TYPE_MS_STRUCT;
+	  else
+	    cif->flags = FFI_TYPE_STRUCT;
+	  /* Allocate space for return value pointer.  */
+	  cif->bytes += ALIGN(sizeof(void*), FFI_SIZEOF_ARG);
+	}
       break;
 
     default:
@@ -259,10 +250,8 @@ ffi_status ffi_prep_cif_machdep(ffi_cif *cif)
       cif->bytes += (unsigned)ALIGN((*ptr)->size, FFI_SIZEOF_ARG);
     }
 
-#ifndef X86_WIN32
   if (cif->abi == FFI_SYSV)
-    cif->bytes = (cif->bytes + 15) & ~0xF;
-#endif
+    cif->bytes = ALIGN (cif->bytes, 15);
 
   return FFI_OK;
 }
@@ -577,14 +566,12 @@ ffi_prep_closure_loc (ffi_closure* closure,
                                    &ffi_closure_STDCALL,
                                    (void*)codeloc);
     }
-#ifdef X86_WIN32
   else if (cif->abi == FFI_MS_CDECL)
     {
       FFI_INIT_TRAMPOLINE (&closure->tramp[0],
                            &ffi_closure_SYSV,
                            (void*)codeloc);
     }
-#endif /* X86_WIN32 */
   else
     {
       return FFI_BAD_ABI;
@@ -610,40 +597,36 @@ ffi_prep_raw_closure_loc (ffi_raw_closure* closure,
 {
   int i;
 
-  if (cif->abi != FFI_SYSV
-#ifdef X86_WIN32
-      && cif->abi != FFI_THISCALL
-#endif
-     )
-    return FFI_BAD_ABI;
-
-  /* we currently don't support certain kinds of arguments for raw
+  /* We currently don't support certain kinds of arguments for raw
      closures.  This should be implemented by a separate assembly
      language routine, since it would require argument processing,
      something we don't do now for performance.  */
-
   for (i = cif->nargs-1; i >= 0; i--)
     {
       FFI_ASSERT (cif->arg_types[i]->type != FFI_TYPE_STRUCT);
       FFI_ASSERT (cif->arg_types[i]->type != FFI_TYPE_LONGDOUBLE);
     }
-  
-#ifdef X86_WIN32
-  if (cif->abi == FFI_SYSV)
+
+  switch (cif->abi)
     {
-#endif
-  FFI_INIT_TRAMPOLINE (&closure->tramp[0], &ffi_closure_raw_SYSV,
-                       codeloc);
 #ifdef X86_WIN32
-    }
-  else if (cif->abi == FFI_THISCALL)
-    {
-      FFI_INIT_TRAMPOLINE_RAW_THISCALL (&closure->tramp[0], &ffi_closure_raw_THISCALL, codeloc, cif->bytes);
-    }
+    case FFI_THISCALL:
+      FFI_INIT_TRAMPOLINE_RAW_THISCALL (&closure->tramp[0],
+					&ffi_closure_raw_THISCALL,
+					codeloc, cif->bytes);
+      break;
 #endif
-  closure->cif  = cif;
+    case FFI_SYSV:
+      FFI_INIT_TRAMPOLINE (&closure->tramp[0], &ffi_closure_raw_SYSV,
+			   codeloc);
+      break;
+    default:
+      return FFI_BAD_ABI;
+    }
+
+  closure->cif = cif;
+  closure->fun = fun;
   closure->user_data = user_data;
-  closure->fun  = fun;
 
   return FFI_OK;
 }
-- 
1.9.3

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 00/13] Go closures for i686
@ 2014-11-07 15:30 Richard Henderson
  2014-11-07 15:30 ` [PATCH 02/13] x86: Remove some conditional compilation Richard Henderson
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:30 UTC (permalink / raw)
  To: libffi-discuss

A massive cleanup of the i686 backend, with all (or most) ABIs supported
for all targets, not just Windows.  This (nearly) eliminates conditional
compilation with respect to ABIs and allows massive code de-duplication.

Like the x86_64 patch set, this breaks Darwin.  I think darwin.S ought
to be deleted, with the very minor differences that might remain versus
sysv.S handled by the preprocessor.  However, I have no access to a
Darwin system on which to test.

Like the x86_64 patch set, the masm variant is removed.  Again, I have
no way to test it, though win32 support of course still works via gas.

With the final patch, building with clang mostly works, although for
some reason the unwind tests fail despite the .eh_frame looking correct.
FreeBSD continues to work if you use gcc+gas from ports.

Tested on i686-{linux,cygwin,freebsd10}.


r~


Richard Henderson (13):
  x86: Tidy ffi_abi
  x86: Remove some conditional compilation
  x86: Force FFI_TYPE_LONGDOUBLE different from FFI_TYPE_DOUBLE
  x86: Convert to gas generated unwind info
  ffi_cif: Add cfa_escape
  x86: Rewrite ffi_call
  x86: Rewrite closures
  testsuite: Fix return_complex2 vs excessive precision
  x86: Add support for Complex
  x86: Add support for Go closures
  x86: Use win32 name mangling for fastcall functions
  testsuite: Add two dg-do run markers
  x86: Work around two clang assembler bugs

 Makefile.am                               |   13 +-
 include/ffi_cfi.h                         |    2 +
 src/x86/ffi.c                             | 1128 ++++++++++++------------
 src/x86/ffitarget.h                       |   64 +-
 src/x86/freebsd.S                         |  463 ----------
 src/x86/internal.h                        |   23 +
 src/x86/sysv.S                            | 1037 +++++++++++++---------
 src/x86/win32.S                           | 1351 -----------------------------
 testsuite/libffi.call/call.exp            |    3 +-
 testsuite/libffi.call/float2.c            |    1 +
 testsuite/libffi.call/return_complex2.inc |   10 +-
 testsuite/libffi.call/return_ldl.c        |    1 +
 12 files changed, 1238 insertions(+), 2858 deletions(-)
 delete mode 100644 src/x86/freebsd.S
 create mode 100644 src/x86/internal.h
 delete mode 100644 src/x86/win32.S

-- 
1.9.3

* [PATCH 04/13] x86: Convert to gas generated unwind info
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (6 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 11/13] x86: Use win32 name mangling for fastcall functions Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 13/13] x86: Work around two clang assembler bugs Richard Henderson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

---
 src/x86/sysv.S | 173 +++++++--------------------------------------------------
 1 file changed, 20 insertions(+), 153 deletions(-)

diff --git a/src/x86/sysv.S b/src/x86/sysv.S
index 3bd5477..fd13bc0 100644
--- a/src/x86/sysv.S
+++ b/src/x86/sysv.S
@@ -30,6 +30,7 @@
 #define LIBFFI_ASM	
 #include <fficonfig.h>
 #include <ffi.h>
+#include <ffi_cfi.h>
 
 .text
 
@@ -40,11 +41,12 @@
         .type    ffi_call_SYSV,@function
 
 ffi_call_SYSV:
-.LFB1:
+	cfi_startproc
         pushl %ebp
-.LCFI0:
+	cfi_adjust_cfa_offset(4)
+	cfi_rel_offset(%ebp, 0)
         movl  %esp,%ebp
-.LCFI1:
+	cfi_def_cfa_register(%ebp)
 	/* Make room for all of the new args.  */
 	movl  16(%ebp),%ecx
 	subl  %ecx,%esp
@@ -163,7 +165,7 @@ epilogue:
         movl %ebp,%esp
         popl %ebp
         ret
-.LFE1:
+	cfi_endproc
 .ffi_call_SYSV_end:
         .size    ffi_call_SYSV,.ffi_call_SYSV_end-ffi_call_SYSV
 
@@ -173,11 +175,12 @@ FFI_HIDDEN (ffi_closure_SYSV)
 	.type	ffi_closure_SYSV, @function
 
 ffi_closure_SYSV:
-.LFB2:
+	cfi_startproc
 	pushl	%ebp
-.LCFI2:
+	cfi_adjust_cfa_offset(4)
+	cfi_rel_offset(%ebp, 0)
 	movl	%esp, %ebp
-.LCFI3:
+	cfi_def_cfa_register(%ebp)
 	subl	$40, %esp
 	leal	-24(%ebp), %edx
 	movl	%edx, -12(%ebp)	/* resp */
@@ -199,12 +202,13 @@ ffi_closure_SYSV:
 	call	ffi_closure_SYSV_inner
 #else
 	movl	%ebx, 8(%esp)
-.LCFI7:
+	cfi_offset(%ebx, -40)
 	call	1f
 1:	popl	%ebx
 	addl	$_GLOBAL_OFFSET_TABLE_+[.-1b], %ebx
 	call	ffi_closure_SYSV_inner@PLT
 	movl	8(%esp), %ebx
+	cfi_restore(%ebx)
 #endif
 	movl	-12(%ebp), %ecx
 	cmpl	$FFI_TYPE_INT, %eax
@@ -251,7 +255,7 @@ ffi_closure_SYSV:
 	movl	%ebp, %esp
 	popl	%ebp
 	ret	$4
-.LFE2:
+	cfi_endproc
 	.size	ffi_closure_SYSV, .-ffi_closure_SYSV
 
 #if !FFI_NO_RAW_API
@@ -278,13 +282,14 @@ FFI_HIDDEN (ffi_closure_raw_SYSV)
 	.type	ffi_closure_raw_SYSV, @function
 
 ffi_closure_raw_SYSV:
-.LFB3:
+	cfi_startproc
 	pushl	%ebp
-.LCFI4:
+	cfi_adjust_cfa_offset(4)
+	cfi_rel_offset(%ebp, 0)
 	movl	%esp, %ebp
-.LCFI5:
+	cfi_def_cfa_register(%ebp)
 	pushl	%esi
-.LCFI6:
+	cfi_offset(%esi, -12)
 	subl	$36, %esp
 	movl	RAW_CLOSURE_CIF_OFFSET(%eax), %esi	 /* closure->cif */
 	movl	RAW_CLOSURE_USER_DATA_OFFSET(%eax), %edx /* closure->user_data */
@@ -335,149 +340,11 @@ ffi_closure_raw_SYSV:
 	movl	-24(%ebp), %eax
 	movl	-20(%ebp), %edx
 	jmp	.Lrcls_epilogue
-.LFE3:
+	cfi_endproc
 	.size	ffi_closure_raw_SYSV, .-ffi_closure_raw_SYSV
-#endif
-
-#if defined __GNUC__
-/* Only emit dwarf unwind info when building with GNU toolchain.  */
-
-#if defined __PIC__
-# if defined __sun__ && defined __svr4__
-/* 32-bit Solaris 2/x86 uses datarel encoding for PIC.  GNU ld before 2.22
-   doesn't correctly sort .eh_frame_hdr with mixed encodings, so match this.  */
-#  define FDE_ENCODING		0x30	/* datarel */
-#  define FDE_ENCODE(X)		X@GOTOFF
-# else
-#  define FDE_ENCODING		0x1b	/* pcrel sdata4 */
-#  if defined HAVE_AS_X86_PCREL
-#   define FDE_ENCODE(X)	X-.
-#  else
-#   define FDE_ENCODE(X)	X@rel
-#  endif
-# endif
-#else
-# define FDE_ENCODING		0	/* absolute */
-# define FDE_ENCODE(X)		X
-#endif
-
-	.section	.eh_frame,EH_FRAME_FLAGS,@progbits
-.Lframe1:
-	.long	.LECIE1-.LSCIE1	/* Length of Common Information Entry */
-.LSCIE1:
-	.long	0x0	/* CIE Identifier Tag */
-	.byte	0x1	/* CIE Version */
-#ifdef HAVE_AS_ASCII_PSEUDO_OP
-#ifdef __PIC__
-	.ascii "zR\0"	/* CIE Augmentation */
-#else
-	.ascii "\0"	/* CIE Augmentation */
-#endif
-#elif defined HAVE_AS_STRING_PSEUDO_OP
-#ifdef __PIC__
-	.string "zR"	/* CIE Augmentation */
-#else
-	.string ""	/* CIE Augmentation */
-#endif
-#else
-#error missing .ascii/.string
-#endif
-	.byte	0x1	/* .uleb128 0x1; CIE Code Alignment Factor */
-	.byte	0x7c	/* .sleb128 -4; CIE Data Alignment Factor */
-	.byte	0x8	/* CIE RA Column */
-#ifdef __PIC__
-	.byte	0x1	/* .uleb128 0x1; Augmentation size */
-	.byte	FDE_ENCODING
-#endif
-	.byte	0xc	/* DW_CFA_def_cfa */
-	.byte	0x4	/* .uleb128 0x4 */
-	.byte	0x4	/* .uleb128 0x4 */
-	.byte	0x88	/* DW_CFA_offset, column 0x8 */
-	.byte	0x1	/* .uleb128 0x1 */
-	.align 4
-.LECIE1:
-.LSFDE1:
-	.long	.LEFDE1-.LASFDE1	/* FDE Length */
-.LASFDE1:
-	.long	.LASFDE1-.Lframe1	/* FDE CIE offset */
-	.long	FDE_ENCODE(.LFB1)	/* FDE initial location */
-	.long	.LFE1-.LFB1		/* FDE address range */
-#ifdef __PIC__
-	.byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI0-.LFB1
-	.byte	0xe	/* DW_CFA_def_cfa_offset */
-	.byte	0x8	/* .uleb128 0x8 */
-	.byte	0x85	/* DW_CFA_offset, column 0x5 */
-	.byte	0x2	/* .uleb128 0x2 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI1-.LCFI0
-	.byte	0xd	/* DW_CFA_def_cfa_register */
-	.byte	0x5	/* .uleb128 0x5 */
-	.align 4
-.LEFDE1:
-.LSFDE2:
-	.long	.LEFDE2-.LASFDE2	/* FDE Length */
-.LASFDE2:
-	.long	.LASFDE2-.Lframe1	/* FDE CIE offset */
-	.long	FDE_ENCODE(.LFB2)	/* FDE initial location */
-	.long	.LFE2-.LFB2		/* FDE address range */
-#ifdef __PIC__
-	.byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI2-.LFB2
-	.byte	0xe	/* DW_CFA_def_cfa_offset */
-	.byte	0x8	/* .uleb128 0x8 */
-	.byte	0x85	/* DW_CFA_offset, column 0x5 */
-	.byte	0x2	/* .uleb128 0x2 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI3-.LCFI2
-	.byte	0xd	/* DW_CFA_def_cfa_register */
-	.byte	0x5	/* .uleb128 0x5 */
-#if !defined HAVE_HIDDEN_VISIBILITY_ATTRIBUTE && defined __PIC__
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI7-.LCFI3
-	.byte	0x83	/* DW_CFA_offset, column 0x3 */
-	.byte	0xa	/* .uleb128 0xa */
-#endif
-	.align 4
-.LEFDE2:
-
-#if !FFI_NO_RAW_API
-
-.LSFDE3:
-	.long	.LEFDE3-.LASFDE3	/* FDE Length */
-.LASFDE3:
-	.long	.LASFDE3-.Lframe1	/* FDE CIE offset */
-	.long	FDE_ENCODE(.LFB3)	/* FDE initial location */
-	.long	.LFE3-.LFB3		/* FDE address range */
-#ifdef __PIC__
-	.byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI4-.LFB3
-	.byte	0xe	/* DW_CFA_def_cfa_offset */
-	.byte	0x8	/* .uleb128 0x8 */
-	.byte	0x85	/* DW_CFA_offset, column 0x5 */
-	.byte	0x2	/* .uleb128 0x2 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI5-.LCFI4
-	.byte	0xd	/* DW_CFA_def_cfa_register */
-	.byte	0x5	/* .uleb128 0x5 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI6-.LCFI5
-	.byte	0x86	/* DW_CFA_offset, column 0x6 */
-	.byte	0x3	/* .uleb128 0x3 */
-	.align 4
-.LEFDE3:
-
-#endif
-#endif
 
+#endif /* !FFI_NO_RAW_API */
 #endif /* ifndef __x86_64__ */
-
 #if defined __ELF__ && defined __linux__
 	.section	.note.GNU-stack,"",@progbits
 #endif
-- 
1.9.3

* [PATCH 05/13] ffi_cif: Add cfa_escape
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (2 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 12/13] testsuite: Add two dg-do run markers Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 07/13] x86: Rewrite closures Richard Henderson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

---
 include/ffi_cfi.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/ffi_cfi.h b/include/ffi_cfi.h
index 6cca20c..244ce57 100644
--- a/include/ffi_cfi.h
+++ b/include/ffi_cfi.h
@@ -27,6 +27,7 @@
 # define cfi_window_save		.cfi_window_save
 # define cfi_personality(enc, exp)	.cfi_personality enc, exp
 # define cfi_lsda(enc, exp)		.cfi_lsda enc, exp
+# define cfi_escape(...)		.cfi_escape __VA_ARGS__
 
 #else
 
@@ -48,6 +49,7 @@
 # define cfi_window_save
 # define cfi_personality(enc, exp)
 # define cfi_lsda(enc, exp)
+# define cfi_escape(...)
 
 #endif /* HAVE_AS_CFI_PSEUDO_OP */
 #endif /* FFI_CFI_H */
-- 
1.9.3

* [PATCH 01/13] x86: Tidy ffi_abi
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (11 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 08/13] testsuite: Fix return_complex2 vs excessive precision Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 16:09 ` [PATCH 00/13] Go closures for i686 Richard Henderson
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

The x86_64 Unix port handles only one ABI, so don't define the other
symbols there.  The UNIX64 symbol retains its previous value.

The i386 ports ought to have the same set of symbols, even if we can't yet
unify their values without incrementing the libffi soname.
---
 src/x86/ffi.c       |  2 +-
 src/x86/ffitarget.h | 60 ++++++++++++++++++++++++++---------------------------
 2 files changed, 31 insertions(+), 31 deletions(-)
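
The enum values must stay fixed because a compiled client stores the numeric value of a constant such as `FFI_STDCALL` and passes it to `ffi_prep_cif`. That value is 2 in the Windows enum and 5 in the other i386 enum, so renumbering would change what an existing binary asks for.

The following is a minimal sketch using a hypothetical `SK_`-prefixed copy of the non-Windows i386 enum from this patch. It shows how a gap, such as the unused value 2, passes a plain range check and must be rejected explicitly. The reworked `ffi_prep_cif_machdep` later in this series does something similar with its `switch`.

```c
#include <assert.h>

/* Hypothetical copy of the non-Windows i386 ffi_abi from this patch,
   renamed with an SK_ prefix so it cannot clash with a real <ffi.h>.
   Value 2 is intentionally unused.  */
typedef enum sk_abi {
  SK_FIRST_ABI = 0,
  SK_SYSV      = 1,
  SK_THISCALL  = 3,
  SK_FASTCALL  = 4,
  SK_STDCALL   = 5,
  SK_PASCAL    = 6,
  SK_REGISTER  = 7,
  SK_MS_CDECL  = 8,
  SK_LAST_ABI,
  SK_DEFAULT_ABI = SK_SYSV
} sk_abi;

/* Return 1 if ABI names a supported calling convention, else 0.
   The range check alone would accept the gap at 2, so each accepted
   value is also listed explicitly.  */
static int
sk_abi_valid (int abi)
{
  if (!(abi > SK_FIRST_ABI && abi < SK_LAST_ABI))
    return 0;
  switch (abi)
    {
    case SK_SYSV:
    case SK_THISCALL:
    case SK_FASTCALL:
    case SK_STDCALL:
    case SK_PASCAL:
    case SK_REGISTER:
    case SK_MS_CDECL:
      return 1;
    default:
      return 0;
    }
}
```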

diff --git a/src/x86/ffi.c b/src/x86/ffi.c
index c387fb5..90e3f79 100644
--- a/src/x86/ffi.c
+++ b/src/x86/ffi.c
@@ -260,7 +260,7 @@ ffi_status ffi_prep_cif_machdep(ffi_cif *cif)
     }
 
 #ifndef X86_WIN32
-  if (cif->abi == FFI_SYSV || cif->abi == FFI_UNIX64)
+  if (cif->abi == FFI_SYSV)
     cif->bytes = (cif->bytes + 15) & ~0xF;
 #endif
 
diff --git a/src/x86/ffitarget.h b/src/x86/ffitarget.h
index 8c52573..a4c9573 100644
--- a/src/x86/ffitarget.h
+++ b/src/x86/ffitarget.h
@@ -76,44 +76,44 @@ typedef signed long            ffi_sarg;
 #endif
 
 typedef enum ffi_abi {
+#if defined(X86_WIN64)
   FFI_FIRST_ABI = 0,
-
-  /* ---- Intel x86 Win32 ---------- */
-#ifdef X86_WIN32
-  FFI_SYSV,
-  FFI_STDCALL,
-  FFI_THISCALL,
-  FFI_FASTCALL,
-  FFI_MS_CDECL,
-  FFI_PASCAL,
-  FFI_REGISTER,
-  FFI_LAST_ABI,
-#ifdef _MSC_VER
-  FFI_DEFAULT_ABI = FFI_MS_CDECL
-#else
-  FFI_DEFAULT_ABI = FFI_SYSV
-#endif
-
-#elif defined(X86_WIN64)
   FFI_WIN64,
   FFI_LAST_ABI,
   FFI_DEFAULT_ABI = FFI_WIN64
 
-#else
-  /* ---- Intel x86 and AMD x86-64 - */
-  FFI_SYSV,
-  FFI_UNIX64,   /* Unix variants all use the same ABI for x86-64  */
-  FFI_THISCALL,
-  FFI_FASTCALL,
-  FFI_STDCALL,
-  FFI_PASCAL,
-  FFI_REGISTER,
+#elif defined(X86_64) || (defined (__x86_64__) && defined (X86_DARWIN))
+  FFI_FIRST_ABI = 1,
+  FFI_UNIX64,
   FFI_LAST_ABI,
-#if defined(__i386__) || defined(__i386)
+  FFI_DEFAULT_ABI = FFI_UNIX64
+
+#elif defined(X86_WIN32)
+  FFI_FIRST_ABI = 0,
+  FFI_SYSV      = 1,
+  FFI_STDCALL   = 2,
+  FFI_THISCALL  = 3,
+  FFI_FASTCALL  = 4,
+  FFI_MS_CDECL  = 5,
+  FFI_PASCAL    = 6,
+  FFI_REGISTER  = 7,
+  FFI_LAST_ABI,
+# ifdef _MSC_VER
+  FFI_DEFAULT_ABI = FFI_MS_CDECL
+# else
   FFI_DEFAULT_ABI = FFI_SYSV
+# endif
 #else
-  FFI_DEFAULT_ABI = FFI_UNIX64
-#endif
+  FFI_FIRST_ABI = 0,
+  FFI_SYSV      = 1,
+  FFI_THISCALL  = 3,
+  FFI_FASTCALL  = 4,
+  FFI_STDCALL   = 5,
+  FFI_PASCAL    = 6,
+  FFI_REGISTER  = 7,
+  FFI_MS_CDECL  = 8,
+  FFI_LAST_ABI,
+  FFI_DEFAULT_ABI = FFI_SYSV
 #endif
 } ffi_abi;
 #endif
-- 
1.9.3

* [PATCH 06/13] x86: Rewrite ffi_call
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (9 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 03/13] x86: Force FFI_TYPE_LONGDOUBLE different from FFI_TYPE_DOUBLE Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 08/13] testsuite: Fix return_complex2 vs excessive precision Richard Henderson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

Decouple the assembly from FFI_TYPE_*.  Merge prep_args into ffi_call,
which now passes the call frame and the argument stack to the assembly.

Note that this patch isn't really standalone, as it breaks closures.
---
 src/x86/ffi.c       | 601 ++++++++++++++++++++++++++--------------------------
 src/x86/ffitarget.h |   4 -
 src/x86/internal.h  |  23 ++
 src/x86/sysv.S      | 243 ++++++++++-----------
 4 files changed, 448 insertions(+), 423 deletions(-)
 create mode 100644 src/x86/internal.h
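
The new ffi_call places each word-sized argument in one of two places:

- a register slot of the call frame, as chosen by `abi_params[].regs`;
- the stack, filled upward when `dir > 0` and downward from the top when `dir < 0`.

The sketch below is a simplified, hypothetical model of that loop. It makes the following assumptions:

- Every argument is a 32-bit integer (no structs, floats, or struct-return pointer).
- The stack area is sized from the non-register arguments only.
- The `sk_` names and register indices are illustrative stand-ins for `struct abi_params` and `R_EAX`/`R_EDX`/`R_ECX`, not libffi API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register indices standing in for R_EAX, R_EDX, R_ECX.  */
enum { SK_EAX, SK_EDX, SK_ECX };

/* Same shape as struct abi_params in the patch.  */
struct sk_abi_params
{
  int dir;      /* +1: stack filled upward; -1: filled downward */
  int nregs;    /* number of leading word arguments passed in registers */
  int regs[3];  /* register receiving the Nth register argument */
};

static const struct sk_abi_params sk_sysv     = { 1, 0, { 0 } };
static const struct sk_abi_params sk_fastcall = { 1, 2, { SK_ECX, SK_EDX } };
static const struct sk_abi_params sk_pascal   = { -1, 0, { 0 } };
static const struct sk_abi_params sk_register = { -1, 3, { SK_EAX, SK_EDX, SK_ECX } };

/* Lay out NARGS 32-bit integer arguments.  Register arguments are stored
   in REGS, indexed by SK_EAX..SK_ECX; the rest are written to STACK,
   which needs room for NARGS words.  Returns the stack bytes used.  */
static size_t
sk_layout (const struct sk_abi_params *pabi, const uint32_t *args,
           int nargs, uint32_t regs[3], unsigned char *stack)
{
  int nreg_used = nargs < pabi->nregs ? nargs : pabi->nregs;
  size_t bytes = (size_t) (nargs - nreg_used) * 4;
  unsigned char *argp = pabi->dir < 0 ? stack + bytes : stack;
  int i, narg_reg = 0;

  assert (pabi->nregs <= 3);
  for (i = 0; i < nargs; i++)
    {
      if (narg_reg < pabi->nregs)
        regs[pabi->regs[narg_reg++]] = args[i];
      else if (pabi->dir < 0)
        {
          argp -= 4;                    /* walk down from the top */
          memcpy (argp, &args[i], 4);
        }
      else
        {
          memcpy (argp, &args[i], 4);   /* walk up from the bottom */
          argp += 4;
        }
    }
  assert (pabi->dir > 0 || argp == stack);
  return bytes;
}
```

With fastcall, for example, the first two words land in ECX and EDX and the third goes to the stack. With pascal, all three go to the stack in reverse order.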

diff --git a/src/x86/ffi.c b/src/x86/ffi.c
index 339ca89..1c77bb8 100644
--- a/src/x86/ffi.c
+++ b/src/x86/ffi.c
@@ -29,10 +29,10 @@
    ----------------------------------------------------------------------- */
 
 #ifndef __x86_64__
-
 #include <ffi.h>
 #include <ffi_common.h>
 #include <stdlib.h>
+#include "internal.h"
 
 /* Force FFI_TYPE_LONGDOUBLE to be different than FFI_TYPE_DOUBLE;
    all further uses in this file will refer to the 80-bit type.  */
@@ -45,276 +45,277 @@
 # define FFI_TYPE_LONGDOUBLE 4
 #endif
 
+#if defined(__GNUC__) && !defined(__declspec)
+# define __declspec(x)  __attribute__((x))
+#endif
 
-/* ffi_prep_args is called by the assembly routine once stack space
-   has been allocated for the function's arguments */
-
-unsigned int ffi_prep_args(char *stack, extended_cif *ecif);
-unsigned int ffi_prep_args(char *stack, extended_cif *ecif)
+/* Perform machine dependent cif processing.  */
+ffi_status FFI_HIDDEN
+ffi_prep_cif_machdep(ffi_cif *cif)
 {
-  register unsigned int i;
-  register void **p_argv;
-  register char *argp;
-  register ffi_type **p_arg;
-  const int cabi = ecif->cif->abi;
-  const int dir = (cabi == FFI_PASCAL || cabi == FFI_REGISTER) ? -1 : +1;
-  unsigned int stack_args_count = 0;
-  void *p_stack_data[3];
-  char *argp2 = stack;
-
-  argp = stack;
+  size_t bytes = 0;
+  int i, n, flags, cabi = cif->abi;
 
-  if ((ecif->cif->flags == FFI_TYPE_STRUCT
-       || ecif->cif->flags == FFI_TYPE_MS_STRUCT))
+  switch (cabi)
     {
-      /* For fastcall/thiscall/register this is first register-passed
-         argument.  */
-      if (cabi == FFI_THISCALL || cabi == FFI_FASTCALL || cabi == FFI_REGISTER)
-        {
-          p_stack_data[stack_args_count] = argp;
-          ++stack_args_count;
-        }
-
-      *(void **) argp = ecif->rvalue;
-      argp += sizeof(void*);
-    }
-
-  p_arg  = ecif->cif->arg_types;
-  p_argv = ecif->avalue;
-  if (dir < 0)
-    {
-      const int nargs = ecif->cif->nargs - 1;
-      if (nargs > 0)
-      {
-        p_arg  += nargs;
-        p_argv += nargs;
-      }
-    }
-
-  for (i = ecif->cif->nargs;
-       i != 0;
-       i--, p_arg += dir, p_argv += dir)
-    {
-      /* Align if necessary */
-      if ((sizeof(void*) - 1) & (size_t) argp)
-        argp = (char *) ALIGN(argp, sizeof(void*));
-
-      size_t z = (*p_arg)->size;
-
-      if (z < FFI_SIZEOF_ARG)
-        {
-          z = FFI_SIZEOF_ARG;
-          switch ((*p_arg)->type)
-            {
-            case FFI_TYPE_SINT8:
-              *(ffi_sarg *) argp = (ffi_sarg)*(SINT8 *)(* p_argv);
-              break;
-
-            case FFI_TYPE_UINT8:
-              *(ffi_arg *) argp = (ffi_arg)*(UINT8 *)(* p_argv);
-              break;
-
-            case FFI_TYPE_SINT16:
-              *(ffi_sarg *) argp = (ffi_sarg)*(SINT16 *)(* p_argv);
-              break;
-
-            case FFI_TYPE_UINT16:
-              *(ffi_arg *) argp = (ffi_arg)*(UINT16 *)(* p_argv);
-              break;
-
-            case FFI_TYPE_SINT32:
-              *(ffi_sarg *) argp = (ffi_sarg)*(SINT32 *)(* p_argv);
-              break;
-
-            case FFI_TYPE_UINT32:
-              *(ffi_arg *) argp = (ffi_arg)*(UINT32 *)(* p_argv);
-              break;
-
-            case FFI_TYPE_STRUCT:
-              *(ffi_arg *) argp = *(ffi_arg *)(* p_argv);
-              break;
-
-            default:
-              FFI_ASSERT(0);
-            }
-        }
-      else
-        {
-          memcpy(argp, *p_argv, z);
-        }
-
-    /* For thiscall/fastcall/register convention register-passed arguments
-       are the first two none-floating-point arguments with a size
-       smaller or equal to sizeof (void*).  */
-    if ((z == FFI_SIZEOF_ARG)
-        && ((cabi == FFI_REGISTER)
-          || (cabi == FFI_THISCALL && stack_args_count < 1)
-          || (cabi == FFI_FASTCALL && stack_args_count < 2))
-        && ((*p_arg)->type != FFI_TYPE_FLOAT && (*p_arg)->type != FFI_TYPE_STRUCT)
-       )
-      {
-        if (dir < 0 && stack_args_count > 2)
-          {
-            /* Iterating arguments backwards, so first register-passed argument
-               will be passed last. Shift temporary values to make place. */
-            p_stack_data[0] = p_stack_data[1];
-            p_stack_data[1] = p_stack_data[2];
-            stack_args_count = 2;
-          }
-
-        p_stack_data[stack_args_count] = argp;
-        ++stack_args_count;
-      }
-
-      argp += z;
-    }
-
-  /* We need to move the register-passed arguments for thiscall,
-     fastcall, register on top of stack, so that those can be moved
-     to registers by call-handler.  */
-  if (stack_args_count > 0)
-    {
-      if (dir < 0 && stack_args_count > 1)
-        {
-          /* Reverse order if iterating arguments backwards */
-          ffi_arg tmp = *(ffi_arg*) p_stack_data[0];
-          *(ffi_arg*) p_stack_data[0] = *(ffi_arg*) p_stack_data[stack_args_count - 1];
-          *(ffi_arg*) p_stack_data[stack_args_count - 1] = tmp;
-        }
-      
-      int i;
-      for (i = 0; i < stack_args_count; i++)
-        {
-          if (p_stack_data[i] != argp2)
-            {
-              ffi_arg tmp = *(ffi_arg*) p_stack_data[i];
-              memmove (argp2 + FFI_SIZEOF_ARG, argp2, (size_t) ((char*) p_stack_data[i] - (char*)argp2));
-              *(ffi_arg *) argp2 = tmp;
-            }
-
-          argp2 += FFI_SIZEOF_ARG;
-        }
+    case FFI_SYSV:
+    case FFI_STDCALL:
+    case FFI_THISCALL:
+    case FFI_FASTCALL:
+    case FFI_MS_CDECL:
+    case FFI_PASCAL:
+    case FFI_REGISTER:
+      break;
+    default:
+      return FFI_BAD_ABI;
     }
 
-    return stack_args_count;
-    return 0;
-}
-
-/* Perform machine dependent cif processing */
-ffi_status ffi_prep_cif_machdep(ffi_cif *cif)
-{
-  unsigned int i;
-  ffi_type **ptr;
-
-  /* Set the return type flag */
   switch (cif->rtype->type)
     {
     case FFI_TYPE_VOID:
+      flags = X86_RET_VOID;
+      break;
+    case FFI_TYPE_FLOAT:
+      flags = X86_RET_FLOAT;
+      break;
+    case FFI_TYPE_DOUBLE:
+      flags = X86_RET_DOUBLE;
+      break;
+    case FFI_TYPE_LONGDOUBLE:
+      flags = X86_RET_LDOUBLE;
+      break;
     case FFI_TYPE_UINT8:
+      flags = X86_RET_UINT8;
+      break;
     case FFI_TYPE_UINT16:
+      flags = X86_RET_UINT16;
+      break;
     case FFI_TYPE_SINT8:
+      flags = X86_RET_SINT8;
+      break;
     case FFI_TYPE_SINT16:
-    case FFI_TYPE_SINT64:
-    case FFI_TYPE_FLOAT:
-    case FFI_TYPE_DOUBLE:
-    case FFI_TYPE_LONGDOUBLE:
-      cif->flags = (unsigned) cif->rtype->type;
+      flags = X86_RET_SINT16;
       break;
-
+    case FFI_TYPE_INT:
+    case FFI_TYPE_SINT32:
+    case FFI_TYPE_UINT32:
+    case FFI_TYPE_POINTER:
+      flags = X86_RET_INT32;
+      break;
+    case FFI_TYPE_SINT64:
     case FFI_TYPE_UINT64:
-      cif->flags = FFI_TYPE_SINT64;
+      flags = X86_RET_INT64;
       break;
-
     case FFI_TYPE_STRUCT:
 #ifndef X86
       /* ??? This should be a different ABI rather than an ifdef.  */
       if (cif->rtype->size == 1)
-	cif->flags = FFI_TYPE_SMALL_STRUCT_1B;	/* same as char size */
+	flags = X86_RET_STRUCT_1B;
       else if (cif->rtype->size == 2)
-	cif->flags = FFI_TYPE_SMALL_STRUCT_2B;	/* same as short size */
+	flags = X86_RET_STRUCT_2B;
       else if (cif->rtype->size == 4)
-	cif->flags = FFI_TYPE_INT;		/* same as int type */
+	flags = X86_RET_INT32;
       else if (cif->rtype->size == 8)
-	cif->flags = FFI_TYPE_SINT64;		/* same as int64 type */
+	flags = X86_RET_INT64;
       else
 #endif
 	{
-	  if (cif->abi == FFI_MS_CDECL)
-	    cif->flags = FFI_TYPE_MS_STRUCT;
-	  else
-	    cif->flags = FFI_TYPE_STRUCT;
+	  switch (cabi)
+	    {
+	    case FFI_THISCALL:
+	    case FFI_FASTCALL:
+	    case FFI_STDCALL:
+	    case FFI_MS_CDECL:
+	      flags = X86_RET_STRUCTARG;
+	      break;
+	    default:
+	      flags = X86_RET_STRUCTPOP;
+	      break;
+	    }
 	  /* Allocate space for return value pointer.  */
-	  cif->bytes += ALIGN(sizeof(void*), FFI_SIZEOF_ARG);
+	  bytes += ALIGN (sizeof(void*), FFI_SIZEOF_ARG);
 	}
       break;
-
     default:
-      cif->flags = FFI_TYPE_INT;
-      break;
+      return FFI_BAD_TYPEDEF;
     }
+  cif->flags = flags;
 
-  for (ptr = cif->arg_types, i = cif->nargs; i > 0; i--, ptr++)
+  for (i = 0, n = cif->nargs; i < n; i++)
     {
-      if (((*ptr)->alignment - 1) & cif->bytes)
-        cif->bytes = ALIGN(cif->bytes, (*ptr)->alignment);
-      cif->bytes += (unsigned)ALIGN((*ptr)->size, FFI_SIZEOF_ARG);
-    }
+      ffi_type *t = cif->arg_types[i];
 
-  if (cif->abi == FFI_SYSV)
-    cif->bytes = ALIGN (cif->bytes, 15);
+      bytes = ALIGN (bytes, t->alignment);
+      bytes += ALIGN (t->size, FFI_SIZEOF_ARG);
+    }
+  cif->bytes = ALIGN (bytes, 16);
 
   return FFI_OK;
 }
 
-extern void
-ffi_call_win32(unsigned int (*)(char *, extended_cif *), extended_cif *,
-               unsigned, unsigned, unsigned, unsigned *, void (*fn)(void));
-extern void ffi_call_SYSV(void (*)(char *, extended_cif *), extended_cif *,
-                          unsigned, unsigned, unsigned *, void (*fn)(void));
-
-void ffi_call(ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
+static ffi_arg
+extend_basic_type(void *arg, int type)
 {
-  extended_cif ecif;
+  switch (type)
+    {
+    case FFI_TYPE_SINT8:
+      return *(SINT8 *)arg;
+    case FFI_TYPE_UINT8:
+      return *(UINT8 *)arg;
+    case FFI_TYPE_SINT16:
+      return *(SINT16 *)arg;
+    case FFI_TYPE_UINT16:
+      return *(UINT16 *)arg;
+
+    case FFI_TYPE_SINT32:
+    case FFI_TYPE_UINT32:
+    case FFI_TYPE_POINTER:
+    case FFI_TYPE_FLOAT:
+      return *(UINT32 *)arg;
+
+    default:
+      abort();
+    }
+}
 
-  ecif.cif = cif;
-  ecif.avalue = avalue;
-  
-  /* If the return value is a struct and we don't have a return */
-  /* value address then we need to make one                     */
+struct call_frame
+{
+  void *ebp;		/* 0 */
+  void *retaddr;	/* 4 */
+  void (*fn)(void);	/* 8 */
+  int flags;		/* 12 */
+  void *rvalue;		/* 16 */
+  unsigned regs[3];	/* 20-28 */
+};
+
+struct abi_params
+{
+  int dir;		/* parameter growth direction */
+  int nregs;		/* number of register parameters */
+  int regs[3];
+};
+
+static const struct abi_params abi_params[FFI_LAST_ABI] = {
+  [FFI_SYSV] = { 1, 0 },
+  [FFI_THISCALL] = { 1, 1, { R_ECX } },
+  [FFI_FASTCALL] = { 1, 2, { R_ECX, R_EDX } },
+  [FFI_STDCALL] = { 1, 0 },
+  [FFI_PASCAL] = { -1, 0 },
+  [FFI_REGISTER] = { -1, 3, { R_EAX, R_EDX, R_ECX } },
+  [FFI_MS_CDECL] = { 1, 0 }
+};
+
+extern void ffi_call_i386(struct call_frame *, char *)
+	FFI_HIDDEN __declspec(fastcall);
 
-  if (rvalue == NULL
-      && (cif->flags == FFI_TYPE_STRUCT
-          || cif->flags == FFI_TYPE_MS_STRUCT))
+void
+ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
+{
+  size_t rsize, bytes;
+  struct call_frame *frame;
+  char *stack, *argp;
+  ffi_type **arg_types;
+  int flags, cabi, i, n, dir, narg_reg;
+  const struct abi_params *pabi;
+
+  flags = cif->flags;
+  cabi = cif->abi;
+  pabi = &abi_params[cabi];
+  dir = pabi->dir;
+
+  rsize = 0;
+  if (rvalue == NULL)
     {
-      ecif.rvalue = alloca(cif->rtype->size);
+      switch (flags)
+	{
+	case X86_RET_FLOAT:
+	case X86_RET_DOUBLE:
+	case X86_RET_LDOUBLE:
+	case X86_RET_STRUCTPOP:
+	case X86_RET_STRUCTARG:
+	  /* The float cases need to pop the 387 stack.
+	     The struct cases need to pass a valid pointer to the callee.  */
+	  rsize = cif->rtype->size;
+	  break;
+	default:
+	  /* We can pretend that the callee returns nothing.  */
+	  flags = X86_RET_VOID;
+	  break;
+	}
     }
-  else
-    ecif.rvalue = rvalue;
-    
-  
-  switch (cif->abi) 
+
+  bytes = cif->bytes;
+  stack = alloca(bytes + sizeof(*frame) + rsize);
+  argp = (dir < 0 ? stack + bytes : stack);
+  frame = (struct call_frame *)(stack + bytes);
+  if (rsize)
+    rvalue = frame + 1;
+
+  frame->fn = fn;
+  frame->flags = flags;
+  frame->rvalue = rvalue;
+
+  narg_reg = 0;
+  switch (flags)
     {
-#ifndef X86_WIN32
-    case FFI_SYSV:
-      ffi_call_SYSV(ffi_prep_args, &ecif, cif->bytes, cif->flags, ecif.rvalue,
-                    fn);
-      break;
-#else
-    case FFI_SYSV:
-    case FFI_MS_CDECL:
-#endif
-    case FFI_STDCALL:
-    case FFI_THISCALL:
-    case FFI_FASTCALL:
-    case FFI_PASCAL:
-    case FFI_REGISTER:
-      ffi_call_win32(ffi_prep_args, &ecif, cif->abi, cif->bytes, cif->flags,
-                     ecif.rvalue, fn);
-      break;
-    default:
-      FFI_ASSERT(0);
+    case X86_RET_STRUCTARG:
+      /* The pointer is passed as the first argument.  */
+      if (pabi->nregs > 0)
+	{
+	  frame->regs[pabi->regs[0]] = (unsigned)rvalue;
+	  narg_reg = 1;
+	  break;
+	}
+      /* fallthru */
+    case X86_RET_STRUCTPOP:
+      *(void **)argp = rvalue;
+      argp += sizeof(void *);
       break;
     }
+
+  arg_types = cif->arg_types;
+  for (i = 0, n = cif->nargs; i < n; i++)
+    {
+      ffi_type *ty = arg_types[i];
+      void *valp = avalue[i];
+      size_t z = ty->size;
+      int t = ty->type;
+
+      if (z <= FFI_SIZEOF_ARG && t != FFI_TYPE_STRUCT)
+        {
+	  ffi_arg val = extend_basic_type (valp, t);
+
+	  if (t != FFI_TYPE_FLOAT && narg_reg < pabi->nregs)
+	    frame->regs[pabi->regs[narg_reg++]] = val;
+	  else if (dir < 0)
+	    {
+	      argp -= 4;
+	      *(ffi_arg *)argp = val;
+	    }
+	  else
+	    {
+	      *(ffi_arg *)argp = val;
+	      argp += 4;
+	    }
+	}
+      else
+	{
+	  size_t za = ALIGN (z, FFI_SIZEOF_ARG);
+	  if (dir < 0)
+	    {
+	      argp -= za;
+	      memcpy (argp, valp, z);
+	    }
+	  else
+	    {
+	      memcpy (argp, valp, z);
+	      argp += za;
+	    }
+	}
+    }
+  FFI_ASSERT (dir > 0 || argp == stack);
+
+  ffi_call_i386 (frame, stack);
 }
 
 
@@ -641,88 +642,92 @@ ffi_prep_raw_closure_loc (ffi_raw_closure* closure,
   return FFI_OK;
 }
 
-static unsigned int 
-ffi_prep_args_raw(char *stack, extended_cif *ecif)
+void
+ffi_raw_call(ffi_cif *cif, void (*fn)(void), void *rvalue, ffi_raw *avalue)
 {
-  const ffi_cif *cif = ecif->cif;
-  unsigned int i, passed_regs = 0;
-  
-  const unsigned int abi = cif->abi;
-  const unsigned int max_regs = (abi == FFI_THISCALL) ? 1
-                              : (abi == FFI_FASTCALL) ? 2
-                              : (abi == FFI_REGISTER) ? 3
-                              : 0;
-
-  if (cif->flags == FFI_TYPE_STRUCT)
-    ++passed_regs;
-  
-  for (i = 0; i < cif->nargs && passed_regs <= max_regs; i++)
+  size_t rsize, bytes;
+  struct call_frame *frame;
+  char *stack, *argp;
+  ffi_type **arg_types;
+  int flags, cabi, i, n, narg_reg;
+  const struct abi_params *pabi;
+
+  flags = cif->flags;
+  cabi = cif->abi;
+  pabi = &abi_params[cabi];
+
+  rsize = 0;
+  if (rvalue == NULL)
     {
-      if (cif->arg_types[i]->type == FFI_TYPE_FLOAT
-         || cif->arg_types[i]->type == FFI_TYPE_STRUCT)
-        continue;
-
-      size_t sz = cif->arg_types[i]->size;
-      if (sz == 0 || sz > FFI_SIZEOF_ARG)
-        continue;
-
-      ++passed_regs;
+      switch (flags)
+	{
+	case X86_RET_FLOAT:
+	case X86_RET_DOUBLE:
+	case X86_RET_LDOUBLE:
+	case X86_RET_STRUCTPOP:
+	case X86_RET_STRUCTARG:
+	  /* The float cases need to pop the 387 stack.
+	     The struct cases need to pass a valid pointer to the callee.  */
+	  rsize = cif->rtype->size;
+	  break;
+	default:
+	  /* We can pretend that the callee returns nothing.  */
+	  flags = X86_RET_VOID;
+	  break;
+	}
     }
 
-  memcpy (stack, ecif->avalue, cif->bytes);
-  return passed_regs;
-}
-
-/* we borrow this routine from libffi (it must be changed, though, to
- * actually call the function passed in the first argument.  as of
- * libffi-1.20, this is not the case.)
- */
+  bytes = cif->bytes;
+  argp = stack = alloca(bytes + sizeof(*frame) + rsize);
+  frame = (struct call_frame *)(stack + bytes);
+  if (rsize)
+    rvalue = frame + 1;
 
-void
-ffi_raw_call(ffi_cif *cif, void (*fn)(void), void *rvalue, ffi_raw *fake_avalue)
-{
-  extended_cif ecif;
-  void **avalue = (void **)fake_avalue;
-
-  ecif.cif = cif;
-  ecif.avalue = avalue;
-  
-  /* If the return value is a struct and we don't have a return */
-  /* value address then we need to make one                     */
-
-  if (rvalue == NULL
-      && (cif->flags == FFI_TYPE_STRUCT
-          || cif->flags == FFI_TYPE_MS_STRUCT))
+  narg_reg = 0;
+  switch (flags)
     {
-      ecif.rvalue = alloca(cif->rtype->size);
+    case X86_RET_STRUCTARG:
+      /* The pointer is passed as the first argument.  */
+      if (pabi->nregs > 0)
+	{
+	  frame->regs[pabi->regs[0]] = (unsigned)rvalue;
+	  narg_reg = 1;
+	  break;
+	}
+      /* fallthru */
+    case X86_RET_STRUCTPOP:
+      *(void **)argp = rvalue;
+      argp += sizeof(void *);
+      bytes -= sizeof(void *);
+      break;
     }
-  else
-    ecif.rvalue = rvalue;
-    
-  
-  switch (cif->abi) 
+
+  arg_types = cif->arg_types;
+  for (i = 0, n = cif->nargs; narg_reg < pabi->nregs && i < n; i++)
     {
-#ifndef X86_WIN32
-    case FFI_SYSV:
-      ffi_call_SYSV(ffi_prep_args_raw, &ecif, cif->bytes, cif->flags,
-                    ecif.rvalue, fn);
-      break;
-#else
-    case FFI_SYSV:
-    case FFI_MS_CDECL:
-#endif
-    case FFI_STDCALL:
-    case FFI_THISCALL:
-    case FFI_FASTCALL:
-    case FFI_PASCAL:
-    case FFI_REGISTER:
-      ffi_call_win32(ffi_prep_args_raw, &ecif, cif->abi, cif->bytes, cif->flags,
-                     ecif.rvalue, fn);
-      break;
-    default:
-      FFI_ASSERT(0);
-      break;
+      ffi_type *ty = arg_types[i];
+      size_t z = ty->size;
+      int t = ty->type;
+
+      if (z <= FFI_SIZEOF_ARG && t != FFI_TYPE_STRUCT && t != FFI_TYPE_FLOAT)
+	{
+	  ffi_arg val = extend_basic_type (avalue, t);
+	  frame->regs[pabi->regs[narg_reg++]] = val;
+	  z = FFI_SIZEOF_ARG;
+	}
+      else
+	{
+	  memcpy (argp, avalue, z);
+	  z = ALIGN (z, FFI_SIZEOF_ARG);
+	  argp += z;
+	}
+      avalue += z;
+      bytes -= z;
     }
+  if (i < n)
+    memcpy (argp, avalue, bytes);
+
+  ffi_call_i386 (frame, stack);
 }
 #endif /* !FFI_NO_RAW_API */
 #endif /* !__x86_64__ */
diff --git a/src/x86/ffitarget.h b/src/x86/ffitarget.h
index a4c9573..91e429c 100644
--- a/src/x86/ffitarget.h
+++ b/src/x86/ffitarget.h
@@ -98,11 +98,7 @@ typedef enum ffi_abi {
   FFI_PASCAL    = 6,
   FFI_REGISTER  = 7,
   FFI_LAST_ABI,
-# ifdef _MSC_VER
   FFI_DEFAULT_ABI = FFI_MS_CDECL
-# else
-  FFI_DEFAULT_ABI = FFI_SYSV
-# endif
 #else
   FFI_FIRST_ABI = 0,
   FFI_SYSV      = 1,
diff --git a/src/x86/internal.h b/src/x86/internal.h
new file mode 100644
index 0000000..480c1d0
--- /dev/null
+++ b/src/x86/internal.h
@@ -0,0 +1,23 @@
+#define X86_RET_FLOAT		0
+#define X86_RET_DOUBLE		1
+#define X86_RET_LDOUBLE		2
+#define X86_RET_SINT8		3
+#define X86_RET_SINT16		4
+#define X86_RET_UINT8		5
+#define X86_RET_UINT16		6
+#define X86_RET_INT64		7
+#define X86_RET_INT32		8
+#define X86_RET_VOID		9
+#define X86_RET_STRUCTPOP	10
+#define X86_RET_STRUCTARG       11
+#define X86_RET_STRUCT_1B	12
+#define X86_RET_STRUCT_2B	13
+#define X86_RET_UNUSED14	14
+#define X86_RET_UNUSED15	15
+
+#define X86_RET_TYPE_MASK	15
+#define X86_RET_POP_SHIFT	4
+
+#define R_EAX	0
+#define R_EDX	1
+#define R_ECX	2
diff --git a/src/x86/sysv.S b/src/x86/sysv.S
index fd13bc0..d0b8417 100644
--- a/src/x86/sysv.S
+++ b/src/x86/sysv.S
@@ -31,143 +31,144 @@
 #include <fficonfig.h>
 #include <ffi.h>
 #include <ffi_cfi.h>
+#include "internal.h"
 
-.text
-
-.globl ffi_prep_args
+#define C2(X, Y)  X ## Y
+#define C1(X, Y)  C2(X, Y)
+#ifdef __USER_LABEL_PREFIX__
+# define C(X)     C1(__USER_LABEL_PREFIX__, X)
+#else
+# define C(X)     X
+#endif
 
-	.align 4
-.globl ffi_call_SYSV
-        .type    ffi_call_SYSV,@function
+#ifdef __ELF__
+# define ENDF(X)  .type	X,@function; .size X, . - X
+#else
+# define ENDF(X)
+#endif
 
-ffi_call_SYSV:
-	cfi_startproc
-        pushl %ebp
-	cfi_adjust_cfa_offset(4)
-	cfi_rel_offset(%ebp, 0)
-        movl  %esp,%ebp
-	cfi_def_cfa_register(%ebp)
-	/* Make room for all of the new args.  */
-	movl  16(%ebp),%ecx
-	subl  %ecx,%esp
+/* This macro allows the safe creation of jump tables without an
+   actual table.  The entry points into the table are all 8 bytes.
+   The use of ORG asserts that we're at the correct location.  */
+#define E(X)      .align 8; .org 0b + X * 8
 
-        /* Align the stack pointer to 16-bytes */
-        andl  $0xfffffff0, %esp
+	.text
+	.align	16
+	.globl	C(ffi_call_i386)
+	FFI_HIDDEN(C(ffi_call_i386))
 
-	movl  %esp,%eax
+/* This is declared as
 
-	/* Place all of the ffi_prep_args in position  */
-	pushl 12(%ebp)
-	pushl %eax
-	call  *8(%ebp)
+   void ffi_call_i386(struct call_frame *frame, char *argp)
+        __attribute__((fastcall));
 
-	/* Return stack to previous state and call the function  */
-	addl  $8,%esp	
+   Thus the arguments are present in
 
-	call  *28(%ebp)
+        ecx: frame
+        edx: argp
+*/
 
-	/* Load %ecx with the return type code  */
-	movl  20(%ebp),%ecx	
+C(ffi_call_i386):
+	cfi_startproc
+	movl	(%esp), %eax		/* move the return address */
+	movl	%ebp, (%ecx)		/* store %ebp into local frame */
+	movl	%eax, 4(%ecx)		/* store retaddr into local frame */
+
+	/* New stack frame based off ebp.  This is an itty bit of unwind
+	   trickery in that the CFA *has* changed.  There is no easy way
+	   to describe it correctly on entry to the function.  Fortunately,
+	   it doesn't matter too much since at all points we can correctly
+	   unwind back to ffi_call.  Note that the location to which we
+	   moved the return address is (the new) CFA-4, so from the
+	   perspective of the unwind info, it hasn't moved.  */
+	movl	%ecx, %ebp
+	cfi_def_cfa(%ebp, 8)
+	cfi_rel_offset(%ebp, 0)
 
-	/* Protect %esi.  We're going to pop it in the epilogue.  */
-	pushl %esi
+	movl	%edx, %esp		/* set outgoing argument stack */
+	movl	20+R_EAX*4(%ebp), %eax	/* set register arguments */
+	movl	20+R_EDX*4(%ebp), %edx
+	movl	20+R_ECX*4(%ebp), %ecx
 
-	/* If the return value pointer is NULL, assume no return value.  */
-	cmpl  $0,24(%ebp)
-	jne  0f
+	call	*8(%ebp)
 
-	/* Even if there is no space for the return value, we are 
-	   obliged to handle floating-point values.  */
-	cmpl  $FFI_TYPE_FLOAT,%ecx
-	jne   noretval
-	fstp  %st(0)
+	movl	12(%ebp), %ecx		/* load return type code */
+	movl	%ebx, 8(%ebp)		/* preserve %ebx */
+	cfi_rel_offset(%ebx, 8)
 
-        jmp   epilogue
+	andl	$X86_RET_TYPE_MASK, %ecx
+#ifdef __PIC__
+	call	__x86.get_pc_thunk.bx
+1:	leal	0f-1b(%ebx, %ecx, 8), %ebx
+#else
+	leal	0f(,%ecx, 8), %ebx
+#endif
+	movl	16(%ebp), %ecx		/* load result address */
+	jmp	*%ebx
 
+	.align	8
 0:
-	call  1f
-
-.Lstore_table:
-	.long	noretval-.Lstore_table	/* FFI_TYPE_VOID */
-	.long	retint-.Lstore_table	/* FFI_TYPE_INT */
-	.long	retfloat-.Lstore_table	/* FFI_TYPE_FLOAT */
-	.long	retdouble-.Lstore_table	/* FFI_TYPE_DOUBLE */
-	.long	retlongdouble-.Lstore_table	/* FFI_TYPE_LONGDOUBLE */
-	.long	retuint8-.Lstore_table	/* FFI_TYPE_UINT8 */
-	.long	retsint8-.Lstore_table	/* FFI_TYPE_SINT8 */
-	.long	retuint16-.Lstore_table	/* FFI_TYPE_UINT16 */
-	.long	retsint16-.Lstore_table	/* FFI_TYPE_SINT16 */
-	.long	retint-.Lstore_table	/* FFI_TYPE_UINT32 */
-	.long	retint-.Lstore_table	/* FFI_TYPE_SINT32 */
-	.long	retint64-.Lstore_table	/* FFI_TYPE_UINT64 */
-	.long	retint64-.Lstore_table	/* FFI_TYPE_SINT64 */
-	.long	retstruct-.Lstore_table	/* FFI_TYPE_STRUCT */
-	.long	retint-.Lstore_table	/* FFI_TYPE_POINTER */
-
-1:
-	pop  %esi
-	add  (%esi, %ecx, 4), %esi
-	jmp  *%esi
-
-	/* Sign/zero extend as appropriate.  */
-retsint8:
-	movsbl  %al, %eax
-	jmp  retint
-
-retsint16:
-	movswl  %ax, %eax
-	jmp  retint
-
-retuint8:
-	movzbl  %al, %eax
-	jmp  retint
-
-retuint16:
-	movzwl  %ax, %eax
-	jmp  retint
-
-retfloat:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	fstps (%ecx)
-	jmp   epilogue
-
-retdouble:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	fstpl (%ecx)
-	jmp   epilogue
-
-retlongdouble:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	fstpt (%ecx)
-	jmp   epilogue
-	
-retint64:	
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	movl  %eax,0(%ecx)
-	movl  %edx,4(%ecx)
-	jmp   epilogue
-	
-retint:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	movl  %eax,0(%ecx)
-
-retstruct:
-	/* Nothing to do!  */
-
-noretval:
-epilogue:
-        popl %esi
-        movl %ebp,%esp
-        popl %ebp
-        ret
+E(X86_RET_FLOAT)
+	fstps	(%ecx)
+	jmp	9f
+E(X86_RET_DOUBLE)
+	fstpl	(%ecx)
+	jmp	9f
+E(X86_RET_LDOUBLE)
+	fstpt	(%ecx)
+	jmp	9f
+E(X86_RET_SINT8)
+	movsbl	%al, %eax
+	mov	%eax, (%ecx)
+	jmp	9f
+E(X86_RET_SINT16)
+	movswl	%ax, %eax
+	mov	%eax, (%ecx)
+	jmp	9f
+E(X86_RET_UINT8)
+	movzbl	%al, %eax
+	mov	%eax, (%ecx)
+	jmp	9f
+E(X86_RET_UINT16)
+	movzwl	%ax, %eax
+	mov	%eax, (%ecx)
+	jmp	9f
+E(X86_RET_INT64)
+	movl	%edx, 4(%ecx)
+	/* fallthru */
+E(X86_RET_INT32)
+	movl	%eax, (%ecx)
+	/* fallthru */
+E(X86_RET_VOID)
+9:	movl	8(%ebp), %ebx
+	movl	%ebp, %esp
+	popl	%ebp
+	cfi_remember_state
+	cfi_def_cfa(%esp, 4)
+	cfi_restore(%ebx)
+	cfi_restore(%ebp)
+	ret
+	cfi_restore_state
+
+E(X86_RET_STRUCTPOP)
+	jmp	9b
+E(X86_RET_STRUCTARG)
+	jmp	9b
+E(X86_RET_STRUCT_1B)
+	movb	%al, (%ecx)
+	jmp	9b
+E(X86_RET_STRUCT_2B)
+	movw	%ax, (%ecx)
+	jmp	9b
+
+	/* Fill out the table so that bad values are predictable.  */
+E(X86_RET_UNUSED14)
+	ud2
+E(X86_RET_UNUSED15)
+	ud2
+
 	cfi_endproc
-.ffi_call_SYSV_end:
-        .size    ffi_call_SYSV,.ffi_call_SYSV_end-ffi_call_SYSV
+ENDF(C(ffi_call_i386))
 
 	.align	4
 FFI_HIDDEN (ffi_closure_SYSV)
-- 
1.9.3
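A simplified model of how the rewritten ffi_call distributes arguments may help when reviewing the abi_params table. This is a sketch, not libffi code: struct arg, enum kind, reg_candidate and place_args are hypothetical stand-ins for ffi_type and the loop in ffi_call, and it assumes FFI_SIZEOF_ARG is 4 as on i386. It follows the same rules as the patch: word-sized non-float scalars take the next free register listed in abi_params, everything else goes on the stack, and dir < 0 fills the argument area from the top down, so the first stack argument ends up at the highest address.

```c
#include <assert.h>
#include <stddef.h>

/* Register numbers, as in the patch's internal.h.  */
enum { R_EAX = 0, R_EDX = 1, R_ECX = 2 };

/* Same shape as the patch's struct abi_params.  */
struct abi_params { int dir; int nregs; int regs[3]; };

/* Hypothetical, reduced stand-in for ffi_type.  */
enum kind { K_INT, K_FLOAT, K_STRUCT };
struct arg { enum kind kind; size_t size; };

#define SLOT 4                              /* FFI_SIZEOF_ARG on i386 */
#define ALIGN_UP(x) (((x) + SLOT - 1) / SLOT * SLOT)

/* Word-sized values that are neither float nor struct may use a register.  */
static int
reg_candidate (const struct arg *a)
{
  return a->kind == K_INT && a->size <= SLOT;
}

/* On return, reg[i] is the register holding argument i, or -1 if it
   is on the stack, in which case off[i] is its offset from the
   outgoing stack pointer.  Returns the stack bytes used.  */
static size_t
place_args (const struct abi_params *p, const struct arg *args, int n,
            int *reg, size_t *off)
{
  size_t bytes = 0, argp;
  int i, narg_reg = 0;

  /* Pass 1: size of the stack area for non-register arguments.  */
  for (i = 0; i < n; i++)
    {
      if (reg_candidate (&args[i]) && narg_reg < p->nregs)
        narg_reg++;
      else
        bytes += ALIGN_UP (args[i].size);
    }

  /* Pass 2: the ffi_call walk; dir < 0 fills from the top down.  */
  argp = p->dir < 0 ? bytes : 0;
  narg_reg = 0;
  for (i = 0; i < n; i++)
    {
      reg[i] = -1;
      off[i] = (size_t) -1;
      if (reg_candidate (&args[i]) && narg_reg < p->nregs)
        reg[i] = p->regs[narg_reg++];
      else if (p->dir < 0)
        {
          argp -= ALIGN_UP (args[i].size);
          off[i] = argp;
        }
      else
        {
          off[i] = argp;
          argp += ALIGN_UP (args[i].size);
        }
    }

  /* Mirrors the patch's FFI_ASSERT: a downward walk ends at the bottom.  */
  assert (p->dir > 0 || argp == 0);
  return bytes;
}
```

For FASTCALL ({ 1, 2, { R_ECX, R_EDX } }) and the argument list (int, float, int, int), this puts the first int in %ecx, the float at offset 0, the second int in %edx and the last int at offset 4, since both registers are used up by then.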


* [PATCH 08/13] testsuite: Fix return_complex2 vs excessive precision
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (10 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 06/13] x86: Rewrite ffi_call Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 01/13] x86: Tidy ffi_abi Richard Henderson
  2014-11-07 16:09 ` [PATCH 00/13] Go closures for i686 Richard Henderson
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

Use the previously computed rc2 to validate, rather than
recomputing a floating point result with excess precision.
---
 testsuite/libffi.call/return_complex2.inc | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/testsuite/libffi.call/return_complex2.inc b/testsuite/libffi.call/return_complex2.inc
index dad4a0f..265170b 100644
--- a/testsuite/libffi.call/return_complex2.inc
+++ b/testsuite/libffi.call/return_complex2.inc
@@ -2,10 +2,14 @@
 #include "ffitest.h"
 #include <complex.h>
 
-static _Complex T_C_TYPE return_c(_Complex T_C_TYPE c1, _Complex T_C_TYPE c2, unsigned int in3, _Complex T_C_TYPE c4)
+_Complex T_C_TYPE
+return_c(_Complex T_C_TYPE c1, _Complex T_C_TYPE c2,
+	 unsigned int in3, _Complex T_C_TYPE c4)
 {
-  return c1 + c2 + in3 + c4;
+  volatile _Complex T_C_TYPE r = c1 + c2 + in3 + c4;
+  return r;
 }
+
 int main (void)
 {
   ffi_cif cif;
@@ -35,6 +39,6 @@ int main (void)
   printf ("%f,%fi vs %f,%fi\n",
 	  T_CONV creal (rc), T_CONV cimag (rc),
 	  T_CONV creal (rc2), T_CONV cimag (rc2));
-  CHECK(rc ==  c1 + c2 + in3 + c4);
+  CHECK(rc == rc2);
   exit(0);
 }
-- 
1.9.3
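The pattern the test now relies on, shown in isolation: the volatile store rounds the sum to the declared type before it is returned, so the check compares two values that have both gone through that rounding, instead of comparing against an expression the compiler may evaluate with x87 excess precision. add_c is a hypothetical double-only version of return_c; both calls are direct here, with the second standing in for the value ffi_call would produce.

```c
#include <complex.h>

/* Same shape as the test's return_c, specialised to double.  Storing
   through a volatile object forces the sum to be rounded to double
   before it is returned, even where intermediate results would
   otherwise be kept in wider x87 registers.  */
_Complex double
add_c (_Complex double c1, _Complex double c2, unsigned int in3,
       _Complex double c4)
{
  volatile _Complex double r = c1 + c2 + in3 + c4;
  return r;
}
```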


* [PATCH 09/13] x86: Add support for Complex
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (4 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 07/13] x86: Rewrite closures Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 11/13] x86: Use win32 name mangling for fastcall functions Richard Henderson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

---
 src/x86/ffi.c                  | 27 +++++++++++++++++++++++++++
 testsuite/libffi.call/call.exp |  3 ++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/x86/ffi.c b/src/x86/ffi.c
index 40e47d2..a0d0cf3 100644
--- a/src/x86/ffi.c
+++ b/src/x86/ffi.c
@@ -120,6 +120,7 @@ ffi_prep_cif_machdep(ffi_cif *cif)
       else
 #endif
 	{
+	do_struct:
 	  switch (cabi)
 	    {
 	    case FFI_THISCALL:
@@ -136,6 +137,32 @@ ffi_prep_cif_machdep(ffi_cif *cif)
 	  bytes += ALIGN (sizeof(void*), FFI_SIZEOF_ARG);
 	}
       break;
+    case FFI_TYPE_COMPLEX:
+      switch (cif->rtype->elements[0]->type)
+	{
+	case FFI_TYPE_DOUBLE:
+	case FFI_TYPE_LONGDOUBLE:
+	case FFI_TYPE_SINT64:
+	case FFI_TYPE_UINT64:
+	  goto do_struct;
+	case FFI_TYPE_FLOAT:
+	case FFI_TYPE_INT:
+	case FFI_TYPE_SINT32:
+	case FFI_TYPE_UINT32:
+	  flags = X86_RET_INT64;
+	  break;
+	case FFI_TYPE_SINT16:
+	case FFI_TYPE_UINT16:
+	  flags = X86_RET_INT32;
+	  break;
+	case FFI_TYPE_SINT8:
+	case FFI_TYPE_UINT8:
+	  flags = X86_RET_STRUCT_2B;
+	  break;
+	default:
+	  return FFI_BAD_TYPEDEF;
+	}
+      break;
     default:
       return FFI_BAD_TYPEDEF;
     }
diff --git a/testsuite/libffi.call/call.exp b/testsuite/libffi.call/call.exp
index 55de25c..d42dae5 100644
--- a/testsuite/libffi.call/call.exp
+++ b/testsuite/libffi.call/call.exp
@@ -27,7 +27,8 @@ run-many-tests $tlist ""
 # ??? We really should preprocess ffi.h and grep
 # for FFI_TARGET_HAS_COMPLEX_TYPE.
 if { [istarget s390*]
-     || [istarget x86_64*] } {
+     || [istarget x86_64*]
+     || [istarget i?86*] } {
   run-many-tests $ctlist ""
 } else {
     foreach test $ctlist {
-- 
1.9.3
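The new FFI_TYPE_COMPLEX case can be read as a classification by the size of one component: a complex value is two components, and anything up to 8 bytes comes back in registers. The sketch below restates it that way. enum ret_class and classify_complex are illustrative names, not libffi API, and keying on size rather than on the element's type code is a simplification (the patch also rejects unknown element types with FFI_BAD_TYPEDEF, where this sketch just falls into the memory case).

```c
#include <stddef.h>

/* Illustrative return classes; the names echo the patch's X86_RET_*
   codes but this enum is not part of libffi.  */
enum ret_class { RET_STRUCT_2B, RET_INT32, RET_INT64, RET_MEMORY };

/* Classify a complex return value by the size in bytes of ONE
   component, following the switch added to ffi_prep_cif_machdep.  */
static enum ret_class
classify_complex (size_t elem_size)
{
  switch (elem_size * 2)            /* total size of the complex value */
    {
    case 2:
      return RET_STRUCT_2B;         /* char complex: %ax */
    case 4:
      return RET_INT32;             /* short complex: %eax */
    case 8:
      return RET_INT64;             /* float or int complex: %edx:%eax */
    default:
      return RET_MEMORY;            /* double, long double, 64-bit ints */
    }
}
```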


* [PATCH 12/13] testsuite: Add two dg-do run markers
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
  2014-11-07 15:30 ` [PATCH 02/13] x86: Remove some conditional compilation Richard Henderson
  2014-11-07 15:31 ` [PATCH 10/13] x86: Add support for Go closures Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 05/13] ffi_cif: Add cfa_escape Richard Henderson
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

Caught by a clang warning about an unused -L parameter.
---
 testsuite/libffi.call/float2.c     | 1 +
 testsuite/libffi.call/return_ldl.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/testsuite/libffi.call/float2.c b/testsuite/libffi.call/float2.c
index aae1abf..419c2bd 100644
--- a/testsuite/libffi.call/float2.c
+++ b/testsuite/libffi.call/float2.c
@@ -3,6 +3,7 @@
    Limitations:	none.
    PR:		none.
    Originator:	From the original ffitest.c  */
+/* { dg-do run } */
 
 #include "ffitest.h"
 #include "float.h"
diff --git a/testsuite/libffi.call/return_ldl.c b/testsuite/libffi.call/return_ldl.c
index 520e710..52a92fe 100644
--- a/testsuite/libffi.call/return_ldl.c
+++ b/testsuite/libffi.call/return_ldl.c
@@ -3,6 +3,7 @@
    Limitations:	none.
    PR:		none.
    Originator:	<andreast@gcc.gnu.org> 20071113  */
+/* { dg-do run } */
 
 #include "ffitest.h"
 
-- 
1.9.3


* [PATCH 10/13] x86: Add support for Go closures
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
  2014-11-07 15:30 ` [PATCH 02/13] x86: Remove some conditional compilation Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 12/13] testsuite: Add two dg-do run markers Richard Henderson
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

---
 src/x86/ffi.c       | 70 ++++++++++++++++++++++++++++++++++++++++++++++-------
 src/x86/ffitarget.h |  2 +-
 src/x86/sysv.S      | 34 +++++++++++++++++++++++++-
 3 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/src/x86/ffi.c b/src/x86/ffi.c
index a0d0cf3..359a864 100644
--- a/src/x86/ffi.c
+++ b/src/x86/ffi.c
@@ -218,25 +218,28 @@ struct call_frame
 struct abi_params
 {
   int dir;		/* parameter growth direction */
+  int static_chain;	/* the static chain register used by gcc */
   int nregs;		/* number of register parameters */
   int regs[3];
 };
 
 static const struct abi_params abi_params[FFI_LAST_ABI] = {
-  [FFI_SYSV] = { 1, 0 },
-  [FFI_THISCALL] = { 1, 1, { R_ECX } },
-  [FFI_FASTCALL] = { 1, 2, { R_ECX, R_EDX } },
-  [FFI_STDCALL] = { 1, 0 },
-  [FFI_PASCAL] = { -1, 0 },
-  [FFI_REGISTER] = { -1, 3, { R_EAX, R_EDX, R_ECX } },
-  [FFI_MS_CDECL] = { 1, 0 }
+  [FFI_SYSV] = { 1, R_ECX, 0 },
+  [FFI_THISCALL] = { 1, R_EAX, 1, { R_ECX } },
+  [FFI_FASTCALL] = { 1, R_EAX, 2, { R_ECX, R_EDX } },
+  [FFI_STDCALL] = { 1, R_ECX, 0 },
+  [FFI_PASCAL] = { -1, R_ECX, 0 },
+  /* ??? No defined static chain; gcc does not support REGISTER.  */
+  [FFI_REGISTER] = { -1, R_ECX, 3, { R_EAX, R_EDX, R_ECX } },
+  [FFI_MS_CDECL] = { 1, R_ECX, 0 }
 };
 
 extern void ffi_call_i386(struct call_frame *, char *)
 	FFI_HIDDEN __declspec(fastcall);
 
-void
-ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
+static void
+ffi_call_int (ffi_cif *cif, void (*fn)(void), void *rvalue,
+	      void **avalue, void *closure)
 {
   size_t rsize, bytes;
   struct call_frame *frame;
@@ -281,6 +284,7 @@ ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
   frame->fn = fn;
   frame->flags = flags;
   frame->rvalue = rvalue;
+  frame->regs[pabi->static_chain] = (unsigned)closure;
 
   narg_reg = 0;
   switch (flags)
@@ -345,6 +349,18 @@ ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
   ffi_call_i386 (frame, stack);
 }
 
+void
+ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
+{
+  ffi_call_int (cif, fn, rvalue, avalue, NULL);
+}
+
+void
+ffi_call_go (ffi_cif *cif, void (*fn)(void), void *rvalue,
+	     void **avalue, void *closure)
+{
+  ffi_call_int (cif, fn, rvalue, avalue, closure);
+}
 
 /** private members **/
 
@@ -493,6 +509,42 @@ ffi_prep_closure_loc (ffi_closure* closure,
   return FFI_OK;
 }
 
+void FFI_HIDDEN ffi_go_closure_EAX(void);
+void FFI_HIDDEN ffi_go_closure_ECX(void);
+void FFI_HIDDEN ffi_go_closure_STDCALL(void);
+
+ffi_status
+ffi_prep_go_closure (ffi_go_closure* closure, ffi_cif* cif,
+		     void (*fun)(ffi_cif*,void*,void**,void*))
+{
+  void (*dest)(void);
+
+  switch (cif->abi)
+    {
+    case FFI_SYSV:
+    case FFI_MS_CDECL:
+      dest = ffi_go_closure_ECX;
+      break;
+    case FFI_THISCALL:
+    case FFI_FASTCALL:
+      dest = ffi_go_closure_EAX;
+      break;
+    case FFI_STDCALL:
+    case FFI_PASCAL:
+      dest = ffi_go_closure_STDCALL;
+      break;
+    case FFI_REGISTER:
+    default:
+      return FFI_BAD_ABI;
+    }
+
+  closure->tramp = dest;
+  closure->cif = cif;
+  closure->fun = fun;
+
+  return FFI_OK;
+}
+
 /* ------- Native raw API support -------------------------------- */
 
 #if !FFI_NO_RAW_API
diff --git a/src/x86/ffitarget.h b/src/x86/ffitarget.h
index 8fff29f..580522f 100644
--- a/src/x86/ffitarget.h
+++ b/src/x86/ffitarget.h
@@ -117,6 +117,7 @@ typedef enum ffi_abi {
 /* ---- Definitions for closures ----------------------------------------- */
 
 #define FFI_CLOSURES 1
+#define FFI_GO_CLOSURES 1
 
 #define FFI_TYPE_SMALL_STRUCT_1B (FFI_TYPE_LAST + 1)
 #define FFI_TYPE_SMALL_STRUCT_2B (FFI_TYPE_LAST + 2)
@@ -127,7 +128,6 @@ typedef enum ffi_abi {
     || (defined (__x86_64__) && defined (X86_DARWIN))
 # define FFI_TRAMPOLINE_SIZE 24
 # define FFI_NATIVE_RAW_API 0
-# define FFI_GO_CLOSURES 1
 #else
 # define FFI_TRAMPOLINE_SIZE 12
 # define FFI_NATIVE_RAW_API 1  /* x86 has native raw api support */
diff --git a/src/x86/sysv.S b/src/x86/sysv.S
index 7b898ae..f412b7a 100644
--- a/src/x86/sysv.S
+++ b/src/x86/sysv.S
@@ -228,6 +228,28 @@ ENDF(C(ffi_call_i386))
 	jmp	*%eax
 .endm
 
+.macro	FFI_GO_CLOSURE suffix, chain, t1, t2
+	.align	16
+	.globl	C(ffi_go_closure_\suffix)
+	FFI_HIDDEN(C(ffi_go_closure_\suffix))
+C(ffi_go_closure_\suffix):
+	cfi_startproc
+	subl	$closure_FS, %esp
+	cfi_adjust_cfa_offset(closure_FS)
+	FFI_CLOSURE_SAVE_REGS
+	movl	4(\chain), \t1		/* copy cif */
+	movl	8(\chain), \t2		/* copy fun */
+	movl	\t1, 28(%esp)
+	movl	\t2, 32(%esp)
+	movl	\chain, 36(%esp)	/* closure is user_data */
+	jmp	88f
+	cfi_endproc
+ENDF(C(ffi_go_closure_\suffix))
+.endm
+
+FFI_GO_CLOSURE EAX, %eax, %edx, %ecx
+FFI_GO_CLOSURE ECX, %ecx, %edx, %eax
+
 /* The closure entry points are reached from the ffi_closure trampoline.
    On entry, %eax contains the address of the ffi_closure.  */
 
@@ -242,6 +264,9 @@ C(ffi_closure_i386):
 
 	FFI_CLOSURE_SAVE_REGS
 	FFI_CLOSURE_COPY_TRAMP_DATA
+
+88:	/* Entry point from preceding Go closures.  */
+
 	FFI_CLOSURE_CALL_INNER
 	FFI_CLOSURE_MASK_AND_JUMP
 
@@ -303,6 +328,8 @@ E(X86_RET_UNUSED15)
 	cfi_endproc
 ENDF(C(ffi_closure_i386))
 
+FFI_GO_CLOSURE STDCALL, %ecx, %edx, %eax
+
 /* For REGISTER, we have no available parameter registers, and so we
    enter here having pushed the closure onto the stack.  */
 
@@ -339,8 +366,13 @@ C(ffi_closure_STDCALL):
 	cfi_adjust_cfa_offset(closure_FS)
 
 	FFI_CLOSURE_SAVE_REGS
-0:
+
+0:	/* Entry point from ffi_closure_REGISTER.  */
+
 	FFI_CLOSURE_COPY_TRAMP_DATA
+
+88:	/* Entry point from preceding Go closure.  */
+
 	FFI_CLOSURE_CALL_INNER
 
 	movl	%eax, %ecx
-- 
1.9.3
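ffi_prep_go_closure has to pick an entry point that reads the closure from the same register ffi_call_go loads it into, namely the static_chain column of abi_params. The sketch below derives the choice from that column. The enum and function names are hypothetical, and FFI_REGISTER is left out because the patch returns FFI_BAD_ABI for it. Under this derivation THISCALL and FASTCALL, whose chain register is %eax, map to ffi_go_closure_EAX, while STDCALL and PASCAL use the separate entry point that shares the ffi_closure_STDCALL tail in sysv.S.

```c
/* Register numbers, as in the patch's internal.h.  */
enum { R_EAX = 0, R_EDX = 1, R_ECX = 2 };

/* Illustrative ABI list; the real ffi_abi enum lives in ffitarget.h.  */
enum go_abi { ABI_SYSV, ABI_THISCALL, ABI_FASTCALL, ABI_STDCALL,
              ABI_PASCAL, ABI_MS_CDECL, ABI_COUNT };

/* Static chain register per ABI, copied from the patch's abi_params.  */
static const int static_chain[ABI_COUNT] = {
  [ABI_SYSV] = R_ECX,    [ABI_THISCALL] = R_EAX, [ABI_FASTCALL] = R_EAX,
  [ABI_STDCALL] = R_ECX, [ABI_PASCAL] = R_ECX,   [ABI_MS_CDECL] = R_ECX,
};

/* Name of the Go closure entry point for an ABI: the entry point must
   read the closure from the register ffi_call_go stores it in.  */
static const char *
go_closure_entry (enum go_abi abi)
{
  switch (abi)
    {
    case ABI_STDCALL:
    case ABI_PASCAL:
      return "ffi_go_closure_STDCALL";   /* chain in %ecx */
    default:
      return static_chain[abi] == R_EAX ? "ffi_go_closure_EAX"
                                        : "ffi_go_closure_ECX";
    }
}
```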


* [PATCH 11/13] x86: Use win32 name mangling for fastcall functions
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (5 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 09/13] x86: Add support for Complex Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 04/13] x86: Convert to gas generated unwind info Richard Henderson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

---
 src/x86/sysv.S | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/x86/sysv.S b/src/x86/sysv.S
index f412b7a..e6a8c1e 100644
--- a/src/x86/sysv.S
+++ b/src/x86/sysv.S
@@ -47,6 +47,15 @@
 # define ENDF(X)
 #endif
 
+/* Handle win32 fastcall name mangling.  */
+#ifdef X86_WIN32
+# define ffi_call_i386		@ffi_call_i386@8
+# define ffi_closure_inner	@ffi_closure_inner@8
+#else
+# define ffi_call_i386		C(ffi_call_i386)
+# define ffi_closure_inner	C(ffi_closure_inner)
+#endif
+
 /* This macro allows the safe creation of jump tables without an
    actual table.  The entry points into the table are all 8 bytes.
    The use of ORG asserts that we're at the correct location.  */
@@ -54,8 +63,8 @@
 
 	.text
 	.align	16
-	.globl	C(ffi_call_i386)
-	FFI_HIDDEN(C(ffi_call_i386))
+	.globl	ffi_call_i386
+	FFI_HIDDEN(ffi_call_i386)
 
 /* This is declared as
 
@@ -68,7 +77,7 @@
         edx: argp
 */
 
-C(ffi_call_i386):
+ffi_call_i386:
 	cfi_startproc
 	movl	(%esp), %eax		/* move the return address */
 	movl	%ebp, (%ecx)		/* store %ebp into local frame */
@@ -168,7 +177,7 @@ E(X86_RET_UNUSED15)
 	ud2
 
 	cfi_endproc
-ENDF(C(ffi_call_i386))
+ENDF(ffi_call_i386)
 
 /* The inner helper is declared as
 
@@ -210,9 +219,9 @@ ENDF(C(ffi_call_i386))
 	addl	$C(_GLOBAL_OFFSET_TABLE_), %ebx
 #endif
 #if defined HAVE_HIDDEN_VISIBILITY_ATTRIBUTE || !defined __PIC__
-	call	C(ffi_closure_inner)
+	call	ffi_closure_inner
 #else
-	call	C(ffi_closure_inner)@PLT
+	call	ffi_closure_inner@PLT
 #endif
 .endm
 
-- 
1.9.3
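The mangled names follow the 32-bit Windows __fastcall decoration: a leading '@', the function name, then '@' and the size of the parameter list in bytes, in decimal. Both ffi_call_i386 and ffi_closure_inner take two pointer arguments, 4 bytes each on i386, hence the @8 suffix. A small helper that builds such a name (fastcall_name is a hypothetical name, not part of libffi):

```c
#include <stdio.h>

/* Build the __fastcall decorated name "@name@N", where N is the size
   of the parameter list in bytes.  Returns snprintf's result: the
   length the full name needs, which is >= len if it was truncated.  */
static int
fastcall_name (char *buf, size_t len, const char *name, unsigned arg_bytes)
{
  return snprintf (buf, len, "@%s@%u", name, arg_bytes);
}
```

For example, fastcall_name (buf, sizeof buf, "ffi_closure_inner", 2 * 4) produces "@ffi_closure_inner@8", matching the #define above.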


* [PATCH 07/13] x86: Rewrite closures
  2014-11-07 15:30 [PATCH 00/13] Go closures for i686 Richard Henderson
                   ` (3 preceding siblings ...)
  2014-11-07 15:31 ` [PATCH 05/13] ffi_cif: Add cfa_escape Richard Henderson
@ 2014-11-07 15:31 ` Richard Henderson
  2014-11-07 15:31 ` [PATCH 09/13] x86: Add support for Complex Richard Henderson
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-11-07 15:31 UTC (permalink / raw)
  To: libffi-discuss

Move everything into sysv.S, removing win32.S and freebsd.S.
Handle all abis with a single ffi_closure_inner function.
Move the complexity of the raw THISCALL trampoline into the
assembly rather than the trampoline itself.
Only push the context for the REGISTER abi; let the rest
receive it in a register.
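The offsets of the new struct closure_frame (cif at 28, fun at 32, user_data at 36) are hard-coded in the assembly, for instance in the Go closure entry points, which store cif, fun and the closure at 28(%esp), 32(%esp) and 36(%esp). One way to pin that layout down is a compile-time check. This sketch models the i386 layout with uint32_t in place of the pointer members so that it also compiles on a 64-bit host. The struct name is hypothetical and this is not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* i386 model of the patch's struct closure_frame; pointer members
   are uint32_t so the offsets match on any host.  */
struct closure_frame_i386
{
  uint32_t rettemp[4];   /* 0: temporary space for the return value */
  uint32_t regs[3];      /* 16: slots indexed by R_EAX, R_EDX, R_ECX */
  uint32_t cif;          /* 28: ffi_cif * */
  uint32_t fun;          /* 32: user handler */
  uint32_t user_data;    /* 36 */
};

/* Compilation fails if the layout drifts from the assembly's offsets.  */
_Static_assert (offsetof (struct closure_frame_i386, regs) == 16, "regs");
_Static_assert (offsetof (struct closure_frame_i386, cif) == 28, "cif");
_Static_assert (offsetof (struct closure_frame_i386, fun) == 32, "fun");
_Static_assert (offsetof (struct closure_frame_i386, user_data) == 36,
                "user_data");
```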
---
 Makefile.am         |   13 +-
 src/x86/ffi.c       |  389 +++++----------
 src/x86/ffitarget.h |    6 +-
 src/x86/freebsd.S   |  463 ------------------
 src/x86/sysv.S      |  607 +++++++++++++++++------
 src/x86/win32.S     | 1351 ---------------------------------------------------
 6 files changed, 593 insertions(+), 2236 deletions(-)
 delete mode 100644 src/x86/freebsd.S
 delete mode 100644 src/x86/win32.S

diff --git a/Makefile.am b/Makefile.am
index 3d1ecae..97a7bd3 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -37,8 +37,8 @@ EXTRA_DIST = LICENSE ChangeLog.v1 ChangeLog.libgcj			\
 	 src/sh64/sysv.S src/sh64/ffitarget.h src/sparc/v8.S		\
 	 src/sparc/v9.S src/sparc/ffitarget.h src/sparc/ffi.c		\
 	 src/x86/darwin64.S src/x86/ffi.c src/x86/sysv.S		\
-	 src/x86/win32.S src/x86/darwin.S src/x86/ffiw64.c src/x86/win64.S \
-	 src/x86/freebsd.S src/x86/ffi64.c src/x86/unix64.S		\
+	 src/x86/darwin.S src/x86/ffiw64.c src/x86/win64.S 		\
+	 src/x86/ffi64.c src/x86/unix64.S				\
 	 src/x86/ffitarget.h src/pa/ffitarget.h src/pa/ffi.c		\
 	 src/pa/linux.S src/pa/hpux32.S src/frv/ffi.c src/bfin/ffi.c	\
 	 src/bfin/ffitarget.h src/bfin/sysv.S src/frv/eabi.S		\
@@ -126,22 +126,19 @@ if BFIN
 nodist_libffi_la_SOURCES += src/bfin/ffi.c src/bfin/sysv.S
 endif
 if X86
-nodist_libffi_la_SOURCES += src/x86/ffi.c src/x86/sysv.S src/x86/win32.S
+nodist_libffi_la_SOURCES += src/x86/ffi.c src/x86/sysv.S
 endif
 if X86_FREEBSD
-nodist_libffi_la_SOURCES += src/x86/ffi.c src/x86/freebsd.S src/x86/win32.S
+nodist_libffi_la_SOURCES += src/x86/ffi.c src/x86/sysv.S
 endif
 if X86_WIN32
-nodist_libffi_la_SOURCES += src/x86/ffi.c src/x86/win32.S
+nodist_libffi_la_SOURCES += src/x86/ffi.c src/x86/sysv.S
 endif
 if X86_WIN64
 nodist_libffi_la_SOURCES += src/x86/ffiw64.c src/x86/win64.S
 endif
 if X86_DARWIN
 nodist_libffi_la_SOURCES += src/x86/ffi.c src/x86/darwin.S src/x86/ffi64.c src/x86/darwin64.S
-if X86_DARWIN32
-nodist_libffi_la_SOURCES += src/x86/win32.S
-endif
 endif
 if SPARC
 nodist_libffi_la_SOURCES += src/sparc/ffi.c src/sparc/v8.S src/sparc/v9.S
diff --git a/src/x86/ffi.c b/src/x86/ffi.c
index 1c77bb8..40e47d2 100644
--- a/src/x86/ffi.c
+++ b/src/x86/ffi.c
@@ -321,224 +321,105 @@ ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
 
 /** private members **/
 
-/* The following __attribute__((regparm(1))) decorations will have no effect
-   on MSVC or SUNPRO_C -- standard conventions apply. */
-static unsigned int ffi_prep_incoming_args (char *stack, void **ret,
-                                            void** args, ffi_cif* cif);
-void FFI_HIDDEN ffi_closure_SYSV (ffi_closure *)
-     __attribute__ ((regparm(1)));
-unsigned int FFI_HIDDEN ffi_closure_SYSV_inner (ffi_closure *, void **, void *)
-     __attribute__ ((regparm(1)));
-unsigned int FFI_HIDDEN ffi_closure_WIN32_inner (ffi_closure *, void **, void *)
-     __attribute__ ((regparm(1)));
-void FFI_HIDDEN ffi_closure_raw_SYSV (ffi_raw_closure *)
-     __attribute__ ((regparm(1)));
-#ifdef X86_WIN32
-void FFI_HIDDEN ffi_closure_raw_THISCALL (ffi_raw_closure *)
-     __attribute__ ((regparm(1)));
-#endif
-void FFI_HIDDEN ffi_closure_STDCALL (ffi_closure *);
-void FFI_HIDDEN ffi_closure_THISCALL (ffi_closure *);
-void FFI_HIDDEN ffi_closure_FASTCALL (ffi_closure *);
-void FFI_HIDDEN ffi_closure_REGISTER (ffi_closure *);
-
-/* This function is jumped to by the trampoline */
+void FFI_HIDDEN ffi_closure_i386(void);
+void FFI_HIDDEN ffi_closure_STDCALL(void);
+void FFI_HIDDEN ffi_closure_REGISTER(void);
 
-unsigned int FFI_HIDDEN __attribute__ ((regparm(1)))
-ffi_closure_SYSV_inner (ffi_closure *closure, void **respp, void *args)
+struct closure_frame
 {
-  /* our various things...  */
-  ffi_cif       *cif;
-  void         **arg_area;
-
-  cif         = closure->cif;
-  arg_area    = (void**) alloca (cif->nargs * sizeof (void*));  
-
-  /* this call will initialize ARG_AREA, such that each
-   * element in that array points to the corresponding 
-   * value on the stack; and if the function returns
-   * a structure, it will change RESP to point to the
-   * structure return address.  */
-
-  ffi_prep_incoming_args(args, respp, arg_area, cif);
-
-  (closure->fun) (cif, *respp, arg_area, closure->user_data);
-
-  return cif->flags;
-}
+  unsigned rettemp[4];				/* 0 */
+  unsigned regs[3];				/* 16-24 */
+  ffi_cif *cif;					/* 28 */
+  void (*fun)(ffi_cif*,void*,void**,void*);	/* 32 */
+  void *user_data;				/* 36 */
+};
 
-unsigned int FFI_HIDDEN __attribute__ ((regparm(1)))
-ffi_closure_WIN32_inner (ffi_closure *closure, void **respp, void *args)
+int FFI_HIDDEN __declspec(fastcall)
+ffi_closure_inner (struct closure_frame *frame, char *stack)
 {
-  /* our various things...  */
-  ffi_cif       *cif;
-  void         **arg_area;
-  unsigned int   ret;
-
-  cif         = closure->cif;
-  arg_area    = (void**) alloca (cif->nargs * sizeof (void*));  
-
-  /* this call will initialize ARG_AREA, such that each
-   * element in that array points to the corresponding 
-   * value on the stack; and if the function returns
-   * a structure, it will change RESP to point to the
-   * structure return address.  */
-
-  ret = ffi_prep_incoming_args(args, respp, arg_area, cif);
-
-  (closure->fun) (cif, *respp, arg_area, closure->user_data);
-
-  return ret;
-}
+  ffi_cif *cif = frame->cif;
+  int cabi, i, n, flags, dir, narg_reg;
+  const struct abi_params *pabi;
+  ffi_type **arg_types;
+  char *argp;
+  void *rvalue;
+  void **avalue;
 
-static unsigned int
-ffi_prep_incoming_args(char *stack, void **rvalue, void **avalue,
-                       ffi_cif *cif)
-{
-  register unsigned int i;
-  register void **p_argv;
-  register char *argp;
-  register ffi_type **p_arg;
-  const int cabi = cif->abi;
-  const int dir = (cabi == FFI_PASCAL || cabi == FFI_REGISTER) ? -1 : +1;
-  const unsigned int max_stack_count = (cabi == FFI_THISCALL) ? 1
-                                     : (cabi == FFI_FASTCALL) ? 2
-                                     : (cabi == FFI_REGISTER) ? 3
-                                     : 0;
-  unsigned int passed_regs = 0;
-  void *p_stack_data[3] = { stack - 1 };
-
-  argp = stack;
-  argp += max_stack_count * FFI_SIZEOF_ARG;
-
-  if ((cif->flags == FFI_TYPE_STRUCT
-       || cif->flags == FFI_TYPE_MS_STRUCT))
-    {
-      if (passed_regs < max_stack_count)
-        {
-          *rvalue = *(void**) (stack + (passed_regs*FFI_SIZEOF_ARG));
-          ++passed_regs;
-        }
-      else
-        {
-          *rvalue = *(void **) argp;
-          argp += sizeof(void *);
-        }
-    }
+  cabi = cif->abi;
+  flags = cif->flags;
+  narg_reg = 0;
+  rvalue = frame->rettemp;
+  pabi = &abi_params[cabi];
+  dir = pabi->dir;
+  argp = (dir < 0 ? stack + cif->bytes : stack);
 
-  /* Do register arguments first  */
-  for (i = 0, p_arg = cif->arg_types; 
-       i < cif->nargs && passed_regs < max_stack_count;
-       i++, p_arg++)
+  switch (flags)
     {
-      if ((*p_arg)->type == FFI_TYPE_FLOAT
-         || (*p_arg)->type == FFI_TYPE_STRUCT)
-        continue;
-
-      size_t sz = (*p_arg)->size;
-      if(sz == 0 || sz > FFI_SIZEOF_ARG)
-        continue;
-
-      p_stack_data[passed_regs] = avalue + i;
-      avalue[i] = stack + (passed_regs*FFI_SIZEOF_ARG);
-      ++passed_regs;
+    case X86_RET_STRUCTARG:
+      if (pabi->nregs > 0)
+	{
+	  rvalue = (void *)frame->regs[pabi->regs[0]];
+	  narg_reg = 1;
+	  frame->rettemp[0] = (unsigned)rvalue;
+	  break;
+	}
+      /* fallthru */
+    case X86_RET_STRUCTPOP:
+      rvalue = *(void **)argp;
+      argp += sizeof(void *);
+      break;
     }
 
-  p_arg = cif->arg_types;
-  p_argv = avalue;
-  if (dir < 0)
-    {
-      const int nargs = cif->nargs - 1;
-      if (nargs > 0)
-      {
-        p_arg  += nargs;
-        p_argv += nargs;
-      }
-    }
+  n = cif->nargs;
+  avalue = alloca(sizeof(void *) * n);
 
-  for (i = cif->nargs;
-       i != 0;
-       i--, p_arg += dir, p_argv += dir)
+  arg_types = cif->arg_types;
+  for (i = 0; i < n; ++i)
     {
-      /* Align if necessary */
-      if ((sizeof(void*) - 1) & (size_t) argp)
-        argp = (char *) ALIGN(argp, sizeof(void*));
-
-      size_t z = (*p_arg)->size;
+      size_t z = arg_types[i]->size;
+      int t = arg_types[i]->type;
+      void *valp;
 
-      if (passed_regs > 0
-          && z <= FFI_SIZEOF_ARG
-          && (p_argv == p_stack_data[0]
-            || p_argv == p_stack_data[1]
-            || p_argv == p_stack_data[2]))
-        {
-          /* Already assigned a register value */
-          continue;
-        }
+      if (z <= FFI_SIZEOF_ARG && t != FFI_TYPE_STRUCT)
+	{
+	  if (t != FFI_TYPE_FLOAT && narg_reg < pabi->nregs)
+	    valp = &frame->regs[pabi->regs[narg_reg++]];
+	  else if (dir < 0)
+	    {
+	      argp -= 4;
+	      valp = argp;
+	    }
+	  else
+	    {
+	      valp = argp;
+	      argp += 4;
+	    }
+	}
       else
-        {
-          /* because we're little endian, this is what it turns into.   */
-          *p_argv = (void*) argp;
-        }
+	{
+	  size_t za = ALIGN (z, FFI_SIZEOF_ARG);
+	  if (dir < 0)
+	    {
+	      argp -= za;
+	      valp = argp;
+	    }
+	  else
+	    {
+	      valp = argp;
+	      argp += za;
+	    }
+	}
 
-      argp += z;
+      avalue[i] = valp;
     }
 
-  return (size_t)argp - (size_t)stack;
-}
+  frame->fun (cif, rvalue, avalue, frame->user_data);
 
-/* How to make a trampoline.  Derived from gcc/config/i386/i386.c. */
-
-#define FFI_INIT_TRAMPOLINE(TRAMP,FUN,CTX) \
-{ unsigned char *__tramp = (unsigned char*)(TRAMP); \
-   unsigned int  __fun = (unsigned int)(FUN); \
-   unsigned int  __ctx = (unsigned int)(CTX); \
-   unsigned int  __dis = __fun - (__ctx + 10);  \
-   *(unsigned char*) &__tramp[0] = 0xb8; \
-   *(unsigned int*)  &__tramp[1] = __ctx; /* movl __ctx, %eax */ \
-   *(unsigned char*) &__tramp[5] = 0xe9; \
-   *(unsigned int*)  &__tramp[6] = __dis; /* jmp __fun  */ \
- }
-
-#define FFI_INIT_TRAMPOLINE_RAW_THISCALL(TRAMP,FUN,CTX,SIZE) \
-{ unsigned char *__tramp = (unsigned char*)(TRAMP); \
-   unsigned int  __fun = (unsigned int)(FUN); \
-   unsigned int  __ctx = (unsigned int)(CTX); \
-   unsigned int  __dis = __fun - (__ctx + 49);  \
-   unsigned short __size = (unsigned short)(SIZE); \
-   *(unsigned int *) &__tramp[0] = 0x8324048b;      /* mov (%esp), %eax */ \
-   *(unsigned int *) &__tramp[4] = 0x4c890cec;      /* sub $12, %esp */ \
-   *(unsigned int *) &__tramp[8] = 0x04890424;      /* mov %ecx, 4(%esp) */ \
-   *(unsigned char*) &__tramp[12] = 0x24;           /* mov %eax, (%esp) */ \
-   *(unsigned char*) &__tramp[13] = 0xb8; \
-   *(unsigned int *) &__tramp[14] = __size;         /* mov __size, %eax */ \
-   *(unsigned int *) &__tramp[18] = 0x08244c8d;     /* lea 8(%esp), %ecx */ \
-   *(unsigned int *) &__tramp[22] = 0x4802e8c1;     /* shr $2, %eax ; dec %eax */ \
-   *(unsigned short*) &__tramp[26] = 0x0b74;        /* jz 1f */ \
-   *(unsigned int *) &__tramp[28] = 0x8908518b;     /* 2b: mov 8(%ecx), %edx */ \
-   *(unsigned int *) &__tramp[32] = 0x04c18311;     /* mov %edx, (%ecx) ; add $4, %ecx */ \
-   *(unsigned char*) &__tramp[36] = 0x48;           /* dec %eax */ \
-   *(unsigned short*) &__tramp[37] = 0xf575;        /* jnz 2b ; 1f: */ \
-   *(unsigned char*) &__tramp[39] = 0xb8; \
-   *(unsigned int*)  &__tramp[40] = __ctx;          /* movl __ctx, %eax */ \
-   *(unsigned char *)  &__tramp[44] = 0xe8; \
-   *(unsigned int*)  &__tramp[45] = __dis;          /* call __fun  */ \
-   *(unsigned char*)  &__tramp[49] = 0xc2;          /* ret  */ \
-   *(unsigned short*)  &__tramp[50] = (__size + 8); /* ret (__size + 8)  */ \
- }
-
-#define FFI_INIT_TRAMPOLINE_WIN32(TRAMP,FUN,CTX)  \
-{ unsigned char *__tramp = (unsigned char*)(TRAMP); \
-   unsigned int  __fun = (unsigned int)(FUN); \
-   unsigned int  __ctx = (unsigned int)(CTX); \
-   unsigned int  __dis = __fun - (__ctx + 10); \
-   *(unsigned char*) &__tramp[0] = 0x68; \
-   *(unsigned int*)  &__tramp[1] = __ctx; /* push __ctx */ \
-   *(unsigned char*) &__tramp[5] = 0xe9; \
-   *(unsigned int*)  &__tramp[6] = __dis; /* jmp __fun  */ \
- }
-
-/* the cif must already be prep'ed */
+  if (cabi == FFI_STDCALL)
+    return flags + (cif->bytes << X86_RET_POP_SHIFT);
+  else
+    return flags;
+}
 
 ffi_status
 ffi_prep_closure_loc (ffi_closure* closure,
@@ -547,50 +428,40 @@ ffi_prep_closure_loc (ffi_closure* closure,
                       void *user_data,
                       void *codeloc)
 {
-  if (cif->abi == FFI_SYSV)
-    {
-      FFI_INIT_TRAMPOLINE (&closure->tramp[0],
-                           &ffi_closure_SYSV,
-                           (void*)codeloc);
-    }
-  else if (cif->abi == FFI_REGISTER)
-    {
-      FFI_INIT_TRAMPOLINE_WIN32 (&closure->tramp[0],
-                                   &ffi_closure_REGISTER,
-                                   (void*)codeloc);
-    }
-  else if (cif->abi == FFI_FASTCALL)
-    {
-      FFI_INIT_TRAMPOLINE_WIN32 (&closure->tramp[0],
-                                   &ffi_closure_FASTCALL,
-                                   (void*)codeloc);
-    }
-  else if (cif->abi == FFI_THISCALL)
-    {
-      FFI_INIT_TRAMPOLINE_WIN32 (&closure->tramp[0],
-                                   &ffi_closure_THISCALL,
-                                   (void*)codeloc);
-    }
-  else if (cif->abi == FFI_STDCALL || cif->abi == FFI_PASCAL)
-    {
-      FFI_INIT_TRAMPOLINE_WIN32 (&closure->tramp[0],
-                                   &ffi_closure_STDCALL,
-                                   (void*)codeloc);
-    }
-  else if (cif->abi == FFI_MS_CDECL)
-    {
-      FFI_INIT_TRAMPOLINE (&closure->tramp[0],
-                           &ffi_closure_SYSV,
-                           (void*)codeloc);
-    }
-  else
+  char *tramp = closure->tramp;
+  void (*dest)(void);
+  int op = 0xb8;  /* movl imm, %eax */
+  switch (cif->abi)
     {
+    case FFI_SYSV:
+    case FFI_THISCALL:
+    case FFI_FASTCALL:
+    case FFI_MS_CDECL:
+      dest = ffi_closure_i386;
+      break;
+    case FFI_STDCALL:
+    case FFI_PASCAL:
+      dest = ffi_closure_STDCALL;
+      break;
+    case FFI_REGISTER:
+      dest = ffi_closure_REGISTER;
+      op = 0x68;  /* pushl imm */
+      break;
+    default:
       return FFI_BAD_ABI;
     }
-    
-  closure->cif  = cif;
+
+  /* movl or pushl immediate.  */
+  tramp[0] = op;
+  *(void **)(tramp + 1) = codeloc;
+
+  /* jmp dest */
+  tramp[5] = 0xe9;
+  *(unsigned *)(tramp + 6) = (unsigned)dest - ((unsigned)codeloc + 10);
+
+  closure->cif = cif;
+  closure->fun = fun;
   closure->user_data = user_data;
-  closure->fun  = fun;
 
   return FFI_OK;
 }
@@ -599,13 +470,18 @@ ffi_prep_closure_loc (ffi_closure* closure,
 
 #if !FFI_NO_RAW_API
 
+void FFI_HIDDEN ffi_closure_raw_SYSV(void);
+void FFI_HIDDEN ffi_closure_raw_THISCALL(void);
+
 ffi_status
-ffi_prep_raw_closure_loc (ffi_raw_closure* closure,
-                          ffi_cif* cif,
+ffi_prep_raw_closure_loc (ffi_raw_closure *closure,
+                          ffi_cif *cif,
                           void (*fun)(ffi_cif*,void*,ffi_raw*,void*),
                           void *user_data,
                           void *codeloc)
 {
+  char *tramp = closure->tramp;
+  void (*dest)(void);
   int i;
 
   /* We currently don't support certain kinds of arguments for raw
@@ -613,28 +489,33 @@ ffi_prep_raw_closure_loc (ffi_raw_closure* closure,
      language routine, since it would require argument processing,
      something we don't do now for performance.  */
   for (i = cif->nargs-1; i >= 0; i--)
-    {
-      FFI_ASSERT (cif->arg_types[i]->type != FFI_TYPE_STRUCT);
-      FFI_ASSERT (cif->arg_types[i]->type != FFI_TYPE_LONGDOUBLE);
-    }
+    switch (cif->arg_types[i]->type)
+      {
+      case FFI_TYPE_STRUCT:
+      case FFI_TYPE_LONGDOUBLE:
+	return FFI_BAD_TYPEDEF;
+      }
 
   switch (cif->abi)
     {
-#ifdef X86_WIN32
     case FFI_THISCALL:
-      FFI_INIT_TRAMPOLINE_RAW_THISCALL (&closure->tramp[0],
-					&ffi_closure_raw_THISCALL,
-					codeloc, cif->bytes);
+      dest = ffi_closure_raw_THISCALL;
       break;
-#endif
     case FFI_SYSV:
-      FFI_INIT_TRAMPOLINE (&closure->tramp[0], &ffi_closure_raw_SYSV,
-			   codeloc);
+      dest = ffi_closure_raw_SYSV;
       break;
     default:
       return FFI_BAD_ABI;
     }
 
+  /* movl imm, %eax.  */
+  tramp[0] = 0xb8;
+  *(void **)(tramp + 1) = codeloc;
+
+  /* jmp dest */
+  tramp[5] = 0xe9;
+  *(unsigned *)(tramp + 6) = (unsigned)dest - ((unsigned)codeloc + 10);
+
   closure->cif = cif;
   closure->fun = fun;
   closure->user_data = user_data;
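For anyone reviewing the new ffi_closure_inner, the argument walk can be modeled outside libffi. This is a sketch under stated assumptions, not libffi code. arg_desc, abi_desc, arg_loc and classify_args are hypothetical stand-ins for ffi_type and abi_params. The hidden struct-return pointer is ignored. Register slots are reported as logical indices rather than the frame->regs positions that pabi->regs maps them to. As in the patch, a non-struct argument of at most FFI_SIZEOF_ARG bytes takes the next parameter register unless it is a float or the registers are used up. Everything else is read from the stack: upward from the incoming stack pointer, or downward from stack + cif->bytes when dir is negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ffi_type and abi_params.  */
enum arg_kind { K_INT, K_FLOAT, K_STRUCT };

struct arg_desc
{
  size_t size;
  enum arg_kind kind;
};

struct abi_desc
{
  int dir;    /* +1: walk the stack upward; -1: downward from the top */
  int nregs;  /* number of parameter registers the ABI provides */
};

/* Where one incoming argument lives.  */
struct arg_loc
{
  int in_reg;      /* 1 if passed in a register */
  int reg;         /* logical register slot, or -1 */
  long stack_off;  /* byte offset from the incoming stack, or -1 */
};

#define SIZEOF_ARG 4  /* FFI_SIZEOF_ARG on i386 */

static size_t
align_up (size_t n, size_t a)
{
  return (n + a - 1) & ~(a - 1);
}

/* Mirrors the loop in ffi_closure_inner.  STACK_BYTES plays the role
   of cif->bytes and only matters when abi->dir is negative.  */
static void
classify_args (const struct abi_desc *abi, const struct arg_desc *args,
               int n, long stack_bytes, struct arg_loc *out)
{
  long off = abi->dir < 0 ? stack_bytes : 0;
  int narg_reg = 0;

  for (int i = 0; i < n; ++i)
    {
      size_t z = args[i].size;

      out[i].in_reg = 0;
      out[i].reg = -1;
      out[i].stack_off = -1;

      if (z <= SIZEOF_ARG && args[i].kind != K_STRUCT)
        {
          if (args[i].kind != K_FLOAT && narg_reg < abi->nregs)
            {
              out[i].in_reg = 1;
              out[i].reg = narg_reg++;
              continue;
            }
          z = SIZEOF_ARG;  /* small stack arguments occupy one slot */
        }
      else
        z = align_up (z, SIZEOF_ARG);

      if (abi->dir < 0)
        {
          off -= (long) z;
          out[i].stack_off = off;
        }
      else
        {
          out[i].stack_off = off;
          off += (long) z;
        }
    }
}
```

Take an assumed fastcall-like configuration (dir = +1, two register slots) and the arguments (int, float, int, int, double). They come out as register 0, stack offset 0, register 1, stack offset 4 and stack offset 8.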
diff --git a/src/x86/ffitarget.h b/src/x86/ffitarget.h
index 91e429c..8fff29f 100644
--- a/src/x86/ffitarget.h
+++ b/src/x86/ffitarget.h
@@ -129,11 +129,7 @@ typedef enum ffi_abi {
 # define FFI_NATIVE_RAW_API 0
 # define FFI_GO_CLOSURES 1
 #else
-# ifdef X86_WIN32
-#  define FFI_TRAMPOLINE_SIZE 52
-# else
-#  define FFI_TRAMPOLINE_SIZE 10
-# endif
+# define FFI_TRAMPOLINE_SIZE 12
 # define FFI_NATIVE_RAW_API 1  /* x86 has native raw api support */
 #endif
 
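With the per-ABI macros gone, every closure gets the same 10-byte trampoline. It is a one-byte opcode (0xb8, movl imm32,%eax; or 0x68, pushl imm32 for FFI_REGISTER) followed by the closure address. Then comes 0xe9 (jmp rel32), whose displacement is measured from the end of the jmp, i.e. codeloc + 10. FFI_CLOSURE_COPY_TRAMP_DATA in sysv.S reads cif, fun and user_data at FFI_TRAMPOLINE_SIZE(%eax), +4 and +8. Those offsets only hold if the trampoline size is a multiple of 4. With the old value of 10, padding would push cif to offset 12. The sketch below models both points on any host. encode_tramp, decode_jmp_target, put_le32, get_le32 and struct model_closure are hypothetical names. Addresses are plain uint32_t values, and the bytes are never executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRAMP_CODE_SIZE 10        /* op + imm32 + 0xe9 + rel32 */
#define MODEL_TRAMPOLINE_SIZE 12  /* FFI_TRAMPOLINE_SIZE after this patch */

/* Store / load a 32-bit value in i386 (little-endian) byte order.  */
static void
put_le32 (unsigned char *p, uint32_t v)
{
  p[0] = v & 0xff;
  p[1] = (v >> 8) & 0xff;
  p[2] = (v >> 16) & 0xff;
  p[3] = (v >> 24) & 0xff;
}

static uint32_t
get_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Same byte sequence ffi_prep_closure_loc writes: OP imm32; jmp DEST.  */
static void
encode_tramp (unsigned char *tramp, unsigned char op,
              uint32_t codeloc, uint32_t dest)
{
  tramp[0] = op;                   /* 0xb8 movl, or 0x68 pushl */
  put_le32 (tramp + 1, codeloc);   /* immediate: the closure address */
  tramp[5] = 0xe9;                 /* jmp rel32 */
  put_le32 (tramp + 6, dest - (codeloc + TRAMP_CODE_SIZE));
}

/* The CPU adds rel32 to the address of the next instruction.  */
static uint32_t
decode_jmp_target (const unsigned char *tramp, uint32_t codeloc)
{
  return codeloc + TRAMP_CODE_SIZE + get_le32 (tramp + 6);
}

/* i386 closure layout with pointers modeled as uint32_t, so the
   offsets are those of a 32-bit target even on a 64-bit host.  */
struct model_closure
{
  unsigned char tramp[MODEL_TRAMPOLINE_SIZE];
  uint32_t cif, fun, user_data;
};

/* What sysv.S assumes: cif/fun/user_data at SIZE, SIZE+4, SIZE+8.  */
_Static_assert (offsetof (struct model_closure, cif) == MODEL_TRAMPOLINE_SIZE,
                "cif follows the trampoline directly");
_Static_assert (offsetof (struct model_closure, user_data)
                == MODEL_TRAMPOLINE_SIZE + 8, "user_data at +8");
```

For example, a trampoline placed at 0x08049000 that jumps back to 0x08048100 encodes the displacement 0xfffff0f6 (-0xf0a).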
diff --git a/src/x86/freebsd.S b/src/x86/freebsd.S
deleted file mode 100644
index 97e0b4e..0000000
--- a/src/x86/freebsd.S
+++ /dev/null
@@ -1,463 +0,0 @@
-/* -----------------------------------------------------------------------
-   freebsd.S - Copyright (c) 1996, 1998, 2001, 2002, 2003, 2005  Red Hat, Inc.
-	       Copyright (c) 2008  Björn König
-	
-   X86 Foreign Function Interface for FreeBSD
-
-   Permission is hereby granted, free of charge, to any person obtaining
-   a copy of this software and associated documentation files (the
-   ``Software''), to deal in the Software without restriction, including
-   without limitation the rights to use, copy, modify, merge, publish,
-   distribute, sublicense, and/or sell copies of the Software, and to
-   permit persons to whom the Software is furnished to do so, subject to
-   the following conditions:
-
-   The above copyright notice and this permission notice shall be included
-   in all copies or substantial portions of the Software.
-
-   THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND,
-   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-   NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-   HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-   WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-   DEALINGS IN THE SOFTWARE.
------------------------------------------------------------------------ */
-
-#ifndef __x86_64__
-
-#define LIBFFI_ASM	
-#include <fficonfig.h>
-#include <ffi.h>
-
-.text
-
-.globl ffi_prep_args
-
-	.align 4
-.globl ffi_call_SYSV
-        .type    ffi_call_SYSV,@function
-
-ffi_call_SYSV:
-.LFB1:
-        pushl %ebp
-.LCFI0:
-        movl  %esp,%ebp
-.LCFI1:
-	/* Make room for all of the new args.  */
-	movl  16(%ebp),%ecx
-	subl  %ecx,%esp
-
-	/* Align the stack pointer to 16-bytes */
-	andl  $0xfffffff0, %esp
-
-	movl  %esp,%eax
-
-	/* Place all of the ffi_prep_args in position  */
-	pushl 12(%ebp)
-	pushl %eax
-	call  *8(%ebp)
-
-	/* Return stack to previous state and call the function  */
-	addl  $8,%esp	
-
-	call  *28(%ebp)
-
-	/* Load %ecx with the return type code  */
-	movl  20(%ebp),%ecx	
-
-	/* Protect %esi.  We're going to pop it in the epilogue.  */
-	pushl %esi
-
-	/* If the return value pointer is NULL, assume no return value.  */
-	cmpl  $0,24(%ebp)
-	jne  0f
-
-	/* Even if there is no space for the return value, we are 
-	   obliged to handle floating-point values.  */
-	cmpl  $FFI_TYPE_FLOAT,%ecx
-	jne   noretval
-	fstp  %st(0)
-
-        jmp   epilogue
-
-0:
-	call  1f
-
-.Lstore_table:
-	.long	noretval-.Lstore_table	/* FFI_TYPE_VOID */
-	.long	retint-.Lstore_table	/* FFI_TYPE_INT */
-	.long	retfloat-.Lstore_table	/* FFI_TYPE_FLOAT */
-	.long	retdouble-.Lstore_table	/* FFI_TYPE_DOUBLE */
-	.long	retlongdouble-.Lstore_table	/* FFI_TYPE_LONGDOUBLE */
-	.long	retuint8-.Lstore_table	/* FFI_TYPE_UINT8 */
-	.long	retsint8-.Lstore_table	/* FFI_TYPE_SINT8 */
-	.long	retuint16-.Lstore_table	/* FFI_TYPE_UINT16 */
-	.long	retsint16-.Lstore_table	/* FFI_TYPE_SINT16 */
-	.long	retint-.Lstore_table	/* FFI_TYPE_UINT32 */
-	.long	retint-.Lstore_table	/* FFI_TYPE_SINT32 */
-	.long	retint64-.Lstore_table	/* FFI_TYPE_UINT64 */
-	.long	retint64-.Lstore_table	/* FFI_TYPE_SINT64 */
-	.long	retstruct-.Lstore_table	/* FFI_TYPE_STRUCT */
-	.long	retint-.Lstore_table	/* FFI_TYPE_POINTER */
-	.long   retstruct1b-.Lstore_table	/* FFI_TYPE_SMALL_STRUCT_1B */
-	.long   retstruct2b-.Lstore_table	/* FFI_TYPE_SMALL_STRUCT_2B */
-
-1:
-	pop  %esi
-	add  (%esi, %ecx, 4), %esi
-	jmp  *%esi
-
-	/* Sign/zero extend as appropriate.  */
-retsint8:
-	movsbl  %al, %eax
-	jmp  retint
-
-retsint16:
-	movswl  %ax, %eax
-	jmp  retint
-
-retuint8:
-	movzbl  %al, %eax
-	jmp  retint
-
-retuint16:
-	movzwl  %ax, %eax
-	jmp  retint
-
-retfloat:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	fstps (%ecx)
-	jmp   epilogue
-
-retdouble:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	fstpl (%ecx)
-	jmp   epilogue
-
-retlongdouble:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	fstpt (%ecx)
-	jmp   epilogue
-	
-retint64:	
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	movl  %eax,0(%ecx)
-	movl  %edx,4(%ecx)
-	jmp   epilogue
-	
-retstruct1b:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx
-	movb  %al,0(%ecx)
-	jmp   epilogue
-
-retstruct2b:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx
-	movw  %ax,0(%ecx)
-	jmp   epilogue
-
-retint:
-	/* Load %ecx with the pointer to storage for the return value  */
-	movl  24(%ebp),%ecx	
-	movl  %eax,0(%ecx)
-
-retstruct:
-	/* Nothing to do!  */
-
-noretval:
-epilogue:
-        popl %esi
-        movl %ebp,%esp
-        popl %ebp
-        ret
-.LFE1:
-.ffi_call_SYSV_end:
-        .size    ffi_call_SYSV,.ffi_call_SYSV_end-ffi_call_SYSV
-
-	.align	4
-FFI_HIDDEN (ffi_closure_SYSV)
-.globl ffi_closure_SYSV
-	.type	ffi_closure_SYSV, @function
-
-ffi_closure_SYSV:
-.LFB2:
-	pushl	%ebp
-.LCFI2:
-	movl	%esp, %ebp
-.LCFI3:
-	subl	$40, %esp
-	leal	-24(%ebp), %edx
-	movl	%edx, -12(%ebp)	/* resp */
-	leal	8(%ebp), %edx
-	movl	%edx, 4(%esp)	/* args = __builtin_dwarf_cfa () */
-	leal	-12(%ebp), %edx
-	movl	%edx, (%esp)	/* &resp */
-#if defined HAVE_HIDDEN_VISIBILITY_ATTRIBUTE || !defined __PIC__
-	call	ffi_closure_SYSV_inner
-#else
-	movl	%ebx, 8(%esp)
-.LCFI7:
-	call	1f
-1:	popl	%ebx
-	addl	$_GLOBAL_OFFSET_TABLE_+[.-1b], %ebx
-	call	ffi_closure_SYSV_inner@PLT
-	movl	8(%esp), %ebx
-#endif
-	movl	-12(%ebp), %ecx
-	cmpl	$FFI_TYPE_INT, %eax
-	je	.Lcls_retint
-
-	/* Handle FFI_TYPE_UINT8, FFI_TYPE_SINT8, FFI_TYPE_UINT16,
-	   FFI_TYPE_SINT16, FFI_TYPE_UINT32, FFI_TYPE_SINT32.  */
-	cmpl	$FFI_TYPE_UINT64, %eax
-	jge	0f
-	cmpl	$FFI_TYPE_UINT8, %eax
-	jge	.Lcls_retint
-	
-0:	cmpl	$FFI_TYPE_FLOAT, %eax
-	je	.Lcls_retfloat
-	cmpl	$FFI_TYPE_DOUBLE, %eax
-	je	.Lcls_retdouble
-	cmpl	$FFI_TYPE_LONGDOUBLE, %eax
-	je	.Lcls_retldouble
-	cmpl	$FFI_TYPE_SINT64, %eax
-	je	.Lcls_retllong
-	cmpl	$FFI_TYPE_SMALL_STRUCT_1B, %eax
-	je	.Lcls_retstruct1b
-	cmpl	$FFI_TYPE_SMALL_STRUCT_2B, %eax
-	je	.Lcls_retstruct2b
-	cmpl	$FFI_TYPE_STRUCT, %eax
-	je	.Lcls_retstruct
-.Lcls_epilogue:
-	movl	%ebp, %esp
-	popl	%ebp
-	ret
-.Lcls_retint:
-	movl	(%ecx), %eax
-	jmp	.Lcls_epilogue
-.Lcls_retfloat:
-	flds	(%ecx)
-	jmp	.Lcls_epilogue
-.Lcls_retdouble:
-	fldl	(%ecx)
-	jmp	.Lcls_epilogue
-.Lcls_retldouble:
-	fldt	(%ecx)
-	jmp	.Lcls_epilogue
-.Lcls_retllong:
-	movl	(%ecx), %eax
-	movl	4(%ecx), %edx
-	jmp	.Lcls_epilogue
-.Lcls_retstruct1b:
-	movsbl	(%ecx), %eax
-	jmp	.Lcls_epilogue
-.Lcls_retstruct2b:
-	movswl	(%ecx), %eax
-	jmp	.Lcls_epilogue
-.Lcls_retstruct:
-	movl	%ebp, %esp
-	popl	%ebp
-	ret	$4
-.LFE2:
-	.size	ffi_closure_SYSV, .-ffi_closure_SYSV
-
-#if !FFI_NO_RAW_API
-
-#define RAW_CLOSURE_CIF_OFFSET ((FFI_TRAMPOLINE_SIZE + 3) & ~3)
-#define RAW_CLOSURE_FUN_OFFSET (RAW_CLOSURE_CIF_OFFSET + 4)
-#define RAW_CLOSURE_USER_DATA_OFFSET (RAW_CLOSURE_FUN_OFFSET + 4)
-#define CIF_FLAGS_OFFSET 20
-
-	.align	4
-FFI_HIDDEN (ffi_closure_raw_SYSV)
-.globl ffi_closure_raw_SYSV
-	.type	ffi_closure_raw_SYSV, @function
-
-ffi_closure_raw_SYSV:
-.LFB3:
-	pushl	%ebp
-.LCFI4:
-	movl	%esp, %ebp
-.LCFI5:
-	pushl	%esi
-.LCFI6:
-	subl	$36, %esp
-	movl	RAW_CLOSURE_CIF_OFFSET(%eax), %esi	 /* closure->cif */
-	movl	RAW_CLOSURE_USER_DATA_OFFSET(%eax), %edx /* closure->user_data */
-	movl	%edx, 12(%esp)	/* user_data */
-	leal	8(%ebp), %edx	/* __builtin_dwarf_cfa () */
-	movl	%edx, 8(%esp)	/* raw_args */
-	leal	-24(%ebp), %edx
-	movl	%edx, 4(%esp)	/* &res */
-	movl	%esi, (%esp)	/* cif */
-	call	*RAW_CLOSURE_FUN_OFFSET(%eax)		 /* closure->fun */
-	movl	CIF_FLAGS_OFFSET(%esi), %eax		 /* rtype */
-	cmpl	$FFI_TYPE_INT, %eax
-	je	.Lrcls_retint
-
-	/* Handle FFI_TYPE_UINT8, FFI_TYPE_SINT8, FFI_TYPE_UINT16,
-	   FFI_TYPE_SINT16, FFI_TYPE_UINT32, FFI_TYPE_SINT32.  */
-	cmpl	$FFI_TYPE_UINT64, %eax
-	jge	0f
-	cmpl	$FFI_TYPE_UINT8, %eax
-	jge	.Lrcls_retint
-0:
-	cmpl	$FFI_TYPE_FLOAT, %eax
-	je	.Lrcls_retfloat
-	cmpl	$FFI_TYPE_DOUBLE, %eax
-	je	.Lrcls_retdouble
-	cmpl	$FFI_TYPE_LONGDOUBLE, %eax
-	je	.Lrcls_retldouble
-	cmpl	$FFI_TYPE_SINT64, %eax
-	je	.Lrcls_retllong
-.Lrcls_epilogue:
-	addl	$36, %esp
-	popl	%esi
-	popl	%ebp
-	ret
-.Lrcls_retint:
-	movl	-24(%ebp), %eax
-	jmp	.Lrcls_epilogue
-.Lrcls_retfloat:
-	flds	-24(%ebp)
-	jmp	.Lrcls_epilogue
-.Lrcls_retdouble:
-	fldl	-24(%ebp)
-	jmp	.Lrcls_epilogue
-.Lrcls_retldouble:
-	fldt	-24(%ebp)
-	jmp	.Lrcls_epilogue
-.Lrcls_retllong:
-	movl	-24(%ebp), %eax
-	movl	-20(%ebp), %edx
-	jmp	.Lrcls_epilogue
-.LFE3:
-	.size	ffi_closure_raw_SYSV, .-ffi_closure_raw_SYSV
-#endif
-
-	.section	.eh_frame,EH_FRAME_FLAGS,@progbits
-.Lframe1:
-	.long	.LECIE1-.LSCIE1	/* Length of Common Information Entry */
-.LSCIE1:
-	.long	0x0	/* CIE Identifier Tag */
-	.byte	0x1	/* CIE Version */
-#ifdef __PIC__
-	.ascii "zR\0"	/* CIE Augmentation */
-#else
-	.ascii "\0"	/* CIE Augmentation */
-#endif
-	.byte	0x1	/* .uleb128 0x1; CIE Code Alignment Factor */
-	.byte	0x7c	/* .sleb128 -4; CIE Data Alignment Factor */
-	.byte	0x8	/* CIE RA Column */
-#ifdef __PIC__
-	.byte	0x1	/* .uleb128 0x1; Augmentation size */
-	.byte	0x1b	/* FDE Encoding (pcrel sdata4) */
-#endif
-	.byte	0xc	/* DW_CFA_def_cfa */
-	.byte	0x4	/* .uleb128 0x4 */
-	.byte	0x4	/* .uleb128 0x4 */
-	.byte	0x88	/* DW_CFA_offset, column 0x8 */
-	.byte	0x1	/* .uleb128 0x1 */
-	.align 4
-.LECIE1:
-.LSFDE1:
-	.long	.LEFDE1-.LASFDE1	/* FDE Length */
-.LASFDE1:
-	.long	.LASFDE1-.Lframe1	/* FDE CIE offset */
-#ifdef __PIC__
-	.long	.LFB1-.	/* FDE initial location */
-#else
-	.long	.LFB1	/* FDE initial location */
-#endif
-	.long	.LFE1-.LFB1	/* FDE address range */
-#ifdef __PIC__
-	.byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI0-.LFB1
-	.byte	0xe	/* DW_CFA_def_cfa_offset */
-	.byte	0x8	/* .uleb128 0x8 */
-	.byte	0x85	/* DW_CFA_offset, column 0x5 */
-	.byte	0x2	/* .uleb128 0x2 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI1-.LCFI0
-	.byte	0xd	/* DW_CFA_def_cfa_register */
-	.byte	0x5	/* .uleb128 0x5 */
-	.align 4
-.LEFDE1:
-.LSFDE2:
-	.long	.LEFDE2-.LASFDE2	/* FDE Length */
-.LASFDE2:
-	.long	.LASFDE2-.Lframe1	/* FDE CIE offset */
-#ifdef __PIC__
-	.long	.LFB2-.	/* FDE initial location */
-#else
-	.long	.LFB2
-#endif
-	.long	.LFE2-.LFB2	/* FDE address range */
-#ifdef __PIC__
-	.byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI2-.LFB2
-	.byte	0xe	/* DW_CFA_def_cfa_offset */
-	.byte	0x8	/* .uleb128 0x8 */
-	.byte	0x85	/* DW_CFA_offset, column 0x5 */
-	.byte	0x2	/* .uleb128 0x2 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI3-.LCFI2
-	.byte	0xd	/* DW_CFA_def_cfa_register */
-	.byte	0x5	/* .uleb128 0x5 */
-#if !defined HAVE_HIDDEN_VISIBILITY_ATTRIBUTE && defined __PIC__
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI7-.LCFI3
-	.byte	0x83	/* DW_CFA_offset, column 0x3 */
-	.byte	0xa	/* .uleb128 0xa */
-#endif
-	.align 4
-.LEFDE2:
-
-#if !FFI_NO_RAW_API
-
-.LSFDE3:
-	.long	.LEFDE3-.LASFDE3	/* FDE Length */
-.LASFDE3:
-	.long	.LASFDE3-.Lframe1	/* FDE CIE offset */
-#ifdef __PIC__
-	.long	.LFB3-.	/* FDE initial location */
-#else
-	.long	.LFB3
-#endif
-	.long	.LFE3-.LFB3	/* FDE address range */
-#ifdef __PIC__
-	.byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI4-.LFB3
-	.byte	0xe	/* DW_CFA_def_cfa_offset */
-	.byte	0x8	/* .uleb128 0x8 */
-	.byte	0x85	/* DW_CFA_offset, column 0x5 */
-	.byte	0x2	/* .uleb128 0x2 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI5-.LCFI4
-	.byte	0xd	/* DW_CFA_def_cfa_register */
-	.byte	0x5	/* .uleb128 0x5 */
-	.byte	0x4	/* DW_CFA_advance_loc4 */
-	.long	.LCFI6-.LCFI5
-	.byte	0x86	/* DW_CFA_offset, column 0x6 */
-	.byte	0x3	/* .uleb128 0x3 */
-	.align 4
-.LEFDE3:
-
-#endif
-
-#endif /* ifndef __x86_64__ */
-
-	.section .note.GNU-stack,"",%progbits
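The new sysv.S entry points rely on ffi_closure_inner packing two values into its int return for FFI_STDCALL. The X86_RET_* code sits in the low bits, and cif->bytes is shifted left by X86_RET_POP_SHIFT. ffi_closure_STDCALL recovers the pop count with shrl and masks the type with X86_RET_TYPE_MASK. It then indexes the return table in 8-byte steps (leal 0f(, %eax, 8)). Next, it moves the return address up to closure_FS(%esp) plus the pop count, so that the final movl %ecx, %esp; ret also discards the callee-popped arguments. The following sketch models that round trip. The constant values (X86_RET_INT32 = 8, X86_RET_TYPE_MASK = 15, X86_RET_POP_SHIFT = 4) are assumed from src/x86/internal.h, which is not part of this excerpt. The function names are hypothetical. It also assumes cif->flags holds only a type code below 16.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirroring src/x86/internal.h (not shown here).  */
#define X86_RET_INT32      8
#define X86_RET_TYPE_MASK  15
#define X86_RET_POP_SHIFT  4

/* Frame size from the sysv.S macro: rettemp, regs, cif/fun/user_data, ebx.  */
#define CLOSURE_FS (16 + 3 * 4 + 3 * 4 + 4)

/* What ffi_closure_inner returns for FFI_STDCALL.  */
static unsigned
encode_stdcall_ret (unsigned flags, unsigned bytes)
{
  return flags + (bytes << X86_RET_POP_SHIFT);
}

/* shrl $X86_RET_POP_SHIFT, %ecx */
static unsigned
decode_pop_count (unsigned ret)
{
  return ret >> X86_RET_POP_SHIFT;
}

/* andl $X86_RET_TYPE_MASK, %eax */
static unsigned
decode_ret_type (unsigned ret)
{
  return ret & X86_RET_TYPE_MASK;
}

/* leal 0f(, %eax, 8): table entries are 8 bytes apart.  */
static unsigned
table_offset (unsigned ret)
{
  return decode_ret_type (ret) * 8;
}

/* leal closure_FS(%esp, %ecx), %ecx: the slot that receives the
   return address, and the esp value restored before the final ret.  */
static uint32_t
popped_esp (uint32_t esp, unsigned ret)
{
  return esp + CLOSURE_FS + decode_pop_count (ret);
}
```

Consider a stdcall closure returning int with 12 bytes of arguments. It would return 8 + (12 << 4) = 200, dispatch to table offset 64, and pop 12 bytes.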
diff --git a/src/x86/sysv.S b/src/x86/sysv.S
index d0b8417..7b898ae 100644
--- a/src/x86/sysv.S
+++ b/src/x86/sysv.S
@@ -59,10 +59,10 @@
 
 /* This is declared as
 
-   void ffi_call_i386(struct ffi_call_frame *frame, char *argp)
+   void ffi_call_i386(struct call_frame *frame, char *argp)
         __attribute__((fastcall));
 
-   This the arguments are present in
+   Thus the arguments are present in
 
         ecx: frame
         edx: argp
@@ -170,181 +170,478 @@ E(X86_RET_UNUSED15)
 	cfi_endproc
 ENDF(C(ffi_call_i386))
 
-	.align	4
-FFI_HIDDEN (ffi_closure_SYSV)
-.globl ffi_closure_SYSV
-	.type	ffi_closure_SYSV, @function
+/* The inner helper is declared as
 
-ffi_closure_SYSV:
-	cfi_startproc
-	pushl	%ebp
-	cfi_adjust_cfa_offset(4)
-	cfi_rel_offset(%ebp, 0)
-	movl	%esp, %ebp
-	cfi_def_cfa_register(%ebp)
-	subl	$40, %esp
-	leal	-24(%ebp), %edx
-	movl	%edx, -12(%ebp)	/* resp */
-	leal	8(%ebp), %edx
-#ifdef __SUNPRO_C
-	/* The SUNPRO compiler doesn't support GCC's regparm function
-  	   attribute, so we have to pass all three arguments to
-	   ffi_closure_SYSV_inner on the stack.  */
-	movl	%edx, 8(%esp)	/* args = __builtin_dwarf_cfa () */
-	leal	-12(%ebp), %edx
-	movl	%edx, 4(%esp)	/* &resp */
-	movl    %eax, (%esp)    /* closure */
-#else
-	movl	%edx, 4(%esp)	/* args = __builtin_dwarf_cfa () */
-	leal	-12(%ebp), %edx
-	movl	%edx, (%esp)	/* &resp */
+   void ffi_closure_inner(struct closure_frame *frame, char *argp)
+	__attribute__((fastcall));
+
+   Thus the arguments are placed in
+
+	ecx:	frame
+	edx:	argp
+*/
+
+/* Macros to help set up the closure_data structure.  */
+
+#define closure_FS	(16 + 3*4 + 3*4 + 4)
+
+.macro	FFI_CLOSURE_SAVE_REGS
+	movl	%eax, 16+R_EAX*4(%esp)
+	movl	%edx, 16+R_EDX*4(%esp)
+	movl	%ecx, 16+R_ECX*4(%esp)
+.endm
+
+.macro	FFI_CLOSURE_COPY_TRAMP_DATA chain
+	movl	FFI_TRAMPOLINE_SIZE(%eax), %edx		/* copy cif */
+	movl	FFI_TRAMPOLINE_SIZE+4(%eax), %ecx	/* copy fun */
+	movl	FFI_TRAMPOLINE_SIZE+8(%eax), %eax	/* copy user_data */
+	movl	%edx, 28(%esp)
+	movl	%ecx, 32(%esp)
+	movl	%eax, 36(%esp)
+.endm
+
+.macro	FFI_CLOSURE_CALL_INNER
+	movl	%esp, %ecx			/* load closure_data */
+	leal	closure_FS+4(%esp), %edx	/* load incoming stack */
+#ifdef __PIC__
+	movl	%ebx, 40(%esp)			/* save ebx */
+	cfi_rel_offset(%ebx, 40)
+	call	__x86.get_pc_thunk.bx		/* load got register */
+	addl	$C(_GLOBAL_OFFSET_TABLE_), %ebx
 #endif
 #if defined HAVE_HIDDEN_VISIBILITY_ATTRIBUTE || !defined __PIC__
-	call	ffi_closure_SYSV_inner
+	call	C(ffi_closure_inner)
 #else
-	movl	%ebx, 8(%esp)
-	cfi_offset(%ebx, -40)
-	call	1f
-1:	popl	%ebx
-	addl	$_GLOBAL_OFFSET_TABLE_+[.-1b], %ebx
-	call	ffi_closure_SYSV_inner@PLT
-	movl	8(%esp), %ebx
+	call	C(ffi_closure_inner)@PLT
+#endif
+.endm
+
+.macro	FFI_CLOSURE_MASK_AND_JUMP
+	andl	$X86_RET_TYPE_MASK, %eax
+#ifdef __PIC__
+	leal	0f@GOTOFF(%ebx, %eax, 8), %eax
+	movl	40(%esp), %ebx			/* restore ebx */
 	cfi_restore(%ebx)
+#else
+	leal	0f(, %eax, 8), %eax
 #endif
-	movl	-12(%ebp), %ecx
-	cmpl	$FFI_TYPE_INT, %eax
-	je	.Lcls_retint
-
-	/* Handle FFI_TYPE_UINT8, FFI_TYPE_SINT8, FFI_TYPE_UINT16,
-	   FFI_TYPE_SINT16, FFI_TYPE_UINT32, FFI_TYPE_SINT32.  */
-	cmpl	$FFI_TYPE_UINT64, %eax
-	jge	0f
-	cmpl	$FFI_TYPE_UINT8, %eax
-	jge	.Lcls_retint
-	
-0:	cmpl	$FFI_TYPE_FLOAT, %eax
-	je	.Lcls_retfloat
-	cmpl	$FFI_TYPE_DOUBLE, %eax
-	je	.Lcls_retdouble
-	cmpl	$FFI_TYPE_LONGDOUBLE, %eax
-	je	.Lcls_retldouble
-	cmpl	$FFI_TYPE_SINT64, %eax
-	je	.Lcls_retllong
-	cmpl	$FFI_TYPE_STRUCT, %eax
-	je	.Lcls_retstruct
-.Lcls_epilogue:
-	movl	%ebp, %esp
-	popl	%ebp
+	jmp	*%eax
+.endm
+
+/* The closure entry points are reached from the ffi_closure trampoline.
+   On entry, %eax contains the address of the ffi_closure.  */
+
+	.align	16
+	.globl	C(ffi_closure_i386)
+	FFI_HIDDEN(C(ffi_closure_i386))
+
+C(ffi_closure_i386):
+	cfi_startproc
+	subl	$closure_FS, %esp
+	cfi_adjust_cfa_offset(closure_FS)
+
+	FFI_CLOSURE_SAVE_REGS
+	FFI_CLOSURE_COPY_TRAMP_DATA
+	FFI_CLOSURE_CALL_INNER
+	FFI_CLOSURE_MASK_AND_JUMP
+
+	.align	8
+0:
+E(X86_RET_FLOAT)
+	flds	(%esp)
+	jmp	9f
+E(X86_RET_DOUBLE)
+	fldl	(%esp)
+	jmp	9f
+E(X86_RET_LDOUBLE)
+	fldt	(%esp)
+	jmp	9f
+E(X86_RET_SINT8)
+	movsbl	(%esp), %eax
+	jmp	9f
+E(X86_RET_SINT16)
+	movswl	(%esp), %eax
+	jmp	9f
+E(X86_RET_UINT8)
+	movzbl	(%esp), %eax
+	jmp	9f
+E(X86_RET_UINT16)
+	movzwl	(%esp), %eax
+	jmp	9f
+E(X86_RET_INT64)
+	movl	4(%esp), %edx
+	/* fallthru */
+E(X86_RET_INT32)
+	movl	(%esp), %eax
+	/* fallthru */
+E(X86_RET_VOID)
+9:	addl	$closure_FS, %esp
+	cfi_adjust_cfa_offset(-closure_FS)
 	ret
-.Lcls_retint:
-	movl	(%ecx), %eax
-	jmp	.Lcls_epilogue
-.Lcls_retfloat:
-	flds	(%ecx)
-	jmp	.Lcls_epilogue
-.Lcls_retdouble:
-	fldl	(%ecx)
-	jmp	.Lcls_epilogue
-.Lcls_retldouble:
-	fldt	(%ecx)
-	jmp	.Lcls_epilogue
-.Lcls_retllong:
-	movl	(%ecx), %eax
-	movl	4(%ecx), %edx
-	jmp	.Lcls_epilogue
-.Lcls_retstruct:
-	movl	%ebp, %esp
-	popl	%ebp
+	cfi_adjust_cfa_offset(closure_FS)
+E(X86_RET_STRUCTPOP)
+	addl	$closure_FS, %esp
+	cfi_adjust_cfa_offset(-closure_FS)
 	ret	$4
+	cfi_adjust_cfa_offset(closure_FS)
+E(X86_RET_STRUCTARG)
+	movl	(%esp), %eax
+	jmp	9b
+E(X86_RET_STRUCT_1B)
+	movzbl	(%esp), %eax
+	jmp	9b
+E(X86_RET_STRUCT_2B)
+	movzwl	(%esp), %eax
+	jmp	9b
+
+	/* Fill out the table so that bad values are predictable.  */
+E(X86_RET_UNUSED14)
+	ud2
+E(X86_RET_UNUSED15)
+	ud2
+
+	cfi_endproc
+ENDF(C(ffi_closure_i386))
+
+/* For REGISTER, we have no available parameter registers, and so we
+   enter here having pushed the closure onto the stack.  */
+
+	.align	16
+	.globl	C(ffi_closure_REGISTER)
+	FFI_HIDDEN(C(ffi_closure_REGISTER))
+C(ffi_closure_REGISTER):
+	cfi_startproc
+	cfi_def_cfa(%esp, 8)
+	cfi_offset(%eip, -8)
+	subl	$closure_FS-4, %esp
+	cfi_adjust_cfa_offset(closure_FS-4)
+
+	FFI_CLOSURE_SAVE_REGS
+
+	movl	closure_FS-4(%esp), %ecx	/* load retaddr */
+	movl	closure_FS(%esp), %eax		/* load closure */
+	movl	%ecx, closure_FS(%esp)		/* move retaddr */
+	jmp	0f
+
+	cfi_endproc
+ENDF(C(ffi_closure_REGISTER))
+
+/* For STDCALL (and others), we need to pop N bytes of arguments off
+   the stack following the closure.  The amount needing to be popped
+   is returned to us from ffi_closure_inner.  */
+
+	.align	16
+	.globl	C(ffi_closure_STDCALL)
+	FFI_HIDDEN(C(ffi_closure_STDCALL))
+C(ffi_closure_STDCALL):
+	cfi_startproc
+	subl	$closure_FS, %esp
+	cfi_adjust_cfa_offset(closure_FS)
+
+	FFI_CLOSURE_SAVE_REGS
+0:
+	FFI_CLOSURE_COPY_TRAMP_DATA
+	FFI_CLOSURE_CALL_INNER
+
+	movl	%eax, %ecx
+	shrl	$X86_RET_POP_SHIFT, %ecx	/* isolate pop count */
+	leal	closure_FS(%esp, %ecx), %ecx	/* compute popped esp */
+	movl	closure_FS(%esp), %edx		/* move return address */
+	movl	%edx, (%ecx)
+
+	/* New pseudo-stack frame based off ecx.  This is unwind trickery
+	   in that the CFA *has* changed, to the proper popped stack address.
+	   Note that the location to which we moved the return address
+	   is the new CFA-4, so that's unchanged.  */
+	cfi_def_cfa(%ecx, 4)
+	/* Normally esp is unwound to CFA + the caller's ARGS_SIZE.
+	   We've just set the CFA to that final value.  Tell the unwinder
+	   to restore esp from CFA without the ARGS_SIZE:
+	   DW_CFA_val_expression %esp, DW_OP_call_frame_cfa.  */
+	cfi_escape(0x16, 4, 1, 0x9c)
+
+	FFI_CLOSURE_MASK_AND_JUMP
+
+	.align	8
+0:
+E(X86_RET_FLOAT)
+	flds	(%esp)
+	movl	%ecx, %esp
+	ret
+E(X86_RET_DOUBLE)
+	fldl	(%esp)
+	movl	%ecx, %esp
+	ret
+E(X86_RET_LDOUBLE)
+	fldt	(%esp)
+	movl	%ecx, %esp
+	ret
+E(X86_RET_SINT8)
+	movsbl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+E(X86_RET_SINT16)
+	movswl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+E(X86_RET_UINT8)
+	movzbl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+E(X86_RET_UINT16)
+	movzwl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+E(X86_RET_INT64)
+	popl	%eax
+	popl	%edx
+	movl	%ecx, %esp
+	ret
+E(X86_RET_INT32)
+	movl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+E(X86_RET_VOID)
+	movl	%ecx, %esp
+	ret
+E(X86_RET_STRUCTPOP)
+	movl	%ecx, %esp
+	ret
+E(X86_RET_STRUCTARG)
+	movl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+E(X86_RET_STRUCT_1B)
+	movzbl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+E(X86_RET_STRUCT_2B)
+	movzwl	(%esp), %eax
+	movl	%ecx, %esp
+	ret
+
+	/* Fill out the table so that bad values are predictable.  */
+E(X86_RET_UNUSED14)
+	ud2
+E(X86_RET_UNUSED15)
+	ud2
+
 	cfi_endproc
-	.size	ffi_closure_SYSV, .-ffi_closure_SYSV
+ENDF(C(ffi_closure_STDCALL))
 
 #if !FFI_NO_RAW_API
 
-/* Precalculate for e.g. the Solaris 10/x86 assembler.  */
-#if FFI_TRAMPOLINE_SIZE == 10
-#define RAW_CLOSURE_CIF_OFFSET 12
-#define RAW_CLOSURE_FUN_OFFSET 16
-#define RAW_CLOSURE_USER_DATA_OFFSET 20
-#elif FFI_TRAMPOLINE_SIZE == 24
-#define RAW_CLOSURE_CIF_OFFSET 24
-#define RAW_CLOSURE_FUN_OFFSET 28
-#define RAW_CLOSURE_USER_DATA_OFFSET 32
+#define raw_closure_S_FS	(16+16+12)
+
+	.align	16
+	.globl	C(ffi_closure_raw_SYSV)
+	FFI_HIDDEN(C(ffi_closure_raw_SYSV))
+C(ffi_closure_raw_SYSV):
+	cfi_startproc
+	subl	$raw_closure_S_FS, %esp
+	cfi_adjust_cfa_offset(raw_closure_S_FS)
+	movl	%ebx, raw_closure_S_FS-4(%esp)
+	cfi_rel_offset(%ebx, raw_closure_S_FS-4)
+
+	movl	FFI_TRAMPOLINE_SIZE+8(%eax), %edx	/* load cl->user_data */
+	movl	%edx, 12(%esp)
+	leal	raw_closure_S_FS+4(%esp), %edx		/* load raw_args */
+	movl	%edx, 8(%esp)
+	leal	16(%esp), %edx				/* load &res */
+	movl	%edx, 4(%esp)
+	movl	FFI_TRAMPOLINE_SIZE(%eax), %ebx		/* load cl->cif */
+	movl	%ebx, (%esp)
+	call	*FFI_TRAMPOLINE_SIZE+4(%eax)		/* call cl->fun */
+
+	movl	20(%ebx), %eax				/* load cif->flags */
+	andl	$X86_RET_TYPE_MASK, %eax
+#ifdef __PIC__
+	call	__x86.get_pc_thunk.bx
+1:	leal	0f-1b(%ebx, %eax, 8), %eax
 #else
-#define RAW_CLOSURE_CIF_OFFSET ((FFI_TRAMPOLINE_SIZE + 3) & ~3)
-#define RAW_CLOSURE_FUN_OFFSET (RAW_CLOSURE_CIF_OFFSET + 4)
-#define RAW_CLOSURE_USER_DATA_OFFSET (RAW_CLOSURE_FUN_OFFSET + 4)
+	leal	0f(,%eax, 8), %eax
 #endif
-#define CIF_FLAGS_OFFSET 20
+	movl	raw_closure_S_FS-4(%esp), %ebx
+	cfi_restore(%ebx)
+	jmp	*%eax
+
+	.align	8
+0:
+E(X86_RET_FLOAT)
+	flds	16(%esp)
+	jmp	9f
+E(X86_RET_DOUBLE)
+	fldl	16(%esp)
+	jmp	9f
+E(X86_RET_LDOUBLE)
+	fldt	16(%esp)
+	jmp	9f
+E(X86_RET_SINT8)
+	movsbl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_SINT16)
+	movswl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_UINT8)
+	movzbl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_UINT16)
+	movzwl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_INT64)
+	movl	16+4(%esp), %edx
+	/* fallthru */
+E(X86_RET_INT32)
+	movl	16(%esp), %eax
+	/* fallthru */
+E(X86_RET_VOID)
+9:	addl	$raw_closure_S_FS, %esp
+	cfi_adjust_cfa_offset(-raw_closure_S_FS)
+	ret
+	cfi_adjust_cfa_offset(raw_closure_S_FS)
+E(X86_RET_STRUCTPOP)
+	addl	$raw_closure_S_FS, %esp
+	cfi_adjust_cfa_offset(-raw_closure_S_FS)
+	ret	$4
+	cfi_adjust_cfa_offset(raw_closure_S_FS)
+E(X86_RET_STRUCTARG)
+	movl	16(%esp), %eax
+	jmp	9b
+E(X86_RET_STRUCT_1B)
+	movzbl	16(%esp), %eax
+	jmp	9b
+E(X86_RET_STRUCT_2B)
+	movzwl	16(%esp), %eax
+	jmp	9b
+
+	/* Fill out the table so that bad values are predictable.  */
+E(X86_RET_UNUSED14)
+	ud2
+E(X86_RET_UNUSED15)
+	ud2
 
-	.align	4
-FFI_HIDDEN (ffi_closure_raw_SYSV)
-.globl ffi_closure_raw_SYSV
-	.type	ffi_closure_raw_SYSV, @function
+	cfi_endproc
+ENDF(C(ffi_closure_raw_SYSV))
+
+#undef	raw_closure_S_FS
+#define raw_closure_T_FS	(16+16+8)
 
-ffi_closure_raw_SYSV:
+	.align	16
+	.globl	C(ffi_closure_raw_THISCALL)
+	FFI_HIDDEN(C(ffi_closure_raw_THISCALL))
+C(ffi_closure_raw_THISCALL):
 	cfi_startproc
-	pushl	%ebp
+	/* Rearrange the stack such that %ecx is the first argument.
+	   This means moving the return address.  */
+	popl	%edx
+	cfi_adjust_cfa_offset(-4)
+	cfi_register(%eip, %edx)
+	pushl	%ecx
 	cfi_adjust_cfa_offset(4)
-	cfi_rel_offset(%ebp, 0)
-	movl	%esp, %ebp
-	cfi_def_cfa_register(%ebp)
-	pushl	%esi
-	cfi_offset(%esi, -12)
-	subl	$36, %esp
-	movl	RAW_CLOSURE_CIF_OFFSET(%eax), %esi	 /* closure->cif */
-	movl	RAW_CLOSURE_USER_DATA_OFFSET(%eax), %edx /* closure->user_data */
-	movl	%edx, 12(%esp)	/* user_data */
-	leal	8(%ebp), %edx	/* __builtin_dwarf_cfa () */
-	movl	%edx, 8(%esp)	/* raw_args */
-	leal	-24(%ebp), %edx
-	movl	%edx, 4(%esp)	/* &res */
-	movl	%esi, (%esp)	/* cif */
-	call	*RAW_CLOSURE_FUN_OFFSET(%eax)		 /* closure->fun */
-	movl	CIF_FLAGS_OFFSET(%esi), %eax		 /* rtype */
-	cmpl	$FFI_TYPE_INT, %eax
-	je	.Lrcls_retint
-
-	/* Handle FFI_TYPE_UINT8, FFI_TYPE_SINT8, FFI_TYPE_UINT16,
-	   FFI_TYPE_SINT16, FFI_TYPE_UINT32, FFI_TYPE_SINT32.  */
-	cmpl	$FFI_TYPE_UINT64, %eax
-	jge	0f
-	cmpl	$FFI_TYPE_UINT8, %eax
-	jge	.Lrcls_retint
+	pushl	%edx
+	cfi_adjust_cfa_offset(4)
+	cfi_rel_offset(%eip, 0)
+	subl	$raw_closure_T_FS, %esp
+	cfi_adjust_cfa_offset(raw_closure_T_FS)
+	movl	%ebx, raw_closure_T_FS-4(%esp)
+	cfi_rel_offset(%ebx, raw_closure_T_FS-4)
+
+	movl	FFI_TRAMPOLINE_SIZE+8(%eax), %edx	/* load cl->user_data */
+	movl	%edx, 12(%esp)
+	leal	raw_closure_T_FS+4(%esp), %edx		/* load raw_args */
+	movl	%edx, 8(%esp)
+	leal	16(%esp), %edx				/* load &res */
+	movl	%edx, 4(%esp)
+	movl	FFI_TRAMPOLINE_SIZE(%eax), %ebx		/* load cl->cif */
+	movl	%ebx, (%esp)
+	call	*FFI_TRAMPOLINE_SIZE+4(%eax)		/* call cl->fun */
+
+	movl	20(%ebx), %eax				/* load cif->flags */
+	andl	$X86_RET_TYPE_MASK, %eax
+#ifdef __PIC__
+	call	__x86.get_pc_thunk.bx
+1:	leal	0f-1b(%ebx, %eax, 8), %eax
+#else
+	leal	0f(,%eax, 8), %eax
+#endif
+	movl	raw_closure_T_FS-4(%esp), %ebx
+	cfi_restore(%ebx)
+	jmp	*%eax
+
+	.align	8
 0:
-	cmpl	$FFI_TYPE_FLOAT, %eax
-	je	.Lrcls_retfloat
-	cmpl	$FFI_TYPE_DOUBLE, %eax
-	je	.Lrcls_retdouble
-	cmpl	$FFI_TYPE_LONGDOUBLE, %eax
-	je	.Lrcls_retldouble
-	cmpl	$FFI_TYPE_SINT64, %eax
-	je	.Lrcls_retllong
-.Lrcls_epilogue:
-	addl	$36, %esp
-	popl	%esi
-	popl	%ebp
-	ret
-.Lrcls_retint:
-	movl	-24(%ebp), %eax
-	jmp	.Lrcls_epilogue
-.Lrcls_retfloat:
-	flds	-24(%ebp)
-	jmp	.Lrcls_epilogue
-.Lrcls_retdouble:
-	fldl	-24(%ebp)
-	jmp	.Lrcls_epilogue
-.Lrcls_retldouble:
-	fldt	-24(%ebp)
-	jmp	.Lrcls_epilogue
-.Lrcls_retllong:
-	movl	-24(%ebp), %eax
-	movl	-20(%ebp), %edx
-	jmp	.Lrcls_epilogue
+E(X86_RET_FLOAT)
+	flds	16(%esp)
+	jmp	9f
+E(X86_RET_DOUBLE)
+	fldl	16(%esp)
+	jmp	9f
+E(X86_RET_LDOUBLE)
+	fldt	16(%esp)
+	jmp	9f
+E(X86_RET_SINT8)
+	movsbl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_SINT16)
+	movswl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_UINT8)
+	movzbl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_UINT16)
+	movzwl	16(%esp), %eax
+	jmp	9f
+E(X86_RET_INT64)
+	movl	16+4(%esp), %edx
+	/* fallthru */
+E(X86_RET_INT32)
+	movl	16(%esp), %eax
+	/* fallthru */
+E(X86_RET_VOID)
+9:	addl	$raw_closure_T_FS, %esp
+	cfi_adjust_cfa_offset(-raw_closure_T_FS)
+	/* Remove the extra %ecx argument we pushed.  */
+	ret	$4
+	cfi_adjust_cfa_offset(raw_closure_T_FS)
+E(X86_RET_STRUCTPOP)
+	addl	$raw_closure_T_FS, %esp
+	cfi_adjust_cfa_offset(-raw_closure_T_FS)
+	ret	$8
+	cfi_adjust_cfa_offset(raw_closure_T_FS)
+E(X86_RET_STRUCTARG)
+	movl	16(%esp), %eax
+	jmp	9b
+E(X86_RET_STRUCT_1B)
+	movzbl	16(%esp), %eax
+	jmp	9b
+E(X86_RET_STRUCT_2B)
+	movzwl	16(%esp), %eax
+	jmp	9b
+
+	/* Fill out the table so that bad values are predictable.  */
+E(X86_RET_UNUSED14)
+	ud2
+E(X86_RET_UNUSED15)
+	ud2
+
 	cfi_endproc
-	.size	ffi_closure_raw_SYSV, .-ffi_closure_raw_SYSV
+ENDF(C(ffi_closure_raw_THISCALL))
 
 #endif /* !FFI_NO_RAW_API */
+
+#if defined(__PIC__)
+	.section .text.__x86.get_pc_thunk.bx,"axG",@progbits,__x86.get_pc_thunk.bx,comdat
+	.globl	__x86.get_pc_thunk.bx
+	.hidden	__x86.get_pc_thunk.bx
+	.type	__x86.get_pc_thunk.bx,@function
+__x86.get_pc_thunk.bx:
+	cfi_startproc
+	movl	(%esp), %ebx
+	ret
+	cfi_endproc
+	.size	__x86.get_pc_thunk.bx, . - __x86.get_pc_thunk.bx
+#endif /* __PIC__ */
+
 #endif /* ifndef __x86_64__ */
 #if defined __ELF__ && defined __linux__
 	.section	.note.GNU-stack,"",@progbits
diff --git a/src/x86/win32.S b/src/x86/win32.S
deleted file mode 100644
index d523eb0..0000000
--- a/src/x86/win32.S
+++ /dev/null
@@ -1,1351 +0,0 @@
-/* -----------------------------------------------------------------------
-   win32.S - Copyright (c) 2014  Anthony Green
-             Copyright (c) 1996, 1998, 2001, 2002, 2009  Red Hat, Inc.
-             Copyright (c) 2001  John Beniton
-             Copyright (c) 2002  Ranjit Mathew
-             Copyright (c) 2009  Daniel Witte
-
-
-   X86 Foreign Function Interface
- 
-   Permission is hereby granted, free of charge, to any person obtaining
-   a copy of this software and associated documentation files (the
-   ``Software''), to deal in the Software without restriction, including
-   without limitation the rights to use, copy, modify, merge, publish,
-   distribute, sublicense, and/or sell copies of the Software, and to
-   permit persons to whom the Software is furnished to do so, subject to
-   the following conditions:
- 
-   The above copyright notice and this permission notice shall be included
-   in all copies or substantial portions of the Software.
- 
-   THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND,
-   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-   NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-   HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-   WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-   DEALINGS IN THE SOFTWARE.
-   -----------------------------------------------------------------------
-   */
- 
-#define LIBFFI_ASM
-#include <fficonfig.h>
-#include <ffi.h>
-
-#define CIF_BYTES_OFFSET 16
-#define CIF_FLAGS_OFFSET 20
-
-#ifdef _MSC_VER
-
-#define CLOSURE_CIF_OFFSET ((FFI_TRAMPOLINE_SIZE + 3) AND NOT 3)
-
-.386
-.MODEL FLAT, C
-
-EXTRN ffi_closure_SYSV_inner:NEAR
-EXTRN ffi_closure_WIN32_inner:NEAR
-
-_TEXT SEGMENT
-
-ffi_call_win32 PROC NEAR,
-    ffi_prep_args : NEAR PTR DWORD,
-    ecif          : NEAR PTR DWORD,
-    cif_abi       : DWORD,
-    cif_bytes     : DWORD,
-    cif_flags     : DWORD,
-    rvalue        : NEAR PTR DWORD,
-    fn            : NEAR PTR DWORD
-
-        ;; Make room for all of the new args.
-        mov  ecx, cif_bytes
-        sub  esp, ecx
-
-        mov  eax, esp
-
-        ;; Call ffi_prep_args
-        push ecif
-        push eax
-        call ffi_prep_args
-        add  esp, 8
-
-        ;; Prepare registers
-        ;; EAX stores the number of register arguments
-        cmp  eax, 0
-        je   fun
-        cmp  eax, 3
-        jl   prepr_two_cmp
-        
-        mov  ecx, esp
-        add  esp, 12
-        mov  eax, DWORD PTR [ecx+8]
-        jmp  prepr_two
-prepr_two_cmp:
-        cmp  eax, 2
-        jl   prepr_one_prep
-        mov  ecx, esp
-        add  esp, 8
-prepr_two:
-        mov  edx, DWORD PTR [ecx+4]
-        jmp  prepr_one
-prepr_one_prep:
-        mov  ecx, esp
-        add  esp, 4
-prepr_one:
-        mov  ecx, DWORD PTR [ecx]
-        cmp  cif_abi, 7 ;; FFI_REGISTER
-        jne  fun
-
-        xchg ecx, eax
-
-fun:
-        ;; Call function
-        call fn
-
-        ;; Load ecx with the return type code
-        mov  ecx, cif_flags
-
-        ;; If the return value pointer is NULL, assume no return value.
-        cmp  rvalue, 0
-        jne  ca_jumptable
-
-        ;; Even if there is no space for the return value, we are
-        ;; obliged to handle floating-point values.
-        cmp  ecx, FFI_TYPE_FLOAT
-        jne  ca_epilogue
-        fstp st(0)
-
-        jmp  ca_epilogue
-
-ca_jumptable:
-        jmp  [ca_jumpdata + 4 * ecx]
-ca_jumpdata:
-        ;; Do not insert anything here between label and jump table.
-        dd offset ca_epilogue       ;; FFI_TYPE_VOID
-        dd offset ca_retint         ;; FFI_TYPE_INT
-        dd offset ca_retfloat       ;; FFI_TYPE_FLOAT
-        dd offset ca_retdouble      ;; FFI_TYPE_DOUBLE
-        dd offset ca_retlongdouble  ;; FFI_TYPE_LONGDOUBLE
-        dd offset ca_retuint8       ;; FFI_TYPE_UINT8
-        dd offset ca_retsint8       ;; FFI_TYPE_SINT8
-        dd offset ca_retuint16      ;; FFI_TYPE_UINT16
-        dd offset ca_retsint16      ;; FFI_TYPE_SINT16
-        dd offset ca_retint         ;; FFI_TYPE_UINT32
-        dd offset ca_retint         ;; FFI_TYPE_SINT32
-        dd offset ca_retint64       ;; FFI_TYPE_UINT64
-        dd offset ca_retint64       ;; FFI_TYPE_SINT64
-        dd offset ca_epilogue       ;; FFI_TYPE_STRUCT
-        dd offset ca_retint         ;; FFI_TYPE_POINTER
-        dd offset ca_retstruct1b    ;; FFI_TYPE_SMALL_STRUCT_1B
-        dd offset ca_retstruct2b    ;; FFI_TYPE_SMALL_STRUCT_2B
-        dd offset ca_retint         ;; FFI_TYPE_SMALL_STRUCT_4B
-        dd offset ca_epilogue       ;; FFI_TYPE_MS_STRUCT
-
-        /* Sign/zero extend as appropriate.  */
-ca_retuint8:
-        movzx eax, al
-        jmp   ca_retint
-
-ca_retsint8:
-        movsx eax, al
-        jmp   ca_retint
-
-ca_retuint16:
-        movzx eax, ax
-        jmp   ca_retint
-
-ca_retsint16:
-        movsx eax, ax
-        jmp   ca_retint
-
-ca_retint:
-        ;; Load %ecx with the pointer to storage for the return value
-        mov   ecx, rvalue
-        mov   [ecx + 0], eax
-        jmp   ca_epilogue
-
-ca_retint64:
-        ;; Load %ecx with the pointer to storage for the return value
-        mov   ecx, rvalue
-        mov   [ecx + 0], eax
-        mov   [ecx + 4], edx
-        jmp   ca_epilogue
-
-ca_retfloat:
-        ;; Load %ecx with the pointer to storage for the return value
-        mov   ecx, rvalue
-        fstp  DWORD PTR [ecx]
-        jmp   ca_epilogue
-
-ca_retdouble:
-        ;; Load %ecx with the pointer to storage for the return value
-        mov   ecx, rvalue
-        fstp  QWORD PTR [ecx]
-        jmp   ca_epilogue
-
-ca_retlongdouble:
-        ;; Load %ecx with the pointer to storage for the return value
-        mov   ecx, rvalue
-        fstp  TBYTE PTR [ecx]
-        jmp   ca_epilogue
-
-ca_retstruct1b:
-        ;; Load %ecx with the pointer to storage for the return value
-        mov   ecx, rvalue
-        mov   [ecx + 0], al
-        jmp   ca_epilogue
-
-ca_retstruct2b:
-        ;; Load %ecx with the pointer to storage for the return value
-        mov   ecx, rvalue
-        mov   [ecx + 0], ax
-        jmp   ca_epilogue
-
-ca_epilogue:
-        ;; Epilogue code is autogenerated.
-        ret
-ffi_call_win32 ENDP
-
-ffi_closure_THISCALL PROC NEAR
-        ;; Insert the register argument on the stack as the first argument
-        xchg	DWORD PTR [esp+4], ecx
-        xchg	DWORD PTR [esp], ecx
-        push	ecx
-        jmp	ffi_closure_STDCALL
-ffi_closure_THISCALL ENDP
-
-ffi_closure_FASTCALL PROC NEAR
-        ;; Insert the 2 register arguments on the stack as the first argument
-        xchg	DWORD PTR [esp+4], edx
-        xchg	DWORD PTR [esp], ecx
-        push	edx
-        push	ecx
-        jmp	ffi_closure_STDCALL
-ffi_closure_FASTCALL ENDP
-
-ffi_closure_REGISTER PROC NEAR
-        ;; Insert the 3 register arguments on the stack as the first argument
-        push	eax
-        xchg	DWORD PTR [esp+8], ecx
-        xchg	DWORD PTR [esp+4], edx
-        push	ecx
-        push	edx
-        jmp	ffi_closure_STDCALL
-ffi_closure_FASTCALL ENDP
-
-ffi_closure_SYSV PROC NEAR FORCEFRAME
-    ;; the ffi_closure ctx is passed in eax by the trampoline.
-
-        sub  esp, 40
-        lea  edx, [ebp - 24]
-        mov  [ebp - 12], edx         ;; resp
-        lea  edx, [ebp + 8]
-stub::
-        mov  [esp + 8], edx          ;; args
-        lea  edx, [ebp - 12]
-        mov  [esp + 4], edx          ;; &resp
-        mov  [esp], eax              ;; closure
-        call ffi_closure_SYSV_inner
-        mov  ecx, [ebp - 12]
-
-cs_jumptable:
-        jmp  [cs_jumpdata + 4 * eax]
-cs_jumpdata:
-        ;; Do not insert anything here between the label and jump table.
-        dd offset cs_epilogue       ;; FFI_TYPE_VOID
-        dd offset cs_retint         ;; FFI_TYPE_INT
-        dd offset cs_retfloat       ;; FFI_TYPE_FLOAT
-        dd offset cs_retdouble      ;; FFI_TYPE_DOUBLE
-        dd offset cs_retlongdouble  ;; FFI_TYPE_LONGDOUBLE
-        dd offset cs_retuint8       ;; FFI_TYPE_UINT8
-        dd offset cs_retsint8       ;; FFI_TYPE_SINT8
-        dd offset cs_retuint16      ;; FFI_TYPE_UINT16
-        dd offset cs_retsint16      ;; FFI_TYPE_SINT16
-        dd offset cs_retint         ;; FFI_TYPE_UINT32
-        dd offset cs_retint         ;; FFI_TYPE_SINT32
-        dd offset cs_retint64       ;; FFI_TYPE_UINT64
-        dd offset cs_retint64       ;; FFI_TYPE_SINT64
-        dd offset cs_retstruct      ;; FFI_TYPE_STRUCT
-        dd offset cs_retint         ;; FFI_TYPE_POINTER
-        dd offset cs_retsint8       ;; FFI_TYPE_SMALL_STRUCT_1B
-        dd offset cs_retsint16      ;; FFI_TYPE_SMALL_STRUCT_2B
-        dd offset cs_retint         ;; FFI_TYPE_SMALL_STRUCT_4B
-        dd offset cs_retmsstruct    ;; FFI_TYPE_MS_STRUCT
-
-cs_retuint8:
-        movzx eax, BYTE PTR [ecx]
-        jmp   cs_epilogue
-
-cs_retsint8:
-        movsx eax, BYTE PTR [ecx]
-        jmp   cs_epilogue
-
-cs_retuint16:
-        movzx eax, WORD PTR [ecx]
-        jmp   cs_epilogue
-
-cs_retsint16:
-        movsx eax, WORD PTR [ecx]
-        jmp   cs_epilogue
-
-cs_retint:
-        mov   eax, [ecx]
-        jmp   cs_epilogue
-
-cs_retint64:
-        mov   eax, [ecx + 0]
-        mov   edx, [ecx + 4]
-        jmp   cs_epilogue
-
-cs_retfloat:
-        fld   DWORD PTR [ecx]
-        jmp   cs_epilogue
-
-cs_retdouble:
-        fld   QWORD PTR [ecx]
-        jmp   cs_epilogue
-
-cs_retlongdouble:
-        fld   TBYTE PTR [ecx]
-        jmp   cs_epilogue
-
-cs_retstruct:
-        ;; Caller expects us to pop struct return value pointer hidden arg.
-        ;; Epilogue code is autogenerated.
-        ret	4
-
-cs_retmsstruct:
-        ;; Caller expects us to return a pointer to the real return value.
-        mov   eax, ecx
-        ;; Caller doesn't expects us to pop struct return value pointer hidden arg.
-        jmp   cs_epilogue
-
-cs_epilogue:
-        ;; Epilogue code is autogenerated.
-        ret
-ffi_closure_SYSV ENDP
-
-#if !FFI_NO_RAW_API
-
-#define RAW_CLOSURE_CIF_OFFSET ((FFI_TRAMPOLINE_SIZE + 3) AND NOT 3)
-#define RAW_CLOSURE_FUN_OFFSET (RAW_CLOSURE_CIF_OFFSET + 4)
-#define RAW_CLOSURE_USER_DATA_OFFSET (RAW_CLOSURE_FUN_OFFSET + 4)
-
-ffi_closure_raw_THISCALL PROC NEAR USES esi FORCEFRAME
-        sub esp, 36
-        mov  esi, [eax + RAW_CLOSURE_CIF_OFFSET]        ;; closure->cif
-        mov  edx, [eax + RAW_CLOSURE_USER_DATA_OFFSET]  ;; closure->user_data
-        mov [esp + 12], edx
-        lea edx, [ebp + 12]
-        jmp stubraw
-ffi_closure_raw_THISCALL ENDP
-
-ffi_closure_raw_SYSV PROC NEAR USES esi FORCEFRAME
-    ;; the ffi_closure ctx is passed in eax by the trampoline.
-
-        sub  esp, 40
-        mov  esi, [eax + RAW_CLOSURE_CIF_OFFSET]        ;; closure->cif
-        mov  edx, [eax + RAW_CLOSURE_USER_DATA_OFFSET]  ;; closure->user_data
-        mov  [esp + 12], edx                            ;; user_data
-        lea  edx, [ebp + 8]
-stubraw::
-        mov  [esp + 8], edx                             ;; raw_args
-        lea  edx, [ebp - 24]
-        mov  [esp + 4], edx                             ;; &res
-        mov  [esp], esi                                 ;; cif
-        call DWORD PTR [eax + RAW_CLOSURE_FUN_OFFSET]   ;; closure->fun
-        mov  eax, [esi + CIF_FLAGS_OFFSET]              ;; cif->flags
-        lea  ecx, [ebp - 24]
-
-cr_jumptable:
-        jmp  [cr_jumpdata + 4 * eax]
-cr_jumpdata:
-        ;; Do not insert anything here between the label and jump table.
-        dd offset cr_epilogue       ;; FFI_TYPE_VOID
-        dd offset cr_retint         ;; FFI_TYPE_INT
-        dd offset cr_retfloat       ;; FFI_TYPE_FLOAT
-        dd offset cr_retdouble      ;; FFI_TYPE_DOUBLE
-        dd offset cr_retlongdouble  ;; FFI_TYPE_LONGDOUBLE
-        dd offset cr_retuint8       ;; FFI_TYPE_UINT8
-        dd offset cr_retsint8       ;; FFI_TYPE_SINT8
-        dd offset cr_retuint16      ;; FFI_TYPE_UINT16
-        dd offset cr_retsint16      ;; FFI_TYPE_SINT16
-        dd offset cr_retint         ;; FFI_TYPE_UINT32
-        dd offset cr_retint         ;; FFI_TYPE_SINT32
-        dd offset cr_retint64       ;; FFI_TYPE_UINT64
-        dd offset cr_retint64       ;; FFI_TYPE_SINT64
-        dd offset cr_epilogue       ;; FFI_TYPE_STRUCT
-        dd offset cr_retint         ;; FFI_TYPE_POINTER
-        dd offset cr_retsint8       ;; FFI_TYPE_SMALL_STRUCT_1B
-        dd offset cr_retsint16      ;; FFI_TYPE_SMALL_STRUCT_2B
-        dd offset cr_retint         ;; FFI_TYPE_SMALL_STRUCT_4B
-        dd offset cr_epilogue       ;; FFI_TYPE_MS_STRUCT
-
-cr_retuint8:
-        movzx eax, BYTE PTR [ecx]
-        jmp   cr_epilogue
-
-cr_retsint8:
-        movsx eax, BYTE PTR [ecx]
-        jmp   cr_epilogue
-
-cr_retuint16:
-        movzx eax, WORD PTR [ecx]
-        jmp   cr_epilogue
-
-cr_retsint16:
-        movsx eax, WORD PTR [ecx]
-        jmp   cr_epilogue
-
-cr_retint:
-        mov   eax, [ecx]
-        jmp   cr_epilogue
-
-cr_retint64:
-        mov   eax, [ecx + 0]
-        mov   edx, [ecx + 4]
-        jmp   cr_epilogue
-
-cr_retfloat:
-        fld   DWORD PTR [ecx]
-        jmp   cr_epilogue
-
-cr_retdouble:
-        fld   QWORD PTR [ecx]
-        jmp   cr_epilogue
-
-cr_retlongdouble:
-        fld   TBYTE PTR [ecx]
-        jmp   cr_epilogue
-
-cr_epilogue:
-        ;; Epilogue code is autogenerated.
-        ret
-ffi_closure_raw_SYSV ENDP
-
-#endif /* !FFI_NO_RAW_API */
-
-ffi_closure_STDCALL PROC NEAR FORCEFRAME
-        mov  eax, [esp] ;; the ffi_closure ctx passed by the trampoline.
-
-        sub  esp, 40
-        lea  edx, [ebp - 24]
-        mov  [ebp - 12], edx         ;; resp
-        lea  edx, [ebp + 12]         ;; account for stub return address on stack
-        mov  [esp + 8], edx          ;; args
-        lea  edx, [ebp - 12]
-        mov  [esp + 4], edx          ;; &resp
-        mov  [esp], eax              ;; closure
-        call ffi_closure_WIN32_inner
-        mov  ecx, [ebp - 12]
-
-        xchg [ebp + 4], eax          ;;xchg size of stack parameters and ffi_closure ctx
-        mov  eax, DWORD PTR [eax + CLOSURE_CIF_OFFSET]
-        mov  eax, DWORD PTR [eax + CIF_FLAGS_OFFSET]
-
-cd_jumptable:
-        jmp  [cd_jumpdata + 4 * eax]
-cd_jumpdata:
-        ;; Do not insert anything here between the label and jump table.
-        dd offset cd_epilogue       ;; FFI_TYPE_VOID
-        dd offset cd_retint         ;; FFI_TYPE_INT
-        dd offset cd_retfloat       ;; FFI_TYPE_FLOAT
-        dd offset cd_retdouble      ;; FFI_TYPE_DOUBLE
-        dd offset cd_retlongdouble  ;; FFI_TYPE_LONGDOUBLE
-        dd offset cd_retuint8       ;; FFI_TYPE_UINT8
-        dd offset cd_retsint8       ;; FFI_TYPE_SINT8
-        dd offset cd_retuint16      ;; FFI_TYPE_UINT16
-        dd offset cd_retsint16      ;; FFI_TYPE_SINT16
-        dd offset cd_retint         ;; FFI_TYPE_UINT32
-        dd offset cd_retint         ;; FFI_TYPE_SINT32
-        dd offset cd_retint64       ;; FFI_TYPE_UINT64
-        dd offset cd_retint64       ;; FFI_TYPE_SINT64
-        dd offset cd_epilogue       ;; FFI_TYPE_STRUCT
-        dd offset cd_retint         ;; FFI_TYPE_POINTER
-        dd offset cd_retsint8       ;; FFI_TYPE_SMALL_STRUCT_1B
-        dd offset cd_retsint16      ;; FFI_TYPE_SMALL_STRUCT_2B
-        dd offset cd_retint         ;; FFI_TYPE_SMALL_STRUCT_4B
-
-cd_retuint8:
-        movzx eax, BYTE PTR [ecx]
-        jmp   cd_epilogue
-
-cd_retsint8:
-        movsx eax, BYTE PTR [ecx]
-        jmp   cd_epilogue
-
-cd_retuint16:
-        movzx eax, WORD PTR [ecx]
-        jmp   cd_epilogue
-
-cd_retsint16:
-        movsx eax, WORD PTR [ecx]
-        jmp   cd_epilogue
-
-cd_retint:
-        mov   eax, [ecx]
-        jmp   cd_epilogue
-
-cd_retint64:
-        mov   eax, [ecx + 0]
-        mov   edx, [ecx + 4]
-        jmp   cd_epilogue
-
-cd_retfloat:
-        fld   DWORD PTR [ecx]
-        jmp   cd_epilogue
-
-cd_retdouble:
-        fld   QWORD PTR [ecx]
-        jmp   cd_epilogue
-
-cd_retlongdouble:
-        fld   TBYTE PTR [ecx]
-        jmp   cd_epilogue
-
-cd_epilogue:
-        mov   esp, ebp
-        pop   ebp
-        mov   ecx, [esp + 4]  ;; Return address
-        add   esp, [esp]      ;; Parameters stack size
-        add   esp, 8
-        jmp   ecx
-ffi_closure_STDCALL ENDP
-
-_TEXT ENDS
-END
-
-#else
-
-#define CLOSURE_CIF_OFFSET ((FFI_TRAMPOLINE_SIZE + 3) & ~3)
-
-#if defined(SYMBOL_UNDERSCORE)
-#define USCORE_SYMBOL(x) _##x
-#else
-#define USCORE_SYMBOL(x) x
-#endif
-        .text
- 
-        # This assumes we are using gas.
-        .balign 16
-FFI_HIDDEN(ffi_call_win32)
-        .globl	USCORE_SYMBOL(ffi_call_win32)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_call_win32;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_call_win32):
-.LFB1:
-        pushl %ebp
-.LCFI0:
-        movl  %esp,%ebp
-.LCFI1:
-        # Make room for all of the new args.
-        movl  20(%ebp),%ecx                                                     
-        subl  %ecx,%esp
- 
-        movl  %esp,%eax
- 
-        # Call ffi_prep_args
-        pushl 12(%ebp)
-        pushl %eax
-        call  *8(%ebp)
-        addl  $8,%esp
-
-        # Prepare registers
-        # EAX stores the number of register arguments
-        cmpl  $0, %eax
-        je    .fun
-        cmpl  $3, %eax
-        jl    .prepr_two_cmp
-        
-        movl  %esp, %ecx
-        addl  $12, %esp
-        movl  8(%ecx), %eax
-        jmp   .prepr_two
-.prepr_two_cmp:
-        cmpl  $2, %eax
-        jl    .prepr_one_prep
-        movl  %esp, %ecx
-        addl  $8, %esp
-.prepr_two:
-        movl  4(%ecx), %edx
-        jmp   .prepr_one
-.prepr_one_prep:
-        movl  %esp, %ecx
-        addl  $4, %esp
-.prepr_one:
-        movl  (%ecx), %ecx
-        cmpl  $7, 16(%ebp) # FFI_REGISTER
-        jne   .fun
-
-        xchgl %eax, %ecx
-        
-.fun:
-        # FIXME: Align the stack to a 128-bit boundary to avoid
-        # potential performance hits.
-
-        # Call function
-        call  *32(%ebp)
- 
-        # stdcall functions pop arguments off the stack themselves
-
-        # Load %ecx with the return type code
-        movl  24(%ebp),%ecx
- 
-        # If the return value pointer is NULL, assume no return value.
-        cmpl  $0,28(%ebp)
-        jne   0f
- 
-        # Even if there is no space for the return value, we are
-        # obliged to handle floating-point values.
-        cmpl  $FFI_TYPE_FLOAT,%ecx
-        jne   .Lnoretval
-        fstp  %st(0)
- 
-        jmp   .Lepilogue
-
-0:
-        call 1f
-        # Do not insert anything here between the call and the jump table.
-.Lstore_table:
-        .long	.Lnoretval-.Lstore_table	/* FFI_TYPE_VOID */
-        .long	.Lretint-.Lstore_table		/* FFI_TYPE_INT */
-        .long	.Lretfloat-.Lstore_table	/* FFI_TYPE_FLOAT */
-        .long	.Lretdouble-.Lstore_table	/* FFI_TYPE_DOUBLE */
-        .long	.Lretlongdouble-.Lstore_table	/* FFI_TYPE_LONGDOUBLE */
-        .long	.Lretuint8-.Lstore_table	/* FFI_TYPE_UINT8 */
-        .long	.Lretsint8-.Lstore_table	/* FFI_TYPE_SINT8 */
-        .long	.Lretuint16-.Lstore_table	/* FFI_TYPE_UINT16 */
-        .long	.Lretsint16-.Lstore_table	/* FFI_TYPE_SINT16 */
-        .long	.Lretint-.Lstore_table		/* FFI_TYPE_UINT32 */
-        .long	.Lretint-.Lstore_table		/* FFI_TYPE_SINT32 */
-        .long	.Lretint64-.Lstore_table	/* FFI_TYPE_UINT64 */
-        .long	.Lretint64-.Lstore_table	/* FFI_TYPE_SINT64 */
-        .long	.Lretstruct-.Lstore_table	/* FFI_TYPE_STRUCT */
-        .long	.Lretint-.Lstore_table		/* FFI_TYPE_POINTER */
-        .long	.Lretstruct1b-.Lstore_table	/* FFI_TYPE_SMALL_STRUCT_1B */
-        .long	.Lretstruct2b-.Lstore_table	/* FFI_TYPE_SMALL_STRUCT_2B */
-        .long	.Lretstruct4b-.Lstore_table	/* FFI_TYPE_SMALL_STRUCT_4B */
-        .long	.Lretstruct-.Lstore_table	/* FFI_TYPE_MS_STRUCT */
-1:
-        shl	$2, %ecx
-        add	(%esp),%ecx
-        mov	(%ecx),%ecx
-        add	(%esp),%ecx
-        add	$4, %esp
-        jmp	*%ecx
-
-        /* Sign/zero extend as appropriate.  */
-.Lretsint8:
-        movsbl	%al, %eax
-        jmp	.Lretint
-
-.Lretsint16:
-        movswl	%ax, %eax
-        jmp	.Lretint
-
-.Lretuint8:
-        movzbl	%al, %eax
-        jmp	.Lretint
-
-.Lretuint16:
-        movzwl	%ax, %eax
-        jmp	.Lretint
-
-.Lretint:
-        # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        movl  %eax,0(%ecx)
-        jmp   .Lepilogue
- 
-.Lretfloat:
-         # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        fstps (%ecx)
-        jmp   .Lepilogue
- 
-.Lretdouble:
-        # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        fstpl (%ecx)
-        jmp   .Lepilogue
- 
-.Lretlongdouble:
-        # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        fstpt (%ecx)
-        jmp   .Lepilogue
- 
-.Lretint64:
-        # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        movl  %eax,0(%ecx)
-        movl  %edx,4(%ecx)
-        jmp   .Lepilogue
-
-.Lretstruct1b:
-        # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        movb  %al,0(%ecx)
-        jmp   .Lepilogue
- 
-.Lretstruct2b:
-        # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        movw  %ax,0(%ecx)
-        jmp   .Lepilogue
-
-.Lretstruct4b:
-        # Load %ecx with the pointer to storage for the return value
-        movl  28(%ebp),%ecx
-        movl  %eax,0(%ecx)
-        jmp   .Lepilogue
-
-.Lretstruct:
-        # Nothing to do!
- 
-.Lnoretval:
-.Lepilogue:
-        movl %ebp,%esp
-        popl %ebp
-        ret
-.ffi_call_win32_end:
-        .balign 16
-FFI_HIDDEN(ffi_closure_THISCALL)
-        .globl	USCORE_SYMBOL(ffi_closure_THISCALL)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_closure_THISCALL;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_closure_THISCALL):
-        /* Insert the register argument on the stack as the first argument */
-        xchg	%ecx, 4(%esp)
-        xchg	%ecx, (%esp)
-        push	%ecx
-        jmp	.ffi_closure_STDCALL_internal
-
-        .balign 16
-FFI_HIDDEN(ffi_closure_FASTCALL)
-        .globl	USCORE_SYMBOL(ffi_closure_FASTCALL)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_closure_FASTCALL;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_closure_FASTCALL):
-        /* Insert the 2 register arguments on the stack as the first two arguments */
-        xchg	%edx, 4(%esp)
-        xchg	%ecx, (%esp)
-        push	%edx
-        push	%ecx
-        jmp	.ffi_closure_STDCALL_internal
-FFI_HIDDEN(ffi_closure_REGISTER)
-        .globl	USCORE_SYMBOL(ffi_closure_REGISTER)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_closure_REGISTER;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_closure_REGISTER):
-        /* Insert the 3 register arguments on the stack as the first two arguments */
-        push	%eax
-        xchg	%ecx, 8(%esp)
-        xchg	%edx, 4(%esp)
-        push	%ecx
-        push	%edx
-        jmp	.ffi_closure_STDCALL_internal
-
-.LFE1:
-        # This assumes we are using gas.
-        .balign 16
-FFI_HIDDEN(ffi_closure_SYSV)
-#if defined(X86_WIN32)
-        .globl	USCORE_SYMBOL(ffi_closure_SYSV)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_closure_SYSV;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_closure_SYSV):
-#endif
-.LFB3:
-        pushl	%ebp
-.LCFI4:
-        movl	%esp, %ebp
-.LCFI5:
-        subl	$40, %esp
-        leal	-24(%ebp), %edx
-        movl	%edx, -12(%ebp)	/* resp */
-        leal	8(%ebp), %edx
-        movl	%edx, 4(%esp)	/* args = __builtin_dwarf_cfa () */
-        leal	-12(%ebp), %edx
-        movl	%edx, (%esp)	/* &resp */
-#if defined(HAVE_HIDDEN_VISIBILITY_ATTRIBUTE) || !defined(__PIC__)
-        call	USCORE_SYMBOL(ffi_closure_SYSV_inner)
-#elif defined(X86_DARWIN)
-        calll	L_ffi_closure_SYSV_inner$stub
-#else
-        movl	%ebx, 8(%esp)
-        call	1f
-1:      popl	%ebx
-        addl	$_GLOBAL_OFFSET_TABLE_+[.-1b], %ebx
-        call	ffi_closure_SYSV_inner@PLT
-        movl	8(%esp), %ebx
-#endif
-        movl	-12(%ebp), %ecx
-
-0:
-        call	1f
-        # Do not insert anything here between the call and the jump table.
-.Lcls_store_table:
-        .long	.Lcls_noretval-.Lcls_store_table	/* FFI_TYPE_VOID */
-        .long	.Lcls_retint-.Lcls_store_table		/* FFI_TYPE_INT */
-        .long	.Lcls_retfloat-.Lcls_store_table	/* FFI_TYPE_FLOAT */
-        .long	.Lcls_retdouble-.Lcls_store_table	/* FFI_TYPE_DOUBLE */
-        .long	.Lcls_retldouble-.Lcls_store_table	/* FFI_TYPE_LONGDOUBLE */
-        .long	.Lcls_retuint8-.Lcls_store_table	/* FFI_TYPE_UINT8 */
-        .long	.Lcls_retsint8-.Lcls_store_table	/* FFI_TYPE_SINT8 */
-        .long	.Lcls_retuint16-.Lcls_store_table	/* FFI_TYPE_UINT16 */
-        .long	.Lcls_retsint16-.Lcls_store_table	/* FFI_TYPE_SINT16 */
-        .long	.Lcls_retint-.Lcls_store_table		/* FFI_TYPE_UINT32 */
-        .long	.Lcls_retint-.Lcls_store_table		/* FFI_TYPE_SINT32 */
-        .long	.Lcls_retllong-.Lcls_store_table	/* FFI_TYPE_UINT64 */
-        .long	.Lcls_retllong-.Lcls_store_table	/* FFI_TYPE_SINT64 */
-        .long	.Lcls_retstruct-.Lcls_store_table	/* FFI_TYPE_STRUCT */
-        .long	.Lcls_retint-.Lcls_store_table		/* FFI_TYPE_POINTER */
-        .long	.Lcls_retstruct1-.Lcls_store_table	/* FFI_TYPE_SMALL_STRUCT_1B */
-        .long	.Lcls_retstruct2-.Lcls_store_table	/* FFI_TYPE_SMALL_STRUCT_2B */
-        .long	.Lcls_retstruct4-.Lcls_store_table	/* FFI_TYPE_SMALL_STRUCT_4B */
-        .long	.Lcls_retmsstruct-.Lcls_store_table	/* FFI_TYPE_MS_STRUCT */
-
-1:
-        shl	$2, %eax
-        add	(%esp),%eax
-        mov	(%eax),%eax
-        add	(%esp),%eax
-        add	$4, %esp
-        jmp	*%eax
-
-        /* Sign/zero extend as appropriate.  */
-.Lcls_retsint8:
-        movsbl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retsint16:
-        movswl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retuint8:
-        movzbl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retuint16:
-        movzwl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retint:
-        movl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retfloat:
-        flds	(%ecx)
-        jmp	.Lcls_epilogue
-
-.Lcls_retdouble:
-        fldl	(%ecx)
-        jmp	.Lcls_epilogue
-
-.Lcls_retldouble:
-        fldt	(%ecx)
-        jmp	.Lcls_epilogue
-
-.Lcls_retllong:
-        movl	(%ecx), %eax
-        movl	4(%ecx), %edx
-        jmp	.Lcls_epilogue
-
-.Lcls_retstruct1:
-        movsbl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retstruct2:
-        movswl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retstruct4:
-        movl	(%ecx), %eax
-        jmp	.Lcls_epilogue
-
-.Lcls_retstruct:
-        # Caller expects us to pop struct return value pointer hidden arg.
-        movl	%ebp, %esp
-        popl	%ebp
-        ret	$0x4
-
-.Lcls_retmsstruct:
-        # Caller expects us to return a pointer to the real return value.
-        mov	%ecx, %eax
-        # Caller doesn't expects us to pop struct return value pointer hidden arg.
-        jmp	.Lcls_epilogue
-
-.Lcls_noretval:
-.Lcls_epilogue:
-        movl	%ebp, %esp
-        popl	%ebp
-        ret
-.ffi_closure_SYSV_end:
-.LFE3:
-
-#if !FFI_NO_RAW_API
-
-#define RAW_CLOSURE_CIF_OFFSET ((FFI_TRAMPOLINE_SIZE + 3) & ~3)
-#define RAW_CLOSURE_FUN_OFFSET (RAW_CLOSURE_CIF_OFFSET + 4)
-#define RAW_CLOSURE_USER_DATA_OFFSET (RAW_CLOSURE_FUN_OFFSET + 4)
-
-#ifdef X86_WIN32
-        .balign 16
-FFI_HIDDEN(ffi_closure_raw_THISCALL)
-        .globl	USCORE_SYMBOL(ffi_closure_raw_THISCALL)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_closure_raw_THISCALL;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_closure_raw_THISCALL):
-        pushl	%ebp
-        movl	%esp, %ebp
-        pushl	%esi
-        subl	$36, %esp
-        movl	RAW_CLOSURE_CIF_OFFSET(%eax), %esi	 /* closure->cif */
-        movl	RAW_CLOSURE_USER_DATA_OFFSET(%eax), %edx /* closure->user_data */
-        movl	%edx, 12(%esp)	/* user_data */
-        leal	12(%ebp), %edx	/* __builtin_dwarf_cfa () */
-        jmp	.stubraw
-#endif /* X86_WIN32 */
-
-        # This assumes we are using gas.
-        .balign 16
-#if defined(X86_WIN32)
-        .globl	USCORE_SYMBOL(ffi_closure_raw_SYSV)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_closure_raw_SYSV;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_closure_raw_SYSV):
-#endif /* defined(X86_WIN32) */
-.LFB4:
-        pushl	%ebp
-.LCFI6:
-        movl	%esp, %ebp
-.LCFI7:
-        pushl	%esi
-.LCFI8:
-        subl	$36, %esp
-        movl	RAW_CLOSURE_CIF_OFFSET(%eax), %esi	 /* closure->cif */
-        movl	RAW_CLOSURE_USER_DATA_OFFSET(%eax), %edx /* closure->user_data */
-        movl	%edx, 12(%esp)	/* user_data */
-        leal	8(%ebp), %edx	/* __builtin_dwarf_cfa () */
-.stubraw:
-        movl	%edx, 8(%esp)	/* raw_args */
-        leal	-24(%ebp), %edx
-        movl	%edx, 4(%esp)	/* &res */
-        movl	%esi, (%esp)	/* cif */
-        call	*RAW_CLOSURE_FUN_OFFSET(%eax)		 /* closure->fun */
-        movl	CIF_FLAGS_OFFSET(%esi), %eax		 /* rtype */
-0:
-        call	1f
-        # Do not insert anything here between the call and the jump table.
-.Lrcls_store_table:
-        .long	.Lrcls_noretval-.Lrcls_store_table	/* FFI_TYPE_VOID */
-        .long	.Lrcls_retint-.Lrcls_store_table	/* FFI_TYPE_INT */
-        .long	.Lrcls_retfloat-.Lrcls_store_table	/* FFI_TYPE_FLOAT */
-        .long	.Lrcls_retdouble-.Lrcls_store_table	/* FFI_TYPE_DOUBLE */
-        .long	.Lrcls_retldouble-.Lrcls_store_table	/* FFI_TYPE_LONGDOUBLE */
-        .long	.Lrcls_retuint8-.Lrcls_store_table	/* FFI_TYPE_UINT8 */
-        .long	.Lrcls_retsint8-.Lrcls_store_table	/* FFI_TYPE_SINT8 */
-        .long	.Lrcls_retuint16-.Lrcls_store_table	/* FFI_TYPE_UINT16 */
-        .long	.Lrcls_retsint16-.Lrcls_store_table	/* FFI_TYPE_SINT16 */
-        .long	.Lrcls_retint-.Lrcls_store_table	/* FFI_TYPE_UINT32 */
-        .long	.Lrcls_retint-.Lrcls_store_table	/* FFI_TYPE_SINT32 */
-        .long	.Lrcls_retllong-.Lrcls_store_table	/* FFI_TYPE_UINT64 */
-        .long	.Lrcls_retllong-.Lrcls_store_table	/* FFI_TYPE_SINT64 */
-        .long	.Lrcls_retstruct-.Lrcls_store_table	/* FFI_TYPE_STRUCT */
-        .long	.Lrcls_retint-.Lrcls_store_table	/* FFI_TYPE_POINTER */
-        .long	.Lrcls_retstruct1-.Lrcls_store_table	/* FFI_TYPE_SMALL_STRUCT_1B */
-        .long	.Lrcls_retstruct2-.Lrcls_store_table	/* FFI_TYPE_SMALL_STRUCT_2B */
-        .long	.Lrcls_retstruct4-.Lrcls_store_table	/* FFI_TYPE_SMALL_STRUCT_4B */
-        .long	.Lrcls_retstruct-.Lrcls_store_table	/* FFI_TYPE_MS_STRUCT */
-1:
-        shl	$2, %eax
-        add	(%esp),%eax
-        mov	(%eax),%eax
-        add	(%esp),%eax
-        add	$4, %esp
-        jmp	*%eax
-
-        /* Sign/zero extend as appropriate.  */
-.Lrcls_retsint8:
-        movsbl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retsint16:
-        movswl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retuint8:
-        movzbl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retuint16:
-        movzwl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retint:
-        movl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retfloat:
-        flds	-24(%ebp)
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retdouble:
-        fldl	-24(%ebp)
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retldouble:
-        fldt	-24(%ebp)
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retllong:
-        movl	-24(%ebp), %eax
-        movl	-20(%ebp), %edx
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retstruct1:
-        movsbl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retstruct2:
-        movswl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retstruct4:
-        movl	-24(%ebp), %eax
-        jmp	.Lrcls_epilogue
-
-.Lrcls_retstruct:
-        # Nothing to do!
-
-.Lrcls_noretval:
-.Lrcls_epilogue:
-        addl	$36, %esp
-        popl	%esi
-        popl	%ebp
-        ret
-.ffi_closure_raw_SYSV_end:
-.LFE4:
-
-#endif /* !FFI_NO_RAW_API */
-
-        # This assumes we are using gas.
-        .balign	16
-FFI_HIDDEN(ffi_closure_STDCALL)
-        .globl	USCORE_SYMBOL(ffi_closure_STDCALL)
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .def	_ffi_closure_STDCALL;	.scl	2;	.type	32;	.endef
-#endif
-USCORE_SYMBOL(ffi_closure_STDCALL):
-.ffi_closure_STDCALL_internal:
-        /* ffi_closure ctx is at top of the stack */
-        movl	(%esp), %eax
-.LFB5:
-        pushl	%ebp
-.LCFI9:
-        movl	%esp, %ebp
-.LCFI10:
-        subl	$40, %esp
-        leal	-24(%ebp), %edx
-        movl	%edx, -12(%ebp)	/* resp */
-        leal	12(%ebp), %edx  /* account for stub return address on stack */
-        movl	%edx, 4(%esp)	/* args */
-        leal	-12(%ebp), %edx
-        movl	%edx, (%esp)	/* &resp */
-#if defined(HAVE_HIDDEN_VISIBILITY_ATTRIBUTE) || !defined(__PIC__)
-        call	USCORE_SYMBOL(ffi_closure_WIN32_inner)
-#elif defined(X86_DARWIN)
-        calll	L_ffi_closure_WIN32_inner$stub
-#else
-        movl	%ebx, 8(%esp)
-        call	1f
-1:      popl	%ebx
-        addl	$_GLOBAL_OFFSET_TABLE_+[.-1b], %ebx
-        call	ffi_closure_WIN32_inner@PLT
-        movl	8(%esp), %ebx
-#endif
-        movl	-12(%ebp), %ecx
-0:
-        xchgl	4(%ebp), %eax /* xchg size of stack parameters and ffi_closure ctx */
-        movl	CLOSURE_CIF_OFFSET(%eax), %eax
-        movl	CIF_FLAGS_OFFSET(%eax), %eax
-
-        call	1f
-        # Do not insert anything here between the call and the jump table.
-.Lscls_store_table:
-        .long	.Lscls_noretval-.Lscls_store_table	/* FFI_TYPE_VOID */
-        .long	.Lscls_retint-.Lscls_store_table	/* FFI_TYPE_INT */
-        .long	.Lscls_retfloat-.Lscls_store_table	/* FFI_TYPE_FLOAT */
-        .long	.Lscls_retdouble-.Lscls_store_table	/* FFI_TYPE_DOUBLE */
-        .long	.Lscls_retldouble-.Lscls_store_table	/* FFI_TYPE_LONGDOUBLE */
-        .long	.Lscls_retuint8-.Lscls_store_table	/* FFI_TYPE_UINT8 */
-        .long	.Lscls_retsint8-.Lscls_store_table	/* FFI_TYPE_SINT8 */
-        .long	.Lscls_retuint16-.Lscls_store_table	/* FFI_TYPE_UINT16 */
-        .long	.Lscls_retsint16-.Lscls_store_table	/* FFI_TYPE_SINT16 */
-        .long	.Lscls_retint-.Lscls_store_table	/* FFI_TYPE_UINT32 */
-        .long	.Lscls_retint-.Lscls_store_table	/* FFI_TYPE_SINT32 */
-        .long	.Lscls_retllong-.Lscls_store_table	/* FFI_TYPE_UINT64 */
-        .long	.Lscls_retllong-.Lscls_store_table	/* FFI_TYPE_SINT64 */
-        .long	.Lscls_retstruct-.Lscls_store_table	/* FFI_TYPE_STRUCT */
-        .long	.Lscls_retint-.Lscls_store_table	/* FFI_TYPE_POINTER */
-        .long	.Lscls_retstruct1-.Lscls_store_table	/* FFI_TYPE_SMALL_STRUCT_1B */
-        .long	.Lscls_retstruct2-.Lscls_store_table	/* FFI_TYPE_SMALL_STRUCT_2B */
-        .long	.Lscls_retstruct4-.Lscls_store_table	/* FFI_TYPE_SMALL_STRUCT_4B */
-1:
-        shl	$2, %eax
-        add	(%esp),%eax
-        mov	(%eax),%eax
-        add	(%esp),%eax
-        add	$4, %esp
-        jmp	*%eax
-
-        /* Sign/zero extend as appropriate.  */
-.Lscls_retsint8:
-        movsbl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retsint16:
-        movswl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retuint8:
-        movzbl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retuint16:
-        movzwl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retint:
-        movl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retfloat:
-        flds	(%ecx)
-        jmp	.Lscls_epilogue
-
-.Lscls_retdouble:
-        fldl	(%ecx)
-        jmp	.Lscls_epilogue
-
-.Lscls_retldouble:
-        fldt	(%ecx)
-        jmp	.Lscls_epilogue
-
-.Lscls_retllong:
-        movl	(%ecx), %eax
-        movl	4(%ecx), %edx
-        jmp	.Lscls_epilogue
-
-.Lscls_retstruct1:
-        movsbl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retstruct2:
-        movswl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retstruct4:
-        movl	(%ecx), %eax
-        jmp	.Lscls_epilogue
-
-.Lscls_retstruct:
-        # Nothing to do!
-
-.Lscls_noretval:
-.Lscls_epilogue:
-        movl	%ebp, %esp
-        popl	%ebp
-        movl	4(%esp), %ecx /* Return address */
-        addl	(%esp), %esp  /* Parameters stack size */
-        addl	$8, %esp
-        jmp	*%ecx
-.ffi_closure_STDCALL_end:
-.LFE5:
-
-#if defined(X86_DARWIN)
-.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
-L_ffi_closure_SYSV_inner$stub:
-        .indirect_symbol _ffi_closure_SYSV_inner
-        hlt ; hlt ; hlt ; hlt ; hlt
-L_ffi_closure_WIN32_inner$stub:
-        .indirect_symbol _ffi_closure_WIN32_inner
-        hlt ; hlt ; hlt ; hlt ; hlt
-#endif
-
-#if defined(X86_WIN32) && !defined(__OS2__)
-        .section	.eh_frame,"w"
-#endif
-.Lframe1:
-.LSCIE1:
-        .long	.LECIE1-.LASCIE1  /* Length of Common Information Entry */
-.LASCIE1:
-        .long	0x0	/* CIE Identifier Tag */
-        .byte	0x1	/* CIE Version */
-#ifdef __PIC__
-        .ascii "zR\0"	/* CIE Augmentation */
-#else
-        .ascii "\0"	/* CIE Augmentation */
-#endif
-        .byte	0x1	/* .uleb128 0x1; CIE Code Alignment Factor */
-        .byte	0x7c	/* .sleb128 -4; CIE Data Alignment Factor */
-        .byte	0x8	/* CIE RA Column */
-#ifdef __PIC__
-        .byte	0x1	/* .uleb128 0x1; Augmentation size */
-        .byte	0x1b	/* FDE Encoding (pcrel sdata4) */
-#endif
-        .byte	0xc	/* DW_CFA_def_cfa CFA = r4 + 4 = 4(%esp) */
-        .byte	0x4	/* .uleb128 0x4 */
-        .byte	0x4	/* .uleb128 0x4 */
-        .byte	0x88	/* DW_CFA_offset, column 0x8 %eip at CFA + 1 * -4 */
-        .byte	0x1	/* .uleb128 0x1 */
-        .align 4
-.LECIE1:
-
-.LSFDE1:
-        .long	.LEFDE1-.LASFDE1	/* FDE Length */
-.LASFDE1:
-        .long	.LASFDE1-.Lframe1	/* FDE CIE offset */
-#if defined __PIC__ && defined HAVE_AS_X86_PCREL
-        .long	.LFB1-.	/* FDE initial location */
-#else
-        .long	.LFB1
-#endif
-        .long	.LFE1-.LFB1	/* FDE address range */
-#ifdef __PIC__
-        .byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-        /* DW_CFA_xxx CFI instructions go here.  */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI0-.LFB1
-        .byte	0xe	/* DW_CFA_def_cfa_offset CFA = r4 + 8 = 8(%esp) */
-        .byte	0x8	/* .uleb128 0x8 */
-        .byte	0x85	/* DW_CFA_offset, column 0x5 %ebp at CFA + 2 * -4 */
-        .byte	0x2	/* .uleb128 0x2 */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI1-.LCFI0
-        .byte	0xd	/* DW_CFA_def_cfa_register CFA = r5 = %ebp */
-        .byte	0x5	/* .uleb128 0x5 */
-
-        /* End of DW_CFA_xxx CFI instructions.  */
-        .align 4
-.LEFDE1:
-
-.LSFDE3:
-        .long	.LEFDE3-.LASFDE3	/* FDE Length */
-.LASFDE3:
-        .long	.LASFDE3-.Lframe1	/* FDE CIE offset */
-#if defined __PIC__ && defined HAVE_AS_X86_PCREL
-        .long	.LFB3-.	/* FDE initial location */
-#else
-        .long	.LFB3
-#endif
-        .long	.LFE3-.LFB3	/* FDE address range */
-#ifdef __PIC__
-        .byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-        /* DW_CFA_xxx CFI instructions go here.  */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI4-.LFB3
-        .byte	0xe	/* DW_CFA_def_cfa_offset CFA = r4 + 8 = 8(%esp) */
-        .byte	0x8	/* .uleb128 0x8 */
-        .byte	0x85	/* DW_CFA_offset, column 0x5 %ebp at CFA + 2 * -4 */
-        .byte	0x2	/* .uleb128 0x2 */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI5-.LCFI4
-        .byte	0xd	/* DW_CFA_def_cfa_register CFA = r5 = %ebp */
-        .byte	0x5	/* .uleb128 0x5 */
-
-        /* End of DW_CFA_xxx CFI instructions.  */
-        .align 4
-.LEFDE3:
-
-#if !FFI_NO_RAW_API
-
-.LSFDE4:
-        .long	.LEFDE4-.LASFDE4	/* FDE Length */
-.LASFDE4:
-        .long	.LASFDE4-.Lframe1	/* FDE CIE offset */
-#if defined __PIC__ && defined HAVE_AS_X86_PCREL
-        .long	.LFB4-.	/* FDE initial location */
-#else
-        .long	.LFB4
-#endif
-        .long	.LFE4-.LFB4	/* FDE address range */
-#ifdef __PIC__
-        .byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-        /* DW_CFA_xxx CFI instructions go here.  */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI6-.LFB4
-        .byte	0xe	/* DW_CFA_def_cfa_offset CFA = r4 + 8 = 8(%esp) */
-        .byte	0x8	/* .uleb128 0x8 */
-        .byte	0x85	/* DW_CFA_offset, column 0x5 %ebp at CFA + 2 * -4 */
-        .byte	0x2	/* .uleb128 0x2 */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI7-.LCFI6
-        .byte	0xd	/* DW_CFA_def_cfa_register CFA = r5 = %ebp */
-        .byte	0x5	/* .uleb128 0x5 */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI8-.LCFI7
-        .byte	0x86	/* DW_CFA_offset, column 0x6 %esi at CFA + 3 * -4 */
-        .byte	0x3	/* .uleb128 0x3 */
-
-        /* End of DW_CFA_xxx CFI instructions.  */
-        .align 4
-.LEFDE4:
-
-#endif /* !FFI_NO_RAW_API */
-
-.LSFDE5:
-        .long	.LEFDE5-.LASFDE5	/* FDE Length */
-.LASFDE5:
-        .long	.LASFDE5-.Lframe1	/* FDE CIE offset */
-#if defined __PIC__ && defined HAVE_AS_X86_PCREL
-        .long	.LFB5-.	/* FDE initial location */
-#else
-        .long	.LFB5
-#endif
-        .long	.LFE5-.LFB5	/* FDE address range */
-#ifdef __PIC__
-        .byte	0x0	/* .uleb128 0x0; Augmentation size */
-#endif
-        /* DW_CFA_xxx CFI instructions go here.  */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI9-.LFB5
-        .byte	0xe	/* DW_CFA_def_cfa_offset CFA = r4 + 8 = 8(%esp) */
-        .byte	0x8	/* .uleb128 0x8 */
-        .byte	0x85	/* DW_CFA_offset, column 0x5 %ebp at CFA + 2 * -4 */
-        .byte	0x2	/* .uleb128 0x2 */
-
-        .byte	0x4	/* DW_CFA_advance_loc4 */
-        .long	.LCFI10-.LCFI9
-        .byte	0xd	/* DW_CFA_def_cfa_register CFA = r5 = %ebp */
-        .byte	0x5	/* .uleb128 0x5 */
-
-        /* End of DW_CFA_xxx CFI instructions.  */
-        .align 4
-.LEFDE5:
-
-#endif /* !_MSC_VER */
-
-#if defined __ELF__ && defined __linux__
-        .section	.note.GNU-stack,"",@progbits
-#endif
-- 
1.9.3
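The deleted closure epilogues all dispatch on the return type the same way. `call 1f` pushes the address of the table that immediately follows it, which is why nothing may be inserted between the call and the table. Each `.long` entry is a label's offset from the table start. The code at `1:` scales the type code by 4, loads that offset, adds the table address back, and jumps. Because the table stores only differences, it needs no relocations, which makes it usable in PIC code.

Below is a minimal sketch of the same idea in GNU C, assuming the GCC/Clang "labels as values" extension. `ret_kind` and `load_ret` are hypothetical. The kinds are contiguous stand-ins, not the real FFI_TYPE_* numbering. The per-label loads mirror the movsbl/movzbl/movswl/movzwl sign and zero extension done in the assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, contiguous return kinds; the real FFI_TYPE_* codes differ. */
enum ret_kind { K_VOID, K_SINT8, K_UINT8, K_SINT16, K_UINT16,
                K_INT32, K_INT64, K_COUNT };

/* Load a result of kind KIND from BUF into a 64-bit "register",
   sign- or zero-extending small types.  Dispatch goes through a table
   of label offsets relative to ret_void, the GNU C analogue of
   ".long .Lcls_retint-.Lcls_store_table".  Needs GCC or Clang.  */
static int64_t load_ret(enum ret_kind kind, const void *buf)
{
  /* Entries are in enum order; each is a distance, not an address.  */
  static const int offsets[K_COUNT] = {
    &&ret_void   - &&ret_void,
    &&ret_sint8  - &&ret_void,
    &&ret_uint8  - &&ret_void,
    &&ret_sint16 - &&ret_void,
    &&ret_uint16 - &&ret_void,
    &&ret_int32  - &&ret_void,
    &&ret_int64  - &&ret_void,
  };
  int16_t s16;
  uint16_t u16;
  int32_t s32;
  int64_t s64;

  assert((unsigned) kind < K_COUNT);
  /* Base address plus table entry, then an indirect jump.  */
  goto *(&&ret_void + offsets[kind]);

ret_void:
  return 0;
ret_sint8:                      /* cf. movsbl */
  return *(const int8_t *) buf;
ret_uint8:                      /* cf. movzbl */
  return *(const uint8_t *) buf;
ret_sint16:                     /* cf. movswl */
  memcpy(&s16, buf, sizeof s16);
  return s16;
ret_uint16:                     /* cf. movzwl */
  memcpy(&u16, buf, sizeof u16);
  return u16;
ret_int32:                      /* cf. movl (%ecx), %eax */
  memcpy(&s32, buf, sizeof s32);
  return s32;
ret_int64:                      /* cf. the %eax/%edx pair */
  memcpy(&s64, buf, sizeof s64);
  return s64;
}
```

For example, the same byte 0xff reads back as -1 through K_SINT8 and as 255 through K_UINT8. This is the distinction the separate .Lcls_retsint8 and .Lcls_retuint8 entries preserve.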


* [PATCH 03/13] x86: Force FFI_TYPE_LONGDOUBLE different from FFI_TYPE_DOUBLE
From: Richard Henderson @ 2014-11-07 15:31 UTC
  To: libffi-discuss

There are few ABIs that set double = long double.  Eliminate the
conditional compilation and simply let this code go unused there.
---
 src/x86/ffi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/x86/ffi.c b/src/x86/ffi.c
index 98aa008..339ca89 100644
--- a/src/x86/ffi.c
+++ b/src/x86/ffi.c
@@ -34,6 +34,18 @@
 #include <ffi_common.h>
 #include <stdlib.h>
 
+/* Force FFI_TYPE_LONGDOUBLE to be different than FFI_TYPE_DOUBLE;
+   all further uses in this file will refer to the 80-bit type.  */
+#if FFI_TYPE_LONGDOUBLE != FFI_TYPE_DOUBLE
+# if FFI_TYPE_LONGDOUBLE != 4
+#  error FFI_TYPE_LONGDOUBLE out of date
+# endif
+#else
+# undef FFI_TYPE_LONGDOUBLE
+# define FFI_TYPE_LONGDOUBLE 4
+#endif
+
+
 /* ffi_prep_args is called by the assembly routine once stack space
    has been allocated for the function's arguments */
 
@@ -205,9 +217,7 @@ ffi_status ffi_prep_cif_machdep(ffi_cif *cif)
     case FFI_TYPE_SINT64:
     case FFI_TYPE_FLOAT:
     case FFI_TYPE_DOUBLE:
-#if FFI_TYPE_DOUBLE != FFI_TYPE_LONGDOUBLE
     case FFI_TYPE_LONGDOUBLE:
-#endif
       cif->flags = (unsigned) cif->rtype->type;
       break;
 
-- 
1.9.3
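The guard added to ffi.c relies only on preprocessor arithmetic. If the public header defines FFI_TYPE_LONGDOUBLE as an alias of FFI_TYPE_DOUBLE, the #else branch gives it a distinct code. That lets `case FFI_TYPE_LONGDOUBLE:` sit next to `case FFI_TYPE_DOUBLE:` without a duplicate-case error, so the #if around it can go.

This is a self-contained sketch. The two aliasing #defines at the top are assumptions standing in for ffi.h on an ABI where long double is double. `prep_flags` is a hypothetical, cut-down version of the switch in ffi_prep_cif_machdep.

```c
#include <assert.h>

/* Hypothetical stand-ins for ffi.h on an ABI where long double == double.  */
#define FFI_TYPE_DOUBLE     3
#define FFI_TYPE_LONGDOUBLE FFI_TYPE_DOUBLE

/* The guard from the patch: give LONGDOUBLE its own code locally.  */
#if FFI_TYPE_LONGDOUBLE != FFI_TYPE_DOUBLE
# if FFI_TYPE_LONGDOUBLE != 4
#  error FFI_TYPE_LONGDOUBLE out of date
# endif
#else
# undef FFI_TYPE_LONGDOUBLE
# define FFI_TYPE_LONGDOUBLE 4
#endif

/* C11 compile-time check that the two case labels below differ.  */
static_assert(FFI_TYPE_LONGDOUBLE != FFI_TYPE_DOUBLE,
              "FFI_TYPE_LONGDOUBLE must have its own code");

/* Cut-down return-type switch: both labels are always compiled in;
   on this ABI nothing ever arrives with code 4, so that arm is unused.  */
static unsigned prep_flags(unsigned rtype)
{
  switch (rtype)
    {
    case FFI_TYPE_DOUBLE:
    case FFI_TYPE_LONGDOUBLE:
      return rtype;
    default:
      return 0;
    }
}
```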


* [PATCH 13/13] x86: Work around two clang assembler bugs
From: Richard Henderson @ 2014-11-07 15:31 UTC
  To: libffi-discuss

http://llvm.org/bugs/show_bug.cgi?id=21500
http://llvm.org/bugs/show_bug.cgi?id=21501

Basically, we can't trust .macro at all, and .org with symbolic
expressions doesn't work.  We have to omit the checking that .org
provided, and hope that errors are noticed when building with gcc+gas.
---
 src/x86/sysv.S | 125 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 67 insertions(+), 58 deletions(-)

diff --git a/src/x86/sysv.S b/src/x86/sysv.S
index e6a8c1e..b3ed87e 100644
--- a/src/x86/sysv.S
+++ b/src/x86/sysv.S
@@ -59,7 +59,12 @@
 /* This macro allows the safe creation of jump tables without an
    actual table.  The entry points into the table are all 8 bytes.
    The use of ORG asserts that we're at the correct location.  */
-#define E(X)      .align 8; .org 0b + X * 8
+/* ??? The clang assembler doesn't handle .org with symbolic expressions.  */
+#ifdef __clang__
+# define E(X)	.align 8
+#else
+# define E(X)	.align 8; .org 0b + X * 8
+#endif
 
 	.text
 	.align	16
@@ -194,70 +199,74 @@ ENDF(ffi_call_i386)
 
 #define closure_FS	(16 + 3*4 + 3*4 + 4)
 
-.macro	FFI_CLOSURE_SAVE_REGS
-	movl	%eax, 16+R_EAX*4(%esp)
-	movl	%edx, 16+R_EDX*4(%esp)
+#define FFI_CLOSURE_SAVE_REGS		\
+	movl	%eax, 16+R_EAX*4(%esp);	\
+	movl	%edx, 16+R_EDX*4(%esp);	\
 	movl	%ecx, 16+R_ECX*4(%esp)
-.endm
-
-.macro	FFI_CLOSURE_COPY_TRAMP_DATA chain
-	movl	FFI_TRAMPOLINE_SIZE(%eax), %edx		/* copy cif */
-	movl	FFI_TRAMPOLINE_SIZE+4(%eax), %ecx	/* copy fun */
-	movl	FFI_TRAMPOLINE_SIZE+8(%eax), %eax	/* copy user_data */
-	movl	%edx, 28(%esp)
-	movl	%ecx, 32(%esp)
+
+#define FFI_CLOSURE_COPY_TRAMP_DATA					\
+	movl	FFI_TRAMPOLINE_SIZE(%eax), %edx;	/* copy cif */	\
+	movl	FFI_TRAMPOLINE_SIZE+4(%eax), %ecx;	/* copy fun */	\
+	movl	FFI_TRAMPOLINE_SIZE+8(%eax), %eax;	/* copy user_data */ \
+	movl	%edx, 28(%esp);						\
+	movl	%ecx, 32(%esp);						\
 	movl	%eax, 36(%esp)
-.endm
 
-.macro	FFI_CLOSURE_CALL_INNER
-	movl	%esp, %ecx			/* load closure_data */
-	leal	closure_FS+4(%esp), %edx	/* load incoming stack */
-#ifdef __PIC__
-	movl	%ebx, 40(%esp)			/* save ebx */
-	cfi_rel_offset(%ebx, 40)
-	call	__x86.get_pc_thunk.bx		/* load got register */
-	addl	$C(_GLOBAL_OFFSET_TABLE_), %ebx
-#endif
-#if defined HAVE_HIDDEN_VISIBILITY_ATTRIBUTE || !defined __PIC__
-	call	ffi_closure_inner
-#else
-	call	ffi_closure_inner@PLT
-#endif
-.endm
 
-.macro	FFI_CLOSURE_MASK_AND_JUMP
-	andl	$X86_RET_TYPE_MASK, %eax
 #ifdef __PIC__
-	leal	0f@GOTOFF(%ebx, %eax, 8), %eax
-	movl	40(%esp), %ebx			/* restore ebx */
-	cfi_restore(%ebx)
+/* We're going to always load the got register here, even if .hidden says
+   we're going to avoid the PLT call.  We'll use the got register in
+   FFI_CLOSURE_MASK_AND_JUMP.  */
+# if defined HAVE_HIDDEN_VISIBILITY_ATTRIBUTE
+#  define PLT(X) X
+# else
+#  define PLT(X) X@PLT
+# endif
+# define FFI_CLOSURE_CALL_INNER						\
+	movl	%esp, %ecx;			/* load closure_data */	\
+	leal	closure_FS+4(%esp), %edx;	/* load incoming stack */ \
+	movl	%ebx, 40(%esp);			/* save ebx */		\
+	cfi_rel_offset(%ebx, 40);					\
+	call	__x86.get_pc_thunk.bx;		/* load got register */	\
+	addl	$C(_GLOBAL_OFFSET_TABLE_), %ebx;			\
+	call	PLT(ffi_closure_inner)
+#define FFI_CLOSURE_MASK_AND_JUMP					\
+	andl	$X86_RET_TYPE_MASK, %eax;				\
+	leal	0f@GOTOFF(%ebx, %eax, 8), %eax;				\
+	movl	40(%esp), %ebx;			/* restore ebx */	\
+	cfi_restore(%ebx);						\
+	jmp	*%eax
 #else
-	leal	0f(, %eax, 8), %eax
-#endif
+# define FFI_CLOSURE_CALL_INNER						\
+	movl	%esp, %ecx;			/* load closure_data */	\
+	leal	closure_FS+4(%esp), %edx;	/* load incoming stack */ \
+	call	ffi_closure_inner
+#define FFI_CLOSURE_MASK_AND_JUMP					\
+	andl	$X86_RET_TYPE_MASK, %eax;				\
+	leal	0f(, %eax, 8), %eax;					\
 	jmp	*%eax
-.endm
-
-.macro	FFI_GO_CLOSURE suffix, chain, t1, t2
-	.align	16
-	.globl	C(ffi_go_closure_\suffix)
-	FFI_HIDDEN(C(ffi_go_closure_\suffix))
-C(ffi_go_closure_\suffix):
-	cfi_startproc
-	subl	$closure_FS, %esp
-	cfi_adjust_cfa_offset(closure_FS)
-	FFI_CLOSURE_SAVE_REGS
-	movl	4(\chain), \t1		/* copy cif */
-	movl	8(\chain), \t2		/* copy fun */
-	movl	\t1, 28(%esp)
-	movl	\t2, 32(%esp)
-	movl	\chain, 36(%esp)	/* closure is user_data */
-	jmp	88f
-	cfi_endproc
-ENDF(C(ffi_go_closure_\suffix))
-.endm
+#endif /* __PIC__ */
 
-FFI_GO_CLOSURE EAX, %eax, %edx, %ecx
-FFI_GO_CLOSURE ECX, %ecx, %edx, %eax
+#define FFI_GO_CLOSURE(suffix, chain, t1, t2)				\
+	.align	16;							\
+	.globl	C(C1(ffi_go_closure_,suffix));				\
+	FFI_HIDDEN(C(C1(ffi_go_closure_,suffix)));			\
+C(C1(ffi_go_closure_,suffix)):						\
+	cfi_startproc;							\
+	subl	$closure_FS, %esp;					\
+	cfi_adjust_cfa_offset(closure_FS);				\
+	FFI_CLOSURE_SAVE_REGS;						\
+	movl	4(chain), t1;		/* copy cif */			\
+	movl	8(chain), t2;		/* copy fun */			\
+	movl	t1, 28(%esp);						\
+	movl	t2, 32(%esp);						\
+	movl	chain, 36(%esp);	/* closure is user_data */	\
+	jmp	88f;							\
+	cfi_endproc;							\
+ENDF(C(C1(ffi_go_closure_,suffix)))
+
+FFI_GO_CLOSURE(EAX, %eax, %edx, %ecx)
+FFI_GO_CLOSURE(ECX, %ecx, %edx, %eax)
 
 /* The closure entry points are reached from the ffi_closure trampoline.
    On entry, %eax contains the address of the ffi_closure.  */
@@ -337,7 +346,7 @@ E(X86_RET_UNUSED15)
 	cfi_endproc
 ENDF(C(ffi_closure_i386))
 
-FFI_GO_CLOSURE STDCALL, %ecx, %edx, %eax
+FFI_GO_CLOSURE(STDCALL, %ecx, %edx, %eax)
 
 /* For REGISTER, we have no available parameter registers, and so we
    enter here having pushed the closure onto the stack.  */
-- 
1.9.3
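With .macro gone, each Go closure entry point is stamped out by a function-like cpp macro, and C1(ffi_go_closure_,suffix) builds the symbol name. The sketch below mirrors that shape in C using a two-level paste. Here C1 forwards to C2, which is assumed to behave like the helpers sysv.S relies on: macro arguments are expanded before `##` joins them. A single macro then defines one named entry per suffix.

The struct layouts, the `go_entry_` prefix, and the frame field names are hypothetical. The body models only what FFI_GO_CLOSURE stores at 28/32/36(%esp): the cif and fun copied from the closure, with the closure itself passed as user_data. The choice of chain register has no C equivalent, so here the chain pointer is an ordinary parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Two-level token pasting: C1 expands its arguments, then C2 pastes.  */
#define C2(X, Y) X ## Y
#define C1(X, Y) C2(X, Y)

/* Hypothetical layouts: a Go closure is {tramp, cif, fun}, and the
   entry fills three slots of a closure_data-like frame.  */
struct go_closure { void *tramp; void *cif; void (*fun)(void); };
struct closure_frame { void *cif; void (*fun)(void); void *user_data; };

/* One #define generates each entry point, as FFI_GO_CLOSURE does now
   that it is a cpp macro rather than a .macro.  */
#define GO_CLOSURE_ENTRY(suffix)                                        \
  static void C1(go_entry_, suffix)(struct go_closure *chain,           \
                                    struct closure_frame *frame)        \
  {                                                                     \
    assert(chain != NULL && frame != NULL);                             \
    frame->cif = chain->cif;          /* copy cif */                    \
    frame->fun = chain->fun;          /* copy fun */                    \
    frame->user_data = chain;         /* closure is user_data */        \
  }

GO_CLOSURE_ENTRY(EAX)
GO_CLOSURE_ENTRY(ECX)
GO_CLOSURE_ENTRY(STDCALL)
```

The extra C1 level only matters when an argument is itself a macro, such as a user-label prefix. With a plain suffix like EAX, a single `##` would produce the same name.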


* Re: [PATCH 00/13] Go closures for i686
From: Richard Henderson @ 2014-11-07 16:09 UTC
  To: Richard Henderson, libffi-discuss

On 11/07/2014 04:30 PM, Richard Henderson wrote:
> With the final patch, building with clang mostly works.  For some reason
> the unwind tests fail, despite the .eh_frame looking correct.  Freebsd
> continues to work if you use gcc+gas from ports.

On closer inspection, .eh_frame was not correct due to another clang bug:

  http://llvm.org/bugs/show_bug.cgi?id=21515

Using -no-integrated-as works around this, and would probably obviate
the final patch completely.


r~

