public inbox for gcc-patches@gcc.gnu.org
* [CFT] i386 sync functions for PR 39677
@ 2009-10-16 23:11 Richard Henderson
  2009-10-16 23:27 ` Joseph S. Myers
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Richard Henderson @ 2009-10-16 23:11 UTC (permalink / raw)
  To: GCC Patches; +Cc: ro, dannysmith, ubizjak

[-- Attachment #1: Type: text/plain, Size: 1522 bytes --]

Some simple experimentation shows that while adding the lfence insn has 
a performance impact on cpus that don't need it, it's approximately the 
same as the performance impact of an external function call.  Therefore, 
my approach for this PR is going to be:

  * If sse2 is supported, inline the lfence insn as needed.
    This takes care of all 64-bit code.  And I believe
    all Darwin, so we don't have to worry about the odd
    shared libgcc issues we have there.

  * Include 32-bit routines in the shared libgcc.
    When possible, use @gnu_indirect_function support
    to minimize the overhead of the cpu detection.

  * Given that we now have a central location for handling
    atomic synchronization, handle 80386 and 80486 via spinlock.
    This means that we'll no longer have to inject -march=i586
    for compiling some of our runtime libraries.

I've now written the external functions in assembly, primarily because
some of the DImode routines use all 7 registers, which made -fpic
compilation in the compiler very tricky, and secondarily because it's
much easier to implement the indirect function support directly in assembly.
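
For reference, here is a minimal sketch (not part of the patch) of the
kind of C code that ends up calling these out-of-line routines when the
compiler cannot expand the builtin inline, e.g. when targeting plain i386:

    /* Compiled for plain i386, the __sync builtin below is emitted as a
       call to the out-of-line __sync_val_compare_and_swap_4 routine.  */
    int
    atomic_increment (int *p)
    {
      int old_val, new_val;
      do
        {
          old_val = *p;
          new_val = old_val + 1;
        }
      while (__sync_val_compare_and_swap (p, old_val, new_val) != old_val);
      return new_val;
    }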

First, I'd like to ask for extra sets of eyes to look over the code and 
make sure I haven't made any silly typos.

Second, I'd like to ask different port maintainers (cygwin and solaris 
particularly) to try to compile the code and report any portability 
problems.  Use any relevant combinations of:

   -fpic
   -DHAVE_GAS_CFI_DIRECTIVE
   -DHAVE_GNU_INDIRECT_FUNCTION
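
For example, a test compile on an ELF target might look like this
(hypothetical invocation; adjust the driver and options for your setup):

   gcc -c -fpic -DHAVE_GAS_CFI_DIRECTIVE -DHAVE_GNU_INDIRECT_FUNCTION sync.S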


r~

[-- Attachment #2: sync.S --]
[-- Type: text/plain, Size: 12420 bytes --]

/* Synchronization functions for i386.

   Copyright (C) 2009 Free Software Foundation, Inc.

   This file is part of GCC.

   GCC is free software; you can redistribute it and/or modify it under
   the terms of the GNU General Public License as published by the Free
   Software Foundation; either version 3, or (at your option) any later
   version.

   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
   WARRANTY; without even the implied warranty of MERCHANTABILITY or
   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
   for more details.

   Under Section 7 of GPL version 3, you are granted additional
   permissions described in the GCC Runtime Library Exception, version
   3.1, as published by the Free Software Foundation.

   You should have received a copy of the GNU General Public License and
   a copy of the GCC Runtime Library Exception along with this program;
   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
   <http://www.gnu.org/licenses/>.  */

/* Note that we don't bother with a 64-bit version, as there we know
   that cmpxchg and lfence are both supported by the cpu.  */

/* Token concatenation macros.  */
#define CAT_(A,B)	A ## B
#define CAT(A,B)	CAT_(A,B)
#define CAT4_(A,B,C,D)	A ## B ## C ## D
#define CAT4(A,B,C,D)	CAT4_(A,B,C,D)

/* Redefine this to add a prefix to all symbols defined.  */
#define PREFIX
#define P(X)		CAT(PREFIX,X)

/* Redefine this to change the default alignment of subsequent functions.  */
#define ALIGN 4

/* Define the type of a symbol.  */
#ifdef __ELF__
# define TYPE(N,T)	.type P(N),T
#else
# define TYPE(N,T)
#endif

/* Define a new symbol, with appropriate type and alignment.  */
#define _DEFINE(N,T,A)	.p2align A; TYPE(N,T); P(N):

/* End the definition of a symbol.  */
#ifdef __ELF__
# define END(N)		.size P(N), .-P(N)
#else
# define END(N)
#endif

/* Redefine these to generate functions with different prefixes.  */
#define FUNC(N)		_DEFINE(N,@function,ALIGN)

/* Define an object name.  */
#define OBJECT(N,A)	_DEFINE(N,@object,A)

/* If gas cfi directives are supported, use them, otherwise do nothing.  */
#ifdef HAVE_GAS_CFI_DIRECTIVE
# define cfi_startproc			.cfi_startproc
# define cfi_endproc			.cfi_endproc
# define cfi_adjust_cfa_offset(O)	.cfi_adjust_cfa_offset O
# define cfi_rel_offset(R,O)		.cfi_rel_offset R, O
# define cfi_restore(R)			.cfi_restore R
#else
# define cfi_startproc
# define cfi_endproc
# define cfi_adjust_cfa_offset(O)
# define cfi_rel_offset(R,O)
# define cfi_restore(R)
#endif

/* Simplify generation of those cfi directives for the common cases.
   The PUSHS/POPS pair indicates the register should be saved for unwind;
   otherwise we simply adjust the CFA.  */
#define PUSH(R)			pushl R; cfi_adjust_cfa_offset(4)
#define PUSHS(R)		PUSH(R); cfi_rel_offset(R,0)
#define PUSHF			pushfl; cfi_adjust_cfa_offset(4)
#define POP(R)			popl R; cfi_adjust_cfa_offset(-4)
#define POPS(R)			POP(R); cfi_restore(R)
#define POPF			popfl; cfi_adjust_cfa_offset(-4)

/* Define a function name and do begin a new CFI proc.  */
#define FUNC_CFI(N)	FUNC(N); cfi_startproc

/* End a function (without CFI proc) or object.  */
#define END_CFI(N)	cfi_endproc; END(N)

/* Parameterize the PIC model for the target.  */
#ifdef __PIC__
# ifdef __ELF__
#  define PIC_INIT(REG)					\
	call	__i686.get_pc_thunk.REG;		\
	addl	$_GLOBAL_OFFSET_TABLE_, %CAT(e,REG)
#  define PIC_ADD(P,D)		addl P,D
#  define PIC_OFFSET(S)		S@GOTOFF
#  define PIC_ADDRESS(S,P)	S@GOTOFF(P)
# else
#  error "Unknown PIC model"
# endif
#else
# define PIC_INIT(REG)
# define PIC_ADD(P,D)
# define PIC_OFFSET(S)		S
# define PIC_ADDRESS(S,P)	S
#endif
\f
/* This variable caches the (relevant) properties of the currently
   running cpu.  It has the following values:
	-1	Uninitialized

	0	An LFENCE instruction is required after any sync
		function with acquire semantics.  Given that
		lfence is an SSE2 insn, we also assume cmpxchg8b.

	1	No cmpxchg support.  Note that we don't detect the
		80486 XADD instruction separately, even though we
		use it: if we have to use a spinlock for any of the
		routines for a data size, we have to use a spinlock
		for all of the routines for that size, so XADD by
		itself isn't interesting.

	2	CMPXCHG supported
	3	CMPXCHG8B supported
*/

	.data
OBJECT(cpu_prop_index,2)
	.long	-1
END(cpu_prop_index)

	.text
/* Detect the properties of the currently running cpu, according to
   the values listed above.  Preserves all registers except EAX, which
   holds the return value.  */

FUNC_CFI(detect_cpu)
	PUSHS(%ebx)
	PUSH(%ecx)
	PUSH(%edx)
	PUSHS(%esi)
	PUSHS(%edi)

	/* Determine 386 vs 486 and presence of cpuid all at once.  */
	PUSHF
	PUSHF
	POP(%eax)
	movl	%eax, %edx
	xorl	$0x00200000, %eax
	PUSH(%eax)
	POPF
	PUSHF
	POP(%eax)
	POPF
	xorl	%edx, %eax

	/* If we weren't able to toggle the ID bit in the flags,
	   we don't have the cpuid instruction, and also don't
	   have the cmpxchg instruction.  */
	movl	$1, %esi		/* do not have cmpxchg */
	jz	.Legress

	movl	$2, %esi		/* have cmpxchg */

	xorl	%eax, %eax
	cpuid

	/* Check for AuthenticAMD.  At the end, %edi is zero for matched.  */
	xorl	$0x68747541, %ebx
	xorl	$0x444D4163, %ecx
	xorl	$0x69746E65, %edx
	movl	%ebx, %edi
	orl	%ecx, %edi
	orl	%edx, %edi

	/* If max_cpuid == 0, we can check no further.  */
	testl	%eax, %eax
	jz	.Legress

	movl	$1, %eax
	cpuid

	/* Check for cmpxchg8b support.  The CX8 bit is 1<<8 in EDX.  */
	shr	$8, %edx
	andl	$1, %edx
	addl	%edx, %esi		/* incr iff cmpxchg8b */

	/* Check for AMD cpu.  */
	testl	%edi, %edi
	jnz	.Legress

	/* Extract family (%edx) and model (%ecx).  */
	movl	%eax, %edx
	movl	%eax, %ecx
	shr	$8, %edx
	shr	$4, %ecx
	andl	$0xf, %edx
	andl	$0xf, %ecx
	cmpl	$0xf, %edx		/* if family=15... */
	jne	2f
	shr	$12, %eax		/* ... include extended fields.  */
	movl	%eax, %ebx
	andl	$0xf0, %ebx
	addl	%ebx, %ecx
	movzbl	%ah, %eax
	addl	%eax, %edx
2:
	/* Opteron Rev E has a bug in which on very rare occasions
           a locked instruction doesn't act as a read-acquire barrier
           if followed by a non-locked read-modify-write instruction.
           Rev F has this bug in pre-release versions, but not in
           versions released to customers, so we test only for Rev E,
           which is family 15, model 32..63 inclusive.  */
	cmpl	$15, %edx
	jne	.Legress
	cmpl	$32, %ecx
	jb	.Legress
	cmpl	$63, %ecx
	ja	.Legress
	xorl	%esi, %esi		/* need lfence */

.Legress:
	movl	%esi, %eax
	POPS(%edi)
	POPS(%esi)
	POP(%edx)
	POP(%ecx)
	POPS(%ebx)
	ret
END_CFI(detect_cpu)
\f
	/* Note that this CFI proc covers all of the ifuncs.  */
	.p2align ALIGN
	cfi_startproc

#if defined(HAVE_GNU_INDIRECT_FUNCTION) && defined(__PIC__)
/* If we have indirect function support in the shared libgcc, we wish
   to define the entry point symbol such that it returns the address
   of the function we wish to execute for this cpu.  The result of the
   indirect function is stored in the PLT so that future invocations
   proceed directly to the target function.

   Each entry point defines a 4-entry table according to the values
   for cpu_prop_index and we use a common routine to load the value.  */

FUNC(common_indirect_function)
	PIC_INIT(cx)
	PIC_ADD(%ecx, %edx)
	movl	PIC_ADDRESS(cpu_prop_index,%ecx), %eax
	testl	%eax, %eax
	jns	1f
	call	detect_cpu
	movl	%eax, PIC_ADDRESS(cpu_prop_index,%ecx)
1:	movl	(%edx,%eax,4), %eax
	PIC_ADD(%ecx, %eax)
	ret
END(common_indirect_function)

#define _IFUNC(N,P2,P3)						\
	_DEFINE(CAT(__,N),@gnu_indirect_function,3);		\
	movl	$PIC_OFFSET(CAT(t_,N)), %edx;			\
	jmp	P(common_indirect_function);			\
	END(CAT(__,N));						\
	.globl	CAT(__,N);					\
	.section .rodata;					\
	OBJECT(CAT(t_,N),2);					\
	.long	PIC_OFFSET(CAT(l_,N));				\
	.long	PIC_OFFSET(CAT(o_,N));				\
	.long	PIC_OFFSET(CAT(P2,N));				\
	.long	PIC_OFFSET(CAT(P3,N));				\
	END(CAT(t_,N));						\
	.text

#define IFUNC(N)	_IFUNC(N,n_,n_)
#define IFUNC8(N)	_IFUNC(N,o_,n_)

#else
/* If we don't have (or aren't using) indirect function support, define
   functions that dispatch to the correct implementation function.  */
/* ??? The question is, what's the best method for the branch predictors?
   My guess is that indirect branches are, in general, hardest.  Therefore
   separate the 3 with compares and use direct branches.  Aid the Pentium4
   static branch predictor by indicating that the "normal" function is the
   one we expect to execute.  */

#define _IFUNC(N,CX_IDX)					\
	.globl	CAT(__,N);					\
	FUNC(CAT(__,N));					\
	PIC_INIT(cx);						\
	movl	PIC_ADDRESS(cpu_prop_index,%ecx), %eax;		\
	testl	%eax, %eax;					\
	jns,pt	1f;						\
	call	detect_cpu;					\
	movl	%eax, PIC_ADDRESS(cpu_prop_index,%ecx);		\
1:	cmpl	$CX_IDX, %eax;					\
	jge,pt	CAT(n_,N);					\
	testl	%eax, %eax;					\
	jz	CAT(l_,N);					\
	jmp	CAT(o_,N);					\
	END(CAT(__,N))

#define IFUNC(N)	_IFUNC(N,2)
#define IFUNC8(N)	_IFUNC(N,3)

#endif /* HAVE_GNU_INDIRECT_FUNCTION */

	IFUNC(sync_val_compare_and_swap_1)
	IFUNC(sync_val_compare_and_swap_2)
	IFUNC(sync_val_compare_and_swap_4)
	IFUNC8(sync_val_compare_and_swap_8)

	IFUNC(sync_bool_compare_and_swap_1)
	IFUNC(sync_bool_compare_and_swap_2)
	IFUNC(sync_bool_compare_and_swap_4)
	IFUNC8(sync_bool_compare_and_swap_8)

	IFUNC(sync_fetch_and_add_1)
	IFUNC(sync_fetch_and_add_2)
	IFUNC(sync_fetch_and_add_4)
	IFUNC8(sync_fetch_and_add_8)

	IFUNC(sync_add_and_fetch_1)
	IFUNC(sync_add_and_fetch_2)
	IFUNC(sync_add_and_fetch_4)
	IFUNC8(sync_add_and_fetch_8)

	IFUNC(sync_fetch_and_sub_1)
	IFUNC(sync_fetch_and_sub_2)
	IFUNC(sync_fetch_and_sub_4)
	IFUNC8(sync_fetch_and_sub_8)

	IFUNC(sync_sub_and_fetch_1)
	IFUNC(sync_sub_and_fetch_2)
	IFUNC(sync_sub_and_fetch_4)
	IFUNC8(sync_sub_and_fetch_8)

	IFUNC(sync_fetch_and_or_1)
	IFUNC(sync_fetch_and_or_2)
	IFUNC(sync_fetch_and_or_4)
	IFUNC8(sync_fetch_and_or_8)

	IFUNC(sync_or_and_fetch_1)
	IFUNC(sync_or_and_fetch_2)
	IFUNC(sync_or_and_fetch_4)
	IFUNC8(sync_or_and_fetch_8)

	IFUNC(sync_fetch_and_and_1)
	IFUNC(sync_fetch_and_and_2)
	IFUNC(sync_fetch_and_and_4)
	IFUNC8(sync_fetch_and_and_8)

	IFUNC(sync_and_and_fetch_1)
	IFUNC(sync_and_and_fetch_2)
	IFUNC(sync_and_and_fetch_4)
	IFUNC8(sync_and_and_fetch_8)

	IFUNC(sync_fetch_and_nand_1)
	IFUNC(sync_fetch_and_nand_2)
	IFUNC(sync_fetch_and_nand_4)
	IFUNC8(sync_fetch_and_nand_8)

	IFUNC(sync_nand_and_fetch_1)
	IFUNC(sync_nand_and_fetch_2)
	IFUNC(sync_nand_and_fetch_4)
	IFUNC8(sync_nand_and_fetch_8)

	cfi_endproc
\f
/* The actual bodies of the functions are implemented in sync.inc.
   Include it 3 times with different parameters to generate the
   "normal" (i.e. cmpxchg), "lfence", and "old" (i.e. no cmpxchg)
   versions of the code.  */

#undef	ALIGN
#undef	PREFIX
#define	ALIGN	4
#define	PREFIX	n_
#define	LFENCE
#include "sync.inc"

#undef	PREFIX
#undef	LFENCE
#define	PREFIX	l_
#define	LFENCE	lfence
#include "sync.inc"

#undef	ALIGN
#undef	PREFIX
#undef	LFENCE
#define ALIGN 2
#define	PREFIX	o_
#define SPINLOCK 1

	.local	spinlock
	.comm	spinlock,4,4

/* Common code for the beginning and end of any spinlock protected function. */

#ifdef __PIC__
#define ARG(N)	N+8(%esp)	/* Skip saved ebx and return address.  */
#define SPINLOCK_LOCK		PUSHS(%ebx); call P(spinlock_lock)
FUNC(spinlock_lock)
	/* Note that this startproc covers both lock and unlock functions.  */
	cfi_startproc
	PIC_INIT(bx)
1:	lock
	btsl	$0, PIC_ADDRESS(spinlock,%ebx)
	jc	1b
	ret
END(spinlock_lock)

#define SPINLOCK_UNLOCK_AND_RET		jmp P(spinlock_unlock)
FUNC(spinlock_unlock)
	cfi_adjust_cfa_offset(4)
	cfi_rel_offset(%ebx,0)
	xorl	%ecx, %ecx
	movl	%ecx, PIC_ADDRESS(spinlock,%ebx)
	POPS(%ebx)
	ret
	cfi_endproc
END(spinlock_unlock)
#else
#define ARG(N)	N+4(%esp)	/* Skip return address.  */
#define SPINLOCK_LOCK				\
	1: lock; btsl $0, spinlock; jc 1b
#define SPINLOCK_UNLOCK_AND_RET			\
	xorl	%ecx,%ecx; movl	%ecx,spinlock; ret
#endif /* PIC */

#include "sync.inc"
\f
#ifdef __ELF__
#ifdef __PIC__
	.section .text.__i686.get_pc_thunk.bx,"axG",@progbits,__i686.get_pc_thunk.bx,comdat
.globl __i686.get_pc_thunk.bx
	.hidden	__i686.get_pc_thunk.bx
	.type	__i686.get_pc_thunk.bx, @function
__i686.get_pc_thunk.bx:
	movl	(%esp), %ebx
	ret
	.section .text.__i686.get_pc_thunk.cx,"axG",@progbits,__i686.get_pc_thunk.cx,comdat
.globl __i686.get_pc_thunk.cx
	.hidden	__i686.get_pc_thunk.cx
	.type	__i686.get_pc_thunk.cx, @function
__i686.get_pc_thunk.cx:
	movl	(%esp), %ecx
	ret
#endif
	.section	.note.GNU-stack,"",@progbits
#endif

[-- Attachment #3: sync.inc --]
[-- Type: text/plain, Size: 12187 bytes --]

/* This file is logically a part of sync.S; it is included 3 times.  */

FUNC_CFI(sync_val_compare_and_swap_1)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movzbl	(%ecx), %eax
	cmpb	%al, ARG(4)
	jne	2f
	movl	ARG(8), %edx
	movb	%dl, (%ecx)
2:	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	movzbl	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchgb %dl, (%ecx)
	LFENCE
	ret
#endif
END_CFI(sync_val_compare_and_swap_1)

FUNC_CFI(sync_val_compare_and_swap_2)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movzwl	(%ecx), %eax
	cmpw	%ax, ARG(4)
	jne	2f
	movl	ARG(8), %edx
	movw	%dx, (%ecx)
2:	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	movzwl	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchgw %dx, (%ecx)
	LFENCE
	ret
#endif
END_CFI(sync_val_compare_and_swap_2)

FUNC_CFI(sync_val_compare_and_swap_4)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	(%ecx), %eax
	cmpl	%eax, ARG(4)
	jne	2f
	movl	ARG(8), %edx
	movl	%edx, (%ecx)
2:	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchgl %edx, (%ecx)
	LFENCE
	ret
#endif
END_CFI(sync_val_compare_and_swap_4)

FUNC_CFI(sync_val_compare_and_swap_8)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	(%ecx), %eax
	movl	4(%ecx), %edx
	cmpl	%eax, ARG(4)
	jne	2f
	cmpl	%edx, ARG(8)
	jne	2f
	PUSHS(%esi)
	movl	ARG(12), %esi
	movl	%esi, (%ecx)
	movl	ARG(16), %esi
	movl	%esi, 4(%ecx)
	POPS(%esi)
2:	SPINLOCK_UNLOCK_AND_RET
#else
	PUSHS(%ebx)
	PUSHS(%esi)
	movl	12(%esp), %esi
	movl	16(%esp), %eax
	movl	20(%esp), %edx
	movl	24(%esp), %ebx
	movl	28(%esp), %ecx
	lock; cmpxchg8b (%esi)
	LFENCE
	POPS(%esi)
	POPS(%ebx)
	ret
#endif
END_CFI(sync_val_compare_and_swap_8)
\f
FUNC_CFI(sync_bool_compare_and_swap_1)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	ARG(4), %eax
	cmpb	%al, (%ecx)
	jne	2f
	movl	ARG(8), %edx
	movb	%dl, (%ecx)
2:	sete	%al
	movzbl	%al, %eax
	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchgb %dl, (%ecx)
	LFENCE
	setz	%al
	movzbl	%al, %eax
	ret
#endif
END_CFI(sync_bool_compare_and_swap_1)

FUNC_CFI(sync_bool_compare_and_swap_2)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	ARG(4), %eax
	cmpw	%ax, (%ecx)
	jne	2f
	movl	ARG(8), %edx
	movw	%dx, (%ecx)
2:	sete	%al
	movzbl	%al, %eax
	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchgw %dx, (%ecx)
	LFENCE
	setz	%al
	movzbl	%al, %eax
	ret
#endif
END_CFI(sync_bool_compare_and_swap_2)

FUNC_CFI(sync_bool_compare_and_swap_4)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	ARG(4), %eax
	cmpl	%eax, (%ecx)
	jne	2f
	movl	ARG(8), %edx
	movl	%edx, (%ecx)
2:	sete	%al
	movzbl	%al, %eax
	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchgl %edx,(%ecx)
	LFENCE
	setz	%al
	movzbl	%al,%eax
	ret
#endif
END_CFI(sync_bool_compare_and_swap_4)

FUNC_CFI(sync_bool_compare_and_swap_8)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	ARG(4), %eax
	movl	ARG(8), %edx
	cmpl	%eax, (%ecx)
	jne	2f
	cmpl	%edx, 4(%ecx)
	jne	2f
	movl	ARG(12), %eax
	movl	ARG(16), %edx
	movl	%eax, (%ecx)
	movl	%edx, 4(%ecx)
	movl	$1, %eax
	SPINLOCK_UNLOCK_AND_RET
2:	xorl	%eax, %eax
	SPINLOCK_UNLOCK_AND_RET
#else
	PUSHS(%ebx)
	PUSHS(%esi)
	movl	12(%esp), %esi
	movl	16(%esp), %eax
	movl	20(%esp), %edx
	movl	24(%esp), %ebx
	movl	28(%esp), %ecx
	lock; cmpxchg8b (%esi)
	LFENCE
	setz	%al
	movzbl	%al,%eax
	POPS(%esi)
	POPS(%ebx)
	ret
#endif
END_CFI(sync_bool_compare_and_swap_8)
\f
#ifndef SPINLOCK
/* This CFI covers all of the add and subtract functions.  */
	.p2align ALIGN
	cfi_startproc

FUNC(sync_fetch_and_add_1)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	lock; xaddb %al, (%ecx)
	LFENCE
	movzbl	%al, %eax
	ret
END(sync_fetch_and_add_1)

FUNC(sync_fetch_and_add_2)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	lock; xaddw %ax, (%ecx)
	LFENCE
	movzwl	%ax, %eax
	ret
END(sync_fetch_and_add_2)

FUNC(sync_fetch_and_add_4)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	lock; xaddl %eax, (%ecx)
	LFENCE
	ret
END(sync_fetch_and_add_4)

FUNC(sync_add_and_fetch_1)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	%eax, %edx
	lock; xaddb %dl, (%ecx)
	LFENCE
	addb	%dl, %al
	movzbl	%al, %eax
	ret
END(sync_add_and_fetch_1)

FUNC(sync_add_and_fetch_2)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	%eax, %edx
	lock; xaddw %dx, (%ecx)
	LFENCE
	addw	%dx, %ax
	movzwl	%ax, %eax
	ret
END(sync_add_and_fetch_2)

FUNC(sync_add_and_fetch_4)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	%eax, %edx
	lock; xaddl %edx, (%ecx)
	LFENCE
	addl	%edx, %eax
	ret
END(sync_add_and_fetch_4)

FUNC(sync_fetch_and_sub_1)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	lock; xaddb %al, (%ecx)
	LFENCE
	movzbl	%al, %eax
	ret
END(sync_fetch_and_sub_1)

FUNC(sync_fetch_and_sub_2)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	lock; xaddw %ax, (%ecx)
	LFENCE
	movzwl	%ax, %eax
	ret
END(sync_fetch_and_sub_2)

FUNC(sync_fetch_and_sub_4)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	lock; xaddl %eax, (%ecx)
	LFENCE
	ret
END(sync_fetch_and_sub_4)

FUNC(sync_sub_and_fetch_1)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	movl	%eax, %edx
	lock; xaddb %dl, (%ecx)
	LFENCE
	addb	%dl, %al
	movzbl	%al, %eax
	ret
END(sync_sub_and_fetch_1)

FUNC(sync_sub_and_fetch_2)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	movl	%eax, %edx
	lock; xaddw %dx, (%ecx)
	LFENCE
	addw	%dx, %ax
	movzwl	%ax, %eax
	ret
END(sync_sub_and_fetch_2)

FUNC(sync_sub_and_fetch_4)
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	movl	%eax, %edx
	lock; xaddl %edx, (%ecx)
	LFENCE
	addl	%edx, %eax
	ret
END(sync_sub_and_fetch_4)

	cfi_endproc
#endif /* SPINLOCK */
\f
#define OR(S,D)		or S,D
#define AND(S,D)	and S,D
#define NAND(S,D)	not D; and S,D
#define ADD(S,D)	add S,D
#define ADC(S,D)	adc S,D
#define SUB(S,D)	sub S,D
#define SBB(S,D)	sbb S,D
#define NIL(S,D)
#define MOV(S,D)	mov S,D
#define MOVZX(S,D)	movzx S,D

#ifdef SPINLOCK
#define _SYNC_FETCH_AND_OP(N, S, sax, sbx, sdx, OP, LDEXT, EXT)	\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,S));			\
	SPINLOCK_LOCK;						\
	movl	ARG(0), %ecx;					\
	movl	ARG(4), %edx;					\
	LDEXT((%ecx),%eax);					\
	OP(%eax, %edx);						\
	mov	sdx, (%ecx);					\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_fetch_and_,N,_,S))
#else
#define _SYNC_FETCH_AND_OP(N, S, sax, sbx, sdx, OP, LDEXT, EXT)	\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,S));			\
	PUSHS(%ebx);						\
	movl	8(%esp), %ecx;					\
	movl	12(%esp), %ebx;					\
	mov	(%ecx), sax;					\
1:	mov	sax, sdx;					\
	OP(sbx, sdx);						\
	lock; cmpxchg sdx, (%ecx);				\
	jnz	1b;						\
	LFENCE;							\
	EXT(sax, %eax);						\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_fetch_and_,N,_,S))
#endif /* SPINLOCK */

#define SYNC_FETCH_AND_OP_1(N,OP) \
	_SYNC_FETCH_AND_OP(N, 1, %al, %bl, %dl, OP, MOVZX, MOVZX)

#define SYNC_FETCH_AND_OP_2(N,OP) \
	_SYNC_FETCH_AND_OP(N, 2, %ax, %bx, %dx, OP, MOVZX, MOVZX)

#define SYNC_FETCH_AND_OP_4(N, OP) \
	_SYNC_FETCH_AND_OP(N, 4, %eax, %ebx, %edx, OP, MOV, NIL)

#ifdef SPINLOCK
#define SYNC_FETCH_AND_OP_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,8));			\
	SPINLOCK_LOCK;						\
	PUSHS(%esi);						\
	PUSHS(%edi);						\
	/* Note that the ARG macro doesn't include the two	\
	   pushes that we do above, so need to bias by 8.  */	\
	movl	ARG(8), %ecx;					\
	movl	ARG(12), %esi;					\
	movl	ARG(16), %edi;					\
	movl	(%ecx), %eax;					\
	movl	4(%ecx), %edx;					\
	OPLO(%eax, %esi);					\
	OPHI(%edx, %edi);					\
	movl	%esi, (%ecx);					\
	movl	%edi, 4(%ecx);					\
	POPS(%edi);						\
	POPS(%esi);						\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_fetch_and_,N,_,8))
#else
#define SYNC_FETCH_AND_OP_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,8));			\
	PUSHS(%ebx);						\
	PUSHS(%esi);						\
	PUSHS(%edi);						\
	PUSHS(%ebp);						\
	movl	20(%esp), %esi;					\
	movl	24(%esp), %edi;					\
	movl	28(%esp), %ebp;					\
	movl	(%esi), %eax;					\
	movl	4(%esi), %edx;					\
1:	movl	%eax, %ebx;					\
	movl	%edx, %ecx;					\
	OPLO(%edi, %ebx);					\
	OPHI(%ebp, %ecx);					\
	lock; cmpxchg8b (%esi);					\
	jnz	1b;						\
	LFENCE;							\
	POPS(%ebp);						\
	POPS(%edi);						\
	POPS(%esi);						\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_fetch_and_,N,_,8))
#endif /* SPINLOCK */

#ifdef SPINLOCK
#define _SYNC_OP_AND_FETCH(N, S, sax, sbx, sdx, OP, LDEXT, EXT)	\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,S));			\
	SPINLOCK_LOCK;						\
	movl	ARG(0), %ecx;					\
	movl	ARG(4), %eax;					\
	OP((%ecx), sax);					\
	mov	sax, (%ecx);					\
	EXT(sax, %eax);						\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_,N,_and_fetch_,S))
#else
#define _SYNC_OP_AND_FETCH(N, S, sax, sbx, sdx, OP, LDEXT, EXT)	\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,S));			\
	PUSHS(%ebx);						\
	movl	8(%esp), %ecx;					\
	movl	12(%esp), %ebx;					\
	mov	(%ecx), sax;					\
1:	mov	sax, sdx;					\
	OP(sbx, sdx);						\
	lock; cmpxchg sdx, (%ecx);				\
	jnz	1b;						\
	LFENCE;							\
	LDEXT(sdx, %eax);					\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_,N,_and_fetch_,S))
#endif /* SPINLOCK */

#define SYNC_OP_AND_FETCH_1(N,OP) \
	_SYNC_OP_AND_FETCH(N, 1, %al, %bl, %dl, OP, MOVZX, MOVZX)

#define SYNC_OP_AND_FETCH_2(N,OP) \
	_SYNC_OP_AND_FETCH(N, 2, %ax, %bx, %dx, OP, MOVZX, MOVZX)

#define SYNC_OP_AND_FETCH_4(N, OP) \
	_SYNC_OP_AND_FETCH(N, 4, %eax, %ebx, %edx, OP, MOV, NIL)


#ifdef SPINLOCK
#define SYNC_OP_AND_FETCH_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,8));			\
	SPINLOCK_LOCK;						\
	movl	ARG(0), %ecx;					\
	movl	ARG(4), %eax;					\
	movl	ARG(8), %edx;					\
	OPLO((%ecx), %eax);					\
	OPHI(4(%ecx), %edx);					\
	movl	%eax, (%ecx);					\
	movl	%edx, 4(%ecx);					\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_,N,_and_fetch_,8))
#else
#define SYNC_OP_AND_FETCH_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,8));			\
	PUSHS(%ebx);						\
	PUSHS(%esi);						\
	PUSHS(%edi);						\
	PUSHS(%ebp);						\
	movl	20(%esp), %esi;					\
	movl	24(%esp), %edi;					\
	movl	28(%esp), %ebp;					\
	movl	(%esi), %eax;					\
	movl	4(%esi), %edx;					\
1:	movl	%eax, %ebx;					\
	movl	%edx, %ecx;					\
	OPLO(%edi, %ebx);					\
	OPHI(%ebp, %ecx);					\
	lock; cmpxchg8b (%esi);					\
	jnz	1b;						\
	LFENCE;							\
	movl	%ebx, %eax;					\
	movl	%ecx, %edx;					\
	POPS(%ebp);						\
	POPS(%edi);						\
	POPS(%esi);						\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_,N,_and_fetch_,8))
#endif /* SPINLOCK */

#ifdef SPINLOCK
	SYNC_FETCH_AND_OP_1(add, ADD)
	SYNC_FETCH_AND_OP_2(add, ADD)
	SYNC_FETCH_AND_OP_4(add, ADD)
	SYNC_FETCH_AND_OP_1(sub, SUB)
	SYNC_FETCH_AND_OP_2(sub, SUB)
	SYNC_FETCH_AND_OP_4(sub, SUB)
#endif
	SYNC_FETCH_AND_OP_8(add, ADD, ADC)
	SYNC_FETCH_AND_OP_8(sub, SUB, SBB)

	SYNC_FETCH_AND_OP_1(or, OR)
	SYNC_FETCH_AND_OP_2(or, OR)
	SYNC_FETCH_AND_OP_4(or, OR)
	SYNC_FETCH_AND_OP_8(or, OR, OR)

	SYNC_FETCH_AND_OP_1(and, AND)
	SYNC_FETCH_AND_OP_2(and, AND)
	SYNC_FETCH_AND_OP_4(and, AND)
	SYNC_FETCH_AND_OP_8(and, AND, AND)

	SYNC_FETCH_AND_OP_1(nand, NAND)
	SYNC_FETCH_AND_OP_2(nand, NAND)
	SYNC_FETCH_AND_OP_4(nand, NAND)
	SYNC_FETCH_AND_OP_8(nand, NAND, NAND)

#ifdef SPINLOCK
	SYNC_OP_AND_FETCH_1(add, ADD)
	SYNC_OP_AND_FETCH_2(add, ADD)
	SYNC_OP_AND_FETCH_4(add, ADD)
	SYNC_OP_AND_FETCH_1(sub, SUB)
	SYNC_OP_AND_FETCH_2(sub, SUB)
	SYNC_OP_AND_FETCH_4(sub, SUB)
#endif
	SYNC_OP_AND_FETCH_8(add, ADD, ADC)
	SYNC_OP_AND_FETCH_8(sub, SUB, SBB)

	SYNC_OP_AND_FETCH_1(or, OR)
	SYNC_OP_AND_FETCH_2(or, OR)
	SYNC_OP_AND_FETCH_4(or, OR)
	SYNC_OP_AND_FETCH_8(or, OR, OR)

	SYNC_OP_AND_FETCH_1(and, AND)
	SYNC_OP_AND_FETCH_2(and, AND)
	SYNC_OP_AND_FETCH_4(and, AND)
	SYNC_OP_AND_FETCH_8(and, AND, AND)

	SYNC_OP_AND_FETCH_1(nand, NAND)
	SYNC_OP_AND_FETCH_2(nand, NAND)
	SYNC_OP_AND_FETCH_4(nand, NAND)
	SYNC_OP_AND_FETCH_8(nand, NAND, NAND)

/* Undef all macros defined herein for reinclude.  */
#undef OR
#undef AND
#undef NAND
#undef ADD
#undef ADC
#undef SUB
#undef SBB
#undef NIL
#undef MOV
#undef MOVZX
#undef _SYNC_FETCH_AND_OP
#undef SYNC_FETCH_AND_OP_1
#undef SYNC_FETCH_AND_OP_2
#undef SYNC_FETCH_AND_OP_4
#undef SYNC_FETCH_AND_OP_8
#undef _SYNC_OP_AND_FETCH
#undef SYNC_OP_AND_FETCH_1
#undef SYNC_OP_AND_FETCH_2
#undef SYNC_OP_AND_FETCH_4
#undef SYNC_OP_AND_FETCH_8


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-16 23:11 [CFT] i386 sync functions for PR 39677 Richard Henderson
@ 2009-10-16 23:27 ` Joseph S. Myers
  2009-10-16 23:48   ` Richard Henderson
  2009-10-19 14:24 ` Rainer Orth
  2009-10-24  6:51 ` Danny Smith
  2 siblings, 1 reply; 11+ messages in thread
From: Joseph S. Myers @ 2009-10-16 23:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, ro, dannysmith, ubizjak

The uses of __i686 in sync.S look likely to break tools configured 
--with-arch=i686 (when __i686 is a macro defined to 1).  Building glibc 
with such a compiler is notoriously broken (there have been many bug 
reports and patches over the years, from 
<http://sourceware.org/ml/libc-alpha/2002-10/msg00156.html> through to 
<http://sourceware.org/ml/libc-alpha/2009-07/msg00072.html> with many 
in between, but none of the patches have been applied), but it's worked to
build GCC that way.
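
As a minimal illustration (assuming the predefined-macro behaviour
described above): with --with-arch=i686 the preprocessor effectively sees

    #define __i686 1

so a line in sync.S such as

    call	__i686.get_pc_thunk.bx

preprocesses to

    call	1.get_pc_thunk.bx

which no longer matches the get_pc_thunk symbol defined at the bottom of
the file.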

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-16 23:27 ` Joseph S. Myers
@ 2009-10-16 23:48   ` Richard Henderson
  2009-10-17 18:14     ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2009-10-16 23:48 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Henderson, GCC Patches, ro, dannysmith, ubizjak

[-- Attachment #1: Type: text/plain, Size: 309 bytes --]

On 10/16/2009 04:11 PM, Joseph S. Myers wrote:
> The uses of __i686 in sync.S look likely to break tools configured
> --with-arch=i686 (when __i686 is a macro defined to 1).

Fixed.

I've also re-partitioned into 2 nested include files, simplifying the 
macros and shaving 200 lines of code duplication.


r~

[-- Attachment #2: sync.S --]
[-- Type: text/plain, Size: 12960 bytes --]

/* Synchronization functions for i386.

   Copyright (C) 2009 Free Software Foundation, Inc.

   This file is part of GCC.

   GCC is free software; you can redistribute it and/or modify it under
   the terms of the GNU General Public License as published by the Free
   Software Foundation; either version 3, or (at your option) any later
   version.

   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
   WARRANTY; without even the implied warranty of MERCHANTABILITY or
   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
   for more details.

   Under Section 7 of GPL version 3, you are granted additional
   permissions described in the GCC Runtime Library Exception, version
   3.1, as published by the Free Software Foundation.

   You should have received a copy of the GNU General Public License and
   a copy of the GCC Runtime Library Exception along with this program;
   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
   <http://www.gnu.org/licenses/>.  */

/* Note that we don't bother with a 64-bit version, as there we know
   that cmpxchg and lfence are both supported by the cpu.  */

/* Don't break builds configured with --with-arch=i686.  */
#undef __i686

/* Token concatenation macros.  */
#define CAT_(A,B)	A ## B
#define CAT(A,B)	CAT_(A,B)
#define CAT4_(A,B,C,D)	A ## B ## C ## D
#define CAT4(A,B,C,D)	CAT4_(A,B,C,D)

/* Redefine this to add a prefix to all symbols defined.  */
#define PREFIX
#define P(X)		CAT(PREFIX,X)

/* Redefine this to change the default alignment of subsequent functions.  */
#define ALIGN 4

/* Define the type of a symbol.  */
#ifdef __ELF__
# define TYPE(N,T)	.type P(N),T
#else
# define TYPE(N,T)
#endif

/* Define a new symbol, with appropriate type and alignment.  */
#define _DEFINE(N,T,A)	.p2align A; TYPE(N,T); P(N):

/* End the definition of a symbol.  */
#ifdef __ELF__
# define END(N)		.size P(N), .-P(N)
#else
# define END(N)
#endif

/* Redefine these to generate functions with different prefixes.  */
#define FUNC(N)		_DEFINE(N,@function,ALIGN)

/* Define an object name.  */
#define OBJECT(N,A)	_DEFINE(N,@object,A)

/* If gas cfi directives are supported, use them, otherwise do nothing.  */
#ifdef HAVE_GAS_CFI_DIRECTIVE
# define cfi_startproc			.cfi_startproc
# define cfi_endproc			.cfi_endproc
# define cfi_adjust_cfa_offset(O)	.cfi_adjust_cfa_offset O
# define cfi_rel_offset(R,O)		.cfi_rel_offset R, O
# define cfi_restore(R)			.cfi_restore R
#else
# define cfi_startproc
# define cfi_endproc
# define cfi_adjust_cfa_offset(O)
# define cfi_rel_offset(R,O)
# define cfi_restore(R)
#endif

/* Simplify generation of those cfi directives for the common cases.
   The PUSHS/POPS pair indicates the register should be saved for unwind;
   otherwise we simply adjust the CFA.  */
#define PUSH(R)			pushl R; cfi_adjust_cfa_offset(4)
#define PUSHS(R)		PUSH(R); cfi_rel_offset(R,0)
#define PUSHF			pushfl; cfi_adjust_cfa_offset(4)
#define POP(R)			popl R; cfi_adjust_cfa_offset(-4)
#define POPS(R)			POP(R); cfi_restore(R)
#define POPF			popfl; cfi_adjust_cfa_offset(-4)

/* Define a function name and do begin a new CFI proc.  */
#define FUNC_CFI(N)	FUNC(N); cfi_startproc

/* End a function (without CFI proc) or object.  */
#define END_CFI(N)	cfi_endproc; END(N)

/* Parameterize the PIC model for the target.  */
#ifdef __PIC__
# ifdef __ELF__
#  define PIC_INIT(REG)					\
	call	__i686.get_pc_thunk.REG;		\
	addl	$_GLOBAL_OFFSET_TABLE_, %CAT(e,REG)
#  define PIC_ADD(P,D)		addl P,D
#  define PIC_OFFSET(S)		S@GOTOFF
#  define PIC_ADDRESS(S,P)	S@GOTOFF(P)
# else
#  error "Unknown PIC model"
# endif
#else
# define PIC_INIT(REG)
# define PIC_ADD(P,D)
# define PIC_OFFSET(S)		S
# define PIC_ADDRESS(S,P)	S
#endif
\f
/* This variable caches the (relevant) properties of the currently
   running cpu.  It has the following values:
	-1	Uninitialized

	0	An LFENCE instruction is required after any sync
		function with acquire semantics.  Given that
		lfence is an SSE2 insn, we also assume cmpxchg8b.

	1	No cmpxchg support.  Note that we don't detect the
		80486 XADD instruction separately, even though we
		use it: if we have to use a spinlock for any of the
		routines for a data size, we have to use a spinlock
		for all of the routines for that size, so XADD by
		itself isn't interesting.

	2	CMPXCHG supported
	3	CMPXCHG8B supported
*/

	.data
OBJECT(cpu_prop_index,2)
	.long	-1
END(cpu_prop_index)

	.text
/* Detect the properties of the currently running cpu, according to
   the values listed above.  Preserves all registers except EAX, which
   holds the return value.  */

FUNC_CFI(detect_cpu)
	PUSHS(%ebx)
	PUSH(%ecx)
	PUSH(%edx)
	PUSHS(%esi)
	PUSHS(%edi)

	/* Determine 386 vs 486 and presence of cpuid all at once.  */
	PUSHF
	PUSHF
	POP(%eax)
	movl	%eax, %edx
	xorl	$0x00200000, %eax
	PUSH(%eax)
	POPF
	PUSHF
	POP(%eax)
	POPF
	xorl	%edx, %eax

	/* If we weren't able to toggle the ID bit in the flags,
	   we don't have the cpuid instruction, and also don't
	   have the cmpxchg instruction.  */
	movl	$1, %esi		/* do not have cmpxchg */
	jz	.Legress

	movl	$2, %esi		/* have cmpxchg */

	xorl	%eax, %eax
	cpuid

	/* Check for AuthenticAMD.  At the end, %edi is zero for matched.  */
	xorl	$0x68747541, %ebx
	xorl	$0x444D4163, %ecx
	xorl	$0x69746E65, %edx
	movl	%ebx, %edi
	orl	%ecx, %edi
	orl	%edx, %edi

	/* If max_cpuid == 0, we can check no further.  */
	testl	%eax, %eax
	jz	.Legress

	movl	$1, %eax
	cpuid

	/* Check for cmpxchg8b support.  The CX8 bit is 1<<8 in EDX.  */
	shr	$8, %edx
	andl	$1, %edx
	addl	%edx, %esi		/* incr iff cmpxchg8b */

	/* Check for AMD cpu.  */
	testl	%edi, %edi
	jnz	.Legress

	/* Extract family (%edx) and model (%ecx).  */
	movl	%eax, %edx
	movl	%eax, %ecx
	shr	$8, %edx
	shr	$4, %ecx
	andl	$0xf, %edx
	andl	$0xf, %ecx
	cmpl	$0xf, %edx		/* if family=15... */
	jne	2f
	shr	$12, %eax		/* ... include extended fields.  */
	movl	%eax, %ebx
	andl	$0xf0, %ebx
	addl	%ebx, %ecx
	movzbl	%ah, %eax
	addl	%eax, %edx
2:
	/* Opteron Rev E has a bug in which on very rare occasions
           a locked instruction doesn't act as a read-acquire barrier
           if followed by a non-locked read-modify-write instruction.
           Rev F has this bug in pre-release versions, but not in
           versions released to customers, so we test only for Rev E,
           which is family 15, model 32..63 inclusive.  */
	cmpl	$15, %edx
	jne	.Legress
	cmpl	$32, %ecx
	jb	.Legress
	cmpl	$63, %ecx
	ja	.Legress
	xorl	%esi, %esi		/* need lfence */

.Legress:
	movl	%esi, %eax
	POPS(%edi)
	POPS(%esi)
	POP(%edx)
	POP(%ecx)
	POPS(%ebx)
	ret
END_CFI(detect_cpu)
\f
	/* Note that this CFI proc covers all of the ifuncs.  */
	.p2align ALIGN
	cfi_startproc

#if defined(HAVE_GNU_INDIRECT_FUNCTION) && defined(__PIC__)
/* If we have indirect function support in the shared libgcc, we wish
   to define the entry point symbol such that it returns the address
   of the function we wish to execute for this cpu.  The result of the
   indirect function is stored in the PLT so that future invocations
   proceed directly to the target function.

   Each entry point defines a 4-entry table according to the values
   for cpu_prop_index and we use a common routine to load the value.  */

FUNC(common_indirect_function)
	PIC_INIT(cx)
	PIC_ADD(%ecx, %edx)
	movl	PIC_ADDRESS(cpu_prop_index,%ecx), %eax
	testl	%eax, %eax
	jns	1f
	call	detect_cpu
	movl	%eax, PIC_ADDRESS(cpu_prop_index,%ecx)
1:	movl	(%edx,%eax,4), %eax
	PIC_ADD(%ecx, %eax)
	ret
END(common_indirect_function)

#define _IFUNC(N,P2,P3)						\
	_DEFINE(CAT(__,N),@gnu_indirect_function,3);		\
	movl	$PIC_OFFSET(CAT(t_,N)), %edx;			\
	jmp	P(common_indirect_function);			\
	END(CAT(__,N));						\
	.globl	CAT(__,N);					\
	.section .rodata;					\
	OBJECT(CAT(t_,N),2);					\
	.long	PIC_OFFSET(CAT(l_,N));				\
	.long	PIC_OFFSET(CAT(o_,N));				\
	.long	PIC_OFFSET(CAT(P2,N));				\
	.long	PIC_OFFSET(CAT(P3,N));				\
	END(CAT(t_,N));						\
	.text

#define IFUNC(N)	_IFUNC(N,n_,n_)
#define IFUNC8(N)	_IFUNC(N,o_,n_)

#else
/* If we don't have (or aren't using) indirect function support, define
   functions that dispatch to the correct implementation function.  */
/* ??? The question is, what's the best method for the branch predictors?
   My guess is that indirect branches are, in general, hardest.  Therefore
   separate the 3 with compares and use direct branches.  Aid the Pentium4
   static branch predictor by indicating that the "normal" function is the
   one we expect to execute.  */

#define _IFUNC(N,CX_IDX)					\
	.globl	CAT(__,N);					\
	FUNC(CAT(__,N));					\
	PIC_INIT(cx);						\
	movl	PIC_ADDRESS(cpu_prop_index,%ecx), %eax;		\
	testl	%eax, %eax;					\
	jns,pt	1f;						\
	call	detect_cpu;					\
	movl	%eax, PIC_ADDRESS(cpu_prop_index,%ecx);		\
1:	cmpl	$CX_IDX, %eax;					\
	jge,pt	CAT(n_,N);					\
	testl	%eax, %eax;					\
	jz	CAT(l_,N);					\
	jmp	CAT(o_,N);					\
	END(CAT(__,N))

#define IFUNC(N)	_IFUNC(N,2)
#define IFUNC8(N)	_IFUNC(N,3)

#endif /* HAVE_GNU_INDIRECT_FUNCTION */

	IFUNC(sync_val_compare_and_swap_1)
	IFUNC(sync_val_compare_and_swap_2)
	IFUNC(sync_val_compare_and_swap_4)
	IFUNC8(sync_val_compare_and_swap_8)

	IFUNC(sync_bool_compare_and_swap_1)
	IFUNC(sync_bool_compare_and_swap_2)
	IFUNC(sync_bool_compare_and_swap_4)
	IFUNC8(sync_bool_compare_and_swap_8)

	IFUNC(sync_fetch_and_add_1)
	IFUNC(sync_fetch_and_add_2)
	IFUNC(sync_fetch_and_add_4)
	IFUNC8(sync_fetch_and_add_8)

	IFUNC(sync_add_and_fetch_1)
	IFUNC(sync_add_and_fetch_2)
	IFUNC(sync_add_and_fetch_4)
	IFUNC8(sync_add_and_fetch_8)

	IFUNC(sync_fetch_and_sub_1)
	IFUNC(sync_fetch_and_sub_2)
	IFUNC(sync_fetch_and_sub_4)
	IFUNC8(sync_fetch_and_sub_8)

	IFUNC(sync_sub_and_fetch_1)
	IFUNC(sync_sub_and_fetch_2)
	IFUNC(sync_sub_and_fetch_4)
	IFUNC8(sync_sub_and_fetch_8)

	IFUNC(sync_fetch_and_or_1)
	IFUNC(sync_fetch_and_or_2)
	IFUNC(sync_fetch_and_or_4)
	IFUNC8(sync_fetch_and_or_8)

	IFUNC(sync_or_and_fetch_1)
	IFUNC(sync_or_and_fetch_2)
	IFUNC(sync_or_and_fetch_4)
	IFUNC8(sync_or_and_fetch_8)

	IFUNC(sync_fetch_and_and_1)
	IFUNC(sync_fetch_and_and_2)
	IFUNC(sync_fetch_and_and_4)
	IFUNC8(sync_fetch_and_and_8)

	IFUNC(sync_and_and_fetch_1)
	IFUNC(sync_and_and_fetch_2)
	IFUNC(sync_and_and_fetch_4)
	IFUNC8(sync_and_and_fetch_8)

	IFUNC(sync_fetch_and_nand_1)
	IFUNC(sync_fetch_and_nand_2)
	IFUNC(sync_fetch_and_nand_4)
	IFUNC8(sync_fetch_and_nand_8)

	IFUNC(sync_nand_and_fetch_1)
	IFUNC(sync_nand_and_fetch_2)
	IFUNC(sync_nand_and_fetch_4)
	IFUNC8(sync_nand_and_fetch_8)

	cfi_endproc
\f
/* Some macros passed to e.g. SYNC_FETCH_AND_OP macro.  */
#define OR(S,D)		or S,D
#define AND(S,D)	and S,D
#define NAND(S,D)	not D; and S,D
#define ADD(S,D)	add S,D
#define ADC(S,D)	adc S,D
#define SUB(S,D)	sub S,D
#define SBB(S,D)	sbb S,D

/* The actual bodies of the functions are implemented in sync-1.inc.
   Include it 3 times with different parameters to generate the
   "normal" (i.e. cmpxchg), "lfence", and "old" (i.e. no cmpxchg)
   versions of the code.  */

#undef	ALIGN
#define	ALIGN	4

#undef	PREFIX
#define	PREFIX	n_
#define	LFENCE
#include "sync-1.inc"

#define	PREFIX	l_
#define	LFENCE	lfence
#include "sync-1.inc"

/* Conserve space in the spinlock versions.  */
#undef	ALIGN
#define ALIGN 2

#define	PREFIX	o_
#define SPINLOCK 1

/* The spinlock variable.  */
/* ??? If this object is not going to be included in shared libgcc,
   should we make this variable global, so that it can be unified
   across different (potential) copies of this object?  */
	.local	spinlock
	.comm	spinlock,4,4

/* Common code for the beginning and end of any spinlock protected function. */

#ifdef __PIC__
#define ARG(N)	N+8(%esp)	/* Skip saved ebx and return address.  */
#define SPINLOCK_LOCK		PUSHS(%ebx); call P(spinlock_lock)
FUNC(spinlock_lock)
	/* Note that this startproc covers both lock and unlock functions.  */
	cfi_startproc
	PIC_INIT(bx)
1:	lock
	btsl	$0, PIC_ADDRESS(spinlock,%ebx)
	jc	1b
	ret
END(spinlock_lock)

#define SPINLOCK_UNLOCK_AND_RET		jmp P(spinlock_unlock)
FUNC(spinlock_unlock)
	cfi_adjust_cfa_offset(4)
	cfi_rel_offset(%ebx,0)
	xorl	%ecx, %ecx
	movl	%ecx, PIC_ADDRESS(spinlock,%ebx)
	POPS(%ebx)
	ret
	cfi_endproc
END(spinlock_unlock)
#else
#define ARG(N)	N+4(%esp)	/* Skip return address.  */
#define SPINLOCK_LOCK				\
	1: lock; btsl $0, spinlock; jc 1b
#define SPINLOCK_UNLOCK_AND_RET			\
	xorl	%ecx,%ecx; movl	%ecx,spinlock; ret
#endif /* PIC */

#include "sync-1.inc"
\f
#ifdef __ELF__
#ifdef __PIC__
	.section .text.__i686.get_pc_thunk.bx,"axG",@progbits,__i686.get_pc_thunk.bx,comdat
.globl __i686.get_pc_thunk.bx
	.hidden	__i686.get_pc_thunk.bx
	.type	__i686.get_pc_thunk.bx, @function
__i686.get_pc_thunk.bx:
	movl	(%esp), %ebx
	ret
	.section .text.__i686.get_pc_thunk.cx,"axG",@progbits,__i686.get_pc_thunk.cx,comdat
.globl __i686.get_pc_thunk.cx
	.hidden	__i686.get_pc_thunk.cx
	.type	__i686.get_pc_thunk.cx, @function
__i686.get_pc_thunk.cx:
	movl	(%esp), %ecx
	ret
#endif
	.section	.note.GNU-stack,"",@progbits
#endif

[-- Attachment #3: sync-1.inc --]
[-- Type: text/plain, Size: 4810 bytes --]

/* This file is logically a part of sync.S.  It is included 3 times
   with macros set for the different function sets.  */

/* Generate all 1 byte operations.  */
#define SIZE		1
#define MOVEXT		movzbl
#define EXT(S,D)	movzbl S,D
#define SAX		%al
#define SBX		%bl
#define SCX		%cl
#define SDX		%dl
#include "sync-2.inc"

/* Generate all 2 byte operations.  */
#define SIZE		2
#define MOVEXT		movzwl
#define EXT(S,D)	movzwl S,D
#define SAX		%ax
#define SBX		%bx
#define SCX		%cx
#define SDX		%dx
#include "sync-2.inc"

/* Generate all 4 byte operations.  */
#define SIZE		4
#define MOVEXT		mov
#define EXT(S,D)
#define SAX		%eax
#define SBX		%ebx
#define SCX		%ecx
#define SDX		%edx
#include "sync-2.inc"

/* Generate all 8 byte operations.  */
FUNC_CFI(sync_val_compare_and_swap_8)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	(%ecx), %eax
	movl	4(%ecx), %edx
	cmpl	%eax, ARG(4)
	jne	2f
	cmpl	%edx, ARG(8)
	jne	2f
	PUSHS(%esi)
	movl	ARG(12), %esi
	movl	%esi, (%ecx)
	movl	ARG(16), %esi
	movl	%esi, 4(%ecx)
	POPS(%esi)
2:	SPINLOCK_UNLOCK_AND_RET
#else
	PUSHS(%ebx)
	PUSHS(%esi)
	movl	12(%esp), %esi
	movl	16(%esp), %eax
	movl	20(%esp), %edx
	movl	24(%esp), %ebx
	movl	28(%esp), %ecx
	lock; cmpxchg8b (%esi)
	LFENCE
	POPS(%esi)
	POPS(%ebx)
	ret
#endif
END_CFI(sync_val_compare_and_swap_8)

FUNC_CFI(sync_bool_compare_and_swap_8)
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	ARG(4), %eax
	movl	ARG(8), %edx
	cmpl	%eax, (%ecx)
	jne	2f
	cmpl	%edx, 4(%ecx)
	jne	2f
	movl	ARG(12), %eax
	movl	ARG(16), %edx
	movl	%eax, (%ecx)
	movl	%edx, 4(%ecx)
	movl	$1, %eax
	SPINLOCK_UNLOCK_AND_RET
2:	xorl	%eax, %eax
	SPINLOCK_UNLOCK_AND_RET
#else
	PUSHS(%ebx)
	PUSHS(%esi)
	movl	12(%esp), %esi
	movl	16(%esp), %eax
	movl	20(%esp), %edx
	movl	24(%esp), %ebx
	movl	28(%esp), %ecx
	lock; cmpxchg8b (%esi)
	LFENCE
	setz	%al
	movzbl	%al,%eax
	POPS(%esi)
	POPS(%ebx)
	ret
#endif
END_CFI(sync_bool_compare_and_swap_8)

#ifdef SPINLOCK
#define SYNC_FETCH_AND_OP_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,8));			\
	SPINLOCK_LOCK;						\
	PUSHS(%esi);						\
	PUSHS(%edi);						\
	/* Note that the ARG macro doesn't include the two	\
	   pushes that we do above, so need to bias by 8.  */	\
	movl	ARG(8), %ecx;					\
	movl	ARG(12), %esi;					\
	movl	ARG(16), %edi;					\
	movl	(%ecx), %eax;					\
	movl	4(%ecx), %edx;					\
	OPLO(%eax, %esi);					\
	OPHI(%edx, %edi);					\
	movl	%esi, (%ecx);					\
	movl	%edi, 4(%ecx);					\
	POPS(%edi);						\
	POPS(%esi);						\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_fetch_and_,N,_,8))
#else
#define SYNC_FETCH_AND_OP_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,8));			\
	PUSHS(%ebx);						\
	PUSHS(%esi);						\
	PUSHS(%edi);						\
	PUSHS(%ebp);						\
	movl	20(%esp), %esi;					\
	movl	24(%esp), %edi;					\
	movl	28(%esp), %ebp;					\
	movl	(%esi), %eax;					\
	movl	4(%esi), %edx;					\
1:	movl	%eax, %ebx;					\
	movl	%edx, %ecx;					\
	OPLO(%edi, %ebx);					\
	OPHI(%ebp, %ecx);					\
	lock; cmpxchg8b (%esi);					\
	jnz	1b;						\
	LFENCE;							\
	POPS(%ebp);						\
	POPS(%edi);						\
	POPS(%esi);						\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_fetch_and_,N,_,8))
#endif /* SPINLOCK */

#ifdef SPINLOCK
#define SYNC_OP_AND_FETCH_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,8));			\
	SPINLOCK_LOCK;						\
	movl	ARG(0), %ecx;					\
	movl	ARG(4), %eax;					\
	movl	ARG(8), %edx;					\
	OPLO((%ecx), %eax);					\
	OPHI(4(%ecx), %edx);					\
	movl	%eax, (%ecx);					\
	movl	%edx, 4(%ecx);					\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_,N,_and_fetch_,8))
#else
#define SYNC_OP_AND_FETCH_8(N, OPLO, OPHI)			\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,8));			\
	PUSHS(%ebx);						\
	PUSHS(%esi);						\
	PUSHS(%edi);						\
	PUSHS(%ebp);						\
	movl	20(%esp), %esi;					\
	movl	24(%esp), %edi;					\
	movl	28(%esp), %ebp;					\
	movl	(%esi), %eax;					\
	movl	4(%esi), %edx;					\
1:	movl	%eax, %ebx;					\
	movl	%edx, %ecx;					\
	OPLO(%edi, %ebx);					\
	OPHI(%ebp, %ecx);					\
	lock; cmpxchg8b (%esi);					\
	jnz	1b;						\
	LFENCE;							\
	movl	%ebx, %eax;					\
	movl	%ecx, %edx;					\
	POPS(%ebp);						\
	POPS(%edi);						\
	POPS(%esi);						\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_,N,_and_fetch_,8))
#endif /* SPINLOCK */

	SYNC_FETCH_AND_OP_8(add, ADD, ADC)
	SYNC_FETCH_AND_OP_8(sub, SUB, SBB)
	SYNC_FETCH_AND_OP_8(or, OR, OR)
	SYNC_FETCH_AND_OP_8(and, AND, AND)
	SYNC_FETCH_AND_OP_8(nand, NAND, NAND)

	SYNC_OP_AND_FETCH_8(add, ADD, ADC)
	SYNC_OP_AND_FETCH_8(sub, SUB, SBB)
	SYNC_OP_AND_FETCH_8(or, OR, OR)
	SYNC_OP_AND_FETCH_8(and, AND, AND)
	SYNC_OP_AND_FETCH_8(nand, NAND, NAND)

/* Undef all macros defined herein for reinclude.  */
#undef SYNC_FETCH_AND_OP_8
#undef SYNC_OP_AND_FETCH_8

/* Undef all the parameters to this file for reinclude.  */
#undef PREFIX
#undef LFENCE

[-- Attachment #4: sync-2.inc --]
[-- Type: text/plain, Size: 3890 bytes --]

/* This file is logically a part of sync.S.  It is included 3 times
   into sync-1.inc with macros set for various operand sizes.  */

FUNC_CFI(CAT(sync_val_compare_and_swap_,SIZE))
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	MOVEXT	(%ecx), %eax
	cmp	SAX, ARG(4)
	jne	1f
	movl	ARG(8), %edx
	mov	SDX, (%ecx)
1:	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	MOVEXT	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchg SDX, (%ecx)
	LFENCE
	ret
#endif
END_CFI(CAT(sync_val_compare_and_swap_,SIZE))

FUNC_CFI(CAT(sync_bool_compare_and_swap_,SIZE))
#ifdef SPINLOCK
	SPINLOCK_LOCK
	movl	ARG(0), %ecx
	movl	ARG(4), %edx
	xorl	%eax, %eax
	cmp	SDX, (%ecx)
	jne	1f
	movl	ARG(8), %edx
	mov	SDX, (%ecx)
	incl	%eax
1:	SPINLOCK_UNLOCK_AND_RET
#else
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	12(%esp), %edx
	lock; cmpxchg SDX, (%ecx)
	LFENCE
	setz	%al
	movzbl	%al, %eax
	ret
#endif
END_CFI(CAT(sync_bool_compare_and_swap_,SIZE))

#ifndef SPINLOCK
	/* This CFI covers all of the add and subtract functions.  */
	.p2align ALIGN
	cfi_startproc

FUNC(CAT(sync_fetch_and_add_,SIZE))
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	lock; xadd SAX, (%ecx)
	LFENCE
	EXT(SAX, %eax)
	ret
END(CAT(sync_fetch_and_add_,SIZE))

FUNC(CAT(sync_add_and_fetch_,SIZE))
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	movl	%eax, %edx
	lock; xadd SDX, (%ecx)
	LFENCE
	add	SDX, SAX
	EXT(SAX, %eax)
	ret
END(CAT(sync_add_and_fetch_,SIZE))

FUNC(CAT(sync_fetch_and_sub_,SIZE))
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	lock; xadd SAX, (%ecx)
	LFENCE
	EXT(SAX, %eax)
	ret
END(CAT(sync_fetch_and_sub_,SIZE))

FUNC(CAT(sync_sub_and_fetch_,SIZE))
	movl	4(%esp), %ecx
	movl	8(%esp), %eax
	negl	%eax
	movl	%eax, %edx
	lock; xadd SDX, (%ecx)
	LFENCE
	add	SDX, SAX
	EXT(SAX, %eax)
	ret
END(CAT(sync_sub_and_fetch_,SIZE))

	cfi_endproc
#endif /* SPINLOCK */
\f
#ifdef SPINLOCK
#define SYNC_FETCH_AND_OP(N, OP)				\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,SIZE));		\
	SPINLOCK_LOCK;						\
	movl	ARG(0), %ecx;					\
	movl	ARG(4), %edx;					\
	MOVEXT	(%ecx), %eax;					\
	OP(%eax, %edx);						\
	mov	SDX, (%ecx);					\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_fetch_and_,N,_,SIZE))
#else
#define SYNC_FETCH_AND_OP(N, OP)				\
	FUNC_CFI(CAT4(sync_fetch_and_,N,_,SIZE));		\
	PUSHS(%ebx);						\
	movl	8(%esp), %ecx;					\
	movl	12(%esp), %ebx;					\
	mov	(%ecx), SAX;					\
1:	mov	SAX, SDX;					\
	OP(SBX, SDX);						\
	lock; cmpxchg SDX, (%ecx);				\
	jnz	1b;						\
	LFENCE;							\
	EXT(SAX, %eax);						\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_fetch_and_,N,_,SIZE))
#endif /* SPINLOCK */

#ifdef SPINLOCK
#define SYNC_OP_AND_FETCH(N, OP)				\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,SIZE));		\
	SPINLOCK_LOCK;						\
	movl	ARG(0), %ecx;					\
	movl	ARG(4), %eax;					\
	OP((%ecx), SAX);					\
	mov	SAX, (%ecx);					\
	EXT(SAX, %eax);						\
	SPINLOCK_UNLOCK_AND_RET;				\
	END_CFI(CAT4(sync_,N,_and_fetch_,SIZE))
#else
#define SYNC_OP_AND_FETCH(N, OP)				\
	FUNC_CFI(CAT4(sync_,N,_and_fetch_,SIZE));		\
	PUSHS(%ebx);						\
	movl	8(%esp), %ecx;					\
	movl	12(%esp), %ebx;					\
	mov	(%ecx), SAX;					\
1:	mov	SAX, SDX;					\
	OP(SBX, SDX);						\
	lock; cmpxchg SDX, (%ecx);				\
	jnz	1b;						\
	LFENCE;							\
	MOVEXT	SDX, %eax;					\
	POPS(%ebx);						\
	ret;							\
	END_CFI(CAT4(sync_,N,_and_fetch_,SIZE))
#endif /* SPINLOCK */

#ifdef SPINLOCK
	SYNC_FETCH_AND_OP(add, ADD)
	SYNC_FETCH_AND_OP(sub, SUB)
#endif
	SYNC_FETCH_AND_OP(or, OR)
	SYNC_FETCH_AND_OP(and, AND)
	SYNC_FETCH_AND_OP(nand, NAND)

#ifdef SPINLOCK
	SYNC_OP_AND_FETCH(add, ADD)
	SYNC_OP_AND_FETCH(sub, SUB)
#endif
	SYNC_OP_AND_FETCH(or, OR)
	SYNC_OP_AND_FETCH(and, AND)
	SYNC_OP_AND_FETCH(nand, NAND)

/* Undef all macros defined herein for reinclude.  */
#undef SYNC_FETCH_AND_OP
#undef SYNC_OP_AND_FETCH

/* Undef all the parameters to this file for reinclude.  */
#undef SIZE
#undef MOVEXT
#undef EXT
#undef SAX
#undef SBX
#undef SCX
#undef SDX


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-16 23:48   ` Richard Henderson
@ 2009-10-17 18:14     ` Paolo Bonzini
  2009-10-17 18:51       ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2009-10-17 18:14 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Henderson, GCC Patches, ro, dannysmith, ubizjak

On 10/17/2009 01:27 AM, Richard Henderson wrote:
> #define NAND(S,D)	not D; and S,D

I'm not 100% positive, but shouldn't this be "not S; and S,D" (as in 
"clear the given bits")?

Paolo


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-17 18:14     ` Paolo Bonzini
@ 2009-10-17 18:51       ` Uros Bizjak
  2009-10-17 22:31         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2009-10-17 18:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Henderson, Joseph S. Myers, Richard Henderson,
	GCC Patches, ro, dannysmith

On 10/17/2009 08:02 PM, Paolo Bonzini wrote:
> On 10/17/2009 01:27 AM, Richard Henderson wrote:
>> #define NAND(S,D)    not D; and S,D
>
> I'm not 100% positive, but shouldn't this be "not S; and S,D" (as in 
> "clear the given bits")?

This should be "and S,D; not D", since NAND stands for NOT AND.

Uros.


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-17 18:51       ` Uros Bizjak
@ 2009-10-17 22:31         ` Paolo Bonzini
  2009-10-18 11:58           ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2009-10-17 22:31 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Richard Henderson, Joseph S. Myers, Richard Henderson,
	GCC Patches, ro, dannysmith

On 10/17/2009 08:22 PM, Uros Bizjak wrote:
> On 10/17/2009 08:02 PM, Paolo Bonzini wrote:
>> On 10/17/2009 01:27 AM, Richard Henderson wrote:
>>> #define NAND(S,D) not D; and S,D
>>
>> I'm not 100% positive, but shouldn't this be "not S; and S,D" (as in
>> "clear the given bits")?
>
> This should be "and S,D; not D", since NAND stands for NOT AND.

Not in sync builtins (nand is actually andn).  You fixed that bug IIRC.

Paolo


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-17 22:31         ` Paolo Bonzini
@ 2009-10-18 11:58           ` Uros Bizjak
  0 siblings, 0 replies; 11+ messages in thread
From: Uros Bizjak @ 2009-10-18 11:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Henderson, Joseph S. Myers, Richard Henderson,
	GCC Patches, ro, dannysmith

On 10/17/2009 11:13 PM, Paolo Bonzini wrote:
>>>> #define NAND(S,D) not D; and S,D
>>>
>>> I'm not 100% positive, but shouldn't this be "not S; and S,D" (as in
>>> "clear the given bits")?
>>
>> This should be "and S,D; not D", since NAND stands for NOT AND.
>
>
> Not in sync builtins (nand is actually andn).  You fixed that bug IIRC.

No!  It is actually NAND, that is "not(a & b)".  Please look at [1]; my
patch fixed the bug the other way around.

[1] http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01214.html

Uros.
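
To make the comparison concrete, here is a minimal sketch (illustration
only, not part of the patch) of what each proposed sequence computes,
written in C with "old" the value loaded from memory and "s" the operand
(recall that AT&T "and S,D" means D &= S):

    /* not D; and S,D   => s & ~old  (and-not of the old value) */
    static unsigned int nand_v1 (unsigned int s, unsigned int old)
    { unsigned int d = ~old; d &= s; return d; }

    /* not S; and S,D   => old & ~s  ("clear the given bits") */
    static unsigned int nand_v2 (unsigned int s, unsigned int old)
    { unsigned int d = old; s = ~s; d &= s; return d; }

    /* and S,D; not D   => ~(old & s), i.e. a true NAND */
    static unsigned int nand_v3 (unsigned int s, unsigned int old)
    { unsigned int d = old; d &= s; return ~d; }

Under the semantics in the referenced fix, __sync_fetch_and_nand stores
new = ~(old & value), which corresponds to the last form.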


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-16 23:11 [CFT] i386 sync functions for PR 39677 Richard Henderson
  2009-10-16 23:27 ` Joseph S. Myers
@ 2009-10-19 14:24 ` Rainer Orth
  2009-10-24  6:51 ` Danny Smith
  2 siblings, 0 replies; 11+ messages in thread
From: Rainer Orth @ 2009-10-19 14:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, dannysmith, ubizjak

Richard Henderson <rth@twiddle.net> writes:

> Second, I'd like to ask different port maintainers (cygwin and solaris
> particularly) to try to compile the code and report any portability
> problems.  Use any relevant combinations of:
>
>   -fpic
>   -DHAVE_GAS_CFI_DIRECTIVE
>   -DHAVE_GNU_INDIRECT_FUNCTION

I've tried the updated versions on Solaris 10/x86, with current mainline
configured to use /usr/sfw/bin/gas (gas 2.15).

It compiles without any additional options, but fails with -fpic:

/var/tmp//ccQxxfbf.s: Assembler messages:
/var/tmp//ccQxxfbf.s:755: Warning: setting incorrect section attributes for .text.__i686.get_pc_thunk.bx
/var/tmp//ccQxxfbf.s:762: Warning: setting incorrect section attributes for .text.__i686.get_pc_thunk.cx

Otherwise, all combinations of HAVE_GAS_CFI_DIRECTIVE and
HAVE_GNU_INDIRECT_FUNCTION work, although the compilers' auto-host.h
defines HAVE_GAS_CFI_DIRECTIVE as 0.  The latter means that the test
should be changed from #ifdef HAVE_GAS_CFI_DIRECTIVE to #if
HAVE_GAS_CFI_DIRECTIVE.
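
A tiny illustration of the difference (values chosen to mirror what
auto-host.h does on that configuration):

    #define HAVE_GAS_CFI_DIRECTIVE 0

    #ifdef HAVE_GAS_CFI_DIRECTIVE
    /* Taken: the macro is defined even though its value is 0, so the
       cfi_* wrappers would wrongly expand to real .cfi_* directives.  */
    #endif

    #if HAVE_GAS_CFI_DIRECTIVE
    /* Not taken when the value is 0, which is the desired behaviour.  */
    #endif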

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-16 23:11 [CFT] i386 sync functions for PR 39677 Richard Henderson
  2009-10-16 23:27 ` Joseph S. Myers
  2009-10-19 14:24 ` Rainer Orth
@ 2009-10-24  6:51 ` Danny Smith
  2 siblings, 0 replies; 11+ messages in thread
From: Danny Smith @ 2009-10-24  6:51 UTC (permalink / raw)
  To: Richard Henderson, GCC Patches; +Cc: ro, ubizjak

> Second, I'd like to ask different port maintainers (cygwin and solaris 
> particularly) to try to compile the code and report any portability 
> problems.  Use any relevant combinations of:
> 
Sorry for the delay.
On mingw32:

gcc -c sync.S
sync.S: Assembler messages:
sync.S:416: Error: unknown pseudo-op: `.local'


* Re: [CFT] i386 sync functions for PR 39677
  2009-10-17  6:12 Ross Ridge
@ 2009-10-17 17:14 ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2009-10-17 17:14 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc-patches

On 10/16/2009 11:07 PM, Ross Ridge wrote:
> Is it really worth supporting older CPUs?  It wouldn't seem
> unreasonable to me for GCC to require that the CPU support at least the
> CMPXCHG8B instruction (i.e. Pentium or better) to use the __sync* functions.
>
> Not that it really matters, but your CPUID logic seems a bit wrong.
> All Intel 80486 CPUs supported CMPXCHG, but only the later ones supported
> CPUID.

You're quite right -- I'd misremembered cmpxchg as being new in the 586.

And having slept on the problem, I'm now sure we shouldn't get into the
spinlock thing -- there are too many ways for that to go wrong.  I'll
simplify the cpuid check to look only for the AMD cpu with the errata,
and use the atomic instructions as needed.  The resulting SIGILL will be
no different than at present, when we simply compile the libraries with
-march=i586.


r~


* Re: [CFT] i386 sync functions for PR 39677
@ 2009-10-17  6:12 Ross Ridge
  2009-10-17 17:14 ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Ross Ridge @ 2009-10-17  6:12 UTC (permalink / raw)
  To: gcc-patches

Richard Henderson writes:
> Given that we now have a central location for handling
> atomic synchronization, handle 80386 and 80486 via spinlock.
> This means that we'll no longer have to inject -march=i586
> for compiling some of our runtime libraries.

Is it really worth supporting older CPUs?  It wouldn't seem
unreasonable to me for GCC to require that the CPU support at least the
CMPXCHG8B instruction (i.e. Pentium or better) to use the __sync* functions.

Not that it really matters, but your CPUID logic seems a bit wrong.
All Intel 80486 CPUs supported CMPXCHG, but only the later ones supported
CPUID.

					Ross Ridge

