From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 20425 invoked by alias); 14 Jan 2013 22:11:36 -0000 Received: (qmail 20407 invoked by uid 22791); 14 Jan 2013 22:11:34 -0000 X-SWARE-Spam-Status: No, hits=-3.2 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,TW_BL,TW_DM,TW_OV X-Spam-Check-By: sourceware.org Received: from mail-pa0-f45.google.com (HELO mail-pa0-f45.google.com) (209.85.220.45) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 14 Jan 2013 22:11:19 +0000 Received: by mail-pa0-f45.google.com with SMTP id bg2so2476938pad.18 for ; Mon, 14 Jan 2013 14:11:18 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:cc :subject:references:in-reply-to:content-type :content-transfer-encoding:x-gm-message-state; bh=23N3/9hd+EjdPKaEwdk06M/gtkb3bFmY3sDRotofFE0=; b=KY2v5Zvr1Aj6j5d4rHUr39838BZ4jJivVHGMVIIMUoakZjS2q2C9tivXPDywi6xbUj tvtQPHe+D1Ifdll3IxU1GJAQLHoRYuWK7XtzR0RgeQ0KA5ZVluQVRvi/3vJi/9ojg6Mt 4Fqi+QY3xA7Gz4RLuYSE83aRlEcTHLsw3/EOG5EmRQ+iogRFjVuzXDkXwgiQH7Ijwrsh hbWxDNiCrcTkefkdjBAiozEmkXNmiSLxrrQ4q1OZyIyvvU06bs61R10XLDo5XShjF9dY YFrwt5rj21l7eFLynC00oDF4R6xukDteoaJ0/mfQhEv/i75+QG3f/KfZaJo15AnYPdo4 qtMQ== X-Received: by 10.68.190.227 with SMTP id gt3mr260499187pbc.5.1358201478175; Mon, 14 Jan 2013 14:11:18 -0800 (PST) Received: from [192.168.2.24] (bas3-ottawa23-1128747037.dsl.bell.ca. [67.71.80.29]) by mx.google.com with ESMTPS id qf7sm8869410pbb.49.2013.01.14.14.11.15 (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 14 Jan 2013 14:11:17 -0800 (PST) Message-ID: <50F48281.3070207@systemhalted.org> Date: Mon, 14 Jan 2013 22:11:00 -0000 From: Carlos O'Donell User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Jeff Law , "Joseph S. Myers" , Mike Frysinger CC: libc-alpha , libc-ports@sourceware.org, "Marcus Shawcroft" Subject: Re: [PATCH] Memory fencing problem in pthread cancellation References: <50F46969.1000305@redhat.com> In-Reply-To: <50F46969.1000305@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Gm-Message-State: ALoCoQnmnwi4vujbLw6k6sSHcLhII+M7f4YQj4FZIs1L/hG6k7BK0ESXbwnBQh7sAKy17Phtb4cI X-IsSubscribed: yes Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: libc-ports-owner@sourceware.org X-SW-Source: 2013-01/txt/msg00017.txt.bz2 On 01/14/2013 03:24 PM, Jeff Law wrote: > Let's consider these two hunks of code in unwind-forcedunwind.c I'm adding libc-ports, Joseph Myers, and Marcus Shawcroft since this involves arm ARM devices that have a weakly ordered memory model. I'm also adding Mike Frysinger since this involes IA64. See my review below. > void > __attribute_noinline__ > pthread_cancel_init (void) > { > void *resume; > void *personality; > void *forcedunwind; > void *getcfa; > void *handle; > > if (__builtin_expect (libgcc_s_handle != NULL, 1)) > { > /* Force gcc to reload all values. */ > asm volatile ("" ::: "memory"); > return; > } > [ ... ] > > PTR_MANGLE (resume); > libgcc_s_resume = resume; > PTR_MANGLE (personality); > libgcc_s_personality = personality; > PTR_MANGLE (forcedunwind); > libgcc_s_forcedunwind = forcedunwind; > PTR_MANGLE (getcfa); > libgcc_s_getcfa = getcfa; > /* Make sure libgcc_s_handle is written last. Otherwise, > pthread_cancel_init might return early even when the pointer the > caller is interested in is not initialized yet. */ > atomic_write_barrier (); > libgcc_s_handle = handle; > } > > void > _Unwind_Resume (struct _Unwind_Exception *exc) > { > if (__builtin_expect (libgcc_s_handle == NULL, 0)) > pthread_cancel_init (); > else > atomic_read_barrier (); > > void (*resume) (struct _Unwind_Exception *exc) = libgcc_s_resume; > PTR_DEMANGLE (resume); > resume (exc); > } > > > If we're in Unwind_Resume and libgcc_s_handle is null, we call > pthread_cancel_init. pthread_cancel_init first checks if > libgcc_s_handle is non-null. If so, then pthread_cancel_init exits > early. Upon returning to Unwind_Resume we'll proceed to load & > demangle libgcc_s_resume and call it. > > Note carefully in this case we never executed an atomic_read_barrier > between the load of libgcc_s_handle and libgcc_s_resume loads. On a > weakly ordered target such as power7, the processor might speculate > the libgcc_s_resume load to a points prior to loading > libgcc_s_handle. > > So if another thread happens to update libgcc_s_handle between the > loads & checks of libgcc_s_handle in Unwind_Resume and > pthread_cancel_init, then we can end up using stale data for > libgcc_s_resume and blow up. > > This has been observed on a 32 processor power7 machine running 16 > instances of the attached testcase in parallel after a period of > several hours. > > There's a couple ways to fix this. One could be to eliminate the > early return from pthread_cancel_init. Unfortunately there's a call > to pthread_cancel_init from pthread_cancel. So every one of those > calls would suffer a performance penalty unless libgcc_s_handle as > exported from unwind-forcedunwind.c > > Another is to add a call to atomic_read_barrier just before the early > return from pthread_cancel_init. That's precisely what this patch > does. > > > While investigating, Carlos identified that the IA64 and ARM ports > have their own unwind-forcedunwind.c implementations and that they > were buggy in regards to fencing as well. Both fail to test > libgcc_s_handle and do not have the necessary calls to > atomic_read_barrier. This patch fixes those issues as well. > > While I have extensively tested the generic unwind-forcedunwind.c > change backported to a glibc-2.12 base, I have not checked the ARM or > IA64 bits in any way.> > > > patch > > > diff --git a/nptl/ChangeLog b/nptl/ChangeLog > index 4aacc17..90d9408 100644 > --- a/nptl/ChangeLog > +++ b/nptl/ChangeLog > @@ -1,3 +1,8 @@ > +2013-01-14 Jeff Law > + > + * sysdeps/pthrad/unwind-forcedunwind.c (pthread_cancel_init): Add > + missing call to atomic_read_barrier in early return path. > + > 2013-01-11 Carlos O'Donell > > * allocatestack.c (allocate_stack): Add comment. Remove assert > diff --git a/nptl/sysdeps/pthread/unwind-forcedunwind.c b/nptl/sysdeps/pthread/unwind-forcedunwind.c > index 9718606..6b72e3e 100644 > --- a/nptl/sysdeps/pthread/unwind-forcedunwind.c > +++ b/nptl/sysdeps/pthread/unwind-forcedunwind.c > @@ -46,6 +46,10 @@ pthread_cancel_init (void) > { > /* Force gcc to reload all values. */ > asm volatile ("" ::: "memory"); > + /* Order reads so as to prevent speculation of loads > + of libgcc_s_{resume,personality,forcedunwind,getcfa} > + to points prior to loading libgcc_s_handle. */ I'd say: s/loading libgcc_s_handle/the write barrier./g It is the writes of the other thread to globals libgcc_s_{resume,personality,forcedunwind,getcfa} which have not yet been seen by this thread which we are synchronizing using the read barrier. > + atomic_read_barrier (); > return; > } > > diff --git a/ports/ChangeLog.arm b/ports/ChangeLog.arm > index d44ea76..cdd2210 100644 > --- a/ports/ChangeLog.arm > +++ b/ports/ChangeLog.arm > @@ -1,3 +1,13 @@ > +2013-01-14 Jeff Law > + > + * sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c > + (pthread_cancel_init): Add missing call to atomic_read_barrier > + in early return path. > + (__gcc_personality_v0): Guard call to pthread_cancel_init with > + a test of libgcc_s_handle, not libgcc_s_personality. Add > + missing call to atomic_read_barrier. > + (_Unwind_ForcedUnwind, _Unwind_GetCFA): Similarly. > + > 2013-01-02 Joseph Myers > > * All files with FSF copyright notices: Update copyright dates > diff --git a/ports/ChangeLog.ia64 b/ports/ChangeLog.ia64 > index 28d5076..b6e1b9c 100644 > --- a/ports/ChangeLog.ia64 > +++ b/ports/ChangeLog.ia64 > @@ -1,3 +1,9 @@ > +2013-01-14 Jeff Law > + > + * sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c > + (_Unwind_GetBSP): Guard call to pthread_cancel_init with > + a test of libgcc_s_handle, not libgcc_s_getbsp. > + > 2013-01-02 Joseph Myers > > * All files with FSF copyright notices: Update copyright dates > diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c > index 58ca9ac..1a081c8 100644 > --- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c > +++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c Please also fix this incorrect comment: 63 libgcc_s_getcfa = getcfa; 64 /* Make sure libgcc_s_getcfa is written last. Otherwise, 65 pthread_cancel_init might return early even when the pointer the 66 caller is interested in is not initialized yet. */ 67 atomic_write_barrier (); 68 libgcc_s_handle = handle; 69 } It should talk about libcc_s_handle not libgcc_s_getcfa. > @@ -40,6 +40,10 @@ pthread_cancel_init (void) > { > /* Force gcc to reload all values. */ > asm volatile ("" ::: "memory"); > + /* Order reads so as to prevent speculation of loads > + of libgcc_s_{resume,personality,forcedunwind,getcfa} > + to points prior to loading libgcc_s_handle. */ > + atomic_read_barrier (); Same comment as above. > return; > } > > @@ -132,8 +136,10 @@ __gcc_personality_v0 (_Unwind_State state, > struct _Unwind_Exception *ue_header, > struct _Unwind_Context *context) > { > - if (__builtin_expect (libgcc_s_personality == NULL, 0)) > + if (__builtin_expect (libgcc_s_handle == NULL, 0)) > pthread_cancel_init (); > + else > + atomic_read_barrier (); > > return libgcc_s_personality (state, ue_header, context); > } > @@ -142,8 +148,10 @@ _Unwind_Reason_Code > _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop, > void *stop_argument) > { > - if (__builtin_expect (libgcc_s_forcedunwind == NULL, 0)) > + if (__builtin_expect (libgcc_s_handle == NULL, 0)) > pthread_cancel_init (); > + else > + atomic_read_barrier (); > > return libgcc_s_forcedunwind (exc, stop, stop_argument); > } > @@ -151,8 +159,10 @@ _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop, > _Unwind_Word > _Unwind_GetCFA (struct _Unwind_Context *context) > { > - if (__builtin_expect (libgcc_s_getcfa == NULL, 0)) > + if (__builtin_expect (libgcc_s_handle == NULL, 0)) > pthread_cancel_init (); > + else > + atomic_read_barrier (); > > return libgcc_s_getcfa (context); > } Not sufficient. You need to fix the assembly also. 83 /* It's vitally important that _Unwind_Resume not have a stack frame; the 84 ARM unwinder relies on register state at entrance. So we write this in 85 assembly. */ 86 87 asm ( 88 " .globl _Unwind_Resume\n" 89 " .type _Unwind_Resume, %function\n" 90 "_Unwind_Resume:\n" 91 " .cfi_sections .debug_frame\n" 92 " " CFI_STARTPROC "\n" 93 " stmfd sp!, {r4, r5, r6, lr}\n" 94 " " CFI_ADJUST_CFA_OFFSET (16)" \n" 95 " " CFI_REL_OFFSET (r4, 0) "\n" 96 " " CFI_REL_OFFSET (r5, 4) "\n" 97 " " CFI_REL_OFFSET (r6, 8) "\n" 98 " " CFI_REL_OFFSET (lr, 12) "\n" 99 " " CFI_REMEMBER_STATE "\n" 100 " ldr r4, 1f\n" 101 " ldr r5, 2f\n" 102 "3: add r4, pc, r4\n" 103 " ldr r3, [r4, r5]\n" 104 " mov r6, r0\n" 105 " cmp r3, #0\n" 106 " beq 4f\n" The above assembly is equivalent to: if (libgcc_s_resume == NULL) pthread_cancel_init (); Which is missing the atomic_read_barrier (). 107 "5: mov r0, r6\n" 108 " ldmfd sp!, {r4, r5, r6, lr}\n" 109 " " CFI_ADJUST_CFA_OFFSET (-16) "\n" 110 " " CFI_RESTORE (r4) "\n" 111 " " CFI_RESTORE (r5) "\n" 112 " " CFI_RESTORE (r6) "\n" 113 " " CFI_RESTORE (lr) "\n" 114 " bx r3\n" 115 " " CFI_RESTORE_STATE "\n" 116 "4: bl pthread_cancel_init\n" 117 " ldr r3, [r4, r5]\n" 118 " b 5b\n" 119 " " CFI_ENDPROC "\n" 120 " .align 2\n" 121 #ifdef __thumb2__ 122 "1: .word _GLOBAL_OFFSET_TABLE_ - 3b - 4\n" 123 #else 124 "1: .word _GLOBAL_OFFSET_TABLE_ - 3b - 8\n" 125 #endif 126 "2: .word libgcc_s_resume(GOTOFF)\n" 127 " .size _Unwind_Resume, .-_Unwind_Resume\n" 128 ); Thus you need something like this completely untested patch: diff --git a/ports/sysdeps/arm/bits/atomic.h b/ports/sysdeps/arm/bits/atomic.h index 6e20df7..b0f45ad 100644 --- a/ports/sysdeps/arm/bits/atomic.h +++ b/ports/sysdeps/arm/bits/atomic.h @@ -42,6 +42,7 @@ void __arm_link_error (void); # define atomic_full_barrier() __sync_synchronize () #else # define atomic_full_barrier() __arm_assisted_full_barrier () +# define atomic_asm_full_barrier() __arm_asm_assisted_full_barrier () #endif /* An OS-specific bits/atomic.h file will define this macro if diff --git a/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h b/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h index c76b8f3..726d3ce 100644 --- a/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h +++ b/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h @@ -17,7 +17,8 @@ . */ /* If the compiler doesn't provide a primitive, we'll use this macro - to get assistance from the kernel. */ + to get assistance from the kernel. This should call the + __kuser_memory_barrier helper in the kernel. */ #ifdef __thumb2__ # define __arm_assisted_full_barrier() \ __asm__ __volatile__ \ @@ -25,6 +26,11 @@ "movt\tip, #0xffff\n\t" \ "blx\tip" \ : : : "ip", "lr", "cc", "memory"); +/* The asm variant is used as an insert into existing asm statements. */ +# define __arm_asm_assisted_full_barrier() \ + " movw ip, #0x0fa0\n" \ + " movt ip, #0xffff\n" \ + " blx ip" #else # define __arm_assisted_full_barrier() \ __asm__ __volatile__ \ @@ -32,6 +38,11 @@ "mov\tlr, pc\n\t" \ "add\tpc, ip, #(0xffff0fa0 - 0xffff0fff)" \ : : : "ip", "lr", "cc", "memory"); +/* The asm variant is used as an insert into existing asm statements. */ +# define __arm_asm_assisted_full_barrier() \ + " mov ip, #0xffff0fff\n" \ + " mov lr, pc\n" \ + " add pc, ip, #(0xffff0fa0 - 0xffff0fff)" #endif /* Atomic compare and exchange. This sequence relies on the kernel to diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c index 58ca9ac..d7ac847 100644 --- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c +++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c @@ -104,6 +104,7 @@ asm ( " mov r6, r0\n" " cmp r3, #0\n" " beq 4f\n" + atomic_asm_full_barrier() "5: mov r0, r6\n" " ldmfd sp!, {r4, r5, r6, lr}\n" " " CFI_ADJUST_CFA_OFFSET (-16) "\n" @@ -123,7 +124,7 @@ asm ( #else "1: .word _GLOBAL_OFFSET_TABLE_ - 3b - 8\n" #endif -"2: .word libgcc_s_resume(GOTOFF)\n" +"2: .word libgcc_s_handle(GOTOFF)\n" " .size _Unwind_Resume, .-_Unwind_Resume\n" ); --- Note: * As of ARMv6 we should really have used: `MCR p15, 0, r0, c7, c10, 5' instead of calling the kernel helper. * As of ARMv7 we should really have used: `DMB' instead of calling the kernel barrier. * As of ARMv7 we should be using DMB with specific completers to implement atomic_read_barrier() and atomic_write_barrier(). > diff --git a/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c > index 8562afd..1d197e1 100644 > --- a/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c > +++ b/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c > @@ -31,8 +31,10 @@ static _Unwind_Word (*libgcc_s_getbsp) (struct _Unwind_Context *); > _Unwind_Word > _Unwind_GetBSP (struct _Unwind_Context *context) > { > - if (__builtin_expect (libgcc_s_getbsp == NULL, 0)) > + if (__builtin_expect (libgcc_s_handle == NULL, 0)) > pthread_cancel_init (); > + else > + atomic_read_barrier (); > > return libgcc_s_getbsp (context); > } The IA64 fix looks OK to me. Cheers, Carlos.