From: David Mosberger
Date: Wed, 25 Aug 2004 16:39:00 -0000
To: libc-hacker@sources.redhat.com
Cc: davidm@napali.hpl.hp.com
Subject: fix ia64 longjmp() to work from alternate signal-stack
Reply-To: davidm@hpl.hp.com
Message-ID: <16684.49335.802840.212013@napali.hpl.hp.com>
X-URL: http://www.hpl.hp.com/personal/David_Mosberger/

Recently, HJ Lu noticed that tst-cancel20 is failing on ia64 when
using libunwind.  This failure was due to the fact that this test
triggers a longjmp() from an alternate signal stack back to the main
stack.  The existing longjmp() cannot handle stack-crossing jumps
correctly, which is the root-cause of the failure.
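
For readers who want to see the failing pattern in isolation, here is a
minimal sketch of a jump from an alternate signal stack back to the main
stack. It is not the tst-cancel20 or tst-setjmp2 test; every name in it
(alt_stack, main_env, jump_handler, jump_from_altstack) is hypothetical,
and the 64 KiB stack size and the use of SIGUSR1 are arbitrary choices.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is an illustration only. */
static char alt_stack[64 * 1024];   /* storage for the alternate stack */
static sigjmp_buf main_env;         /* filled in while on the main stack */
static volatile sig_atomic_t handler_on_alt;

static void jump_handler(int sig)
{
    char probe;                     /* lives on whatever stack we run on */
    uintptr_t p = (uintptr_t) &probe;
    uintptr_t base = (uintptr_t) alt_stack;

    (void) sig;
    /* Unsigned subtraction checks both bounds with one comparison. */
    handler_on_alt = (p - base < sizeof alt_stack);
    siglongjmp(main_env, 1);        /* alternate stack -> main stack */
}

/* Returns 1 if the handler ran on the alternate stack and the jump
   landed back here, 0 if the handler ran on the main stack, and -1
   on error.  */
int jump_from_altstack(void)
{
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_handler;
    sa.sa_flags = SA_ONSTACK;       /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    if (sigsetjmp(main_env, 1) != 0)
        return handler_on_alt ? 1 : 0;

    raise(SIGUSR1);
    return -1;                      /* only reached if the jump failed */
}
```

Calling jump_from_altstack() from main() exercises the stack-crossing
path that the existing ia64 longjmp() mishandles.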

The failure didn't show with the GCC builtin unwinder since it
implicitly copies portions of the register backing-store from the
alternate signal stack back to the main stack.  This copying masked
enough of the longjmp() bug that tst-cancel20 passed.  However,
despite this masking, the bug is clearly in longjmp().  In fact,
tst-setjmp2 included below will crash reliably with the existing
longjmp().

Unfortunately, fixing the problem is not trivial.  The solution the
patch below implements is as follows: longjmp() now invokes
sigaltstack() to determine if we're doing a long jump from the
alternate signal stack to the normal stack.  If not, it will do a long
jump as before.  If the stacks are crossed, longjmp() now assumes the
top of the alternate stack contains a sigcontext structure.  It then
uses the info in the sigcontext structure and in the jump-buffer to
determine what portion of the backing store needs to be copied from
the alternate signal stack to the main stack.  After copying the
minimal amount of data, it then finishes doing the long jump as
before.

I'm not 100% happy with the solution:

 o It adds a sigaltstack() system-call to the longjmp() path.

 o longjmp() has to make assumptions about where the kernel puts the
   sigcontext on the alternate signal-stack.  It's very unlikely that
   the kernel will change in this area, but it's still a dependency
   that I'd have rather avoided.

Having said that, I didn't see any reasonable alternative.  I
considered doing a "flushrs" in setjmp() instead but as pointed out in
the Software Conventions & Runtime Architecture manual, this cannot
work because it's not (reasonably) possible to keep the stacked
registers and their NaT-bits in sync that way.
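
The stack-crossing test described above can be sketched in portable C
as follows. This mirrors the condition used in the patch's __longjmp()
but is only an illustration: crosses_from_altstack, probe_handler,
probe_from_handler and the probe_* variables are hypothetical names,
and the helper takes the target stack pointer as an argument instead of
reading it from a jump-buffer.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the caller runs on the alternate signal stack and
   target_sp lies outside of it, 0 if no stacks are crossed, and -1
   if sigaltstack() fails.  */
int crosses_from_altstack(uintptr_t target_sp)
{
    stack_t stk;

    if (sigaltstack(NULL, &stk) != 0)
        return -1;
    if ((stk.ss_flags & SS_ONSTACK) == 0)
        return 0;                   /* not on the alternate stack */
    /* Unsigned subtraction: one compare covers both stack bounds. */
    if (target_sp - (uintptr_t) stk.ss_sp < stk.ss_size)
        return 0;                   /* target is on the same stack */
    return 1;
}

/* Hypothetical harness that runs the check from a signal handler. */
static char probe_stack[64 * 1024];
static uintptr_t probe_target;
static volatile sig_atomic_t probe_result;

static void probe_handler(int sig)
{
    char local;                     /* lives on the alternate stack */

    (void) sig;
    /* A target of 0 means "use an address on the handler's own stack". */
    probe_result = crosses_from_altstack(
        probe_target ? probe_target : (uintptr_t) &local);
}

int probe_from_handler(uintptr_t target_sp)
{
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = probe_stack;
    ss.ss_size = sizeof probe_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    probe_target = target_sp;
    probe_result = -1;
    raise(SIGUSR1);                 /* handler has returned when this does */
    return probe_result;
}
```

Only when the check reports a crossing does the backing-store copy
become necessary; in every other case the jump proceeds as before.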

Performance-wise, the impact is as follows (each line shows the
execution-time in number of cycles; the first iteration was executed
with the jump-buffer flushed from the cache):

Existing CVS libc (with lightweight syscalls enabled):

 Alternate signal disabled:
  sigsetjmp(save_sig=0) cyc: 1692  111  114  111  105  111  111  105  105  105
  siglongjmp            cyc: 2296  234  218  212  221  215  217  212  205  217
  sigsetjmp(save_sig=1) cyc:  583  178  178  178  178  178  178  178  178  178
  siglongjmp            cyc: 1108  353  351  351  347  360  351  347  356  351

 Alternate signal enabled:
  sigsetjmp(save_sig=0) cyc:  450  111  111  105  105  105  105  105  105  105
  siglongjmp            cyc: 1266  241  207  208  208  208  208  208  208  208
  sigsetjmp(save_sig=1) cyc:  568  178  178  178  178  178  178  178  178  178
  siglongjmp            cyc: 1085  343  347  343  343  343  343  343  343  343

With patch below applied:

 Alternate signal disabled:
  sigsetjmp(save_sig=0) cyc: 1403  128  125  127  110  110  110  110  110  110
  siglongjmp            cyc: 3201  654  632  608  606  606  606  606  606  606
  sigsetjmp(save_sig=1) cyc:  578  212  193  193  193  185  185  185  185  185
  siglongjmp            cyc: 1035  760  744  744  744  744  744  744  744  744

 Alternate signal enabled:
  sigsetjmp(save_sig=0) cyc:  440  125  115  110  110  110  125  110  110  110
  siglongjmp            cyc: 1568  798  715  703  717  815  717  703  703  703
  sigsetjmp(save_sig=1) cyc:  563  195  195  195  195  195  195  195  195  195
  siglongjmp            cyc: 1143  857  848  834  834  834  834  834  834  834

As you can see, the setjmp() performance is unchanged (apart from
noise), as you'd expect.  The impact on longjmp() is quite
significant, however: about 390 cycles when not crossing stacks and
about 495 cycles when crossing stacks (of course, the latter case was
broken before, so it didn't help that it was faster...).  The largest
part of this overhead is due to the sigaltstack() system call that is
now necessary.  I don't see a way to avoid it, though we could at
least implement the simple case of just reading the current stack info
as a light-weight system-call handler, if somebody really cared a lot
about longjmp() performance.
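
The quoted overheads can be re-derived from the siglongjmp rows
(save_sig=0) of the table. The sketch below assumes that the last
iteration of each row is a fair steady-state sample; that choice, and
the names settled, overhead and the four arrays, are mine and not part
of the measurement setup.

```c
#include <stddef.h>

/* siglongjmp cycle counts for save_sig=0, copied from the table. */
const int old_alt_disabled[10] =
    { 2296, 234, 218, 212, 221, 215, 217, 212, 205, 217 };
const int new_alt_disabled[10] =
    { 3201, 654, 632, 608, 606, 606, 606, 606, 606, 606 };
const int old_alt_enabled[10] =
    { 1266, 241, 207, 208, 208, 208, 208, 208, 208, 208 };
const int new_alt_enabled[10] =
    { 1568, 798, 715, 703, 717, 815, 717, 703, 703, 703 };

/* Assumption: the final iteration represents the settled cost. */
int settled(const int *cyc, size_t n)
{
    return cyc[n - 1];
}

/* Extra cycles per siglongjmp() after the patch, under that assumption. */
int overhead(const int *before, const int *after, size_t n)
{
    return settled(after, n) - settled(before, n);
}
```

With these inputs, overhead() gives 389 cycles for the alternate-signal
disabled rows and 495 cycles for the enabled rows, which matches the
"about 390" and "about 495" figures.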

I ran the test-suite before and after applying the patch below and
there are no changes except that tst-cancel20 and tst-setjmp2 now
succeed.

If there are no objections, please apply this patch.

Thanks,

	--david

File sysdeps/unix/sysv/linux/ia64/__longjmp.S should be removed!

ChangeLog

2004-08-25  David Mosberger

	* sysdeps/unix/sysv/linux/ia64/Makefile (sysdep_routines): Mention
	__ia64_longjmp for setjmp directory.
	* setjmp/Makefile (tests): Mention tst-setjmp2.
	* sysdeps/unix/sysv/linux/ia64/__longjmp.c: New file.
	* sysdeps/unix/sysv/linux/ia64/__ia64_longjmp.S: Rename from
	__longjmp.S.
	(__ia64_flush_rbs): New function.
	(__ia64_longjmp): Rename from __longjmp.  Simplify since some of
	the work is now done in the C __longjmp().

Index: setjmp/Makefile
--- setjmp/Makefile
+++ setjmp/Makefile
@@ -26,7 +26,7 @@
 
 routines	:= setjmp sigjmp bsd-setjmp bsd-_setjmp \
 		   longjmp __longjmp jmp-unwind
 
-tests		:= tst-setjmp jmpbug bug269-setjmp
+tests		:= tst-setjmp tst-setjmp2 jmpbug bug269-setjmp
 
 include ../Rules
Index: sysdeps/unix/sysv/linux/ia64/Makefile
--- sysdeps/unix/sysv/linux/ia64/Makefile
+++ sysdeps/unix/sysv/linux/ia64/Makefile
@@ -7,6 +7,10 @@
 gen-as-const-headers += sigcontext-offsets.sym
 endif
 
+ifeq ($(subdir),setjmp)
+sysdep_routines += __ia64_longjmp
+endif
+
 ifeq ($(subdir),misc)
 sysdep_headers += sys/io.h
 sysdep_routines += ioperm clone2
Index: sysdeps/unix/sysv/linux/ia64/__ia64_longjmp.S
--- /dev/null
+++ sysdeps/unix/sysv/linux/ia64/__ia64_longjmp.S
@@ -0,0 +1,156 @@
+/* Copyright (C) 1999, 2000, 2001, 2004 Free Software Foundation, Inc.
+   Contributed by David Mosberger-Tang .
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include
+#include
+
+LEAF(__ia64_flush_rbs)
+	flushrs
+	mov r9 = ar.rsc			// 12 cyc latency
+	;;
+	mov r8 = ar.bsp			// 12 cyc latency
+	;;
+	and r16 = ~0x3, r9		// clear ar.rsc.mode
+	;;
+	mov ar.rsc = r16		// put RSE into enforced-lazy mode
+	;;
+	mov r10 = ar.rnat		// 5 cyc latency
+	ret
+END(__ia64_flush_rbs)
+
+
+#	define	pPos	p6	/* is rotate count positive? */
+#	define	pNeg	p7	/* is rotate count negative? */
+
+/* __ia64_longjmp(__jmp_buf buf, int val, long rnat, long rsc) */
+
+
+LEAF(__ia64_longjmp)
+	alloc r8=ar.pfs,4,0,0,0
+	add r2=0x98,in0			// r2 <- &jmpbuf.orig_jmp_buf_addr
+	add r3=0x88,in0			// r3 <- &jmpbuf.ar_bsp
+	;;
+	ld8 r8=[r2]			// r8 <- orig_jmp_buf_addr
+	ld8 r23=[r3],8			// r23 <- jmpbuf.ar_bsp
+	mov r2=in0
+	;;
+	//
+	// Note: we need to redo the "flushrs" here even though it's
+	// already been done by __ia64_flush_rbs.  It is needed to
+	// ensure that ar.bspstore == ar.bsp.
+	//
+	flushrs				// flush dirty regs to backing store
+	ld8 r25=[r3]			// r25 <- jmpbuf.ar_unat
+	sub r8=r8,in0			// r8 <- &orig_jmpbuf - &jmpbuf
+	;;
+	add r3=8,in0			// r3 <- &jmpbuf.r1
+	extr.u r8=r8,3,6		// r8 <- (&orig_jmpbuf - &jmpbuf)/8 & 0x3f
+	;;
+	cmp.lt pNeg,pPos=r8,r0
+	;;
+(pPos)	mov r16=r8
+(pNeg)	add r16=64,r8
+(pPos)	sub r17=64,r8
+(pNeg)	sub r17=r0,r8
+	;;
+	shr.u r8=r25,r16
+	shl r9=r25,r17
+	;;
+	or r25=r8,r9
+	;;
+	mov ar.unat=r25		// setup ar.unat (NaT bits for r1, r4-r7, and r12)
+	;;
+	ld8.fill.nta sp=[r2],16	// r12 (sp)
+	ld8.fill.nta gp=[r3],16		// r1 (gp)
+	dep r11=-1,r23,3,6	// r11 <- ia64_rse_rnat_addr(jmpbuf.ar_bsp)
+	;;
+	ld8.nta r16=[r2],16		// caller's unat
+	ld8.nta r17=[r3],16		// fpsr
+	;;
+	ld8.fill.nta r4=[r2],16	// r4
+	ld8.fill.nta r5=[r3],16		// r5
+	;;
+	ld8.fill.nta r6=[r2],16	// r6
+	ld8.fill.nta r7=[r3],16		// r7
+	;;
+	mov ar.unat=r16			// restore caller's unat
+	mov ar.fpsr=r17			// restore fpsr
+	;;
+	ld8.nta r16=[r2],16		// b0
+	ld8.nta r17=[r3],16		// b1
+	;;
+	mov ar.bspstore=r23	// restore ar.bspstore
+	ld8.nta r18=[r2],16		// b2
+	;;
+	mov ar.rnat=in2		// restore ar.rnat
+	ld8.nta r19=[r3],16		// b3
+	;;
+	ld8.nta r20=[r2],16		// b4
+	ld8.nta r21=[r3],16		// b5
+	;;
+	ld8.nta r11=[r2],16		// ar.pfs
+	ld8.nta r22=[r3],56		// ar.lc
+	;;
+	ld8.nta r24=[r2],32		// pr
+	mov ar.rsc=in3		// restore ar.rsc
+	mov b0=r16
+	;;
+	ldf.fill.nta f2=[r2],32
+	ldf.fill.nta f3=[r3],32
+	mov b1=r17
+	;;
+	ldf.fill.nta f4=[r2],32
+	ldf.fill.nta f5=[r3],32
+	mov b2=r18
+	;;
+	ldf.fill.nta f16=[r2],32
+	ldf.fill.nta f17=[r3],32
+	mov b3=r19
+	;;
+	ldf.fill.nta f18=[r2],32
+	ldf.fill.nta f19=[r3],32
+	mov b4=r20
+	;;
+	ldf.fill.nta f20=[r2],32
+	ldf.fill.nta f21=[r3],32
+	mov b5=r21
+	;;
+	ldf.fill.nta f22=[r2],32
+	ldf.fill.nta f23=[r3],32
+	mov ar.lc=r22
+	;;
+	ldf.fill.nta f24=[r2],32
+	ldf.fill.nta f25=[r3],32
+	cmp.eq p8,p9=0,in1
+	;;
+	ldf.fill.nta f26=[r2],32
+	ldf.fill.nta f27=[r3],32
+	mov ar.pfs=r11
+	;;
+	ldf.fill.nta f28=[r2],32
+	ldf.fill.nta f29=[r3],32
+(p8)	mov r8=1
+	;;
+	ldf.fill.nta f30=[r2]
+	ldf.fill.nta f31=[r3]
+(p9)	mov r8=in1
+
+	invala			// virt. -> phys. regnum mapping may change
+	mov pr=r24,-1
+	ret
+END(__ia64_longjmp)
Index: sysdeps/unix/sysv/linux/ia64/__longjmp.c
--- /dev/null
+++ sysdeps/unix/sysv/linux/ia64/__longjmp.c
@@ -0,0 +1,174 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by David Mosberger-Tang .
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+/* This __longjmp() implementation is limited to jumping within the
+   same stack.  That is, in general it is not possible to use this
+   __longjmp() implementation to cross from one stack to another.
+   However, as a special exception, we have to support the case where
+   __longjmp() is used to cross from the alternate signal-stack to the
+   normal stack.  This is required by the Single Unix Spec, which says
+   this about longjmp():
+
+     "As it bypasses the usual function call and return mechanisms,
+      longjmp() shall execute correctly in contexts of interrupts,
+      signals, and any of their associated functions."
+
+   Since the spec also defines sigaltstack(), this implies that
+   longjmp() from an alternate signal stack must work.
+*/
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#define JB_SP	0
+#define JB_BSP	17
+
+struct rbs_flush_values
+  {
+    unsigned long bsp;
+    unsigned long rsc;
+    unsigned long rnat;
+  };
+
+extern struct rbs_flush_values __ia64_flush_rbs (void);
+extern void __ia64_longjmp (__jmp_buf buf, int val, long rnat, long rsc)
+     __attribute__ ((__noreturn__));
+
+static void
+copy_rbs (unsigned long *dst, unsigned long *dst_end, unsigned long dst_rnat,
+	  unsigned long *src, unsigned long *src_end,
+	  unsigned long current_rnat)
+{
+  unsigned long dst_slot, src_rnat = 0, src_slot, *src_rnat_addr, nat_bit;
+  int first_time = 1;
+
+  while (dst < dst_end)
+    {
+      dst_slot = ia64_rse_slot_num (dst);
+      if (dst_slot == 63)
+	{
+	  *dst++ = dst_rnat;
+	  dst_rnat = 0;
+	}
+      else
+	{
+	  /* read source value, including NaT bit: */
+	  src_slot = ia64_rse_slot_num (src);
+	  if (src_slot == 63)
+	    {
+	      /* skip src RNaT slot */
+	      ++src;
+	      src_slot = 0;
+	    }
+	  if (first_time || src_slot == 0)
+	    {
+	      first_time = 0;
+	      src_rnat_addr = ia64_rse_rnat_addr (src);
+	      if (src_rnat_addr < src_end)
+		src_rnat = *src_rnat_addr;
+	      else
+		src_rnat = current_rnat;
+	    }
+	  nat_bit = (src_rnat >> src_slot) & 1;
+
+	  assert (src < src_end);
+
+	  *dst++ = *src++;
+	  if (nat_bit)
+	    dst_rnat |= (1UL << dst_slot);
+	  else
+	    dst_rnat &= ~(1UL << dst_slot);
+	}
+    }
+  dst_slot = ia64_rse_slot_num (dst);
+  if (dst_slot > 0)
+    *ia64_rse_rnat_addr (dst) = dst_rnat;
+}
+
+void
+__longjmp (__jmp_buf buf, int val)
+{
+  unsigned long *rbs_base, *bsp, *bspstore, *jb_bsp, jb_sp, ss_sp;
+  unsigned long ndirty, rnat, load_rnat, *jb_rnat_addr;
+  struct sigcontext *sc;
+  stack_t stk;
+  struct rbs_flush_values c;
+
+  /* put RSE into enforced-lazy mode and return current bsp/rsc/rnat: */
+  c = __ia64_flush_rbs ();
+
+  jb_sp  = ((unsigned long *)  buf)[JB_SP];
+  jb_bsp = ((unsigned long **) buf)[JB_BSP];
+
+  __sigaltstack (NULL, &stk);
+
+  ss_sp = (unsigned long) stk.ss_sp;
+  jb_rnat_addr = ia64_rse_rnat_addr (jb_bsp);
+
+  if ((stk.ss_flags & SS_ONSTACK) == 0 || jb_sp - ss_sp < stk.ss_size)
+    /* Normal non-stack-crossing longjmp; if the RNaT slot for the bsp
+       saved in the jump-buffer is the same as the one for the current
+       BSP, use the current AR.RNAT value, otherwise, load it from the
+       jump-buffer's RNaT-slot.  */
+    load_rnat = (ia64_rse_rnat_addr ((unsigned long *) c.bsp) != jb_rnat_addr);
+  else
+    {
+      /* If we are on the alternate signal-stack and the jump-buffer
+	 lies outside the signal-stack, we may need to copy back the
+	 dirty partition which was torn off and saved on the
+	 signal-stack when the signal was delivered.
+
+	 Caveat: we assume that the top of the alternate signal-stack
+		 stores the sigcontext structure of the signal that
+		 caused the switch to the signal-stack.  This should
+		 be a fairly safe assumption but the kernel _could_
+		 do things differently.. */
+      sc = ((struct sigcontext *) ((ss_sp + stk.ss_size) & -16) - 1);
+
+      /* As a sanity-check, verify that the register-backing-store base
+	 of the alternate signal-stack is where we expect it.  */
+      rbs_base = (unsigned long *)
+	((ss_sp + sizeof (long) - 1) & -sizeof (long));
+
+      assert ((unsigned long) rbs_base == sc->sc_rbs_base);
+
+      ndirty = ia64_rse_num_regs (rbs_base, rbs_base + (sc->sc_loadrs >> 19));
+      bsp = (unsigned long *) sc->sc_ar_bsp;
+      bspstore = ia64_rse_skip_regs (bsp, -ndirty);
+
+      if (bspstore < jb_bsp)
+	/* AR.BSPSTORE at the time of the signal was below the value
+	   of AR.BSP saved in the jump-buffer => copy the missing
+	   portion from the torn off dirty partition which got saved
+	   on the alternate signal-stack.  */
+	copy_rbs (bspstore, jb_bsp, sc->sc_ar_rnat,
+		  rbs_base, (unsigned long *) c.bsp, c.rnat);
+
+      load_rnat = 1;
+    }
+  if (load_rnat)
+    rnat = *jb_rnat_addr;
+  else
+    rnat = c.rnat;
+  __ia64_longjmp (buf, val, rnat, c.rsc);
+}
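
The copy loop and the bspstore computation in the patch lean on the
ia64_rse_*() address helpers. For readers without an ia64 tree at hand,
the following is a portable sketch of that arithmetic, operating on
plain 64-bit addresses instead of pointers. The rse_* names are
hypothetical stand-ins, and the bodies reflect my understanding of the
backing-store layout (every 64th 8-byte slot holds an RNaT collection);
treat it as an illustration, not as the definitive definitions.

```c
#include <stdint.h>

/* Slot index (0..63) of an 8-byte backing-store address; slot 63 of
   each 512-byte group holds the RNaT collection.  */
uint64_t rse_slot_num(uint64_t addr)
{
    return (addr >> 3) & 0x3f;
}

/* Address of the RNaT slot that covers the given backing-store slot. */
uint64_t rse_rnat_addr(uint64_t addr)
{
    return addr | (0x3fULL << 3);
}

/* Number of registers stored between bspstore and bsp, i.e. the slot
   count minus the RNaT slots that fall in between.  */
uint64_t rse_num_regs(uint64_t bspstore, uint64_t bsp)
{
    uint64_t slots = (bsp - bspstore) / 8;

    return slots - (rse_slot_num(bspstore) + slots) / 0x40;
}

/* Address reached after skipping num_regs registers (may be negative),
   stepping over any RNaT slots on the way.  */
uint64_t rse_skip_regs(uint64_t addr, int64_t num_regs)
{
    int64_t delta = (int64_t) rse_slot_num(addr) + num_regs;

    if (num_regs < 0)
        delta -= 0x3e;
    /* Unsigned wrap-around makes the addition work for negative steps. */
    return addr + 8 * (uint64_t) (num_regs + delta / 0x3f);
}
```

As an example, 63 registers starting at slot 0 of address 0x1000 occupy
slots 0 through 62, slot 63 at 0x11f8 is the RNaT collection, and the
next register lands at 0x1200. The "dep r11=-1,r23,3,6" instruction in
__ia64_longjmp.S sets the same six address bits as rse_rnat_addr().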