public inbox for libc-stable@sourceware.org
* [2.31 COMMITTED 1/2] Fix data race in setting function descriptors during lazy binding on hppa.
@ 2020-05-04 19:59 Aurelien Jarno
  2020-05-04 19:59 ` [2.31 COMMITTED 2/2] Add new file missed in previous hppa commit Aurelien Jarno
  0 siblings, 1 reply; 2+ messages in thread
From: Aurelien Jarno @ 2020-05-04 19:59 UTC
  To: libc-stable; +Cc: John David Anglin, Carlos O'Donell

From: John David Anglin <danglin@gcc.gnu.org>

This addresses an issue that is present mainly on SMP machines running
threaded code.  In a typical indirect call or PLT import stub, the
target address is loaded first.  Then the global pointer is loaded into
the PIC register in the delay slot of a branch to the target address.
During lazy binding, the target address is a trampoline which transfers
to _dl_runtime_resolve().

_dl_runtime_resolve() uses the relocation offset stored in the global
pointer and the linkage map stored in the trampoline to find the
relocation.  Then, the function descriptor is updated.

In a multi-threaded application, it is possible for another thread to
update the global pointer between the load of the target address and
the load of the global pointer.  When this happens, the relocation
offset has been replaced by the new global pointer.  The function
pointer has probably been updated as well, but there is no way to find
the address of the function descriptor and to transfer to the target.
So, _dl_runtime_resolve() typically crashes.

HP-UX addressed this problem by adding an extra pc-relative branch to
the trampoline.  The descriptor is initially set up to point to the
branch.  The branch then transfers to the trampoline.  This allowed
the trampoline code to figure out which descriptor was being used
without any modification to user code.  I didn't use this approach
as it is more complex and changes function pointer canonicalization.

The order of loading the target address and global pointer in
indirect calls was not consistent with the order used in import stubs.
In particular, $$dyncall and some inline versions of it loaded the
global pointer first.  This was inconsistent with the global pointer
being updated first in dl-machine.h.  Assuming the accesses are
ordered, we want elf_machine_fixup_plt() to store the global pointer
first and calls to load it last.  Then, the global pointer will be
correct when the target function is entered.
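
As a rough illustration of the ordering described above, here is a
minimal C11 sketch.  It is an analogy only: the real loads are done by
assembly stubs, and the names fdesc_model, fdesc_publish and
fdesc_consume are hypothetical.  The writer stores gp before ip, and the
reader loads ip before gp.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a two-word function descriptor.  */
struct fdesc_model
{
  _Atomic uint32_t ip;   /* target address */
  _Atomic uint32_t gp;   /* global pointer (or reloc offset before binding) */
};

/* Writer side: make gp visible before the new entry point.  */
static void
fdesc_publish (struct fdesc_model *d, uint32_t ip, uint32_t gp)
{
  atomic_store_explicit (&d->gp, gp, memory_order_relaxed);
  atomic_store_explicit (&d->ip, ip, memory_order_release);
}

/* Reader side: load the target address first and gp last.  */
static void
fdesc_consume (struct fdesc_model *d, uint32_t *ip, uint32_t *gp)
{
  *ip = atomic_load_explicit (&d->ip, memory_order_acquire);
  *gp = atomic_load_explicit (&d->gp, memory_order_relaxed);
}
```

In this model, a reader that observes the new ip also observes the new
gp.  A reader that still observes the old ip (the trampoline) may see
either gp value, which is the case the rest of this change deals with.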

However, just to make things more fun, HP added support for
out-of-order execution of accesses in PA 2.0.  The accesses used by
calls are weakly ordered.  So, it's possible under some circumstances
that a function might be entered with the wrong global pointer.
However, HP uses weakly ordered accesses in 64-bit HP-UX, so I assume
that loading the global pointer in the delay slot of the branch must
work consistently.

The basic fix for the race is a combination of modifying user code to
preserve the address of the function descriptor in register %r22 and
setting the least-significant bit in the relocation offset.  The
latter was suggested by Carlos as a way to distinguish relocation
offsets from global pointer values.  Conventionally, %r22 is used
as the address of the function descriptor in calls to $$dyncall.
So, it wasn't hard to preserve the address in %r22.
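
The tagging idea can be sketched in C as follows.  PA_GP_RELOC is the
macro added by this patch; the three helper functions are hypothetical
names used only for illustration, and the sketch assumes that neither a
relocation offset nor a real gp value ever has its least-significant
bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PA_GP_RELOC 1u

/* Hypothetical helper: mark a relocation offset before it is stored in
   the gp slot of an unresolved descriptor.  */
static inline uint32_t
tag_reloc_offset (uint32_t offset)
{
  assert ((offset & PA_GP_RELOC) == 0);   /* the bit must be unused */
  return offset | PA_GP_RELOC;
}

/* Hypothetical helper: true if the gp slot still holds a relocation
   offset, false if it already holds a resolved global pointer.  */
static inline bool
gp_slot_is_reloc_offset (uint32_t gp_slot)
{
  return (gp_slot & PA_GP_RELOC) != 0;
}

/* Hypothetical helper: recover the plain relocation offset.  */
static inline uint32_t
untag_reloc_offset (uint32_t gp_slot)
{
  return gp_slot & ~PA_GP_RELOC;
}
```

A resolver that finds the bit clear knows the slot was overwritten with
a gp value and has to recover the offset some other way.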

I have updated gcc trunk and gcc-9 branch to not clobber %r22 in
$$dyncall and inline indirect calls.  I have also modified the import
stubs in binutils trunk and the 2.33 branch to preserve %r22.  This
required making the stubs one instruction longer but we save one
relocation.  I also modified binutils to align the .plt section on
an 8-byte boundary.  This allows descriptors to be updated atomically
with a floating-point store.
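
A minimal sketch of the double-word update, assuming a descriptor that
is 8-byte aligned.  The union fdesc_pair and the function fdesc_update
are hypothetical names for illustration.  Note that C itself does not
promise that the assignment below compiles to one store; on hppa the
intent is that it becomes a single floating-point doubleword store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the two descriptor words overlaid with a double
   so that both can be written with one 8-byte store.  */
union fdesc_pair
{
  struct { uint32_t ip; uint32_t gp; } w;
  double d;
};

static_assert (sizeof (union fdesc_pair) == sizeof (double),
               "descriptor must be exactly one doubleword");

/* Build the new pair in a temporary, then copy it in one assignment.  */
static void
fdesc_update (volatile union fdesc_pair *desc, uint32_t ip, uint32_t gp)
{
  union fdesc_pair tmp;

  tmp.w.ip = ip;
  tmp.w.gp = gp;
  desc->d = tmp.d;   /* intended to be a single 8-byte store */
}
```

Descriptors from older links that are not 8-byte aligned cannot use
this path and still need two ordered word stores.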

With these changes, _dl_runtime_resolve() can fall back to an alternate
mechanism to find the relocation offset when it has been clobbered.
There's just one additional instruction in the fast path. I tested
the fallback function, _dl_fix_reloc_arg(), by changing the branch to
always use the fallback.  Old code still runs as it did before.
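
The fallback amounts to a linear search of the PLT relocations for the
entry that describes the descriptor held in %r22.  The following is a
simplified sketch, not the glibc code: struct plt_reloc and
find_reloc_offset are hypothetical stand-ins, and the relocation type
check is reduced to a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one PLT relocation entry.  */
struct plt_reloc
{
  uint32_t r_offset;   /* descriptor address relative to the load address */
  uint32_t is_iplt;    /* nonzero if this entry relocates a descriptor */
  int32_t r_addend;
};

/* Return the byte offset of the entry whose descriptor is FDESC_ADDR,
   or UINT32_MAX if no entry matches.  */
static uint32_t
find_reloc_offset (const struct plt_reloc *table, size_t count,
                   uint32_t load_addr, uint32_t fdesc_addr)
{
  assert (table != NULL || count == 0);

  for (size_t i = 0; i < count; i++)
    if (table[i].is_iplt && table[i].r_offset + load_addr == fdesc_addr)
      return (uint32_t) (i * sizeof (struct plt_reloc));

  return UINT32_MAX;
}
```

The search is linear in the number of PLT entries, which is acceptable
because it only runs when the race has actually occurred.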

Fixes bug 23296.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1a044511a3f9020c3f430164e0a6a77426fecd7e)
---
 NEWS                                          |  1 +
 sysdeps/hppa/dl-fptr.c                        | 26 +++++--
 sysdeps/hppa/dl-machine.h                     | 36 +++++++--
 sysdeps/hppa/dl-trampoline.S                  | 74 ++++++++++++++++---
 sysdeps/unix/sysv/linux/hppa/atomic-machine.h | 28 +++++++
 5 files changed, 143 insertions(+), 22 deletions(-)

diff --git a/NEWS b/NEWS
index 15c040a83ed..be2d8b1f617 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,7 @@ Version 2.31.1
 
 The following bugs are resolved with this release:
   [20543] Please move from .gnu.linkonce to comdat
+  [23296] Data race in setting function descriptor during lazy binding
   [25487] sinl() stack corruption from crafted input (CVE-2020-10029)
   [25523] MIPS/Linux inline syscall template is miscompiled
   [25623] test-sysvmsg, test-sysvsem, test-sysvshm fail with 2.31 on 32 bit and
diff --git a/sysdeps/hppa/dl-fptr.c b/sysdeps/hppa/dl-fptr.c
index 0a373972840..25ca8f84631 100644
--- a/sysdeps/hppa/dl-fptr.c
+++ b/sysdeps/hppa/dl-fptr.c
@@ -172,8 +172,8 @@ make_fdesc (ElfW(Addr) ip, ElfW(Addr) gp)
     }
 
  install:
-  fdesc->ip = ip;
   fdesc->gp = gp;
+  fdesc->ip = ip;
 
   return (ElfW(Addr)) fdesc;
 }
@@ -350,7 +350,9 @@ ElfW(Addr)
 _dl_lookup_address (const void *address)
 {
   ElfW(Addr) addr = (ElfW(Addr)) address;
-  unsigned int *desc, *gptr;
+  ElfW(Word) reloc_arg;
+  volatile unsigned int *desc;
+  unsigned int *gptr;
 
   /* Return ADDR if the least-significant two bits of ADDR are not consistent
      with ADDR being a linker defined function pointer.  The normal value for
@@ -367,7 +369,11 @@ _dl_lookup_address (const void *address)
   if (!_dl_read_access_allowed (desc))
     return addr;
 
-  /* Load first word of candidate descriptor.  It should be a pointer
+  /* First load the relocation offset.  */
+  reloc_arg = (ElfW(Word)) desc[1];
+  atomic_full_barrier();
+
+  /* Then load first word of candidate descriptor.  It should be a pointer
      with word alignment and point to memory that can be read.  */
   gptr = (unsigned int *) desc[0];
   if (((unsigned int) gptr & 3) != 0
@@ -377,8 +383,8 @@ _dl_lookup_address (const void *address)
   /* See if descriptor requires resolution.  The following trampoline is
      used in each global offset table for function resolution:
 
-		ldw 0(r20),r22
-		bv r0(r22)
+		ldw 0(r20),r21
+		bv r0(r21)
 		ldw 4(r20),r21
      tramp:	b,l .-12,r20
 		depwi 0,31,2,r20
@@ -389,7 +395,15 @@ _dl_lookup_address (const void *address)
   if (gptr[0] == 0xea9f1fdd			/* b,l .-12,r20     */
       && gptr[1] == 0xd6801c1e			/* depwi 0,31,2,r20 */
       && (ElfW(Addr)) gptr[2] == elf_machine_resolve ())
-    _dl_fixup ((struct link_map *) gptr[5], (ElfW(Word)) desc[1]);
+    {
+      struct link_map *l = (struct link_map *) gptr[5];
+
+      /* If gp has been resolved, we need to hunt for relocation offset.  */
+      if (!(reloc_arg & PA_GP_RELOC))
+	reloc_arg = _dl_fix_reloc_arg (addr, l);
+
+      _dl_fixup (l, reloc_arg);
+    }
 
   return (ElfW(Addr)) desc[0];
 }
diff --git a/sysdeps/hppa/dl-machine.h b/sysdeps/hppa/dl-machine.h
index 9e98366ea3b..8ecff97706f 100644
--- a/sysdeps/hppa/dl-machine.h
+++ b/sysdeps/hppa/dl-machine.h
@@ -48,6 +48,14 @@
 #define GOT_FROM_PLT_STUB (4*4)
 #define PLT_ENTRY_SIZE (2*4)
 
+/* The gp slot in the function descriptor contains the relocation offset
+   before resolution.  To distinguish between a resolved gp value and an
+   unresolved relocation offset we set an unused bit in the relocation
+   offset.  This would allow us to do a synchronized two word update
+   using this bit (interlocked update), but instead of waiting for the
+   update we simply recompute the gp value given that we know the ip.  */
+#define PA_GP_RELOC 1
+
 /* Initialize the function descriptor table before relocations */
 static inline void
 __hppa_init_bootstrap_fdesc_table (struct link_map *map)
@@ -117,10 +125,28 @@ elf_machine_fixup_plt (struct link_map *map, lookup_t t,
   volatile Elf32_Addr *rfdesc = reloc_addr;
   /* map is the link_map for the caller, t is the link_map for the object
      being called */
-  rfdesc[1] = value.gp;
-  /* Need to ensure that the gp is visible before the code
-     entry point is updated */
-  rfdesc[0] = value.ip;
+
+  /* We would like the function descriptor to be double word aligned.  This
+     helps performance (ip and gp then reside on the same cache line) and
+     we can update the pair atomically with a single store.  The linker
+     now ensures this alignment but we still have to handle old code.  */
+  if ((unsigned int)reloc_addr & 7)
+    {
+      /* Need to ensure that the gp is visible before the code
+         entry point is updated */
+      rfdesc[1] = value.gp;
+      atomic_full_barrier();
+      rfdesc[0] = value.ip;
+    }
+  else
+    {
+      /* Update pair atomically with floating point store.  */
+      union { ElfW(Word) v[2]; double d; } u;
+
+      u.v[0] = value.ip;
+      u.v[1] = value.gp;
+      *(volatile double *)rfdesc = u.d;
+    }
   return value;
 }
 
@@ -265,7 +291,7 @@ elf_machine_runtime_setup (struct link_map *l, int lazy, int profile)
 		     here.  The trampoline code will load the proper
 		     LTP and pass the reloc offset to the fixup
 		     function.  */
-		  fptr->gp = iplt - jmprel;
+		  fptr->gp = (iplt - jmprel) | PA_GP_RELOC;
 		} /* r_sym != 0 */
 	      else
 		{
diff --git a/sysdeps/hppa/dl-trampoline.S b/sysdeps/hppa/dl-trampoline.S
index 0114ca8b194..d0804b30c03 100644
--- a/sysdeps/hppa/dl-trampoline.S
+++ b/sysdeps/hppa/dl-trampoline.S
@@ -31,7 +31,7 @@
    slow down __cffc when it attempts to call fixup to resolve function
    descriptor references. Please refer to gcc/gcc/config/pa/fptr.c
 
-   Enter with r19 = reloc offset, r20 = got-8, r21 = fixup ltp.  */
+   Enter with r19 = reloc offset, r20 = got-8, r21 = fixup ltp, r22 = fp.  */
 
 	/* RELOCATION MARKER: bl to provide gcc's __cffc with fixup loc. */
 	.text
@@ -61,17 +61,20 @@ _dl_runtime_resolve:
 	copy	%sp, %r1	/* Copy previous sp */
 	/* Save function result address (on entry) */
 	stwm	%r28,128(%sp)
-	/* Fillin some frame info to follow ABI */
+	/* Fill in some frame info to follow ABI */
 	stw	%r1,-4(%sp)	/* Previous sp */
 	stw	%r21,-32(%sp)	/* PIC register value */
 
 	/* Save input floating point registers. This must be done
 	   in the new frame since the previous frame doesn't have
 	   enough space */
-	ldo	-56(%sp),%r1
+	ldo	-64(%sp),%r1
 	fstd,ma	%fr4,-8(%r1)
 	fstd,ma	%fr5,-8(%r1)
 	fstd,ma	%fr6,-8(%r1)
+
+	/* Test PA_GP_RELOC bit.  */
+	bb,>=	%r19,31,2f		/* branch if not reloc offset */
 	fstd,ma	%fr7,-8(%r1)
 
 	/* Set up args to fixup func, needs only two arguments  */
@@ -79,7 +82,7 @@ _dl_runtime_resolve:
 	copy	%r19,%r25		/* (2) reloc offset  */
 
 	/* Call the real address resolver. */
-	bl	_dl_fixup,%rp
+3:	bl	_dl_fixup,%rp
 	copy	%r21,%r19		/* set fixup func ltp */
 
 	/* While the linker will set a function pointer to NULL when it
@@ -102,7 +105,7 @@ _dl_runtime_resolve:
 	copy	%r29, %r19
 
 	/* Reload arguments fp args */
-	ldo	-56(%sp),%r1
+	ldo	-64(%sp),%r1
 	fldd,ma	-8(%r1),%fr4
 	fldd,ma	-8(%r1),%fr5
 	fldd,ma	-8(%r1),%fr6
@@ -129,6 +132,25 @@ _dl_runtime_resolve:
 	bv	%r0(%rp)
 	ldo	-128(%sp),%sp
 
+2:
+	/* Set up args for _dl_fix_reloc_arg.  */
+	copy	%r22,%r26		/* (1) function pointer */
+	depi	0,31,2,%r26		/* clear least significant bits */
+	ldw	8+4(%r20),%r25		/* (2) got[1] == struct link_map */
+
+	/* Save ltp and link map arg for _dl_fixup.  */
+	stw	%r21,-56(%sp)		/* ltp */
+	stw	%r25,-60(%sp)		/* struct link map */
+
+	/* Find reloc offset. */
+	bl	_dl_fix_reloc_arg,%rp
+	copy	%r21,%r19		/* set func ltp */
+
+	/* Set up args for _dl_fixup.  */
+	ldw	-56(%sp),%r21		/* ltp */
+	ldw	-60(%sp),%r26		/* (1) struct link map */
+	b	3b
+	copy	%ret0,%r25		/* (2) reloc offset */
         .EXIT
         .PROCEND
 	cfi_endproc
@@ -153,7 +175,7 @@ _dl_runtime_profile:
 	copy	%sp, %r1	/* Copy previous sp */
 	/* Save function result address (on entry) */
 	stwm	%r28,192(%sp)
-	/* Fillin some frame info to follow ABI */
+	/* Fill in some frame info to follow ABI */
 	stw	%r1,-4(%sp)	/* Previous sp */
 	stw	%r21,-32(%sp)	/* PIC register value */
 
@@ -181,10 +203,11 @@ _dl_runtime_profile:
 	fstd,ma	%fr5,8(%r1)
 	fstd,ma	%fr6,8(%r1)
 	fstd,ma	%fr7,8(%r1)
-	/* 32-bit stack pointer and return register */
-	stw	%sp,-56(%sp)
-	stw	%r2,-52(%sp)
 
+	/* Test PA_GP_RELOC bit.  */
+	bb,>=	%r19,31,2f		/* branch if not reloc offset */
+	/* 32-bit stack pointer */
+	stw	%sp,-56(%sp)
 
 	/* Set up args to fixup func, needs five arguments  */
 	ldw	8+4(%r20),%r26		/* (1) got[1] == struct link_map */
@@ -197,7 +220,7 @@ _dl_runtime_profile:
 	stw	%r1, -52(%sp)		/* (5) long int *framesizep */
 
 	/* Call the real address resolver. */
-	bl	_dl_profile_fixup,%rp
+3:	bl	_dl_profile_fixup,%rp
 	copy	%r21,%r19		/* set fixup func ltp */
 
 	/* Load up the returned function descriptor */
@@ -215,7 +238,9 @@ _dl_runtime_profile:
 	fldd,ma	8(%r1),%fr5
 	fldd,ma	8(%r1),%fr6
 	fldd,ma	8(%r1),%fr7
-	ldw	-52(%sp),%rp
+
+	/* Reload rp register -(192+20) without adjusting stack */
+	ldw	-212(%sp),%rp
 
 	/* Reload static link register -(192+16) without adjusting stack */
 	ldw	-208(%sp),%r29
@@ -303,6 +328,33 @@ L(cont):
         ldw -20(%sp),%rp
 	/* Return */
 	bv,n	0(%r2)
+
+2:
+	/* Set up args for _dl_fix_reloc_arg.  */
+	copy	%r22,%r26		/* (1) function pointer */
+	depi	0,31,2,%r26		/* clear least significant bits */
+	ldw	8+4(%r20),%r25		/* (2) got[1] == struct link_map */
+
+	/* Save ltp and link map arg for _dl_fixup.  */
+	stw	%r21,-92(%sp)		/* ltp */
+	stw	%r25,-116(%sp)		/* struct link map */
+
+	/* Find reloc offset. */
+	bl	_dl_fix_reloc_arg,%rp
+	copy	%r21,%r19		/* set func ltp */
+
+	 /* Restore fixup ltp.  */
+	ldw	-92(%sp),%r21		/* ltp */
+
+	/* Set up args to fixup func, needs five arguments  */
+	ldw	-116(%sp),%r26		/* (1) struct link map */
+	copy	%ret0,%r25		/* (2) reloc offset  */
+	stw	%r25,-120(%sp)		/* Save reloc offset */
+	ldw	-212(%sp),%r24		/* (3) profile_fixup needs rp */
+	ldo	-56(%sp),%r23		/* (4) La_hppa_regs */
+	ldo	-112(%sp), %r1
+	b	3b
+	stw	%r1, -52(%sp)		/* (5) long int *framesizep */
         .EXIT
         .PROCEND
 	cfi_endproc
diff --git a/sysdeps/unix/sysv/linux/hppa/atomic-machine.h b/sysdeps/unix/sysv/linux/hppa/atomic-machine.h
index 9d8ffbe8603..bf61b66b705 100644
--- a/sysdeps/unix/sysv/linux/hppa/atomic-machine.h
+++ b/sysdeps/unix/sysv/linux/hppa/atomic-machine.h
@@ -36,9 +36,37 @@ typedef uintptr_t uatomicptr_t;
 typedef intmax_t atomic_max_t;
 typedef uintmax_t uatomic_max_t;
 
+#define atomic_full_barrier() __sync_synchronize ()
+
 #define __HAVE_64B_ATOMICS 0
 #define USE_ATOMIC_COMPILER_BUILTINS 0
 
+/* We use the compiler atomic load and store builtins as the generic
+   defines are not atomic.  In particular, we need to use compare and
+   exchange for stores as the implementation is synthesized.  */
+void __atomic_link_error (void);
+#define __atomic_check_size_ls(mem) \
+ if ((sizeof (*mem) != 1) && (sizeof (*mem) != 2) && sizeof (*mem) != 4)    \
+   __atomic_link_error ();
+
+#define atomic_load_relaxed(mem) \
+ ({ __atomic_check_size_ls((mem));                                           \
+    __atomic_load_n ((mem), __ATOMIC_RELAXED); })
+#define atomic_load_acquire(mem) \
+ ({ __atomic_check_size_ls((mem));                                           \
+    __atomic_load_n ((mem), __ATOMIC_ACQUIRE); })
+
+#define atomic_store_relaxed(mem, val) \
+ do {                                                                        \
+   __atomic_check_size_ls((mem));                                            \
+   __atomic_store_n ((mem), (val), __ATOMIC_RELAXED);                        \
+ } while (0)
+#define atomic_store_release(mem, val) \
+ do {                                                                        \
+   __atomic_check_size_ls((mem));                                            \
+   __atomic_store_n ((mem), (val), __ATOMIC_RELEASE);                        \
+ } while (0)
+
 /* XXX Is this actually correct?  */
 #define ATOMIC_EXCHANGE_USES_CAS 1
 
-- 
2.26.2



* [2.31 COMMITTED 2/2] Add new file missed in previous hppa commit.
  2020-05-04 19:59 [2.31 COMMITTED 1/2] Fix data race in setting function descriptors during lazy binding on hppa Aurelien Jarno
@ 2020-05-04 19:59 ` Aurelien Jarno
  0 siblings, 0 replies; 2+ messages in thread
From: Aurelien Jarno @ 2020-05-04 19:59 UTC
  To: libc-stable; +Cc: John David Anglin

From: John David Anglin <danglin@gcc.gnu.org>

(cherry picked from commit acdcca72940e060270e4e54d9c0457398110f409)
---
 sysdeps/hppa/dl-runtime.c | 58 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 sysdeps/hppa/dl-runtime.c

diff --git a/sysdeps/hppa/dl-runtime.c b/sysdeps/hppa/dl-runtime.c
new file mode 100644
index 00000000000..885a3f1837c
--- /dev/null
+++ b/sysdeps/hppa/dl-runtime.c
@@ -0,0 +1,58 @@
+/* On-demand PLT fixup for shared objects.  HPPA version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+/* Clear PA_GP_RELOC bit in relocation offset.  */
+#define reloc_offset (reloc_arg & ~PA_GP_RELOC)
+#define reloc_index  (reloc_arg & ~PA_GP_RELOC) / sizeof (PLTREL)
+
+#include <elf/dl-runtime.c>
+
+/* The caller has encountered a partially relocated function descriptor.
+   The gp of the descriptor has been updated, but not the ip.  We find
+   the function descriptor again and compute the relocation offset and
+   return that to the caller.  The caller will continue on to call
+   _dl_fixup with the relocation offset.  */
+
+ElfW(Word)
+attribute_hidden __attribute ((noinline)) ARCH_FIXUP_ATTRIBUTE
+_dl_fix_reloc_arg (struct fdesc *fptr, struct link_map *l)
+{
+  Elf32_Addr l_addr, iplt, jmprel, end_jmprel, r_type;
+  const Elf32_Rela *reloc;
+
+  l_addr = l->l_addr;
+  jmprel = D_PTR(l, l_info[DT_JMPREL]);
+  end_jmprel = jmprel + l->l_info[DT_PLTRELSZ]->d_un.d_val;
+
+  /* Look for the entry...  */
+  for (iplt = jmprel; iplt < end_jmprel; iplt += sizeof (Elf32_Rela))
+    {
+      reloc = (const Elf32_Rela *) iplt;
+      r_type = ELF32_R_TYPE (reloc->r_info);
+
+      if (__builtin_expect (r_type == R_PARISC_IPLT, 1)
+	  && fptr == (struct fdesc *) (reloc->r_offset + l_addr))
+	/* Found entry. Return the reloc offset.  */
+	return iplt - jmprel;
+    }
+
+  /* Crash if we weren't passed a valid function pointer.  */
+  ABORT_INSTRUCTION;
+  return 0;
+}
-- 
2.26.2


