public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Check ld.so/libc.so consistency during startup
@ 2022-08-19 10:16 Florian Weimer
  2022-08-19 10:16 ` [PATCH 1/2] scripts/glibcelf.py: Add hashing support Florian Weimer
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Florian Weimer @ 2022-08-19 10:16 UTC (permalink / raw)
  To: libc-alpha

This avoids mysterious crashes and produces a clear error message
instead.

This helps somewhat with concurrent glibc updates on a live system
because it inhibits coredumps resulting from mismatches.  This avoids
secondary crashes from a coredump catching services that in turn crashes
during launch while the update is in progress.  A full solution to the
concurrent update problem will look very different.

Tested on i686-linux-gnu and x86_64-linux-gnu.  Built with
build-many-glibcs.py.

Thanks,
Florian

Florian Weimer (2):
  scripts/glibcelf.py: Add hashing support
  Detect ld.so and libc.so version inconsistency during startup

 INSTALL                         |  9 ++++
 Makerules                       | 14 ++++++
 NEWS                            |  7 ++-
 config.make.in                  |  1 +
 configure                       | 11 +++++
 configure.ac                    |  5 ++
 csu/libc-start.c                |  1 +
 elf/Versions                    |  4 +-
 elf/dl-call-libc-early-init.c   |  9 ++--
 elf/libc-early-init.h           |  9 +++-
 elf/tst-glibcelf.py             | 19 ++++++++
 manual/install.texi             |  9 ++++
 scripts/glibcelf.py             | 19 ++++++++
 scripts/libc_early_init_name.py | 85 +++++++++++++++++++++++++++++++++
 14 files changed, 194 insertions(+), 8 deletions(-)
 create mode 100644 scripts/libc_early_init_name.py


base-commit: f7b0fc5cc61301461e3c1a278240ce78701bb9a8
-- 
2.37.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] scripts/glibcelf.py: Add hashing support
  2022-08-19 10:16 [PATCH 0/2] Check ld.so/libc.so consistency during startup Florian Weimer
@ 2022-08-19 10:16 ` Florian Weimer
  2022-08-23 15:13   ` Carlos O'Donell
  2022-08-19 10:16 ` [PATCH 2/2] Detect ld.so and libc.so version inconsistency during startup Florian Weimer
  2022-08-23 15:14 ` [PATCH 0/2] Check ld.so/libc.so consistency " Carlos O'Donell
  2 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-08-19 10:16 UTC (permalink / raw)
  To: libc-alpha

ELF and GNU hashes can now be computed using the elf_hash and
gnu_hash functions.
---
 elf/tst-glibcelf.py | 19 +++++++++++++++++++
 scripts/glibcelf.py | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/elf/tst-glibcelf.py b/elf/tst-glibcelf.py
index bf15a3bad4..e5026e2289 100644
--- a/elf/tst-glibcelf.py
+++ b/elf/tst-glibcelf.py
@@ -240,6 +240,24 @@ def check_constant_values(cc):
             error('{}: glibcelf has {!r}, <elf.h> has {!r}'.format(
                 name, glibcelf_value, elf_h_value))
 
+def check_hashes():
+    for name, expected_elf, expected_gnu in (
+            ('', 0, 0x1505),
+            ('PPPPPPPPPPPP', 0, 0x9f105c45),
+            ('GLIBC_2.0', 0xd696910, 0xf66c3dd5),
+            ('GLIBC_2.34', 0x69691b4, 0xc3f3f90c),
+            ('GLIBC_PRIVATE', 0x963cf85, 0x692a260)):
+        for convert in (lambda x: x, lambda x: x.encode('UTF-8')):
+            name = convert(name)
+            actual_elf = glibcelf.elf_hash(name)
+            if actual_elf != expected_elf:
+                error('elf_hash({!r}): {:x} != 0x{:x}'.format(
+                    name, actual_elf, expected_elf))
+            actual_gnu = glibcelf.gnu_hash(name)
+            if actual_gnu != expected_gnu:
+                error('gnu_hash({!r}): {:x} != 0x{:x}'.format(
+                    name, actual_gnu, expected_gnu))
+
 def main():
     """The main entry point."""
     parser = argparse.ArgumentParser(
@@ -251,6 +269,7 @@ def main():
     check_duplicates()
     check_constant_prefixes()
     check_constant_values(cc=args.cc)
+    check_hashes()
 
     if errors_encountered > 0:
         print("note: errors encountered:", errors_encountered)
diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
index de0509130e..5c8f46f590 100644
--- a/scripts/glibcelf.py
+++ b/scripts/glibcelf.py
@@ -1158,5 +1158,24 @@ class Image:
         self._stringtab[sh_link] = strtab
         return strtab
 
+def elf_hash(s):
+    """Computes the ELF hash of the string."""
+    acc = 0
+    for ch in s:
+        if type(ch) is not int:
+            ch = ord(ch)
+        acc = ((acc << 4) + ch) & 0xffffffff
+        top = acc & 0xf0000000
+        acc = (acc ^ (top >> 24)) & ~top
+    return acc
+
+def gnu_hash(s):
+    """Computes the GNU hash of the string."""
+    h = 5381
+    for ch in s:
+        if type(ch) is not int:
+            ch = ord(ch)
+        h = (h * 33 + ch) & 0xffffffff
+    return h
 
 __all__ = [name for name in dir() if name[0].isupper()]
-- 
2.37.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/2] Detect ld.so and libc.so version inconsistency during startup
  2022-08-19 10:16 [PATCH 0/2] Check ld.so/libc.so consistency during startup Florian Weimer
  2022-08-19 10:16 ` [PATCH 1/2] scripts/glibcelf.py: Add hashing support Florian Weimer
@ 2022-08-19 10:16 ` Florian Weimer
  2022-08-23 15:13   ` Carlos O'Donell
  2022-08-23 15:14 ` [PATCH 0/2] Check ld.so/libc.so consistency " Carlos O'Donell
  2 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-08-19 10:16 UTC (permalink / raw)
  To: libc-alpha

The files NEWS, include/link.h, and sysdeps/generic/ldsodefs.h
contribute to the version fingerprint used for detection.  The
fingerprint can be further refined using the --with-extra-version-id
configure argument.
---
 INSTALL                         |  9 ++++
 Makerules                       | 14 ++++++
 NEWS                            |  7 ++-
 config.make.in                  |  1 +
 configure                       | 11 +++++
 configure.ac                    |  5 ++
 csu/libc-start.c                |  1 +
 elf/Versions                    |  4 +-
 elf/dl-call-libc-early-init.c   |  9 ++--
 elf/libc-early-init.h           |  9 +++-
 manual/install.texi             |  9 ++++
 scripts/libc_early_init_name.py | 85 +++++++++++++++++++++++++++++++++
 12 files changed, 156 insertions(+), 8 deletions(-)
 create mode 100644 scripts/libc_early_init_name.py

diff --git a/INSTALL b/INSTALL
index 659f75a97f..7ad8f96e63 100644
--- a/INSTALL
+++ b/INSTALL
@@ -120,6 +120,15 @@ if 'CFLAGS' is specified it must enable optimization.  For example:
      compiler flags which target a later instruction set architecture
      (ISA).
 
+'--with-extra-version-id=STRING'
+     Use STRING as part of the fingerprint that is used by the dynamic
+     linker to detect an incompatible version of 'libc.so'.  For
+     example, STRING could be the full package version and release
+     string used by a distribution build of the GNU C Library.  This
+     way, concurrent process creation during a package update will faill
+     with an error message, _Fatal glibc error: ld.so/libc.so mismatch
+     detected_, rather than crashing mysteriously.
+
 '--with-timeoutfactor=NUM'
      Specify an integer NUM to scale the timeout of test programs.  This
      factor can be changed at run time using 'TIMEOUTFACTOR' environment
diff --git a/Makerules b/Makerules
index d1e139d03c..756c1f181c 100644
--- a/Makerules
+++ b/Makerules
@@ -112,6 +112,20 @@ before-compile := $(common-objpfx)first-versions.h \
 		  $(common-objpfx)ldbl-compat-choose.h $(before-compile)
 $(common-objpfx)first-versions.h: $(common-objpfx)versions.stmp
 $(common-objpfx)ldbl-compat-choose.h: $(common-objpfx)versions.stmp
+
+# libc_early_init_name.h provides the actual name of the
+# __libc_early_init function.  It is used as a protocol version marker
+# between ld.so and libc.so
+before-compile := $(common-objpfx)libc_early_init_name.h $(before-compile)
+libc_early_init_name-deps = \
+  $(..)NEWS $(..)sysdeps/generic/ldsodefs.h $(..)include/link.h
+$(common-objpfx)libc_early_init_name.h: $(..)scripts/libc_early_init_name.py \
+  $(common-objpfx)config.make $(libc_early_init_name-deps)
+	$(PYTHON) $(..)scripts/libc_early_init_name.py \
+	  --output=$@T \
+	  --extra-version-id="$(extra-version-id)" \
+	  $(libc_early_init_name-deps)
+	$(move-if-change) $@T $@
 endif # avoid-generated
 endif # $(build-shared) = yes
 
diff --git a/NEWS b/NEWS
index f9bef48a8f..b8a9376e1e 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,12 @@ Version 2.37
 
 Major new features:
 
-  [Add new features here]
+* The dynamic loader now prints an error message, "Fatal glibc error:
+  ld.so/libc.so mismatch detected" if it detects that the version of
+  libc.so it loaded comes from a different build of glibc.  The new
+  configure option --with-extra-version-id can be used to specify an
+  arbitrary string that affects the comptuation of the version
+  fingerprint.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/config.make.in b/config.make.in
index d7c416cbea..ecaffbfd4b 100644
--- a/config.make.in
+++ b/config.make.in
@@ -98,6 +98,7 @@ build-hardcoded-path-in-tests= @hardcoded_path_in_tests@
 build-pt-chown = @build_pt_chown@
 have-tunables = @have_tunables@
 pthread-in-libc = @pthread_in_libc@
+extra-version-id = @extra_version_id@
 
 # Build tools.
 CC = @CC@
diff --git a/configure b/configure
index ff2c406b3b..c576f9f133 100755
--- a/configure
+++ b/configure
@@ -760,6 +760,7 @@ with_headers
 with_default_link
 with_nonshared_cflags
 with_rtld_early_cflags
+with_extra_version_id
 with_timeoutfactor
 enable_sanity_checks
 enable_shared
@@ -1481,6 +1482,9 @@ Optional Packages:
                           build nonshared libraries with additional CFLAGS
   --with-rtld-early-cflags=CFLAGS
                           build early initialization with additional CFLAGS
+  --extra-version-id=STRING
+                          specify an extra version string to use in internal
+                          ABI checks
   --with-timeoutfactor=NUM
                           specify an integer to scale the timeout
   --with-cpu=CPU          select code for CPU variant
@@ -3397,6 +3401,13 @@ fi
 
 
 
+# Check whether --with-extra-version-id was given.
+if test "${with_extra_version_id+set}" = set; then :
+  withval=$with_extra_version_id; extra_version_id="$withval"
+fi
+
+
+
 # Check whether --with-timeoutfactor was given.
 if test "${with_timeoutfactor+set}" = set; then :
   withval=$with_timeoutfactor; timeoutfactor=$withval
diff --git a/configure.ac b/configure.ac
index eb5bc6a131..68baeee4d7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -169,6 +169,11 @@ AC_ARG_WITH([rtld-early-cflags],
 	    [rtld_early_cflags=])
 AC_SUBST(rtld_early_cflags)
 
+AC_ARG_WITH([extra-version-id],
+	    AS_HELP_STRING([--extra-version-id=STRING],
+			   [specify an extra version string to use in internal ABI checks]),
+	    [extra_version_id="$withval"])
+
 AC_ARG_WITH([timeoutfactor],
 	    AS_HELP_STRING([--with-timeoutfactor=NUM],
 			   [specify an integer to scale the timeout]),
diff --git a/csu/libc-start.c b/csu/libc-start.c
index 543560f36c..522f14bbaf 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -37,6 +37,7 @@
 #include <elf-initfini.h>
 #include <shlib-compat.h>
 
+#include <elf/libc-early-init.h>
 #include <elf/dl-tunables.h>
 
 extern void __libc_init_first (int argc, char **argv, char **envp);
diff --git a/elf/Versions b/elf/Versions
index a9ff278de7..6260c0fe03 100644
--- a/elf/Versions
+++ b/elf/Versions
@@ -29,8 +29,8 @@ libc {
     __placeholder_only_for_empty_version_map;
   }
   GLIBC_PRIVATE {
-    # functions used in other libraries
-    __libc_early_init;
+    # A pattern is needed here because the suffix is dynamically generated.
+    __libc_early_init_*;
 
     # Internal error handling support.  Interposes the functions in ld.so.
     _dl_signal_exception; _dl_catch_exception;
diff --git a/elf/dl-call-libc-early-init.c b/elf/dl-call-libc-early-init.c
index ee9860e3ab..1d9436ad50 100644
--- a/elf/dl-call-libc-early-init.c
+++ b/elf/dl-call-libc-early-init.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <assert.h>
 #include <ldsodefs.h>
 #include <libc-early-init.h>
 #include <link.h>
@@ -30,11 +29,13 @@ _dl_call_libc_early_init (struct link_map *libc_map, _Bool initial)
     return;
 
   const ElfW(Sym) *sym
-    = _dl_lookup_direct (libc_map, "__libc_early_init",
-                         0x069682ac, /* dl_new_hash output.  */
+    = _dl_lookup_direct (libc_map, LIBC_EARLY_INIT_NAME_STRING,
+                         LIBC_EARLY_INIT_GNU_HASH,
                          "GLIBC_PRIVATE",
                          0x0963cf85); /* _dl_elf_hash output.  */
-  assert (sym != NULL);
+  if (sym == NULL)
+    _dl_fatal_printf (
+"Fatal glibc error: ld.so/libc.so mismatch detected\n");
   __typeof (__libc_early_init) *early_init
     = DL_SYMBOL_ADDRESS (libc_map, sym);
   early_init (initial);
diff --git a/elf/libc-early-init.h b/elf/libc-early-init.h
index a8edfadfb0..6774c3e7c0 100644
--- a/elf/libc-early-init.h
+++ b/elf/libc-early-init.h
@@ -19,6 +19,8 @@
 #ifndef _LIBC_EARLY_INIT_H
 #define _LIBC_EARLY_INIT_H
 
+#include <libc_early_init_name.h>
+
 struct link_map;
 
 /* If LIBC_MAP is not NULL, look up the __libc_early_init symbol in it
@@ -33,6 +35,11 @@ void _dl_call_libc_early_init (struct link_map *libc_map, _Bool initial)
    startup code.  If INITIAL is true, the libc being initialized is
    the libc for the main program.  INITIAL is false for libcs loaded
    for audit modules, dlmopen, and static dlopen.  */
-void __libc_early_init (_Bool initial);
+void __libc_early_init (_Bool initial)
+#ifdef SHARED
+/* Redirect to the actual implementation name.  */
+  __asm__ (LIBC_EARLY_INIT_NAME_STRING)
+#endif
+  ;
 
 #endif /* _LIBC_EARLY_INIT_H */
diff --git a/manual/install.texi b/manual/install.texi
index c775005581..8b225ab3bb 100644
--- a/manual/install.texi
+++ b/manual/install.texi
@@ -144,6 +144,15 @@ dynamic linker diagnostics to run on CPUs which are not compatible with
 the rest of @theglibc{}, for example, due to compiler flags which target
 a later instruction set architecture (ISA).
 
+@item --with-extra-version-id=@var{string}
+Use @var{string} as part of the fingerprint that is used by the dynamic
+linker to detect an incompatible version of @file{libc.so}.  For
+example, @var{string} could be the full package version and release
+string used by a distribution build of @theglibc{}.  This way,
+concurrent process creation during a package update will faill with an
+error message, @emph{Fatal glibc error: ld.so/libc.so mismatch detected},
+rather than crashing mysteriously.
+
 @item --with-timeoutfactor=@var{NUM}
 Specify an integer @var{NUM} to scale the timeout of test programs.
 This factor can be changed at run time using @env{TIMEOUTFACTOR}
diff --git a/scripts/libc_early_init_name.py b/scripts/libc_early_init_name.py
new file mode 100644
index 0000000000..465cbe9dae
--- /dev/null
+++ b/scripts/libc_early_init_name.py
@@ -0,0 +1,85 @@
+#!/usr/bin/python3
+# Compute the hash-based name of the __libc_early_init function.
+# Copyright (C) 2022 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+
+"""Compute the name of the __libc_early_init function, which is used
+as a protocol version marker between ld.so and libc.so.
+
+The name contains a hash suffix, and the hash changes if certain key
+files in the source tree change.  Distributions can also configure
+with --with-extra-version-id, to make the computed hash dependent on
+the package version.
+
+"""
+
+import argparse
+import hashlib
+import os
+import string
+import sys
+
+# Make available glibc Python modules.
+sys.path.append(os.path.dirname(os.path.realpath(__file__)))
+
+import glibcelf
+
+# Parse the command line.
+parser = argparse.ArgumentParser(description=__doc__)
+parser.add_argument('--output', metavar='PATH',
+                    help='path to header file this tool generates')
+parser.add_argument('--extra-version-id', metavar='ID',
+                    help='extra string to influence hash computation')
+parser.add_argument('inputs', metavar='PATH', nargs='*',
+                    help='files whose contents influences the generated hash')
+opts = parser.parse_args()
+
+# Obtain the blobs that affect the generated hash.
+blobs = [(opts.extra_version_id or '').encode('UTF-8')]
+for path in opts.inputs:
+    with open(path, 'rb') as inp:
+        blobs.append(inp.read())
+
+# Hash the file boundaries.
+md = hashlib.sha256()
+md.update(repr([len(blob) for blob in blobs]).encode('UTF-8'))
+
+# And then hash the file contents.  Do not hash the paths, to avoid
+# impacting reproducibility.
+for blob in blobs:
+    md.update(blob)
+
+# These are the bits used to compute the suffix.
+derived_bits = int.from_bytes(md.digest(), byteorder='big', signed=False)
+
+# These digits are used in the suffix (should result in base-62 encoding).
+# They must be valid in C identifiers.
+digits = string.digits + string.ascii_letters
+
+# Generate eight digits as a suffix.  They should provide enough
+# uniqueness (47.6 bits).
+name = '__libc_early_init_'
+for n in range(8):
+    name += digits[derived_bits % len(digits)]
+    derived_bits //= len(digits)
+
+# Write the output file.
+with open(opts.output, 'w') if opts.output else sys.stdout as out:
+    out.write('#define LIBC_EARLY_INIT_NAME {}\n'.format(name))
+    out.write('#define LIBC_EARLY_INIT_NAME_STRING "{}"\n'.format(name))
+    out.write('#define LIBC_EARLY_INIT_GNU_HASH {}\n'.format(
+        glibcelf.gnu_hash(name)))
-- 
2.37.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Detect ld.so and libc.so version inconsistency during startup
  2022-08-19 10:16 ` [PATCH 2/2] Detect ld.so and libc.so version inconsistency during startup Florian Weimer
@ 2022-08-23 15:13   ` Carlos O'Donell
  0 siblings, 0 replies; 7+ messages in thread
From: Carlos O'Donell @ 2022-08-23 15:13 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 8/19/22 06:16, Florian Weimer via Libc-alpha wrote:
> The files NEWS, include/link.h, and sysdeps/generic/ldsodefs.h
> contribute to the version fingerprint used for detection.  The
> fingerprint can be further refined using the --with-extra-version-id
> configure argument.

It makes sense to use link.h, ldsodefs.h because they define the interface between
ld.so and libc. NEWS and the additional --with-extra-version-id provide further
differentiation, which allows per-distro NEVRA bumping to ensure this works
reliably during upgrade.

This solution will fail process starts during the upgrade window where there
is a mismatch and the system will need to restart those processes.

Needs a v2 to fix typos.

Needs a v2 to fix the implementation.

This solution doesn't quite work and I haven't debugged it.

Works:
[carlos@athas glibc-review]$ ./elf/ld.so --library-path /lib64 /bin/true
Fatal glibc error: ld.so/libc.so mismatch detected

Fails:
[carlos@athas glibc-review]$ ./elf/ld.so --library-path /lib64 /bin/bash
Segmentation fault (core dumped)

I would expect the same mismatch detected message. This is possibly as
difficult a problem as the early ISA detection diagnostic since it can
involve resolving early ifuncs and running that code early means running
mismatched ld.so/libc ifuncs.

LD_DEBUG=all shows:
    886519:	symbol=gettimeofday;  lookup in file=/bin/bash [0]
    886519:	symbol=gettimeofday;  lookup in file=/lib64/libtinfo.so.6 [0]
    886519:	symbol=gettimeofday;  lookup in file=/lib64/libc.so.6 [0]
Segmentation fault (core dumped)

I'm not sure what is going wrong here. Early vDSO setup? Early ifunc?

> ---
>  INSTALL                         |  9 ++++
>  Makerules                       | 14 ++++++
>  NEWS                            |  7 ++-
>  config.make.in                  |  1 +
>  configure                       | 11 +++++
>  configure.ac                    |  5 ++
>  csu/libc-start.c                |  1 +
>  elf/Versions                    |  4 +-
>  elf/dl-call-libc-early-init.c   |  9 ++--
>  elf/libc-early-init.h           |  9 +++-
>  manual/install.texi             |  9 ++++
>  scripts/libc_early_init_name.py | 85 +++++++++++++++++++++++++++++++++
>  12 files changed, 156 insertions(+), 8 deletions(-)
>  create mode 100644 scripts/libc_early_init_name.py
> 
> diff --git a/INSTALL b/INSTALL
> index 659f75a97f..7ad8f96e63 100644
> --- a/INSTALL
> +++ b/INSTALL
> @@ -120,6 +120,15 @@ if 'CFLAGS' is specified it must enable optimization.  For example:
>       compiler flags which target a later instruction set architecture
>       (ISA).
>  
> +'--with-extra-version-id=STRING'
> +     Use STRING as part of the fingerprint that is used by the dynamic
> +     linker to detect an incompatible version of 'libc.so'.  For
> +     example, STRING could be the full package version and release
> +     string used by a distribution build of the GNU C Library.  This
> +     way, concurrent process creation during a package update will faill

s/fail/fail/g

> +     with an error message, _Fatal glibc error: ld.so/libc.so mismatch
> +     detected_, rather than crashing mysteriously.

OK.

> +
>  '--with-timeoutfactor=NUM'
>       Specify an integer NUM to scale the timeout of test programs.  This
>       factor can be changed at run time using 'TIMEOUTFACTOR' environment
> diff --git a/Makerules b/Makerules
> index d1e139d03c..756c1f181c 100644
> --- a/Makerules
> +++ b/Makerules
> @@ -112,6 +112,20 @@ before-compile := $(common-objpfx)first-versions.h \
>  		  $(common-objpfx)ldbl-compat-choose.h $(before-compile)
>  $(common-objpfx)first-versions.h: $(common-objpfx)versions.stmp
>  $(common-objpfx)ldbl-compat-choose.h: $(common-objpfx)versions.stmp
> +
> +# libc_early_init_name.h provides the actual name of the
> +# __libc_early_init function.  It is used as a protocol version marker
> +# between ld.so and libc.so

OK. Good comment.

> +before-compile := $(common-objpfx)libc_early_init_name.h $(before-compile)
> +libc_early_init_name-deps = \
> +  $(..)NEWS $(..)sysdeps/generic/ldsodefs.h $(..)include/link.h
> +$(common-objpfx)libc_early_init_name.h: $(..)scripts/libc_early_init_name.py \
> +  $(common-objpfx)config.make $(libc_early_init_name-deps)
> +	$(PYTHON) $(..)scripts/libc_early_init_name.py \
> +	  --output=$@T \
> +	  --extra-version-id="$(extra-version-id)" \
> +	  $(libc_early_init_name-deps)
> +	$(move-if-change) $@T $@

OK.

>  endif # avoid-generated
>  endif # $(build-shared) = yes
>  
> diff --git a/NEWS b/NEWS
> index f9bef48a8f..b8a9376e1e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,12 @@ Version 2.37
>  
>  Major new features:
>  
> -  [Add new features here]
> +* The dynamic loader now prints an error message, "Fatal glibc error:
> +  ld.so/libc.so mismatch detected" if it detects that the version of
> +  libc.so it loaded comes from a different build of glibc.  The new
> +  configure option --with-extra-version-id can be used to specify an
> +  arbitrary string that affects the comptuation of the version
> +  fingerprint.

OK.

>  
>  Deprecated and removed features, and other changes affecting compatibility:
>  
> diff --git a/config.make.in b/config.make.in
> index d7c416cbea..ecaffbfd4b 100644
> --- a/config.make.in
> +++ b/config.make.in
> @@ -98,6 +98,7 @@ build-hardcoded-path-in-tests= @hardcoded_path_in_tests@
>  build-pt-chown = @build_pt_chown@
>  have-tunables = @have_tunables@
>  pthread-in-libc = @pthread_in_libc@
> +extra-version-id = @extra_version_id@

OK.

>  
>  # Build tools.
>  CC = @CC@
> diff --git a/configure b/configure
> index ff2c406b3b..c576f9f133 100755
> --- a/configure
> +++ b/configure
> @@ -760,6 +760,7 @@ with_headers
>  with_default_link
>  with_nonshared_cflags
>  with_rtld_early_cflags
> +with_extra_version_id
>  with_timeoutfactor
>  enable_sanity_checks
>  enable_shared
> @@ -1481,6 +1482,9 @@ Optional Packages:
>                            build nonshared libraries with additional CFLAGS
>    --with-rtld-early-cflags=CFLAGS
>                            build early initialization with additional CFLAGS
> +  --extra-version-id=STRING
> +                          specify an extra version string to use in internal
> +                          ABI checks

OK.

>    --with-timeoutfactor=NUM
>                            specify an integer to scale the timeout
>    --with-cpu=CPU          select code for CPU variant
> @@ -3397,6 +3401,13 @@ fi
>  
>  
>  
> +# Check whether --with-extra-version-id was given.
> +if test "${with_extra_version_id+set}" = set; then :
> +  withval=$with_extra_version_id; extra_version_id="$withval"
> +fi
> +
> +
> +
>  # Check whether --with-timeoutfactor was given.
>  if test "${with_timeoutfactor+set}" = set; then :
>    withval=$with_timeoutfactor; timeoutfactor=$withval
> diff --git a/configure.ac b/configure.ac
> index eb5bc6a131..68baeee4d7 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -169,6 +169,11 @@ AC_ARG_WITH([rtld-early-cflags],
>  	    [rtld_early_cflags=])
>  AC_SUBST(rtld_early_cflags)
>  
> +AC_ARG_WITH([extra-version-id],
> +	    AS_HELP_STRING([--extra-version-id=STRING],
> +			   [specify an extra version string to use in internal ABI checks]),
> +	    [extra_version_id="$withval"])
> +

OK.

>  AC_ARG_WITH([timeoutfactor],
>  	    AS_HELP_STRING([--with-timeoutfactor=NUM],
>  			   [specify an integer to scale the timeout]),
> diff --git a/csu/libc-start.c b/csu/libc-start.c
> index 543560f36c..522f14bbaf 100644
> --- a/csu/libc-start.c
> +++ b/csu/libc-start.c
> @@ -37,6 +37,7 @@
>  #include <elf-initfini.h>
>  #include <shlib-compat.h>
>  
> +#include <elf/libc-early-init.h>

OK.

>  #include <elf/dl-tunables.h>
>  
>  extern void __libc_init_first (int argc, char **argv, char **envp);
> diff --git a/elf/Versions b/elf/Versions
> index a9ff278de7..6260c0fe03 100644
> --- a/elf/Versions
> +++ b/elf/Versions
> @@ -29,8 +29,8 @@ libc {
>      __placeholder_only_for_empty_version_map;
>    }
>    GLIBC_PRIVATE {
> -    # functions used in other libraries
> -    __libc_early_init;
> +    # A pattern is needed here because the suffix is dynamically generated.
> +    __libc_early_init_*;

OK.

I see this in the final build:

   158: 0000000000148ed0   185 FUNC    GLOBAL DEFAULT   10 __libc_early_init_iw1yORkW@@GLIBC_PRIVATE

>  
>      # Internal error handling support.  Interposes the functions in ld.so.
>      _dl_signal_exception; _dl_catch_exception;
> diff --git a/elf/dl-call-libc-early-init.c b/elf/dl-call-libc-early-init.c
> index ee9860e3ab..1d9436ad50 100644
> --- a/elf/dl-call-libc-early-init.c
> +++ b/elf/dl-call-libc-early-init.c
> @@ -16,7 +16,6 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <assert.h>
>  #include <ldsodefs.h>
>  #include <libc-early-init.h>
>  #include <link.h>
> @@ -30,11 +29,13 @@ _dl_call_libc_early_init (struct link_map *libc_map, _Bool initial)
>      return;
>  
>    const ElfW(Sym) *sym
> -    = _dl_lookup_direct (libc_map, "__libc_early_init",
> -                         0x069682ac, /* dl_new_hash output.  */
> +    = _dl_lookup_direct (libc_map, LIBC_EARLY_INIT_NAME_STRING,
> +                         LIBC_EARLY_INIT_GNU_HASH,
>                           "GLIBC_PRIVATE",

OK.

>                           0x0963cf85); /* _dl_elf_hash output.  */
> -  assert (sym != NULL);
> +  if (sym == NULL)
> +    _dl_fatal_printf (
> +"Fatal glibc error: ld.so/libc.so mismatch detected\n");

OK. Look for the new version-specific symbol in libc.

>    __typeof (__libc_early_init) *early_init
>      = DL_SYMBOL_ADDRESS (libc_map, sym);
>    early_init (initial);
> diff --git a/elf/libc-early-init.h b/elf/libc-early-init.h
> index a8edfadfb0..6774c3e7c0 100644
> --- a/elf/libc-early-init.h
> +++ b/elf/libc-early-init.h
> @@ -19,6 +19,8 @@
>  #ifndef _LIBC_EARLY_INIT_H
>  #define _LIBC_EARLY_INIT_H
>  
> +#include <libc_early_init_name.h>
> +
>  struct link_map;
>  
>  /* If LIBC_MAP is not NULL, look up the __libc_early_init symbol in it
> @@ -33,6 +35,11 @@ void _dl_call_libc_early_init (struct link_map *libc_map, _Bool initial)
>     startup code.  If INITIAL is true, the libc being initialized is
>     the libc for the main program.  INITIAL is false for libcs loaded
>     for audit modules, dlmopen, and static dlopen.  */
> -void __libc_early_init (_Bool initial);
> +void __libc_early_init (_Bool initial)
> +#ifdef SHARED
> +/* Redirect to the actual implementation name.  */
> +  __asm__ (LIBC_EARLY_INIT_NAME_STRING)

OK.

> +#endif
> +  ;
>  
>  #endif /* _LIBC_EARLY_INIT_H */
> diff --git a/manual/install.texi b/manual/install.texi
> index c775005581..8b225ab3bb 100644
> --- a/manual/install.texi
> +++ b/manual/install.texi
> @@ -144,6 +144,15 @@ dynamic linker diagnostics to run on CPUs which are not compatible with
>  the rest of @theglibc{}, for example, due to compiler flags which target
>  a later instruction set architecture (ISA).
>  
> +@item --with-extra-version-id=@var{string}
> +Use @var{string} as part of the fingerprint that is used by the dynamic
> +linker to detect an incompatible version of @file{libc.so}.  For
> +example, @var{string} could be the full package version and release
> +string used by a distribution build of @theglibc{}.  This way,
> +concurrent process creation during a package update will faill with an

s/faill/fail/g

> +error message, @emph{Fatal glibc error: ld.so/libc.so mismatch detected},
> +rather than crashing mysteriously.
> +
>  @item --with-timeoutfactor=@var{NUM}
>  Specify an integer @var{NUM} to scale the timeout of test programs.
>  This factor can be changed at run time using @env{TIMEOUTFACTOR}
> diff --git a/scripts/libc_early_init_name.py b/scripts/libc_early_init_name.py
> new file mode 100644
> index 0000000000..465cbe9dae
> --- /dev/null
> +++ b/scripts/libc_early_init_name.py
> @@ -0,0 +1,85 @@
> +#!/usr/bin/python3
> +# Compute the hash-based name of the __libc_early_init function.
> +# Copyright (C) 2022 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +"""Compute the name of the __libc_early_init function, which is used
> +as a protocol version marker between ld.so and libc.so.
> +
> +The name contains a hash suffix, and the hash changes if certain key
> +files in the source tree change.  Distributions can also configure
> +with --with-extra-version-id, to make the computed hash dependent on
> +the package version.

OK.

> +
> +"""
> +
> +import argparse
> +import hashlib
> +import os
> +import string
> +import sys
> +
> +# Make available glibc Python modules.
> +sys.path.append(os.path.dirname(os.path.realpath(__file__)))
> +
> +import glibcelf
> +
> +# Parse the command line.
> +parser = argparse.ArgumentParser(description=__doc__)
> +parser.add_argument('--output', metavar='PATH',
> +                    help='path to header file this tool generates')
> +parser.add_argument('--extra-version-id', metavar='ID',
> +                    help='extra string to influence hash computation')
> +parser.add_argument('inputs', metavar='PATH', nargs='*',
> +                    help='files whose contents influences the generated hash')
> +opts = parser.parse_args()
> +
> +# Obtain the blobs that affect the generated hash.
> +blobs = [(opts.extra_version_id or '').encode('UTF-8')]
> +for path in opts.inputs:
> +    with open(path, 'rb') as inp:
> +        blobs.append(inp.read())
> +
> +# Hash the file boundaries.
> +md = hashlib.sha256()
> +md.update(repr([len(blob) for blob in blobs]).encode('UTF-8'))
> +
> +# And then hash the file contents.  Do not hash the paths, to avoid
> +# impacting reproducibility.
> +for blob in blobs:
> +    md.update(blob)
> +
> +# These are the bits used to compute the suffix.
> +derived_bits = int.from_bytes(md.digest(), byteorder='big', signed=False)
> +
> +# These digits are used in the suffix (should result in base-62 encoding).
> +# They must be valid in C identifiers.
> +digits = string.digits + string.ascii_letters
> +
> +# Generate eight digits as a suffix.  They should provide enough
> +# uniqueness (47.6 bits).
> +name = '__libc_early_init_'
> +for n in range(8):
> +    name += digits[derived_bits % len(digits)]
> +    derived_bits //= len(digits)
> +
> +# Write the output file.
> +with open(opts.output, 'w') if opts.output else sys.stdout as out:
> +    out.write('#define LIBC_EARLY_INIT_NAME {}\n'.format(name))
> +    out.write('#define LIBC_EARLY_INIT_NAME_STRING "{}"\n'.format(name))
> +    out.write('#define LIBC_EARLY_INIT_GNU_HASH {}\n'.format(
> +        glibcelf.gnu_hash(name)))


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scripts/glibcelf.py: Add hashing support
  2022-08-19 10:16 ` [PATCH 1/2] scripts/glibcelf.py: Add hashing support Florian Weimer
@ 2022-08-23 15:13   ` Carlos O'Donell
  2022-08-23 17:34     ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Carlos O'Donell @ 2022-08-23 15:13 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 8/19/22 06:16, Florian Weimer via Libc-alpha wrote:
> ELF and GNU hashes can now be computed using the elf_hash and
> gnu_hash functions.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  elf/tst-glibcelf.py | 19 +++++++++++++++++++
>  scripts/glibcelf.py | 19 +++++++++++++++++++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/elf/tst-glibcelf.py b/elf/tst-glibcelf.py
> index bf15a3bad4..e5026e2289 100644
> --- a/elf/tst-glibcelf.py
> +++ b/elf/tst-glibcelf.py
> @@ -240,6 +240,24 @@ def check_constant_values(cc):
>              error('{}: glibcelf has {!r}, <elf.h> has {!r}'.format(
>                  name, glibcelf_value, elf_h_value))
>  
> +def check_hashes():
> +    for name, expected_elf, expected_gnu in (
> +            ('', 0, 0x1505),
> +            ('PPPPPPPPPPPP', 0, 0x9f105c45),
> +            ('GLIBC_2.0', 0xd696910, 0xf66c3dd5),
> +            ('GLIBC_2.34', 0x69691b4, 0xc3f3f90c),
> +            ('GLIBC_PRIVATE', 0x963cf85, 0x692a260)):
> +        for convert in (lambda x: x, lambda x: x.encode('UTF-8')):
> +            name = convert(name)
> +            actual_elf = glibcelf.elf_hash(name)
> +            if actual_elf != expected_elf:
> +                error('elf_hash({!r}): {:x} != 0x{:x}'.format(
> +                    name, actual_elf, expected_elf))
> +            actual_gnu = glibcelf.gnu_hash(name)
> +            if actual_gnu != expected_gnu:
> +                error('gnu_hash({!r}): {:x} != 0x{:x}'.format(
> +                    name, actual_gnu, expected_gnu))
> +
>  def main():
>      """The main entry point."""
>      parser = argparse.ArgumentParser(
> @@ -251,6 +269,7 @@ def main():
>      check_duplicates()
>      check_constant_prefixes()
>      check_constant_values(cc=args.cc)
> +    check_hashes()
>  
>      if errors_encountered > 0:
>          print("note: errors encountered:", errors_encountered)
> diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
> index de0509130e..5c8f46f590 100644
> --- a/scripts/glibcelf.py
> +++ b/scripts/glibcelf.py
> @@ -1158,5 +1158,24 @@ class Image:
>          self._stringtab[sh_link] = strtab
>          return strtab
>  
> +def elf_hash(s):
> +    """Computes the ELF hash of the string."""
> +    acc = 0
> +    for ch in s:
> +        if type(ch) is not int:
> +            ch = ord(ch)
> +        acc = ((acc << 4) + ch) & 0xffffffff
> +        top = acc & 0xf0000000
> +        acc = (acc ^ (top >> 24)) & ~top
> +    return acc
> +
> +def gnu_hash(s):
> +    """Computes the GNU hash of the string."""
> +    h = 5381
> +    for ch in s:
> +        if type(ch) is not int:
> +            ch = ord(ch)
> +        h = (h * 33 + ch) & 0xffffffff
> +    return h
>  
>  __all__ = [name for name in dir() if name[0].isupper()]


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] Check ld.so/libc.so consistency during startup
  2022-08-19 10:16 [PATCH 0/2] Check ld.so/libc.so consistency during startup Florian Weimer
  2022-08-19 10:16 ` [PATCH 1/2] scripts/glibcelf.py: Add hashing support Florian Weimer
  2022-08-19 10:16 ` [PATCH 2/2] Detect ld.so and libc.so version inconsistency during startup Florian Weimer
@ 2022-08-23 15:14 ` Carlos O'Donell
  2 siblings, 0 replies; 7+ messages in thread
From: Carlos O'Donell @ 2022-08-23 15:14 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 8/19/22 06:16, Florian Weimer via Libc-alpha wrote:
> This avoids mysterious crashes and produces a clear error message
> instead.

The high-level idea here makes complete sense to me.

In my opinion the gold standard would be:

- If the interface between libc and ld.so was more formally defined then we would
  have a way to know if ld.so could use the the libc that the system was providing.

- The new ld.so is always installed first and knows about past libc interfaces and
  could choose to use a slightly older interface if possible.

We don't have that today, and I am not asking for you to implement this.

What I want to make sure is that what we implement retains some of the invariants
that such a design would have:

- ld.so is always installed first is generally maintained in rpm-based installs
  because of the file name ordering ld.so < libc.so
  - True for Fedora testing.

- ld.so needs to know if libc is new enough that it can work together.
  - Your patch uses a known symbol with a hash as the discriminator.

- ld.so provides better diagnostics if the requirements cannot be met.
  - Your patch provides a diagnostic.

I think your patch meets all of these conditions.

Where we loose a little bit here is that if a developer is making a non-ABI
change to the interface, and just rebuilding ld.so, this will immediately cause
that ld.so not to work with the libc in the build. This is a limited use case
that we can fix if developers report that their workloads are impacted. We
might change how to enable/disable this feature.

Of key importance is going to be the quality of the diagnostic message here
because that will be the first thing users see when they try to use ld.so
from outside of a container with a libc inside a container, and what used
to work by sheer luck may now fail with a diagnostic because of non-ABI
related changes.

We have always said this was unsupported, and now we try to enforce this,
and I can stand behind that technical enforcement.
 
> This helps somewhat with concurrent glibc updates on a live system
> because it inhibits coredumps resulting from mismatches.  This avoids
> secondary crashes from a coredump catching services that in turn crashes
> during launch while the update is in progress.  A full solution to the
> concurrent update problem will look very different.

Agreed.

> Tested on i686-linux-gnu and x86_64-linux-gnu.  Built with
> build-many-glibcs.py.
> 
> Thanks,
> Florian
> 
> Florian Weimer (2):
>   scripts/glibcelf.py: Add hashing support
>   Detect ld.so and libc.so version inconsistency during startup
> 
>  INSTALL                         |  9 ++++
>  Makerules                       | 14 ++++++
>  NEWS                            |  7 ++-
>  config.make.in                  |  1 +
>  configure                       | 11 +++++
>  configure.ac                    |  5 ++
>  csu/libc-start.c                |  1 +
>  elf/Versions                    |  4 +-
>  elf/dl-call-libc-early-init.c   |  9 ++--
>  elf/libc-early-init.h           |  9 +++-
>  elf/tst-glibcelf.py             | 19 ++++++++
>  manual/install.texi             |  9 ++++
>  scripts/glibcelf.py             | 19 ++++++++
>  scripts/libc_early_init_name.py | 85 +++++++++++++++++++++++++++++++++
>  14 files changed, 194 insertions(+), 8 deletions(-)
>  create mode 100644 scripts/libc_early_init_name.py
> 
> 
> base-commit: f7b0fc5cc61301461e3c1a278240ce78701bb9a8


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scripts/glibcelf.py: Add hashing support
  2022-08-23 15:13   ` Carlos O'Donell
@ 2022-08-23 17:34     ` Florian Weimer
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Weimer @ 2022-08-23 17:34 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

* Carlos O'Donell:

> On 8/19/22 06:16, Florian Weimer via Libc-alpha wrote:
>> ELF and GNU hashes can now be computed using the elf_hash and
>> gnu_hash functions.
>
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> Tested-by: Carlos O'Donell <carlos@redhat.com>

I've pushed this separately.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-08-23 17:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19 10:16 [PATCH 0/2] Check ld.so/libc.so consistency during startup Florian Weimer
2022-08-19 10:16 ` [PATCH 1/2] scripts/glibcelf.py: Add hashing support Florian Weimer
2022-08-23 15:13   ` Carlos O'Donell
2022-08-23 17:34     ` Florian Weimer
2022-08-19 10:16 ` [PATCH 2/2] Detect ld.so and libc.so version inconsistency during startup Florian Weimer
2022-08-23 15:13   ` Carlos O'Donell
2022-08-23 15:14 ` [PATCH 0/2] Check ld.so/libc.so consistency " Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).