public inbox for systemtap@sourceware.org
From: Gioele Barabucci <gioele@svario.it>
To: systemtap@sourceware.org
Subject: [PATCH] dtrace: Use deterministic temp file creation for all temp files
Date: Mon, 27 Feb 2023 13:13:16 +0100
Message-ID: <6bfff40c-2c4b-c119-116d-7834310299d7@svario.it>

`dtrace -G -C` creates temporary files with random filenames. The names
of these temporary files get embedded in the ELF `.symtab` of the final
object files, so the object files differ slightly on every run.

This behavior makes all packages that use `dtrace`-produced object files
inherently non-reproducible.

To reproduce this issue:

```
$ git clone https://salsa.debian.org/sssd-team/sssd.git
$ cd sssd
$ mkdir -p build && cd build/

$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym1.txt

$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym2.txt

$ diff -u sym1.txt sym2.txt
--- sym1.txt    2023-02-27 08:38:48.955299234 +0100
+++ sym2.txt    2023-02-27 08:38:49.103303352 +0100
@@ -2,7 +2,7 @@
  Symbol table '.symtab' contains 59 entries:
  Num:    Value  Size Type    Bind   Vis      Ndx Name
    0: 0000000000   0 NOTYPE  LOCAL  DEFAULT  UND
-  1: 0000000000   0 FILE    LOCAL  DEFAULT  ABS .dtrace-temp.4f0bbdda.c
+  1: 0000000000   0 FILE    LOCAL  DEFAULT  ABS .dtrace-temp.d20e76c7.c
    2: 0000000000   0 SECTION LOCAL  DEFAULT    1 .text
    3: 0000000000   7 FUNC    LOCAL  DEFAULT    1 __dtrace
    4: 0000000000   0 SECTION LOCAL  DEFAULT    5 .debug_info
```

The root cause of this issue is that, although the name of the temporary
".c" file is created in a deterministic way (from the SHA256 of the
source and output file names), the name of the source file is overwritten
with a random name when the `-C` option (`use_cpp`) is used:

```
if s_filename != "" and use_cpp:
     (ignore, fname) = mkstemp(suffix=".d")
     cpp = os.environ.get("CPP", "cpp")
     retcode = call(split(cpp) + [...] + [s_filename, '-o', fname])
     if retcode != 0:
         print("\"cpp includes s_filename\" failed")
         usage()
         return 1
     s_filename = fname

[...]

sha = hashlib.sha256()
sha.update(s_filename.encode('utf-8'))
sha.update(filename.encode('utf-8'))
fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
```
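
The effect can be shown in isolation with a minimal sketch. The helper
`temp_c_name` and the sample file names are hypothetical and not part of
`dtrace.in`; only the hashing steps mirror the excerpt above, and the two
`mkstemp` calls stand in for what happens to `s_filename` when `-C` is used.

```python
import hashlib
import os
from tempfile import mkstemp


def temp_c_name(s_filename, filename):
    # Hypothetical helper: same hashing steps as the excerpt above.
    sha = hashlib.sha256()
    sha.update(s_filename.encode('utf-8'))
    sha.update(filename.encode('utf-8'))
    return ".dtrace-temp." + sha.hexdigest()[:8] + ".c"


# Without -C: s_filename is the user-supplied path, so the name is stable.
stable_a = temp_c_name("sssd_probes.d", "stap_generated_probes")
stable_b = temp_c_name("sssd_probes.d", "stap_generated_probes")

# With -C: s_filename is replaced by a random mkstemp() name before hashing.
# Both files are kept open together so the two names are guaranteed to differ.
(fd1, random_src1) = mkstemp(suffix=".d")
(fd2, random_src2) = mkstemp(suffix=".d")
random_a = temp_c_name(random_src1, "stap_generated_probes")
random_b = temp_c_name(random_src2, "stap_generated_probes")

# Clean up the stand-in temporary files.
for fd, path in ((fd1, random_src1), (fd2, random_src2)):
    os.close(fd)
    os.remove(path)

print("stable names match:", stable_a == stable_b)
print("random names match:", random_a == random_b)
```

The hash itself is deterministic; the variation comes entirely from its
input, which is why the random ".d" name ends up in the `.symtab` entry.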

To fix this issue, all temporary files are now created using
the same deterministic procedure currently used only for the
temporary ".c" files.

Fixes: https://bugs.debian.org/1032055
Fixes: https://bugs.debian.org/1032056
Signed-off-by: Gioele Barabucci <gioele@svario.it>
---
  dtrace.in | 50 +++++++++++++++++++++++++++-----------------------
  1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/dtrace.in b/dtrace.in
index adad99bdb..22c1a9d03 100644
--- a/dtrace.in
+++ b/dtrace.in
@@ -27,7 +27,6 @@ import time
  import atexit
  from shlex import split
  from subprocess import call
-from tempfile import mkstemp
  try:
      from pyparsing import alphas, cStyleComment, delimitedList, Group, \
          Keyword, lineno, Literal, nestedExpr, nums, oneOf, OneOrMore, \
@@ -278,6 +277,28 @@ class _ReProvider(_HeaderCreator):
          hdr.close()


+def mktemp_determ(sources, suffix):
+    # for reproducible-builds purposes, use a predictable tmpfile path
+    sha = hashlib.sha256()
+    for source in sources:
+        sha.update(source.encode('utf-8'))
+    fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
+    tries = 0
+    while True:
+        tries += 1
+        if tries > 100: # if file exists due to previous crash or whatever
+            raise Exception("cannot create temporary file \""+fname+"\"")
+        try:
+            wxmode = 'x' if sys.version_info > (3,0) else 'wx'
+            fdesc = open(fname, mode=wxmode)
+            break
+        except FileExistsError:
+            time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
+            pass # Try again
+
+    return fdesc, fname
+
+
  def usage():
      print("Usage " + sys.argv[0] + " [--help] [-h | -G] [-C [-I<Path>]] -s File.d [-o <File>]")

@@ -360,7 +381,7 @@ def main():
          return 1

      if s_filename != "" and use_cpp:
-        (ignore, fname) = mkstemp(suffix=".d")
+        (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
          cpp = os.environ.get("CPP", "cpp")
          retcode = call(split(cpp) + includes + defines + [s_filename, '-o', fname])
          if retcode != 0:
@@ -399,7 +420,7 @@ def main():
              providers = _PypProvider()
          else:
              providers = _ReProvider()
-        (ignore, fname) = mkstemp(suffix=".h")
+        (fdesc, fname) = mktemp_determ(["build_source", s_filename], suffix=".h")
          while True:
              try:
                  providers.probe_write(s_filename, fname)
@@ -413,26 +434,9 @@ def main():
          else:
              print("header: " + fname)

-        # for reproducible-builds purposes, use a predictable tmpfile path
-        sha = hashlib.sha256()
-        sha.update(s_filename.encode('utf-8'))
-        sha.update(filename.encode('utf-8'))
-        fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
-        tries = 0
-        while True:
-            tries += 1
-            if tries > 100: # if file exists due to previous crash or whatever
-                print("cannot create temporary file \""+fname+"\"")
-                return 1
-            try:
-                wxmode = 'x' if sys.version_info > (3,0) else 'wx'
-                fdesc = open(fname, mode=wxmode)
-                if not keep_temps:
-                   atexit.register(os.remove, fname) # delete generated source at exit, even if error
-                break
-            except:
-                time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
-                pass # Try again
+        (fdesc, fname) = mktemp_determ(["build_source", s_filename, filename], suffix=".c")
+        if not keep_temps:
+            atexit.register(os.remove, fname) # delete generated source at exit, even if error
          providers.semaphore_write(fdesc)
          fdesc.close()
          cc1 = os.environ.get("CC", "gcc")
-- 
2.39.2

Thread overview: 8+ messages
2023-02-27 12:13 Gioele Barabucci [this message]
2023-02-27 15:49 ` Florian Weimer
2023-02-27 15:59   ` Gioele Barabucci
2023-02-27 16:47     ` Florian Weimer
2023-02-27 17:15       ` Gioele Barabucci
2023-02-27 17:34         ` Florian Weimer
2023-02-28  3:46           ` Gioele Barabucci
2023-02-28 10:12             ` Florian Weimer
