public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] dtrace: Use deterministic temp file creation for all temp files
@ 2023-02-27 12:13 Gioele Barabucci
  2023-02-27 15:49 ` Florian Weimer
  0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-27 12:13 UTC (permalink / raw)
  To: systemtap

`dtrace -G -C` creates temporary files with random filenames. The name
of these temporary files gets embedded in the ELF `.symtab` of the final
object files, making them always slightly different.

This behavior makes all packages that use `dtrace`-produced object files
inherently non reproducible.

To reproduce this issue:

```
$ git clone https://salsa.debian.org/sssd-team/sssd.git
$ cd sssd
$ mkdir -p build && cd build/

$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym1.txt

$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym2.txt

$ diff -u sym1.txt sym2.txt
--- sym1.txt    2023-02-27 08:38:48.955299234 +0100
+++ sym2.txt    2023-02-27 08:38:49.103303352 +0100
@@ -2,7 +2,7 @@
  Symbol table '.symtab' contains 59 entries:
  Num:    Value  Size Type    Bind   Vis      Ndx Name
    0: 0000000000   0 NOTYPE  LOCAL  DEFAULT  UND
-  1: 0000000000   0 FILE    LOCAL  DEFAULT  ABS .dtrace-temp.4f0bbdda.c
+  1: 0000000000   0 FILE    LOCAL  DEFAULT  ABS .dtrace-temp.d20e76c7.c
    2: 0000000000   0 SECTION LOCAL  DEFAULT    1 .text
    3: 0000000000   7 FUNC    LOCAL  DEFAULT    1 __dtrace
    4: 0000000000   0 SECTION LOCAL  DEFAULT    5 .debug_info
```

The root cause of this issue is that, although the name of the temporary
file is created in a deterministic way (from the SHA256 of the source
file), the name of the source file is overwritten with a random name
then the `-C` option (`use_cpp`) is used:

```
if s_filename != "" and use_cpp:
     (ignore, fname) = mkstemp(suffix=".d")
     cpp = os.environ.get("CPP", "cpp")
     retcode = call(split(cpp) + [...] + [s_filename, '-o', fname])
     if retcode != 0:
         print("\"cpp includes s_filename\" failed")
         usage()
         return 1
     s_filename = fname

[...]

sha = hashlib.sha256()
sha.update(s_filename.encode('utf-8'))
sha.update(filename.encode('utf-8'))
fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
```

To fix this issue, all temporary files are now created using
the same deterministic procedure currently used only for the
temporary ".c" files.

Fixes: https://bugs.debian.org/1032055
Fixes: https://bugs.debian.org/1032056
Signed-off-by: Gioele Barabucci <gioele@svario.it>
---
  dtrace.in | 50 +++++++++++++++++++++++++++-----------------------
  1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/dtrace.in b/dtrace.in
index adad99bdb..22c1a9d03 100644
--- a/dtrace.in
+++ b/dtrace.in
@@ -27,7 +27,6 @@ import time
  import atexit
  from shlex import split
  from subprocess import call
-from tempfile import mkstemp
  try:
      from pyparsing import alphas, cStyleComment, delimitedList, Group, \
          Keyword, lineno, Literal, nestedExpr, nums, oneOf, OneOrMore, \
@@ -278,6 +277,28 @@ class _ReProvider(_HeaderCreator):
          hdr.close()


+def mktemp_determ(sources, suffix):
+    # for reproducible-builds purposes, use a predictable tmpfile path
+    sha = hashlib.sha256()
+    for source in sources:
+        sha.update(source.encode('utf-8'))
+    fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
+    tries = 0
+    while True:
+        tries += 1
+        if tries > 100: # if file exists due to previous crash or whatever
+            raise Exception("cannot create temporary file \""+fname+"\"")
+        try:
+            wxmode = 'x' if sys.version_info > (3,0) else 'wx'
+            fdesc = open(fname, mode=wxmode)
+            break
+        except FileExistsError:
+            time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
+            pass # Try again
+
+    return fdesc, fname
+
+
  def usage():
      print("Usage " + sys.argv[0] + " [--help] [-h | -G] [-C [-I<Path>]] -s File.d [-o <File>]")

@@ -360,7 +381,7 @@ def main():
          return 1

      if s_filename != "" and use_cpp:
-        (ignore, fname) = mkstemp(suffix=".d")
+        (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
          cpp = os.environ.get("CPP", "cpp")
          retcode = call(split(cpp) + includes + defines + [s_filename, '-o', fname])
          if retcode != 0:
@@ -399,7 +420,7 @@ def main():
              providers = _PypProvider()
          else:
              providers = _ReProvider()
-        (ignore, fname) = mkstemp(suffix=".h")
+        (fdesc, fname) = mktemp_determ(["build_source", s_filename], suffix=".h")
          while True:
              try:
                  providers.probe_write(s_filename, fname)
@@ -413,26 +434,9 @@ def main():
          else:
              print("header: " + fname)

-        # for reproducible-builds purposes, use a predictable tmpfile path
-        sha = hashlib.sha256()
-        sha.update(s_filename.encode('utf-8'))
-        sha.update(filename.encode('utf-8'))
-        fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
-        tries = 0
-        while True:
-            tries += 1
-            if tries > 100: # if file exists due to previous crash or whatever
-                print("cannot create temporary file \""+fname+"\"")
-                return 1
-            try:
-                wxmode = 'x' if sys.version_info > (3,0) else 'wx'
-                fdesc = open(fname, mode=wxmode)
-                if not keep_temps:
-                   atexit.register(os.remove, fname) # delete generated source at exit, even if error
-                break
-            except:
-                time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
-                pass # Try again
+        (fdesc, fname) = mktemp_determ(["build_source", s_filename, filename], suffix=".c")
+        if not keep_temps:
+            atexit.register(os.remove, fname) # delete generated source at exit, even if error
          providers.semaphore_write(fdesc)
          fdesc.close()
          cc1 = os.environ.get("CC", "gcc")
-- 
2.39.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
  2023-02-27 12:13 [PATCH] dtrace: Use deterministic temp file creation for all temp files Gioele Barabucci
@ 2023-02-27 15:49 ` Florian Weimer
  2023-02-27 15:59   ` Gioele Barabucci
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-02-27 15:49 UTC (permalink / raw)
  To: Gioele Barabucci via Systemtap; +Cc: Gioele Barabucci

* Gioele Barabucci via Systemtap:

> +def mktemp_determ(sources, suffix):
> +    # for reproducible-builds purposes, use a predictable tmpfile path
> +    sha = hashlib.sha256()
> +    for source in sources:
> +        sha.update(source.encode('utf-8'))
> +    fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
> +    tries = 0
> +    while True:
> +        tries += 1
> +        if tries > 100: # if file exists due to previous crash or whatever
> +            raise Exception("cannot create temporary file \""+fname+"\"")
> +        try:
> +            wxmode = 'x' if sys.version_info > (3,0) else 'wx'
> +            fdesc = open(fname, mode=wxmode)
> +            break
> +        except FileExistsError:
> +            time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
> +            pass # Try again
> +
> +    return fdesc, fname

This looks like creating a file with a suitable name may block forward
progress indefinitely?  Like from a previous crash of the tool?

It might be more robust to use a dedicated temporary directory and a
predictable file name under that directory.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
  2023-02-27 15:49 ` Florian Weimer
@ 2023-02-27 15:59   ` Gioele Barabucci
  2023-02-27 16:47     ` Florian Weimer
  0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-27 15:59 UTC (permalink / raw)
  To: Florian Weimer, Gioele Barabucci via Systemtap

On 27/02/23 16:49, Florian Weimer wrote:
>> +def mktemp_determ(sources, suffix):
>> +    # for reproducible-builds purposes, use a predictable tmpfile path
>> +    sha = hashlib.sha256()
>> +    for source in sources:
>> +        sha.update(source.encode('utf-8'))
>> +    fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
>> +    tries = 0
>> +    while True:
>> +        tries += 1
>> +        if tries > 100: # if file exists due to previous crash or whatever
>> +            raise Exception("cannot create temporary file \""+fname+"\"")
>> +        try:
>> +            wxmode = 'x' if sys.version_info > (3,0) else 'wx'
>> +            fdesc = open(fname, mode=wxmode)
>> +            break
>> +        except FileExistsError:
>> +            time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
>> +            pass # Try again
>> +
>> +    return fdesc, fname
> 
> This looks like creating a file with a suitable name may block forward
> progress indefinitely?  Like from a previous crash of the tool?

Hi Florian,

yes, that's correct. This code will exit with a FileExistsError if a 
temporary file from a previous run are still around.

However this is not a new behavior: this piece of code is already in 
dtrace. What this patch does is using it for all intermediate tempfiles, 
not just for the final temporary files.

> It might be more robust to use a dedicated temporary directory and a
> predictable file name under that directory.

Doesn't this suffer from the same issue? If dtrace finds that 
predictable dir/file path it will exit (impeding a second run). Or am I 
missing something?

Regards,

-- 
Gioele Barabucci


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
  2023-02-27 15:59   ` Gioele Barabucci
@ 2023-02-27 16:47     ` Florian Weimer
  2023-02-27 17:15       ` Gioele Barabucci
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-02-27 16:47 UTC (permalink / raw)
  To: Gioele Barabucci; +Cc: Gioele Barabucci via Systemtap

* Gioele Barabucci:

>> It might be more robust to use a dedicated temporary directory and a
>> predictable file name under that directory.
>
> Doesn't this suffer from the same issue? If dtrace finds that
> predictable dir/file path it will exit (impeding a second run). Or am
> I missing something?

The idea is to use a predictable file name in a temporary directory with
an unpredictable name (created using mkdtemp or one of the higher-level
facilities).

Thanks,
Florian


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
  2023-02-27 16:47     ` Florian Weimer
@ 2023-02-27 17:15       ` Gioele Barabucci
  2023-02-27 17:34         ` Florian Weimer
  0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-27 17:15 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Gioele Barabucci via Systemtap

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

On 27/02/23 17:47, Florian Weimer wrote:
>>> It might be more robust to use a dedicated temporary directory and a
>>> predictable file name under that directory.
>>
>> Doesn't this suffer from the same issue? If dtrace finds that
>> predictable dir/file path it will exit (impeding a second run). Or am
>> I missing something?
> 
> The idea is to use a predictable file name in a temporary directory with
> an unpredictable name (created using mkdtemp or one of the higher-level
> facilities).

You're right, that could work: only the basename is embedded in the ELF 
data, not the full path.

Untested patch attached

-- 
Gioele Barabucci

[-- Attachment #2: 0002-dtrace-Use-directory-random-directory-instead-of-bus.patch --]
[-- Type: text/x-patch, Size: 2353 bytes --]

From c371329cc20ee4be00c6e85fabdc0d61b34bf063 Mon Sep 17 00:00:00 2001
From: Gioele Barabucci <gioele@svario.it>
Date: Mon, 27 Feb 2023 18:12:12 +0100
Subject: [PATCH 2/2] dtrace: Use directory random directory instead of busy
 waiting

---
 dtrace.in | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/dtrace.in b/dtrace.in
index 22c1a9d03..a20fb5c4a 100644
--- a/dtrace.in
+++ b/dtrace.in
@@ -26,6 +26,7 @@ import sys
 import time
 import atexit
 from shlex import split
+from tempfile import mkdtemp
 from subprocess import call
 try:
     from pyparsing import alphas, cStyleComment, delimitedList, Group, \
@@ -36,6 +37,8 @@ try:
 except ImportError:
     HAVE_PYP = False
 
+TEMP_DIR = None
+
 
 # Common file creation methods for pyparsing and string pattern matching
 
@@ -282,20 +285,9 @@ def mktemp_determ(sources, suffix):
     sha = hashlib.sha256()
     for source in sources:
         sha.update(source.encode('utf-8'))
-    fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
-    tries = 0
-    while True:
-        tries += 1
-        if tries > 100: # if file exists due to previous crash or whatever
-            raise Exception("cannot create temporary file \""+fname+"\"")
-        try:
-            wxmode = 'x' if sys.version_info > (3,0) else 'wx'
-            fdesc = open(fname, mode=wxmode)
-            break
-        except FileExistsError:
-            time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
-            pass # Try again
-
+    fname = TEMP_DIR + "/.dtrace-temp." + sha.hexdigest()[:8] + suffix
+    wxmode = 'x' if sys.version_info > (3,0) else 'wx'
+    fdesc = open(fname, mode=wxmode)
     return fdesc, fname
 
 
@@ -326,6 +318,7 @@ def main():
         return 1
 
     global HAVE_PYP
+    global TEMP_DIR
     i = 1
     build_header = False
     build_source = False
@@ -380,6 +373,7 @@ def main():
         usage()
         return 1
 
+    TEMP_DIR = mkdtemp()
     if s_filename != "" and use_cpp:
         (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
         cpp = os.environ.get("CPP", "cpp")
@@ -458,6 +452,9 @@ def main():
         else:
             print("cpp: " + s_filename)
 
+    if not keep_temps:
+        os.rmdir(TEMP_DIR)
+
     return 0
 
 if __name__ == "__main__":
-- 
2.39.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
  2023-02-27 17:15       ` Gioele Barabucci
@ 2023-02-27 17:34         ` Florian Weimer
  2023-02-28  3:46           ` Gioele Barabucci
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-02-27 17:34 UTC (permalink / raw)
  To: Gioele Barabucci; +Cc: Gioele Barabucci via Systemtap

* Gioele Barabucci:

> @@ -380,6 +373,7 @@ def main():
>          usage()
>          return 1
>  
> +    TEMP_DIR = mkdtemp()
>      if s_filename != "" and use_cpp:
>          (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
>          cpp = os.environ.get("CPP", "cpp")
> @@ -458,6 +452,9 @@ def main():
>          else:
>              print("cpp: " + s_filename)
>  
> +    if not keep_temps:
> +        os.rmdir(TEMP_DIR)

This should probably use try:/finally: or a context manager
with tempfile.TemporaryDirectory.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
  2023-02-27 17:34         ` Florian Weimer
@ 2023-02-28  3:46           ` Gioele Barabucci
  2023-02-28 10:12             ` Florian Weimer
  0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-28  3:46 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Gioele Barabucci via Systemtap

On 27/02/23 18:34, Florian Weimer wrote:
> * Gioele Barabucci:
> 
>> @@ -380,6 +373,7 @@ def main():
>>           usage()
>>           return 1
>>   
>> +    TEMP_DIR = mkdtemp()
>>       if s_filename != "" and use_cpp:
>>           (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
>>           cpp = os.environ.get("CPP", "cpp")
>> @@ -458,6 +452,9 @@ def main():
>>           else:
>>               print("cpp: " + s_filename)
>>   
>> +    if not keep_temps:
>> +        os.rmdir(TEMP_DIR)
> 
> This should probably use try:/finally:

You mean wrapping the whole function in a try:/finally: or just the 
`os.rmdir`?

The former implicates a quite invasive refactoring of the code 
(patch-wise a complete rewrite); the latter seems overkill given that 
the other `os.remove` calls are not checked and the only possibilities 
are either silencing the error (pass) or exit with an error code 
(similar to what the exception would do anyway).

> or a context manager with tempfile.TemporaryDirectory.

Wouldn't that greatly complicate the handling of `-k`/`keep_temps`? a 
context manager with `TemporaryDirectory` will unconditionally remove 
the directory and all its content on completion.

BTW, TemporaryDirectory is a Python 3 feature. I understood from the 
code that compatibility with 2.6 is still important in dtrace 
(`sys.version_info > (3,0)`...). Isn't that the case?

Regards,

-- 
Gioele Barabucci

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
  2023-02-28  3:46           ` Gioele Barabucci
@ 2023-02-28 10:12             ` Florian Weimer
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Weimer @ 2023-02-28 10:12 UTC (permalink / raw)
  To: Gioele Barabucci; +Cc: Gioele Barabucci via Systemtap

* Gioele Barabucci:

> On 27/02/23 18:34, Florian Weimer wrote:
>> * Gioele Barabucci:
>> 
>>> @@ -380,6 +373,7 @@ def main():
>>>           usage()
>>>           return 1
>>>   +    TEMP_DIR = mkdtemp()
>>>       if s_filename != "" and use_cpp:
>>>           (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
>>>           cpp = os.environ.get("CPP", "cpp")
>>> @@ -458,6 +452,9 @@ def main():
>>>           else:
>>>               print("cpp: " + s_filename)
>>>   +    if not keep_temps:
>>> +        os.rmdir(TEMP_DIR)
>> This should probably use try:/finally:
>
> You mean wrapping the whole function in a try:/finally: or just the
> `os.rmdir`?

The whole thing.

> The former implicates a quite invasive refactoring of the code
> (patch-wise a complete rewrite); the latter seems overkill given that
> the other `os.remove` calls are not checked and the only possibilities
> are either silencing the error (pass) or exit with an error code
> (similar to what the exception would do anyway).

Okay, if the code isn't ready for this …

>> or a context manager with tempfile.TemporaryDirectory.
>
> Wouldn't that greatly complicate the handling of `-k`/`keep_temps`? a
> context manager with `TemporaryDirectory` will unconditionally remove
> the directory and all its content on completion.

Looks like it, TemporyFile has a no-delete option, but
TemporaryDirectory does not.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-02-28 10:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-27 12:13 [PATCH] dtrace: Use deterministic temp file creation for all temp files Gioele Barabucci
2023-02-27 15:49 ` Florian Weimer
2023-02-27 15:59   ` Gioele Barabucci
2023-02-27 16:47     ` Florian Weimer
2023-02-27 17:15       ` Gioele Barabucci
2023-02-27 17:34         ` Florian Weimer
2023-02-28  3:46           ` Gioele Barabucci
2023-02-28 10:12             ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).