* [PATCH] dtrace: Use deterministic temp file creation for all temp files
@ 2023-02-27 12:13 Gioele Barabucci
2023-02-27 15:49 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-27 12:13 UTC (permalink / raw)
To: systemtap
`dtrace -G -C` creates temporary files with random filenames. The name
of these temporary files gets embedded in the ELF `.symtab` of the final
object files, making them always slightly different.
This behavior makes all packages that use `dtrace`-produced object files
inherently non reproducible.
To reproduce this issue:
```
$ git clone https://salsa.debian.org/sssd-team/sssd.git
$ cd sssd
$ mkdir -p build && cd build/
$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym1.txt
$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym2.txt
$ diff -u sym1.txt sym2.txt
--- sym1.txt 2023-02-27 08:38:48.955299234 +0100
+++ sym2.txt 2023-02-27 08:38:49.103303352 +0100
@@ -2,7 +2,7 @@
Symbol table '.symtab' contains 59 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000 0 NOTYPE LOCAL DEFAULT UND
- 1: 0000000000 0 FILE LOCAL DEFAULT ABS .dtrace-temp.4f0bbdda.c
+ 1: 0000000000 0 FILE LOCAL DEFAULT ABS .dtrace-temp.d20e76c7.c
2: 0000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000 7 FUNC LOCAL DEFAULT 1 __dtrace
4: 0000000000 0 SECTION LOCAL DEFAULT 5 .debug_info
```
The root cause of this issue is that, although the name of the temporary
file is created in a deterministic way (from the SHA256 of the source
file), the name of the source file is overwritten with a random name
then the `-C` option (`use_cpp`) is used:
```
if s_filename != "" and use_cpp:
(ignore, fname) = mkstemp(suffix=".d")
cpp = os.environ.get("CPP", "cpp")
retcode = call(split(cpp) + [...] + [s_filename, '-o', fname])
if retcode != 0:
print("\"cpp includes s_filename\" failed")
usage()
return 1
s_filename = fname
[...]
sha = hashlib.sha256()
sha.update(s_filename.encode('utf-8'))
sha.update(filename.encode('utf-8'))
fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
```
To fix this issue, all temporary files are now created using
the same deterministic procedure currently used only for the
temporary ".c" files.
Fixes: https://bugs.debian.org/1032055
Fixes: https://bugs.debian.org/1032056
Signed-off-by: Gioele Barabucci <gioele@svario.it>
---
dtrace.in | 50 +++++++++++++++++++++++++++-----------------------
1 file changed, 27 insertions(+), 23 deletions(-)
diff --git a/dtrace.in b/dtrace.in
index adad99bdb..22c1a9d03 100644
--- a/dtrace.in
+++ b/dtrace.in
@@ -27,7 +27,6 @@ import time
import atexit
from shlex import split
from subprocess import call
-from tempfile import mkstemp
try:
from pyparsing import alphas, cStyleComment, delimitedList, Group, \
Keyword, lineno, Literal, nestedExpr, nums, oneOf, OneOrMore, \
@@ -278,6 +277,28 @@ class _ReProvider(_HeaderCreator):
hdr.close()
+def mktemp_determ(sources, suffix):
+ # for reproducible-builds purposes, use a predictable tmpfile path
+ sha = hashlib.sha256()
+ for source in sources:
+ sha.update(source.encode('utf-8'))
+ fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
+ tries = 0
+ while True:
+ tries += 1
+ if tries > 100: # if file exists due to previous crash or whatever
+ raise Exception("cannot create temporary file \""+fname+"\"")
+ try:
+ wxmode = 'x' if sys.version_info > (3,0) else 'wx'
+ fdesc = open(fname, mode=wxmode)
+ break
+ except FileExistsError:
+ time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
+ pass # Try again
+
+ return fdesc, fname
+
+
def usage():
print("Usage " + sys.argv[0] + " [--help] [-h | -G] [-C [-I<Path>]] -s File.d [-o <File>]")
@@ -360,7 +381,7 @@ def main():
return 1
if s_filename != "" and use_cpp:
- (ignore, fname) = mkstemp(suffix=".d")
+ (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
cpp = os.environ.get("CPP", "cpp")
retcode = call(split(cpp) + includes + defines + [s_filename, '-o', fname])
if retcode != 0:
@@ -399,7 +420,7 @@ def main():
providers = _PypProvider()
else:
providers = _ReProvider()
- (ignore, fname) = mkstemp(suffix=".h")
+ (fdesc, fname) = mktemp_determ(["build_source", s_filename], suffix=".h")
while True:
try:
providers.probe_write(s_filename, fname)
@@ -413,26 +434,9 @@ def main():
else:
print("header: " + fname)
- # for reproducible-builds purposes, use a predictable tmpfile path
- sha = hashlib.sha256()
- sha.update(s_filename.encode('utf-8'))
- sha.update(filename.encode('utf-8'))
- fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
- tries = 0
- while True:
- tries += 1
- if tries > 100: # if file exists due to previous crash or whatever
- print("cannot create temporary file \""+fname+"\"")
- return 1
- try:
- wxmode = 'x' if sys.version_info > (3,0) else 'wx'
- fdesc = open(fname, mode=wxmode)
- if not keep_temps:
- atexit.register(os.remove, fname) # delete generated source at exit, even if error
- break
- except:
- time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
- pass # Try again
+ (fdesc, fname) = mktemp_determ(["build_source", s_filename, filename], suffix=".c")
+ if not keep_temps:
+ atexit.register(os.remove, fname) # delete generated source at exit, even if error
providers.semaphore_write(fdesc)
fdesc.close()
cc1 = os.environ.get("CC", "gcc")
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
2023-02-27 12:13 [PATCH] dtrace: Use deterministic temp file creation for all temp files Gioele Barabucci
@ 2023-02-27 15:49 ` Florian Weimer
2023-02-27 15:59 ` Gioele Barabucci
0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-02-27 15:49 UTC (permalink / raw)
To: Gioele Barabucci via Systemtap; +Cc: Gioele Barabucci
* Gioele Barabucci via Systemtap:
> +def mktemp_determ(sources, suffix):
> + # for reproducible-builds purposes, use a predictable tmpfile path
> + sha = hashlib.sha256()
> + for source in sources:
> + sha.update(source.encode('utf-8'))
> + fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
> + tries = 0
> + while True:
> + tries += 1
> + if tries > 100: # if file exists due to previous crash or whatever
> + raise Exception("cannot create temporary file \""+fname+"\"")
> + try:
> + wxmode = 'x' if sys.version_info > (3,0) else 'wx'
> + fdesc = open(fname, mode=wxmode)
> + break
> + except FileExistsError:
> + time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
> + pass # Try again
> +
> + return fdesc, fname
This looks like creating a file with a suitable name may block forward
progress indefinitely? Like from a previous crash of the tool?
It might be more robust to use a dedicated temporary directory and a
predictable file name under that directory.
Thanks,
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
2023-02-27 15:49 ` Florian Weimer
@ 2023-02-27 15:59 ` Gioele Barabucci
2023-02-27 16:47 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-27 15:59 UTC (permalink / raw)
To: Florian Weimer, Gioele Barabucci via Systemtap
On 27/02/23 16:49, Florian Weimer wrote:
>> +def mktemp_determ(sources, suffix):
>> + # for reproducible-builds purposes, use a predictable tmpfile path
>> + sha = hashlib.sha256()
>> + for source in sources:
>> + sha.update(source.encode('utf-8'))
>> + fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
>> + tries = 0
>> + while True:
>> + tries += 1
>> + if tries > 100: # if file exists due to previous crash or whatever
>> + raise Exception("cannot create temporary file \""+fname+"\"")
>> + try:
>> + wxmode = 'x' if sys.version_info > (3,0) else 'wx'
>> + fdesc = open(fname, mode=wxmode)
>> + break
>> + except FileExistsError:
>> + time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
>> + pass # Try again
>> +
>> + return fdesc, fname
>
> This looks like creating a file with a suitable name may block forward
> progress indefinitely? Like from a previous crash of the tool?
Hi Florian,
yes, that's correct. This code will exit with a FileExistsError if a
temporary file from a previous run are still around.
However this is not a new behavior: this piece of code is already in
dtrace. What this patch does is using it for all intermediate tempfiles,
not just for the final temporary files.
> It might be more robust to use a dedicated temporary directory and a
> predictable file name under that directory.
Doesn't this suffer from the same issue? If dtrace finds that
predictable dir/file path it will exit (impeding a second run). Or am I
missing something?
Regards,
--
Gioele Barabucci
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
2023-02-27 15:59 ` Gioele Barabucci
@ 2023-02-27 16:47 ` Florian Weimer
2023-02-27 17:15 ` Gioele Barabucci
0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-02-27 16:47 UTC (permalink / raw)
To: Gioele Barabucci; +Cc: Gioele Barabucci via Systemtap
* Gioele Barabucci:
>> It might be more robust to use a dedicated temporary directory and a
>> predictable file name under that directory.
>
> Doesn't this suffer from the same issue? If dtrace finds that
> predictable dir/file path it will exit (impeding a second run). Or am
> I missing something?
The idea is to use a predictable file name in a temporary directory with
an unpredictable name (created using mkdtemp or one of the higher-level
facilities).
Thanks,
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
2023-02-27 16:47 ` Florian Weimer
@ 2023-02-27 17:15 ` Gioele Barabucci
2023-02-27 17:34 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-27 17:15 UTC (permalink / raw)
To: Florian Weimer; +Cc: Gioele Barabucci via Systemtap
[-- Attachment #1: Type: text/plain, Size: 640 bytes --]
On 27/02/23 17:47, Florian Weimer wrote:
>>> It might be more robust to use a dedicated temporary directory and a
>>> predictable file name under that directory.
>>
>> Doesn't this suffer from the same issue? If dtrace finds that
>> predictable dir/file path it will exit (impeding a second run). Or am
>> I missing something?
>
> The idea is to use a predictable file name in a temporary directory with
> an unpredictable name (created using mkdtemp or one of the higher-level
> facilities).
You're right, that could work: only the basename is embedded in the ELF
data, not the full path.
Untested patch attached
--
Gioele Barabucci
[-- Attachment #2: 0002-dtrace-Use-directory-random-directory-instead-of-bus.patch --]
[-- Type: text/x-patch, Size: 2353 bytes --]
From c371329cc20ee4be00c6e85fabdc0d61b34bf063 Mon Sep 17 00:00:00 2001
From: Gioele Barabucci <gioele@svario.it>
Date: Mon, 27 Feb 2023 18:12:12 +0100
Subject: [PATCH 2/2] dtrace: Use directory random directory instead of busy
waiting
---
dtrace.in | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/dtrace.in b/dtrace.in
index 22c1a9d03..a20fb5c4a 100644
--- a/dtrace.in
+++ b/dtrace.in
@@ -26,6 +26,7 @@ import sys
import time
import atexit
from shlex import split
+from tempfile import mkdtemp
from subprocess import call
try:
from pyparsing import alphas, cStyleComment, delimitedList, Group, \
@@ -36,6 +37,8 @@ try:
except ImportError:
HAVE_PYP = False
+TEMP_DIR = None
+
# Common file creation methods for pyparsing and string pattern matching
@@ -282,20 +285,9 @@ def mktemp_determ(sources, suffix):
sha = hashlib.sha256()
for source in sources:
sha.update(source.encode('utf-8'))
- fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
- tries = 0
- while True:
- tries += 1
- if tries > 100: # if file exists due to previous crash or whatever
- raise Exception("cannot create temporary file \""+fname+"\"")
- try:
- wxmode = 'x' if sys.version_info > (3,0) else 'wx'
- fdesc = open(fname, mode=wxmode)
- break
- except FileExistsError:
- time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
- pass # Try again
-
+ fname = TEMP_DIR + "/.dtrace-temp." + sha.hexdigest()[:8] + suffix
+ wxmode = 'x' if sys.version_info > (3,0) else 'wx'
+ fdesc = open(fname, mode=wxmode)
return fdesc, fname
@@ -326,6 +318,7 @@ def main():
return 1
global HAVE_PYP
+ global TEMP_DIR
i = 1
build_header = False
build_source = False
@@ -380,6 +373,7 @@ def main():
usage()
return 1
+ TEMP_DIR = mkdtemp()
if s_filename != "" and use_cpp:
(ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
cpp = os.environ.get("CPP", "cpp")
@@ -458,6 +452,9 @@ def main():
else:
print("cpp: " + s_filename)
+ if not keep_temps:
+ os.rmdir(TEMP_DIR)
+
return 0
if __name__ == "__main__":
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
2023-02-27 17:15 ` Gioele Barabucci
@ 2023-02-27 17:34 ` Florian Weimer
2023-02-28 3:46 ` Gioele Barabucci
0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-02-27 17:34 UTC (permalink / raw)
To: Gioele Barabucci; +Cc: Gioele Barabucci via Systemtap
* Gioele Barabucci:
> @@ -380,6 +373,7 @@ def main():
> usage()
> return 1
>
> + TEMP_DIR = mkdtemp()
> if s_filename != "" and use_cpp:
> (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
> cpp = os.environ.get("CPP", "cpp")
> @@ -458,6 +452,9 @@ def main():
> else:
> print("cpp: " + s_filename)
>
> + if not keep_temps:
> + os.rmdir(TEMP_DIR)
This should probably use try:/finally: or a context manager
with tempfile.TemporaryDirectory.
Thanks,
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
2023-02-27 17:34 ` Florian Weimer
@ 2023-02-28 3:46 ` Gioele Barabucci
2023-02-28 10:12 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: Gioele Barabucci @ 2023-02-28 3:46 UTC (permalink / raw)
To: Florian Weimer; +Cc: Gioele Barabucci via Systemtap
On 27/02/23 18:34, Florian Weimer wrote:
> * Gioele Barabucci:
>
>> @@ -380,6 +373,7 @@ def main():
>> usage()
>> return 1
>>
>> + TEMP_DIR = mkdtemp()
>> if s_filename != "" and use_cpp:
>> (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
>> cpp = os.environ.get("CPP", "cpp")
>> @@ -458,6 +452,9 @@ def main():
>> else:
>> print("cpp: " + s_filename)
>>
>> + if not keep_temps:
>> + os.rmdir(TEMP_DIR)
>
> This should probably use try:/finally:
You mean wrapping the whole function in a try:/finally: or just the
`os.rmdir`?
The former implicates a quite invasive refactoring of the code
(patch-wise a complete rewrite); the latter seems overkill given that
the other `os.remove` calls are not checked and the only possibilities
are either silencing the error (pass) or exit with an error code
(similar to what the exception would do anyway).
> or a context manager with tempfile.TemporaryDirectory.
Wouldn't that greatly complicate the handling of `-k`/`keep_temps`? a
context manager with `TemporaryDirectory` will unconditionally remove
the directory and all its content on completion.
BTW, TemporaryDirectory is a Python 3 feature. I understood from the
code that compatibility with 2.6 is still important in dtrace
(`sys.version_info > (3,0)`...). Isn't that the case?
Regards,
--
Gioele Barabucci
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] dtrace: Use deterministic temp file creation for all temp files
2023-02-28 3:46 ` Gioele Barabucci
@ 2023-02-28 10:12 ` Florian Weimer
0 siblings, 0 replies; 8+ messages in thread
From: Florian Weimer @ 2023-02-28 10:12 UTC (permalink / raw)
To: Gioele Barabucci; +Cc: Gioele Barabucci via Systemtap
* Gioele Barabucci:
> On 27/02/23 18:34, Florian Weimer wrote:
>> * Gioele Barabucci:
>>
>>> @@ -380,6 +373,7 @@ def main():
>>> usage()
>>> return 1
>>> + TEMP_DIR = mkdtemp()
>>> if s_filename != "" and use_cpp:
>>> (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
>>> cpp = os.environ.get("CPP", "cpp")
>>> @@ -458,6 +452,9 @@ def main():
>>> else:
>>> print("cpp: " + s_filename)
>>> + if not keep_temps:
>>> + os.rmdir(TEMP_DIR)
>> This should probably use try:/finally:
>
> You mean wrapping the whole function in a try:/finally: or just the
> `os.rmdir`?
The whole thing.
> The former implicates a quite invasive refactoring of the code
> (patch-wise a complete rewrite); the latter seems overkill given that
> the other `os.remove` calls are not checked and the only possibilities
> are either silencing the error (pass) or exit with an error code
> (similar to what the exception would do anyway).
Okay, if the code isn't ready for this …
>> or a context manager with tempfile.TemporaryDirectory.
>
> Wouldn't that greatly complicate the handling of `-k`/`keep_temps`? a
> context manager with `TemporaryDirectory` will unconditionally remove
> the directory and all its content on completion.
Looks like it, TemporyFile has a no-delete option, but
TemporaryDirectory does not.
Thanks,
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-02-28 10:12 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-27 12:13 [PATCH] dtrace: Use deterministic temp file creation for all temp files Gioele Barabucci
2023-02-27 15:49 ` Florian Weimer
2023-02-27 15:59 ` Gioele Barabucci
2023-02-27 16:47 ` Florian Weimer
2023-02-27 17:15 ` Gioele Barabucci
2023-02-27 17:34 ` Florian Weimer
2023-02-28 3:46 ` Gioele Barabucci
2023-02-28 10:12 ` Florian Weimer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).