From: Gioele Barabucci <gioele@svario.it>
To: systemtap@sourceware.org
Subject: [PATCH] dtrace: Use deterministic temp file creation for all temp files
Date: Mon, 27 Feb 2023 13:13:16 +0100
Message-ID: <6bfff40c-2c4b-c119-116d-7834310299d7@svario.it>

`dtrace -G -C` creates temporary files with random filenames. The names
of these temporary files get embedded in the ELF `.symtab` of the final
object files, so every build produces slightly different output.

This behavior makes all packages that use `dtrace`-produced object files
inherently non-reproducible.

To reproduce this issue:

```
$ git clone https://salsa.debian.org/sssd-team/sssd.git
$ cd sssd
$ mkdir -p build && cd build/
$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym1.txt
$ dtrace -C -G -s ../src/systemtap/sssd_probes.d -o stap_generated_probes.o
$ readelf --wide --symbols stap_generated_probes.o > sym2.txt
$ diff -u sym1.txt sym2.txt
--- sym1.txt 2023-02-27 08:38:48.955299234 +0100
+++ sym2.txt 2023-02-27 08:38:49.103303352 +0100
@@ -2,7 +2,7 @@
Symbol table '.symtab' contains 59 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000 0 NOTYPE LOCAL DEFAULT UND
- 1: 0000000000 0 FILE LOCAL DEFAULT ABS .dtrace-temp.4f0bbdda.c
+ 1: 0000000000 0 FILE LOCAL DEFAULT ABS .dtrace-temp.d20e76c7.c
2: 0000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000 7 FUNC LOCAL DEFAULT 1 __dtrace
4: 0000000000 0 SECTION LOCAL DEFAULT 5 .debug_info
```

The root cause of this issue is that, although the name of the temporary
".c" file is derived in a deterministic way (from the SHA256 of the source
file name), the name of the source file is overwritten with a random name
when the `-C` option (`use_cpp`) is used:

```
if s_filename != "" and use_cpp:
    (ignore, fname) = mkstemp(suffix=".d")
    cpp = os.environ.get("CPP", "cpp")
    retcode = call(split(cpp) + [...] + [s_filename, '-o', fname])
    if retcode != 0:
        print("\"cpp includes s_filename\" failed")
        usage()
        return 1
    s_filename = fname
[...]
sha = hashlib.sha256()
sha.update(s_filename.encode('utf-8'))
sha.update(filename.encode('utf-8'))
fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
```
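
The effect can be illustrated in isolation with a minimal sketch. It is not part of `dtrace.in`: the helper `temp_c_name` and the example file names are hypothetical, and only the hashing scheme is taken from the snippet above. Hashing a fixed source name always yields the same ".c" name, while hashing a path returned by `mkstemp()` yields a different name on each call:

```python
import hashlib
import os
from tempfile import mkstemp


def temp_c_name(s_filename, filename):
    """Hypothetical helper mirroring the quoted naming scheme:
    hash both names and keep the first 8 hex digits."""
    sha = hashlib.sha256()
    sha.update(s_filename.encode("utf-8"))
    sha.update(filename.encode("utf-8"))
    return ".dtrace-temp." + sha.hexdigest()[:8] + ".c"


out_name = "stap_generated_probes"  # assumed output name, for illustration only

# Without -C: s_filename stays the real source name, so the result is stable.
stable_a = temp_c_name("sssd_probes.d", out_name)
stable_b = temp_c_name("sssd_probes.d", out_name)

# With -C: s_filename is replaced by a random mkstemp() path before hashing.
# Both files are kept alive until the names are computed, so the two paths
# are guaranteed to be distinct.
fd1, cpp_out1 = mkstemp(suffix=".d")
fd2, cpp_out2 = mkstemp(suffix=".d")
try:
    random_a = temp_c_name(cpp_out1, out_name)
    random_b = temp_c_name(cpp_out2, out_name)
finally:
    for fd, path in ((fd1, cpp_out1), (fd2, cpp_out2)):
        os.close(fd)
        os.remove(path)

print("without -C:", stable_a, stable_b)
print("with -C:   ", random_a, random_b)
```

The first pair of names is identical; the second pair differs from run to run, which is the difference that ends up in the `FILE` entry of `.symtab`.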
To fix this issue, all temporary files are now created using
the same deterministic procedure currently used only for the
temporary ".c" files.
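
As a rough sketch of what "the same deterministic procedure" means (Python 3 only; the helper name `determ_name` and the scratch directory are assumptions made for this example, and the tag strings are only sample inputs): the name is a pure function of its inputs, and opening the file in exclusive-create mode (`'x'`) fails if a file with that name already exists.

```python
import hashlib
import os
import tempfile


def determ_name(sources, suffix):
    """Hypothetical helper: derive a temp file name from a list of strings."""
    sha = hashlib.sha256()
    for source in sources:
        sha.update(source.encode("utf-8"))
    return ".dtrace-temp." + sha.hexdigest()[:8] + suffix


# Same inputs give the same name; a different suffix gives a different name.
name_d = determ_name(["use_cpp", "sssd_probes.d"], ".d")
name_again = determ_name(["use_cpp", "sssd_probes.d"], ".d")
name_h = determ_name(["build_source", "sssd_probes.d"], ".h")

# Exclusive creation: the second open() of the same path must fail.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, name_d)
    with open(path, mode="x") as fdesc:
        fdesc.write("/* placeholder */\n")
    try:
        with open(path, mode="x"):
            second_create_failed = False
    except FileExistsError:
        second_create_failed = True

print(name_d == name_again)
print(name_d != name_h)
print(second_create_failed)
```

Because the name is predictable, two concurrent runs on the same inputs compete for the same path; the exclusive open is what lets one of them detect the other.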
Fixes: https://bugs.debian.org/1032055
Fixes: https://bugs.debian.org/1032056
Signed-off-by: Gioele Barabucci <gioele@svario.it>
---
dtrace.in | 50 +++++++++++++++++++++++++++-----------------------
1 file changed, 27 insertions(+), 23 deletions(-)
diff --git a/dtrace.in b/dtrace.in
index adad99bdb..22c1a9d03 100644
--- a/dtrace.in
+++ b/dtrace.in
@@ -27,7 +27,6 @@ import time
import atexit
from shlex import split
from subprocess import call
-from tempfile import mkstemp
try:
from pyparsing import alphas, cStyleComment, delimitedList, Group, \
Keyword, lineno, Literal, nestedExpr, nums, oneOf, OneOrMore, \
@@ -278,6 +277,28 @@ class _ReProvider(_HeaderCreator):
hdr.close()
+def mktemp_determ(sources, suffix):
+    # for reproducible-builds purposes, use a predictable tmpfile path
+    sha = hashlib.sha256()
+    for source in sources:
+        sha.update(source.encode('utf-8'))
+    fname = ".dtrace-temp." + sha.hexdigest()[:8] + suffix
+    tries = 0
+    while True:
+        tries += 1
+        if tries > 100: # if file exists due to previous crash or whatever
+            raise Exception("cannot create temporary file \""+fname+"\"")
+        try:
+            wxmode = 'x' if sys.version_info > (3,0) else 'wx'
+            fdesc = open(fname, mode=wxmode)
+            break
+        except FileExistsError:
+            time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
+            pass # Try again
+
+    return fdesc, fname
+
+
def usage():
print("Usage " + sys.argv[0] + " [--help] [-h | -G] [-C [-I<Path>]] -s File.d [-o <File>]")
@@ -360,7 +381,7 @@ def main():
return 1
     if s_filename != "" and use_cpp:
-        (ignore, fname) = mkstemp(suffix=".d")
+        (ignore, fname) = mktemp_determ(["use_cpp", s_filename], suffix=".d")
         cpp = os.environ.get("CPP", "cpp")
         retcode = call(split(cpp) + includes + defines + [s_filename, '-o', fname])
         if retcode != 0:
@@ -399,7 +420,7 @@ def main():
             providers = _PypProvider()
         else:
             providers = _ReProvider()
-        (ignore, fname) = mkstemp(suffix=".h")
+        (fdesc, fname) = mktemp_determ(["build_source", s_filename], suffix=".h")
         while True:
             try:
                 providers.probe_write(s_filename, fname)
@@ -413,26 +434,9 @@ def main():
         else:
             print("header: " + fname)
-        # for reproducible-builds purposes, use a predictable tmpfile path
-        sha = hashlib.sha256()
-        sha.update(s_filename.encode('utf-8'))
-        sha.update(filename.encode('utf-8'))
-        fname = ".dtrace-temp." + sha.hexdigest()[:8] + ".c"
-        tries = 0
-        while True:
-            tries += 1
-            if tries > 100: # if file exists due to previous crash or whatever
-                print("cannot create temporary file \""+fname+"\"")
-                return 1
-            try:
-                wxmode = 'x' if sys.version_info > (3,0) else 'wx'
-                fdesc = open(fname, mode=wxmode)
-                if not keep_temps:
-                    atexit.register(os.remove, fname) # delete generated source at exit, even if error
-                break
-            except:
-                time.sleep(0.1) # vague estimate of elapsed time for concurrent identical gcc job
-                pass # Try again
+        (fdesc, fname) = mktemp_determ(["build_source", s_filename, filename], suffix=".c")
+        if not keep_temps:
+            atexit.register(os.remove, fname) # delete generated source at exit, even if error
         providers.semaphore_write(fdesc)
         fdesc.close()
         cc1 = os.environ.get("CC", "gcc")
--
2.39.2