On Mon, 2023-03-20 at 18:28 +0100, Shengyu Huang wrote: > Hi Dave, > > Thanks for always getting back to me so promptly! I am drafting the > proposal today. Here is the link: > > https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv13qe1W0Bj0g/ > > (The proposal was first written in markdown and then copied pasted to > Google Docs, so some formatting may look ugly...) Thanks. > > In the timeline section, I mention your name twice where I expect > your input can help me speed up the work. For example, the mentioned > paper (https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf) has > a section “performance” on page 12 that lists out several solutions > to mitigate the exponential blow up in straightforward implementation > of symbolic execution, but the current implementation may have some > clever tricks already (e.g., purging the states?) that some of the > solutions may not be applicable to us. One note on the timeline: I'm planning to take a vacation from July 1st-16th, and will only have intermittent access to a computer during that time, but hopefully will at least be able to respond to questions. > > I can further polish this proposal based on your feedback. I may not > be as responsive as you are because I have several deadlines from > coursework every week. I seem to have spent the whole day looking at bugzilla reports, so I hope to look at your draft properly later in the week. > > > > 1. What should I do with the integration tests? > > > > First of all, AFAIK I'm the only person who's tried running the > > integration tests.  They're the test scripts I wrote to help me > > validate my own patches, so there will be rough edges; please let > > me > > know as you run into them, so I can fix/document them. > > You append the path “../sarifdump” in results.py, but this path is > not in the repo. Sorry about this; it's this repo: https://github.com/davidmalcolm/sarif-dump which I'm simply checking out to a sister directory for now. > > > > 2. I ran gcc -fanalyzer -fanalyzer-checker=taint ./gcc- > > > src/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c , > > > but > > > I got different results from what you documented in PR103533: > > > > > > /usr/bin/ld: /lib/x86_64-linux-gnu/crt1.o: in function `_start': > > > (.text+0x17): undefined reference to `main' > > > collect2: error: ld returned 1 exit status > > > > gcc's default is to try to compile, assemble, and link into an > > executable.  This testcase doesn't have a "main" function, hence > > the > > linker complains.  If you pass "-S", it will merely compile the .c > > to a > > .s assembler file whilst still running the analyzer. > > > > In terms of actually running the test suite via DejaGnu, see: > > https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html > > > > I typically use: > > > >  make -k -jN \ > >  && time make check-gcc \ > >         RUNTESTFLAGS="-v -v --target_board=unix\{-m32,-m64\} > > analyzer-torture.exp=*.c analyzer.exp=*.c" > > > > when testing the analyzer regression test suite, where N is the > > number > > of cores on my box > > > > When I run an individual testcase, I do something like: > > > > ./xgcc -B. -S -fanalyzer ../../src/PATH_TO_TEST_CASE > > > > in the "gcc" subdirectory of the build directory. > > Yeah sorry for not taking a good look at the testcase before sending > you this question…the tips were very helpful still, thanks a lot! > > Under latest trunk, all the individual testcases documented in > PR103533 compile with no error (no ICE or state explosion). I double > checked that I did turn on -fanalyzer-checker=taint (although it is a > bit annoying there is no error or warning when I mistyped it as - > fanalyzer-checker=tai8nt).  > bit annoying there is no error or warning when I mistyped it as - > fanalyzer-checker=tai8nt).  Oops, sorry about that; good catch. I've filed that as: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109220 > I also ran the test suite via DejaGNU, and there are only four > unexpected failures (no unexpected successes) and some unsupported > tests: > > ``` > FAIL: gcc.dg/analyzer/file-CWE-1341-example.c (test for excess > errors) > FAIL: gcc.dg/analyzer/pipe-glibc.c (test for excess errors) > FAIL: gcc.dg/analyzer/file-CWE-1341-example.c (test for excess > errors) > FAIL: gcc.dg/analyzer/pipe-glibc.c (test for excess errors) > ``` > > (Why is the same file reported twice in the summary?) If you're running with -m32,-m64 as above, it's running all the tests twice: once for 32-bit, and again for 64-bit x86 targets. > > These testcases are not relevant for taint analysis, but indeed when > I turned on the taint mode other checkers are suppressed without any > warnings (I guess this should be one of the goals if we don’t manage > to turn on the taint mode by default in the end). Yeah, -fanalyzer-checker=NAME means "just run state machine NAME". I've been testing the "run with taint-checking by default" via: diff --git a/gcc/analyzer/sm.cc b/gcc/analyzer/sm.cc index 1f329cb11d0..31bcf6e2f8e 100644 --- a/gcc/analyzer/sm.cc +++ b/gcc/analyzer/sm.cc @@ -188,10 +188,14 @@ make_checkers (auto_delete_vec &out, logger *logger) out.safe_push (make_malloc_state_machine (logger)); out.safe_push (make_fileptr_state_machine (logger)); out.safe_push (make_fd_state_machine (logger)); +#if 1 + out.safe_push (make_taint_state_machine (logger)); +#else /* The "taint" checker must be explicitly enabled (as it currently leads to state explosions that stop the other checkers working). */ if (flag_analyzer_checker) out.safe_push (make_taint_state_machine (logger)); +#endif out.safe_push (make_sensitive_state_machine (logger)); out.safe_push (make_signal_state_machine (logger)); out.safe_push (make_va_list_state_machine (logger)); i.e. add the taint machine to the bank of state machines already being tracked, so that there's that many more opportunities for differences to occur between state nodes, and thus more risk of "blow up" of the size of the graph being explored. I'm attaching a work-in-progress patch that I wrote for this, that does the above, plus some other related things, but be wary that it's from an old working copy on my hard drive: it's at least a year out of date, judging by the copyright dates :-/ > > Does it mean there are no small testcases that will cause state > explosion at the moment? It is a bit tricky for me to have an > intuition for where the problem stems when I don’t have a concrete > example to investigate…During the project, how often do you expect we > need to run the integration tests? I guess we run it whenever we > don’t have a small example to work at hand, and iteratively we use > the integration test results to construct a minimal example to fix > the next encountered issue? I think if you try the patch to sm.cc above, then you'll see various existing DejaGnu tests below gcc.dg/analyzer will fail with state explosions. > > By the way, I have applied for the compile farm account after the > first email exchanges and I have been working on compile farm for a > while now. Excellent. Dave > > > Best, > Shengyu > > P.S. There is no more `pr93032-mztools.c` in the testsuit, and the > two files `pr93032-mztools-{simplified, signed-char, unsigned- > char}.c` do not incur state explosion. > >