From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mir Immad
Date: Sat, 29 Jan 2022 20:22:25 +0530
Subject: Re: GSoC: Working on the static analyzer
To: David Malcolm, gcc@gcc.gnu.org
References: <4eec5fa69b9daedcec5361c2cc18df7f1ef397af.camel@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="000000000000e8c1cf05d6b9b3ac"
List-Id: Gcc mailing list

--000000000000e8c1cf05d6b9b3ac
Content-Type: text/plain; charset="UTF-8"

Thank you for the detailed information. I've been looking into the
integer POSIX file-descriptor APIs and decided to write a
proof-of-concept checker for them (ignoring errno for now). The checker
tracks the fd returned by open(). It warns if dup() is called on a
closed fd; otherwise it tracks the fd returned by dup(). It also warns
if read() or write() is called on a closed fd. I'm attaching a text file
that lists some C sources and the warnings the static analyzer emits for
them. I've reused the diagnostic metadata from sm-file. Is this
something that could also be added to the analyzer?

About the fd leak: that's the next thing I'll try to get working. Since
you've mentioned that it could be a GSoC project, this is what I'm going
to focus on.

Regards.

On Wed, Jan 26, 2022 at 7:56 PM David Malcolm wrote:

> On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > Hi, sir.
> >
> > I've been trying to understand the static analyzer's code. I spent
> > most of my time learning the state machine's API. I learned how the
> > state machine's on_stmt is supposed to "recognize" specific
> > functions, how on_transition takes a specific tree from one state to
> > another, and how the captured states are used by pending_diagnostics
> > to report the errors. Furthermore, I was able to create a dummy
> > checker that mimicked the behaviour of sm-file's double_fclose and
> > compile GCC with these changes. Is this the right way of learning?
>
> This sounds great.
>
> >
> > As you've mentioned on the projects page that you would like to add
> > more support for some POSIX APIs, can you please write (or refer me
> > to) a simple C program that uses such an API (and also what the
> > analyzer should have done) so that I can attempt to add such a
> > checker to the analyzer?
>
> A couple of project ideas:
>
> (i) treat data coming from a network connection as tainted, by somehow
> teaching the analyzer about networking APIs. Ideally: look at some
> subset of historical CVEs involving network-facing attacks on
> user-space daemons, and find a way to detect them in the analyzer
> (need to find a way to mark the incoming data as tainted, so that the
> analyzer "knows" about the trust boundary - that the incoming data
> needs to be sanitized and treated with extra caution; see
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> for my attempts to do this for the Linux kernel).
>
> Obviously this is potentially a huge project, so maybe just picking a
> tiny subset and getting that working as a proof-of-concept would be a
> good GSoC project. Maybe find an old CVE that someone has written a
> good write-up for, and think about "how could GCC/-fanalyzer have
> spotted it?"
>
> (ii) add leak-detection for POSIX file descriptors: i.e. the integer
> values returned by "open", "dup", etc. It would be good to have a
> check that the user's code doesn't leak these values e.g. on
> error-handling paths, by failing to close a file-descriptor (and not
> storing it anywhere). I think that much of this could be done by
> analogy with the sm-file.cc code.
>
> >
> > Also, I didn't realize the complexity of adding SARIF when I
> > mentioned it. I'd rather work on adding more checkers.
>
> Fair enough.
>
> Hope the above is constructive.
>
> Dave
>
> >
> > Regards.
> >
> > Mir Immad
> >
> > On Sun, Jan 23, 2022, 11:04 PM Mir Immad wrote:
> >
> > > Hi Sir,
> > >
> > > I've been trying to understand the static analyzer's code. I spent
> > > most of my time learning the state machine's API. I learned how
> > > the state machine's on_stmt is supposed to "recognize" specific
> > > functions and take a specific tree from one state to another, and
> > > how the captured states are used by pending_diagnostics to report
> > > the errors. Furthermore, I was able to create a dummy checker that
> > > mimicked the behaviour of sm-file's double_fclose and compile GCC
> > > with these changes. Is this the right way of learning?
> > >
> > > As you've mentioned on the projects page that you would like to
> > > add more support for some POSIX APIs, can you please write (or
> > > refer me to) a simple C program that uses such an API (and also
> > > what the analyzer should have done) so that I can attempt to add
> > > such a checker to the analyzer?
> > >
> > > Also, I didn't realize the complexity of adding SARIF when I
> > > mentioned it. I'd rather work on adding more checkers.
> > >
> > > Regards.
> > > Mir Immad
> > >
> > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm wrote:
> > >
> > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > Hi David,
> > > > > I've been tinkering with the static analyzer for the last few
> > > > > days. I find the project of adding SARIF output to the
> > > > > analyzer interesting. I'm writing this to let you know that
> > > > > I'm trying to learn the codebase.
> > > > > Thank you.
> > > >
> > > > Excellent.
> > > >
> > > > BTW, I think adding SARIF output would involve working more with
> > > > GCC's diagnostics subsystem than with the static analyzer, since
> > > > (in theory) all of the static analyzer's output is passing
> > > > through the diagnostics subsystem - though the static analyzer
> > > > is probably the only GCC component generating diagnostic paths.
> > > >
> > > > I'm happy to mentor such a project as I maintain both subsystems
> > > > and SARIF output would benefit both - but it would be rather
> > > > tangential to the analyzer - so if you had specifically wanted
> > > > to be working on the guts of the analyzer itself, you may want
> > > > to pick a different subproject.
> > > >
> > > > The SARIF standard is rather long and complicated, and we would
> > > > want to be compatible with clang's implementation.
> > > >
> > > > It would be very cool if gcc could also accept SARIF files as an
> > > > *input* format, and emit them as diagnostics; that might help
> > > > with debugging SARIF output. (I have an old patch for adding
> > > > JSON parsing support to GCC that could be used as a starting
> > > > point for this).
> > > >
> > > > Hope the above makes sense
> > > > Dave
> > > >
> > > > >
> > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm
> > > > > <dmalcolm@redhat.com> wrote:
> > > > >
> > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > > > > > Hi everyone,
> > > > > >
> > > > > > Hi, and welcome.
> > > > > >
> > > > > > > I intend to work on the static analyzer. Are these
> > > > > > > documents enough to get started:
> > > > > > > https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > There are also some high-level notes here:
> > > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > >
> > > > > > Also, given that the analyzer is part of GCC, the more
> > > > > > general introductions to hacking on GCC will be useful.
> > > > > >
> > > > > > I recommend creating a trivial C source file with a bug in
> > > > > > it (e.g. a 3-line function with a use-after-free), and
> > > > > > stepping through the analyzer to get a sense of how it
> > > > > > works.
> > > > > >
> > > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > > Dave

--000000000000e8c1cf05d6b9b3ac
Content-Type: text/plain; charset="UTF-8"; name="gcc-output.txt"
Content-Disposition: attachment; filename="gcc-output.txt"
Content-Transfer-Encoding: 8bit
X-Attachment-Id: f_kyzy3bhy0

$ cat double-close.c

#include <fcntl.h>
#include <unistd.h>
void test()
{
  int fd = open("test.txt", O_RDONLY);
  close(fd);
  close(fd);
}

int main() {}

$ gcc-11.2.0 double-close.c -fanalyzer

double-close.c: In function ‘test’:
double-close.c:7:3: warning: double close on fd [-Wanalyzer-double-fclose]
    7 |   close(fd);
      |   ^~~~~~~~~
  ‘test’: events 1-3
    |
    |    5 |   int fd = open("test.txt", O_RDONLY);
    |      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |            |
    |      |            (1) opened here
    |    6 |   close(fd);
    |      |   ~~~~~~~~~
    |      |   |
    |      |   (2) first ‘close’ here
    |    7 |   close(fd);
    |      |   ~~~~~~~~~
    |      |   |
    |      |   (3) second ‘close’ was here; first ‘close’ was at (2)
    |

$ cat dup.c

#include <fcntl.h>
#include <unistd.h>

void test() {
	int fd = open("test.txt", O_RDONLY);
	close(fd);
	int fd2 = dup(fd);
}
int main(){}

$ gcc-11.2.0 dup.c -fanalyzer

dup.c: In function ‘test’:
dup.c:7:19: warning: dup on closed fd [-Wanalyzer-double-fclose]
    7 |         int fd2 = dup(fd);
      |                   ^~~~~~~
  ‘test’: events 1-3
    |
    |    5 |         int fd = open("test.txt", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    6 |         close(fd);
    |      |         ~~~~~~~~~
    |      |         |
    |      |         (2) first ‘close’ here
    |    7 |         int fd2 = dup(fd);
    |      |                   ~~~~~~~
    |      |                   |
    |      |                   (3) dup on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |

$ cat read-write.c

#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

void test ()
{
	int fd = open ("text", O_RDONLY);
	close(fd);
	write(fd, "hello", 5); //write in closed fd
	char * s = (char *) malloc(2);
	read(fd, s, 2); //read on closed fd
	free(s);
}

void main(){}

$ gcc-11.2.0 read-write.c -fanalyzer

read-write.c: In function ‘test’:
read-write.c:9:9: warning: write on closed fd [-Wanalyzer-double-fclose]
    9 |         write(fd, "hello", 5); //write in closed fd
      |         ^~~~~~~~~~~~~~~~~~~~~
  ‘test’: events 1-3
    |
    |    7 |         int fd = open ("text", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    8 |         close(fd);
    |      |         ~~~~~~~~~
    |      |         |
    |      |         (2) first ‘close’ here
    |    9 |         write(fd, "hello", 5); //write in closed fd
    |      |         ~~~~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) write on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |
read-write.c:11:9: warning: read on closed fd [-Wanalyzer-double-fclose]
   11 |         read(fd, s, 2); //read on closed fd
      |         ^~~~~~~~~~~~~~
  ‘test’: events 1-3
    |
    |    7 |         int fd = open ("text", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    8 |         close(fd);
    |      |         ~~~~~~~~~
    |      |         |
    |      |         (2) first ‘close’ here
    |......
    |   11 |         read(fd, s, 2); //read on closed fd
    |      |         ~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) read on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |

--000000000000e8c1cf05d6b9b3ac--
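
For reference, here is a minimal sketch of the kind of file-descriptor
leak described in project idea (ii) of the quoted thread: an
error-handling path that returns without closing the fd and without
storing it anywhere. It is not taken from the thread. The function
names first_byte_leaky and first_byte_fixed and the file name
fd-leak.c are invented for illustration, and nothing here says what
any particular analyzer version reports for it.

```c
/* Hypothetical illustration, not taken from the thread: the names
   first_byte_leaky and first_byte_fixed are invented for this sketch.
   It has no main(), so build it without linking, e.g.
   gcc -fanalyzer -c fd-leak.c  */
#include <fcntl.h>
#include <unistd.h>

/* Reads the first byte of PATH into *OUT.  Returns 0 on success and
   -1 on failure.  BUG: the early return after a failed read() skips
   close(), so the descriptor is leaked on that error path.  */
int first_byte_leaky(const char *path, char *out)
{
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;            /* nothing was opened, nothing to close */
  if (read(fd, out, 1) != 1)
    return -1;            /* leak: fd is still open and not stored */
  close(fd);
  return 0;
}

/* Same contract, but every path that opened fd also closes it.  */
int first_byte_fixed(const char *path, char *out)
{
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  ssize_t n = read(fd, out, 1);
  close(fd);              /* runs on both success and error paths */
  return n == 1 ? 0 : -1;
}
```

A leak checker built by analogy with sm-file would presumably want to
flag the second return in first_byte_leaky and stay quiet on
first_byte_fixed. That expectation is an assumption of this sketch,
not something the thread states.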