From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id E6BFA3858C33 for ; Sun, 2 Apr 2023 22:54:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E6BFA3858C33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680476043; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eSySrZscAeCazdtAebE4x7UhnrSP+lmpcy7gvRz3FSM=; b=P2EAInf75MuiLFi5yBceXz8yfDHBkt7/9lluVleo40hR2Dotz8YRw2FyxGdBqKL14o6bN7 +m1I3gyEGX7X2xFBZyhIS76IKGrtGTa6SRMCUW4XLu8vaTaMzeJjTLnZ7xOYTlBXZiciUT VgfZYu9pzOVtoZXltO9nV49jug0ekls= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-621-Tfzs9mSDOAmiwb4R6ejJRQ-1; Sun, 02 Apr 2023 18:54:02 -0400 X-MC-Unique: Tfzs9mSDOAmiwb4R6ejJRQ-1 Received: by mail-qt1-f200.google.com with SMTP id h6-20020ac85846000000b003e3c23d562aso18660739qth.1 for ; Sun, 02 Apr 2023 15:54:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680476041; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=eSySrZscAeCazdtAebE4x7UhnrSP+lmpcy7gvRz3FSM=; b=RTMsQDuRemXGLCald7fJ+QMlc7xq84Moz6FqZkD5Mdm75yHD2IjzXd2iaJY+QXC88t V6jdzAG6ZPpi2IktvA3PoWdOuilvfnmyxP7GqvdoPI8YPVL1JFzpU+gaJYKCR28+Nc0C K3D8sE3Uom7sKyvicgnB+cRvecUhGdQ+AMAprMaang6Vrjy6tgEuYjvmwwy+M7DnYqT6 Qv76eLTs5d8sCJ9CqhjVdUAaMs8C1x71booD6i61v7NpryeeO0QclTIfcvpZ7tFGWCTl xSS/vA7Yov5Kcx51+J1gWH4e4+K/3aCDRhtGthlBFfw7Urnj9krrPQGYar2Ps86GYtD9 1Bag== X-Gm-Message-State: AAQBX9c5tiXtEXDkFeyuDZOFZwLrwVNzXGcIjlLQW/OvtnehzFHyOoi3 uEP9W1QadI0bs0UOqAdA1D+uOGYR6xyO82iLGTD1fdVbfhWxGzxJD/GtsmFDXVS0iGEeb0jYJKi /E7f2DNA= X-Received: by 2002:a05:622a:2:b0:3e4:dfb6:29a8 with SMTP id x2-20020a05622a000200b003e4dfb629a8mr48311814qtw.44.1680476041304; Sun, 02 Apr 2023 15:54:01 -0700 (PDT) X-Google-Smtp-Source: AKy350amPkR6tsMdM0y9Lpt3vNG/2RjqW1/XgwUYH5yp9rB2ru2WpYCeo2Gigxgt0rc9kLmEfxRlKQ== X-Received: by 2002:a05:622a:2:b0:3e4:dfb6:29a8 with SMTP id x2-20020a05622a000200b003e4dfb629a8mr48311800qtw.44.1680476040972; Sun, 02 Apr 2023 15:54:00 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.nh.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id s9-20020ac87589000000b003e4f1b3ce43sm2146259qtq.50.2023.04.02.15.54.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 02 Apr 2023 15:54:00 -0700 (PDT) Message-ID: <8c9a526ae6ce6868d21da887ee5cecd5c5fa28ac.camel@redhat.com> Subject: Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests From: David Malcolm To: Shengyu Huang Cc: GCC Development Date: Sun, 02 Apr 2023 18:53:59 -0400 In-Reply-To: <7B619AA7-C42D-4486-8017-72BB54297005@gmail.com> References: <960EE623-1B17-4321-B77E-FBCD9496BE1F@gmail.com> <40fbb064f56845908f797400e5d9443b6cf97fe4.camel@redhat.com> <0e6a972dac60ad290d21a82b428cc76c4e8565e9.camel@redhat.com> <4CBE37A2-7D50-4ECC-9B70-951AB7176D9B@gmail.com> <3dfad33dec50c9f8bfb13e42a29cfb41b6aab457.camel@redhat.com> <2344350B-6AD2-46A5-A335-BD3ECBBAA4DF@gmail.com> <0BED8B95-9F1A-4350-A63C-616D31E405C3@gmail.com> <0DEEAED6-46F3-4339-B600-AD35C92929B9@gmail.com> <7B619AA7-C42D-4486-8017-72BB54297005@gmail.com> User-Agent: Evolution 3.44.4 (3.44.4-1.fc36) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Sat, 2023-04-01 at 16:19 +0200, Shengyu Huang wrote: > Hi Dave, >=20 > > >=20 > > > I has looked into compiling those files with the patch some time > > > ago; > > > looking at my notes, one issue was with this on-stack buffer: > > > =C2=A0=C2=A0 char extra[1024]; > > > declared outside the loop.=C2=A0 Inside the loop, it gets modified in > > > various ways: > > > =C2=A0=C2=A0 extra[0] =3D '\0'; > > > and > > > =C2=A0=C2=A0 if (fread(extra, 1, extsize, fpZip) =3D=3D extsize) { > > > where the latter means "extra" becomes tainted. > > >=20 > > > However "extra" is barely used, and is effectively reset each > > > time > > > through the loop - but the analyzer doesn't figure that out.=C2=A0 So > > > the > > > loop analysis explodes, as it tries to keep track of the > > > possibility > > > that "extra" is still tainted from previous iteration(s), despite > > > the > > > fact that it's going to be clobbered before it ever gets used. > > >=20 > > > So one fix might be to extend the state-purging code so that it > > > somehow > > > "sees" that "extra" gets clobbered before it gets used, and thus > > > we can > > > purge the tainted state from it. > >=20 > > Thanks for your notes. I think we may be talking about the same > > thing? If you look at the updated proposal (I have changed it quite > > a lot since I first sent it out), you=E2=80=99ll see there is one relev= ant > > paper for state merging (although it is slightly different from > > state purging, I think the goal and general methodology is > > similar): https://dslab.epfl.ch/pubs/stateMerging.pdf=C2=A0 > >=20 > > I was trying to say if some similar situation happened for other > > types of checkers, I expected state explosion would also happen. I > > tried to construct a similar example (with the same kind of reset > > and nested conditionals + a loop) but for double-free, so far no > > success yet. I=E2=80=99ll pick it up afterwards, at latest by next > > Saturday, because I need to prepare for a coming midterm on Friday. > > I will also put this test case to the proposal because it seems > > like a very good starting point for the project. >=20 > As promised, below is a small example that causes state explosion > without taint state machine involved. >=20 > void test() > { > =C2=A0 void *p; > =C2=A0 int a; > =C2=A0 scanf(=E2=80=9C%d", &a); > =C2=A0=20 > =C2=A0 while (a > 0) > =C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0 p =3D malloc (1024); > =C2=A0=C2=A0=C2=A0=C2=A0 if (a > 1) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 free(p); > =C2=A0=C2=A0=C2=A0=C2=A0 a--; > =C2=A0=C2=A0 } > =C2=A0=C2=A0=20 > =C2=A0=C2=A0 if (a >0) > =C2=A0=C2=A0=C2=A0=C2=A0 free(p); > } >=20 > This example not only causes state explosion, but also reports false > positive of double-free. (nods) Yeah, our handling of loops isn't great. There's plenty of opportunity within a GSoC project for tackling that. >=20 > By the way, do you have any feedback regarding my proposal > (https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv > 13qe1W0Bj0g=C2=A0< > https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv1 > 3qe1W0Bj0g/edit>)? I am happy to allocate more time polishing the > proposal if you find anything off there. If you prefer me sending it > via email again (for ease of reference in the future maybe?), I am > happy to do so as well. Thanks for the proposal.=C2=A0 Overall, it looks great. Some notes: - maybe specify the *GCC* static analyzer you first mention it - you talk about "timeout" warnings. The analyzer already can emit a "timeout" warning of sorts, via -Wanalyzer-too-complex, though this is based on the complexity of the exploded graph (e.g. # of nodes), rather than actual timings. Is the latter the kind of thing you had in mind, or where you thinking about ways of making the "too complex" heuristics smarter? (I confess that you seem much more familiar with the theory of this than I am!) - the numbering of your references seems to have gotten out-of-sync; I see references to [3] as a paper "Schwartz et al", but that's a link to one of my blog posts. - do you a link to a github account, or somewhere else that demonstrates code you've written? In particular, how is your C++ ? Note that the deadline for submitting proposals to the official GSoC website is April 4 - 18:00 UTC (i.e. this coming Tuesday) and that Google are very strict about that deadline; see: https://developers.google.com/open-source/gsoc/timeline Good luck Dave